Projects
Projects - [link to repository](https://github.com/lisazhao513/CS-STAT-Projects)
Spam or Ham Classification
- Developed a text classification model using logistic regression to identify spam emails based on keyword frequency and email length features.
- Performed feature engineering, including tokenization and length-based features, to improve classification accuracy.
- Evaluated the model with a train-validation-test split, using a confusion matrix and accuracy to check performance on held-out data.
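
As a rough illustration of the approach described for this project, the sketch below builds keyword-frequency and length features and fits a logistic regression by gradient descent. It is not the project's actual code: the keyword list, the toy emails, and all function names are hypothetical, and a real evaluation would use separate train, validation, and test splits rather than the training data itself.

```python
import math
import re

# Hypothetical keyword list and toy emails, invented for illustration only.
KEYWORDS = ["free", "winner", "prize"]
EMAILS = [
    ("Free prize winner claim your free money now", 1),
    ("You are a winner get your free prize", 1),
    ("free free prize inside click now", 1),
    ("Are we still meeting for lunch tomorrow", 0),
    ("Please review the attached project report before Friday", 0),
    ("Thanks for the notes from class today", 0),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def featurize(text):
    """Relative frequency of each keyword, plus length in hundreds of characters."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return [tokens.count(k) / n for k in KEYWORDS] + [len(text) / 100.0]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """Estimated probability that feature vector x belongs to a spam email."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict_proba(xi, w, b) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

X = [featurize(text) for text, _ in EMAILS]
y = [label for _, label in EMAILS]
w, b = train(X, y)

# Confusion-matrix counts and accuracy, computed here on the toy training data.
preds = [int(predict_proba(xi, w, b) >= 0.5) for xi in X]
confusion = {
    "tp": sum(p == 1 and t == 1 for p, t in zip(preds, y)),
    "fp": sum(p == 1 and t == 0 for p, t in zip(preds, y)),
    "fn": sum(p == 0 and t == 1 for p, t in zip(preds, y)),
    "tn": sum(p == 0 and t == 0 for p, t in zip(preds, y)),
}
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(confusion, accuracy)
```

Dividing keyword counts by the token count keeps long emails from dominating purely because they contain more words, while the separate length feature lets the model use email size on its own terms.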
Simpsons Transcript Text Analysis
- Developed an interactive R Shiny application (https://lisazhao513.shinyapps.io/Simpsons_Transcript_Text_Analysis/) to analyze and visualize transcripts from all 33 seasons of The Simpsons, with features such as character-specific dialogue trends and sentiment analysis across all seasons or a selected subset.
- Utilized natural language processing techniques to extract insights from transcripts, including word frequency and sentiment scoring, enabling users to explore thematic changes over the series.
- Designed an intuitive user interface with dynamic plots, filters, and search functionality, allowing users to engage with detailed character-level and season-level text data interactively.
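
The word-frequency and sentiment-scoring steps mentioned above can be sketched roughly as follows. The application itself is written in R Shiny; this Python version only illustrates the idea. The lexicon, the placeholder dialogue rows, and the function names are all hypothetical and are not taken from the real transcripts or the app's code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "stupid": -1, "sad": -1}

# Placeholder rows of (season, character, line); not real transcript data.
ROWS = [
    (1, "Character A", "I love this great town"),
    (1, "Character B", "I hate this stupid job"),
    (2, "Character A", "What a great happy day"),
    (2, "Character B", "This is a sad day"),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(rows, character=None, season=None):
    """Count words, optionally filtered to one character and/or one season."""
    counts = Counter()
    for row_season, row_character, text in rows:
        if character is not None and row_character != character:
            continue
        if season is not None and row_season != season:
            continue
        counts.update(tokenize(text))
    return counts

def line_sentiment(text):
    """Sum of lexicon scores for the words in one line of dialogue."""
    return sum(LEXICON.get(token, 0) for token in tokenize(text))

def sentiment_by_season(rows):
    """Mean per-line sentiment score for each season."""
    scores = defaultdict(list)
    for row_season, _, text in rows:
        scores[row_season].append(line_sentiment(text))
    return {s: sum(v) / len(v) for s, v in sorted(scores.items())}

print(word_frequencies(ROWS, character="Character A").most_common(3))
print(sentiment_by_season(ROWS))
```

The optional `character` and `season` arguments mirror the kind of filters an interactive interface would expose, so the same function can serve both series-wide and narrowed views.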
Predicting Housing Prices in Cook County
- Developed a linear regression model to predict housing prices based on multiple features, such as number of bedrooms, square footage, and location.
- Applied data preprocessing techniques, including handling missing values, encoding categorical variables, and feature scaling, to prepare the data for modeling.
- Evaluated model performance using RMSE and cross-validation to assess generalization across different subsets of the data.
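
A minimal sketch of the pipeline described for this project is shown below: median imputation, one-hot encoding, standardization, a linear model, and k-fold cross-validated RMSE. It is an illustration under stated assumptions, not the project's code. The rows, column names, and prices are invented, and the model is fit by plain gradient descent here for the sake of a dependency-free example.

```python
import math
from statistics import mean, median, pstdev

# Hypothetical rows; values (price in thousands) are invented for illustration.
ROWS = [
    {"bedrooms": 2, "sqft": 900, "area": "north", "price": 210.0},
    {"bedrooms": 3, "sqft": 1400, "area": "north", "price": 300.0},
    {"bedrooms": 3, "sqft": None, "area": "south", "price": 260.0},
    {"bedrooms": 4, "sqft": 2000, "area": "south", "price": 390.0},
    {"bedrooms": 2, "sqft": 1000, "area": "west", "price": 200.0},
    {"bedrooms": 5, "sqft": 2600, "area": "west", "price": 480.0},
]
NUMERIC = ["bedrooms", "sqft"]

def fit_preprocessor(rows):
    """Learn imputation medians, scaling statistics, and category levels."""
    stats = {}
    for col in NUMERIC:
        med = median(r[col] for r in rows if r[col] is not None)
        filled = [r[col] if r[col] is not None else med for r in rows]
        stats[col] = (med, mean(filled), pstdev(filled) or 1.0)
    levels = sorted({r["area"] for r in rows})
    return stats, levels

def transform(rows, stats, levels):
    """Impute and standardize numeric columns, then one-hot encode the area."""
    X = []
    for r in rows:
        feats = []
        for col in NUMERIC:
            med, mu, sd = stats[col]
            value = r[col] if r[col] is not None else med
            feats.append((value - mu) / sd)
        feats += [1.0 if r["area"] == level else 0.0 for level in levels]
        X.append(feats)
    return X

def predict(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares fit by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict(X, w, b), y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def cross_val_rmse(rows, k=3):
    """RMSE on each of k held-out folds; preprocessing is fit on training folds only."""
    scores = []
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        stats, levels = fit_preprocessor(train)
        w, b = fit_linear(transform(train, stats, levels), [r["price"] for r in train])
        preds = predict(transform(test, stats, levels), w, b)
        scores.append(rmse([r["price"] for r in test], preds))
    return scores

print(cross_val_rmse(ROWS, k=3))
```

Fitting the imputation and scaling statistics inside each fold, rather than once on the full dataset, keeps information from the held-out rows out of training, which is what makes the cross-validated RMSE a fair estimate of generalization.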