Relationships Between Presidents' Twitter and Political Issues
Link: See the Github repo
Analysis Report: Report
Dynamic Visualizations of Political Concerns of Presidents on Twitter
Twitter is one of the most popular social media platforms in the U.S. and is used by influential politicians, including U.S. presidents. Tweets have also been used as a tool to predict real-world outcomes. In this project, I sought to answer one question: do politicians' tweets match their real-life behavior? If they do, changes in political concerns can be forecast by analyzing tweets. I analyzed the Twitter behavior of Donald Trump and Barack Obama and compared it with their actions in government. I also used an NLP approach to explore their tweets and identify the characteristics of each president's tweeting. Finally, I compared the two sets of tweets to find how political concerns changed between the two presidents.
SF Bay Area Used Car Market Visualization
Authors: Jimmy Chan, Esmond Chu
Link: See the Github repo
Web App (hosted on Heroku; it may take a few minutes to activate): http://dash-viz.herokuapp.com/
Interactive Visualization: Data Visualization
Craigslist does not have a user-friendly web interface, which makes it inconvenient for car buyers. We wanted a more intuitive way to view car information when choosing a used car. In this project, my team and I built a user-friendly visualization of used car market prices by applying machine learning and web scraping techniques. We also created an interactive visualization web app with Dash and Bokeh that makes data exploration intuitive.
Predicting Horse Race Winners in Hong Kong Racetrack
Link: See the Github repo
Analysis Report: Report
Data Wrangling Code: Data Wrangling
The purpose of this project is to examine whether statistical and machine learning approaches can yield a positive return on Hong Kong horse racing. The project analyzed 2014-2016 racing season data from the Hong Kong Jockey Club and extracted features from it. I built a multinomial logistic regression model, using the lasso for variable selection, to predict each horse's winning probability; it yielded a positive return of more than 150% on a 30% hold-out set drawn from over 1500 races. I then combined the multinomial logistic regression model with a random forest to predict the 2017-2018 Hong Kong racing seasons, which yielded a positive return of 170% on a 30% hold-out set drawn from over 1000 races. The details of the model and the winner predictions cannot be shared publicly, but I am happy to share them or provide a printout to anyone interested.
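Since the actual model is not public, the following is only a generic illustration of the technique named above, not the project's code. It sketches a per-race (conditional) multinomial logit trained with an L1 (lasso) penalty via proximal gradient descent. The function names, the two toy features, the penalty strength, and the step size are all assumptions made for the example.

```python
import math


def race_probabilities(weights, race):
    """Softmax over the horses in one race: P(horse i wins) is proportional to exp(w . x_i)."""
    scores = [sum(w * x for w, x in zip(weights, horse)) for horse in race]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty; shrinks small weights to exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit(races, winners, l1=0.01, step=0.1, epochs=200):
    """Minimize average negative log-likelihood + l1 * |w|_1 by proximal gradient descent."""
    n_features = len(races[0][0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for race, winner in zip(races, winners):
            probs = race_probabilities(weights, race)
            for i, horse in enumerate(race):
                # Gradient of the negative log-likelihood: (p_i - y_i) * x_i
                err = probs[i] - (1.0 if i == winner else 0.0)
                for k, x in enumerate(horse):
                    grad[k] += err * x
        weights = [
            soft_threshold(w - step * g / len(races), step * l1)
            for w, g in zip(weights, grad)
        ]
    return weights


# Toy data (hypothetical): each horse is (form_rating, noise).
# The noise column is mirrored across the two races, so it is
# uninformative by construction and its weight stays at zero.
races = [
    [(1.0, 0.5), (0.2, -0.5), (0.1, 0.0)],
    [(0.2, 0.5), (1.0, -0.5), (0.1, 0.0)],
]
winners = [0, 1]  # index of the winning horse in each race

weights = fit(races, winners)
new_race = [(0.8, 0.3), (0.4, 0.1)]
probs = race_probabilities(weights, new_race)
print(weights, probs)
```

Features whose weights end at zero are the ones the lasso drops. The probabilities for a new race always sum to one across its runners, which is what makes them comparable with betting odds.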
Analysis of State of the Union Speeches
Team: Jimmy Chan, Rick Chen and Peijin Fang
The goal of this project is to analyze the State of the Union speeches from 1790 to 2017. We used NLTK for text processing, computing aggregate text statistics of the speeches and visualizing some of their properties. We then built a speech-by-speech matrix containing the distance between every pair of speeches in word-frequency space, and used it to create visualizations (MDS plots) of the presidents with multidimensional scaling from scikit-learn. We used MDS plots based on Euclidean distance and Jensen-Shannon divergence to analyze the similarity of different speeches.
Because the faculty reuses some project problems from semester to semester, the professor asked us not to upload this project. I therefore cannot share it publicly, but I am happy to share a private repository if you are interested.
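Independently of the private project code, the speech-by-speech distance matrix described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the three toy "speeches", the whitespace tokenizer, and all function names are assumptions for the example and stand in for the NLTK-based processing used in the project.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary):
    """Relative word frequencies of `text` over a fixed, shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def euclidean(p, q):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def distance_matrix(distributions, metric):
    """Square matrix holding metric(d_i, d_j) for every pair of distributions."""
    n = len(distributions)
    return [
        [metric(distributions[i], distributions[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy stand-ins for speeches (hypothetical text, not real data).
speeches = [
    "the union is strong",
    "the union is strong and free",
    "taxes and trade policy",
]
vocabulary = sorted({w for s in speeches for w in s.lower().split()})
distributions = [word_distribution(s, vocabulary) for s in speeches]

js_matrix = distance_matrix(distributions, jensen_shannon)
euclid_matrix = distance_matrix(distributions, euclidean)
print(js_matrix)
```

A matrix like js_matrix could then be handed to a multidimensional scaling routine that accepts precomputed dissimilarities, such as the one in scikit-learn, to obtain 2-D coordinates for plotting.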