Data Science

Pass Prediction In Soccer

Links: [Blog] | [Code]

Description: Analysis of data for the Football Pass Prediction Challenge of the 5th Workshop on Machine Learning and Data Mining for Sports Analytics. The data consisted mostly of player locations for over 12,000 passes from Belgian soccer. The stated goal was to predict the recipient of a pass.

Summary: The main pipeline involved engineering a distance feature and a feature for the product of the x-y coordinates, scaling the numeric features, and dummy-encoding the sole categorical column. An untuned Logistic Regression model, evaluated with 5-fold cross-validation, achieved 23% accuracy, compared to a naive baseline of 5.6%.
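As a rough illustration of that kind of pipeline, here is a minimal scikit-learn sketch. It is not the project's actual code: the data is synthetic, the column names (`sender_x`, `candidate_x`, `half`, and so on) are hypothetical, and framing each row as a sender/candidate pair with a binary `received` label is an assumption made only to keep the example small. The "product of x-y coordinates" is read here as the sender's x times the sender's y, which is one possible interpretation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n_rows = 200

# Synthetic stand-in data; every column name here is hypothetical.
df = pd.DataFrame({
    "sender_x": rng.uniform(-50, 50, n_rows),
    "sender_y": rng.uniform(-35, 35, n_rows),
    "candidate_x": rng.uniform(-50, 50, n_rows),
    "candidate_y": rng.uniform(-35, 35, n_rows),
    "half": rng.choice(["first", "second"], n_rows),
})


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Add a distance feature and an x*y product feature."""
    out = frame.copy()
    out["distance"] = np.hypot(
        out["candidate_x"] - out["sender_x"],
        out["candidate_y"] - out["sender_y"],
    )
    # One reading of "product of x-y coordinates" (assumption).
    out["xy_product"] = out["sender_x"] * out["sender_y"]
    return out


features = add_features(df)

# Synthetic label: nearer candidates are more likely to "receive" the pass.
noise = rng.normal(0, 10, n_rows)
noisy_distance = features["distance"] + noise
y = (noisy_distance < noisy_distance.median()).astype(int)

numeric_cols = [c for c in features.columns if c != "half"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),                  # scale numeric features
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["half"]),  # dummy the categorical column
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),  # default, untuned settings
])

# 5-fold cross-validated accuracy
scores = cross_val_score(model, features, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Putting the scaler and encoder inside the `Pipeline` means they are refit on each training fold, so no information from the held-out fold leaks into preprocessing.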

Classifying Hate Speech

Links: [Blog] | [Code]

Description: Identifying hate speech is an important task on the internet. I used scikit-learn and the nltk package to build a hate speech classifier using Twitter data from CrowdFlower.

Summary: The final model used a Random Forest Classifier and achieved 76% accuracy on unseen data, 26 percentage points above the baseline accuracy of 50%. I productionized the model as an app that lets a user submit text to be classified. The app is no longer deployed.
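For a sense of what a text classifier of this shape can look like in scikit-learn, here is a small sketch. It is not the project's code: the example texts and labels are made-up toy data, TF-IDF is an assumed choice of vectorizer, and the nltk preprocessing mentioned above is omitted so the example runs without extra downloads.

```python
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy, made-up examples (1 = hostile, 0 = not hostile); not the CrowdFlower data.
texts = [
    "you are awful and I hate you",
    "nobody wants you here, get out",
    "I despise people like you",
    "you people are disgusting",
    "what a lovely day for a walk",
    "thanks so much for the help",
    "great match last night",
    "see you at the meetup tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a stratified test set to stand in for "unseen data".
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Vectorize the text, then fit a random forest on the TF-IDF features.
classifier = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

# Majority-class baseline for comparison.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_predictions = baseline.predict(X_test)

print("Model accuracy:", accuracy_score(y_test, predictions))
print("Baseline accuracy:", accuracy_score(y_test, baseline_predictions))
```

With only a handful of toy sentences the printed accuracies are not meaningful; the point is the structure of vectorizer, classifier, held-out evaluation, and baseline comparison.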



Links: [Site]

Description: A website for following GitHub organizations. Users are notified whenever new repositories are created. The front end uses a no-code platform called Bubble to handle user and database management. The back end uses Python to process and update data and to send emails.


Links: [Code]

Description: A collection of personal Python Data Science scripts.


Links: [Code]

Description: A collection of personal R Data Science scripts.

ODSC Meetup Map

Links: [App] | [Code]

Description: An interactive app made in R that shows the Open Data Science Conference’s (ODSC) meetups around the world. Data is scraped using rvest and displayed as a map via the leaflet library. The flexdashboard library provides the layout.