Data Science / Machine Learning Portfolio

This page offers an overview of some of my more significant data science analyses and machine learning experiments. Where possible I have open-sourced my code and notebooks on GitHub, so it is worth looking there as well.


Pima Indians Diabetes (binary classification)

I ran a variety of algorithms on the data set using the Weka GUI platform for machine learning. The best result came from a logistic regression model, which reached 77.47% accuracy (standard deviation of 4.39%).
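
For readers unfamiliar with how a figure like "accuracy (standard deviation)" is put together, here is a minimal Python sketch. It assumes the figure summarises accuracy over several evaluation runs or cross-validation folds, which is my reading of it and not something recorded in the Weka output above. The function name and the per-fold numbers are hypothetical, invented purely for illustration; they are not my actual results.

```python
from statistics import mean, stdev


def summarise_accuracy(fold_accuracies):
    """Return (mean, sample standard deviation) of accuracies, as percentages.

    `fold_accuracies` holds one accuracy per run or fold, as fractions in [0, 1].
    """
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    return mean(fold_accuracies) * 100, stdev(fold_accuracies) * 100


# Hypothetical per-fold accuracies, made up for this example only
hypothetical_folds = [0.75, 0.80, 0.70, 0.85, 0.75]

avg, spread = summarise_accuracy(hypothetical_folds)
print(f"accuracy: {avg:.2f}% (standard deviation of {spread:.2f}%)")
```

The standard deviation matters as much as the headline number: a spread of a few percentage points means small differences between algorithms on this data set may not be meaningful.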

This was a good learning experience that gave me a better sense of the ML workflow, although it was slow to implement. As my first experiment it taught me a lot, but I realise there are many areas where my understanding still needs to improve.


Afghanistan Military Press Releases (data analysis)

I read over 4000 press releases from ISAF in Afghanistan and manually (!) coded them into a database, then analysed the resulting data set for trends relating to the military targeting campaign against opposition groups and fighters. The analysis was published by the Afghanistan Analysts Network.


Taliban public punishments, 1996–2001

I compiled public data (from newspapers and other media sources) on the Taliban movement's policy of executions and public punishments to try to present an overview of how common these practices were during the period of their rule. This was a visualisation project that aimed to answer specific questions about the spread of the practice over time.