I am a successful data scientist with an academic background in Applied Mathematics and Computer Science, and professional experience in quantitative finance, healthcare, (social) network analysis, and politics. Drawing on my technical skills, knowledge of statistical and computational principles, and real-world experience, I provide high-quality data services: prediction, explanation, visualization, collection, and deployment.
I have recently been involved in a number of interesting projects and competitions, including two top-10% finishes in Kaggle competitions, a model to predict individuals' voting behavior in the US, a novel purely functional Python web scraper, a winning entry in an energy consumption data science competition, a collaboration with a healthcare startup applying both supervised and unsupervised machine learning methods to health insurance data, and, as a quant in finance, machine learning-driven long-term investment strategies that managed millions of dollars. These projects highlight a diverse skill set that encompasses the full data science stack, from data engineering to web scraping, and from predictive modeling to data visualization.
For data engineering and data pipeline/infrastructure development, I have experience with a number of tools, including Hadoop, MongoDB, Spark, Lucene for search, and Nutch for when scraping needs to be scalable. For building data collection systems, I use Python's BeautifulSoup for highly targeted scraping projects, and requests with gevent for projects that call for concurrent requests. Data science necessarily involves a large amount of data wrangling and cleaning, for which I rely heavily on Python's pandas library as well as Unix command-line tools.
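To give a flavor of the targeted extraction work described above, here is a minimal sketch using only Python's standard-library html.parser so it is self-contained; in a real project this role is played by BeautifulSoup's find_all(). The HTML fragment and the "price" class are hypothetical examples, not from any actual client project.

```python
from html.parser import HTMLParser

# Sketch of targeted extraction: collect the text of every <h2>
# element whose class is "price" from an HTML fragment.
class PriceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_price = False

html = '<div><h2 class="price">$19.99</h2><h2>Other</h2><h2 class="price">$5.00</h2></div>'
parser = PriceExtractor()
parser.feed(html)
print(parser.prices)  # ['$19.99', '$5.00']
```

The same pattern scales up with requests plus gevent: each page fetch becomes a greenlet, and the parser runs over every response as it arrives.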
For predictive modeling and descriptive analysis (unsupervised learning), I apply both machine learning methods and traditional statistical techniques: convolutional neural nets for image processing, recurrent neural nets for highly chaotic time series data, Hidden Markov Models for repetitive sequences, Random Forests or Gradient Boosted Machines for heterogeneous tabular data, and AR(I)MA for classic econometrics. To create static charts and graphs, I prefer R's ggplot2 library; for interactive visualizations I swear by d3 and its derivatives, as well as Leaflet for mapping. Lastly, I typically use Django or Flask for web development, but am comfortable with Node as well.
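As a small illustration of the time series side of that toolkit, the sketch below fits an AR(1) model y_t = a + b·y_{t-1} by ordinary least squares on a synthetic, noise-free series; in practice one would reach for a library such as statsmodels, and the coefficients a = 1.0, b = 0.5 here are made up for the demonstration.

```python
# Illustrative AR(1) fit: estimate a and b in y_t = a + b * y_{t-1}
# via the usual ordinary-least-squares formulas, stdlib only.
def fit_ar1(series):
    x = series[:-1]   # lagged values y_{t-1}
    y = series[1:]    # current values y_t
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Synthetic series generated with a = 1.0, b = 0.5 and no noise,
# so the fit should recover the true coefficients.
series = [0.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])

a, b = fit_ar1(series)
print(a, b)  # recovers a ≈ 1.0, b ≈ 0.5
```

With real, noisy data the estimates only approximate the generating coefficients, and model order and differencing (the "I" in ARIMA) are chosen from diagnostics such as ACF/PACF plots.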