Data Analytics / Web Scraping
I have worked in web scraping and data analytics for the past 4 years, during which I have undertaken many challenging and intriguing projects. I am a graduate of IIT, a prestigious institute in India known for its technical rigour and world-class pedagogy.
Web scraping is one of my main fortes. I am well versed in Python and its libraries for data scraping (such as Scrapy, BeautifulSoup, lxml and Selenium) and am also an expert in import.io.
- Recently I built a bot that takes a login ID and password for an online portal and then downloads specific data fields from various pages of that portal.
- I also built crawlers to scrape websites like http://bit.ly/1O37mxg and others in its domain, wherein the data for all years was scraped, formatted, geocoded through SmartyStreets and then saved in a CSV file.
- I have also scraped multi-level websites like http://bit.ly/1LZ3R7U to get the best in-category winners for various sections and saved the results in a MongoDB database.
My TECHNICAL SKILL-SET includes the following:
* Web scraping / web crawling
* Data munging/processing
* Descriptive & Inferential Statistics
* Correlation analysis and association rules
* Regression analysis: linear, multiple, logistic
* Principal component / factor analysis
* Cluster analysis and Discriminant analysis
* Cross Validation & Bootstrap
* Linear Model Selection & regularization
* Tree based methods & SVMs
* Boosting algorithms (particularly XGBOOST)
* Online learning tools (Vowpal Wabbit, FTRL)
- Python (pandas, numpy, scipy, scikit-learn, nltk, statsmodels, matplotlib, ggplot, requests, urllib2, pypy, beautifulsoup, scrapy, etc.)
- R (glmnet, randomForest, rpart, ROCR, caret, tm, caTools, RTextTools, wordcloud, ggplot2, regular expressions, etc.)
- SQL (MySQL, PostgreSQL)
A few of my important data analytics PROJECTS are highlighted below:
• Built a model to classify the wealth management customer portfolios of a bank for better-targeted marketing, using an ensemble of a gradient boosting algorithm (XGBOOST) and the FTRL-Proximal online learning algorithm.
• Predicted the loan disbursal probability of retail customers of a bank using GRADIENT BOOSTING ALGORITHM (XGBOOST).
• Built a learning model using an online learning tool called VOWPAL WABBIT to predict the click-through rate of mobile ads. Handled a data set of 40 million rows (~1 GB).
• Forecasted bike rental demand using an ensemble model of GRADIENT BOOSTING regressors and RANDOM FORESTS.
• Analyzed which aspects and characteristics of people's lives contribute to their happiness using RIDGE regression, based on data collected from an online poll.
• Investigated the role of wild birds in the spread of avian influenza in poultry by applying CLUSTER ANALYSIS, STOCHASTIC AND LOGISTIC REGRESSION MODELS.