We want a scraper for yelp.com.
We want to load custom URLs in this format:
The bot will then visit every result for each city and tag.
The bot will open each result (for example, http://www.yelp.com/biz/icodice-chicago) and collect the business's website.
We also want to scrape the city, state, ZIP code, and categories. In this example, the categories are:
Software Development, Graphic Design, Web Design
I'll refer to these categories as tags.
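Once the address line and the category text have been pulled off a result page (the page markup isn't specified here, so that step is left out), splitting them into the fields above could look like the sketch below. It assumes US-style addresses rendered as "City, ST 12345" or "City, ST 12345-6789"; the function names are hypothetical.

```python
import re

# Assumed address format: "City, ST 12345" or "City, ST 12345-6789".
_CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>[^,]+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def parse_city_state_zip(line: str):
    """Return (city, state, zip) or None if the line doesn't match the assumed format."""
    m = _CITY_STATE_ZIP.match(line)
    if m is None:
        return None
    return m.group("city").strip(), m.group("state"), m.group("zip")


def parse_tags(categories: str) -> list[str]:
    """Split a comma-separated category string into individual tags."""
    return [t.strip() for t in categories.split(",") if t.strip()]
```

For example, parse_city_state_zip("Chicago, IL 60607") returns ("Chicago", "IL", "60607"), and parse_tags on the three categories above returns them as a three-item list. Lines that don't match return None so they can be logged instead of stored with bad data.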
All results from this scrape should be saved in a MySQL database that lets me search and filter them so I can run email blasts. I have a separate program for sending the emails; I just need to pull the filtered records and export them to CSV.
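One possible layout stores businesses and tags in separate tables so a business can carry any number of tags, then builds a parameterized filter query and writes the results to CSV. This is a sketch only: the table and column names are hypothetical, and the `%s` placeholders assume a Python MySQL driver that uses that style, such as PyMySQL.

```python
import csv
import io

# Hypothetical MySQL schema; run each statement separately.
SCHEMA = """
CREATE TABLE businesses (
  id INT AUTO_INCREMENT PRIMARY KEY,
  yelp_url VARCHAR(255) NOT NULL UNIQUE,  -- prevents duplicate rows on re-scrape
  website VARCHAR(255),
  city VARCHAR(100),
  state CHAR(2),
  zip VARCHAR(10)
);
CREATE TABLE tags (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE business_tags (
  business_id INT NOT NULL,
  tag_id INT NOT NULL,
  PRIMARY KEY (business_id, tag_id),
  FOREIGN KEY (business_id) REFERENCES businesses(id),
  FOREIGN KEY (tag_id) REFERENCES tags(id)
);
"""


def build_search_query(city=None, state=None, tag=None):
    """Return (sql, params) filtering businesses by any combination of city, state, tag."""
    conditions, params = [], []
    if city:
        conditions.append("b.city = %s")
        params.append(city)
    if state:
        conditions.append("b.state = %s")
        params.append(state)
    if tag:
        # A subquery filters on one tag while the output still lists all of a business's tags.
        conditions.append(
            "b.id IN (SELECT bt2.business_id FROM business_tags bt2 "
            "JOIN tags t2 ON t2.id = bt2.tag_id WHERE t2.name = %s)"
        )
        params.append(tag)
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    sql = (
        "SELECT b.website, b.city, b.state, b.zip, "
        "GROUP_CONCAT(t.name ORDER BY t.name SEPARATOR ', ') AS tags "
        "FROM businesses b "
        "LEFT JOIN business_tags bt ON bt.business_id = b.id "
        "LEFT JOIN tags t ON t.id = bt.tag_id"
        + where
        + " GROUP BY b.id ORDER BY b.state, b.city"
    )
    return sql, params


def to_csv_text(rows) -> str:
    """Render query rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["website", "city", "state", "zip", "tags"])
    writer.writerows(rows)
    return buf.getvalue()
```

With PyMySQL, the query would be run as `cursor.execute(sql, params)` and the output of `cursor.fetchall()` passed to `to_csv_text`. Keeping values in `params` rather than pasting them into the SQL string lets the driver escape them.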
The bot will run on Linux.
It must work with private proxies.
It must email me a daily report of the proxies that are NOT working, so I can send that list to the proxy company and get replacements. I also need a very simple way to load new proxies into the bot from a URL.
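A minimal sketch of the proxy side, using only the Python standard library: it loads a plain-text proxy list from a URL, tests each proxy, and builds and emails a report of the failures. The list format (one `host:port` or `host:port:user:password` per line), the test URL, and the SMTP settings are assumptions, not something the provider or this spec defines.

```python
import http.client
import smtplib
import urllib.request
from email.message import EmailMessage


def parse_proxy_list(text: str) -> list[dict]:
    """Parse one proxy per line: host:port or host:port:user:password (assumed format)."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(":")
        if len(parts) == 2:
            host, port = parts
            user = password = None
        elif len(parts) == 4:
            host, port, user, password = parts
        else:
            raise ValueError(f"unrecognized proxy line: {line!r}")
        proxies.append({"host": host, "port": int(port), "user": user, "password": password})
    return proxies


def proxy_url(p: dict) -> str:
    auth = f"{p['user']}:{p['password']}@" if p["user"] else ""
    return f"http://{auth}{p['host']}:{p['port']}"


def fetch_proxy_list(list_url: str, timeout: float = 30) -> list[dict]:
    """'Upload via URL': download a proxy list from a URL the owner controls."""
    with urllib.request.urlopen(list_url, timeout=timeout) as resp:
        return parse_proxy_list(resp.read().decode("utf-8"))


def check_proxy(p: dict, test_url: str, timeout: float = 15) -> bool:
    """True if test_url can be fetched through the proxy without an error."""
    url = proxy_url(p)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": url, "https": url}))
    try:
        with opener.open(test_url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), and timeouts are all OSError subclasses.
        return False


def daily_report(proxies: list[dict], is_working) -> str:
    dead = [f"{p['host']}:{p['port']}" for p in proxies if not is_working(p)]
    return f"{len(dead)} of {len(proxies)} proxies failed today's check:\n" + "\n".join(dead)


def send_report(body: str, to_addr: str, from_addr: str, smtp_host: str, smtp_port: int = 25) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Daily proxy report"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(body)
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

Passing the check in as `is_working` keeps the report logic testable without a network; in production it would be something like `lambda p: check_proxy(p, "https://example.com/")`, run once a day from cron. The report lists only host and port so proxy passwords never end up in email.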
The bot will be deployed on multiple servers, not just one. I will upload an Excel file with tens of thousands of custom URLs (in the format mentioned at the very top of this document), and the bot needs to keep feeding new URLs to the different servers while making sure the same URL is never sent to more than one server.
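One common way to hand out URLs without duplicates is a shared queue table from which each server atomically claims a batch. The sketch below uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; on MySQL 8.0+ the usual equivalent is `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction. It assumes the spreadsheet has already been read into a list of URL strings, and the table and function names are hypothetical.

```python
import sqlite3


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None disables the module's implicit transactions,
    # so the explicit BEGIN/COMMIT statements below are the only ones issued.
    return sqlite3.connect(path, isolation_level=None, timeout=30)


def init_queue(conn: sqlite3.Connection, urls: list[str]) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS url_queue (
               id INTEGER PRIMARY KEY,
               url TEXT UNIQUE NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               claimed_by TEXT
           )"""
    )
    conn.execute("BEGIN")
    # INSERT OR IGNORE silently drops URLs already in the queue (duplicate uploads).
    conn.executemany("INSERT OR IGNORE INTO url_queue (url) VALUES (?)", [(u,) for u in urls])
    conn.execute("COMMIT")


def claim_batch(conn: sqlite3.Connection, server_id: str, batch_size: int) -> list[str]:
    # BEGIN IMMEDIATE takes the write lock before reading, so two servers
    # cannot both see the same pending rows before one marks them claimed.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id, url FROM url_queue WHERE status = 'pending' ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        conn.executemany(
            "UPDATE url_queue SET status = 'claimed', claimed_by = ? WHERE id = ?",
            [(server_id, row_id) for row_id, _ in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [url for _, url in rows]


def mark_done(conn: sqlite3.Connection, url: str) -> None:
    conn.execute("UPDATE url_queue SET status = 'done' WHERE url = ?", (url,))
```

Each server loops on `claim_batch`, scrapes what it got, and calls `mark_done` per URL. Because claiming happens in the database rather than in a central dispatcher, adding a server only means pointing it at the same database.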
It needs to run on its own, because I don't want to manage it. If it gets stuck, it must restart itself, recover, and resume where it left off.
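A minimal sketch of "know where it left off", assuming a single worker with a local checkpoint file: each finished URL is recorded, the file is swapped in atomically with `os.replace` so a crash never leaves it half-written, and a wrapper restarts the job after an exception. All names here are hypothetical.

```python
import json
import os
import tempfile
import time


def load_checkpoint(path: str) -> set[str]:
    """Return the URLs already finished; empty on the first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_checkpoint(path: str, done: set[str]) -> None:
    # Write to a temp file in the same directory, then rename it over the old
    # checkpoint; os.replace is atomic, so readers see the old or new file, never a partial one.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def process_all(urls: list[str], scrape, path: str) -> set[str]:
    done = load_checkpoint(path)
    for url in urls:
        if url in done:
            continue  # finished before the last crash/restart
        scrape(url)
        done.add(url)
        save_checkpoint(path, done)
    return done


def run_with_restarts(job, max_restarts: int = 5, delay_seconds: float = 0.0):
    """Re-run job after an exception, up to max_restarts times."""
    for attempt in range(max_restarts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_restarts:
                raise
            time.sleep(delay_seconds)
```

This only covers crashes. A worker that hangs without raising also needs request timeouts or an outside supervisor, for example systemd's `Restart=` option on Linux. Rewriting the whole checkpoint after every URL is also fine for a sketch but slow for tens of thousands of URLs, where recording status in the shared database scales better.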
There should be a login page where I can see live stats and easily check what each server is doing: whether it is active and scraping, or stuck for some reason.
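The "active or stuck" indicator on that page could be driven by two timestamps each server reports: a heartbeat and the time it last finished a URL. A server that stops sending heartbeats is offline; one that still reports in but hasn't finished a URL for a while is stuck. The field names and thresholds below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    server_id: str
    last_heartbeat: float  # Unix time the worker last reported in
    last_progress: float   # Unix time the worker last finished a URL
    urls_done: int


def classify(s: ServerStatus, now: float, offline_after: float = 120, stuck_after: float = 600) -> str:
    if now - s.last_heartbeat > offline_after:
        return "offline"   # process dead or server unreachable
    if now - s.last_progress > stuck_after:
        return "stuck"     # alive but not making progress
    return "scraping"


def dashboard_rows(statuses: list[ServerStatus], now: float) -> list[tuple[str, str, int]]:
    """One (server, state, urls_done) row per server, sorted by server id."""
    return [(s.server_id, classify(s, now), s.urls_done) for s in sorted(statuses, key=lambda s: s.server_id)]
```

Separating "offline" from "stuck" tells the owner whether to check the machine itself or the scraping job on it.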