Need an experienced Python developer who can build a highly scalable Google scraper.
The software will need to scan more than 300,000 keywords per day.
I envision a main "hub" process that gathers keywords from a MySQL database (based on my data model) and distributes tasks to external Python processes to spread CPU and bandwidth load.
Hub Process:
- Distribute keywords to be scraped to external Python processes
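To illustrate the distribution step, here is a rough sketch of how the hub might chunk keywords into batches for worker processes (the keyword values and batch size are illustrative, not part of the spec; in the real system the keywords would come from the MySQL database and each batch would be handed to a worker via a queue or broker):

```python
# Sketch of the hub's batching step. Assumes keywords have already been
# fetched from MySQL; table/column names and batch size are hypothetical.
from itertools import islice

def batch_keywords(keywords, batch_size):
    """Split the keyword list into fixed-size batches for worker processes."""
    it = iter(keywords)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# In the real hub, each batch would be pushed onto a multiprocessing.Queue
# (or a message broker) consumed by the external worker processes.
batches = list(batch_keywords([f"kw{i}" for i in range(10)], batch_size=4))
```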
External Python Process:
- Multi-threaded to scan multiple keywords at once
- Gather the top 100 results from Google
- Be able to use multiple TLDs (.com, .co.uk, etc.)
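The worker requirements above could be sketched roughly as follows, with a thread pool scanning keywords concurrently and the search URL parameterised by TLD. `fetch_serp` is a stub here; a real implementation would issue the HTTP request through one of the proxies and parse the result links (the `num=100` parameter for fetching 100 results per page is an assumption about Google's query interface):

```python
# Sketch of a multi-threaded worker process scanning keywords across TLDs.
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

def build_search_url(keyword, tld="com", num_results=100):
    """Build a Google search URL for the given TLD, requesting 100 results."""
    return f"https://www.google.{tld}/search?" + urlencode(
        {"q": keyword, "num": num_results})

def fetch_serp(keyword, tld):
    # Placeholder: real code would GET the URL via a proxy and parse results.
    return build_search_url(keyword, tld)

def scan(keywords, tld="co.uk", threads=8):
    """Scan several keywords at once using a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda kw: fetch_serp(kw, tld), keywords))
```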
The results will then need to be inserted into a database.
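For the insert step, batched writes (e.g. `executemany`) keep per-row overhead low at 300,000+ keywords per day. A minimal sketch, using sqlite3 as a stand-in for MySQL and a hypothetical `results` table:

```python
# Batched insert of scraped results. sqlite3 stands in for MySQL here;
# the results table and its columns are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (keyword TEXT, rank INTEGER, url TEXT)")

# One row per (keyword, rank, url); example data only.
rows = [("python jobs", r, f"https://example.com/{r}") for r in range(1, 4)]

# executemany batches the inserts instead of issuing one statement per row.
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```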
I have in excess of 15,000 IPs to do this, so IP blocking isn't an issue.