We need to scrape a list of products from a website. The website exposes an API that returns JSON. The API can be used anonymously but is rate-limited: at most 100 items per listing call and 500 calls per IP per hour.
We need to retrieve information for 50k products.
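As a rough sizing aid, the limits above can be turned into a call budget. The sketch below is only an estimate under the stated numbers (100 items per listing call, 500 calls per IP per hour, 50k products, one extra call per product to resolve its URL); the function names are illustrative and not part of any API.

```python
import math

TOTAL_PRODUCTS = 50_000        # target number of products
ITEMS_PER_LISTING = 100        # max items returned per listing call
CALLS_PER_IP_PER_HOUR = 500    # anonymous limit per IP


def call_budget(total_products=TOTAL_PRODUCTS,
                items_per_listing=ITEMS_PER_LISTING):
    """Total API calls: listing pages plus one URL-resolve call per product."""
    listing_calls = math.ceil(total_products / items_per_listing)
    resolve_calls = total_products
    return listing_calls + resolve_calls


def hours_needed(total_calls, ip_count,
                 calls_per_ip_per_hour=CALLS_PER_IP_PER_HOUR):
    """Wall-clock hours if every IP runs exactly at the hourly limit."""
    return total_calls / (ip_count * calls_per_ip_per_hour)


def ips_needed(total_calls, target_hours,
               calls_per_ip_per_hour=CALLS_PER_IP_PER_HOUR):
    """Smallest number of IPs that finishes within target_hours."""
    return math.ceil(total_calls / (calls_per_ip_per_hour * target_hours))


if __name__ == "__main__":
    total = call_budget()
    print("total calls:", total)
    print("hours with 1 IP:", hours_needed(total, 1))
    print("IPs for a 24 h run:", ips_needed(total, 24))
```

These figures assume each IP runs right at the limit; a real run would likely stay somewhat below it, so the actual duration would be longer.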
The scraping consists of fetching the whole list of products (which already includes the information for each product) and then resolving a URL for each product (one extra API call per product). No alarms should be raised, so the API limits must not be exceeded. Different IPs should be used to distribute the scraping and minimize the total scraping time.
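One possible way to keep each IP under its hourly limit while splitting the work is sketched below. It is a minimal illustration, not a finished scraper: the 450-call default is an assumed safety margin below the 500-call limit, the `"id"` key is a hypothetical product field, and `resolve` stands in for whatever function performs the real per-product API call through a given IP or proxy.

```python
import time
from collections import deque


class HourlyBudget:
    """Sliding-window counter allowing at most max_calls per window seconds."""

    def __init__(self, max_calls=450, window=3600.0, clock=time.monotonic):
        self.max_calls = max_calls   # assumed margin below the 500/hour limit
        self.window = window
        self.clock = clock
        self.calls = deque()         # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a call is being made now."""
        self.calls.append(self.clock())


def partition(items, n):
    """Split items round-robin into n chunks, one per IP."""
    return [items[i::n] for i in range(n)]


def run_worker(products, resolve, budget, sleep=time.sleep):
    """Resolve every product of one chunk without exceeding the budget.

    `resolve` is a caller-supplied function (hypothetical here) that makes
    the actual API call; `"id"` is an assumed key in each product dict.
    """
    results = {}
    for product in products:
        delay = budget.wait_time()
        if delay > 0:
            sleep(delay)
        budget.record()
        results[product["id"]] = resolve(product)
    return results
```

Each chunk returned by `partition` would be handled by one worker with its own `HourlyBudget`, so that no single IP is asked to exceed its share of calls.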
Working source code should be delivered in Python or PHP. External open-source libraries are fine as long as you include them or provide the install instructions for them (Ubuntu platform).
The deliverable should also include the JSON file with the 50k products in the API format, including the resolved URL for each product.
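Producing that file could look roughly like the sketch below, which copies each product as returned by the API and adds the resolved URL to it. The field names `id` and `resolved_url` and the file name are assumptions for illustration, since the actual API format is not specified here.

```python
import json


def attach_resolved_urls(products, resolved,
                         id_key="id", url_key="resolved_url"):
    """Return copies of the API products with the resolved URL added.

    `products` is the list of dicts from the listing calls; `resolved`
    maps a product id to its resolved URL. Both key names are assumed.
    """
    merged = []
    for product in products:
        item = dict(product)                      # keep the API fields as-is
        item[url_key] = resolved.get(product[id_key])
        merged.append(item)
    return merged


def write_products(path, products):
    """Write the merged product list as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(products, fh, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = [{"id": 1, "name": "Example product"}]
    urls = {1: "https://example.com/p/1"}
    write_products("products.json", attach_resolved_urls(sample, urls))
```

Products whose URL could not be resolved end up with a `null` value in this sketch, which makes gaps easy to spot before delivery.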