I want to hire a Python/Scrapy expert to build me a Scrapy bot that does the following, and to teach me how to use it.
Scrapy should read a text file containing a seed list of around 100k URLs, visit each URL, extract all external URLs (links to other sites) found on each seed page, and export the results to a separate text file.
Scrapy should only visit the URLs in the text file; it should not spider out and follow any other links.
Scrapy should run as fast as possible. I don't need proxy support. Domains that return 403 errors should be exported to a separate text file.
I also want advice on how to scale the link extraction for more speed, so that I can parse millions of URLs per day.