I have a bunch of scheduled Python scrapers using Selenium WebDriver; they're too slow to run and unreliable. I'd like to migrate them to Scrapy.
At the moment, the scripts output to individual Excel files (with minor data cleaning), and a separate Python script copies the latest version of each file into one Excel file (multiple sheets). That file is then normalized/recategorized before being uploaded into PostgreSQL.
The job is either: i) Scrapy to one Excel file (and the next stage takes over from there), or ii) Scrapy straight to PostgreSQL. In the latter case, duplicates need to be removed, dates parsed, and items recategorized using a reference xls file.
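For option ii), the cleaning step could be sketched roughly like this, independent of Scrapy itself (a Scrapy item pipeline would wrap the same logic). All field names, the date format, and the category mapping below are hypothetical stand-ins; the real mapping would be loaded from the reference xls:

```python
from datetime import datetime

# Assumed stand-in for the reference xls: old category -> new category.
CATEGORY_MAP = {"widgets-old": "widgets"}

def clean(rows, key="url", date_field="scraped", date_fmt="%d/%m/%Y"):
    """Drop duplicates by `key`, parse dates to ISO, remap categories."""
    seen = set()
    out = []
    for row in rows:
        if row[key] in seen:
            continue  # duplicate item, skip it
        seen.add(row[key])
        row = dict(row)
        # Parse the scraped date string into ISO format for PostgreSQL.
        row[date_field] = datetime.strptime(row[date_field], date_fmt).date().isoformat()
        # Recategorize; unknown categories pass through unchanged.
        row["category"] = CATEGORY_MAP.get(row["category"], row["category"])
        out.append(row)
    return out

rows = [
    {"url": "a", "scraped": "01/02/2024", "category": "widgets-old"},
    {"url": "a", "scraped": "01/02/2024", "category": "widgets-old"},
    {"url": "b", "scraped": "15/03/2024", "category": "gizmos"},
]
cleaned = clean(rows)
print(cleaned)
```

The cleaned rows would then be inserted into PostgreSQL (e.g. via an upsert keyed on the same field used for deduplication, so re-runs stay idempotent).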
The main thing is more robust scraping, without too much overhead.