Scrapy Framework Jobs

27 jobs were found based on your criteria

Fixed-Price - Intermediate ($$) - Est. Budget: $250 - Posted
The general idea is to run a crawler against previously added domains, collect all external links, and save them into a MySQL database. The process needs to be repeatable for existing or new domains on a daily/weekly basis. The crawler needs a simple management panel based on the known framework xcrud, to view all results and manage crawler options.

Project requirements:
  • A simple management panel to view and control the crawling process
  • The crawler collects external links, with an option to collect all external links or only links containing particular domain names taken from a previously created list
  • Save all links to a MySQL database
  • Detect if a site is not responding and, if so, save this event to the database
  • Add / remove / edit domains to crawl
  • Run / stop the crawler from the management panel
  • Able to restart a task from the last failure
  • Able to see the actual crawling progress
  • Able to send email notifications if the crawler stops unexpectedly
  • Able to add a domain to the domains list by URL: http://mycrawler.com/add_domain.php?domain_name=www.newdomain.com (at this point only domain adding is required); the script should return a result (added, or an error message if one occurred)

The xcrud framework will be provided. Dev space will be provided. More details will be provided if you are interested.
Skills: Scrapy, MySQL, Programming, PHP, Python
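
For orientation, here is a minimal sketch of the core crawl this posting describes: a Scrapy spider that records links pointing outside the tracked domains, plus a pipeline that writes them to MySQL. The domain list, table name, and credentials are placeholders, not from the posting; the xcrud panel, scheduling, failure recovery, and notifications are out of scope here.

import scrapy
from urllib.parse import urlparse
import pymysql  # assumption: MySQL access via PyMySQL


class ExternalLinkSpider(scrapy.Spider):
    """Crawl the listed domains and yield every link that points
    outside them. The domain list is hard-coded for brevity; the
    posting wants it managed from the xcrud panel instead."""

    name = "external_links"
    allowed_domains = ["example.com"]        # placeholder domain
    start_urls = ["https://example.com/"]

    def parse(self, response):
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            parts = urlparse(url)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            if not any(parts.netloc.endswith(d) for d in self.allowed_domains):
                # External link: record where it was found and its target.
                yield {"source": response.url, "external_url": url}
            else:
                # Internal link: keep crawling within the tracked domains.
                yield response.follow(url, callback=self.parse)


class MySQLLinksPipeline:
    """Store each scraped link row in MySQL. Table and credentials
    are hypothetical; enable the pipeline via ITEM_PIPELINES."""

    def open_spider(self, spider):
        self.conn = pymysql.connect(host="localhost", user="crawler",
                                    password="secret", database="links")

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        with self.conn.cursor() as cur:
            cur.execute(
                "INSERT INTO external_links (source, external_url) VALUES (%s, %s)",
                (item["source"], item["external_url"]),
            )
        return item

The "site not responding" requirement could be handled in the same structure with a Scrapy errback on the requests, writing failure events to a second table.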
Fixed-Price - Intermediate ($$) - Est. Budget: $100 - Posted
I have a bunch of scheduled Python scrapers using Selenium WebDriver that are too slow and unreliable to run. I'd like to migrate them to Scrapy. At the moment, the scripts output to individual Excel files (with minor data cleaning), and a separate Python script copies the latest of each file into one Excel file (multiple sheets). This file is then normalized/recategorized before uploading into PostgreSQL. The job is either: (i) Scrapy to one Excel file (the next stage takes over from there), or (ii) Scrapy to PostgreSQL. In the latter case, duplicates need to be removed, dates parsed, and there is a reference XLS file for item recategorization. The main goal is more robust scraping without too much overhead.
Skills: Scrapy, Machine learning, Natural language processing, PostgreSQL, Programming
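
A minimal sketch for option (ii) above: a Scrapy item pipeline that parses dates and inserts into PostgreSQL, relying on a UNIQUE constraint on the url column so duplicates are dropped with ON CONFLICT DO NOTHING. The table, columns, credentials, and date format are all assumptions, not from the posting; the XLS-based recategorization would be a further pipeline step.

from datetime import datetime

import psycopg2  # assumption: PostgreSQL access via psycopg2


class PostgresDedupPipeline:
    """Parse raw date strings and insert rows, letting the database's
    unique constraint on url discard duplicates."""

    def open_spider(self, spider):
        self.conn = psycopg2.connect(dbname="scraped", user="scrapy",
                                     password="secret", host="localhost")

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Normalize the raw date string; the format is an assumption.
        item["date"] = datetime.strptime(item["date"], "%d %b %Y").date()
        with self.conn.cursor() as cur:
            cur.execute(
                "INSERT INTO items (url, title, date) VALUES (%s, %s, %s) "
                "ON CONFLICT (url) DO NOTHING",
                (item["url"], item["title"], item["date"]),
            )
        return item

Registered in settings.py via ITEM_PIPELINES, this replaces both the per-scraper Excel output and the copy/paste consolidation script in one step.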