Crawlers Jobs

78 jobs were found based on your criteria

Fixed-Price - Intermediate ($$) - Est. Budget: $500 - Posted
Hi! I'm looking for someone familiar with Python, Django, Celery, Big Data, and Postgres. We are currently working on a SaaS product and need to improve and enhance existing code. You'll be dealing with a SaaS product (not yet in production or set up on a live server) built for lead generation: the system automatically obtains company and contact information and lets the user find prospective customers. The following things need to be done and integrated into the existing code:
- Check and debug the Celery workers/tasks so they run properly again (possibly separate different tasks onto different workers; investigate why saving problems occur).
- Improve the code and make it more efficient and faster.
- Replace GoogleScraper & Proxy with the Bing API (ideally just add the Bing API, so that in the Django admin view I can select which way to obtain data, separated by person or company data: a) scraper or b) API).
- Every API request needs to be downloaded into our own DB and each result stored there.
- If multiple addresses have been found for a company, the one with the highest identity factor should be chosen and shown as the main address.
- Complete sites and all related subpages of corporate websites should be downloaded and stored in the DB (e.g. 1) Home 2) About Us 3) News 4) Team 5) Customers 6) Products 6a) Product A 6b) Product B, etc.).
- Subsequently, all the important text (about us / home / product texts) shall be extracted and saved in the main database, directly associated with the company.
Skills: Web Crawler, Data mining, Data Science, Data scraping
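Not part of the posting, just a rough illustration of the Bing API leg it describes: a Celery task that queries the Bing Web Search API v7 and stores every raw result in the project's own database. The Django model `CompanySearchResult` and the `BING_API_KEY` setting are hypothetical placeholders, not names from the posting.

```python
# Sketch only: query Bing for a company and persist each raw result row,
# so every API request ends up in our own DB as the posting requires.
import requests
from celery import shared_task
from django.conf import settings

from myapp.models import CompanySearchResult  # hypothetical model

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

@shared_task
def fetch_company_data(company_name):
    """Fetch Bing web results for a company and store them."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": settings.BING_API_KEY},
        params={"q": company_name, "count": 50},
        timeout=30,
    )
    resp.raise_for_status()
    for page in resp.json().get("webPages", {}).get("value", []):
        CompanySearchResult.objects.create(
            company_name=company_name,
            url=page["url"],
            title=page["name"],
            snippet=page.get("snippet", ""),
        )
```

Routing this task to its own Celery queue (separate from the crawling tasks) would also address the "separate different tasks to different workers" point above.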
Fixed-Price - Entry Level ($) - Est. Budget: $50 - Posted
Looking for someone to extract information from an event's website. The information is the contact details of the 719 companies exhibiting in the trade show (name of the company, description, categories, website, email, phone, physical address). We'd like to have all this information put together in an Excel spreadsheet.
Skills: Web Crawler, Data scraping, Scrapy, Web scraping
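A rough sketch of how such an exhibitor list could be pulled into a spreadsheet-friendly CSV, assuming the directory is plain HTML; the URL and CSS selectors below are placeholders and would need to match the real event site (including any pagination).

```python
# Placeholder sketch: scrape exhibitor contact details into a CSV that opens in Excel.
import csv
import requests
from bs4 import BeautifulSoup

LISTING_URL = "https://example.com/exhibitors"  # placeholder URL

def text_of(card, selector):
    """Return the stripped text of the first match, or '' if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else ""

def scrape_exhibitors(url, out_path="exhibitors.csv"):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    fields = ["name", "description", "categories", "website", "email", "phone", "address"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for card in soup.select(".exhibitor"):  # placeholder selector
            link = card.select_one(".website a")
            writer.writerow({
                "name": text_of(card, ".name"),
                "description": text_of(card, ".description"),
                "categories": text_of(card, ".categories"),
                "website": link["href"] if link else "",
                "email": text_of(card, ".email"),
                "phone": text_of(card, ".phone"),
                "address": text_of(card, ".address"),
            })

if __name__ == "__main__":
    scrape_exhibitors(LISTING_URL)
```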
Hourly - Entry Level ($) - Est. Time: Less than 1 week, Less than 10 hrs/week - Posted
I need someone who can use ScrapeBox to harvest URLs for specific niches, gather WHOIS data, validate emails, and handle other tasks that ScrapeBox add-ons make easy. Harvest URLs (specific niches) -> test URLs for specific information -> gather WHOIS data -> validate emails. If you know how to use ScrapeBox, I can hire you for a couple of hours to help me! Thank you!
Skills: Web Crawler, scrapebox
Hourly - Intermediate ($$) - Est. Time: More than 6 months, 30+ hrs/week - Posted
Research the job boards daily, find out who within the company posted the advertisement, and glean email addresses for future correspondence. Work for an established on-shore/off-shore BPO and KPO technology company. You'll be assisting the USA-based Sr. VP of Business Development and his team with business-to-business executive-level email communications.
Skills: Web Crawler, Internet research, Research
Fixed-Price - Intermediate ($$) - Est. Budget: $50 - Posted
I am looking to gather information about hotels in Amsterdam. On the website booking.com, 390 hotels are currently listed. Start your job at this URL: http://www.booking.com/reviews/nl/city/amsterdam.en-gb.html Please see the attached documents for instructions and an example of the files required. Please indicate the number of days it will take to deliver the end result and your best price. List your preferred scraping language (Python, C#, Java, Perl, etc.); no preference on my side. If you have any questions, or the attached documents are ambiguous, please ask.
Skills: Web Crawling, C#, Data scraping, Java
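Purely illustrative (Python shown here, though the posting leaves the language open): a first pass that fetches the Amsterdam review index page from the posting and collects the hotel review links on it. The link pattern is a guess; booking.com's markup changes often, so the selector would need to be verified against the live page and the attached instructions.

```python
# Sketch: list the hotel review pages linked from the Amsterdam review index.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START_URL = "http://www.booking.com/reviews/nl/city/amsterdam.en-gb.html"

def list_hotel_review_pages(url=START_URL):
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    # Assumed pattern for per-hotel review pages; verify against the real markup.
    for a in soup.select("a[href*='/reviews/nl/hotel/']"):
        links.add(urljoin(url, a["href"]))
    return sorted(links)

if __name__ == "__main__":
    for link in list_hotel_review_pages():
        print(link)
```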
Hourly - Entry Level ($) - Est. Time: Less than 1 week, Less than 10 hrs/week - Posted
Looking for a web scraper who can scrape specific businesses for me in the beauty/hair industry in the Kent, UK region. I need the website URLs of businesses and other generic sites within the same market (hair, beauty, cosmetics). There are 5 search terms to scrape for this region of the UK. Please let me know the cost when applying. Thanks.
Skills: Web Crawler, Data mining, Data scraping, Web scraping
Fixed-Price - Intermediate ($$) - Est. Budget: $300 - Posted
I need pricing and other relevant data on the lodging industry in the vicinity of the Bluecut fire in Southern California, which burned from August 16th through August 22nd. I am looking for a freelancer who can use data scraping techniques and internet archives to scrape, at a minimum, the prices and zip codes for each listing by date and by number of guests. For each day and each listing within 100 miles of the fire, I would like the price for a one-night stay. If you could also extract the qualitative aspects of listings, such as whether an Airbnb listing has a gym (1 or 0), that would be stellar, and we can negotiate a bonus for that. I would like the data to range from June 16th through October 22nd. Even though the data do not fully exist yet, I would like to begin the project sooner rather than later to find out from an expert what the technological capabilities are for scraping from these or similar sites. Some basic information about the fire can be found here: http://inciweb.nwcg.gov/incident/4962/
Skills: Web Crawling, Data Science, Data scraping, "Extract, Transform and Load (ETL)"
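One sketch of the "internet archives" angle, not a complete solution: querying the Wayback Machine's CDX API for snapshots of Airbnb listing pages captured in the requested date window. The URL prefix and limit are assumptions, and price/zip/amenity extraction would still require parsing each snapshot's HTML; this only enumerates what was archived.

```python
# Sketch: enumerate archived Airbnb listing snapshots between 2016-06-16 and 2016-10-22.
import requests

CDX_API = "http://web.archive.org/cdx/search/cdx"

def archived_listing_snapshots(prefix="airbnb.com/rooms/",
                               start="20160616", end="20161022"):
    params = {
        "url": prefix,
        "matchType": "prefix",     # match every URL under the prefix
        "from": start,
        "to": end,
        "output": "json",
        "filter": "statuscode:200",
        "limit": 1000,             # assumed cap for a first pass
    }
    rows = requests.get(CDX_API, params=params, timeout=60).json()
    if not rows:
        return []
    header, entries = rows[0], rows[1:]
    # Each snapshot can be replayed at https://web.archive.org/web/<timestamp>/<original>
    return [dict(zip(header, row)) for row in entries]
```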
Hourly - Expert ($$$) - Est. Time: Less than 1 month, 10-30 hrs/week - Posted
In summary: I want to be able to configure Scrapy for multiple locations via a simple website. I want Scrapy to grab a session token, spoof the IP, grab my data, and save the CSV to an S3 bucket. I want to be able to:
1) Log in to my own secure website hosted in AWS.
2) Display a simple 4-column form with column names (see attachment).
3) Set up new scrapes.
4) Refresh recurring scrapes.
Item 3 in detail, for setting up new scrapes: "Get New DataSource" launches a new tab or similar (e.g., a Chrome extension?) wherein I log in to my new data source, navigate to the area I want to scrape, specify the table, and somehow specify "Get Data". It should be able to handle easier REST URL requests as well as more difficult ones with obscured header variables. While I'm open to variation, I'm envisioning something similar to the Pinterest Chrome extension, but for data tables within secure websites. Once the scrape configuration is saved, it starts item 4, "refresh".
Item 4 in detail: clicking "REFRESH" spawns a new tab wherein the user only logs in. The session token is grabbed by the service. All requested data is navigated to and pulled on the back end. Note: some IP spoofing on the login or on the back-end service will be required.
5) The back-end service should exist as AWS Lambda callable code. As such, variables should reside separately and load per request.
6) I anticipate using this with a node.js service, so I'm looking for callable compliance (i.e., I know that Scrapy is natively Python).
7) Data should be saved consistently/statically to a dedicated S3 bucket (per logged-in user); an authenticated URL can be made available.
Finally, I'm okay with pulling in Scrapy and AWS libraries. I do want to minimize code complexity beyond that and am looking for clean, well-documented, quick code.
Skills: Web Crawler, Data scraping, Scrapy
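A minimal sketch of the Scrapy-to-S3 leg only, assuming botocore is installed and AWS credentials are configured: the spider's feed export writes its CSV straight to a dedicated bucket. The login/session-token handling, IP rotation, and the Lambda wrapper from the posting are left out, and the bucket name, start URL, and selectors are placeholders.

```python
# Sketch: a Scrapy spider whose feed export writes CSV output to an S3 bucket.
import scrapy

class TableSpider(scrapy.Spider):
    name = "table_spider"
    start_urls = ["https://example.com/data"]  # placeholder data source

    custom_settings = {
        # Scrapy handles s3:// feed URIs natively when botocore is available.
        "FEEDS": {
            "s3://my-scrape-bucket/%(name)s/%(time)s.csv": {"format": "csv"},
        },
    }

    def parse(self, response):
        # Placeholder: one item per table row with four columns,
        # mirroring the 4-column form described in the posting.
        for row in response.css("table tr"):
            cells = row.css("td::text").getall()
            if len(cells) >= 4:
                yield {
                    "col1": cells[0], "col2": cells[1],
                    "col3": cells[2], "col4": cells[3],
                }
```

Keeping the export in Scrapy's own feed settings (rather than custom boto3 code in the spider) is one way to keep the code complexity down, as the posting asks.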
Fixed-Price - Intermediate ($$) - Est. Budget: $45 - Posted
I need an Excel list created of all the solo and two-attorney firms who are members of the Boston Bar (I currently use http://www.sljinc.org/atty_resources.php). By solo and two-attorney firms, I mean there are only 1 or 2 attorneys working at the office. I would like them in these categories: Name, Firm Name, Address, Email, Phone, and Website (if provided). If you can determine whether they are a solo or two-attorney firm, that would be great, because that's who I'm targeting. Each category will need to be a separate column header on the spreadsheet. This ensures that I can easily filter the data.
Skills: Web Crawler, Data mining, Data Recovery, Data Science
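A tiny sketch of the output side of this posting only: writing the requested columns to an .xlsx file so each field lands in its own filterable column. The records themselves would come from scraping the directory above; the two rows shown are made-up placeholders, and the filename is an assumption.

```python
# Sketch: write attorney records into an Excel file with one column per field.
from openpyxl import Workbook

COLUMNS = ["Name", "Firm Name", "Address", "Email", "Phone", "Website"]

def write_attorneys(records, out_path="boston_bar_solo_firms.xlsx"):
    wb = Workbook()
    ws = wb.active
    ws.append(COLUMNS)  # header row, so the data can be filtered in Excel
    for rec in records:
        ws.append([rec.get(col, "") for col in COLUMNS])
    wb.save(out_path)

if __name__ == "__main__":
    write_attorneys([
        {"Name": "Jane Doe", "Firm Name": "Doe Law", "Email": "jane@example.com"},
        {"Name": "John Roe", "Firm Name": "Roe & Roe LLP", "Phone": "617-555-0100"},
    ])
```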