
Web Scraping Jobs

232 jobs were found based on your criteria

Fixed-Price - Intermediate ($$) - Est. Budget: $50 - Posted
Create a tool that identifies products that are for sale on Amazon and NOT for sale on eBay. The search criteria would be based on a title keyword comparison. The following filters should be in place:
1. Specify the min/max price of the products searched. For example, if I only want to search products under $50, it will not show results higher than $50.
2. Amazon category selection: I should be able to search by Amazon category (every category available on Amazon).
3. Select the number of keywords the tool will compare between Amazon and eBay. For example, if I select 6 words, it will grab the first 6 words of every title within the specific category and min/max price range and search eBay for titles starting with those 6 words. It will only show a result IF that item is NOT sold on eBay. I should be able to select between 3 and 7 words (3 words being a broad search, 7 words a more specific one).
4. Show the sales rank of the Amazon search results.
The purpose of this tool is simply to find items that are selling on Amazon but ARE NOT being sold on eBay.
Skills: Web scraping, Amazon MWS, Amazon Web Services, eBay API
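
A minimal sketch of the comparison logic this posting describes, assuming the Amazon listings (title, price, sales rank) have already been pulled (e.g. via Amazon MWS) and that the caller supplies an eBay lookup helper wrapping the eBay Finding API; the helper name ebay_search is hypothetical:

# Sketch of the Amazon-vs-eBay comparison described above; the eBay and
# Amazon API integrations are not implemented here.

def first_n_words(title, n):
    """Take the first n words of a product title as the eBay search phrase."""
    return " ".join(title.split()[:n])

def find_amazon_only_items(amazon_items, ebay_search, min_price, max_price, n_words):
    """amazon_items: iterable of dicts with 'title', 'price', 'sales_rank'.
    ebay_search: callable returning the number of eBay hits for a phrase
    (assumed to wrap the eBay Finding API; hypothetical).
    Returns items within the price band that have no eBay match."""
    results = []
    for item in amazon_items:
        if not (min_price <= item["price"] <= max_price):
            continue                                    # filter 1: price band
        phrase = first_n_words(item["title"], n_words)  # filter 3: 3-7 keywords
        if ebay_search(phrase) == 0:                    # no eBay listing found
            results.append({**item, "search_phrase": phrase})
    return results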
Fixed-Price - Intermediate ($$) - Est. Budget: $800 - Posted
List of URLs: lakorn.guchill.com www.seriesubthai.tv www.kodhit.mobi cuptv.com www.startclip.com cn.upyim.com www.jengmak.com www.songdee.com th.hao123.com www.friv.com newsupdate.todayza.com diply.com tvshow.guchill.com www.subthaiseries.com www.tunwalai.com www.yumzap.com www2.adintrend.com devian.tubemate.home lakorn.guchill.com www.kodhit.mobi cuptv.com www.jengmak.com
Step 1: Review the list of URLs above and mine the top 1,000 pages from each URL using a crawler.
Step 2: Extract terms from each of the web pages that are mined.
Step 3: Determine whether the terms from these web pages match terms from the entertainment list or elements from the news list (fill in the attached worksheet).
Step 4: For each site, create a new worksheet and repeat the process.
Skills: Web scraping, Data Analytics, Data scraping, Machine learning
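
A rough sketch of Steps 1-3 for a single site, assuming requests and BeautifulSoup are acceptable; the entertainment and news term lists come from the attachments and are stood in for here by a plain Python iterable, and the listed hosts would need an http:// scheme prepended:

# Breadth-first crawl of up to max_pages same-domain pages, counting how
# often each term from the supplied list appears in page text.
import collections
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl_and_match(start_url, terms, max_pages=1000):
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, collections.deque([start_url])
    counts, fetched = collections.Counter(), 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue                                    # skip unreachable pages
        soup = BeautifulSoup(html, "html.parser")
        text = soup.get_text(" ").lower()
        counts.update(t for t in terms if t.lower() in text)
        for a in soup.find_all("a", href=True):         # stay on the same domain
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return counts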
Hourly - Entry Level ($) - Est. Time: More than 6 months, 10-30 hrs/week - Posted
We need someone to help us with data entry so we can calculate our costs each month. We receive invoices on a daily basis and need someone to enter the data from each invoice into a spreadsheet. You will need the following skills:
• Familiarity with Google Docs
• Good communication
• Enjoying working with others
• Attention to detail
This is an ongoing position and may lead to other projects if it goes well. When you apply, please tell me about your past experience.
Skills: Web scraping, Administrative Support, Data Entry, English
Fixed-Price - Entry Level ($) - Est. Budget: $10 - Posted
I want to scrape the site of the Federal Reserve, especially the following page: http://www.federalreserve.gov/newsevents/speech/2007speech.htm (please see attachment 01). I want the script to visit each of the links (highlighted in jpeg 01), access each one, and get the "body" of every speech (see jpeg 02), then combine all the speeches (separated by page breaks) into a PDF document. It should be easy to select the year (see jpeg 01, top of the page) so I can easily scrape articles from the year I'm interested in.
Skills: Web scraping, Python
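
A sketch of one possible pipeline using requests, BeautifulSoup, and reportlab; the link filter ("/newsevents/speech/" in the href) and the assumption that the speech body lives in <p> tags are guesses that would need checking against the real pages:

# Fetch the year's speech index, pull each linked speech's paragraphs,
# and build one PDF with a page break between speeches.
import html
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate

def speeches_to_pdf(year, out_path):
    index_url = f"http://www.federalreserve.gov/newsevents/speech/{year}speech.htm"
    index = BeautifulSoup(requests.get(index_url, timeout=10).text, "html.parser")
    style = getSampleStyleSheet()["Normal"]
    story = []
    for a in index.find_all("a", href=True):
        if "/newsevents/speech/" not in a["href"]:
            continue                                    # guessed speech-link pattern
        page_url = urljoin(index_url, a["href"])
        page = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
        for p in page.find_all("p"):                    # guessed "body" location
            text = p.get_text(" ", strip=True)
            if text:
                story.append(Paragraph(html.escape(text), style))
        story.append(PageBreak())                       # separate speeches by page break
    SimpleDocTemplate(out_path).build(story)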
Fixed-Price - Expert ($$$) - Est. Budget: $50 - Posted
I'm going to Angola, a Portuguese-speaking country, and need someone to conduct some research (within financial services/banking and telecommunications) on this country, some important individuals, and other aspects of these industries. A lot of this information will be in Portuguese, and I need someone with excellent research skills (web research/deep mining) and native (or near-native) Portuguese: written, reading, and speaking. Ideally the individual will have lived in, or currently live in, Angola, though this is not strictly necessary. I'll provide you with the list of names (approx. 10), companies/societies (approx. 10), and specific details on the aforementioned industries. I'd need this work completed by 16th February. It should be presented in a table using either MS Word or Excel, all written in English. Please ask any questions you need. The price given is negotiable depending on skill, expertise, and speed of delivery.
Skills: Web scraping, Data scraping, Internet research, Market research
Fixed-Price - Entry Level ($) - Est. Budget: $100 - Posted
The deliverable is a script, plus documentation for how to run it, that I can reuse on my own in the future. The script would need to do the following: I have a list of names in Excel rows (Excel column A). Row by row, I want to take each name and put it in the input text box on this website: http://meckcama.co.mecklenburg.nc.us/relookup/ Upon pressing the search button on the site, the site will return either a list of names or nothing. If names come up, I want Excel column B to say "yes"; if nothing comes up, I want it to say "no". The script will need to control the whole process from start to finish once launched, without a lot of manual interaction. I have a Mac and Office for Mac, so whatever script you send me has to be compatible.
Skills: Web scraping, Microsoft Excel, Scripting
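
One possible approach (a sketch, not tested against the site): read column A with openpyxl, submit each name with requests, and write yes/no into column B. The form field name ("owner") and the "no records found" marker text are guesses; the real form parameters would have to be read from the page's HTML.

import requests
from openpyxl import load_workbook

SEARCH_URL = "http://meckcama.co.mecklenburg.nc.us/relookup/"

def flag_names(xlsx_path):
    """For each name in column A, mark column B 'yes' if the site returns results."""
    wb = load_workbook(xlsx_path)
    ws = wb.active
    for row in ws.iter_rows(min_col=1, max_col=1):      # column A, row by row
        cell = row[0]
        if cell.value is None:
            continue
        resp = requests.post(SEARCH_URL, data={"owner": cell.value}, timeout=15)
        found = "no records found" not in resp.text.lower()   # guessed marker
        ws.cell(row=cell.row, column=2, value="yes" if found else "no")
    wb.save(xlsx_path)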
Hourly - Entry Level ($) - Est. Time: Less than 1 week, Less than 10 hrs/week - Posted
I need an internet scrape done of all businesses of a specific type. I would need the following pieces of information on each business:
1. Name
2. Location (address with zip code)
3. Industry
4. Contact phone number (if possible)
5. Contact email (if possible)
6. Graphic (if possible)
I then need all the information put into Excel format. I will make a hiring decision on Feb. 12, 2016. Please answer the questions below in your proposal.
Skills: Web scraping, Data mining, Data scraping, Internet research
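
Since the source site isn't specified, this only sketches the Excel output side with openpyxl; the scraped records are assumed to arrive as dicts keyed by the requested fields:

from openpyxl import Workbook

COLUMNS = ["Name", "Location", "Industry", "Phone", "Email", "Graphic"]

def write_business_sheet(records, out_path="businesses.xlsx"):
    """records: iterable of dicts keyed by the column names above."""
    wb = Workbook()
    ws = wb.active
    ws.append(COLUMNS)                                  # header row
    for rec in records:
        ws.append([rec.get(col, "") for col in COLUMNS])
    wb.save(out_path)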
Fixed-Price - Expert ($$$) - Est. Budget: $100 - Posted
I would like to scrape data from the following website: https://www.courts.mo.gov/casenet/cases/searchCases.do?searchType=caseNumber The website requires you to enter a case number and then provides data on the case. I have a list of 1,048,575 case numbers that I would like data on. As an example, enter 0516-CV05476. For a given case there are multiple tabs; I would like to scrape only the data from the "Case Header" tab and the "Civil Judgments" tab (when available). Output should be in tab-separated format (TSV) with UTF-8 character encoding.
Skills: Web scraping, Data scraping
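
An illustrative sketch only: the Casenet query parameters and the page markup are guesses, and the tab-specific parsing ("Case Header", "Civil Judgments") is reduced to dumping two-column table rows; the real site would need to be inspected before this works.

import csv

import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://www.courts.mo.gov/casenet/cases/searchCases.do"

def scrape_cases(case_numbers, out_path="cases.tsv"):
    """Write one TSV row per (case number, field, value) found, UTF-8 encoded."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["case_number", "field", "value"])
        with requests.Session() as s:
            for number in case_numbers:
                resp = s.get(SEARCH_URL,
                             params={"searchType": "caseNumber", "caseNumber": number},
                             timeout=15)
                soup = BeautifulSoup(resp.text, "html.parser")
                for row in soup.select("table tr"):      # guessed header-table layout
                    cells = [td.get_text(strip=True) for td in row.find_all("td")]
                    if len(cells) == 2:                  # field/value pairs only
                        writer.writerow([number, cells[0], cells[1]])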
Fixed-Price - Expert ($$$) - Est. Budget: $250 - Posted
I would like to scrape data from this USPTO website: http://portal.uspto.gov/pair/PublicPair It has a CAPTCHA, but only seems to require it to be filled out once per session (I haven't verified how many queries it takes to trigger it to reappear). After the CAPTCHA you can look up a patent application by "Publication Number." I have a list of 3,996,534 publication numbers. An example of a publication number would be 20120257316. For each publication number, I want to scrape the basic bibliographic data that shows up in the "Application Data" tab. I want the scraped data in tab-separated values (TSV) format with UTF-8 character encoding. On the USPTO website, when you look up a publication number, there is also usually a tab called "Image File Wrapper", which contains PDF files associated with the patent application. For each publication number that has the "Image File Wrapper" tab present, I want to download the PDF files with the document codes "OATH" and "ADS". Note that even when the Image File Wrapper tab is present, both of these PDFs may not be present; in particular, the ADS file is often missing. Thus the project has two deliverables:
- A TSV file with one row for each publication number and one column for each piece of bibliographic data in the application data sheet.
- A folder with one directory for each publication number (the directories should be named by publication number) containing 0-2 PDF files corresponding to the OATH and ADS documents, when present.
Skills: Web scraping, Data scraping
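
Because Public PAIR sits behind a CAPTCHA, the fetching and parsing side is out of scope here; this sketch only covers organizing the two deliverables, assuming the bibliographic fields and any OATH/ADS PDF bytes have already been retrieved per publication number:

import csv
from pathlib import Path

def write_deliverables(records, out_dir="uspto"):
    """records: iterable of (pub_number, biblio_dict, {doc_code: pdf_bytes})."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = list(records)
    # One column per bibliographic field seen across all applications.
    fields = sorted({k for _, biblio, _ in records for k in biblio})
    with open(out / "application_data.tsv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["publication_number"] + fields)
        for pub, biblio, pdfs in records:
            writer.writerow([pub] + [biblio.get(f, "") for f in fields])
            pub_dir = out / pub                          # one directory per publication
            pub_dir.mkdir(exist_ok=True)
            for code in ("OATH", "ADS"):                 # 0-2 files per application
                if code in pdfs:
                    (pub_dir / f"{code}.pdf").write_bytes(pdfs[code])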