Java Regular Expression / Web Scraping Expert

Closed - This job posting has been filled and work has been completed.
Web, Mobile & Software Dev Other - Software Development Posted 3 years ago


More than 30 hrs/week
1 to 3 months


Duration: up to 2 months

Task: Create web crawlers and parsers using our library

Details for Crawler:
[+] You will browse websites with Firefox + Firebug and analyze their structure
[+] You MUST be familiar with regular expressions and Java
[+] You will implement a few Java classes per crawler
[+] The Java classes will extract information from an HTML page and use it to create further requests
[+] You may need to create a new project folder from a template
[+] You will need to edit an XML control file
[+] We have a library for crawling web sites that you must use
[+] We have some documentation. You will still have to figure out a lot of things on your own.
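
To illustrate the "extract information from an HTML page and create more requests" step, here is a minimal sketch using only the standard Java library. It does not use the client's crawling library, whose API is not described in this posting; the class name LinkExtractor and the method extractLinks are hypothetical. It assumes links appear as quoted href attributes in anchor tags and does not decode HTML entities such as &amp;.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the client's library.
class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    // Group 1 is the quote character, group 2 is the raw href value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URLs a crawler could request next.
    static List<URI> extractLinks(URI pageUrl, String html) {
        List<URI> next = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            // Skip empty links, in-page fragments, and script pseudo-links.
            if (href.isEmpty() || href.startsWith("#")
                    || href.toLowerCase().startsWith("javascript:")) {
                continue;
            }
            try {
                // Resolve relative links against the URL of the current page.
                next.add(pageUrl.resolve(href));
            } catch (IllegalArgumentException e) {
                // Malformed href: ignore it rather than abort the whole page.
            }
        }
        return next;
    }
}
```

In a real crawler class, each returned URI would presumably be handed to the library as a new request; how that is done depends on the library and is not shown here.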

Details for Parser:
[+] Input files are text, HTML, or PDF files
[+] The parser is written in Java, implements our interface, and uses regular expressions, HTML parsing, and/or PDF parsing
[+] The Java class will extract the information from an HTML page or PDF document and pass it to our library for uploading to the repository
[+] We will score the output using an automatic tool
[+] Output is stored using a provided Java library
[+] There are training materials to teach you our library
[+] Correctness is valued over speed
[+] A validation tool is provided to help check output files (it will not catch all mistakes, so you must be careful)
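
As a rough picture of the regex-based parsing described above, the sketch below pulls structured fields out of plain-text lines using named capture groups. The input format (lines such as "CS 101 Introduction to Programming (3 credits)"), the class CourseLineParser, and the Course type are all assumptions made for illustration; the client's parser interface and output library are not described in this posting and are not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for an assumed course-listing format.
class CourseLineParser {
    // Simple value holder for one extracted record.
    static final class Course {
        final String subject;
        final String number;
        final String title;
        final int credits;

        Course(String subject, String number, String title, int credits) {
            this.subject = subject;
            this.number = number;
            this.title = title;
            this.credits = credits;
        }
    }

    // One course per line: SUBJECT NUMBER TITLE (N credits).
    // MULTILINE makes ^ and $ match at the start and end of each line.
    private static final Pattern COURSE = Pattern.compile(
            "^[ \\t]*(?<subject>[A-Z]{2,4})[ \\t]+(?<number>\\d{3}[A-Z]?)[ \\t]+"
            + "(?<title>.+?)[ \\t]+\\((?<credits>\\d{1,2})[ \\t]+credits?\\)[ \\t]*$",
            Pattern.MULTILINE);

    // Returns every line that matches the pattern; other lines are skipped.
    static List<Course> parse(String text) {
        List<Course> courses = new ArrayList<>();
        Matcher m = COURSE.matcher(text);
        while (m.find()) {
            courses.add(new Course(
                    m.group("subject"),
                    m.group("number"),
                    m.group("title"),
                    Integer.parseInt(m.group("credits"))));
        }
        return courses;
    }
}
```

Since correctness is valued over speed, the pattern is deliberately strict: lines that do not fit are skipped silently, so a real parser would likely also log or count them for review.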

Each Deliverable:
[1] Pick up work on our TaskMan system
[2] Check out source code for that task
[3] Try running it locally
[4] Modify the Java code as needed until it works correctly
[5] Commit + push changes
[6] Use the TaskMan system to start a run of your code
[7] Check the score when it’s done
[8] Release the task on TaskMan to get credit

Expected workload:
At least 10 tasks per week (about 1-4 hours per task)
(The first few will take longer while you are unfamiliar with our library.)

Must be comfortable text chatting in English over Skype. Must check in online or by email at least 3 times per week.

** Ability to read and understand English web pages is a must. **

** Familiarity with US college/university record systems is a plus. **

** Being a college/university student or recent graduate is a plus. **

About the Client

(4.56) 57 reviews

United States
Santa Clara 03:36 AM

92 Jobs Posted
85% Hire Rate, 1 Open Job

Over $50,000 Total Spent
102 Hires, 0 Active

$10.00/hr Avg Hourly Rate Paid
11,621 Hours

Member Since Dec 18, 2009