Java Regular Expression / Web Scraping Expert

Closed - This job posting has been filled and work has been completed.
Category: Web, Mobile & Software Dev > Web Development. Posted 3 years ago.


More than 30 hrs/week
1 to 3 months

Start Date: June 28, 2013


Duration: up to 2 months FULL TIME

Task: Create web crawlers and parsers, using our library, to collect data from public web sites for our customers.

We are ONLY interested in SERIOUS and DEDICATED candidates who can devote significant time, skill, and effort to this project. We aim to compensate you fairly for the value you help us create, and we have a proven record of rewarding our best contractors with raises, bonuses, repeat employment, and referrals. You can see from our team profile that this is a serious organization.

Please submit with your application the solution to this puzzle as an interview test:

Details for Crawler:
[+] You MUST be familiar with regular expressions and Java
[+] You will implement a few Java classes per crawler
[+] Each Java class extracts information from an HTML page and uses it to create further requests
[+] We have a library for crawling web sites that you must use
[+] We have some documentation. You will still have to figure out a lot of things on your own.
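
To give a feel for the crawler work described above, here is a minimal sketch of extracting links from an HTML page with a regular expression and turning them into absolute URLs for follow-up requests. It uses only the Java standard library. The class name LinkExtractor and the method extractLinks are hypothetical and are not part of our crawling library, whose API is not shown here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the client's crawling library.
final class LinkExtractor {

    // Matches the quoted href value of an <a> tag; group 1 is the quote, group 2 the value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+(?:[^>]*?\\s)?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns absolute URLs for the links on the page, in document order, without duplicates.
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            String lower = href.toLowerCase(Locale.ROOT);
            // Skip values that do not lead to a new page request.
            if (href.isEmpty() || href.startsWith("#")
                    || lower.startsWith("javascript:") || lower.startsWith("mailto:")) {
                continue;
            }
            try {
                // Resolve relative links against the URL of the page they came from.
                urls.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // The href is not a valid URI (for example, it contains spaces); skip it.
            }
        }
        return new ArrayList<>(urls);
    }
}
```

Calling LinkExtractor.extractLinks with the page URL and the page HTML returns the URLs to request next. The sketch does not handle unquoted attribute values or HTML entities such as &amp; inside links; a real crawler would have to.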

Details for Parser:
[+] Input files are text, HTML, or PDF files
[+] The parser is written in Java and implements our interface, using regular expressions, HTML parsing, and/or PDF parsing
[+] The Java class extracts the information from an HTML page or PDF document and passes it to our library for uploading to the repository
[+] We will score the output using an automatic tool
[+] Output is stored using a provided Java library
[+] There are training materials to teach you our library
[+] Correctness is valued over speed
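
As a rough illustration of the parser work described above, the sketch below pulls structured records out of an HTML fragment using a regular expression with named groups. The input format (course entries in list items), the class CourseParser, and the record class Course are all assumptions made up for this example. Our actual interface and the library calls for storing output are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration only; it does not implement the client's interface.
final class CourseParser {

    // Hypothetical record holding one extracted course.
    static final class Course {
        final String code;
        final String title;
        final int credits;

        Course(String code, String title, int credits) {
            this.code = code;
            this.title = title;
            this.credits = credits;
        }
    }

    // Assumed input shape: <li>CS 101 - Intro to Programming (3 credits)</li>
    private static final Pattern COURSE = Pattern.compile(
            "<li>\\s*(?<code>[A-Z]{2,4}\\s?\\d{3})\\s*-\\s*(?<title>[^<(]+?)\\s*"
            + "\\((?<credits>\\d{1,2})\\s+credits?\\)\\s*</li>");

    // Returns one Course per list item that matches the assumed format.
    static List<Course> parse(String html) {
        List<Course> courses = new ArrayList<>();
        Matcher m = COURSE.matcher(html);
        while (m.find()) {
            courses.add(new Course(
                    m.group("code"),
                    m.group("title"),
                    Integer.parseInt(m.group("credits"))));
        }
        return courses;
    }
}
```

Because correctness is valued over speed, the pattern is deliberately strict: list items that do not match the expected shape are skipped rather than guessed at.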

Each Deliverable:
[1] Pick up work on our TaskMan system
[2] Check out source code for that task
[3] Try running it locally
[4] Make appropriate modifications to the Java code until it works correctly
[5] Commit + push changes
[6] Use the TaskMan system to start a run of your code
[7] Check the score when it’s done
[8] Release the task on TaskMan to get credit

Expected workload:
At least 10 tasks per week (about 1-4 hours per task)
(The first few will take longer while you are unfamiliar with our library.)

Must be comfortable text chatting in English over Skype. Must check in online or by email 3 times per week.

** Ability to read and understand English web pages is a must. **

** Familiarity with US college/university record systems is a plus. **

** Being a college/university student or recent graduate is a plus. **

About the Client

(4.56) 57 reviews

United States
Santa Clara 03:51 PM

92 Jobs Posted
85% Hire Rate, 1 Open Job

Over $50,000 Total Spent
102 Hires, 0 Active

$10.00/hr Avg Hourly Rate Paid
11,621 Hours

Member Since Dec 18, 2009