Website scraping

Closed - This job posting has been filled and work has been completed.
Web & Mobile Development Other - Web & Mobile Development Posted 2 years ago

Hourly Job

Hours to be determined
Less than 1 month

Details

The project is to improve how we scrape and parse websites.

The following describes what we currently do.

We have a long list of specific websites, all news-related.

We currently use QueryPath and a queue process.

We have default QueryPath selector strings for things like title, byline, article, and omit (for things to remove from the article).

We have custom selector strings for specific websites to override the defaults.
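
As a rough illustration of the default-plus-override arrangement described above, here is a minimal Python sketch (our production code uses QueryPath, so this is only an analogy). The selector strings, the host name, and the function name selectors_for are all hypothetical placeholders, not our actual configuration.

```python
from urllib.parse import urlparse

# Hypothetical default selectors; the real strings are not listed in this posting.
DEFAULT_SELECTORS = {
    "title": "h1",
    "byline": ".byline",
    "article": "article",
    "omit": ".ad, .related",
}

# Hypothetical per-site overrides, keyed by host name.
SITE_OVERRIDES = {
    "news.example.com": {
        "article": "div#story-body",
        "omit": ".ad, .share-tools",
    },
}


def selectors_for(url):
    """Return the default selectors with any site-specific overrides applied."""
    host = (urlparse(url).hostname or "").lower()
    merged = dict(DEFAULT_SELECTORS)  # copy so the defaults are never mutated
    merged.update(SITE_OVERRIDES.get(host, {}))
    return merged
```

A site only needs to list the fields that differ from the defaults; everything else falls through unchanged.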

Each site has a test page where we can try parsing a reference page and save the QueryPath strings.

We first retrieve the full HTML for the page and do our best to find the URIs of any second or later pages of the same article. Each site has its own convention for page URIs.
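
The page-discovery step might look something like the following Python sketch, which uses only the standard library. The URI pattern (a page number in a "page=" query parameter or a "/page/N" path segment) is an assumed convention for illustration; in practice each site would need its own pattern, and the names LinkCollector and find_page_uris are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Assumed convention: "?page=2", "&page=2", or "/page/2". Real sites vary.
DEFAULT_PAGE_PATTERN = re.compile(r"[?&/]page[=/](\d+)")


def find_page_uris(html, base_url, pattern=DEFAULT_PAGE_PATTERN):
    """Return absolute URIs for pages 2 and up, ordered by page number."""
    collector = LinkCollector()
    collector.feed(html)
    pages = {}
    for href in collector.hrefs:
        match = pattern.search(href)
        if match and int(match.group(1)) > 1:
            # Keep the first link seen for each page number.
            pages.setdefault(int(match.group(1)), urljoin(base_url, href))
    return [pages[n] for n in sorted(pages)]
```

This sketch does not check that a matching link belongs to the same article; a per-site pattern would have to be specific enough to rule out unrelated paginated links.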

Next we parse the article, title, and byline. We have extensive regular expressions to help.
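
To give a sense of the kind of regular-expression cleanup involved, here is a small Python sketch that normalizes a byline after it has been extracted. The specific patterns (a leading "By" or "Written by", and a trailing job title such as "Staff Writer") are assumptions chosen for illustration, not our actual expressions, and clean_byline is a hypothetical name.

```python
import re

# Leading "By" / "Written by", optionally followed by a colon.
# The required whitespace keeps names like "Byron" intact.
_BY_PREFIX = re.compile(r"^(?:written\s+by|by)\s*:?\s+", re.IGNORECASE)

# Trailing job title after a separator, plus anything that follows it.
_TRAILER = re.compile(
    r"\s*[|,\-]\s*(?:staff writer|reporter|correspondent)\b.*$",
    re.IGNORECASE,
)

_WHITESPACE = re.compile(r"\s+")


def clean_byline(raw):
    """Collapse whitespace, then strip a 'By' prefix and a job-title suffix."""
    text = _WHITESPACE.sub(" ", raw).strip()
    text = _BY_PREFIX.sub("", text)
    text = _TRAILER.sub("", text)
    return text.strip()
```

For example, clean_byline("  By  Jane Doe, Staff Writer ") yields "Jane Doe", while a name such as "Byron Lee" passes through unchanged.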

