Web Crawler

Closed - This job posting has been filled and work has been completed.
Web, Mobile & Software Dev, Desktop Software Development
Posted 3 years ago


More than 30 hrs/week
1 to 3 months


   We are a research team working on an experimental project in our lab; in a couple of years we may bring it to the real world.
   1) We want to download, cache, and archive a large number of pages while retaining the sitemap graph and the origin of each page, with metadata, resources, and attachments preserved.
   2) Pages should be well organized, archived, and indexed, with unwanted pages filtered out; we may develop our own algorithm for this.
   3) The crawler should be easily programmable and integrate with many of our in-house tools for data-intensive processing.
   4) It should be easy to monitor and troubleshoot, support capacity management and planning, and back up pages to additional storage.
   5) It could potentially integrate with Hadoop and MapReduce, Bigtable, and support clustering and network storage.
   6) It could potentially integrate with additional third-party software such as SQL Server, NoSQL stores, MongoDB, Drupal, and so forth.
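
To make requirements 1) and 2) more concrete, here is a minimal Python sketch of what we have in mind. It is only an illustration under our own assumptions, not a required design: the names `crawl`, `PageRecord`, `fetch`, and `keep` are hypothetical, `fetch` stands in for whatever downloader is used, and `keep` stands in for the filtering algorithm we may develop later. It uses only the standard library and stores, for each page, the page it was first discovered on (its origin), its depth, and its outgoing links, so the site graph can be rebuilt from the archive.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class PageRecord:
    """One archived page (hypothetical record layout)."""
    url: str
    origin: Optional[str]  # page on which this URL was first discovered
    depth: int             # number of hops from the seed URL
    body: str              # raw page content as returned by fetch
    links: List[str] = field(default_factory=list)  # outgoing graph edges


def crawl(
    seed: str,
    fetch: Callable[[str], Optional[str]],
    keep: Callable[[str], bool] = lambda url: True,
    max_pages: int = 100,
) -> Dict[str, PageRecord]:
    """Breadth-first crawl from seed; returns an archive keyed by URL."""
    archive: Dict[str, PageRecord] = {}
    queue = deque([(seed, None, 0)])
    seen = {seed}
    while queue and len(archive) < max_pages:
        url, origin, depth = queue.popleft()
        body = fetch(url)
        if body is None:  # download failed or page missing: skip it
            continue
        parser = LinkExtractor()
        parser.feed(body)
        record = PageRecord(url, origin, depth, body)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if not keep(target):  # filter out unwanted pages
                continue
            record.links.append(target)
            if target not in seen:
                seen.add(target)
                queue.append((target, url, depth + 1))
        archive[url] = record
    return archive
```

In a real deployment `fetch` could wrap something like `urllib.request.urlopen`, and the returned archive would be written to persistent storage instead of being kept in memory; both of those choices are left open here.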

Skills: research, troubleshooting, management

About the Client

(5.00) 2 reviews

Taian 02:41 PM

1 Job Posted
100% Hire Rate, 1 Open Job

$367 Total Spent
2 Hires, 0 Active

$11.11/hr Avg Hourly Rate Paid
33 Hours

Member Since Jan 26, 2013