Download Wikipedia pages using the API (in Python)

Web, Mobile & Software Dev > Web Development | Posted 2 years ago

Fixed Price | Budget: $50.00 | Delivery by June 7, 2013

Details

I have a list of 644 Wikipedia pages which you must process using Python (a CSV file with the list is attached).

For each page you must provide me with:

1. an XML file with a list of all revision IDs (a sketch of the API call follows this list)

2. for each revision, the relevant Wikipedia page downloaded and stored as an HTML document
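
Below is a minimal sketch of how the revision listing (step 1) might be fetched, assuming the MediaWiki query API referenced in the output spec further down. The fetch_revision_xml name, the User-Agent string, and the rvprop fields are illustrative assumptions, not part of this brief; anonymous clients get at most 500 revisions per request, so the loop follows the API's continuation markers.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://en.wikipedia.org/w/api.php"

def fetch_revision_xml(title):
    """Yield raw XML responses covering every revision of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",   # revision IDs are what step 1 needs
        "rvlimit": "max",            # up to 500 revisions per request
        "format": "xml",
        "continue": "",              # opt in to the current continuation style
    }
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        # Wikipedia asks clients to send a descriptive User-Agent;
        # this value is a placeholder.
        req = urllib.request.Request(url, headers={"User-Agent": "revision-scraper/0.1"})
        with urllib.request.urlopen(req) as resp:
            xml = resp.read().decode("utf-8")
        yield xml
        cont = ET.fromstring(xml).find("continue")
        if cont is None:
            break
        params.update(cont.attrib)   # carries rvcontinue into the next request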

The final output is organized as follows:

1. a folder for each Wikipedia page (e.g. ./Pope_John_Paul_II/)

2. an XML file (Pope_John_Paul_II.xml) in that folder which contains the list of revisions (use the following API: http://www.mediawiki.org/wiki/API:Properties#revisions_.2F_rv)

3. a revisions folder containing an HTML file named after each revision ID (e.g. ./Pope_John_Paul_II/revisions/552708734.html), downloaded from Wikipedia's mobile version -- in this case http://en.m.wikipedia.org/w/index.php?oldid=552708734

4. Python scripts for this entire process, which I will use to replicate your data collection (a sketch covering this layout follows the list)
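
Here is a sketch of how items 1-3 might fit together for a single page, reusing fetch_revision_xml from the snippet above. The archive_page name, the one-second delay, and the concatenated XML output are illustrative choices, not requirements of this brief.

import os
import time
import urllib.request
import xml.etree.ElementTree as ET

MOBILE_URL = "https://en.m.wikipedia.org/w/index.php?oldid={revid}"

def archive_page(title):
    page_dir = os.path.join(".", title)              # e.g. ./Pope_John_Paul_II/
    rev_dir = os.path.join(page_dir, "revisions")
    os.makedirs(rev_dir, exist_ok=True)

    # Item 2: one XML file per page listing all revisions.
    chunks = list(fetch_revision_xml(title))
    with open(os.path.join(page_dir, title + ".xml"), "w", encoding="utf-8") as f:
        # NOTE: raw API responses are concatenated here; merging them into
        # one well-formed XML document is left to the real script.
        f.write("\n".join(chunks))

    # Item 3: one HTML file per revision, fetched from the mobile site.
    for chunk in chunks:
        for rev in ET.fromstring(chunk).iter("rev"):
            revid = rev.get("revid")
            req = urllib.request.Request(
                MOBILE_URL.format(revid=revid),
                headers={"User-Agent": "revision-scraper/0.1"},
            )
            with urllib.request.urlopen(req) as resp:
                html = resp.read()
            with open(os.path.join(rev_dir, revid + ".html"), "wb") as f:
                f.write(html)
            time.sleep(1)   # be polite to Wikipedia's servers

A top-level driver (item 4) would then read the attached CSV and call archive_page() on each of the 644 titles.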



Client Activity on this Job

Last Viewed: 1 year ago

Applicants: 6

Hired: 1


About the Client

Rating: 5.00 (10 reviews)

Somerville, United States

28 Jobs Posted
61% Hire Rate, 1 Open Job

$402 Total Spent
18 Hires, 2 Active

$10.00/hr Avg Hourly Rate Paid
5 Hours

Member Since Oct 23, 2012