Extraction of Wikipedia/Wiktionary Topics

Closed - This job posting has been filled and work has been completed.
Web, Mobile & Software Dev > Scripts & Utilities
Posted 2 years ago

Fixed Price

Delivery by April 26, 2013

Budget: $50.00

Details

I need a list of all Wikipedia articles (de.wikipedia.org and en.wikipedia.org) and all Wiktionary entries (German and English).

An automated script to extract these items would be welcome, but it is not required. It is also fine to create the lists partly automatically and partly by hand. However, they must be complete in the following respects:

The list of Wikipedia articles should cover:
- categories
- content articles
- referring articles
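
One possible way to produce the three lists automatically is the MediaWiki API's list=allpages query. The following is only a minimal sketch under several assumptions: that "referring articles" means redirect pages, that categories are the pages in namespace 14 and content articles are the non-redirect pages in namespace 0, and that the names LIST_KINDS, http_fetch and iter_titles (all hypothetical) suit the job. For a complete list of several million titles, the published database dumps may be more practical than paging through the API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap the host for de.wikipedia.org
# or the Wiktionary sites as needed.
API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical mapping from the three requested lists to allpages filters.
# "referring" is assumed to mean redirect pages.
LIST_KINDS = {
    "categories": {"apnamespace": "14", "apfilterredir": "all"},
    "content": {"apnamespace": "0", "apfilterredir": "nonredirects"},
    "referring": {"apnamespace": "0", "apfilterredir": "redirects"},
}


def http_fetch(params, api_url=API_URL):
    """Send one API request and return the decoded JSON response."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-list-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_titles(kind, fetch=http_fetch):
    """Yield every page title of the given kind, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "aplimit": "max",
    }
    params.update(LIST_KINDS[kind])
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        # The API returns the parameters needed to request the next batch.
        params.update(data["continue"])
```

The fetch argument is injectable so the paging logic can be tried against canned responses before it is pointed at the live site.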

The Wiktionary export should, if possible, be grouped by type (i.e. "Noun", "Adjective", ...).

The lists should be plain text files like this:

content-pages-en.txt:
  :
  :
  A.C. Smith
  A.C. St. Louis
  A.C. Stephens
  :
  :

If possible, also include the URL ending or the full URL:

content-pages-en.txt:
  :
  :
  A.C. Smith; A.C._Smith
  A.C. St. Louis; A.C._St._Louis
  A.C. Stephens; A.C._Stephens
  :
  :
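
The second format above can be derived from the title alone. The sketch below assumes that the URL ending is the title with spaces replaced by underscores, as in the sample lines, and that full URLs live under https://en.wikipedia.org/wiki/. The function names url_ending, full_url and format_line are hypothetical.

```python
from urllib.parse import quote

# Assumption: English Wikipedia article base URL.
BASE_URL = "https://en.wikipedia.org/wiki/"


def url_ending(title):
    """Turn a page title into its URL ending (spaces become underscores)."""
    return title.replace(" ", "_")


def full_url(title, base_url=BASE_URL):
    """Build the full URL, percent-encoding characters that need it."""
    return base_url + quote(url_ending(title))


def format_line(title):
    """Format one line of the list as 'Title; URL_ending'."""
    return f"{title}; {url_ending(title)}"


if __name__ == "__main__":
    for title in ["A.C. Smith", "A.C. St. Louis", "A.C. Stephens"]:
        print(format_line(title))
```

Run as a script, this prints the three sample titles in the "Title; URL_ending" form shown in the example.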

Please let me know if you expect the effort to be significantly higher or lower than budgeted.


About the Client

(4.53) 23 reviews

Germany
München 04:29 AM

50 Jobs Posted
57% Hire Rate, 1 Open Job

Over $30,000 Total Spent
31 Hires, 3 Active

$16.43/hr Avg Hourly Rate Paid
2,333 Hours

Member Since Nov 7, 2008