Extract popular phrases from a text corpus

Closed - This job posting has been filled and work has been completed.
Web, Mobile & Software Dev Scripts & Utilities Posted 1 year ago

Hourly Job

Hours to be determined
Less than 1 week

Expert Level

I am willing to pay higher rates for the most experienced freelancers


I need you to analyze natural-language text corpora.
You will be given a text corpus of ~600K English sentences (with typos, slang, etc.). Your task is to create a script which:

1) extracts popular phrases (with frequencies) using various methods
(you can use the ones from http://www.quora.com/What-is-the-best-way-to-analyze-a-corpus-of-text-to-determine-the-most-popular-phrases and http://www.quora.com/Whats-the-best-way-to-extract-phrases-from-a-corpus-of-text-using-Python, or suggest your own);
2) has a control panel with parameters for the extraction (n-gram length, stop-word lists, working with adjectives, etc.);
3) has a control for working with substitution dictionaries, since the dataset contains a lot of spelling errors and slang (e.g. we provide a dictionary of popular typos and then run the extraction again with that dictionary applied).
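
To make the three requirements above concrete, here is a minimal sketch in Python of one possible approach: plain n-gram frequency counting. It is only an illustration under stated assumptions, not the method the posting prescribes. The names ExtractionConfig, tokenize and extract_phrases, the tiny stop-word list, and the regex tokenizer are all hypothetical. The "control panel" is reduced to a config object, and adjective handling (which would need a part-of-speech tagger) is left out.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExtractionConfig:
    """Hypothetical 'control panel': every field is an illustrative assumption."""
    min_n: int = 2                      # shortest phrase length, in tokens
    max_n: int = 3                      # longest phrase length, in tokens
    stop_words: frozenset = frozenset({"the", "a", "an", "is", "to", "of", "and"})
    substitutions: dict = field(default_factory=dict)  # typo/slang -> canonical form
    min_count: int = 2                  # drop phrases seen fewer times than this


# Simplistic tokenizer (an assumption): lowercase letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(sentence, substitutions):
    """Lowercase, split into tokens, then apply the substitution dictionary."""
    tokens = TOKEN_RE.findall(sentence.lower())
    return [substitutions.get(token, token) for token in tokens]


def extract_phrases(sentences, config):
    """Return (phrase, frequency) pairs, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence, config.substitutions)
        for n in range(config.min_n, config.max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip phrases that start or end with a stop word.
                if gram[0] in config.stop_words or gram[-1] in config.stop_words:
                    continue
                counts[" ".join(gram)] += 1
    return [(p, c) for p, c in counts.most_common() if c >= config.min_count]


if __name__ == "__main__":
    sample = [
        "i luv this phone",
        "I love this phone so much",
        "love this phone, gr8 battery",
        "gr8 battery life",
    ]
    # Re-running with a typo dictionary merges "luv"/"love" and "gr8"/"great".
    config = ExtractionConfig(substitutions={"luv": "love", "gr8": "great"})
    for phrase, count in extract_phrases(sample, config):
        print(count, phrase)
```

Running the extraction once with an empty substitutions dictionary and once with a typo dictionary shows the effect requirement 3 asks for: variant spellings are merged before counting, so their frequencies add up. Raw frequency is only one of the "various methods"; association measures such as pointwise mutual information could be swapped in behind the same config object.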

A sample of the text data is attached.

If you have a ready-made solution (your own or someone else's), that is totally fine.

About the Client

(4.98) 100 reviews

United States
Sf 02:48 AM

126 Jobs Posted
70% Hire Rate, 1 Open Job

Over $50,000 Total Spent
141 Hires, 0 Active

$20.10/hr Avg Hourly Rate Paid
4,287 Hours

Member Since Sep 3, 2013