We are looking for a programmer who can extract the content of posts from forums. The extracted data should contain only the POSTS themselves, without usernames, timestamps, "thank you" reactions, quotes, or other metadata. If a thread spans multiple pages, all pages should be extracted and combined into one document. The formatting, including paragraph breaks (<p> tags), should be consistent so that the content is easy to read.
The extracted content should automatically be sent to Google Translate, with the option to choose the source and target language(s). The translated forum content should be exported as a .csv file that includes the entire forum content.
Important information for data scraping: the crawler should use time delays between requests and/or different IP addresses so that the server does not block it. For a proof of concept, please contact us for details about the forums we want crawled and extracted.
To receive example forums with manually produced translations (showing what the automated result should look like), please contact us.