E-artnow is an e-book and print-on-demand publisher. Until now we have focused on republishing classics, but we now want to try producing quality books with NLP, based mainly on Wikipedia articles, using summarization, paraphrasing and topic modeling.
We need a programmer on an ongoing basis, working a minimum of 10 hours per week and available for at least the next 12 months.
The work consists mostly (but not exclusively) of using existing topic modeling and NLP scripts and libraries to process Wikipedia articles into "new" texts that are not just "copies" of existing Wikipedia articles. This would combine some form of topic modeling (either Wikipedia's own categorizations and lists, or suitable existing topic modeling scripts) with NLP summarization and/or paraphrasing of those articles. Other programmers will format the resulting texts into e-books and print-on-demand books.
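To illustrate the simplest variant mentioned above, using Wikipedia's own categorizations instead of a trained topic model, here is a minimal sketch in Python. It assumes the article titles and their category lists have already been fetched; the function name `group_by_category`, the `min_articles` threshold and the sample data are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_by_category(articles, min_articles=2):
    """Map each category to the article titles filed under it.

    `articles` is a dict of {title: [category, ...]}. Categories with
    fewer than `min_articles` titles are dropped, on the assumption
    that they are too thin to form a chapter.
    """
    groups = defaultdict(list)
    for title, categories in articles.items():
        for category in categories:
            groups[category].append(title)
    return {
        category: sorted(titles)
        for category, titles in groups.items()
        if len(titles) >= min_articles
    }


# Hypothetical sample data, not real Wikipedia output.
sample_articles = {
    "Blue whale": ["Whales", "Marine mammals"],
    "Humpback whale": ["Whales", "Marine mammals"],
    "Sea otter": ["Marine mammals"],
    "Krill": ["Crustaceans"],
}

chapters = group_by_category(sample_articles)
for category, titles in chapters.items():
    print(category, "->", ", ".join(titles))
```

Each surviving category could then serve as a candidate chapter whose articles are summarized or paraphrased. A trained topic model would replace the category lookup but could keep the same output shape.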
We do not want to impose a certain concept or content for the books and then build a tailored script that does exactly that. We want to try a different approach: to study selected existing scripts that we could "somehow" use to create "some kind of content" from Wikipedia articles, and possibly also from our own classics e-books. In particular, we are interested in extractive summarization of our existing classics e-books. We are also interested in producing Wikipedia-based quiz books and "didactic books" with questions and answers about the content, and possibly "language training" books that gradually translate the content into another language (you read an English text that includes French words or translations for specific words such as adjectives, verbs, etc.).
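For candidates unsure what we mean by extractive summarization, the following is a minimal sketch, not a script we already use. It scores each sentence by the average frequency of its words across the whole text and keeps the top sentences unchanged and in their original order. The tiny stopword list, the naive sentence splitter and the function names are assumptions made for the illustration; an existing library would likely do each step better.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project
# would use a much fuller list).
STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "it",
    "was", "that", "on", "for", "as", "with", "by",
}


def tokenize(text):
    """Lowercase the text and return its words minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average rather than sum, so long sentences do not win by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


sample = (
    "Whales are large marine mammals. "
    "Whales feed on krill. "
    "The weather was nice yesterday."
)
summary = summarize(sample, max_sentences=2)
print(" ".join(summary))
```

Because the output consists only of sentences copied from the source, this approach suits the classics e-books. For Wikipedia-based texts that must not read as copies, it would need to be combined with paraphrasing.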
Because we do not have a clear end format in mind, we would like to test some existing scripts and adapt our publishing projects to the capacities and limits of those scripts, or find better existing scripts that perform similar tasks. Of course, some (possibly extensive) adaptation of these scripts will be necessary.
Here are some existing scripts that we are interested in testing:
Gensim Experiments on English Wikipedia: https://radimrehurek.com/gensim/wiki.html
WikiArticleCreator: Not sure if this is some kind of Topic Modeling?
or maybe try to use / build this:
WikiQuestions and answers generation:
If any scripts developed by Stanford's NLP group could be of use, that would be great:
Thank you for your attention. Please let us know if you have any ideas on how to organize and start these projects.
Note: this is a work-for-hire contract:
WORK for Hire Agreement: the "NLP & Machine Learning programmer for Summarization, Paraphrasing & Topic Modeling" who will perform the above detailed Work and the Publisher e-artnow s.r.o. intend this to be a contract for services and each considers the products and results of the services to be rendered by the "NLP & Machine Learning programmer for Summarization, Paraphrasing & Topic Modeling" who will perform the above detailed Work, hereunder the "WORK", to be a work made for hire with no credits for authorship. The "NLP & Machine Learning programmer for Summarization, Paraphrasing & Topic Modeling" acknowledges and agrees that the WORK (and all rights therein, including, without limitation, authorship and copyright) belongs to and shall be the sole and exclusive property of the Publisher. If for any reason the WORK would not be considered a work made for hire under applicable law, the "NLP & Machine Learning programmer for Summarization, Paraphrasing & Topic Modeling" does hereby sell, assign, and transfer to the Publisher, its successors and assigns, the entire right, title and interest in and to the copyright in the WORK and any registrations and copyright applications relating thereto and any renewals and extensions thereof, and in and to all works based upon, derived from, or incorporating the WORK throughout the world.