I am looking for a person who can guide me in training the following models from plain text:
1. a class-based language model - http://www.cs.cmu.edu/~roni/11661/PreviousYearsHandouts/classlm.pdf, http://projects.csail.mit.edu/cgi-bin/wiki/view/SLS/SriLM
2. a skip n-gram language model,
3. a factored language model - http://ssli.ee.washington.edu/people/duh/papers/flm-manual.pdf, http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc10
To my knowledge, all of this can be done with the SRILM toolkit after some preprocessing steps.
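Something along these lines is what I expect, based on my reading of the SRILM man pages and the FLM manual. All file names are placeholders, and I have not verified the skip n-gram and factored invocations myself, so the exact flags should be double-checked against each tool's documentation:

```shell
# (1) Class-based LM: induce word classes, replace words with class
#     tokens, then train an ordinary n-gram over the class stream.
#     ngram-count -lm writes the model in ARPA format.
ngram-class -text train.txt -numclasses 200 -classes classes.txt
replace-words-with-classes classes=classes.txt train.txt > train.classes
ngram-count -text train.classes -order 3 -interpolate -kndiscount \
    -lm class.3gram.arpa

# (2) Skip n-gram: if the SRILM build supports it, ngram-count offers
#     EM-based skip-n-gram estimation (see the -skip and -em-* options
#     in the ngram-count man page; iteration count here is a guess).
ngram-count -text train.txt -order 3 -skip -em-iters 10 \
    -lm skip.3gram.arpa

# (3) Factored LM: each word becomes a bundle of factors (e.g.
#     "W-dog:P-NN"), and fngram-count trains the models described in a
#     .flm specification file; output file names come from that file.
fngram-count -factor-file model.flm -text train.factored -lm
```

For the class model I would then expect evaluation to work with something like `ngram -classes classes.txt -lm class.3gram.arpa -ppl test.txt`, but again, that is my assumption rather than something I have run.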
I am familiar with tools like SRILM, IRSTLM and KenLM, but so far I have trained only standard n-gram models.
I need guidance on how to train class-based, skip n-gram and factored models from ordinary textual data such as http://opus.lingfil.uu.se/OpenSubtitles2016.php
Data pre-processing should also be covered in the guide, with tools provided where needed.
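As an example of the kind of preprocessing I have in mind, here is a minimal normalization step I could apply before any of the SRILM commands. It assumes one sentence per line and ASCII text; real subtitle data is UTF-8 and multilingual, so a proper tokenizer would be needed in practice:

```shell
# Lowercase, strip punctuation (apostrophes kept), squeeze whitespace.
# ASCII-only sketch; SRILM itself does not normalize case or punctuation.
normalize() {
  tr '[:upper:]' '[:lower:]' \
    | sed "s/[^a-z0-9' ]/ /g" \
    | tr -s ' ' \
    | sed 's/^ //; s/ $//'
}

# Usage: normalize < raw.txt > train.txt
```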
The resulting models should be in ARPA format.