My problem is that I have some texts that contain mistakes or are missing words. I would like a tool that can repair them automatically using an n-gram language model in ARPA format. My text is in UTF-8 with one sentence per line, and the tool needs to work under Ubuntu Linux (I am using version 16.04).
The text may contain the following kinds of errors:
- one or more missing words in a sentence, each marked with the unk symbol: e.g. "Ala ma unk i dwa psy." should be corrected to "Ala ma kota i dwa psy.". NOTE: there can be many unk symbols in one sentence.
- wrong word form or inflection: e.g. "Ala kupił zielony samochód." should be "Ala kupiła zielony samochód."
- (very rare) a completely wrong word in the text: e.g. "Rąb ma pole o wymiarach." should be "Romb ma pole o wymiarach."
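
To make the first case concrete, here is a minimal Python sketch of the idea, not a finished tool. It assumes a bigram model in ARPA format and fills each unk greedily, left to right, with the vocabulary word that scores best between its two neighbours. The toy model `TOY_ARPA`, its probabilities, and the helper names (`parse_arpa`, `bigram_logprob`, `fill_unk`) are all made up for illustration. For a real model it would probably be better to query it through an existing toolkit such as KenLM than to parse the file by hand.

```python
# Hypothetical toy model in ARPA layout (log10 probabilities, tab-separated).
# The numbers are invented for this example only.
TOY_ARPA = "\n".join([
    "\\data\\",
    "ngram 1=7",
    "ngram 2=3",
    "",
    "\\1-grams:",
    "-1.0\tAla\t-0.3",
    "-1.0\tma\t-0.3",
    "-1.0\tkota\t-0.3",
    "-1.0\tpsa\t-0.3",
    "-1.0\ti\t-0.3",
    "-1.0\tdwa\t-0.3",
    "-1.0\tpsy\t-0.3",
    "",
    "\\2-grams:",
    "-0.3\tma kota",
    "-0.4\tkota i",
    "-0.9\tma psa",
    "",
    "\\end\\",
])

UNSEEN = -10.0  # assumed penalty for words that are not in the model at all


def parse_arpa(text):
    """Parse ARPA text into {order: {ngram_tuple: (logprob, backoff)}}."""
    model = {}
    order = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            model[order] = {}
        elif line == "\\end\\":
            break
        elif order is not None and line:
            fields = line.split("\t")
            backoff = float(fields[2]) if len(fields) > 2 else 0.0
            model[order][tuple(fields[1].split())] = (float(fields[0]), backoff)
    return model


def bigram_logprob(model, prev, word):
    """log10 P(word | prev), backing off to the unigram if the bigram is unseen."""
    bigrams = model.get(2, {})
    if (prev, word) in bigrams:
        return bigrams[(prev, word)][0]
    backoff = model[1].get((prev,), (0.0, 0.0))[1]
    unigram = model[1].get((word,), (UNSEEN, 0.0))[0]
    return backoff + unigram


def fill_unk(model, sentence, placeholder="unk"):
    """Replace each placeholder with the word that best fits its two neighbours."""
    tokens = sentence.split()
    vocab = [w[0] for w in model[1] if w[0] not in ("<s>", "</s>", "<unk>")]
    for i, tok in enumerate(tokens):
        if tok != placeholder:
            continue
        prev = tokens[i - 1] if i > 0 else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        tokens[i] = max(
            vocab,
            key=lambda w: bigram_logprob(model, prev, w) + bigram_logprob(model, w, nxt),
        )
    return " ".join(tokens)


model = parse_arpa(TOY_ARPA)
print(fill_unk(model, "Ala ma unk i dwa psy."))  # prints "Ala ma kota i dwa psy."
```

This only covers the unk case and uses naive whitespace tokenization, so "psy." with its full stop is treated as an unknown word. The inflection and wrong-word cases would need a way to generate candidate replacements (for example from a morphological dictionary or by edit distance) and then the same kind of scoring over the whole sentence.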