Automatic text normalisation

From Apertium

Revision as of 12:52, 23 March 2014 by Ksnmi (talk | contribs)



General ideas

Diacritic restoration
Reduplicated character reduction
- How can we learn language-specific settings? E.g. in English certain consonants can double but others cannot, and the same goes for vowels. Can we learn these by looking at a corpus?
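
One way the corpus idea could work, as a rough sketch only: count which characters ever appear doubled in clean corpus words, then collapse longer runs in noisy input to two characters if the character is known to double and to one otherwise. The names `learn_doubles`, `reduce_runs` and the `min_ratio` threshold are made up for illustration and are not part of Apertium.

```python
import re
from collections import Counter

def learn_doubles(corpus_words, min_ratio=0.01):
    """Return the characters that appear doubled in enough corpus words.

    min_ratio is a hypothetical threshold: the share of words containing
    the character in which it also occurs doubled.
    """
    total = Counter()    # words containing each character
    doubled = Counter()  # words containing each character doubled
    for word in corpus_words:
        w = word.lower()
        total.update(set(w))
        doubled.update({m.group(1) for m in re.finditer(r"(.)\1", w)})
    return {c for c, n in doubled.items() if n / total[c] >= min_ratio}

def reduce_runs(word, doubles):
    """Collapse each run of a repeated character to two or one characters."""
    def collapse(match):
        ch = match.group(1)
        return ch * 2 if ch.lower() in doubles else ch
    return re.sub(r"(.)\1+", collapse, word)

# Toy corpus, for illustration only.
corpus = ["hello", "book", "happy", "tree", "cat", "dog", "sun", "cool"]
doubles = learn_doubles(corpus)
print(reduce_runs("caaat", doubles))  # 'a' never doubles in the toy corpus
print(reduce_runs("coool", doubles))  # 'o' may double, so two are kept
```

A run of a character that may double is still ambiguous ("sooo" could be "so" or "soo"), so in practice both candidates would probably need to be checked against a dictionary or language model.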

Code switching

For the language subpart, we could train and keep lists of the most frequently corrected words across languages and then refer to those lists.
- This may be too heavy for an on-the-fly application (needs discussion).
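
A minimal sketch of what such lists might look like, assuming training data is available as (language, noisy form, corrected form) triples. The function names and the `top_n` cap are hypothetical; the cap is one possible way to keep the tables small enough for on-the-fly use.

```python
from collections import Counter, defaultdict

def build_correction_lists(pairs, top_n=1000):
    """Build a per-language table of the most frequent corrections.

    pairs: iterable of (lang, noisy, corrected) triples (assumed format).
    top_n: hypothetical cap on the corrections considered per language.
    """
    counts = defaultdict(Counter)
    for lang, noisy, corrected in pairs:
        counts[lang][(noisy, corrected)] += 1
    tables = {}
    for lang, counter in counts.items():
        table = {}
        # most_common yields the highest counts first, so setdefault keeps
        # the most frequent correction for each noisy form.
        for (noisy, corrected), _ in counter.most_common(top_n):
            table.setdefault(noisy, corrected)
        tables[lang] = table
    return tables

def normalise_token(token, lang, tables):
    """Look the token up in the list for lang; return it unchanged if absent."""
    return tables.get(lang, {}).get(token, token)

# Toy training data, for illustration only.
pairs = [
    ("en", "u", "you"),
    ("en", "u", "you"),
    ("en", "u", "university"),
    ("es", "q", "que"),
]
tables = build_correction_lists(pairs)
print(normalise_token("u", "en", tables))
print(normalise_token("q", "en", tables))
```

Each lookup is a pair of dictionary accesses, so the run-time cost is small; the memory needed for the tables is the part that would need discussion.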
