Ideas for Google Summer of Code/Automatic diacritic restoration

Kevin Scannell has written charlifter, a Perl implementation of several statistical diacritic restoration algorithms, which has been trained for more than 100 languages on web-crawled data. Details are in his paper, linked below. The system can be tried out directly, and a Firefox extension is also available.

Porting the algorithm to C++ should be straightforward. The subtler task is tuning the smoothing of the statistical models separately for each language.
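To make the smoothing question concrete, here is a minimal sketch of a word-level restorer, written in Python for brevity. It is not charlifter's actual algorithm and not the proposed C++ port: the class `Restorer`, the constant `k`, and the scoring rule (bigram count plus `k` times the unigram share among candidates) are all assumptions chosen for illustration. The point is only that `k` is the kind of parameter that would need tuning per language.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove combining marks, e.g. 'café' -> 'cafe'.

    Letters that do not decompose under NFD (such as 'ø') are left alone.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class Restorer:
    """Hypothetical word-level restorer with a smoothed bigram context."""

    def __init__(self, k=0.1):
        # Smoothing constant (an assumption of this sketch): small k trusts
        # the previous-word context, large k falls back on word frequency.
        self.k = k
        self.forms = defaultdict(Counter)  # stripped word -> counts of original forms
        self.bigrams = Counter()           # (previous word, word) -> count

    def train(self, sentences):
        for sentence in sentences:
            prev = "<s>"  # sentence-start marker
            for word in sentence.split():
                word = unicodedata.normalize("NFC", word)
                self.forms[strip_diacritics(word)][word] += 1
                self.bigrams[(prev, word)] += 1
                prev = word

    def score(self, prev, candidate, candidates):
        # Bigram count backed off to the candidate's unigram share.
        share = candidates[candidate] / sum(candidates.values())
        return self.bigrams[(prev, candidate)] + self.k * share

    def restore(self, text):
        out = []
        prev = "<s>"
        for word in text.split():
            candidates = self.forms.get(strip_diacritics(word))
            if candidates:  # unseen words are passed through unchanged
                word = max(candidates, key=lambda c: self.score(prev, c, candidates))
            out.append(word)
            prev = word
        return " ".join(out)


# Toy Spanish training data, invented for this example.
restorer = Restorer(k=0.1)
restorer.train(["él come pan", "el pan es bueno", "el perro come", "sí él come"])
print(restorer.restore("si el come"))  # prints "sí él come"
```

In this toy model, "el" and "él" are equally frequent, so the previous word decides between them. With real data the best value of `k` would likely depend on how much training text is available and how ambiguous the language's stripped forms are, which is why it would have to be tuned language by language, for example on held-out text.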

References