Difference between revisions of "Ideas for Google Summer of Code/Automatic diacritic restoration"

From Apertium
(New "read more" page for diacritic restoration)
 
[[User:Kevin Scannell|Kevin Scannell]] has written a Perl implementation of several statistical diacritic restoration algorithms, called [http://sourceforge.net/projects/lingala/ charlifter], which has been trained for more than 100 languages on web-crawled data. Details are in his paper linked below. You can try the system [http://logipam.org/charlifter/index.php here].

Porting the algorithm to C++ should be straightforward. The subtler problem is tuning the smoothing of the statistical models separately for each language.
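
To give a feel for the shape a C++ port might take, here is a minimal sketch of a word-level baseline. It is not charlifter's algorithm, which is described in the paper. Everything in it is an assumption made for illustration: the names <code>strip_diacritics</code> and <code>UnigramRestorer</code>, the tiny accent table, UTF-8 source and input encoding, and a single pseudo-count <code>prior</code> given to the unchanged input form, which stands in for the kind of smoothing parameter that would need tuning per language.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny table (UTF-8 accented letter -> base letter).
// A real port would need full coverage for every supported language.
static const std::map<std::string, std::string> kBaseLetter = {
    {"á", "a"}, {"é", "e"}, {"í", "i"}, {"ó", "o"}, {"ú", "u"},
};

// Number of bytes in the UTF-8 sequence that starts with this lead byte.
static std::size_t utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x6) return 2;
    if ((lead >> 4) == 0xE) return 3;
    if ((lead >> 3) == 0x1E) return 4;
    return 1;  // invalid lead byte: pass it through unchanged
}

// Replace every accented letter found in the table with its base letter.
std::string strip_diacritics(const std::string& word) {
    std::string out;
    for (std::size_t i = 0; i < word.size();) {
        const std::size_t n = utf8_length(static_cast<unsigned char>(word[i]));
        const std::string ch = word.substr(i, n);
        const auto it = kBaseLetter.find(ch);
        out += (it != kBaseLetter.end()) ? it->second : ch;
        i += n;
    }
    return out;
}

// Word-level baseline: remember how often each accented form was seen for
// a given stripped key, then pick the best-scoring form at restore time.
class UnigramRestorer {
public:
    // prior: pseudo-count added to the unchanged input form (assumed knob).
    explicit UnigramRestorer(double prior) : prior_(prior) {
        assert(prior >= 0.0);
    }

    void train(const std::vector<std::string>& words) {
        for (const std::string& w : words) ++counts_[strip_diacritics(w)][w];
    }

    std::string restore(const std::string& word) const {
        std::string best = word;    // unseen words are returned unchanged
        double best_score = prior_;
        const auto entry = counts_.find(strip_diacritics(word));
        if (entry == counts_.end()) return best;
        for (const auto& form_count : entry->second) {
            const double bonus = (form_count.first == word) ? prior_ : 0.0;
            const double score = static_cast<double>(form_count.second) + bonus;
            if (score > best_score) {
                best = form_count.first;
                best_score = score;
            }
        }
        return best;
    }

private:
    double prior_;
    std::map<std::string, std::map<std::string, std::size_t>> counts_;
};
```

Trained on a corpus in which "más" occurs three times and "mas" once, a restorer built with <code>prior = 1.0</code> rewrites "mas" as "más", while one built with <code>prior = 5.0</code> leaves it alone. Finding the right value for each language, for example on held-out text, is the kind of tuning the paragraph above refers to.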
   
 
;References
 

Revision as of 18:17, 13 March 2010
