Automatic text normalisation

From Apertium

Revision as of 12:52, 23 March 2014

General ideas

  • Diacritic restoration
  • Reduplicated character reduction
    • How can we learn language-specific settings? For example, in English certain consonants can double but others cannot, and the same goes for vowels. Can we learn these by looking at a corpus?
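
One way the corpus question above might be approached, as a rough sketch only: count which characters ever appear doubled in a clean word list, then use that set to decide whether a run of repeated characters collapses to two characters or to one. The function names `learn_doublable` and `reduce_repeats` and the `min_count` threshold are hypothetical, chosen for illustration; nothing here is an existing Apertium interface.

```python
import re
from collections import Counter


def learn_doublable(corpus_words, min_count=2):
    """Return the characters seen doubled at least min_count times in the corpus.

    min_count is an assumed threshold meant to filter out typos in the corpus.
    """
    counts = Counter()
    for word in corpus_words:
        # (.)\1 matches any character immediately followed by itself.
        for match in re.finditer(r"(.)\1", word.lower()):
            counts[match.group(1)] += 1
    return {ch for ch, n in counts.items() if n >= min_count}


def reduce_repeats(word, doublable):
    """Collapse each run of a repeated character.

    Runs of a doublable character are cut to two, all other runs to one.
    """
    def collapse(match):
        ch = match.group(1)
        return ch * 2 if ch in doublable else ch

    return re.sub(r"(.)\1+", collapse, word)


# Tiny illustrative word list; a real run would use a large monolingual corpus.
corpus = ["letter", "better", "cool", "pool", "hello", "bell"]
doublable = learn_doublable(corpus)
print(reduce_repeats("coooool", doublable))
print(reduce_repeats("heyyyy", doublable))
```

The sketch cannot tell whether a doublable character should end up single or double in a given word (a run of "o" always becomes "oo"), so the output would still need to be checked against a lexicon or language model.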

Code switching

  • For the language subpart, we could train and keep lists of the most frequently corrected words in each language and then refer to those lists when normalising.
    • This may be too heavy for on-the-fly use (needs discussion)
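
The correction-list idea above could look roughly like the following sketch. It assumes corrections are observed as (language, wrong form, corrected form) triples and that only the `top_n` most frequent pairs per language are kept, which is one possible way to limit the memory cost raised above. The class `CorrectionLists` and its methods are hypothetical names for illustration.

```python
from collections import Counter, defaultdict


class CorrectionLists:
    """Per-language tables of the most frequently observed corrections."""

    def __init__(self, top_n=1000):
        self.top_n = top_n  # assumed cap on stored pairs per language
        self._counts = defaultdict(Counter)
        self._tables = {}

    def observe(self, lang, wrong, right):
        """Record one observed correction of `wrong` to `right` in `lang`."""
        self._counts[lang][(wrong, right)] += 1

    def build(self):
        """Freeze the top_n most frequent pairs of each language into lookup tables."""
        self._tables = {}
        for lang, counter in self._counts.items():
            table = {}
            for (wrong, right), _ in counter.most_common(self.top_n):
                # most_common is sorted by frequency, so the first
                # correction seen for a wrong form is the most frequent one.
                table.setdefault(wrong, right)
            self._tables[lang] = table

    def lookup(self, token, langs):
        """Return (lang, correction) from the first language that knows the token."""
        for lang in langs:
            right = self._tables.get(lang, {}).get(token)
            if right is not None:
                return lang, right
        return None


lists = CorrectionLists(top_n=2)
for _ in range(3):
    lists.observe("en", "u", "you")
lists.observe("en", "u", "your")
for _ in range(2):
    lists.observe("en", "gr8", "great")
lists.observe("es", "q", "que")
lists.build()
print(lists.lookup("u", ["es", "en"]))
```

In code-switched text the order of `langs` matters, because the first language with an entry wins. A real system would probably weight candidates by the language guessed for the surrounding words.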