Automatic text normalisation

From Apertium
Revision as of 12:48, 23 March 2014 by Francis Tyers.

General ideas

  • Diacritic restoration
  • Reduplicated character reduction
    • How can language-specific settings be learned? For example, in English some consonants can double but others cannot, and the same goes for vowels. Can we learn these by looking at a corpus?
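One possible answer to the corpus question above is a minimal sketch like the following. It assumes the corpus is already a list of tokenised words and that a letter which appears as an exact double often enough in clean text is allowed to double in that language. The function names `learn_doubles` and `reduce_repeats` and the `min_count` threshold are hypothetical, chosen only for illustration. Runs of three or more letters are ignored while learning, on the assumption that they are mostly expressive elongation rather than spelling.

```python
import re
from collections import Counter

# A run of two or more identical letters (digits and "_" excluded).
RUN = re.compile(r"([^\W\d_])\1+")

def learn_doubles(words, min_count=2):
    """Return the letters seen as an exact double at least min_count times.

    Hypothetical heuristic: runs of 3+ letters (e.g. "sooo") are skipped,
    assuming they are elongation noise rather than normal spelling.
    """
    counts = Counter()
    for word in words:
        for m in RUN.finditer(word.lower()):
            if len(m.group(0)) == 2:
                counts[m.group(1)] += 1
    return {ch for ch, n in counts.items() if n >= min_count}

def reduce_repeats(word, doubles):
    """Collapse each run to two letters if that letter may double, else to one."""
    def shorten(m):
        ch = m.group(1)
        return ch * 2 if ch.lower() in doubles else ch
    return RUN.sub(shorten, word)

if __name__ == "__main__":
    corpus = ["good", "book", "feet", "seen", "ball", "will",
              "coffee", "happy", "summer"]
    doubles = learn_doubles(corpus)
    print(sorted(doubles))  # ['e', 'l', 'o']
    for noisy in ["coooool", "yesss", "hmmmm"]:
        print(noisy, "->", reduce_repeats(noisy, doubles))
```

With this toy corpus, "coooool" becomes "cool", "yesss" becomes "yes" and "hmmmm" becomes "hm". The toy corpus is too small: with `min_count=2`, f, p and m are missed even though they double in English, so a real corpus is needed. The rule alone is also ambiguous: "sooo" reduces to "soo" rather than "so" because o may double, so a dictionary or language-model check over the candidate reductions would probably still be required.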

Code switching