Ideas for Google Summer of Code/Accent and diacritic restoration

Latest revision as of 13:39, 21 March 2013

Many languages use diacritics and accents in normal writing, and Apertium is designed to handle them. In some settings, however, such as instant messaging, IRC, and web search, they are often left out. This causes problems because, for the engine, traduccion is not the same as traducción. The idea is to create an optional module that restores diacritics and accents in input text, and to integrate it into the Apertium pipeline.
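As a rough illustration of the problem, the sketch below strips diacritics from correctly written words and uses the result to build a lookup table from each unaccented form to its most frequent accented form. This is only a word-level baseline, not charlifter's algorithm; the function names `strip_diacritics`, `train`, and `restore` and the tiny corpus are hypothetical and chosen for this example.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove combining marks, e.g. 'traducción' -> 'traduccion'.

    Only handles characters that decompose into base letter + combining
    mark; letters such as 'ø' or 'đ' are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def train(corpus_words):
    """Map each stripped form to its most frequent accented form."""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}


def restore(words, model):
    """Replace each word by the model's choice; unknown words pass through."""
    return [model.get(word, word) for word in words]


# Hypothetical toy corpus, assumed to be correctly accented text.
corpus = "la traducción de la canción es una traducción libre".split()
model = train(corpus)
restored = restore("la traduccion de la cancion".split(), model)
```

A lookup table like this cannot choose between forms that differ only in their accents (for example Spanish "el" and "él"), which is why context-aware models are needed for the real module.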

Tasks

  • Finish the port of Kevin Scannell's charlifter to C++; the port should respect superblanks.
  • Allow rule-based replacements of character sequences.
  • Train models for all languages in Apertium.
  • Inform charlifter with target-language information from a target-language model.
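The first two tasks can be sketched together: apply rule-based replacements of character sequences, but only to text outside superblanks. The sketch assumes that a superblank is a block enclosed in square brackets and that a backslash escapes the following character; the names `apply_rules` and `restore_outside_superblanks` and the single replacement rule are hypothetical, and none of this reflects charlifter's actual code.

```python
import re

# Assumption: a superblank is a [...] block and a backslash escapes the
# next character. Alternatives are tried left to right at each position.
_TOKEN = re.compile(
    r"(?P<blank>\[(?:\\.|[^\]\\])*\])"  # superblank, copied verbatim
    r"|(?P<esc>\\.)"                     # escaped character, copied verbatim
    r"|(?P<text>[^\[\\]+)"               # ordinary text, eligible for restoration
    r"|(?P<other>.)",                    # stray '[' or trailing backslash
    re.DOTALL,
)


def apply_rules(text, rules):
    """Apply (source, target) character-sequence replacements in order."""
    for source, target in rules:
        text = text.replace(source, target)
    return text


def restore_outside_superblanks(stream, restore):
    """Run `restore` on ordinary text only, leaving superblanks untouched."""
    pieces = []
    for match in _TOKEN.finditer(stream):
        if match.lastgroup == "text":
            pieces.append(restore(match.group()))
        else:
            pieces.append(match.group())
    return "".join(pieces)


# Hypothetical rule set with a single replacement.
rules = [("traduccion", "traducción")]
stream = '[<b>]traduccion[</b>] de [<a href="traduccion.html">]texto[</a>]'
result = restore_outside_superblanks(stream, lambda t: apply_rules(t, rules))
```

In this example the word in running text is restored, while the file name inside the bracketed block stays as it was. One limitation of the sketch is that text is processed chunk by chunk, so a word interrupted by a superblank would not be matched by any rule.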

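Coding challenge

  • Install Apertium.
  • Check out and compile charlifter from GSOC2011 (https://apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2011/charlifter).
  • Adjust the code so that it respects superblanks.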

Frequently asked questions

  • none yet, ask us something! :)

See also

  • Charlifter