Ideas for Google Summer of Code/Robust tokenisation

From Apertium
Revision as of 12:51, 29 January 2018 by Francis Tyers


Task

  • Update lttoolbox to be fully Unicode compliant with regard to alphabetic symbols.
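
To illustrate what Unicode-aware handling of alphabetic symbols means in practice, here is a minimal Python sketch. It is not lttoolbox code (lttoolbox is written in C++), and the function names `is_alphabetic` and `split_words` are hypothetical. It approximates "alphabetic" as any character in a Unicode letter (L*) or mark (M*) general category, which is only a rough stand-in for the Unicode Alphabetic property.

```python
import unicodedata
from typing import List


def is_alphabetic(ch: str) -> bool:
    """Approximation (an assumption of this sketch): treat letters (L*)
    and combining marks (M*) as alphabetic symbols."""
    return unicodedata.category(ch)[0] in ("L", "M")


def split_words(text: str) -> List[str]:
    """Split text into maximal runs of alphabetic characters."""
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_alphabetic(ch):
            current.append(ch)
        elif current:
            # A non-alphabetic character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

With this definition, words written with non-ASCII letters or with combining diacritics stay in one piece, where a tokeniser that relies on a hard-coded ASCII alphabet would split them.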


Coding challenge

  • Remove all multiwords from an Apertium language pair and move them into an apertium-separable dictionary.
  • Make sure that the translation output is identical before and after the change.
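
One possible way to check the second point is to translate the same test corpus before and after the change, save both outputs, and compare them line by line. The sketch below assumes the two outputs are already available as lists of lines. The helper name `differing_lines` is hypothetical and not part of any Apertium tool.

```python
from typing import List, Optional, Tuple


def differing_lines(
    before: List[str], after: List[str]
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line number, before, after) for every line that differs.

    Line numbers are 1-based. A line missing from one side is reported
    as None on that side.
    """
    diffs = []
    for i in range(max(len(before), len(after))):
        b = before[i] if i < len(before) else None
        a = after[i] if i < len(after) else None
        if a != b:
            diffs.append((i + 1, b, a))
    return diffs
```

An empty result means the two outputs are identical. Any entries point to the sentences where moving a multiword changed the translation.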