Ideas for Google Summer of Code/Robust tokenisation
Revision as of 13:06, 29 January 2018 by TommiPirinen (talk | contribs)
Task
- Update lttoolbox to be fully Unicode-compliant with regard to alphabetic symbols.
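To illustrate why this matters for tokenisation, here is a minimal Python sketch. It is not lttoolbox code, and the function names are hypothetical. It contrasts an ASCII-only letter test with one based on the Unicode general category, using categories starting with "L" as an approximation of "alphabetic" (the full Unicode Alphabetic property is somewhat broader).

```python
import unicodedata


def is_ascii_letter(ch):
    """ASCII-only test: treats every non-ASCII letter as a separator."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_unicode_letter(ch):
    """Approximate 'alphabetic' as any Unicode general category L* (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def split_on_non_letters(text, is_letter):
    """Hypothetical toy tokeniser: group consecutive letters into tokens."""
    tokens, current = [], []
    for ch in text:
        if is_letter(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# "naïve œuvre", written with escapes so the code points are unambiguous
sample = "na\u00efve \u0153uvre"
ascii_tokens = split_on_non_letters(sample, is_ascii_letter)
unicode_tokens = split_on_non_letters(sample, is_unicode_letter)
```

With the ASCII-only test, words are broken at every accented or non-Latin letter; with the Unicode-aware test, each word stays a single token.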
Coding challenge
- Remove all multiwords from an Apertium language pair and put them in an apertium-separable dictionary.
- Make sure that the output before/after is identical.
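One way to check that the output is identical is to save the translator's output for the same input before and after the change and diff the two. The sketch below assumes the outputs have already been written to two text files; the helper names and file names are hypothetical, and only the Python standard library is used.

```python
import difflib


def read_lines(path):
    """Read a UTF-8 text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def diff_outputs(before_lines, after_lines):
    """Return unified-diff lines; an empty list means the outputs are identical."""
    return list(
        difflib.unified_diff(
            before_lines,
            after_lines,
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


def outputs_identical(before_path, after_path):
    """Compare two saved output files (hypothetical paths supplied by the caller)."""
    return diff_outputs(read_lines(before_path), read_lines(after_path)) == []
```

For example, `outputs_identical("before.txt", "after.txt")` returns `True` only when every line matches; otherwise `diff_outputs` shows which lines changed.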