Ideas for Google Summer of Code/Robust tokenisation


Task

  • Update lttoolbox to be fully Unicode-compliant with regard to alphabetic characters.
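To illustrate what Unicode compliance means for tokenisation, here is a minimal Python sketch. It is not lttoolbox code (lttoolbox is written in C++), and the function names are hypothetical. It assumes that a word is a maximal run of characters whose Unicode General Category is a letter (L*) or a combining mark (M*), so accented Latin, Cyrillic, and decomposed forms such as "s" followed by U+030C COMBINING CARON all stay inside one token. Digits and punctuation become separate one-character tokens, and whitespace is dropped.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Hypothetical rule: letters (L*) and combining marks (M*) form words.

    Combining marks are included so that decomposed text such as
    's' + U+030C COMBINING CARON is not split in the middle.
    """
    return unicodedata.category(ch)[0] in ("L", "M")


def tokenise(text: str) -> list[str]:
    """Split text into runs of word characters plus single other characters.

    Whitespace separates tokens but is not itself emitted.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)  # extend the current word
            continue
        if current:
            tokens.append("".join(current))  # close the word in progress
            current = []
        if not ch.isspace():
            tokens.append(ch)  # punctuation, digits, symbols
    if current:
        tokens.append("".join(current))
    return tokens
```

With this rule, tokenise("This! Is a tešt тест ** % test.") yields ["This", "!", "Is", "a", "tešt", "тест", "*", "*", "%", "test", "."]. An ASCII-only test such as checking for a-z/A-Z would instead break "tešt" and "тест" apart, which is the kind of behaviour the task aims to remove.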

Coding challenge

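Write a program that uses data from the Unicode Character Database to classify each character in an input stream as alphabetic or non-alphabetic. In the sample output below, characters are printed one per line, prefixed with C (alphabetic) or X (non-alphabetic).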


echo "This! Is a tešt тест ** % test." | ./classify-symbols
C T
C h
C i
C s
X !
C I
C s
...
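One possible solution, sketched in Python under stated assumptions: it relies on the standard unicodedata module (Python's copy of the Unicode Character Database) and approximates "alphabetic" as General Category L* (letters) or M* (combining marks). This is close to, but not identical with, the Unicode Alphabetic derived property. Skipping whitespace is also an assumption, chosen because the sample output above shows no line for the spaces.

```python
import sys
import unicodedata


def classify(ch: str) -> str:
    """Return 'C' for alphabetic characters and 'X' for everything else.

    Assumption: "alphabetic" means General Category L* (Lu, Ll, Lt, Lm, Lo)
    or M* (Mn, Mc, Me), so letters with combining diacritics count too.
    """
    return "C" if unicodedata.category(ch)[0] in ("L", "M") else "X"


def main() -> None:
    # Read standard input line by line and print one classified character per line.
    for line in sys.stdin:
        for ch in line:
            if ch.isspace():
                continue  # assumption: whitespace is not reported
            print(classify(ch), ch)


if __name__ == "__main__":
    main()
```

Saved as, for example, classify_symbols.py (a hypothetical name), running echo "This! Is a tešt тест ** % test." | python3 classify_symbols.py begins with C T, C h, C i, C s, X !, C I, matching the sample; "š" and the Cyrillic letters are reported as C, while "*" and "%" are reported as X.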


Further reading
