Ideas for Google Summer of Code/Robust tokenisation
Revision as of 01:06, 31 January 2019 by Francis Tyers (talk | contribs)
Task
- Update lttoolbox to be fully Unicode compliant with regard to alphabetic symbols.
Coding challenge
Write a program that uses data from Unicode to classify characters in an input stream into alphabetic and non-alphabetic.
e.g.
echo "This! Is a tešt тест ** % test." | ./classify-symbols
C T
C h
C i
C s
X !
X  
C I
C s
...
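One possible starting point for the challenge is sketched below in Python. It is a minimal sketch, not a reference solution. It assumes that "alphabetic" can be approximated by the Unicode General Category letter classes (Lu, Ll, Lt, Lm, Lo) as exposed by the standard unicodedata module; the full Unicode Alphabetic property is broader (it also covers some marks and letter-like numbers), so a complete solution would need the derived property data. The names classify and classify_lines are illustrative, and the C/X labels are taken from the example output above.

```python
import unicodedata


def classify(ch):
    """Return 'C' if ch is a letter (General Category L*), else 'X'.

    Assumption: General Category L* is used as a stand-in for the
    Unicode Alphabetic property, which is somewhat broader.
    """
    return "C" if unicodedata.category(ch).startswith("L") else "X"


def classify_lines(lines):
    """Yield one 'LABEL CHAR' string per character of each input line.

    `lines` can be any iterable of strings, for instance sys.stdin.
    Trailing newlines are dropped and not classified.
    """
    for line in lines:
        for ch in line.rstrip("\n"):
            yield f"{classify(ch)} {ch}"


if __name__ == "__main__":
    # Sample input taken from the example on this page.
    sample = ["This! Is a tešt тест ** % test.\n"]
    for out in classify_lines(sample):
        print(out)
```

To use this as a filter on an input stream, pass sys.stdin to classify_lines instead of the sample list.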