Apertium Turkic/TODO
This is a general to-do list for the Apertium Turkic working group.
Website
This section outlines what's left to get http://turkic.apertium.com/ up and running.
software infrastructure
- get apertium-apy working stably
- merge simple-html and html-tools so that simple-html can be automatically extracted from html-tools
- apache forwarding for html-tools (unnecessary!)
- init scripts and cron testers for apertium-html-tools, gateway, and apertium-apy
  - find some way to have these services retry starting if they fail because the port is still reserved by the OS
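A minimal sketch of the retry idea for the init scripts, written in Python (the language apertium-apy itself uses). It assumes the failure shows up as an `EADDRINUSE` error when binding the port; the function names, the retry count and the delay are hypothetical and not part of any existing Apertium script.

```python
import errno
import socket
import time


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if (host, port) can be bound right now.

    The probe deliberately does NOT set SO_REUSEADDR, so a port the OS is
    still holding (e.g. in TIME_WAIT) counts as "not free".
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # some other problem (e.g. permission denied): don't hide it
    finally:
        probe.close()
    return True


def wait_for_free_port(port: int, host: str = "127.0.0.1",
                       retries: int = 10, delay: float = 3.0) -> bool:
    """Poll until the port can be bound; give up after `retries` attempts."""
    for attempt in range(retries):
        if port_is_free(port, host):
            return True
        if attempt < retries - 1:  # no point sleeping after the last try
            time.sleep(delay)
    return False
```

A wrapper script could call `wait_for_free_port()` with whatever port apertium-apy is configured to listen on, start the server only once it returns True, and otherwise exit non-zero so that cron or the init system tries again later. Setting `SO_REUSEADDR` on the server's own listening socket is the usual way to avoid waiting on ports in TIME_WAIT at all.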
optional: spell checking and language detection
- spell checking mode in apertium-apy
- integrate spell checker interface into html-tools
- get language detection interface working (in progress)
- language detection mode in apertium-apy (prototype done)
what to include
make the following language pairs and transducers available on the site:
- pairs: kaz-tat, tur-kir, kaz-kir, tat-bak, kaz-kaa, tuk-tur?, tur-uzb?, kaz-eng?
- transducers: kaz, tat, kir, tur, bak, chv, kum, nog, kaa, uzb?, tuk?
prettifying
- localised language names in analysis, generation, and spell-check modes
- get a working theme together
- disable sandbox mode unless an appropriate switch is passed to apertium-html-tools
- add a note (localised to various languages) along the lines of "Found a mistake? Help us fix it!" with link to Apertium Turkic
future
- consider including the web concordancer on the site (and consider what corpora to provide search access to...)
Things that need to be figured out
- How can we count lexc stems effectively?
  - JNW's bash script could be generalised (and rewritten in Python), which would get close
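As a possible starting point for a generalised stem counter, here is a rough Python sketch. It is not JNW's script; it assumes entries sit one per line in the form `upper:lower CONTCLASS ;` with optional `!` comments, and that the caller passes in the names of the lexicons that hold stems (the `Nouns`/`Verbs` names in the usage below are hypothetical). It does not handle entries split across several lines.

```python
import re
from collections import Counter

LEXICON_RE = re.compile(r"^\s*LEXICON\s+(\S+)")


def strip_comment(line: str) -> str:
    """Drop everything from the first unescaped '!' (the lexc comment marker).

    '%' escapes the next character, so '%!' is a literal exclamation mark.
    """
    escaped = False
    for i, ch in enumerate(line):
        if escaped:
            escaped = False
        elif ch == "%":
            escaped = True
        elif ch == "!":
            return line[:i]
    return line


def count_stems(lexc_text: str, stem_lexicons: set[str]) -> Counter:
    """Count distinct entries per stem lexicon in a lexc source string."""
    counts = Counter()
    seen = set()
    current = None  # None until the first LEXICON (e.g. Multichar_Symbols)
    for raw in lexc_text.splitlines():
        line = strip_comment(raw).strip()
        if not line:
            continue
        m = LEXICON_RE.match(line)
        if m:
            current = m.group(1)
            continue
        if current in stem_lexicons and line.endswith(";"):
            entry = " ".join(line[:-1].split())  # normalise whitespace
            if entry and (current, entry) not in seen:
                seen.add((current, entry))  # identical lines count once
                counts[current] += 1
    return counts
```

Usage would look like `count_stems(open("apertium-kaz.kaz.lexc").read(), {"Nouns", "Verbs"})`, with the lexicon names adjusted to whatever the real file uses. Deduplicating identical lines avoids double-counting copy-pasted entries, but the same stem listed in two different lexicons is still counted twice.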
Issues introduced by new build process
- How can we do single-category testvoc now?
- How can we make vanilla transducers (without MT-specific "wrong" POSes)?
- How can we count trimmed stems?
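One rough way to estimate the trimmed stem count is to count only those stems whose lemma also appears in the bilingual dictionary. The Python sketch below does that with `xml.etree.ElementTree`; it assumes a standard `.dix` layout (`<l>`, `<r>` or `<i>` elements inside `<section>`) and lexc entries without escaped spaces. Trimming with `lt-trim` matches tags as well as lemmas, so treat the result as an approximation; all function names here are hypothetical.

```python
import xml.etree.ElementTree as ET


def bidix_lemmas(dix_xml: str, side: str = "l") -> set[str]:
    """Collect lemmas from one side ('l' or 'r') of an Apertium bidix.

    Identity entries (<i>) count for both sides. Tags (<s n="..."/>) are
    ignored and <b/> (blank) is read as a space.
    """
    root = ET.fromstring(dix_xml)
    lemmas = set()
    for section in root.iter("section"):  # skip any <pardefs>
        for elem in section.iter():
            if elem.tag not in (side, "i"):
                continue
            parts = [elem.text or ""]
            for child in elem:
                if child.tag == "b":
                    parts.append(" ")
                parts.append(child.tail or "")
            lemma = "".join(parts).strip()
            if lemma:
                lemmas.add(lemma)
    return lemmas


def lemma_of(entry: str) -> str:
    """Upper (lemma) side of a lexc entry like 'upper:lower CONT'."""
    form = entry.split()[0]
    return form.split(":", 1)[0]


def count_trimmed(stem_entries: list[str], dix_xml: str, side: str = "l") -> int:
    """Count stems whose lemma appears on the given side of the bidix."""
    keep = bidix_lemmas(dix_xml, side)
    return sum(1 for entry in stem_entries if lemma_of(entry) in keep)
```

The `stem_entries` list could come from whatever stem counter is used for the untrimmed count, so that both numbers are computed over the same set of entries.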