User:Firespeaker/Wishlist

From Apertium

The following are projects that I would love to see working as part of Apertium.

Current smallish things

Maybe for GCI 2013?

  • libvoikko support for MS Word (might not be possible)
  • online spell-check interface using apertium API
    • interface
    • code to check (via API or apy)
    • apertium-apy code that does spell-checking (with suggestions, etc.)
  • get apertium-apy working correctly for Turkic pairs
    kaz-tat works now
  • features for apertium-apy
    • better error handling for apertium-apy
    • write init script for apertium-apy
      • with crash-detection?
        one for systemd (Fedora, Arch Linux, SUSE) which makes it restart whenever it exits (unless it was stopped by systemd itself)
        one for Upstart (Ubuntu), also restarts on exit
        or you can put it in inittab
        Note: if you want to detect a "hang", this would be better done by e.g. a cronjob that actually tries to translate something, and if it doesn't respond nicely within a time limit (e.g. curl times out and exits with non-zero status, or the translation is wrong/empty), then runs "initctl restart apertium-apy" (Upstart) or "systemctl restart apertium-apy" (systemd)
      • allow running of multiple instances on different ports
        this already works
    • set up client to cycle through a list of servers and ports if the default is not responding
      • or would it be better to have a central gateway that does this?
    • a [backwards-compatible? or make a runtime switch] way to do spell checking
    • a [backwards-compatible? or make a runtime switch] way to do morphological analyses
  • get bible aligner working (or rewrite it)
  • migrate apertium-quality away from distribute to a newer setuptools so it installs correctly in more recent versions of Python (known incompatible: python3.3 on OS X, known compatible: MacPorts python3.2)
  • combine the following features above into one pretty Apertium Turkic website:
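
The hang-detection note and the "cycle through a list of servers" idea above could be sketched roughly like this. This is only a sketch under assumptions: the /translate endpoint with langpair and q parameters, the responseData/translatedText JSON shape, and the service name apertium-apy are assumed here, and is_healthy, first_healthy, restart_command and watchdog are hypothetical helper names, not part of apertium-apy.

```python
import json
import subprocess
import urllib.parse
import urllib.request


def fetch_translation(base_url, langpair, text, timeout=10):
    """Ask one server for a translation; raises on network errors or timeouts.

    Assumes an APY-style /translate endpoint and JSON response shape.
    """
    query = urllib.parse.urlencode({"langpair": langpair, "q": text})
    url = f"{base_url}/translate?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["responseData"]["translatedText"]


def is_healthy(base_url, langpair, text, expected=None,
               fetch=fetch_translation, timeout=10):
    """True only if the server answers in time with a usable translation."""
    try:
        result = fetch(base_url, langpair, text, timeout)
    except Exception:
        return False  # timeout, refused connection, malformed JSON, ...
    if not isinstance(result, str) or not result.strip():
        return False  # empty translation counts as a failure
    return expected is None or result == expected


def first_healthy(servers, probe):
    """Return the first server for which probe(server) is true, else None."""
    for base_url in servers:
        if probe(base_url):
            return base_url
    return None


def restart_command(init_system):
    """The restart commands quoted in the note above (service name assumed)."""
    commands = {
        "systemd": ["systemctl", "restart", "apertium-apy"],
        "upstart": ["initctl", "restart", "apertium-apy"],
    }
    return commands[init_system]


def watchdog(base_url, langpair, text, init_system,
             probe=is_healthy, run=subprocess.run):
    """Restart the service if the probe fails; True if a restart was issued."""
    if probe(base_url, langpair, text):
        return False
    run(restart_command(init_system), check=False)
    return True
```

A cronjob would call something like watchdog("http://localhost:2737", "kaz|tat", some_known_input, "systemd") every few minutes (the port and probe text are placeholders), while a client could use first_healthy over its server list. Whether the failover belongs in the client or in a central gateway is still the open question from the list.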

integration of twol debugger into apertium-quality

    • I'd also really like to see someone adopt apertium-quality and maintain it

module for templatic syntax

    • also, some sort of interlanguage for e.g. compound verbs
      • integrated into bidix format?

an extra module to convert numbers and alphabetisms and acronyms to words (based on something in lexc) so that phonology can happen and then back to its original form

    • you'd need some notation like 10:%{он%} so the form is extracted and popped back in
    • this is likely to only work on stems that don't change, at least with a non-complicated implementation (but that's fine for most Turkic languages)
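
To make the extract-and-pop-back idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the SPOKEN table stands in for lexc entries like 10:%{он%}, the {G} and {A} archiphonemes and the realise function are a much-simplified stand-in for the real twol phonology (Kazakh-like vowel harmony and voicing only), and the hyphen joiner is just one possible output convention.

```python
# Hypothetical mini-lexicon: written form -> pronounceable stem.
SPOKEN = {"10": "он", "5": "бес"}

FRONT = set("әеіөү")          # front vowels (toy inventory)
BACK = set("аоұы")            # back vowels (toy inventory)
VOICELESS = set("пткқсшфхцчщһ")


def is_front(stem):
    """Frontness is decided by the last harmonising vowel in the stem."""
    for ch in reversed(stem):
        if ch in FRONT:
            return True
        if ch in BACK:
            return False
    return False  # no vowel found: default to back


def realise(stem, suffix):
    """Resolve {G} and {A} in the suffix against the stem (toy phonology)."""
    front = is_front(stem)
    voiceless = stem[-1] in VOICELESS
    if voiceless:
        g = "к" if front else "қ"
    else:
        g = "г" if front else "ғ"
    a = "е" if front else "а"
    return stem + suffix.replace("{G}", g).replace("{A}", a)


def inflect_numeral(written, suffix, joiner="-"):
    """Swap in the spoken stem, run phonology, then pop the original back in."""
    spoken = SPOKEN[written]              # extract the pronounceable form
    surface = realise(spoken, suffix)     # phonology sees a normal stem
    if not surface.startswith(spoken):
        # Only stems that phonology leaves unchanged can be handled this way.
        raise ValueError("stem changed; simple pop-back does not apply")
    return written + joiner + surface[len(spoken):]


print(inflect_numeral("10", "{G}{A}"))  # 10-ға
print(inflect_numeral("5", "{G}{A}"))   # 5-ке
```

The startswith check is where the "stems that don't change" restriction shows up: as long as the phonology only touches the suffix, slicing off the spoken stem and reattaching the written form is enough.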

Bugs

HFST :0 context bug

  1. %{A%}:ә <=> [ :FrontVow :Cns* :Cns ]/[ %>: | % | :0 ] _ ;
    except
    [ %{ъ%}: ]/[ %>: ] _ ;
  2. %{A%}:ә <=> [ :FrontVow :Cns* :Cns ]/[ %>: | % | [ :0 - %{ъ%}: ] ] _ ;
  • (2) avoids applying after %{ъ%}:0 %>:0 and (1) does not, but these should really work the same way

HFST tokenisation bug