Ideas for Google Summer of Code/Extend lttoolbox to have the power of HFST

Some language pairs in Apertium use HFST, whereas most use Apertium's own lttoolbox. This is because writing morphologies for languages with features such as the vowel harmony found in Turkic languages is very hard in the format currently supported by lttoolbox. Having to work with a mixture of HFST and lttoolbox makes some language pairs harder for people to develop.
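To illustrate why vowel harmony is awkward to write as plain surface-form paradigms, here is a minimal Python sketch of the Kazakh plural suffix, whose shape depends on both the last vowel and the last letter of the stem. It is a deliberately simplified illustration, not a complete description of Kazakh phonology, and the function and set names are made up for this example. A format that only lists concrete suffixes needs a separate paradigm for each of the six variants, for every suffix in the language, whereas a rule-based description can state the suffix once.

```python
# Simplified illustration only: not a full model of Kazakh phonology.
# All names below are hypothetical and exist only for this sketch.
BACK_VOWELS = set("аоұы")
FRONT_VOWELS = set("әөүіе")
TAKES_D = set("лмнңжз")            # stems ending in these take д-
TAKES_T = set("пфкқтсшщхцчһбвгд")  # stems ending in these take т-


def plural_suffix(stem: str) -> str:
    """Pick one of -лар/-лер/-дар/-дер/-тар/-тер for the given stem."""
    # Vowel harmony: the last vowel of the stem decides back (а) or front (е).
    harmony_vowel = "а"  # fallback if no harmonising vowel is found
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            harmony_vowel = "а"
            break
        if ch in FRONT_VOWELS:
            harmony_vowel = "е"
            break

    # Consonant alternation: the last letter of the stem decides the onset.
    last = stem[-1]
    if last in TAKES_D:
        onset = "д"
    elif last in TAKES_T:
        onset = "т"
    else:  # vowels and the remaining sonorants
        onset = "л"
    return onset + harmony_vowel + "р"


def pluralize(stem: str) -> str:
    return stem + plural_suffix(stem)


if __name__ == "__main__":
    for stem in ["бала", "үй", "қыз", "көз", "кітап", "мектеп"]:
        print(stem, "->", pluralize(stem))
```

One abstract suffix yields six surface forms here; with case, possessive and other suffixes stacked on top, the number of concrete paradigms grows quickly.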

Tasks

  • Extend lttoolbox (perhaps by writing a preprocessor for it) so that it can perform the morphological transformations currently done with HFST.
  • Write a converter that translates the current HFST format to the new lttoolbox format.
  • Proof of concept:
    • Design a new format that can express all of the features found in the Kazakh transducer;
    • Implement this format in Apertium;
    • Implement the Kazakh transducer in this format and integrate it into the English--Kazakh pair.
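One possible shape for the preprocessor mentioned in the tasks above is sketched below in Python: it takes a suffix written once with abstract placeholders and expands it into concrete lttoolbox-style paradigm definitions, one per stem class. Everything specific here is an assumption made for illustration: the `{L}` and `{A}` placeholder notation, the stem-class names, and the `name__harmony_final` paradigm naming are hypothetical, and a real design would have to cover far more than this one suffix.

```python
from itertools import product

# Hypothetical placeholder inventory (an assumption for this sketch):
# {A} is resolved by vowel harmony, {L} by the final letter of the stem.
HARMONY = {"back": "а", "front": "е"}
FINAL = {"plain": "л", "voiced": "д", "voiceless": "т"}


def realise(template: str, harmony: str, final: str) -> str:
    """Replace the placeholders with the surface letters for one stem class."""
    return template.replace("{A}", HARMONY[harmony]).replace("{L}", FINAL[final])


def expand_to_pardefs(name: str, template: str, tags: list[str]) -> list[str]:
    """Expand one abstract suffix into one <pardef> per stem class."""
    tag_xml = "".join(f'<s n="{tag}"/>' for tag in tags)
    pardefs = []
    for harmony, final in product(HARMONY, FINAL):
        surface = realise(template, harmony, final)
        pardefs.append(
            f'<pardef n="{name}__{harmony}_{final}">\n'
            f"  <e><p><l>{surface}</l><r>{tag_xml}</r></p></e>\n"
            f"</pardef>"
        )
    return pardefs


if __name__ == "__main__":
    # One abstract plural suffix becomes six concrete paradigms.
    for pardef in expand_to_pardefs("N", "{L}{A}р", ["pl"]):
        print(pardef)
```

Expanding at preprocessing time would keep the lttoolbox compiler itself unchanged, at the cost of larger generated dictionaries; extending lttoolbox directly is the alternative the first task leaves open.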

Coding challenge

Frequently asked questions

  • none yet, ask us something! :)

See also