Ideas for Google Summer of Code/Closer integration with HFST

From Apertium

This is a set of subtasks intended to make the Helsinki Finite-State Toolkit (HFST) easier for Apertium developers to use. HFST is an excellent toolkit for working with morphological transducers, but it is quite difficult to install, and it is not well integrated with Apertium: it does not really follow the Apertium way of doing things. We'd like to integrate the two more closely.

Tasks

  • Create a new XML-based alternative to the lexc format, inspired by lttoolbox (see Development ideas for dictionary format).
  • Add a compiler for this format, with support for direction restrictions.
  • Fix this bug in hfst-proc tokenisation.
  • Modify the HFST build process to make a "minimal" Apertium-centred install.
  • Add lttoolbox as a backend to HFST.
  • Make hfst-expand obey flag diacritics.
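To illustrate what "obeying flag diacritics" means when listing the strings of a transducer, here is a minimal standalone sketch in Python. It is not HFST code and does not use the HFST API: the transducer is a hypothetical dictionary of arcs, and `flag_allows` and `expand` are illustrative names. It implements only the positive-set (`P`), require (`R`), disallow (`D`) and clear (`C`) operators; the negative-set (`N`) and unification (`U`) operators are deliberately left out. Paths whose flag diacritics conflict are pruned during a depth-first search, and the flags themselves are not printed.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# A flag diacritic looks like @P.FEATURE.VALUE@ or @D.FEATURE@ (value optional).
# Only the P, R, D and C operators are handled in this sketch (N and U omitted).
FLAG_RE = re.compile(r"^@([PRDC])\.([^.@]+)(?:\.([^.@]+))?@$")

Arc = Tuple[str, str, int]  # (input symbol, output symbol, target state); hypothetical format


def flag_allows(symbol: str, flags: dict) -> Optional[dict]:
    """Apply one flag diacritic; return the new flag state, or None if the path is blocked."""
    m = FLAG_RE.match(symbol)
    if m is None:
        return flags  # ordinary symbol: flag state unchanged
    op, feature, value = m.groups()
    current = flags.get(feature)  # None means the feature is unset
    if op == "P":  # positive set: always succeeds
        return {**flags, feature: value}
    if op == "C":  # clear: unset the feature
        return {k: v for k, v in flags.items() if k != feature}
    if op == "R":  # require: feature must be set (to `value`, if one is given)
        ok = current is not None if value is None else current == value
        return flags if ok else None
    # op == "D"; disallow: feature must not be set (to `value`, if one is given)
    blocked = current is not None if value is None else current == value
    return None if blocked else flags


def expand(arcs: Dict[int, List[Arc]], finals: Set[int], start: int = 0,
           obey_flags: bool = True) -> Set[Tuple[str, str]]:
    """Enumerate the input:output pairs of an acyclic transducer by depth-first search."""
    results: Set[Tuple[str, str]] = set()

    def walk(state: int, inp: str, out: str, flags: dict) -> None:
        if state in finals:
            results.add((inp, out))
        for sym_in, sym_out, target in arcs.get(state, []):
            if FLAG_RE.match(sym_in):
                new_flags = flag_allows(sym_in, flags) if obey_flags else flags
                if new_flags is None:
                    continue  # the flag diacritic rules out this path
                walk(target, inp, out, new_flags)  # flags are not copied to the output
            else:
                walk(target, inp + sym_in, out + sym_out, flags)

    walk(start, "", "", {})
    return results


# Hypothetical example: the prefix "un" sets PFX, and the noun path disallows any PFX value.
EXAMPLE_ARCS: Dict[int, List[Arc]] = {
    0: [("@P.PFX.UN@", "@P.PFX.UN@", 1), ("", "", 2)],
    1: [("un", "un+", 2)],
    2: [("do", "do<v>", 3), ("cat", "cat<n>", 4)],
    4: [("@D.PFX@", "@D.PFX@", 5)],
}
EXAMPLE_FINALS = {3, 5}

if __name__ == "__main__":
    for inp, out in sorted(expand(EXAMPLE_ARCS, EXAMPLE_FINALS)):
        print(f"{inp}:{out}")
```

With flags obeyed, the sketch prints `cat:cat<n>`, `do:do<v>` and `undo:un+do<v>`; calling `expand(..., obey_flags=False)` additionally yields the invalid `uncat:un+cat<n>`, which is the kind of overgeneration that ignoring flag diacritics causes. A real implementation would also need to handle cyclic transducers and the `N` and `U` operators.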

Coding challenge

  • Install Apertium and HFST
  • Install a language pair which uses both Apertium and HFST.

Frequently asked questions

Previous GSoC projects