Ideas for Google Summer of Code/Cyclical paths in .dix format

From Apertium

At the moment it is not possible to define cyclical paths in lttoolbox's XML-based transducer format (.dix). The idea of this project is to implement support for cyclical paths and then carry out a thorough performance characterisation, comparing lttoolbox's compilation and processing speed with that of other tools such as HFST and XFST.

The lttoolbox binary format already supports cyclical paths, but the lexicon format does not; the objective of this project is to add that support to the lexicon format. In addition, we would like a detailed performance characterisation of lttoolbox compared with HFST and XFST. An outstanding candidate would also suggest or contribute patches that improve performance where possible.
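To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical dictionary-of-arcs representation (this is not lttoolbox's own data structure or API, and the tags `<n>` and `<num>` are purely illustrative). The acyclic `cat` transducer accepts exactly one string, like a single entry in the current lexicon format. The cyclic `num` transducer has an arc from state 1 back to itself, so it accepts digit strings of any length, which no finite list of entries can enumerate.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory transducer: state -> list of (input, output, target).
# This is NOT lttoolbox's internal data structure; it only illustrates what a cycle is.
Transitions = Dict[int, List[Tuple[str, str, int]]]


def transduce(trans: Transitions, finals: Dict[int, str], start: int, word: str) -> Optional[str]:
    """Follow the first matching arc for each input symbol (assumes a deterministic
    transducer); on reaching a final state, append that state's output suffix."""
    state, out = start, []
    for ch in word:
        for inp, outp, nxt in trans.get(state, []):
            if inp == ch:
                out.append(outp)
                state = nxt
                break
        else:
            return None  # no arc for this symbol: the word is rejected
    return "".join(out) + finals[state] if state in finals else None


def has_cycle(trans: Transitions) -> bool:
    """Depth-first search that reports a back edge to a state still on the stack."""
    on_stack, done = set(), set()

    def visit(s: int) -> bool:
        on_stack.add(s)
        for _, _, t in trans.get(s, []):
            if t in on_stack or (t not in done and visit(t)):
                return True
        on_stack.discard(s)
        done.add(s)
        return False

    return any(s not in done and visit(s) for s in list(trans))


DIGITS = "0123456789"
# Acyclic: one fixed path per word.
cat = {0: [("c", "c", 1)], 1: [("a", "a", 2)], 2: [("t", "t", 3)]}
cat_finals = {3: "<n>"}
# Cyclic: state 1 loops back to itself, so digit strings of any length are accepted.
num = {0: [(d, d, 1) for d in DIGITS], 1: [(d, d, 1) for d in DIGITS]}
num_finals = {1: "<num>"}

print(transduce(cat, cat_finals, 0, "cat"))   # cat<n>
print(transduce(num, num_finals, 0, "2024"))  # 2024<num>
print(has_cycle(cat), has_cycle(num))         # False True
```

The point of the example is the self-loop on state 1: a compiled binary transducer can contain such an arc, but there is currently no way to write it in the lexicon format.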

Difficulties / caveats

  • There can be at most one <g/> element in any analysis, so no # (the multiword group marker) may appear inside a cycle; otherwise traversing the cycle more than once would produce more than one.
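This restriction can be checked mechanically once the lexicon is viewed as a graph. The sketch below again assumes a hypothetical dictionary-of-arcs representation, and further assumes that the group marker appears as the symbol `#` on either side of an arc label; neither assumption reflects lttoolbox's actual internals. It reports every `#` arc that lies on a cycle, using the fact that an arc from s to t is on a cycle exactly when s is reachable again from t.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical graph form of a lexicon: state -> list of (input, output, target).
# "#" is assumed to stand for the multiword group marker; this is not lttoolbox's API.
Transitions = Dict[int, List[Tuple[str, str, int]]]


def reachable(trans: Transitions, src: int, dst: int) -> bool:
    """Breadth-first search: can dst be reached from src (src itself counts)?"""
    seen, queue = {src}, deque([src])
    while queue:
        s = queue.popleft()
        if s == dst:
            return True
        for _, _, t in trans.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False


def group_marks_on_cycles(trans: Transitions, mark: str = "#") -> List[Tuple[int, int]]:
    """Return every arc (s, t) carrying `mark` on either side that lies on a cycle.

    An arc s -> t is on a cycle exactly when s is reachable again from t.
    """
    bad = []
    for s, arcs in trans.items():
        for inp, outp, t in arcs:
            if mark in (inp, outp) and reachable(trans, t, s):
                bad.append((s, t))
    return bad


# "#" sits on an acyclic stretch: allowed.
ok = {0: [("a", "a", 1)], 1: [("#", "#", 2)], 2: [("b", "b", 3)]}
# State 2 loops back to state 1, so the "#" arc could be traversed repeatedly.
bad = {0: [("a", "a", 1)], 1: [("#", "#", 2)], 2: [("b", "b", 1)]}

print(group_marks_on_cycles(ok))   # []
print(group_marks_on_cycles(bad))  # [(1, 2)]
```

Running one search per marked arc is quadratic in the worst case; a compiler handling large lexicons would more likely compute strongly connected components once and test whether both ends of each `#` arc fall in the same component.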

See also