Ideas for Google Summer of Code/Closer integration with HFST

From Apertium
Latest revision as of 23:58, 5 April 2013

This is a set of subtasks to make it easier for Apertium developers to use the Helsinki Finite-State Toolkit (HFST). HFST is a great toolkit for working with morphological transducers, but it is difficult to install and not well integrated with Apertium: it does not really follow the Apertium way of doing things. We'd like to integrate it more closely.

Tasks

  • Create a new XML-based format for lexc inspired by lttoolbox (see Development ideas for dictionary format)
  • Add a compiler for this format, with support for direction restrictions.
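  • Fix this bug (https://sourceforge.net/p/hfst/bugs/153/) in hfst-proc tokenisation. The tracker marks it as fixed, so this needs checking; the open question is whether the desired output is ^al/*al$ ^žaktare/*žaktare$ rather than ^al žaktare/*al žaktare$.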
  • Modify the HFST build process to make a "minimal" Apertium-centred install.
  • Add lttoolbox as a backend to HFST.
  • Make hfst-expand obey flag diacritics.
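
To make the "direction restrictions" task more concrete, here is a minimal Python sketch of how a compiler front end might filter entries by direction. It assumes the new format reuses lttoolbox-style markup (<e>, <p>, <l>, <r>, <s n="..."/>) and the lttoolbox r="LR" / r="RL" attribute; the proposed lexc XML format is not specified on this page, so the sample data and the function names side_to_text and entries_for_direction are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in lttoolbox-style markup. r="LR" keeps an entry only
# when compiling left-to-right, r="RL" only when compiling right-to-left.
SAMPLE = """<section id="main" type="standard">
  <e><p><l>cat</l><r>cat<s n="n"/></r></p></e>
  <e r="LR"><p><l>kat</l><r>cat<s n="n"/></r></p></e>
  <e r="RL"><p><l>catte</l><r>cat<s n="n"/><s n="dim"/></r></p></e>
</section>"""


def side_to_text(side):
    """Flatten an <l> or <r> element, rendering <s n="x"/> as <x>."""
    parts = [side.text or ""]
    for child in side:
        if child.tag == "s":
            parts.append("<" + child.get("n", "") + ">")
        parts.append(child.tail or "")
    return "".join(parts)


def entries_for_direction(xml_text, direction):
    """Return (left, right) string pairs that survive the given direction."""
    if direction not in ("LR", "RL"):
        raise ValueError("direction must be 'LR' or 'RL'")
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("e"):
        restriction = entry.get("r")
        # Unrestricted entries are kept in both directions.
        if restriction is not None and restriction != direction:
            continue
        pair = entry.find("p")
        if pair is None:
            continue
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue
        pairs.append((side_to_text(left), side_to_text(right)))
    return pairs


if __name__ == "__main__":
    for direction in ("LR", "RL"):
        print(direction, entries_for_direction(SAMPLE, direction))
```

A real compiler would go on to build a transducer from the surviving pairs; the sketch only covers the filtering step, which is the part lexc has no equivalent for.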

Coding challenge

  • Install Apertium and HFST.
  • Install a language pair which uses both Apertium and HFST.

Frequently asked questions

  • none yet, ask us something! :)

See also