Difference between revisions of "Ideas for Google Summer of Code/Closer integration with HFST"
Revision as of 10:39, 10 March 2012
This is a set of subtasks to make it easier for Apertium developers to use the Helsinki Finite-State Toolkit (HFST). HFST is a capable toolkit for working with morphological transducers, but it is difficult to install, is not well integrated with Apertium, and does not follow the Apertium way of doing things. We'd like to integrate the two more closely.
Tasks
- Create a new XML-based format for lexc inspired by lttoolbox (see Development ideas for dictionary format)
- Add a compiler for this format, with support for direction restrictions.
- Fix this bug (http://sourceforge.net/tracker/?func=detail&aid=3383731&group_id=224521&atid=1061990) in `hfst-proc` tokenisation.
  - The link says it's fixed; is it? (Or is it that we want `^al/*al$ ^žaktare/*žaktare$` instead of `^al žaktare/*al žaktare$`?)
    - Yes, we want the same behaviour as lttoolbox.
- Modify the HFST build process to make a "minimal" Apertium-centred install.
- Add lttoolbox as a backend to HFST.
- Make `hfst-expand` obey flag diacritics.
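
The tokenisation point above can be illustrated with a small sketch. It is not the `hfst-proc` or lttoolbox implementation: the function names `merged_unknown` and `split_unknown` are hypothetical, and splitting on whitespace is a simplifying assumption (the real tools tokenise using the transducer's alphabet and also handle escaping). It only shows the difference between the two outputs quoted in the task list.

```python
def merged_unknown(surface: str) -> str:
    """Undesired output: the whole unknown span becomes one lexical unit."""
    return f"^{surface}/*{surface}$"


def split_unknown(surface: str) -> str:
    """Desired output: one unknown lexical unit per token.

    Assumption: tokens are separated by whitespace. This is a simplification
    made for illustration only.
    """
    return " ".join(f"^{token}/*{token}$" for token in surface.split())


if __name__ == "__main__":
    text = "al žaktare"
    print(merged_unknown(text))  # ^al žaktare/*al žaktare$
    print(split_unknown(text))   # ^al/*al$ ^žaktare/*žaktare$
```

Here `*` marks an unknown word, and each `^...$` pair delimits one lexical unit in the Apertium stream format.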