Difference between revisions of "User:Firespeaker/HFST bug"
Jump to navigation
Jump to search
Firespeaker (talk | contribs) |
Firespeaker (talk | contribs) |
||
Line 1: | Line 1: | ||
{{TOCD}} |
|||
In 2011, a bug in how HFST handles words containing spaces was [http://sourceforge.net/p/hfst/bugs/59/ documented and resolved] (apparently in [http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518 r1518]?), but it introduced a new bug. This page documents the new behaviour. |
In 2011, a bug in how HFST handles words containing spaces was [http://sourceforge.net/p/hfst/bugs/59/ documented and resolved] (apparently in [http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518 r1518]?), but it introduced a new bug. This page documents the new behaviour. |
||
Revision as of 07:07, 18 January 2013
In 2011, a bug in how HFST handles words containing spaces was documented and resolved (apparently in r1518?), but it introduced a new bug. This page documents the new behaviour.
text.lexc
Make sure to include the space in '%
' under Multichar_Symbols
.
Multichar_Symbols % LEXICON Root erke:erke # ; erke% me:erke% me # ; medvedev:medvedev # ;
Compiling
$ hfst-lexc test.lexc -o test.hfst
$ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol
Testing
Some correctly analysed forms
$ echo "erke" | hfst-proc test.hfst.ol
^erke/erke$
$ echo "erke me" | hfst-proc test.hfst.ol
^erke me/erke me$
$ echo "medvedev" | hfst-proc test.hfst.ol
^medvedev/medvedev$
The incorrectly analysed form
$ echo "erke medvedev" | hfst-proc test.hfst.ol
^erke medvedev/*erke medvedev$
Expected output
This form is analysed correctly by a transducer identical to the one above except with the "erke me" form removed:
$ echo "erke medvedev" | hfst-proc test2.hfst.ol
^erke/erke$ ^medvedev/medvedev$
Another test case
This one is meant to be more familiar to English-speakers :)
Multichar_Symbols % LEXICON Root word:word #; word% form:word% form #; formation:formation #;
$ echo "word formation" | hfst-proc test3.hfst.ol
^word formation/*word formation$
$ echo "formation word" | hfst-proc test3.hfst.ol
^formation/formation$ ^word/word$
Other materials
- spectie explains the bug to firespeaker
- irc.freenode.net#hfst