Difference between revisions of "User:Firespeaker/HFST bug"
Revision as of 15:56, 28 April 2013
In 2011, a bug in how HFST handles words containing spaces was documented and resolved (http://sourceforge.net/p/hfst/bugs/59/; apparently in r1518, http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518), but the fix introduced a new bug. This page documents the new [incorrect!] behaviour. It appears to affect only transducers written in lexc.
A bug report (https://sourceforge.net/p/hfst/bugs/153/) was filed in January 2013 along with a patch for a test case. In March 2013, spectie posted a patch that fixed the bug but introduced an issue with newlines and full stops. As of this revision (28 April 2013), the bug has still not been fixed.
test.lexc
Make sure to include the space in '% ' under Multichar_Symbols.
! The only multichar symbol is an escaped space: '%' followed by one space character.
Multichar_Symbols
% 

LEXICON Root
erke:erke # ;
erke% me:erke% me # ;
medvedev:medvedev # ;
Compiling
$ hfst-lexc test.lexc -o test.hfst
$ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol
Testing
Some correctly analysed forms
$ echo "erke" | hfst-proc test.hfst.ol
^erke/erke$
$ echo "erke me" | hfst-proc test.hfst.ol
^erke me/erke me$
$ echo "medvedev" | hfst-proc test.hfst.ol
^medvedev/medvedev$
The incorrectly analysed form
$ echo "erke medvedev" | hfst-proc test.hfst.ol
^erke medvedev/*erke medvedev$
Expected output
This form is analysed correctly by a transducer identical to the one above except with the "erke me" form removed:
$ echo "erke medvedev" | hfst-proc test2.hfst.ol
^erke/erke$ ^medvedev/medvedev$
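The comparison transducer can be rebuilt along these lines. This is only a sketch: the source filename test2.lexc is an assumption (the page names only test2.hfst.ol), and the compile step reuses the same hfst-lexc, hfst-invert and hfst-fst2fst invocations shown under Compiling. Writing the lexicon with printf also makes the trailing space in '% ' visible.

```shell
#!/bin/sh
# Write the original lexicon; the quoted '% ' keeps the escaped space explicit.
printf '%s\n' \
  'Multichar_Symbols' \
  '% ' \
  '' \
  'LEXICON Root' \
  'erke:erke # ;' \
  'erke% me:erke% me # ;' \
  'medvedev:medvedev # ;' > test.lexc

# Derive the variant without the multiword entry.
# -F matches the pattern as a fixed string, -v keeps the non-matching lines.
# test2.lexc is a hypothetical filename chosen for this sketch.
grep -vF 'erke% me' test.lexc > test2.lexc

# Compile both variants, but only if the HFST tools are installed.
if command -v hfst-lexc >/dev/null 2>&1; then
  for name in test test2; do
    hfst-lexc "$name.lexc" -o "$name.hfst"
    hfst-invert "$name.hfst" | hfst-fst2fst -w -o "$name.hfst.ol"
  done
fi
```

Deriving the second lexicon from the first ensures that the two transducers differ only in the "erke me" entry.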
Another test case
This one is meant to be more familiar to English-speakers :)
! The only multichar symbol is an escaped space: '%' followed by one space character.
Multichar_Symbols
% 

LEXICON Root
word:word #;
word% form:word% form #;
formation:formation #;
$ echo "word formation" | hfst-proc test3.hfst.ol
^word formation/*word formation$
$ echo "formation word" | hfst-proc test3.hfst.ol
^formation/formation$ ^word/word$
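In the outputs above, the failing case is the one where the analysis side begins with an asterisk ("/*"), which is how hfst-proc marks an unknown form. A small helper can flag such lines when running several inputs. This is a rough sketch: is_unknown and check_form are hypothetical names, and the hfst-proc calls are skipped unless the tool and test3.hfst.ol are present.

```shell
#!/bin/sh
# is_unknown: succeed if an analysis line contains the unknown-word marker "/*".
is_unknown() {
  case "$1" in
    *'/*'*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_form (hypothetical helper): analyse one input with hfst-proc
# and label the result. $1 is the input text, $2 the transducer file.
check_form() {
  out=$(echo "$1" | hfst-proc "$2")
  if is_unknown "$out"; then
    echo "UNKNOWN: $out"
  else
    echo "ok: $out"
  fi
}

# Only run the analyses if hfst-proc and the compiled transducer exist.
if command -v hfst-proc >/dev/null 2>&1 && [ -f test3.hfst.ol ]; then
  check_form "word formation" test3.hfst.ol
  check_form "formation word" test3.hfst.ol
fi
```

Matching on "/*" rather than on a bare asterisk avoids flagging input text that happens to contain an asterisk in the surface form.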
Notes
- This doesn't seem to affect transducers written in other formats. E.g., the transducer that results from apertium-eng-kaz.eng.dix outputs the following:
$ echo "right there" | apertium -d . eng-kaz-morph
^right there/right there<adv>$^./.<sent>$
$ echo "right the" | apertium -d . eng-kaz-morph
^right/right<adj>/right<adv>/right<n><sg>$ ^the/the<det><def><sp>$^./.<sent>$
$ echo "right therein" | apertium -d . eng-kaz-morph
^right/right<adj>/right<adv>/right<n><sg>$ ^therein/*therein$^./.<sent>$
Other materials
- spectie explains the bug to firespeaker
- irc.freenode.net#hfst