Difference between revisions of "User:Firespeaker/HFST bug"

Revision as of 07:15, 18 January 2013

test.lexc

Make sure to include the space after the % in '% ' under Multichar_Symbols; '% ' is the escaped space used in the multiword entry "erke% me".

Multichar_Symbols

% 

LEXICON Root

erke:erke # ;
erke% me:erke% me # ;
medvedev:medvedev # ;

Compiling

$ hfst-lexc test.lexc -o test.hfst
$ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol

Testing

Some correctly analysed forms

$ echo "erke" | hfst-proc test.hfst.ol

^erke/erke$

$ echo "erke me" | hfst-proc test.hfst.ol

^erke me/erke me$

$ echo "medvedev" | hfst-proc test.hfst.ol

^medvedev/medvedev$

The incorrectly analysed form

$ echo "erke medvedev" | hfst-proc test.hfst.ol

^erke medvedev/*erke medvedev$

Expected output

This form is analysed correctly by a transducer (test2.hfst.ol) identical to the one above except with the "erke me" entry removed:

$ echo "erke medvedev" | hfst-proc test2.hfst.ol

^erke/erke$ ^medvedev/medvedev$
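To make the expected behaviour concrete, here is a minimal Python model of the tokenisation that this output implies. It does leftmost-longest matching, but a lexicon match only counts if it ends at a word boundary (end of input, or a character that is not a letter or digit), and unmatched material is marked with `*`. This boundary rule is an assumption modelled on how Apertium-style processors behave, not hfst-proc's actual code; the names `analyse`, `at_boundary` and `LEXICON` are hypothetical. Under this model, the entry "erke me" does match the start of "erke medvedev", but that match stops mid-word ("erke me|dvedev") and is rejected, so the tokeniser falls back to "erke".

```python
def at_boundary(text: str, end: int) -> bool:
    """True if a match ending at index `end` stops at a word boundary."""
    return end == len(text) or not text[end].isalnum()


def analyse(text: str, lexicon: dict[str, str]) -> str:
    """Tokenise and analyse `text`, emitting Apertium-style stream units.

    Assumed model, not hfst-proc's implementation: longest match wins,
    but only matches that end at a word boundary are accepted.
    """
    out: list[str] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            out.append(text[pos])  # blanks are copied through unchanged
            pos += 1
            continue
        # Longest lexicon entry that starts here AND ends at a word boundary;
        # a longer match that stops mid-word ("erke me|dvedev") is rejected.
        best = None
        for surface in lexicon:
            end = pos + len(surface)
            if text.startswith(surface, pos) and at_boundary(text, end):
                if best is None or len(surface) > len(best):
                    best = surface
        if best is not None:
            out.append(f"^{best}/{lexicon[best]}$")
            pos += len(best)
        else:
            # No valid match: everything up to the next blank is unknown.
            end = pos
            while end < len(text) and not text[end].isspace():
                end += 1
            word = text[pos:end]
            out.append(f"^{word}/*{word}$")
            pos = end
    return "".join(out)


# Identity entries mirroring test.lexc (surface form -> analysis).
LEXICON = {"erke": "erke", "erke me": "erke me", "medvedev": "medvedev"}

if __name__ == "__main__":
    for line in ["erke", "erke me", "medvedev", "erke medvedev"]:
        print(analyse(line, LEXICON))
```

With these entries, `analyse("erke medvedev", LEXICON)` returns `^erke/erke$ ^medvedev/medvedev$` even though "erke me" is in the lexicon, while `analyse("erke me", LEXICON)` still returns `^erke me/erke me$`. Checking each candidate against every entry keeps the sketch short; a real processor walks the transducer instead and remembers the last accepting state it passed.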

Another test case

This one is meant to be more familiar to English speakers :)

Multichar_Symbols

% 

LEXICON Root

word:word #;
word% form:word% form #;
formation:formation #;

$ echo "word formation" | hfst-proc test3.hfst.ol

^word formation/*word formation$

$ echo "formation word" | hfst-proc test3.hfst.ol

^formation/formation$ ^word/word$
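Both failing inputs have the same shape: the lexicon contains a multiword entry ("erke me", "word form") whose last token is a proper prefix of the word that actually follows in the input ("medvedev", "formation"). Assuming that this shape is what triggers the bug (an inference from these two cases only, not a confirmed characterisation), a small Python helper can generate more inputs of that shape from a list of lexicon entries. The function name `trigger_candidates` is hypothetical.

```python
def trigger_candidates(entries: list[str]) -> list[str]:
    """Build inputs of the shape "<head> <word>", where the lexicon has a
    multiword entry "<head> <last>" and <word> is a different entry that
    merely starts with <last> (e.g. "word form" + "formation")."""
    found = set()
    for multi in entries:
        if " " not in multi:
            continue  # only multiword entries can set up the false prefix
        head, _, last = multi.rpartition(" ")
        for word in entries:
            if word != last and word.startswith(last):
                found.add(f"{head} {word}")
    return sorted(found)


if __name__ == "__main__":
    print(trigger_candidates(["word", "word form", "formation"]))
    print(trigger_candidates(["erke", "erke me", "medvedev"]))
```

For the two lexicons on this page it returns `["word formation"]` and `["erke medvedev"]`. Each candidate can then be checked the same way as above, e.g. `echo "word formation" | hfst-proc test3.hfst.ol`.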

Notes

This doesn't seem to affect transducers written in other formats. E.g., the transducer compiled from apertium-eng-kaz.eng.dix (an lttoolbox dictionary) handles both a multiword and a partial match of it correctly:
- $ echo "right there" | apertium -d . eng-kaz-morph

^right there/right there<adv>$^./.<sent>$

- $ echo "right the" | apertium -d . eng-kaz-morph

^right/right<adj>/right<adv>/right<n><sg>$ ^the/the<det><def><sp>$^./.<sent>$

Other materials

spectie explains the bug to firespeaker
irc.freenode.net#hfst

