Difference between revisions of "User:Firespeaker/HFST bug"

Latest revision as of 06:33, 27 May 2021

test.lexc

Make sure to keep the trailing space in '% ' (an escaped space) under Multichar_Symbols.

Multichar_Symbols

% 

LEXICON Root

erke:erke # ;
erke% me:erke% me # ;
medvedev:medvedev # ;

Compiling

$ hfst-lexc test.lexc -o test.hfst
$ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol
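The trailing space in '% ' is easy to lose when the lexicon is copied from a web page or saved by an editor that strips trailing whitespace. A minimal sketch of one way to avoid that, assuming a POSIX shell: write the file with printf so the space is explicit, then run the two compile commands from above. The guard around the compile step is an addition for convenience, so the snippet does nothing harmful when the HFST tools are not installed.

```shell
# Write test.lexc line by line; the '% ' argument keeps its trailing space.
printf '%s\n' \
  'Multichar_Symbols' \
  '% ' \
  '' \
  'LEXICON Root' \
  'erke:erke # ;' \
  'erke% me:erke% me # ;' \
  'medvedev:medvedev # ;' \
  > test.lexc

# Compile with the same commands as above, only if hfst-lexc is on the PATH.
if command -v hfst-lexc >/dev/null 2>&1; then
  hfst-lexc test.lexc -o test.hfst
  hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol
fi
```

Running `grep -nx '% ' test.lexc` afterwards is a quick way to confirm that the space survived.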

Testing

Some correctly analysed forms

$ echo "erke" | hfst-proc test.hfst.ol

^erke/erke$

$ echo "erke me" | hfst-proc test.hfst.ol

^erke me/erke me$

$ echo "medvedev" | hfst-proc test.hfst.ol

^medvedev/medvedev$

The incorrectly analysed form

$ echo "erke medvedev" | hfst-proc test.hfst.ol

^erke medvedev/*erke medvedev$

Expected output

This form is analysed correctly by a transducer identical to the one above except with the "erke me" form removed:

$ echo "erke medvedev" | hfst-proc test2.hfst.ol

^erke/erke$ ^medvedev/medvedev$
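The page only names the compiled file test2.hfst.ol, so the following is a sketch of how that comparison transducer could be reproduced. The source file name test2.lexc and the intermediate test2.hfst are assumptions; the lexicon is the one from the top of the page with the "erke me" entry left out, and the compile commands mirror those in the Compiling section.

```shell
# test2.lexc (assumed name): same lexicon, minus the "erke% me" entry.
printf '%s\n' \
  'Multichar_Symbols' \
  '% ' \
  '' \
  'LEXICON Root' \
  'erke:erke # ;' \
  'medvedev:medvedev # ;' \
  > test2.lexc

# Compile and query only if the HFST tools are installed.
if command -v hfst-lexc >/dev/null 2>&1; then
  hfst-lexc test2.lexc -o test2.hfst
  hfst-invert test2.hfst | hfst-fst2fst -w -o test2.hfst.ol
  echo "erke medvedev" | hfst-proc test2.hfst.ol
fi
```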

Another test case

This one is meant to be more familiar to English-speakers :)

Multichar_Symbols

% 

LEXICON Root

word:word #;
word% form:word% form #;
formation:formation #;

$ echo "word formation" | hfst-proc test3.hfst.ol

^word formation/*word formation$

$ echo "formation word" | hfst-proc test3.hfst.ol

^formation/formation$ ^word/word$
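In the outputs above, a token that received no analysis is marked with * directly after the slash. A small sketch of how the two word orders could be checked from a script follows; has_unknown is a hypothetical helper written for this page, not part of HFST, and the loop only runs when hfst-proc and test3.hfst.ol are actually present.

```shell
# has_unknown (hypothetical helper): succeeds when a line of hfst-proc
# output contains "/*", i.e. at least one token without an analysis.
has_unknown() {
  case "$1" in
    *'/*'*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Check both word orders if the tool and the compiled transducer exist.
if command -v hfst-proc >/dev/null 2>&1 && [ -f test3.hfst.ol ]; then
  for input in "word formation" "formation word"; do
    output=$(echo "$input" | hfst-proc test3.hfst.ol)
    if has_unknown "$output"; then
      echo "unanalysed: $output"
    else
      echo "ok: $output"
    fi
  done
fi
```

The check cannot tell this bug apart from a genuinely unknown word, so it is only meaningful for inputs whose parts are all known to be in the lexicon.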

Notes

This doesn't seem to affect transducers written in other formats. E.g., the transducer that results from apertium-eng-kaz.eng.dix outputs the following:

$ echo "right there" | apertium -d . eng-kaz-morph

^right there/right there<adv>$^./.<sent>$

$ echo "right the" | apertium -d . eng-kaz-morph

^right/right<adj>/right<adv>/right<n><sg>$ ^the/the<det><def><sp>$^./.<sent>$

$ echo "right therein" | apertium -d . eng-kaz-morph

^right/right<adj>/right<adv>/right<n><sg>$ ^therein/*therein$^./.<sent>$

Other materials

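spectie explains the bug to firespeaker: http://wiki.apertium.org/wiki/Talk:Ideas_for_Google_Summer_of_Code/Closer_integration_with_HFST
IRC: irc.oftc.net#hfst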

@@ Line 1: / Line 1: @@
+{{TOCD}}
-In 2011, a bug in how HFST handles words containing spaces was [http://sourceforge.net/p/hfst/bugs/59/ documented and resolved] (apparently in [http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518 r1518]?), but it introduced a new bug.  This page documents the new behaviour.
+In 2011, a bug in how HFST handles words containing spaces was [http://sourceforge.net/p/hfst/bugs/59/ documented and resolved] (apparently in [http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518 r1518]?), but it introduced a new bug.  This page documents the new [incorrect!] behaviour.  It appears to only affect transducers written in lexc.
+[https://sourceforge.net/p/hfst/bugs/153/ A bug report] was filed in January of 2013 along with a patch for a test case.  In March of 2013, [[User:Francis Tyers|spectie]] posted a patch that fixed the bug but introduced an issue with newlines and full stops.  As of today, the bug has still not been fixed.
 == text.lexc ==
@@ Line 35: / Line 38: @@
 * <code>$ echo "erke medvedev" | hfst-proc test2.hfst.ol</code>
 : <code>^erke/erke$ ^medvedev/medvedev$</code>
+== Another test case ==
+This one is meant to be more familiar to English-speakers :)
+<pre>
+Multichar_Symbols
+%
+LEXICON Root
+word:word #;
+word% form:word% form #;
+formation:formation #;
+</pre>
+* <code>$ echo "word formation" | hfst-proc test3.hfst.ol</code>
+: <code>^word formation/<span style="color:red;">*</span>word formation$</code>
+* <code>$ echo "formation word" | hfst-proc test3.hfst.ol</code>
+: <code>^formation/formation$ ^word/word$</code>
+== Notes ==
+<ul>
+<li>This doesn't seem to affect transducers written in other formats.  E.g., the transducer that results from <code>apertium-eng-kaz.eng.dix</code> outputs the following:</li>
+<ul>
+  <li><code>$ echo "right there" | apertium -d . eng-kaz-morph</code></li>
+: <code>^right there/right there<adv>$^./.<sent>$</code>
+  <li><code>$ echo "right the" | apertium -d . eng-kaz-morph</code></li>
+: <code>^right/right<adj>/right<adv>/right<n><sg>$ ^the/the<det><def><sp>$^./.<sent>$</code>
+  <li><code>$ echo "right therein" | apertium -d . eng-kaz-morph</code></li>
+: <code>^right/right<adj>/right<adv>/right<n><sg>$ ^therein/*therein$^./.<sent>$</code>
+</ul>
+</ul>
 == Other materials ==
 * [http://wiki.apertium.org/wiki/Talk:Ideas_for_Google_Summer_of_Code/Closer_integration_with_HFST spectie explains the bug to firespeaker]
+* irc.oftc.net#hfst
