User:Firespeaker/HFST bug

From Apertium

Revision as of 07:35, 17 January 2013

In 2011, a bug in how HFST handles words containing spaces was documented and fixed, but the fix introduced a new bug. This page documents the new behaviour.

test.lexc

Multichar_Symbols

% 

LEXICON Root

erke:erke # ;
erke% me:erke% me # ;
medvedev:medvedev # ;

Compiling

  1. $ hfst-lexc test.lexc -o test.hfst
  2. $ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol

Testing

Some correctly analysed forms

  • $ echo "erke" | hfst-proc test.hfst.ol
    ^erke/erke$
  • $ echo "erke me" | hfst-proc test.hfst.ol
    ^erke me/erke me$
  • $ echo "medvedev" | hfst-proc test.hfst.ol
    ^medvedev/medvedev$

The incorrectly analysed form

  • $ echo "erke medvedev" | hfst-proc test.hfst.ol
    ^erke medvedev/*erke medvedev$
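Output in this ^surface/analysis$ format can be checked mechanically. Below is a minimal Python sketch. It assumes that lexical units contain no escaped ^, / or $ characters. The names parse_stream and LexicalUnit are hypothetical and are not part of HFST or Apertium. An analysis that starts with * marks the surface form as unknown, and that is what happens to "erke medvedev" above.

```python
import re
from typing import List, NamedTuple


class LexicalUnit(NamedTuple):
    surface: str
    analyses: List[str]
    unknown: bool


# Hypothetical helper: matches one ^...$ unit, assuming no escaped ^, / or $ inside.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def parse_stream(line: str) -> List[LexicalUnit]:
    """Split a line of hfst-proc output into its lexical units."""
    units = []
    for body in LU_PATTERN.findall(line):
        surface, *analyses = body.split("/")
        # An analysis prefixed with '*' means the surface form was not recognised.
        unknown = any(a.startswith("*") for a in analyses)
        units.append(LexicalUnit(surface, analyses, unknown))
    return units


for line in ("^erke medvedev/*erke medvedev$", "^erke/erke$ ^medvedev/medvedev$"):
    print([(u.surface, u.unknown) for u in parse_stream(line)])
# [('erke medvedev', True)]
# [('erke', False), ('medvedev', False)]
```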

Expected output

The same input is analysed as expected by a second transducer, test2.hfst.ol, which has the "erke me" form in it:

  • $ echo "erke medvedev" | hfst-proc test2.hfst.ol
    ^erke/erke$ ^medvedev/medvedev$
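The expected output implies a particular tokenisation rule. The reader should prefer the longest lexicon entry that ends at a word boundary, and it should fall back to a shorter entry (here "erke") when the longest prefix match ("erke me") would cut "medvedev" in half. The following minimal Python sketch models that assumed rule. It is a hypothetical illustration, not how hfst-proc is implemented. It scans a plain set of surface forms taken from test.lexc instead of walking the transducer, and the names tokenize, to_stream and LEXICON are made up here.

```python
from typing import List, Set, Tuple

# Hypothetical model of the expected behaviour, not hfst-proc's implementation.
LEXICON = {"erke", "erke me", "medvedev"}  # surface forms from test.lexc


def is_boundary(text: str, i: int) -> bool:
    """True if position i is the end of the text or sits on a space."""
    return i == len(text) or text[i] == " "


def tokenize(text: str, lexicon: Set[str]) -> List[Tuple[str, bool]]:
    """Leftmost-longest tokenisation that only accepts matches ending at a
    word boundary. Returns (token, known) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == " ":
            i += 1  # skip blanks between tokens
            continue
        match_end = None
        # Keep the longest entry that starts at i and ends on a boundary;
        # "erke me" is rejected in "erke medvedev" because 'd' follows it.
        for entry in lexicon:
            end = i + len(entry)
            if text.startswith(entry, i) and is_boundary(text, end):
                if match_end is None or end > match_end:
                    match_end = end
        if match_end is None:
            # Unknown word: consume up to the next space.
            space = text.find(" ", i)
            match_end = len(text) if space == -1 else space
            tokens.append((text[i:match_end], False))
        else:
            tokens.append((text[i:match_end], True))
        i = match_end
    return tokens


def to_stream(tokens: List[Tuple[str, bool]]) -> str:
    """Render tokens as ^surface/analysis$ units, marking unknowns with '*'.
    The analysis equals the surface form because test.lexc maps each form to itself."""
    return " ".join(f"^{t}/{t}$" if known else f"^{t}/*{t}$" for t, known in tokens)


for text in ("erke", "erke me", "medvedev", "erke medvedev"):
    print(to_stream(tokenize(text, LEXICON)))
# ^erke/erke$
# ^erke me/erke me$
# ^medvedev/medvedev$
# ^erke/erke$ ^medvedev/medvedev$
```

Under this model, all three correctly analysed forms and the expected output for "erke medvedev" come out as shown on this page. The failing run of test.hfst.ol instead reports the whole input as one unknown unit.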