Difference between revisions of "Inconditional section"

From Apertium

Revision as of 10:39, 15 August 2013

inconditional

An inconditional ('unconditional') section of a dictionary typically contains punctuation, numbers and similar tokens that should be recognised even when no space separates them from the neighbouring word.

The main section of a dictionary works on a longest-match basis.

Inconditional means 'if you see this, stop processing immediately and start reading a new word'. The analyser stops when it reaches the end of a possible transduction, without waiting for a space or other delimiter.

You could say that the "only" difference is that a space is not required to start a new match.

$ echo 23men |apertium -d . en-it-anmor
^23/23<num>$^men/man<n><pl>$^./.<sent>$

It doesn't need the space between 23 and men because numbers are in an 'inconditional' section.

<dictionary>
  <alphabet>ab</alphabet>
  <sdefs>
    <sdef n="aa"/>
    <sdef n="ab"/>
  </sdefs>
  <section id="foo" type="inconditional">
    <e><p><l>a</l><r>a<s n="aa"/></r></p></e>
    <e><p><l>aa</l><r>aa<s n="aa"/></r></p></e>
  </section>
</dictionary>

$ echo aaa |lt-proc  sample.bin
^aa/aa<aa>$^a/a<aa>$

$ echo aaaa |lt-proc  sample.bin
^aa/aa<aa>$^aa/aa<aa>$

$ echo aaaaa |lt-proc  sample.bin
^aa/aa<aa>$^aa/aa<aa>$^a/a<aa>$
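
The three runs above can be reproduced with a small toy model. This is only a sketch under assumptions: it treats the sample dictionary as a plain lookup table and tokenises greedily, taking the longest entry at each position and starting a new token straight away, with no space required. The names ENTRIES and tokenise are made up for illustration, and copying unknown characters through unchanged is also an assumption; this is not how lt-proc is implemented internally.

```python
# Toy model of the sample dictionary above: surface form -> analysis.
# Illustrative simplification only, not lt-proc's actual algorithm.
ENTRIES = {"a": "a<aa>", "aa": "aa<aa>"}


def tokenise(text, entries=ENTRIES):
    """Greedy longest-match tokenisation with no blank required between tokens."""
    out = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in entries:
                match = text[i:j]
                break
        if match is None:
            # Assumption: characters with no entry are copied through as-is.
            out.append(text[i])
            i += 1
        else:
            out.append(f"^{match}/{entries[match]}$")
            i += len(match)  # the next token starts immediately
    return "".join(out)


if __name__ == "__main__":
    for s in ("aaa", "aaaa", "aaaaa"):
        print(s, "->", tokenise(s))
```

For the inputs aaa, aaaa and aaaaa the sketch yields the same strings as the lt-proc runs shown above.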

postblank / preblank

The postblank and preblank sections work exactly like inconditional with respect to how they tokenise the input. The only difference is that anything in a postblank section will make lt-proc output a space after the token (in preblank, before the token).

So if "☃" is in postblank (tagged sent), and "foo" and "bar" are in a regular section (tagged n), then we get:

$ echo 'foo☃bar' | lt-proc analyser.bin
^foo/foo<n>$^☃/☃<sent>$ ^bar/bar<n>$

If "☃" were in preblank, we'd get:

$ echo 'foo☃bar' | lt-proc analyser.bin
^foo/foo<n>$ ^☃/☃<sent>$^bar/bar<n>$
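
The blank handling in the two outputs above can be modelled in a few lines. This is a minimal sketch, assuming the input has already been split into tokens and that each token records the type of the section it was matched in; the function name render and the tuple layout are hypothetical, chosen only for this illustration.

```python
def render(tokens):
    """Join analysed tokens, adding the blank that the section type asks for.

    tokens: list of (surface, analysis, section_type) tuples, where
    section_type is "standard", "postblank" or "preblank".
    """
    parts = []
    for surface, analysis, section_type in tokens:
        unit = f"^{surface}/{analysis}$"
        if section_type == "preblank":
            unit = " " + unit   # space before the token
        elif section_type == "postblank":
            unit = unit + " "   # space after the token
        parts.append(unit)
    return "".join(parts)


if __name__ == "__main__":
    for kind in ("postblank", "preblank"):
        tokens = [
            ("foo", "foo<n>", "standard"),
            ("☃", "☃<sent>", kind),
            ("bar", "bar<n>", "standard"),
        ]
        print(kind, "->", render(tokens))
```

With "☃" marked as postblank the sketch gives the first output above, and with preblank the second.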

Why is this useful?

TODO

See also