[[Section inconditionnelle|En français]]
Revision as of 12:44, 7 October 2014
An inconditional ('unconditional') section of a dictionary typically contains punctuation, numbers and similar items.
The main section of a dictionary works on a longest-match basis.
Inconditional means 'if you see this, stop processing immediately and start reading a new word': the match stops when it reaches the end of a possible transduction.
You could say that the "only" difference from the main section is that a space is not required before a new match can start.
$ echo 23men | apertium -d . en-it-anmor
^23/23<num>$^men/man<n><pl>$^./.<sent>$
It doesn't need the space between 23 and men because numbers are in an 'inconditional' section.
<dictionary>
  <alphabet>ab</alphabet>
  <sdefs>
    <sdef n="aa"/>
    <sdef n="ab"/>
  </sdefs>
  <section id="foo" type="inconditional">
    <e><p><l>a</l><r>a<s n="aa"/></r></p></e>
    <e><p><l>aa</l><r>aa<s n="aa"/></r></p></e>
  </section>
</dictionary>

$ echo aaa | lt-proc sample.bin
^aa/aa<aa>$^a/a<aa>$
$ echo aaaa | lt-proc sample.bin
^aa/aa<aa>$^aa/aa<aa>$
$ echo aaaaa | lt-proc sample.bin
^aa/aa<aa>$^aa/aa<aa>$^a/a<aa>$
postblank / preblank
The postblank and preblank sections work exactly like inconditional sections in how they tokenise the input. The only difference is that a match in a postblank section makes lt-proc output a space after the token (in a preblank section, before the token).
So if "☃" is in postblank (tagged sent), and "foo" and "bar" are in a regular section (tagged n), then we get:
$ echo 'foo☃bar' | lt-proc analyser.bin
^foo/foo<n>$^☃/☃<sent>$ ^bar/bar<n>$
If "☃" were in preblank, we'd get:
$ echo 'foo☃bar' | lt-proc analyser.bin
^foo/foo<n>$ ^☃/☃<sent>$^bar/bar<n>$
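The dictionary behind analyser.bin is not shown in the example above. As a rough sketch only, a source dictionary matching the description might look like the following. The section ids ("main", "punct") and the alphabet are assumptions made for illustration; the entries and tags are the ones named in the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>abfor</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sent"/>
  </sdefs>
  <!-- Regular section (hypothetical id "main"): "foo" and "bar", tagged n -->
  <section id="main" type="standard">
    <e><p><l>foo</l><r>foo<s n="n"/></r></p></e>
    <e><p><l>bar</l><r>bar<s n="n"/></r></p></e>
  </section>
  <!-- Hypothetical id "punct": "☃" tagged sent, with a space emitted after the token.
       Changing type to "preblank" would correspond to the second output above. -->
  <section id="punct" type="postblank">
    <e><p><l>☃</l><r>☃<s n="sent"/></r></p></e>
  </section>
</dictionary>
```

Assuming the file were saved under a name such as example.dix (hypothetical), it could be compiled as an analyser with lt-comp lr example.dix analyser.bin before running the lt-proc commands shown above.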
Why is this useful?