Matching unknown words
From time to time, the question comes up of how to match unknown words in transfer. In interchunk this is quite easy, because each unknown word has the chunk lemma 'unknown', but how to do it in apertium-transfer is undocumented or under-documented.
The answer is to use a cat-item with an empty tags attribute:
<cat-item tags=""/>
An example:
<?xml version="1.0"?>
<transfer default="chunk">
  <section-def-cats>
    <def-cat n="any">
      <cat-item tags="*"/>
    </def-cat>
    <def-cat n="unk">
      <cat-item tags=""/>
    </def-cat>
  </section-def-cats>
  <section-def-attrs>
  </section-def-attrs>
  <section-def-vars>
  </section-def-vars>
  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="any"/>
      </pattern>
      <action>
        <out>
          <chunk name="any">
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
    <rule>
      <pattern>
        <pattern-item n="unk"/>
      </pattern>
      <action>
        <out>
          <chunk name="unk">
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
Note that the <tags> element must be present in the chunk, even for the unknown word; otherwise the opening brace of the chunk is omitted from the output.
Use:
echo '^foo<n><sg>$ ^*bar$' | apertium-transfer -n unk.t1x unk.bin
^any<n><sg>{^foo<n><sg>$}$ ^unk{^*bar$}$
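For comparison, the interchunk case mentioned at the start can be sketched as a category that matches on the chunk lemma. This is only a minimal sketch. It assumes the chunks come from the default chunking of apertium-transfer (that is, no rule such as the unk rule above has given them another name), and that such chunks carry both the lemma and the tag 'unknown'. The category name unk_chunk is hypothetical.

```xml
<section-def-cats>
  <!-- Hypothetical category name. Assumes default chunks of the
       form ^unknown<unknown>{^*bar$}$ as input to interchunk. -->
  <def-cat n="unk_chunk">
    <cat-item lemma="unknown" tags="unknown"/>
  </def-cat>
</section-def-cats>
```

With the example rules above, the chunk would instead be named 'unk' and carry no tags, so this category would not apply to that output.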