Matching unknown words
Revision as of 13:56, 21 December 2011
From time to time, the question comes up of how to match unknown words in transfer. In interchunk this is easy, because each unknown word arrives in a chunk whose lemma is 'unknown'. How to do the same in apertium-transfer, however, is undocumented or at best under-documented.
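For the interchunk case, a category for such chunks might look like the following minimal sketch. It assumes that no apertium-transfer rule matched the unknown word, so that it reached interchunk wrapped in the default chunk ^unknown<unknown>{^*bar$}$; check what your own first-stage rules actually emit. The category name UNK is hypothetical.

```xml
<!-- Hypothetical fragment of an interchunk (.t2x) file.
     Assumes unknown words arrive as ^unknown<unknown>{^*bar$}$ -->
<section-def-cats>
  <def-cat n="UNK">
    <!-- chunk lemma "unknown" carrying the single tag <unknown> -->
    <cat-item lemma="unknown" tags="unknown"/>
  </def-cat>
</section-def-cats>
```

A rule can then refer to this category with <pattern-item n="UNK"/>.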
In apertium-transfer, the answer is to use a cat-item with an empty tags attribute:
<cat-item tags=""/>
A complete example (unk.t1x):
<?xml version="1.0"?>
<transfer default="chunk">
  <section-def-cats>
    <!-- words with tags, e.g. ^foo<n><sg>$ -->
    <def-cat n="any">
      <cat-item tags="*"/>
    </def-cat>
    <!-- words with no tags at all, i.e. unknown words such as ^*bar$ -->
    <def-cat n="unk">
      <cat-item tags=""/>
    </def-cat>
  </section-def-cats>
  <section-def-attrs>
  </section-def-attrs>
  <section-def-vars>
  </section-def-vars>
  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="any"/>
      </pattern>
      <action>
        <out>
          <chunk name="any">
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
    <rule>
      <pattern>
        <pattern-item n="unk"/>
      </pattern>
      <action>
        <out>
          <chunk name="unk">
            <!-- the unknown word has no tags, so this is empty, but <tags> is still required -->
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
Note that the <tags> element must be present in each chunk, even when it ends up empty (otherwise, the opening brace of the chunk is omitted).
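Use (after compiling the rules with apertium-preprocess-transfer unk.t1x unk.bin; -n runs apertium-transfer without a bilingual dictionary):
echo '^foo<n><sg>$ ^*bar$' | apertium-transfer -n unk.t1x unk.bin
Output:
^any<n><sg>{^foo<n><sg>$}$ ^unk{^*bar$}$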
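The unk category can also be used inside longer patterns. The following is a sketch that is not part of the original example: a rule that puts an unknown word and the analysed word after it into a single chunk. It assumes the any and unk categories from unk.t1x; the chunk name unk_any is hypothetical. Because apertium-transfer applies the longest matching pattern, this two-word rule should take precedence over the single-word rules whenever an unknown word is followed by an analysed one.

```xml
<!-- Hypothetical extra rule for the section-rules of unk.t1x -->
<rule comment="unknown word followed by an analysed word">
  <pattern>
    <pattern-item n="unk"/>
    <pattern-item n="any"/>
  </pattern>
  <action>
    <out>
      <chunk name="unk_any">
        <!-- take the chunk tags from word 2: the unknown word has none -->
        <tags>
          <tag><clip pos="2" side="tl" part="tags"/></tag>
        </tags>
        <lu><clip pos="1" side="tl" part="whole"/></lu>
        <!-- keep the original blank between the two words -->
        <b pos="1"/>
        <lu><clip pos="2" side="tl" part="whole"/></lu>
      </chunk>
    </out>
  </action>
</rule>
```

Taking the chunk tags from the second word is an arbitrary choice made for the sketch; any tags (or none) could be used, as long as the <tags> element itself is present.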