Matching unknown words
Latest revision as of 11:06, 24 March 2012
From time to time, the question comes up of how to match unknown words in transfer. In interchunk, this is quite easy, as each unknown word has the chunk lemma 'unknown', but how to do the same thing with apertium-transfer is undocumented or under-documented.
The answer is to use a cat-item with an empty tags attribute:
<cat-item tags=""/>
An example:
<?xml version="1.0"?>
<transfer default="chunk">
  <section-def-cats>
    <def-cat n="any">
      <cat-item tags="*"/>
    </def-cat>
    <def-cat n="unk">
      <cat-item tags=""/>
    </def-cat>
  </section-def-cats>
  <section-def-attrs>
  </section-def-attrs>
  <section-def-vars>
  </section-def-vars>
  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="any"/>
      </pattern>
      <action>
        <out>
          <chunk name="any">
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
    <rule>
      <pattern>
        <pattern-item n="unk"/>
      </pattern>
      <action>
        <out>
          <chunk name="unk">
            <tags>
              <tag><clip pos="1" side="tl" part="tags"/></tag>
            </tags>
            <lu><clip pos="1" side="tl" part="whole"/></lu>
          </chunk>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
Note that the <tags> element must be present inside each <chunk>, even for the unknown word, which has no tags to clip (otherwise, the opening brace of the chunk is omitted).
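Use (unk.bin is the compiled form of unk.t1x, as produced by apertium-preprocess-transfer unk.t1x unk.bin):
echo '^foo<n><sg>$ ^*bar$' | apertium-transfer -n unk.t1x unk.bin
^any<n><sg>{^foo<n><sg>$}$ ^unk{^*bar$}$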
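Before testing the rule, it can help to confirm that the input really contains unknown words. In the stream shown above, an unknown word is a lexical unit with an asterisk directly after the caret and no tags, as in ^*bar$. The following is a minimal sketch, not part of Apertium itself: it assumes a grep that supports the -o option (GNU and BSD grep do), and the variable names stream and unknown are illustrative only.

```shell
# Sample input in Apertium stream format (same as the example above).
stream='^foo<n><sg>$ ^*bar$'

# Print each lexical unit that starts with '^*' (an unknown word), one per line.
# \^ and \* match the literal caret and asterisk, [^$]* consumes the word,
# and \$ matches the closing dollar sign.
unknown=$(printf '%s\n' "$stream" | grep -o '\^\*[^$]*\$')

echo "$unknown"
```

If nothing is printed, the input has no unknown words and the unk rule will never fire on it.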