Matching unknown words

Revision as of 13:32, 21 December 2011 by Jimregan

From time to time, the question comes up of how to match unknown words in transfer. In interchunk this is easy, because each unknown word arrives in a chunk whose lemma is 'unknown'. How to do it in the first transfer stage (apertium-transfer), however, is undocumented or at best under-documented.
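For the interchunk case, a category for such chunks might be declared by lemma, as in the fragment below. This is a sketch for an interchunk (.t2x) file, not taken from the original page: the category name unknown_chunk is hypothetical, and tags="*" assumes the unknown chunk carries at least one tag, so check your chunker's actual output before relying on it.

```xml
<!-- Interchunk (.t2x) sketch: "unknown_chunk" is a hypothetical category name.
     tags="*" assumes the chunk with lemma "unknown" carries at least one tag. -->
<section-def-cats>
  <def-cat n="unknown_chunk">
    <cat-item lemma="unknown" tags="*"/>
  </def-cat>
</section-def-cats>
```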

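For apertium-transfer, the answer is to use a cat-item with an empty tags attribute, since unknown words carry no tags (they are only marked with a leading asterisk, as in ^*bar$):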

     <cat-item tags=""/>

A complete example, saved as unk.t1x:

<?xml version="1.0"?>
<transfer default="chunk">
 <section-def-cats>
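   <!-- "any": tagged words, e.g. ^foo<n><sg>$ -->
   <def-cat n="any">
     <cat-item tags="*"/>
   </def-cat>

   <!-- "unk": words with no tags, i.e. unknown words such as ^*bar$ -->
   <def-cat n="unk">
     <cat-item tags=""/>
   </def-cat>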

 </section-def-cats>

 <section-def-attrs>
 </section-def-attrs>

 <section-def-vars>
 </section-def-vars>

 <section-rules>
   <rule>
     <pattern>
       <pattern-item n="any"/>
     </pattern>
     <action>
       <out>
         <chunk name="any">
           <tags>
             <tag><clip pos="1" side="tl" part="tags"/></tag>
           </tags>
           <lu><clip pos="1" side="tl" part="whole"/></lu>
         </chunk>
       </out>
     </action>
   </rule>
   <rule>
     <pattern>
       <pattern-item n="unk"/>
     </pattern>
     <action>
       <out>
         <chunk name="unk">
           <tags>
             <tag><clip pos="1" side="tl" part="tags"/></tag>
           </tags>
           <lu><clip pos="1" side="tl" part="whole"/></lu>
         </chunk>
       </out>
     </action>
   </rule>
 </section-rules>
</transfer>

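Note that the <tags> element must be present in each chunk, even when the clipped tags are empty, as they are for unknown words; otherwise, the opening brace of the chunk is omitted from the output.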

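Usage, after compiling the rules with apertium-preprocess-transfer (the -n option runs apertium-transfer without a bilingual dictionary, so only the rules file and its compiled form are given):

     apertium-preprocess-transfer unk.t1x unk.bin
     echo '^foo<n><sg>$ ^*bar$' | apertium-transfer -n unk.t1x unk.bin

Output, with the known word wrapped in an 'any' chunk and the unknown word in an 'unk' chunk:

     ^any<n><sg>{^foo<n><sg>$}$ ^unk{^*bar$}$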
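The unk category can also be combined with other categories in a longer pattern. The following is a hypothetical sketch, not part of the original example: it assumes an extra category n (nouns, matched with tags="n.*") and a chunk name unk_n with a literal tag SN, both invented for illustration. Because apertium-transfer applies the longest matching pattern from left to right, this rule would take precedence over the single-word rules whenever an unknown word is directly followed by a noun.

```xml
<!-- Hypothetical addition to section-def-cats: nouns (category name "n" is illustrative) -->
<def-cat n="n">
  <cat-item tags="n.*"/>
</def-cat>

<!-- Hypothetical addition to section-rules: unknown word + noun in one chunk -->
<rule>
  <pattern>
    <pattern-item n="unk"/>
    <pattern-item n="n"/>
  </pattern>
  <action>
    <out>
      <chunk name="unk_n">
        <tags>
          <!-- literal chunk tag; "SN" is an illustrative choice -->
          <tag><lit-tag v="SN"/></tag>
        </tags>
        <lu><clip pos="1" side="tl" part="whole"/></lu>
        <!-- copy the blank between the two matched words -->
        <b pos="1"/>
        <lu><clip pos="2" side="tl" part="whole"/></lu>
      </chunk>
    </out>
  </action>
</rule>
```

With the def-cat placed in section-def-cats and the rule in section-rules, input such as '^*bar$ ^foo<n><sg>$' should come out as a single unk_n chunk containing both words.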