Extract
Revision as of 07:31, 30 September 2008
The extract tool is a program for matching word forms (for example, from a corpus) to lemmata and paradigms. Extract paradigms differ from Apertium paradigms in that they can contain both "inclusions" and "exclusions" for matching purposes. For example, to match English nouns but not verbs, you might write an extract paradigm that accepts a root when "root + s" occurs, but not when "root + ing" also occurs.
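The inclusion/exclusion idea can be illustrated with a minimal Python sketch. It is not extract's implementation: it assumes the corpus is just a set of word forms and that each inclusion or exclusion is a plain suffix added to a root. The function `match_roots` and the toy corpus are hypothetical.

```python
def match_roots(corpus, include, exclude):
    """Return roots r such that r+include is attested in the corpus
    and r+exclude is not (a toy inclusion/exclusion paradigm)."""
    roots = set()
    for word in corpus:
        # Only consider words that actually end in the included suffix.
        if word.endswith(include) and len(word) > len(include):
            root = word[: -len(include)]
            # Exclusion: reject the root if root+exclude is also attested.
            if root + exclude not in corpus:
                roots.add(root)
    return roots


# Hypothetical toy corpus: "walks"/"walking" behaves like a verb,
# while "cats" and "dogs" behave like nouns.
corpus = {"cats", "cat", "walks", "walk", "walking", "dogs"}
print(sorted(match_roots(corpus, "s", "ing")))  # ['cat', 'dog']
```

Because the sketch only concatenates strings, it would wrongly accept "run" from a corpus containing "runs" and "running", since it looks for "runing".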
There are two versions of extract: the first supports Unicode (although not in paradigm names); the second does not support Unicode but supports a system of constraints. For Apertium use, I recommend the first version, since any constraints can be applied using the constraint grammar.
Paradigms
- Apertium
<pardef n="wol/f__n">
  <e>
    <p>
      <l>f</l>
      <r>f<s n="n"/><s n="sg"/></r>
    </p>
  </e>
  <e>
    <p>
      <l>ves</l>
      <r>f<s n="n"/><s n="pl"/></r>
    </p>
  </e>
</pardef>
- Extract
paradigm wol_f__n =
  x+"f"
  { x+"ves" & ~(x+"ing") } ;
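To see how the two paradigms line up, here is a minimal Python sketch. It assumes the corpus is a set of word forms and reads the extract paradigm as "output x+"f" for every x where x+"ves" is attested and x+"ing" is not"; each result is then written as an Apertium dictionary entry pointing at the wol/f__n pardef. The function names and word list are hypothetical, and this is not how extract or Apertium are implemented.

```python
# The Apertium pardef wol/f__n as (surface suffix, analysis suffix) pairs.
WOLF_PARDEF = [
    ("f", "f<n><sg>"),
    ("ves", "f<n><pl>"),
]


def match_wol_f__n(corpus):
    """Stems x with x+"ves" attested and x+"ing" not attested (assumed reading)."""
    stems = set()
    for word in corpus:
        if word.endswith("ves") and len(word) > 3:
            x = word[:-3]
            # Exclusion ~(x+"ing"): drop the stem if x+"ing" is attested.
            if x + "ing" not in corpus:
                stems.add(x)
    return stems


def dix_entry(stem):
    """Format a dictionary <e> entry: lemma is stem+"f", inflected via the pardef."""
    return f'<e lm="{stem}f"><i>{stem}</i><par n="wol/f__n"/></e>'


def expand(stem):
    """List the (surface form, analysis) pairs the pardef generates for a stem."""
    return [(stem + surf, stem + ana) for surf, ana in WOLF_PARDEF]


# Hypothetical toy corpus.
corpus = {"wolves", "wolf", "halves", "shelves"}
for stem in sorted(match_wol_f__n(corpus)):
    print(dix_entry(stem))
    for surface, analysis in expand(stem):
        print("   ", surface, "->", analysis)
```

The exclusion is checked literally as x+"ing", so for a form like "leaves" the sketch looks for "leaing", not "leaving".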