Difference between revisions of "Task ideas for Google Code-in/Add words from frequency list"

From Apertium
(Created page with '==Examples== The paradigms (inflectional classes) will be different depending on the dictionary format and the language in question. When in doubt, ask your mentor for help. ==…')
 
Line 8: Line 8:
  When using lttoolbox you will also need to find:
- * the ''stem'' of the word, that is the part onto which inflectional endings are added. For example, the stem for "wolf" is "wol" because the singular is "wol + f" and the plural is "wol + ves".
+ * the ''stem'' of the word, that is the part onto which inflectional endings are added.
+ ** e.g. the stem for "wolf" is "wol" because the singular is "wol + f" and the plural is "wol + ves".
  * the ''paradigm'' of the word. Paradigms in the <code>.dix</code> file come in <code>pardef</code> elements. Find the one that given your stem generates all the valid surface forms of the lemma.
Line 34: Line 35:
  |}
  </div>

  ===Using <code>.lexc</code>===

Revision as of 17:47, 1 November 2013

Examples

The paradigms (inflectional classes) will be different depending on the dictionary format and the language in question. When in doubt, ask your mentor for help.

Using .dix

See also: Starting a new language with lttoolbox

When using lttoolbox you will also need to find:

  • the stem of the word, that is, the part onto which inflectional endings are added.
    • e.g. the stem for "wolf" is "wol", because the singular is "wol + f" and the plural is "wol + ves".
  • the paradigm of the word. Paradigms in the .dix file are defined in pardef elements. Find the one that, given your stem, generates all the valid surface forms of the lemma.

If no paradigm exists for the word, you will need to add a new one. Ask your mentor for help with this.

Before:

n   ^3570/3570<num>$ ^горад/горад$
n   ^2491/2491<num>$ ^тэрыторыі/тэрыторыя$
n   ^2409/2409<num>$ ^вайны/вайна$
n   ^2316/2316<num>$ ^цэнтр/цэнтр$

After:

 <e lm="горад"><i>горад</i><par n="..."/></e>
 <e lm="тэрыторыя"><i>тэрыторы</i><par n="..."/></e>
 <e lm="вайна"><i>вайн</i><par n="..."/></e>
 <e lm="цэнтр"><i>цэнтр</i><par n="..."/></e>
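
As a rough sketch of how the frequency-list lines above map onto dictionary entries, the following Python reads the surface form and lemma from the last `^surface/lemma$` unit on a line and prints an `<e>` element. The function names are made up for this example, the stem still has to be chosen by hand, and the paradigm name is left as the same "..." placeholder used in the entries above.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches one ^surface/analysis$ unit, e.g. ^вайны/вайна$
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_line(line):
    """Return (surface form, lemma) from the last unit on the line."""
    units = UNIT.findall(line)
    return units[-1]


def make_entry(lemma, stem, paradigm="..."):
    """Build an <e> element; the paradigm defaults to a placeholder."""
    return "<e lm={}><i>{}</i><par n={}/></e>".format(
        quoteattr(lemma), escape(stem), quoteattr(paradigm)
    )


surface, lemma = parse_line("n   ^2491/2491<num>$ ^тэрыторыі/тэрыторыя$")
# The stem "тэрыторы" is supplied by hand, as in the example entries.
print(make_entry(lemma, "тэрыторы"))
```

The printed entry still needs a real paradigm name in place of "..." before it can go into the .dix file.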

Using .lexc