Bilingual dictionary

Revision as of 23:21, 14 September 2011

Below you will find some notes on how to work with bilingual dictionaries (or transfer lexica) in Apertium. They aren't complete, but they should give an overview of the basic cases and patterns.

Two genders for one language, one for the other

Two genders for each language

If both of the languages involved have two genders, the following patterns are likely:

Nouns

Noun has one gender in language x and one gender in language y and the same number pattern in both

There will probably be one entry in the bilingual dictionary, in either of the following patterns:

    <e><p><l>skingomz<s n="n"/><s n="f"/></l><r>radio<s n="n"/><s n="f"/></r></p></e>
    <e><p><l>skinwel<s n="n"/><s n="m"/></l><r>télévision<s n="n"/><s n="f"/></r></p></e> 

Basically, the lemma in language x goes on the left and the lemma in language y on the right. In the first example the gender does not change, but we include it anyway, to make the dictionaries more useful for people who want to re-use them and perhaps also to show the gender of the article that goes with the noun (not useful for English or Esperanto; note added by Bech, 22:21, 14 September 2011 (UTC)). In the second example the gender changes.

This is the output in lt-expand format:

    skingomz:skingomz<n><f><sg>      ←→   radio:radio<n><f><sg>
    skingomzioù:skingomz<n><f><pl>   ←→   radios:radio<n><f><pl>
    skinwel:skinwel<n><m><sg>        ←→   télévision:télévision<n><f><sg>
    skinwelioù:skinwel<n><m><pl>     ←→   télévisions:télévision<n><f><pl>
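As a rough illustration of how these entries are structured, the following Python sketch reads the two entries above and reports whether the tags differ between the two sides. The `<section>` wrapper, the `ENTRIES` string and the helper names `side` and `read_pairs` are assumptions made for this example only; they are not part of any Apertium tool. Lemmas containing extra markup (for example multiword lemmas) are not handled.

```python
import xml.etree.ElementTree as ET

# The two entries from above, wrapped in a <section> element only so that
# the snippet is a single well-formed XML document (assumption for the sketch).
ENTRIES = """
<section>
  <e><p><l>skingomz<s n="n"/><s n="f"/></l><r>radio<s n="n"/><s n="f"/></r></p></e>
  <e><p><l>skinwel<s n="n"/><s n="m"/></l><r>télévision<s n="n"/><s n="f"/></r></p></e>
</section>
"""

def side(elem):
    """Return (lemma, [tag names]) for an <l> or <r> element."""
    lemma = elem.text or ""
    tags = [s.get("n") for s in elem.findall("s")]
    return lemma, tags

def read_pairs(xml_text):
    """Return a list of ((left lemma, left tags), (right lemma, right tags))."""
    root = ET.fromstring(xml_text)
    pairs = []
    for e in root.iter("e"):
        p = e.find("p")
        pairs.append((side(p.find("l")), side(p.find("r"))))
    return pairs

for (llemma, ltags), (rlemma, rtags) in read_pairs(ENTRIES):
    note = "same tags" if ltags == rtags else "tags differ"
    print(f"{llemma}<{'><'.join(ltags)}> -> {rlemma}<{'><'.join(rtags)}> ({note})")
```

The first entry is reported with the same tags on both sides and the second with differing tags, which corresponds to the gender change described above.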

Noun has two genders in language x and one gender in language y and the same number pattern in both

There will probably be three entries in the bilingual dictionary, in the following pattern:

    <e r="LR"><p><l>adeiladour<s n="n"/><s n="m"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
    <e r="LR"><p><l>adeiladour<s n="n"/><s n="f"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
    <e r="RL"><p><l>adeiladour<s n="n"/><s n="GD"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>

The symbol mf indicates that the word has the same form in the masculine and feminine genders; this is defined in the monolingual dictionary of the language in question. The symbol GD stands for "gender to be determined". This pattern basically says: from language x → y, translate the masculine and feminine of "adeiladour" to the masculine/feminine "architecte"; from language y → x, translate the masculine/feminine "architecte" to "adeiladour", with the gender to be determined by transfer rules.

And in lt-expand format:

    adeiladour:adeiladour<n><m><sg>       →  architecte:architecte<n><mf><sg>
    adeiladourien:adeiladour<n><m><pl>    →  architectes:architecte<n><mf><pl>
    adeiladourez:adeiladour<n><f><sg>     →  architecte:architecte<n><mf><sg>
    adeiladourezed:adeiladour<n><f><pl>   →  architectes:architecte<n><mf><pl>
    ?:adeiladour<n><GD><sg>               ←  architecte:architecte<n><mf><sg>
    ?:adeiladour<n><GD><pl>               ←  architectes:architecte<n><mf><pl>
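The effect of the r="LR" and r="RL" restrictions can be sketched in Python as follows. This is only an illustration of the pattern described above, not how the Apertium tools are implemented; the `<section>` wrapper and the helper names `applies`, `tags` and `listing` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The three entries from above, wrapped in <section> only to make one
# well-formed XML document (assumption for the sketch).
ENTRIES = """
<section>
  <e r="LR"><p><l>adeiladour<s n="n"/><s n="m"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
  <e r="LR"><p><l>adeiladour<s n="n"/><s n="f"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
  <e r="RL"><p><l>adeiladour<s n="n"/><s n="GD"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
</section>
"""

def applies(entry, direction):
    """True if the entry is usable in the given direction ("LR" or "RL").

    An entry without an r attribute is treated as usable in both directions.
    """
    restriction = entry.get("r")
    return restriction is None or restriction == direction

def tags(elem):
    """Render the <s n="..."/> children of an element as <tag><tag>..."""
    return "".join(f"<{s.get('n')}>" for s in elem.findall("s"))

def listing(xml_text, direction):
    """List 'source -> target' strings for the entries usable in one direction."""
    root = ET.fromstring(xml_text)
    out = []
    for e in root.iter("e"):
        if not applies(e, direction):
            continue
        left, right = e.find("p/l"), e.find("p/r")
        src, dst = (left, right) if direction == "LR" else (right, left)
        out.append(f"{src.text}{tags(src)} -> {dst.text}{tags(dst)}")
    return out

print(listing(ENTRIES, "LR"))  # language x to language y
print(listing(ENTRIES, "RL"))  # language y to language x
```

In the x → y direction both gendered entries are listed; in the y → x direction only the GD entry is.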

Noun has two genders in language x and two genders in language y and the same number pattern in both

In some bilingual dictionaries you will find two entries, like:

    <e><p><l>studier<s n="n"/><s n="m"/></l><r>étudiant<s n="n"/><s n="m"/></r></p></e>
    <e><p><l>studier<s n="n"/><s n="f"/></l><r>étudiant<s n="n"/><s n="f"/></r></p></e>

and in others, one entry like:

    <e><p><l>studier<s n="n"/></l><r>étudiant<s n="n"/></r></p></e>

Each approach has its own advantages and disadvantages, but both basically work the same way as in the first example.
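One way to see why the two styles can behave alike is the toy lookup below. It assumes that an entry matches when its tags are a prefix of the analysed word's tags and that any remaining tags are carried over unchanged; this is an assumption of the sketch, not something stated on this page, and the function `lookup` and the tuple layout are hypothetical.

```python
# Each entry is (source lemma, source tags, target lemma, target tags).
TWO_ENTRIES = [
    ("studier", ["n", "m"], "étudiant", ["n", "m"]),
    ("studier", ["n", "f"], "étudiant", ["n", "f"]),
]
ONE_ENTRY = [
    ("studier", ["n"], "étudiant", ["n"]),
]

def lookup(lemma, tags, entries):
    """Toy lookup: pick the entry whose tags form the longest prefix of
    `tags`, swap that prefix for the target tags and keep the rest.
    Returns (target lemma, target tags) or None if nothing matches.
    """
    best = None
    for src_lemma, src_tags, dst_lemma, dst_tags in entries:
        if src_lemma == lemma and tags[:len(src_tags)] == src_tags:
            if best is None or len(src_tags) > len(best[0]):
                best = (src_tags, dst_lemma, dst_tags)
    if best is None:
        return None
    src_tags, dst_lemma, dst_tags = best
    return dst_lemma, dst_tags + tags[len(src_tags):]

for analysis in (["n", "m", "sg"], ["n", "f", "pl"]):
    print(lookup("studier", analysis, TWO_ENTRIES),
          lookup("studier", analysis, ONE_ENTRY))
```

Under that assumption both dictionaries give the same result for every gender and number, since the gender tag is either matched explicitly or simply carried over.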

Noun has one gender with singular and plural in language x and one gender with singular/plural in language y

This follows a similar pattern to the entry where the gender was masculine/feminine. The symbol ND stands for "number to be determined", and sp is for singular/plural.

    <e r="LR"><p><l>miz<s n="n"/><s n="m"/><s n="sg"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>
    <e r="LR"><p><l>miz<s n="n"/><s n="m"/><s n="pl"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>
    <e r="RL"><p><l>miz<s n="n"/><s n="m"/><s n="ND"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>

And with lt-expand:

    miz:miz<n><m><sg>      →   mois:mois<n><m><sp>
    mizioù:miz<n><m><pl>   →   mois:mois<n><m><sp>
    ?:miz<n><m><ND>        ←   mois:mois<n><m><sp>

Noun has two genders with singular and plural in language x and one gender with singular/plural in language y

This doesn't happen very often. Basically we use both GD and ND and multiply out the entries, one for each possible combination of gender and number:

    <e r="LR"><p><l>con<s n="n"/><s n="m"/><s n="sg"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
    <e r="LR"><p><l>con<s n="n"/><s n="m"/><s n="pl"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
    <e r="LR"><p><l>con<s n="n"/><s n="f"/><s n="sg"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
    <e r="LR"><p><l>con<s n="n"/><s n="f"/><s n="pl"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
    <e r="RL"><p><l>con<s n="n"/><s n="GD"/><s n="ND"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>

Here is what the above entries do in lt-expand format:

    con:con<n><m><sg>     →  gilipollas:gilipollas<n><mf><sp>
    cons:con<n><m><pl>    →  gilipollas:gilipollas<n><mf><sp>
    conne:con<n><f><sg>   →  gilipollas:gilipollas<n><mf><sp>
    connes:con<n><f><pl>  →  gilipollas:gilipollas<n><mf><sp>
    ?:con<n><GD><ND>      ←  gilipollas:gilipollas<n><mf><sp> 
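Since the entries above follow a regular pattern, they could be generated rather than typed by hand. The following Python sketch builds the same five entries. The function names `s`, `entry` and `underspecified_entries` are hypothetical helpers for this example, and the right-hand tags are fixed to n, mf, sp as in the example above.

```python
from itertools import product

def s(tag):
    """Render one symbol as an <s n="..."/> element."""
    return f'<s n="{tag}"/>'

def entry(direction, left, ltags, right, rtags):
    """Build one <e> element restricted to the given direction."""
    l = left + "".join(s(t) for t in ltags)
    r = right + "".join(s(t) for t in rtags)
    return f'<e r="{direction}"><p><l>{l}</l><r>{r}</r></p></e>'

def underspecified_entries(left, right, genders=("m", "f"), numbers=("sg", "pl")):
    """One LR entry per gender/number combination, plus one RL entry
    that leaves both gender (GD) and number (ND) to be determined."""
    rtags = ["n", "mf", "sp"]
    out = [entry("LR", left, ["n", g, n], right, rtags)
           for g, n in product(genders, numbers)]
    out.append(entry("RL", left, ["n", "GD", "ND"], right, rtags))
    return out

for line in underspecified_entries("con", "gilipollas"):
    print(line)
```

With two genders and two numbers this gives four LR entries and one RL entry, in the same order as listed above.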

Adjectives

See also
