Bilingual dictionary
Latest revision as of 16:41, 17 March 2018
The bilingual dictionary (also known as bidix and transfer lexicon) contains translations between two languages. It is one of the five main data files in any language pair (see also: Apertium New Language Pair HOWTO).
Bilingual dictionaries can be downloaded from GitHub (https://github.com/apertium). Their file names have the form apertium-A-B.A-B.dix, where apertium-A-B is the name of the language pair. For example, the file apertium-af-nl.af-nl.dix is the bilingual dictionary for the pair Afrikaans (af) and Dutch (nl).
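The naming convention can be captured in a small Python helper. This is only a sketch: the functions `bidix_filename` and `pair_codes` are hypothetical (not part of any Apertium tool), and the code assumes that the language codes themselves contain no hyphen.

```python
def bidix_filename(a: str, b: str) -> str:
    """Build the bidix file name for the pair apertium-a-b."""
    pair = f"{a}-{b}"
    return f"apertium-{pair}.{pair}.dix"


def pair_codes(filename: str) -> tuple:
    """Recover (a, b) from a name of the form apertium-A-B.A-B.dix.

    Assumption: neither language code contains a hyphen.
    """
    prefix, suffix = "apertium-", ".dix"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"not a bidix file name: {filename}")
    middle = filename[len(prefix):-len(suffix)]  # e.g. "af-nl.af-nl"
    pair, dot, repeated = middle.partition(".")
    a, dash, b = pair.partition("-")
    # The pair name must appear twice, e.g. "af-nl" and "af-nl".
    if not dot or pair != repeated or not dash or not a or not b:
        raise ValueError(f"not a bidix file name: {filename}")
    return a, b


print(bidix_filename("af", "nl"))              # apertium-af-nl.af-nl.dix
print(pair_codes("apertium-af-nl.af-nl.dix"))  # ('af', 'nl')
```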
Below are some notes on how to work with bilingual dictionaries (or transfer lexica) in Apertium. They are not complete, but they should give an overview of the basic cases and patterns.
Two genders for one language, one for the other
Two genders for each language
If both of the languages involved have two genders, the following patterns are likely:
Nouns
- Noun has one gender in the left language, one gender in the right language, and the same number pattern in both
There will probably be one entry in the bilingual dictionary, in either of the following patterns:
<e><p><l>skingomz<s n="n"/><s n="f"/></l><r>radio<s n="n"/><s n="f"/></r></p></e>
<e><p><l>skinwel<s n="n"/><s n="m"/></l><r>télévision<s n="n"/><s n="f"/></r></p></e>
The lemma of the left language goes in <l> and the lemma of the right language goes in <r>. In the first example the gender does not change, but we include it anyway (to make the dictionaries more useful for people who want to reuse them). In the second example the gender changes (masculine skinwel, feminine télévision).
This is the output in lt-expand format:
skingomz:skingomz<n><f><sg> ←→ radio:radio<n><f><sg>
skingomzioù:skingomz<n><f><pl> ←→ radios:radio<n><f><pl>
skinwel:skinwel<n><m><sg> ←→ télévision:télévision<n><f><sg>
skinwelioù:skinwel<n><m><pl> ←→ télévisions:télévision<n><f><pl>
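Entries like the two above have a fixed shape: on each side, a lemma followed by its symbols. The Python sketch below writes such entries as strings in the same layout as this page. The helper names `side` and `entry` are hypothetical and not part of any Apertium tool; the optional `restriction` argument adds the `r="LR"` or `r="RL"` attribute used in the patterns that follow.

```python
from typing import Optional, Sequence
from xml.sax.saxutils import escape

# escape() handles &, < and >; also escape double quotes inside attributes.
ATTR_ESCAPES = {'"': "&quot;"}


def side(tag: str, lemma: str, symbols: Sequence[str]) -> str:
    """One side of a pair: the lemma followed by <s n="..."/> symbols."""
    syms = "".join(f'<s n="{escape(s, ATTR_ESCAPES)}"/>' for s in symbols)
    return f"<{tag}>{escape(lemma)}{syms}</{tag}>"


def entry(left: str, left_syms: Sequence[str],
          right: str, right_syms: Sequence[str],
          restriction: Optional[str] = None) -> str:
    """A bidix <e> entry; restriction is None (both ways), "LR" or "RL"."""
    attr = f' r="{restriction}"' if restriction else ""
    return (f"<e{attr}><p>{side('l', left, left_syms)}"
            f"{side('r', right, right_syms)}</p></e>")


# Prints the two entries of the first example.
print(entry("skingomz", ["n", "f"], "radio", ["n", "f"]))
print(entry("skinwel", ["n", "m"], "télévision", ["n", "f"]))
```

Building the strings by hand (rather than with an XML library) keeps the compact `<s n="n"/>` layout used in the dictionaries shown here.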
- Noun has two genders in the left language, one gender in the right language, and the same number pattern in both
There will probably be three entries in the bilingual dictionary, in the following pattern:
<e r="LR"><p><l>adeiladour<s n="n"/><s n="m"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
<e r="LR"><p><l>adeiladour<s n="n"/><s n="f"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
<e r="RL"><p><l>adeiladour<s n="n"/><s n="GD"/></l><r>architecte<s n="n"/><s n="mf"/></r></p></e>
The symbol mf indicates that the word is the same in the masculine and feminine genders; this is defined in the monolingual dictionary of the language in question. The symbol GD stands for "gender to be determined". The pattern says: from the left language, translate both the masculine and the feminine of "adeiladour" to the masculine/feminine "architecte"; from the right language, translate the masculine/feminine "architecte" to "adeiladour", with the gender to be determined by transfer rules.
The lt-expand output is as follows:
adeiladour:adeiladour<n><m><sg> → architecte:architecte<n><mf><sg>
adeiladourien:adeiladour<n><m><pl> → architectes:architecte<n><mf><pl>
adeiladourez:adeiladour<n><f><sg> → architecte:architecte<n><mf><sg>
adeiladourezed:adeiladour<n><f><pl> → architectes:architecte<n><mf><pl>
?:adeiladour<n><GD><sg> ← architecte:architecte<n><mf><sg>
?:adeiladour<n><GD><pl> ← architectes:architecte<n><mf><pl>
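The effect of the `r` restrictions can be sketched with a toy lookup in Python. This is a simplified model, not the lt-proc implementation: an entry is usable only in the direction its `r` attribute allows (both directions if it has none), it matches when the lemma is equal and the entry's symbols are a prefix of the input tags, and any remaining input tags (here the number) are appended to the output. The names `Entry`, `ENTRIES` and `translate` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Entry(NamedTuple):
    left: str
    left_tags: Tuple[str, ...]
    right: str
    right_tags: Tuple[str, ...]
    r: Optional[str] = None  # "LR", "RL", or None for both directions


# The three adeiladour/architecte entries from the example above.
ENTRIES = [
    Entry("adeiladour", ("n", "m"), "architecte", ("n", "mf"), "LR"),
    Entry("adeiladour", ("n", "f"), "architecte", ("n", "mf"), "LR"),
    Entry("adeiladour", ("n", "GD"), "architecte", ("n", "mf"), "RL"),
]


def translate(lemma: str, tags: Sequence[str], direction: str,
              entries: Sequence[Entry] = ENTRIES):
    """Toy bidix lookup: return (lemma, tags) on the other side, or None."""
    tags = tuple(tags)
    for e in entries:
        if e.r is not None and e.r != direction:
            continue  # entry is restricted to the other direction
        if direction == "LR":
            src, src_tags, tgt, tgt_tags = e.left, e.left_tags, e.right, e.right_tags
        else:
            src, src_tags, tgt, tgt_tags = e.right, e.right_tags, e.left, e.left_tags
        n = len(src_tags)
        if lemma == src and tags[:n] == src_tags:
            # Tags not mentioned in the entry are copied to the output.
            return tgt, tgt_tags + tags[n:]
    return None


print(translate("adeiladour", ["n", "f", "pl"], "LR"))  # ('architecte', ('n', 'mf', 'pl'))
print(translate("architecte", ["n", "mf", "sg"], "RL"))  # ('adeiladour', ('n', 'GD', 'sg'))
```

In this model the `GD` entry is never used left to right, and the two masculine/feminine entries are never used right to left, which mirrors the arrows in the lt-expand listing.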
- Noun has two genders in the left language, two genders in the right language, and the same number pattern in both
In some bilingual dictionaries you will find two entries, like:
<e><p><l>studier<s n="n"/><s n="m"/></l><r>étudiant<s n="n"/><s n="m"/></r></p></e>
<e><p><l>studier<s n="n"/><s n="f"/></l><r>étudiant<s n="n"/><s n="f"/></r></p></e>
and in others, one entry like:
<e><p><l>studier<s n="n"/></l><r>étudiant<s n="n"/></r></p></e>
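Each form has its advantages and disadvantages, but both work like the first example. In the one-entry form the gender and number tags that follow <n> are not listed, so they are copied unchanged from the input to the output.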
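Under a simplified model of bidix lookup (exact lemma match, entry symbols as a prefix of the input tags, remaining tags copied over), the two-entry and one-entry forms for studier/étudiant give the same result. The Python sketch below checks this for every gender/number combination. It is an illustration under that assumption, not lt-proc itself; `lookup`, `TWO_ENTRIES` and `ONE_ENTRY` are hypothetical names, and direction restrictions are left out because neither form uses them.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# (left lemma, left symbols, right lemma, right symbols)
Pair = Tuple[str, Tuple[str, ...], str, Tuple[str, ...]]

TWO_ENTRIES: Sequence[Pair] = [
    ("studier", ("n", "m"), "étudiant", ("n", "m")),
    ("studier", ("n", "f"), "étudiant", ("n", "f")),
]
ONE_ENTRY: Sequence[Pair] = [
    ("studier", ("n",), "étudiant", ("n",)),
]


def lookup(entries: Sequence[Pair], lemma: str,
           tags: Tuple[str, ...]) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Left-to-right lookup: prefix match on symbols, copy the rest."""
    for left, left_syms, right, right_syms in entries:
        n = len(left_syms)
        if lemma == left and tags[:n] == left_syms:
            return right, right_syms + tags[n:]
    return None


for gender, number in product(("m", "f"), ("sg", "pl")):
    tags = ("n", gender, number)
    a = lookup(TWO_ENTRIES, "studier", tags)
    b = lookup(ONE_ENTRY, "studier", tags)
    assert a == b == ("étudiant", tags), (tags, a, b)
    print(tags, "->", a)
```

The one-entry form gives the same output only because the gender does not change between the two languages; a pair like skinwel/télévision, where it does change, still needs the gender written out.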
- Noun has one gender with separate singular and plural forms in the left language, and one gender with a single singular/plural form in the right language
This follows a similar pattern to the entry where the gender was masculine/feminine. The symbol ND stands for "number to be determined", and sp is for singular/plural.
<e r="LR"><p><l>miz<s n="n"/><s n="m"/><s n="sg"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>
<e r="LR"><p><l>miz<s n="n"/><s n="m"/><s n="pl"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>
<e r="RL"><p><l>miz<s n="n"/><s n="m"/><s n="ND"/></l><r>mois<s n="n"/><s n="m"/><s n="sp"/></r></p></e>
And with lt-expand:
miz:miz<n><m><sg> → mois:mois<n><m><sp>
mizioù:miz<n><m><pl> → mois:mois<n><m><sp>
?:miz<n><m><ND> ← mois:mois<n><m><sp>
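ND, like GD, is a placeholder that transfer rules have to replace with a concrete value. How a given pair's rules choose that value varies; the Python sketch below is a hypothetical illustration only, not Apertium transfer code. It takes the value from an agreeing word's tags (for instance a determiner) when one is available, and otherwise falls back to assumed defaults (`m` for gender, `sg` for number). The determiner tags in the usage lines are likewise assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple

# HYPOTHETICAL defaults, used only when the context does not decide the value.
DEFAULTS: Dict[str, str] = {"GD": "m", "ND": "sg"}
# Concrete values that may replace each placeholder.
CANDIDATES: Dict[str, Tuple[str, ...]] = {"GD": ("m", "f"), "ND": ("sg", "pl")}


def resolve(tags: Sequence[str],
            agree_with: Optional[Sequence[str]] = None) -> Tuple[str, ...]:
    """Replace GD/ND in tags with a value from agree_with, else a default.

    A toy stand-in for what a transfer rule might do; not Apertium code.
    """
    context = tuple(agree_with or ())
    out = []
    for tag in tags:
        if tag in CANDIDATES:
            # First concrete gender/number in the agreeing word, if any;
            # "mf" and "sp" are not candidates, so they never decide.
            found = [t for t in context if t in CANDIDATES[tag]]
            out.append(found[0] if found else DEFAULTS[tag])
        else:
            out.append(tag)
    return tuple(out)


# "les mois": the (assumed) determiner tags say plural, so ND becomes pl.
print(resolve(["n", "m", "ND"], agree_with=["det", "def", "mf", "pl"]))  # ('n', 'm', 'pl')
# No agreeing word: fall back to the default.
print(resolve(["n", "m", "ND"]))  # ('n', 'm', 'sg')
```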
- Noun has two genders, each with singular and plural, in the left language and one gender with singular/plural in the right language
This does not happen very often. We use both GD and ND, and write one left-to-right entry for each possible combination plus a single right-to-left entry:
<e r="LR"><p><l>con<s n="n"/><s n="m"/><s n="sg"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
<e r="LR"><p><l>con<s n="n"/><s n="m"/><s n="pl"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
<e r="LR"><p><l>con<s n="n"/><s n="f"/><s n="sg"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
<e r="LR"><p><l>con<s n="n"/><s n="f"/><s n="pl"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
<e r="RL"><p><l>con<s n="n"/><s n="GD"/><s n="ND"/></l><r>gilipollas<s n="n"/><s n="mf"/><s n="sp"/></r></p></e>
Here is what the above entries do, in lt-expand format:
con:con<n><m><sg> → gilipollas:gilipollas<n><mf><sp>
cons:con<n><m><pl> → gilipollas:gilipollas<n><mf><sp>
conne:con<n><f><sg> → gilipollas:gilipollas<n><mf><sp>
connes:con<n><f><pl> → gilipollas:gilipollas<n><mf><sp>
?:con<n><GD><ND> ← gilipollas:gilipollas<n><mf><sp>
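Since the four left-to-right entries are simply every gender/number combination, they can be generated instead of typed by hand. The Python sketch below produces the five con/gilipollas entries in the same string layout as this page: one `r="LR"` entry per combination, then a single `r="RL"` entry with `GD` and `ND`. The helper names `side` and `gd_nd_entries` are hypothetical, and lemmas are assumed to need no XML escaping.

```python
from itertools import product
from typing import List, Sequence


def side(tag: str, lemma: str, symbols: Sequence[str]) -> str:
    """<l> or <r> element text; assumes the lemma needs no XML escaping."""
    syms = "".join(f'<s n="{s}"/>' for s in symbols)
    return f"<{tag}>{lemma}{syms}</{tag}>"


def gd_nd_entries(left: str, right: str, right_syms: Sequence[str],
                  genders=("m", "f"), numbers=("sg", "pl")) -> List[str]:
    """One LR entry per gender/number pair, plus one RL entry with GD and ND."""
    target = side("r", right, right_syms)
    entries = []
    for gender, number in product(genders, numbers):
        source = side("l", left, ["n", gender, number])
        entries.append(f'<e r="LR"><p>{source}{target}</p></e>')
    source = side("l", left, ["n", "GD", "ND"])
    entries.append(f'<e r="RL"><p>{source}{target}</p></e>')
    return entries


# Prints the five entries in the order m/sg, m/pl, f/sg, f/pl, then GD/ND.
for e in gd_nd_entries("con", "gilipollas", ["n", "mf", "sp"]):
    print(e)
```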
Adjectives
See also
- Tips for working on bilingual dictionaries
- Building dictionaries#Generating bilingual dictionary entries
- Ideas for Google Summer of Code/Improved bilingual dictionary induction
- Ideas for Google Summer of Code/Template-based bilingual dictionary
- Converting a bilingual dictionary to Grammatical Framework