Tasks for GCI: Crossing Dictionaries

From Apertium
Revision as of 00:51, 14 November 2010 by Jimregan (talk | contribs) (start writing an explanation on the wiki instead of doing it in an email)

Many of our tasks are 'task families': the process is the same, and only the languages involved differ. Crossing dictionaries is one such task. The wiki has other information on crossing dictionaries, but I would like to keep this document as self-contained as possible. If you have a question that isn't answered here, ask on the Talk page, and I will update the page to answer it.

Firstly, and most importantly, you are not required to know all three languages involved in the crossing. Any knowledge you have will be helpful, but the intermediate language matters in only a few ways, and ultimately only the two languages in the expected output really matter.

What is dictionary crossing?

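Dictionary crossing, sometimes called triangulation, combines two bilingual dictionaries that share a language. For each word in the first dictionary, its translation into the shared (intermediate) language is used as the lookup key in the second dictionary, which yields a translation between the two remaining languages.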
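
As a rough illustration of the lookup, here is a minimal Python sketch that works on plain word pairs rather than on Apertium's XML format. The function name `cross` and the sample word pairs are assumptions made for this example only; they are not part of any Apertium tool.

```python
from collections import defaultdict


def cross(first, second):
    """Cross two lists of word pairs through their shared language.

    `first` holds (source, intermediate) pairs and `second` holds
    (intermediate, target) pairs. Hypothetical helper, for illustration.
    """
    # Index the second dictionary by its intermediate-language side.
    by_intermediate = defaultdict(list)
    for intermediate, target in second:
        by_intermediate[intermediate].append(target)

    crossed = []
    seen = set()
    for source, intermediate in first:
        # The translation from the first dictionary is the lookup key.
        for target in by_intermediate.get(intermediate, []):
            if (source, target) not in seen:
                seen.add((source, target))
                crossed.append((source, target))
    return crossed


# Sample data (assumed for illustration): "gato" has no entry in the
# second list, so "cat" produces no crossed pair.
first = [("house", "casa"), ("cat", "gato")]
second = [("casa", "casă")]
print(cross(first, second))
```

Words whose intermediate translation is missing from the second dictionary simply drop out, which is one reason the crossed output still needs checking by hand.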

Let's say that we want to use English-Spanish and Spanish-Romanian (which Apertium has) to create a dictionary for English-Romanian (which Apertium does not have).

As an example, given the English-Spanish entries:

    <e r="LR"><p><l>dog<s n="n"/></l><r>perro<s n="n"/><s n="GD"/></r></p></e>
    <e r="RL"><p><l>dog<s n="n"/></l><r>perro<s n="n"/></r></p></e>

and the Spanish-Romanian entries:

      <e>
        <p>
          <l>perro<s n="n"/></l>
          <r>câine<s n="n"/></r>
        </p>
      </e>

we would ideally like to see the output:

    <e r="LR"><p><l>dog<s n="n"/></l><r>câine<s n="n"/><s n="GD"/></r></p></e>
    <e r="RL"><p><l>dog<s n="n"/></l><r>câine<s n="n"/></r></p></e>

(Now, you would think that 'dog' would be a simple example, but even at this stage we run into some transfer details. We really do encourage anyone who is interested in taking on tasks with us to first take on a task around the New Language Pair HOWTO, which will give you some of the practical knowledge needed to perform our other tasks.)
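
The 'dog' example above can be reproduced with a short Python sketch using the standard library's `xml.etree.ElementTree`. This is only a sketch under stated assumptions, not how any Apertium tool does it: the `<section>` wrapper is there just to make the fragments parseable, entries are matched on lemma plus first tag, and any further tags on the intermediate side (here `GD`) are assumed to carry over to the output. It handles only the simple entry shape shown above.

```python
import xml.etree.ElementTree as ET

EN_ES = """<section>
<e r="LR"><p><l>dog<s n="n"/></l><r>perro<s n="n"/><s n="GD"/></r></p></e>
<e r="RL"><p><l>dog<s n="n"/></l><r>perro<s n="n"/></r></p></e>
</section>"""

ES_RO = """<section>
<e><p><l>perro<s n="n"/></l><r>câine<s n="n"/></r></p></e>
</section>"""


def side(elem):
    """Return (lemma, [tag names]) for an <l> or <r> element."""
    return (elem.text or "", [s.get("n") for s in elem.findall("s")])


def render(name, lemma, tags):
    """Write one side back out as dictionary markup."""
    symbols = "".join('<s n="%s"/>' % t for t in tags)
    return "<%s>%s%s</%s>" % (name, lemma, symbols, name)


def cross(first_xml, second_xml):
    # Index the second dictionary by (lemma, first tag) of its left side.
    index = {}
    for e in ET.fromstring(second_xml).iter("e"):
        lemma, tags = side(e.find("p/l"))
        index.setdefault((lemma, tuple(tags[:1])), []).append(side(e.find("p/r")))

    out = []
    for e in ET.fromstring(first_xml).iter("e"):
        src_lemma, src_tags = side(e.find("p/l"))
        mid_lemma, mid_tags = side(e.find("p/r"))
        for tgt_lemma, tgt_tags in index.get((mid_lemma, tuple(mid_tags[:1])), []):
            # Assumption: tags after the first one on the intermediate
            # side (such as GD) are copied onto the target side.
            extra = [t for t in mid_tags[1:] if t not in tgt_tags]
            restriction = ' r="%s"' % e.get("r") if e.get("r") else ""
            out.append("<e%s><p>%s%s</p></e>" % (
                restriction,
                render("l", src_lemma, src_tags),
                render("r", tgt_lemma, tgt_tags + extra)))
    return out


for entry in cross(EN_ES, ES_RO):
    print(entry)
```

The direction restriction (`r="LR"` or `r="RL"`) is taken unchanged from the first dictionary; whether that is always right, and whether tags like `GD` should really be copied, is exactly the kind of transfer detail that needs a human check.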