Difference between revisions of "User:Firespeaker/Steps for writing a language pair"

From Apertium

Revision as of 04:22, 20 May 2013

Document Resources

  • If there are any dictionaries or other resources for the language pair, you should make a list of these.

Comparative Grammar

Transducers

Write a morphological transducer for each language. At first they can be pretty basic—i.e., you don't have to get through all the steps for each one.

Start a bidix

Map some tags

Add some nouns

Add some verbs

Other word classes

Solving more complicated translation problems

There are many sorts of translation problems where a simple one-to-one mapping in dix won't work.

Disambiguation

The lowest level of these translation problems is morphological disambiguation. This is needed when one form has multiple interpretations, which need to be sorted out before any words are looked up in dix. For example, the form енді can have the following readings: ен<n><acc> "width", ен<v><iv><ifi><p3><pl> "they entered", ен<v><iv><ifi><p3><sg> "s/he/it entered", енді<adv> "now". Without the help of disambiguation, the morphological analyser spits out all of these readings, and one of them is chosen at random (for our purposes) to translate through dix. If the wrong reading is chosen, you get weird things like "They entered I know answer" instead of "Now I know the answer".
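To make the ambiguity concrete, here is a minimal Python sketch that splits one ambiguous lexical unit into its readings. It assumes the unit is written in the Apertium stream format (`^surface/reading/reading$`); the function name `parse_lexical_unit` is made up for illustration, and the sketch ignores escaped characters, so it is not a replacement for a real stream parser.

```python
import re


def parse_lexical_unit(unit):
    """Split '^surface/reading1/reading2$' into (surface, [(lemma, tags), ...]).

    Illustrative helper only: it does not handle escaped '/' or '<'.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^surface/reading$")
    surface, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]           # text before the first tag
        tags = re.findall(r"<([^>]+)>", analysis)   # tag names without brackets
        readings.append((lemma, tags))
    return surface, readings


# The four readings of енді listed above, written as one ambiguous unit.
ENDI = "^енді/ен<n><acc>/ен<v><iv><ifi><p3><pl>/ен<v><iv><ifi><p3><sg>/енді<adv>$"

surface, readings = parse_lexical_unit(ENDI)
for lemma, tags in readings:
    print(surface, "->", lemma, tags)
```

Each printed line is one candidate reading; disambiguation is the job of cutting this list down to one before the word reaches dix.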

To disambiguate these readings, you need to make rules based on the grammatical context. E.g., if the next word is a verb, then you can remove the verb readings from the list of possible correct readings; if the word is at the end of the sentence, you can remove the noun reading from the list of possible correct readings; etc. You can also choose a default reading (in this case, the adverb reading is by far the most common) and select the other readings only in very specific contexts: not just in certain grammatical positions, but also when they occur with certain other words.
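The rules just described can be sketched in Python as follows. This is only an illustration of the logic, not how an Apertium pair implements it: in practice such rules are usually written in a dedicated formalism such as Constraint Grammar. The names `Reading`, `remove`, `select`, `disambiguate` and `DEFAULT_POS` are assumptions made for this example, and `FOLLOWING_VERB` is a hypothetical placeholder token rather than a real Kazakh word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    lemma: str
    tags: tuple

    @property
    def pos(self):
        # By convention the first tag is the part of speech.
        return self.tags[0]


def remove(readings, pos):
    """Drop readings with this part of speech, but never drop the last reading."""
    kept = [r for r in readings if r.pos != pos]
    return kept if kept else readings


def select(readings, pos):
    """Keep only readings with this part of speech, if there are any."""
    kept = [r for r in readings if r.pos == pos]
    return kept if kept else readings


# Default reading per surface form (assumption: only енді has one here).
DEFAULT_POS = {"енді": "adv"}


def disambiguate(sentence, defaults=DEFAULT_POS):
    """sentence is a list of (surface, [Reading, ...]) pairs."""
    output = []
    for i, (surface, readings) in enumerate(sentence):
        if i < len(sentence) - 1:
            # Rule 1: the next word is unambiguously a verb -> remove verb readings.
            if all(r.pos == "v" for r in sentence[i + 1][1]):
                readings = remove(readings, "v")
        else:
            # Rule 2: the word is at the end of the sentence -> remove noun readings.
            readings = remove(readings, "n")
        # Rule 3: if still ambiguous, fall back to the default reading.
        if len(readings) > 1 and surface in defaults:
            readings = select(readings, defaults[surface])
        output.append((surface, readings))
    return output


ENDI = ("енді", [
    Reading("ен", ("n", "acc")),
    Reading("ен", ("v", "iv", "ifi", "p3", "pl")),
    Reading("ен", ("v", "iv", "ifi", "p3", "sg")),
    Reading("енді", ("adv",)),
])
FOLLOWING_VERB = ("VERB", [Reading("placeholder", ("v", "tv"))])  # hypothetical

before_verb = disambiguate([ENDI, FOLLOWING_VERB])
rules_only = disambiguate([ENDI, FOLLOWING_VERB], defaults={})
sentence_final = disambiguate([ENDI], defaults={})
```

With the default in place, `before_verb` leaves only the adverb reading for енді. Passing `defaults={}` shows what the contextual rules do on their own: before a verb the noun and adverb readings survive, and at the end of a sentence the two verb readings and the adverb reading survive.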

Lexical Selection

Transfer

Evaluate