Difference between revisions of "User:Firespeaker/Steps for writing a language pair"
Revision as of 04:22, 20 May 2013
Document Resources
- If there are any dictionaries or other resources for the language pair, you should make a list of these.
Comparative Grammar
Transducers
Write a morphological transducer for each language. At first they can be pretty basic—i.e., you don't have to get through all the steps for each one.
Start a bidix
Map some tags
Add some nouns
Add some verbs
Other word classes
Solving more complicated translation problems
There are many sorts of translation problems where a simple one-to-one mapping in dix won't work.
Disambiguation
The lowest level of these translation problems is morphological disambiguation. This is where one form has multiple interpretations. These need to be sorted out before any words are looked up in dix. For example, the form енді can have the following readings: ен<n><acc> "width", ен<v><iv><ifi><p3><pl> "they entered", ен<v><iv><ifi><p3><sg> "s/he/it entered", енді<adv> "now". Without the help of disambiguation, the morphological analyser spits out all of these readings, and one of them is chosen at random (for our purposes) to be translated through dix. If the wrong reading is chosen, you get weird things like "They entered I know answer" instead of "Now I know the answer".
To disambiguate these readings, you need to make rules based on the grammatical context. E.g., if the next word is a verb, then you can remove the verb reading from the list of possible correct readings; if the word is at the end of the sentence, you can remove the noun from the list of possible correct readings; etc. You can also choose a default reading (in this case, the adverb reading is by far the most common) and select other readings in very specific contexts—e.g., not just in certain grammatical positions, but when they occur with certain other words.
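The rules described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how disambiguation rules are actually written for a language pair. It assumes the analyser output is in the usual `^surface/reading/reading$` stream layout; the function names (`parse_unit`, `disambiguate`), the placeholder token `KNOW` and its tags, and the choice to treat the first tag as the part of speech are all assumptions made for the example.

```python
import re

def parse_unit(unit):
    """Split '^surface/reading/reading$' into (surface, [(lemma, [tags]), ...])."""
    surface, *analyses = unit.strip()[1:-1].split("/")
    readings = [(a.split("<", 1)[0], re.findall(r"<([^>]+)>", a)) for a in analyses]
    return surface, readings

def pos(reading):
    """First tag of a reading, taken here (as an assumption) to be its part of speech."""
    tags = reading[1]
    return tags[0] if tags else ""

def remove(readings, unwanted_pos):
    """Drop readings with the given part of speech, but never drop the last reading."""
    kept = [r for r in readings if pos(r) != unwanted_pos]
    return kept or readings

def disambiguate(sentence, i, default_pos="adv"):
    """Apply the contextual rules from the text to word i of the sentence."""
    readings = sentence[i][1]
    is_last = i == len(sentence) - 1
    # Rule 1: if the next word can only be a verb, remove the verb readings.
    if not is_last:
        next_readings = sentence[i + 1][1]
        if next_readings and all(pos(r) == "v" for r in next_readings):
            readings = remove(readings, "v")
    # Rule 2: at the end of the sentence, remove the noun reading.
    if is_last:
        readings = remove(readings, "n")
    # Fallback: if still ambiguous, prefer the default (most common) reading.
    if default_pos is not None and len(readings) > 1:
        preferred = [r for r in readings if pos(r) == default_pos]
        readings = preferred or readings
    return readings

# The four readings of енді from the text, in stream format.
ENDI = "^енді/ен<n><acc>/ен<v><iv><ifi><p3><pl>/ен<v><iv><ifi><p3><sg>/енді<adv>$"
# KNOW is a made-up placeholder for an unambiguous verb following енді.
sentence = [parse_unit(ENDI), parse_unit("^KNOW/know<v><tv>$")]

print(disambiguate(sentence, 0, default_pos=None))  # rules only: noun and adverb remain
print(disambiguate(sentence, 0))                    # with the default: [('енді', ['adv'])]
```

Calling `disambiguate` with `default_pos=None` shows what the contextual rules alone leave behind, which makes it easier to see where a more specific rule is still needed before falling back on the default reading.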