Machine translation courses for the languages of Russia/Session 3

From Apertium
Revision as of 17:50, 18 December 2011 by Francis Tyers

The aim of this session is to give an overview of the issue of morphological ambiguity, and describe how it is treated in Constraint Grammar. We will give some theory regarding different types of morphological ambiguity, and an overview of how they are dealt with using rules. The practice section will involve discovering tagging errors and trying to solve the errors with rules.

Theory

TODO; find new examples

The ambiguity that we are going to cover in this session is morphological ambiguity. This is the ambiguity that arises when a surface form has more than one possible morphological analysis (also referred to as homonymy, literally "same-name-ness"). For example, the Italian word pubblico can be:

  • The verb "pubblicare" in the present indicative tense, first person singular.
  • The masculine noun "pubblico" in the singular number.
  • The adjective "pubblico" inflected for masculine, singular.
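As a rough illustration, the three analyses above can be written down as a small data structure. This is only a sketch: the `Analysis` class, the helper functions and the tag names (`vblex`, `pri`, `p1`, `n`, `adj`, and so on) are assumptions modelled on Apertium-style tags, not the output of any particular analyser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One possible reading of a surface form (hypothetical structure)."""
    lemma: str
    tags: tuple  # morphological tags, with the part of speech first


# Illustrative readings for the surface form "pubblico".
PUBBLICO = [
    Analysis("pubblicare", ("vblex", "pri", "p1", "sg")),  # verb, present indicative, 1st person singular
    Analysis("pubblico", ("n", "m", "sg")),                # noun, masculine singular
    Analysis("pubblico", ("adj", "m", "sg")),              # adjective, masculine singular
]


def is_ambiguous(analyses):
    """A surface form is morphologically ambiguous if it has more than one analysis."""
    return len(analyses) > 1


def format_cohort(surface, analyses):
    """Render the surface form and all of its readings on one line."""
    readings = "/".join(
        a.lemma + "".join(f"<{t}>" for t in a.tags) for a in analyses
    )
    return f"^{surface}/{readings}$"


print(is_ambiguous(PUBBLICO))
print(format_cohort("pubblico", PUBBLICO))
```

Keeping every reading side by side like this is what later allows rules to remove or select individual readings according to context.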

The fact that pubblico as a noun can be translated into German as Öffentlichkeit, Publikum, Zuschauer, etc. is translational ambiguity rather than morphological ambiguity, and so is not treated in this session.

Morphological ambiguity

There are two principal types of morphological ambiguity: ambiguity between parts of speech (for example, a word that could be either a noun or a verb) and ambiguity within a part of speech (for example, a word form that can only be a noun, but may be nominative or genitive). Typically, the more complex the morphology of a language, the higher the ratio of within part-of-speech ambiguity to between part-of-speech ambiguity.
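The distinction between the two types can be sketched in code. The following is a minimal illustration, assuming each reading is a `(lemma, tags)` pair whose first tag is the part of speech. The function name, the tag names and the placeholder lemma `X` are all hypothetical.

```python
from collections import Counter


def ambiguity_types(readings):
    """Return (between_pos, within_pos) for a list of (lemma, tags) readings.

    between_pos: the readings span more than one part of speech.
    within_pos:  some part of speech has more than one reading.
    A form can in principle show both kinds at once.
    """
    pos_counts = Counter(tags[0] for _lemma, tags in readings)
    between_pos = len(pos_counts) > 1
    within_pos = any(count > 1 for count in pos_counts.values())
    return between_pos, within_pos


# Illustrative readings: verb vs. noun vs. adjective.
pubblico = [
    ("pubblicare", ("vblex", "pri", "p1", "sg")),
    ("pubblico", ("n", "m", "sg")),
    ("pubblico", ("adj", "m", "sg")),
]

# Placeholder noun form that is either nominative or genitive.
noun_form = [
    ("X", ("n", "nom")),
    ("X", ("n", "gen")),
]

print(ambiguity_types(pubblico))
print(ambiguity_types(noun_form))
```

Counting how often each flag is set over the forms of a corpus would give a rough estimate of the ratio mentioned above.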

Between parts-of-speech

TODO; find new examples

An example of ambiguity between parts-of-speech is given above: the word pubblico can be a noun, a verb or an adjective. Consider also the frequent ambiguity between adjectives denoting ethnic groups (e.g. in Spanish francés, búlgaro, italiano) and the nouns denoting the corresponding languages (el francés, el búlgaro, el italiano).

Within parts-of-speech

TODO; find new examples

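For examples of ambiguity within parts-of-speech we can look at the Slavic languages, where there is frequent syncretism between nominative, accusative and genitive. In each of the sentences below (roughly, "An example may be the condensation of water"), the word for "water" is genitive singular, but the same form also serves as the nominative and accusative plural.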

  • Příkladem může být kondenzace vody. (cs)
  • Príkladom môže byť kondenzácia vody. (sk)
  • Przykładem może być kondensacja wody. (pl)
  • Primer je lahko kondenzacija vode. (sl)

This is not, however, exclusive to Slavic languages. Other languages also exhibit limited within part-of-speech ambiguity: consider the French temps (ambiguous between singular and plural) and the Irish fear (ambiguous between nominative singular and genitive plural).