Machine translation courses for the languages of Russia/Session 3
The aim of this session is to give an overview of the issue of morphological ambiguity, and describe how it is treated in Constraint Grammar. We will give some theory regarding different types of morphological ambiguity, and an overview of how they are dealt with using rules. The practice section will involve discovering tagging errors and trying to solve the errors with rules.
Theory
TODO; find new examples
This session covers morphological ambiguity: the ambiguity that arises when a surface form has more than one possible morphological analysis (also referred to as homonymy, literally "same-name-ness"). For example, the Italian word pubblico can be:
- The verb "pubblicare" in the present indicative tense, first person singular.
- The masculine noun "pubblico" in the singular number.
- The adjective "pubblico" inflected for masculine, singular.
Translational ambiguity is a separate issue. The fact that pubblico as a noun can be translated into German as Öffentlichkeit, Publikum, Zuschauer, etc. is not a matter of morphological ambiguity, and is therefore not treated in this session.
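One way to make this concrete is to write the three analyses of pubblico as a single ambiguous unit and split it into readings. The sketch below assumes an Apertium-style stream format (`^surface/reading/reading$`, with tags in angle brackets). The tag names (`vblex`, `pri`, `p1`, `n`, `adj`, `m`, `sg`) and the helper `parse_lexical_unit` are illustrative assumptions, not the output of any particular analyser.

```python
import re

def parse_lexical_unit(unit):
    """Split '^surface/reading1/reading2$' into (surface, [(lemma, [tags]), ...])."""
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("expected a unit of the form ^surface/reading/...$")
    # The first field is the surface form; the remaining fields are readings.
    surface, *readings = body[1:-1].split("/")
    analyses = []
    for reading in readings:
        lemma = reading.split("<", 1)[0]          # text before the first tag
        tags = re.findall(r"<([^>]+)>", reading)  # every <tag> in order
        analyses.append((lemma, tags))
    return surface, analyses

# Hypothetical analysis of the Italian form "pubblico" (tag names assumed).
unit = "^pubblico/pubblicare<vblex><pri><p1><sg>/pubblico<n><m><sg>/pubblico<adj><m><sg>$"
surface, analyses = parse_lexical_unit(unit)

for lemma, tags in analyses:
    print(surface, "->", lemma, tags)
```

A surface form is morphologically ambiguous in this representation whenever `len(analyses) > 1`. The German translations of the noun never appear here, which is why translational ambiguity is a separate problem.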
Morphological ambiguity
There are two principal types of morphological ambiguity: ambiguity between parts of speech (for example, a word that could be either a noun or a verb) and ambiguity within a part of speech (for example, a word form that can only be a noun, but may be nominative or genitive). Typically, the more complex the morphology of a language, the higher the ratio of within-part-of-speech ambiguity to between-part-of-speech ambiguity.
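The distinction can be sketched as a small check over a word's readings. The example below assumes each reading is a `(lemma, tags)` pair whose first tag is the part of speech. The function name `ambiguity_type`, the tag names, and the Czech form vody (with its genitive singular, nominative plural and accusative plural readings) are chosen for illustration only.

```python
def ambiguity_type(analyses):
    """Classify a list of (lemma, tags) readings; tags[0] is assumed to be the POS."""
    if len(analyses) < 2:
        return "unambiguous"
    parts_of_speech = {tags[0] for _, tags in analyses}
    if len(parts_of_speech) > 1:
        return "between parts of speech"
    # Same POS in every reading, so the readings differ only in other tags.
    return "within part of speech"

# Illustrative readings (tag names are assumptions, not real analyser output).
pubblico = [
    ("pubblicare", ["vblex", "pri", "p1", "sg"]),
    ("pubblico", ["n", "m", "sg"]),
    ("pubblico", ["adj", "m", "sg"]),
]
vody = [
    ("voda", ["n", "f", "sg", "gen"]),
    ("voda", ["n", "f", "pl", "nom"]),
    ("voda", ["n", "f", "pl", "acc"]),
]

print(ambiguity_type(pubblico))
print(ambiguity_type(vody))
```

A form can show both kinds of ambiguity at once. This sketch reports such a form as ambiguous between parts of speech, which is a simplification.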
Between parts-of-speech
TODO; find new examples
An example of ambiguity between parts of speech is given above: the word pubblico can be a noun, a verb or an adjective. Consider also the frequent ambiguity between adjectives denoting ethnic groups (e.g. in Spanish francés, búlgaro, italiano) and the nouns denoting the corresponding languages (el francés, el búlgaro, el italiano).
Within parts-of-speech
TODO; find new examples
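For examples of ambiguity within parts of speech we can look at the Slavic languages, where there is frequent syncretism between nominative, accusative and genitive. In each of the following sentences the final word (vody, wody, vode) is a genitive singular whose form coincides with the nominative and accusative plural.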
- Příkladem může být kondenzace vody. (cs)
- Príkladom môže byť kondenzácia vody. (sk)
- Przykładem może być kondensacja wody. (pl)
- Primer je lahko kondenzacija vode. (sl)
This is not, however, exclusive to Slavic languages. Other languages also exhibit limited within-part-of-speech ambiguity: consider French temps (ambiguous between singular and plural) and Irish fear (ambiguous between nominative singular and genitive plural).