Vowel harmony
Both Turkish and Azerbaijani, along with most other Turkic languages, exhibit vowel harmony. See the following table of inflections for the word pivə, "beer", in Azerbaijani. In each inflected form, the suffix vowels are the ones that have been "harmonised" with the stem.
Azerbaijani | Gloss |
---|---|
pivə | beer |
pivələr | beers |
pivələrim | my beers |
pivədən | from beer |
pivələrdən | from beers |
This poses a problem for both analysis and generation of word forms. In analysis it is possible to overanalyse words. Say, for example, that we have a paradigm with an "a → e" alternation for the plural ending -ler, which would accept both -ler and -lar. For Turkish bira, "beer", we would then analyse both the correct form biralar and the incorrect form biraler. This causes problems because of ambiguity (we shouldn't be analysing non-existent words!), especially with short words. It remains to be seen whether this ambiguity will be too great.
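The overanalysis problem can be illustrated with a small sketch. This is not Apertium code: the function names are hypothetical, only the Turkish plural suffix is handled, and the input is assumed to be lowercase. It compares a naive analyser that accepts either suffix with one that also checks the suffix vowel against the last vowel of the stem.

```python
# Turkish vowel classes (lowercase only), used for two-way (a/e) harmony.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def last_vowel(text):
    """Return the last vowel in text, or None if it contains no vowel."""
    for ch in reversed(text):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def analyse_plural_naive(word):
    """Accept any stem followed by -lar or -ler, ignoring harmony."""
    if len(word) > 3 and word.endswith(("lar", "ler")):
        return word[:-3]
    return None


def analyse_plural_harmonic(word):
    """Accept the plural only if the suffix vowel agrees with the stem."""
    stem = analyse_plural_naive(word)
    if stem is None:
        return None
    vowel = last_vowel(stem)
    if vowel is None:
        return None
    expected = "lar" if vowel in BACK_VOWELS else "ler"
    return stem if word.endswith(expected) else None


for form in ("biralar", "biraler"):
    print(form, analyse_plural_naive(form), analyse_plural_harmonic(form))
```

The naive analyser returns the stem bira for both forms, while the harmony-aware one rejects biraler. A real analyser would also need to cope with loanwords and other stems that do not follow regular harmony, which this sketch ignores.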
One example of ambiguity would be with the word for "book", kitab. The form kitabı means "his book", but the form kitabi (or kitabî) means "bookish". This should not be too much of a problem as the two are different parts of speech and should be taken care of in the tagging stage.
The other problem is generation: we do not currently have a way in Apertium to enforce vowel harmony. It may be possible to use an external spell-checker to do this (e.g. hunspell has specialised algorithms for both Azerbaijani and Turkish), or possibly we could use post-gen or write a new post-gen module for this.
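As a rough idea of what a harmony-enforcing generation step would have to do, here is a minimal sketch for Azerbaijani. It is not an existing Apertium or hunspell interface: the `attach` function and the suffix templates, with `A` standing for a/ə and `I` for ı/i/u/ü, are assumptions made for illustration. Stems are assumed to be lowercase, and consonant alternations and irregular stems are not handled.

```python
# Azerbaijani vowel classes (lowercase only).
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eəiöü")
ROUNDED_VOWELS = set("ouöü")


def last_vowel(text):
    """Return the last vowel in text; fail if there is none."""
    for ch in reversed(text):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    raise ValueError(f"no vowel to harmonise with in {text!r}")


def attach(stem, suffix):
    """Attach a suffix template, resolving A (a/ə) and I (ı/i/u/ü)."""
    word = stem
    for ch in suffix:
        if ch == "A":
            # Two-way harmony: only backness of the preceding vowel matters.
            ch = "a" if last_vowel(word) in BACK_VOWELS else "ə"
        elif ch == "I":
            # Four-way harmony: backness and rounding both matter.
            prev = last_vowel(word)
            if prev in BACK_VOWELS:
                ch = "u" if prev in ROUNDED_VOWELS else "ı"
            else:
                ch = "ü" if prev in ROUNDED_VOWELS else "i"
        word += ch
    return word


for template in ("lAr", "lArIm", "dAn", "lArdAn"):
    print(attach("pivə", template))
```

Each template vowel is resolved against the word built so far, so harmony propagates through stacked suffixes: pivə plus `lArdAn` comes out as pivələrdən, and kitab plus `I` comes out as kitabı.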