Difference between revisions of "Turkish and Azerbaijani"
(correcting) |
|||
(95 intermediate revisions by 13 users not shown) | |||
Line 1: | Line 1: | ||
{{TOCD}} |
{{TOCD}} |
||
==Noun morphology== |
|||
==Source== |
|||
Turkish has several cases: |
|||
absolute, definite-accusative, dative, locative, ablative, genitive |
|||
It also has pronominal clitics. |
|||
Typically these are applied in the following order: |
|||
# plural suffix |
|||
# suffix of possession |
|||
# case-ending |
|||
# personal suffix |
|||
<pre> |
<pre> |
||
https://github.com/apertium/apertium-tur-aze |
|||
kitap for ex. is the stem |
|||
https://github.com/apertium/apertium-aze |
|||
kitap + plural + pronoun |
|||
https://github.com/coltekin/TRmorph |
|||
kitaplar is the "books" |
|||
a noun has five cases |
|||
object direction is the "i case" |
|||
give me that book for ex. |
|||
bana o kitabı ver |
|||
"that book" |
|||
kitabı |
|||
that is directed to object |
|||
from that book = kitaptan |
|||
in that book = kitapta |
|||
"from my book" |
|||
kitab+ım+dan |
|||
"from my books" |
|||
kitap+lar+ım+dan |
|||
</pre> |
</pre> |
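The suffix order listed above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the slot names and the function <code>build_noun</code> are hypothetical, and the caller must pass suffixes that are already harmonised (the helper does no vowel harmony or consonant changes).

```python
# Slot order taken from the list above: plural, possession, case, personal suffix.
# The slot names are hypothetical labels chosen for this sketch.
SLOT_ORDER = ("plural", "possessive", "case", "personal")


def build_noun(stem, **suffixes):
    """Join already-harmonised suffixes onto a stem in the fixed slot order.

    Returns a (segmented, surface) pair, e.g. ("kitap+lar", "kitaplar").
    """
    unknown = set(suffixes) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")
    # Keyword order does not matter: slots are always emitted in SLOT_ORDER.
    parts = [stem] + [suffixes[slot] for slot in SLOT_ORDER if slot in suffixes]
    return "+".join(parts), "".join(parts)


if __name__ == "__main__":
    print(build_noun("kitap", plural="lar", possessive="ım", case="dan"))
    print(build_noun("kitab", possessive="ım", case="dan"))
```

The stem alternation ''kitap'' / ''kitab'' is left to the caller here; a real analyser would have to model it together with vowel harmony.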
||
==Todo== |
|||
==Agglutination case== |
|||
verb= gitmek stem=git |
|||
<pre> |
|||
I'm going = gidiyorum (Turkish)
          = gedirəm (Azerbaijani)

gid+iyor+um (present continuous, pr1, Turkish)
ged+ir+əm (present continuous, pr1, Azerbaijani)

git (lemma) -i -yor (for continuous tense) -um (for first person) (Turkish)
get (lemma) -i -r (for continuous tense) -əm (for first person) (Azerbaijani)
|||
</pre> |
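The stem change in ''git'' → ''gid-'' (and Azerbaijani ''getmək'' → ''gedirəm'') can be sketched as follows. This is a minimal illustration, not how the transducers do it: the function <code>attach</code> and the set <code>VOICING_STEMS</code> are hypothetical, and the voicing is treated as a lexically listed property because not every stem in final ''t'' voices.

```python
# Vowels of Turkish and Azerbaijani (lowercase only).
VOWELS = set("aeıioöuüə")

# Hypothetical list of stems whose final t voices to d before a vowel.
# The change is lexically restricted, so it is listed rather than derived.
VOICING_STEMS = {"git", "et", "get"}


def attach(stem, suffixes):
    """Concatenate suffixes onto a verb stem, voicing final t where listed."""
    ending = "".join(suffixes)
    # Voicing only applies when the ending starts with a vowel.
    if stem in VOICING_STEMS and ending[:1] in VOWELS:
        stem = stem[:-1] + "d"
    return stem + ending


if __name__ == "__main__":
    print(attach("git", ["iyor", "um"]))  # Turkish
    print(attach("get", ["ir", "əm"]))    # Azerbaijani
```

Stems not in the list, and endings that begin with a consonant, are left unchanged.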
|||
==Vowel harmony== |
|||
{{see-also|Vowel harmony}} |
|||
Both Turkish and Azerbaijani, along with most other Turkic languages, exhibit vowel harmony. See the following table of inflections for the word ''pivə'', "beer", in Azerbaijani. Underlining marks a vowel that has been "harmonised".
|||
{|class=wikitable |
|||
! Azerbaijani !! Gloss |
|||
|- |
|||
|pivə || beer |
|||
|- |
|||
|pivələr || beers
|-
|pivələrim || my beers
|||
|- |
|||
|pivədən || from beer |
|||
|- |
|||
|pivəl<u>ə</u>rdən || from beers |
|||
|} |
|||
This poses a problem for both analysis and generation of word forms. In analysis it is possible to ''overanalyse'' words: for example, a paradigm with "a → e" for the plural ending ''-ler'' would accept both ''-ler'' and ''-lar''. We would then analyse both the correct form ''biralar'' and the incorrect form ''biraler''. This causes ambiguity (we shouldn't be analysing non-existent words!), especially with short words. It remains to be seen whether this ambiguity will be too great.
|||
One example of ambiguity would be with the word for "book", ''kitab''. The form ''kitabı'' means "his book", but the form ''kitabi'' (or ''kitabî'') means "bookish". This should not be too much of a problem as the two are different parts of speech and should be taken care of in the tagging stage. |
|||
The other problem is generation: we do not currently have a way in Apertium to enforce vowel harmony. It may be possible to use a spell-checker to do this (e.g. <code>hunspell</code> has specialised algorithms for both Azerbaijani and Turkish), or possibly we could use post-gen or write a new post-gen module for this.
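As a rough illustration of the analysis and generation problem described in this section, the sketch below picks the plural allomorph from the last vowel of the stem and rejects disharmonic forms such as ''biraler''. Everything here is an assumption made for the example (the function names, the language keys, lowercase-only input); it is not Apertium's, TRmorph's or hunspell's mechanism, and it ignores loanwords that break harmony (e.g. Turkish ''saat'' → ''saatler'').

```python
# Vowel classes; "ə" is included so the same check covers Azerbaijani.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eəiöü")

# Plural allomorphs per language (hypothetical keys chosen for this sketch).
PLURAL = {
    "tur": {"back": "lar", "front": "ler"},
    "aze": {"back": "lar", "front": "lər"},
}


def harmony_class(word):
    """Return "back" or "front" according to the last vowel of a lowercase word."""
    for ch in reversed(word):
        if ch in BACK_VOWELS:
            return "back"
        if ch in FRONT_VOWELS:
            return "front"
    raise ValueError(f"no vowel in {word!r}")


def plural(stem, lang):
    """Generate the harmonic plural of a stem."""
    return stem + PLURAL[lang][harmony_class(stem)]


def accepts_plural(form, lang):
    """Accept a plural form only if its suffix harmonises with the stem."""
    for suffix in PLURAL[lang].values():
        if form.endswith(suffix) and len(form) > len(suffix):
            stem = form[: -len(suffix)]
            try:
                return plural(stem, lang) == form
            except ValueError:
                return False
    return False


if __name__ == "__main__":
    print(plural("bira", "tur"), plural("pivə", "aze"))
    print(accepts_plural("biralar", "tur"), accepts_plural("biraler", "tur"))
```

Checking harmony at analysis time, rather than listing both allomorphs in one paradigm, is what keeps the non-existent form out of the analyser's output.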
|||
==Test case== |
|||
*Turkish: biram var. |
|||
*Azerbaijani: pivəm var |
|||
beer+p1 have |
|||
I have a beer. |
|||
*Turkish: iki biram var |
|||
*Azerbaijani: iki pivəm var
|||
two beer+p1 have |
|||
I have two beers |
|||
===Noun=== |
|||
===Trmorph=== |
|||
* <code>abs</code> — absolute |
|||
* <code>dac</code> — definite-accusative |
|||
* <code>dat</code> — dative |
|||
* <code>abl</code> — ablative |
|||
* <code>loc</code> — locative |
|||
* <code>gen</code> — genitive |
|||
* <code>com</code> — comitative |
|||
# Check words in corpus which are analysed, but then have an apostrophe and unknown word after (e.g. case ending) ... perhaps they need to be added as proper nouns. e.g. Okyanusu'nun |
|||
Underlined denotes the affix. |
|||
=== |
===Azmorph=== |
||
# Add consonant harmony |--| Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony. Q/K (as well as their devoiced version ğ/y) change according to the preceding vowel.
# Add voicing of final t to d, e.g. getmək → geDirəm
# Remove 040-exception_ben.fst |--| Unlike Turkish, Azerbaijani doesn't have an irregular dative for men and sen (personal pronouns).
|||
#Disambiguation sucks big time: I don't know why it doesn't take içerler as <v><t_aor><3pp> but as a noun. Need to fix ASAP |
|||
# Fix punctuation |
|||
# Subcategorise proper names as far as possible |
|||
# Interrogative mi |
|||
# Add da/de cnjcoo as d<A>, since it follows vowel harmony |
|||
# remove apostrophe from case endings after propernames etc. |
|||
===Other=== |
|||
{|class=wikitable |
|||
! person !! n.sg.abs !! n.sg.dac !! n.sg.dat !! n.sg.loc !! n.sg.abl !! n.sg.gen !! n.sg.com |
|||
|- |
|||
|''none''|| bira || bira<u>'''y'''ı</u> || bira<u>'''y'''a</u> || bira<u>da</u> || bira<u>dan</u> || bira<u>'''n'''ın</u> || bira<u>yla</u> |
|||
|- |
|||
| p1.sg || bira<u>m</u> || bira<u>mı</u> || bira<u>ma</u> || bira<u>mda</u>||bira<u>mdan</u> || bira<u>mın</u> || bira<u>mla</u> |
|||
|- |
|||
| p2.sg || bira<u>n</u> || bira<u>'''n'''ı</u> || bira<u>'''n'''a</u> || bira<u>nda</u> || bira<u>ndan</u> || bira<u>'''n'''ın</u> || bira<u>nla</u> |
|||
|- |
|||
| p3.sg || bira<u>sı</u> || bira<u>'''s'''ını</u> || bira<u>'''s'''ına</u> || bira<u>sında</u> || bira<u>sından</u> || bira<u>'''s'''ının</u> || bira<u>sıyla</u> |
|||
|- |
|||
| p1.pl || bira<u>mız</u> || bira<u>mızı</u> || bira<u>mıza</u> || bira<u>mızda</u> || bira<u>mızdan</u> || bira<u>mızın</u> || bira<u>mızla</u> |
|||
|- |
|||
| p2.pl || bira<u>nız</u> || bira<u>nızı</u> || bira<u>nıza</u> || bira<u>nızda</u> || bira<u>nızdan</u> || bira<u>nızın</u> || bira<u>nızla</u> |
|||
|- |
|||
| p3.pl || bira<u>sı</u> || bira<u>sını</u> || bira<u>sına</u> || bira<u>sında</u> || bira<u>sından</u> || bira<u>sının</u> || bira<u>sıyla</u> |
|||
|- |
|||
| || || || || || || <!-- nothing here --> |
|||
|- |
|||
! person !! n.pl.abs !! n.pl.dac !! n.pl.dat !! n.pl.loc !! n.pl.abl !! n.pl.gen !! n.pl.com |
|||
|- |
|||
|''none''|| bira<u>lar</u> || bira<u>ları</u> || bira<u>lara</u> || bira<u>larda</u> || bira<u>lardan</u> || bira<u>ların</u> || bira<u>larla</u> |
|||
|- |
|||
| p1.sg || bira<u>larım</u> || bira<u>larımı</u> || bira<u>larıma</u> || bira<u>larımda</u> || bira<u>larımdan</u> || bira<u>larımın</u> || bira<u>larımla</u>
|-
| p2.sg || bira<u>ların</u> || bira<u>larını</u> || bira<u>larına</u> || bira<u>larında</u> || bira<u>larından</u> || bira<u>larının</u> || bira<u>larınla</u>
|-
| p3.sg || bira<u>ları</u> || bira<u>larını</u> || bira<u>larına</u> || bira<u>larında</u> || bira<u>larından</u> || bira<u>larının</u> || bira<u>larıyla</u>
|-
| p1.pl || bira<u>larımız</u> || bira<u>larımızı</u> || bira<u>larımıza</u> || bira<u>larımızda</u> || bira<u>larımızdan</u> || bira<u>larımızın</u> || bira<u>larımızla</u>
|-
| p2.pl || bira<u>larınız</u> || bira<u>larınızı</u> || bira<u>larınıza</u> || bira<u>larınızda</u> || bira<u>larınızdan</u> || bira<u>larınızın</u> || bira<u>larınızla</u>
|-
| p3.pl || bira<u>ları</u> || bira<u>larını</u> || bira<u>larına</u> || bira<u>larında</u> || bira<u>larından</u> || bira<u>larının</u> || bira<u>larıyla</u>
|||
|- |
|||
| || || || || || || <!-- nothing here --> |
|||
|} |
|||
* add proper support for compound numerals in the bidix |
|||
The consonants in bold are only there to separate the vowels on either side of them; they do not belong to the form itself. If the stem (the noun in this case) ends in a consonant, those extra letters are dropped. For example, if the word is ''tabut'' (which ends with the letter ''t''), n.sg.dac without a possessor will be tabut'''u''' (the ''u'' is due to vowel harmony).
|||
* write up test cases on the wiki |
|||
* find a girlfriend for the boss |
|||
* the corpus is [http://elx.dlsi.ua.es/~fran/SETIMES/source/tr-en/setimes.tr here], you'll need to clean it. |
|||
* <s>make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate ! </s> |
|||
** '''Improve the trimming script(s)''' |
|||
* FIX: yemek <v> does weird thing with the vowel |
|||
* '''write disambiguation rules''' |
|||
* '''Subcategorise proper names as far as possible''': One way of doing this would be with Wikipedia -- at least for toponyms. |
|||
=== |
===Overgeneration=== |
||
{|class=wikitable |
|||
! person !! n.sg.abs !! n.sg.dac !! n.sg.dat !! n.sg.loc !! n.sg.abl !! n.sg.gen !! n.sg.com |
|||
|- |
|||
|''none''|| pivə || || || || || || |
|||
|- |
|||
| p1.sg || pivəm || || || || || || |
|||
|- |
|||
| p2.sg || || || || || || || |
|||
|- |
|||
| p3.sg || || || || || || || |
|||
|- |
|||
| p1.pl || || || || || || || |
|||
|- |
|||
| p2.pl || || || || || || || |
|||
|- |
|||
| p3.pl || || || || || || || |
|||
|- |
|||
| || || || || || || <!-- nothing here --> |
|||
|- |
|||
! person !! n.pl.abs !! n.pl.dac !! n.pl.dat !! n.pl.loc !! n.pl.abl !! n.pl.gen !! n.pl.com |
|||
|- |
|||
|''none''|| pivələr || || || || || ||
|-
| p1.sg || pivələrim || || || || || ||
|||
|- |
|||
| p2.sg || || || || || || || |
|||
|- |
|||
| p3.sg || || || || || || || |
|||
|- |
|||
| p1.pl || || || || || || || |
|||
|- |
|||
| p2.pl || || || || || || || |
|||
|- |
|||
| p3.pl || || || || || || || |
|||
|- |
|||
| || || || || || || <!-- nothing here --> |
|||
|} |
|||
* <code>^<q><3s><dir>$</code> → midir/mudur/müdür/mıdır |
|||
====Comparison==== |
|||
* <code>^Ye<v><pass><t_cont><3s>$</code> → Yeniliyor/Yeniyor |
|||
* <code>^ye<v><t_imp><2s>$</code> → ye/yesənə |
|||
* <code>^ölç<v><pass><abil><neg><t_aor><3s>$</code> → ölçüləbilmir/ölçüləmir |
|||
* <code> ^bir<num><D_sAr><adv>$</code> → bir'ər/birər |
|||
==See also== |
|||
{|class=wikitable |
|||
! Turkish !! Azerbaijani !! Gloss !! Symbols |
|||
|- |
|||
| bira || pivə || beer || <code>n.sg</code> |
|||
|- |
|||
| biralar || pivələr || beers || <code>n.pl</code>
|||
|- |
|||
| biram || pivəm || my beer || <code>n.sg.p1</code> |
|||
|- |
|||
| biralarım || pivələrim || my beers || <code>n.pl.p1</code>
|||
|- |
|||
| biradan || pivədən || from the beer || <code>n.sg.abl</code> |
|||
|- |
|||
| biralardan || pivələrdən || from the beers || <code>n.pl.abl</code> |
|||
|- |
|||
| biramdan || pivəmdən || from my beer || <code>n.sg.p1.abl</code> |
|||
|- |
|||
| biralarımdan || pivələrimdən || from my beers || <code>n.pl.p1.abl</code>
|||
|} |
|||
* [[Turkish]] |
|||
===Verb=== |
|||
* [[/Pending tests|Pending tests]] |
|||
* [[/Regression tests|Regression tests]] |
|||
==External links== |
|||
{|class=wikitable |
|||
! Turkish !! Azerbaijani !! Gloss |
|||
|- |
|||
| içerim || içirəm || I drink |
|||
|- |
|||
| içersin || içirsən || You drink |
|||
|- |
|||
| içer || içir || He drinks |
|||
|- |
|||
| içer || içir || She drinks |
|||
|- |
|||
| içer || içir || It drinks |
|||
|- |
|||
| içersiniz || içirsiniz || You (pl.) drink
|-
| içeriz || içirik || We drink
|-
| içerler || içirlər || They drink
|||
|} |
|||
* [http://www.tdk.org.tr/lehceler/Default.aspx Pan-Turkic dictionary] |
|||
==Examples== |
|||
==Further reading== |
|||
;Turkish |
|||
Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler. |
|||
* Vügar Sultanzade (????) ''Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms'' |
|||
;Azerbaijani |
|||
Bütün insanlar ləyaqət və hüquqlarına görə azad və bərabər doğulurlar. Onların şüurları və vicdanları var və bir-birlərinə münasibətdə qardaşlıq ruhunda davranmalıdırlar.
|||
[[Category:Turkish to Azerbaijani]] |
|||
;Azerbaijani (''turkified'') |
|||
Bütün insanlar <u>azadlıq</u>, ləyaqət və haqlarına görə bərabər doğulurlar. Onların <u>ağılları</u> və vicdanları var və onlar bir-birlərinə münasibətdə qardaşlıq ruhunda davranmalıdırlar.
Latest revision as of 18:27, 8 March 2018
Source
https://github.com/apertium/apertium-tur-aze https://github.com/apertium/apertium-aze https://github.com/coltekin/TRmorph
Todo
Trmorph
- Check words in corpus which are analysed, but then have an apostrophe and unknown word after (e.g. case ending) ... perhaps they need to be added as proper nouns. e.g. Okyanusu'nun
Azmorph
- Add consonant harmony |--| Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony. Q/K (as well as their devoiced version ğ/y) change according to the preceding vowel.
- Add voicing of final t to d, e.g. getmək → geDirəm
- Remove 040-exception_ben.fst |--| Unlike Turkish, Azerbaijani doesn't have an irregular dative for men and sen (personal pronouns).
- Disambiguation sucks big time: I don't know why it doesn't take içerler as <v><t_aor><3pp> but as a noun. Need to fix ASAP
- Fix punctuation
- Subcategorise proper names as far as possible
- Interrogative mi
- Add da/de cnjcoo as d<A>, since it follows vowel harmony
- remove apostrophe from case endings after propernames etc.
Other
- add proper support for compound numerals in the bidix
- write up test cases on the wiki
- find a girlfriend for the boss
- the corpus is at http://elx.dlsi.ua.es/~fran/SETIMES/source/tr-en/setimes.tr; you'll need to clean it.
- make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!
  - Improve the trimming script(s)
- FIX: yemek <v> does weird thing with the vowel
- write disambiguation rules
- Subcategorise proper names as far as possible: One way of doing this would be with Wikipedia -- at least for toponyms.
Overgeneration
- ^<q><3s><dir>$ → midir/mudur/müdür/mıdır
- ^Ye<v><pass><t_cont><3s>$ → Yeniliyor/Yeniyor
- ^ye<v><t_imp><2s>$ → ye/yesənə
- ^ölç<v><pass><abil><neg><t_aor><3s>$ → ölçüləbilmir/ölçüləmir
- ^bir<num><D_sAr><adv>$ → bir'ər/birər
See also
External links
Further reading
- Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms