Difference between revisions of "Turkish and Azerbaijani"

From Apertium
 
(121 intermediate revisions by 14 users not shown)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
==Noun morphology==
 
   
  +
==Source==
Turkish nouns have six cases:
 
 
absolute, definite-accusative, dative, locative, ablative, genitive
 
 
It also has pronominal clitics.
 
 
The suffixes are typically attached in the following order:
 
 
# plural suffix
 
# suffix of possession
 
# case-ending
 
# personal suffix
 
   
 
<pre>
 
<pre>
  +
https://github.com/apertium/apertium-tur-aze
kitap, for example, is the stem
 
  +
https://github.com/apertium/apertium-aze
kitap + plural + pronoun
 
  +
https://github.com/coltekin/TRmorph
kitaplar means "books"


a noun takes five case suffixes; the absolute form is unmarked


the direct object takes the accusative, the "i case"


for example, "give me that book":

bana o kitabı ver


"that book"

kitabı


here it is marked as the direct object
 
 
from that book = kitaptan
 
in that book = kitapta
 
 
"from my book"
 
kitab+ım+dan
 
 
"from my books"
 
kitap+lar+ım+dan
 
 
 
</pre>
 
</pre>
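The suffix order and the kitap/bira examples above can be approximated with a short Python sketch. It is only an illustration under simplifying assumptions, not the way TRmorph or the Apertium lexicons implement this: the template notation (<code>A</code>, <code>I</code>, <code>D</code>, <code>(I)</code>), the function names and the blanket final-stop voicing rule are all hypothetical, and only the plural, the first person singular possessive, the locative and the ablative are covered.

```python
# Illustrative sketch only: a simplified model of Turkish noun suffixation.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("çfhkpsşt")
# Final-stop voicing before a vowel (kitap -> kitab-). Assumption: applied to
# every stem, although real Turkish has lexical exceptions.
VOICING = {"p": "b", "ç": "c", "t": "d", "k": "ğ"}
FOUR_WAY = {"a": "ı", "ı": "ı", "o": "u", "u": "u",
            "e": "i", "i": "i", "ö": "ü", "ü": "ü"}

# Hypothetical suffix templates: A = a/e, I = ı/i/u/ü, D = d/t,
# (I) = buffer vowel that appears only after a consonant.
PLURAL = "lAr"
POSSESSIVE_P1SG = "(I)m"
CASES = {"loc": "DA", "abl": "DAn"}


def last_vowel(word):
    """Return the last vowel of word, which drives vowel harmony."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def attach(word, template):
    """Attach one suffix template to word, resolving A, I and D."""
    if template.startswith("(I)"):
        template = template[3:] if word[-1] in VOWELS else "I" + template[3:]
    # Voice the final stop when the suffix starts with a vowel.
    if template and template[0] in "AI" and word[-1] in VOICING:
        word = word[:-1] + VOICING[word[-1]]
    for ch in template:
        if ch == "A":
            word += "a" if last_vowel(word) in BACK_VOWELS else "e"
        elif ch == "I":
            word += FOUR_WAY[last_vowel(word)]
        elif ch == "D":
            word += "t" if word[-1] in VOICELESS else "d"
        else:
            word += ch
    return word


def inflect(stem, plural=False, p1sg=False, case=None):
    """Apply suffixes in the order plural, possessive, case."""
    word = stem
    if plural:
        word = attach(word, PLURAL)
    if p1sg:
        word = attach(word, POSSESSIVE_P1SG)
    if case is not None:
        word = attach(word, CASES[case])
    return word


print(inflect("kitap", plural=True, p1sg=True, case="abl"))  # kitaplarımdan
```

The fixed order inside <code>inflect</code> mirrors the numbered list; the accusative and the personal suffix are left out to keep the sketch short.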
   
  +
==Todo==
==Agglutination case==
 
 
verb = gitmek, stem = git
 
 
<pre>
 
I'm going = gidiyorum (Turkish)

= gedirəm (Azerbaijani)


gid+iyor+um (present continuous, pr1, Turkish)

ged+ir+əm (present continuous, pr1, Azerbaijani)


git (stem) -i -yor (continuous tense) -um (first person singular) (Turkish)

get (stem) -i -r (continuous tense) -əm (first person singular) (Azerbaijani)
 
</pre>
 
 
 
==Test case==
 
 
*Turkish: biram var.
 
*Azerbaijani: pivəm var
 
   
  +
===Trmorph===
beer+p1 have
 
   
  +
# Check words in the corpus that are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); perhaps they need to be added as proper nouns, e.g. Okyanusu'nun
I have a beer.
 
   
  +
===Azmorph===
*Turkish: iki biram var
 
  +
# Add consonant harmony |--| Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony. Q/K (as well as their voiced counterparts ğ/y) change according to the preceding vowel.
*Azerbaijani: iki pivəm var
 
  +
# Add voicing of final t to d, e.g. getmək → geDirəm
  +
# Remove 040-exception_ben.fst |--| Unlike Turkish, Azerbaijani doesn't have an irregular dative for mən and sən (personal pronouns).
  +
# Disambiguation is poor: içerler is analysed as a noun instead of <v><t_aor><3pp>, and it is unclear why. Needs fixing as soon as possible.
  +
# Fix punctuation
  +
# Subcategorise proper names as far as possible
  +
# Interrogative mi
  +
# Add da/de cnjcoo as d<A>, since it follows vowel harmony
  +
# Remove the apostrophe from case endings after proper names etc.
   
  +
===Other===
two beer+p1 have
 
   
  +
* add proper support for compound numerals in the bidix
I have two beers
 
  +
* write up test cases on the wiki
  +
* find a girlfriend for the boss
  +
* the corpus is [http://elx.dlsi.ua.es/~fran/SETIMES/source/tr-en/setimes.tr here], you'll need to clean it.
  +
* <s>make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate ! </s>
  +
** '''Improve the trimming script(s)'''
  +
* FIX: yemek <v> does something odd with the vowel
  +
* '''write disambiguation rules'''
  +
* '''Subcategorise proper names as far as possible''': One way of doing this would be with Wikipedia -- at least for toponyms.
   
===Noun===
+
===Overgeneration===
   
* <code>abs</code> &mdash; absolute
 
* <code>dac</code> &mdash; definite-accusative
 
* <code>dat</code> &mdash; dative
 
* <code>abl</code> &mdash; ablative
 
* <code>loc</code> &mdash; locative
 
* <code>gen</code> &mdash; genitive
 
   
  +
* <code>^<q><3s><dir>$</code> → midir/mudur/müdür/mıdır
====Turkish====
 
  +
* <code>^Ye<v><pass><t_cont><3s>$</code> → Yeniliyor/Yeniyor
  +
* <code>^ye<v><t_imp><2s>$</code> → ye/yesənə
  +
* <code>^ölç<v><pass><abil><neg><t_aor><3s>$</code> → ölçüləbilmir/ölçüləmir
  +
* <code>^bir<num><D_sAr><adv>$</code> → bir'ər/birər
   
  +
==See also==
{|class=wikitable
 
! person !! n.sg.abs !! n.sg.dac !! n.sg.dat !! n.sg.loc !! n.sg.abl !! n.sg.gen
 
|-
 
|''none''|| bira || || || || ||
 
|-
 
| p1.sg || biram || || || || ||
 
|-
 
| p2.sg || || || || || ||
 
|-
 
| p3.sg || || || || || ||
 
|-
 
| p1.pl || || || || || ||
 
|-
 
| p2.pl || || || || || ||
 
|-
 
| p3.pl || || || || || ||
 
|-
 
| || || || || || || <!-- nothing here -->
 
|-
 
! person !! n.pl.abs !! n.pl.dac !! n.pl.dat !! n.pl.loc !! n.pl.abl !! n.pl.gen
 
|-
 
|''none''|| biralar || || || || ||
 
|-
 
| p1.sg || biralarım || || || || ||
 
|-
 
| p2.sg || || || || || ||
 
|-
 
| p3.sg || || || || || ||
 
|-
 
| p1.pl || || || || || ||
 
|-
 
| p2.pl || || || || || ||
 
|-
 
| p3.pl || || || || || ||
 
|-
 
| || || || || || || <!-- nothing here -->
 
|}
 
   
  +
* [[Turkish]]
====Azerbaijani====
 
  +
* [[/Pending tests|Pending tests]]
  +
* [[/Regression tests|Regression tests]]
   
  +
==External links==
{|class=wikitable
 
! person !! n.sg.abs !! n.sg.dac !! n.sg.dat !! n.sg.loc !! n.sg.abl !! n.sg.gen
 
|-
 
|''none''|| pivə || || || || ||
 
|-
 
| p1.sg || pivəm || || || || ||
 
|-
 
| p2.sg || || || || || ||
 
|-
 
| p3.sg || || || || || ||
 
|-
 
| p1.pl || || || || || ||
 
|-
 
| p2.pl || || || || || ||
 
|-
 
| p3.pl || || || || || ||
 
|-
 
| || || || || || || <!-- nothing here -->
 
|-
 
! person !! n.pl.abs !! n.pl.dac !! n.pl.dat !! n.pl.loc !! n.pl.abl !! n.pl.gen
 
|-
 
|''none''|| pivələr || || || || ||

|-

| p1.sg || pivələrim || || || || ||
 
|-
 
| p2.sg || || || || || ||
 
|-
 
| p3.sg || || || || || ||
 
|-
 
| p1.pl || || || || || ||
 
|-
 
| p2.pl || || || || || ||
 
|-
 
| p3.pl || || || || || ||
 
|-
 
| || || || || || || <!-- nothing here -->
 
|}
 
   
  +
* [http://www.tdk.org.tr/lehceler/Default.aspx Pan-Turkic dictionary]
====Comparison====
 
   
  +
==Further reading==
{|class=wikitable
 
! Turkish !! Azerbaijani !! Gloss !! Symbols
 
|-
 
| bira || pivə || beer || <code>n.sg</code>
 
|-
 
| biralar || pivələr || beers || <code>n.pl</code>
 
|-
 
| biram || pivəm || my beer || <code>n.sg.p1</code>
 
|-
 
| biralarım || pivələrim || my beers || <code>n.pl.p1</code>
 
|-
 
| biradan || pivədən || from the beer || <code>n.sg.fromcase</code>
 
|-
 
| biralardan || pivələrdən || from the beers || <code>n.pl.fromcase</code>
 
|-
 
| biramdan || pivəmdən || from my beer || <code>n.sg.p1.fromcase</code>
 
|-
 
| biralarımdan || pivələrimdən || from my beers || <code>n.pl.p1.fromcase</code>
 
|}
 
   
  +
* Vügar Sultanzade (????) ''Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms''
===Verb===
 
   
  +
[[Category:Turkish to Azerbaijani]]
{|class=wikitable
 
! Turkish !! Azerbaijani !! Gloss
 
|-
 
| var || var || I have
 
|-
 
| || || You have
 
|-
 
| || || He has
 
|-
 
| || || She has
 
|-
 
| || || It has
 
|-
 
| || || You (pl.) have
 
|-
 
| || || We have
 
|-
 
| || || They have
 
|}
 

Latest revision as of 18:27, 8 March 2018

Source

https://github.com/apertium/apertium-tur-aze
https://github.com/apertium/apertium-aze
https://github.com/coltekin/TRmorph

Todo

Trmorph

  1. Check words in the corpus that are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); perhaps they need to be added as proper nouns, e.g. Okyanusu'nun

Azmorph

  1. Add consonant harmony |--| Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony. Q/K (as well as their voiced counterparts ğ/y) change according to the preceding vowel.
  2. Add voicing of final t to d, e.g. getmək → geDirəm
  3. Remove 040-exception_ben.fst |--| Unlike Turkish, Azerbaijani doesn't have an irregular dative for mən and sən (personal pronouns).
  4. Disambiguation is poor: içerler is analysed as a noun instead of <v><t_aor><3pp>, and it is unclear why. Needs fixing as soon as possible.
  5. Fix punctuation
  6. Subcategorise proper names as far as possible
  7. Interrogative mi
  8. Add da/de cnjcoo as d<A>, since it follows vowel harmony
  9. Remove the apostrophe from case endings after proper names etc.
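
The Q/K alternation and the t → d voicing from the list above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the Azmorph rule set: the function names are hypothetical, the voicing is restricted to a hand-listed set of stems, and the present-tense function only handles consonant-final stems.

```python
# Illustrative sketch only: suffix-consonant choice (q/k) and t -> d voicing.
BACK = set("aıou")
FRONT = set("eəiöü")
VOWELS = BACK | FRONT
FOUR_WAY = {"a": "ı", "ı": "ı", "o": "u", "u": "u",
            "e": "i", "ə": "i", "i": "i", "ö": "ü", "ü": "ü"}
# Assumption: stems whose final t voices before a vowel are listed by hand.
T_VOICING_STEMS = {"get", "et"}


def last_vowel(word):
    """Return the last vowel of word, which drives both harmonies."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def infinitive(stem):
    """Pick -maq after a back vowel and -mək after a front vowel."""
    return stem + ("maq" if last_vowel(stem) in BACK else "mək")


def present_p1sg(stem):
    """Build stem + -Ir + -Am for consonant-final stems only."""
    if stem in T_VOICING_STEMS:
        stem = stem[:-1] + "d"  # get- becomes ged- before the vowel
    word = stem + FOUR_WAY[last_vowel(stem)] + "r"
    return word + ("am" if last_vowel(word) in BACK else "əm")


print(infinitive("get"), present_p1sg("get"))  # getmək gedirəm
print(infinitive("yaz"), present_p1sg("yaz"))  # yazmaq yazıram
```

In a finite-state implementation the same choices would normally be expressed as archiphonemes resolved by context rules, which is what the <code>geDirəm</code> notation above hints at.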

Other

  • add proper support for compound numerals in the bidix
  • write up test cases on the wiki
  • find a girlfriend for the boss
  • the corpus is here: http://elx.dlsi.ua.es/~fran/SETIMES/source/tr-en/setimes.tr (you'll need to clean it)
  • make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!
    • Improve the trimming script(s)
  • FIX: yemek <v> does something odd with the vowel
  • write disambiguation rules
  • Subcategorise proper names as far as possible: One way of doing this would be with Wikipedia -- at least for toponyms.

Overgeneration

  • ^<q><3s><dir>$ → midir/mudur/müdür/mıdır
  • ^Ye<v><pass><t_cont><3s>$ → Yeniliyor/Yeniyor
  • ^ye<v><t_imp><2s>$ → ye/yesənə
  • ^ölç<v><pass><abil><neg><t_aor><3s>$ → ölçüləbilmir/ölçüləmir
  • ^bir<num><D_sAr><adv>$ → bir'ər/birər
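
Entries like those above can be collected automatically once generator output is available in the same "analysis → form/form" shape. The following Python sketch only parses and filters such lines; how the forms were produced is outside its scope. The function names are hypothetical, and the last sample entry is an invented single-form case added for contrast.

```python
# Illustrative sketch only: flag analyses that generate more than one form.
def parse_entry(line):
    """Split 'analysis → form1/form2' into (analysis, [forms])."""
    analysis, arrow, forms = line.partition("→")
    if not arrow:
        raise ValueError(f"no arrow in {line!r}")
    return analysis.strip(), [f.strip() for f in forms.split("/") if f.strip()]


def find_overgeneration(lines):
    """Return {analysis: forms} for analyses with several distinct forms."""
    flagged = {}
    for line in lines:
        analysis, forms = parse_entry(line)
        unique = list(dict.fromkeys(forms))  # drop repeats, keep order
        if len(unique) > 1:
            flagged[analysis] = unique
    return flagged


sample = [
    "^ye<v><t_imp><2s>$ → ye/yesənə",
    "^bir<num><D_sAr><adv>$ → bir'ər/birər",
    "^kitap<n><pl>$ → kitaplar",  # hypothetical entry with a single form
]

for analysis, forms in find_overgeneration(sample).items():
    print(analysis, forms)
```

Only the first two sample entries are reported, since the third yields a single surface form.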

See also

  • Turkish
  • Pending tests
  • Regression tests

External links

  • Pan-Turkic dictionary (http://www.tdk.org.tr/lehceler/Default.aspx)

Further reading

  • Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms