Difference between revisions of "Turkish and Azerbaijani"
(36 intermediate revisions by 5 users not shown)
{{TOCD}}

==Source==
<pre>
https://github.com/apertium/apertium-tur-aze
https://github.com/apertium/apertium-aze
https://github.com/coltekin/TRmorph
</pre>

==Todo==
===Trmorph===
# Check words in the corpus which are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); perhaps they need to be added as proper nouns, e.g. Okyanusu'nun
# Fix punctuation -- [https://sourceforge.net/tracker/?func=detail&aid=3271696&group_id=224521&atid=1061990 3271696]
# <s>Proper names in caps</s>
# Subcategorise proper names as far as possible
# Spaces in words in TRmorph
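A first pass at the apostrophe item could simply list the stems that occur before an apostrophe in the corpus, as candidates to check against TRmorph. The sketch below is a hypothetical helper (<code>apostrophe_stems</code> is not part of any Apertium repository); it assumes a plain-text corpus split on whitespace, and the regular expression and punctuation list are assumptions. It does not call TRmorph, so deciding which stems are analysed but followed by an unknown suffix is still a manual step.

```python
import re
from collections import Counter

# Letters (or digits), an apostrophe, then more letters: e.g. "Okyanusu'nun".
# Python 3 str patterns are Unicode-aware, so \w also covers ç, ğ, ı, ö, ş, ü.
# Assumption: both the straight and the typographic apostrophe occur in the corpus.
APOSTROPHE_TOKEN = re.compile(r"^(\w+)['’](\w+)$")


def apostrophe_stems(text: str) -> Counter:
    """Count the stems that appear before an apostrophe-separated suffix."""
    counts = Counter()
    for token in text.split():
        # Drop surrounding punctuation, but keep apostrophes inside the token.
        token = token.strip(".,;:!?\"()")
        match = APOSTROPHE_TOKEN.match(token)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = ("Avrupa, Afrika'nın kuzeyinde, Asya'nın batısında "
              "ve Atlas Okyanusu'nun doğusunda bulunan kıta.")
    for stem, n in apostrophe_stems(sample).most_common():
        print(stem, n)
```

On the Turkish example sentence from this page this yields the stems Afrika, Asya and Okyanusu; digits also match <code>\w</code>, so years such as 1990'da are counted too.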
===Azmorph===
# Add consonant harmony -- besides having vowel harmony in both backness and rounding like Turkish, Azerbaijani has consonant harmony: q/k (as well as their intervocalic variants ğ/y) alternate according to the preceding vowel, q after back vowels and k after front vowels.
# Add final t voicing to d, e.g. getmək → gedirəm
# Remove 040-exception_ben.fst -- unlike Turkish, Azerbaijani doesn't have an irregular dative for the personal pronouns mən and sən.
# Disambiguation is very poor: for some reason içerler is analysed as a noun rather than as <v><t_aor><3pp>. Needs fixing ASAP.
# Fix punctuation
# Proper names in caps
# Subcategorise proper names as far as possible
# Interrogative mi
# Add da/də cnjcoo as d<A>, since it follows vowel harmony
# Remove the apostrophe from case endings after proper names etc.
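Several Azmorph items (consonant harmony, t/d voicing, d<A>) amount to choosing a surface letter from its context. As an illustration only, the toy Python realiser below makes that explicit; it is not how Azmorph implements these rules, and its symbol inventory is an assumption modelled on the d<A> and geDirəm notation above: A for two-way harmony (a/ə), I for four-way harmony (ı/i/u/ü), K for q/k harmony and D for t/d voicing before a vowel. Input is assumed to be lowercase apart from these symbols, an A, I or K with no preceding vowel is treated as an error, and the ğ/y alternation of stem-final q/k is not handled. The function name <code>realise</code> is hypothetical.

```python
# Toy realiser for archiphonemes in Azerbaijani forms (illustration only).
BACK = set("aıou")
FRONT = set("eəiöü")
ROUNDED = set("oöuü")
VOWELS = BACK | FRONT
ARCHI_VOWELS = {"A", "I"}  # archiphonemes that surface as vowels


def realise(form: str) -> str:
    """Replace A, I, K and D in a lowercase form by their surface letters.

    A -> a/ə      two-way harmony (backness of the preceding vowel)
    I -> ı/i/u/ü  four-way harmony (backness and rounding)
    K -> q/k      consonant harmony: q after back vowels, k after front ones
    D -> d/t      d before a vowel, t otherwise
    """
    out = []
    last = None  # most recent surface vowel
    for pos, ch in enumerate(form):
        if ch in ("A", "I", "K") and last is None:
            raise ValueError(f"no vowel to harmonise {ch!r} with in {form!r}")
        if ch == "A":
            ch = "a" if last in BACK else "ə"
        elif ch == "I":
            if last in BACK:
                ch = "u" if last in ROUNDED else "ı"
            else:
                ch = "ü" if last in ROUNDED else "i"
        elif ch == "K":
            ch = "q" if last in BACK else "k"
        elif ch == "D":
            nxt = form[pos + 1] if pos + 1 < len(form) else ""
            ch = "d" if nxt in VOWELS or nxt in ARCHI_VOWELS else "t"
        if ch in VOWELS:
            last = ch
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for form in ["almAK", "geDmAK", "geDIrAm", "olmAKdA", "gözmI"]:
        print(form, "->", realise(form))
```

With these assumptions, geDmAK gives getmək, geDIrAm gives gedirəm and olmAKdA gives olmaqda.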
===Other===
* add proper support for compound numerals in the bidix
* Finish adding closed categories
* Fix punctuation
* Numerals in the bidix
* Finish adding nouns from the CSV list
* try some sentences/paragraphs and fix disambiguation/transfer errors
* write a script which generates stem lists for azmorph from the bilingual dictionary -- only for stems which can be added automatically (some of them have e.g. a final <k> or something similar)
* continue adding words from the missing list
* write up test cases on the wiki
* find a girlfriend for the boss
* the corpus is [http://elx.dlsi.ua.es/~fran/SETIMES/source/tr-en/setimes.tr here]; you'll need to clean it.
* <s>make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!</s>
** '''Improve the trimming script(s)'''
* FIX: yemek <v> does weird things with the vowel
* '''write disambiguation rules'''
* '''Subcategorise proper names as far as possible''': one way of doing this would be with Wikipedia -- at least for toponyms.
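Both the trimming script and the Azmorph stem-list script start by reading the bilingual dictionary. Below is a minimal sketch of that step, assuming the usual Apertium .dix layout (<code>e</code> entries inside a <code>section</code>, each holding either a <code>p</code> pair with <code>l</code> and <code>r</code> sides or an identity <code>i</code>) with Turkish on the left and Azerbaijani on the right. The lexicon is modelled as a hypothetical list of (lemma, part-of-speech) pairs rather than TRmorph's actual lexicon files, the helper names are made up, and skipping stems that end in k/q is only a placeholder for the final-<k> cases mentioned above.

```python
import xml.etree.ElementTree as ET


def side(elem):
    """Return (lemma, first tag) for an l, r or i element of a bidix entry."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "b":  # <b/> is a blank inside a multiword lemma
            parts.append(" ")
        parts.append(child.tail or "")
    tags = [s.get("n") for s in elem.findall("s")]
    return "".join(parts), (tags[0] if tags else None)


def bidix_pairs(dix_xml):
    """Yield ((left_lemma, left_pos), (right_lemma, right_pos)) per entry."""
    root = ET.fromstring(dix_xml)
    for section in root.iter("section"):
        for e in section.iter("e"):
            p, i = e.find("p"), e.find("i")
            if p is not None:
                yield side(p.find("l")), side(p.find("r"))
            elif i is not None:
                yield side(i), side(i)
            # entries that only reference a paradigm are skipped in this sketch


def trim_lexicon(stems, dix_xml):
    """Keep only (lemma, pos) stems that occur on the left (Turkish) side."""
    known = {left for left, _ in bidix_pairs(dix_xml)}
    return [stem for stem in stems if stem in known]


def azmorph_stems(dix_xml):
    """Right-side (Azerbaijani) stems that look safe to add automatically."""
    stems = {right for _, right in bidix_pairs(dix_xml)
             if not right[0].endswith(("k", "q"))}  # placeholder rule
    return sorted(stems, key=lambda s: (s[0], s[1] or ""))


SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>kıta<s n="n"/></l><r>qitə<s n="n"/></r></p></e>
    <e><p><l>git<s n="v"/></l><r>get<s n="v"/></r></p></e>
    <e><p><l>çocuk<s n="n"/></l><r>uşaq<s n="n"/></r></p></e>
    <e><p><l>yer<b/>al<s n="v"/></l><r>yer<b/>al<s n="v"/></r></p></e>
    <e><i>insan<s n="n"/></i></e>
  </section>
</dictionary>"""

if __name__ == "__main__":
    print(trim_lexicon([("kıta", "n"), ("ev", "n"), ("git", "v")], SAMPLE))
    print(azmorph_stems(SAMPLE))
```

Entries that use <code>g</code> groups or only reference a paradigm are not handled, so a real script would need more care with multiwords.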
===Overgeneration===

==Examples==

;Turkish
Avrupa, Afrika'nın kuzeyinde, Asya'nın batısında ve Atlas Okyanusu'nun doğusunda bulunan kıta.

;Azerbaijani
Avropa, Afrikanın şimalında, Asiyanın qərbində, Atlantik okeanının şərqində yerləşən qitə.
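This pair also shows the difference behind the Azmorph item on apostrophes: Turkish writes Afrika'nın, Azerbaijani Afrikanın. The proper fix belongs in the Azerbaijani generator, but as a rough check on output, a naive post-processing sketch could delete any apostrophe that sits between two word characters. The function name is hypothetical, and nothing like this is claimed to exist in the pair.

```python
import re

# An apostrophe (straight or typographic) with a word character on each side.
INNER_APOSTROPHE = re.compile(r"(?<=\w)['’](?=\w)")


def drop_case_apostrophes(text: str) -> str:
    """Delete apostrophes inside words, e.g. Afrika'nın -> Afrikanın."""
    return INNER_APOSTROPHE.sub("", text)


if __name__ == "__main__":
    print(drop_case_apostrophes("Afrika'nın şimalında, Asiya'nın qərbində"))
```

This is deliberately blunt: it would also turn bir'ər into birər, and it cannot tell a case-ending apostrophe from any other apostrophe inside a word.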

;Turkish
Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler.

;Azerbaijani
Bütün insanlar ləyaqət və hüquqlarına görə azad və bərabər doğulurlar. Onların şüurları və vicdanları var və bir-birlərinə münasibətdə qardaşlıq ruhunda davranmalıdırlar.

;Azerbaijani (''turkified'')
Bütün insanlar <u>azadlıq</u>, ləyaqət və haqlarına görə bərabər doğulurlar. Onların <u>ağılları</u> və vicdanları var və onlar bir-birlərinə münasibətdə qardaşlıq ruhunda davranmalıdırlar.

An example showing that word reordering is rarely needed:

;Turkish
Bilişim ve telekomünikasyon teknolojilerinin asıl altyapı göstergelerinden biri olan telefon iletişimi ülke halkının en çok istifade ettiği iletişim aracı olmaya devam etmektedir.

;Azerbaijani
İnformasiya və kommunikasiya texnologiyalarının əsas infrastruktur göstəricilərindən biri olan telefon rabitəsi ölkə əhalisinin ən çox istifadə etdiyi rabitə vasitəsi olmaqda davam edir.

;English
Telephone communication, one of the essential infrastructure indicators of information and telecommunication technologies, continues to be the means of communication most used by the people of the country.

* <code><nowiki>^<q><3s><dir>$</nowiki></code> → midir/mudur/müdür/mıdır
* <code>^Ye<v><pass><t_cont><3s>$</code> → Yeniliyor/Yeniyor
* <code>^ye<v><t_imp><2s>$</code> → ye/yesənə
* <code>^ölç<v><pass><abil><neg><t_aor><3s>$</code> → ölçüləbilmir/ölçüləmir
* <code>^bir<num><D_sAr><adv>$</code> → bir'ər/birər
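Cases like these could be tracked as tests and flagged automatically. A minimal sketch, assuming the generator's result for each analysis has already been collected as a string in the notation used above, with alternative surface forms separated by /; how those strings are obtained from the generator is left out, and <code>surface_forms</code>, <code>overgenerated</code> and the <code>^get<v><inf>$</code> entry are hypothetical.

```python
def surface_forms(output: str) -> list[str]:
    """Split a generator result such as "ye/yesənə" into its alternatives."""
    return [form for form in output.strip().split("/") if form]


def overgenerated(results: dict[str, str]) -> dict[str, list[str]]:
    """Return only the analyses that produced more than one surface form."""
    flagged = {}
    for analysis, output in results.items():
        forms = surface_forms(output)
        if len(forms) > 1:
            flagged[analysis] = forms
    return flagged


# Results copied from the list above, plus one hypothetical single-form entry.
EXAMPLES = {
    "^ye<v><t_imp><2s>$": "ye/yesənə",
    "^bir<num><D_sAr><adv>$": "bir'ər/birər",
    "^get<v><inf>$": "getmək",  # hypothetical analysis, for contrast
}

if __name__ == "__main__":
    for analysis, forms in overgenerated(EXAMPLES).items():
        print(analysis, "->", ", ".join(forms))
```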

==See also==
* [[/Pending tests|Pending tests]]
* [[/Regression tests|Regression tests]]

==External links==
* [http://www.tdk.org.tr/lehceler/Default.aspx Pan-Turkic dictionary]

==Further reading==
Latest revision as of 18:27, 8 March 2018
Source
https://github.com/apertium/apertium-tur-aze
https://github.com/apertium/apertium-aze
https://github.com/coltekin/TRmorph
Todo
Trmorph
- Check words in the corpus which are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); perhaps they need to be added as proper nouns, e.g. Okyanusu'nun
Azmorph
- Add consonant harmony -- besides having vowel harmony in both backness and rounding like Turkish, Azerbaijani has consonant harmony: q/k (as well as their intervocalic variants ğ/y) alternate according to the preceding vowel, q after back vowels and k after front vowels.
- Add final t voicing to d, e.g. getmək → gedirəm
- Remove 040-exception_ben.fst -- unlike Turkish, Azerbaijani doesn't have an irregular dative for the personal pronouns mən and sən.
- Disambiguation is very poor: for some reason içerler is analysed as a noun rather than as <v><t_aor><3pp>. Needs fixing ASAP.
- Fix punctuation
- Subcategorise proper names as far as possible
- Interrogative mi
- Add da/də cnjcoo as d<A>, since it follows vowel harmony
- Remove the apostrophe from case endings after proper names etc.
Other
- add proper support for compound numerals in the bidix
- write up test cases on the wiki
- find a girlfriend for the boss
- the corpus is here; you'll need to clean it.
- make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!
  - Improve the trimming script(s)
- FIX: yemek <v> does weird things with the vowel
- write disambiguation rules
- Subcategorise proper names as far as possible: one way of doing this would be with Wikipedia -- at least for toponyms.
Overgeneration
- ^<q><3s><dir>$ → midir/mudur/müdür/mıdır
- ^Ye<v><pass><t_cont><3s>$ → Yeniliyor/Yeniyor
- ^ye<v><t_imp><2s>$ → ye/yesənə
- ^ölç<v><pass><abil><neg><t_aor><3s>$ → ölçüləbilmir/ölçüləmir
- ^bir<num><D_sAr><adv>$ → bir'ər/birər
See also
External links
Further reading
- Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms