Difference between revisions of "Turkish and Azerbaijani"

From Apertium
Line 17:
  #Disambiguation sucks big time: I don't know why it doesn't take içerler as <v><t_aor><3pp> but as a noun. Need to fix ASAP
  # Fix punctuation
- # Proper names in caps
+ # Proper names in caps -- works in trmorph
  # Subcategorise proper names as far as possible
  # Interrogative mi

Revision as of 21:30, 22 April 2011

Todo

Trmorph

  1. Fix punctuation -- 3271696
  2. Proper names in caps
  3. Subcategorise proper names as far as possible
  4. Spaces in words in trmorph
  5. subcategorise converb suffixes (See here)
  6. subcategorise verbal noun suffixes (see here)

Azmorph

  1. Add consonant harmony |--| Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony: Q/K (as well as their softened versions ğ/y) change according to the preceding vowel.
  2. Remove 040-exception_ben.fst |--| Unlike Turkish (ben → bana, sen → sana), Azerbaijani has no irregular dative for the personal pronouns mən and sən.
  3. Disambiguation sucks big time: I don't know why it doesn't take içerler as <v><t_aor><3pp> but as a noun. Need to fix ASAP
  4. Fix punctuation
  5. Proper names in caps -- works in trmorph
  6. Subcategorise proper names as far as possible
  7. Interrogative mi
  8. Add da/de cnjcoo as d<A>, since it follows vowel harmony
  9. subcategorise converb suffixes (See here)
  10. subcategorise verbal noun suffixes (see here)
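
Items 1 and 8 above both come down to choosing a surface letter from the last vowel already written. The following is a minimal Python sketch of that idea, not the azmorph implementation: the `<A>` notation is taken from item 8, while `<Q>`, the function names and the vowel sets are assumptions made for illustration. It expects lowercase input and does not cover the softened ğ/y variants.

```python
import re

# Assumed vowel classes for Azerbaijani (lowercase only).
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eəiöü")


def last_vowel_is_back(text):
    """Return True if the last vowel in text is a back vowel."""
    for ch in reversed(text):
        if ch in BACK_VOWELS:
            return True
        if ch in FRONT_VOWELS:
            return False
    raise ValueError("no vowel found in %r" % text)


def realise(stem, template):
    """Attach a suffix template to a stem, resolving <A> and <Q>.

    <A> becomes a after a back vowel and ə after a front vowel;
    <Q> (hypothetical notation) becomes q or k in the same way.
    """
    out = stem
    for token in re.findall(r"<[AQ]>|.", template):
        if token == "<A>":
            out += "a" if last_vowel_is_back(out) else "ə"
        elif token == "<Q>":
            out += "q" if last_vowel_is_back(out) else "k"
        else:
            out += token
    return out


if __name__ == "__main__":
    for stem, template in [("yaz", "m<A><Q>"), ("gəl", "m<A><Q>"),
                           ("kitab", "d<A>"), ("ev", "d<A>")]:
        print(stem, template, realise(stem, template))
```

Because each archiphoneme is resolved against the output built so far, a `<Q>` that follows an `<A>` automatically agrees with it.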

Other

  • Finish adding closed categories
  • Fix punctuation
  • Numerals in the bidix
  • Finish adding nouns from the CSV list
  • try some sentences/paragraphs and fix disambiguation/transfer errors
  • write a script which generates stem lists for azmorph from the bilingual dictionary -- only stems which can be added automatically (some of them have e.g. <k> final or something)
  • continue adding words from the missing list
  • write up test cases on the wiki
  • find a girlfriend for the boss
  • the corpus is here, you'll need to clean it.
  • make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!
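
The stem-list script mentioned above could start from something like the following Python sketch. It assumes the usual Apertium bidix layout (`<e><p><l>…</l><r>…</r></p></e>` with `<s n="…"/>` tags) and assumes that Azerbaijani is on the right-hand side. The sample entries, the function name and the rule for what counts as "cannot be added automatically" (here: stems ending in k or q) are all placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny made-up bidix fragment, used only to exercise the function.
SAMPLE = """<dictionary><section id="main" type="standard">
<e><p><l>ev<s n="n"/></l><r>ev<s n="n"/></r></p></e>
<e><p><l>ekmek<s n="n"/></l><r>çörək<s n="n"/></r></p></e>
<e><p><l>gel<s n="v"/></l><r>gəl<s n="v"/></r></p></e>
</section></dictionary>"""


def stems_from_bidix(xml_text, side="r", skip_final=("k", "q")):
    """Split bidix lemmas on one side into automatic and manual lists.

    Returns (auto, manual), each a list of (lemma, first_tag) pairs.
    Lemmas ending in one of skip_final are set aside for manual review.
    """
    root = ET.fromstring(xml_text)
    auto, manual = [], []
    for pair in root.iter("p"):
        node = pair.find(side)
        if node is None or not node.text:
            continue  # empty side or entry without plain lemma text
        lemma = node.text.strip()
        tag = node.find("s")
        pos = tag.get("n") if tag is not None else ""
        target = manual if lemma.endswith(skip_final) else auto
        target.append((lemma, pos))
    return auto, manual


if __name__ == "__main__":
    auto, manual = stems_from_bidix(SAMPLE)
    print("automatic:", auto)
    print("manual:", manual)
```

The set of left-side lemmas obtained with `side="l"` could serve the same way as the whitelist for trimming the trmorph lexicon.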

Overgeneration

  • ^ye<v><t_imp><2p>$ → yesenize/yiyin/yiyiniz
  • ^şükret<v><cv>$ → şükredincə/şükredip/şükrediyor/şükredəcək/şükredəli/...
  • ^<3s><dir>$ → midir/mudur/müdür/mıdır
  • ^Ye<v><pass><t_cont><3s>$ → Yeniliyor/Yeniyor
  • ^ye<v><t_imp><2s>$ → ye/yesənə
  • ^ölç<v><pass><abil><neg><t_aor><3s>$ → ölçüləbilmir/ölçüləmir
  • ^bir<num><D_sAr><adv>$ → bir'ər/birər
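
Entries like the ones above could be checked mechanically. The sketch below is a small Python helper that assumes each line has exactly the `^analysis$ → form/form/...` shape used in this list; the function names are made up for illustration, and it does not call the generator itself.

```python
def parse_generation(line):
    """Split '^analysis$ → a/b/c' into (analysis, [a, b, c])."""
    analysis, _, forms = line.partition("→")
    analysis = analysis.strip().lstrip("^").rstrip("$")
    return analysis, [form for form in forms.strip().split("/") if form]


def overgenerated(lines):
    """Return the analyses that map to more than one surface form."""
    flagged = {}
    for line in lines:
        analysis, forms = parse_generation(line)
        if len(forms) > 1:
            flagged[analysis] = forms
    return flagged


if __name__ == "__main__":
    sample = [
        "^ye<v><t_imp><2s>$ → ye/yesənə",
        "^bir<num><D_sAr><adv>$ → bir'ər/birər",
    ]
    for analysis, forms in overgenerated(sample).items():
        print(analysis, len(forms), forms)
```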

Examples

Turkish

Avrupa, Afrika'nın kuzeyinde, Asya'nın batısında ve Atlas Okyanusu'nun doğusunda bulunan kıta.

Azerbaijani

Avropa, Afrikanın şimalında, Asiyanın qərbində, Atlantik okeanının şərqində yerləşən qitə.

An example showing that little word reordering is needed.

Turkish

Bilişim ve telekomünikasyon teknolojilerinin asıl altyapı göstergelerinden biri olan telefon iletişimi ülke halkının en çok istifade ettiği iletişim aracı olmaya devam etmektedir.

Azerbaijani

İnformasiya və kommunikasiya texnologiyalarının əsas infrastruktur göstəricilərindən biri olan telefon rabitəsi ölkə əhalisinin ən çox istifadə etdiyi rabitə vasitəsi olmaqda davam edir.

English

Telephone communication, which is one of the essential infrastructure indicators of information and telecommunication technologies, continues to be the means of communication most used by the people of the country.

See also

Further reading

  • Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms