Difference between revisions of "Turkish and Azerbaijani"

From Apertium

Revision as of 19:38, 6 August 2011

Source

https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tr-az
https://apertium.svn.sourceforge.net/svnroot/apertium/branches/azmorph
https://github.com/coltekin/TRmorph

Todo

Trmorph

  1. Check corpus words that are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); they may need to be added as proper nouns, e.g. Okyanusu'nun
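
A minimal sketch of how such candidates could be collected from raw corpus text, assuming plain Unicode input. The function name `apostrophe_candidates` is hypothetical, and the sketch does not check whether the part before the apostrophe is actually analysed by trmorph; it only lists base/ending pairs for manual review.

```python
import re
from collections import Counter

# A word, an apostrophe (straight or typographic), then a trailing ending.
APOSTROPHE_TOKEN = re.compile(r"(\w+)['’](\w+)")


def apostrophe_candidates(text):
    """Count (base, ending) pairs such as ('Okyanusu', 'nun') in the text."""
    counts = Counter()
    for base, ending in APOSTROPHE_TOKEN.findall(text):
        counts[(base, ending)] += 1
    return counts
```

The bases that occur most often are the first candidates to add as proper nouns.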

Azmorph

  1. Add consonant harmony: Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony. Q/K (as well as their lenited counterparts ğ/y) alternate according to the preceding vowel.
  2. Add voicing of stem-final t to d, e.g. getmək → gedirəm
  3. Remove 040-exception_ben.fst: unlike Turkish, Azerbaijani has no irregular dative for the personal pronouns mən and sən.
  4. Disambiguation is poor: içerler is taken as a noun rather than as <v><t_aor><3pp>, and it is not yet clear why. Needs fixing ASAP
  5. Fix punctuation
  6. Subcategorise proper names as far as possible
  7. Interrogative mi
  8. Add da/de cnjcoo as d<A>, since it follows vowel harmony
  9. Remove the apostrophe from case endings after proper names etc.
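
The consonant harmony in item 1 can be illustrated with a small sketch. This is not how azmorph implements it (the morphology is written as finite-state rules); it is a simplified Python illustration that attaches an infinitive suffix written here as -m{A}{K}, with both archiphonemes resolved from the last stem vowel. The function names are hypothetical, and exceptions (loanwords, monosyllables) are ignored. The vowel archiphoneme is resolved the same way as the d<A> in item 8.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eəiöü")


def last_vowel(stem):
    """Return the last vowel of a lowercase stem, or None if it has none."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def infinitive(stem, before_vowel=False):
    """Attach -m{A}{K}: {A} is a/ə and {K} is q/k (ğ/y before a vowel)."""
    vowel = last_vowel(stem)
    if vowel is None:
        raise ValueError("stem has no vowel: " + stem)
    back = vowel in BACK_VOWELS
    a = "a" if back else "ə"
    if before_vowel:
        k = "ğ" if back else "y"  # lenited form before a vowel-initial suffix
    else:
        k = "q" if back else "k"
    return stem + "m" + a + k
```

For example, `infinitive("al")` gives almaq and `infinitive("get")` gives getmək, while `infinitive("al", before_vowel=True) + "a"` gives the dative almağa.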

Other

  • add proper support for compound numerals in the bidix
  • write up test cases on the wiki
  • find a girlfriend for the boss
  • the corpus is here, you'll need to clean it.
  • make a script to trim the trmorph lexicon to the bidix: we don't want to analyse anything more than we can translate!
    • Improve the trimming script(s)
  • FIX: yemek <v> does something odd with the vowel
  • write disambiguation rules
  • Subcategorise proper names as far as possible: One way of doing this would be with Wikipedia -- at least for toponyms.
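
The trimming idea above could be sketched roughly as follows. This assumes the usual Apertium bidix layout (`<e><p><l>…</l><r>…</r></p></e>`, with `<b/>` standing for a space inside multiword lemmas) and a lexicon already loaded as (lemma, category) pairs. The pair representation, the sample entries, and the function names are assumptions for illustration, not the trmorph lexicon format or the project's actual script.

```python
import xml.etree.ElementTree as ET

# Made-up sample entries, for illustration only.
SAMPLE_BIDIX = """<dictionary><section id="main" type="standard">
<e><p><l>ev<s n="n"/></l><r>ev<s n="n"/></r></p></e>
<e><p><l>göz<b/>at<s n="v"/></l><r>bax<s n="v"/></r></p></e>
</section></dictionary>"""


def bidix_lemmas(bidix_xml):
    """Collect the left-side lemmas of a bidix given as an XML string."""
    root = ET.fromstring(bidix_xml)
    lemmas = set()
    for left in root.iter("l"):
        parts = [left.text or ""]  # text before the first child tag
        for child in left:
            if child.tag == "b":  # <b/> is a blank inside a multiword
                parts.append(" " + (child.tail or ""))
        lemma = "".join(parts).strip()
        if lemma:
            lemmas.add(lemma)
    return lemmas


def trim_lexicon(entries, lemmas):
    """Keep only the (lemma, category) entries whose lemma is in the bidix."""
    return [entry for entry in entries if entry[0] in lemmas]
```

With the sample above, `bidix_lemmas(SAMPLE_BIDIX)` yields the lemmas ev and göz at, and any lexicon entry with another lemma is dropped by `trim_lexicon`.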

Overgeneration

  • ^<3s><dir>$ → midir/mudur/müdür/mıdır
  • ^Ye<v><pass><t_cont><3s>$ → Yeniliyor/Yeniyor
  • ^ye<v><t_imp><2s>$ → ye/yesənə
  • ^ölç<v><pass><abil><neg><t_aor><3s>$ → ölçüləbilmir/ölçüləmir
  • ^bir<num><D_sAr><adv>$ → bir'ər/birər
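
Entries in this notation could be turned into regression checks with a small helper. The sketch below only parses lines of the form `^analysis$ → form1/form2` as written above and flags analyses with more than one surface form; it does not call the generator, and the function names are hypothetical.

```python
def parse_overgeneration(line):
    """Split '^analysis$ → a/b' into the analysis and its surface forms."""
    analysis, _, forms = line.partition("→")
    surface = [form.strip() for form in forms.split("/") if form.strip()]
    return analysis.strip(), surface


def overgenerated(lines):
    """Map each analysis with more than one surface form to those forms."""
    result = {}
    for line in lines:
        analysis, surface = parse_overgeneration(line)
        if len(surface) > 1:
            result[analysis] = surface
    return result
```

Once a case is fixed, its analysis should no longer show up in the output of `overgenerated` when the list is regenerated.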

See also

External links

Further reading

  • Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms