Difference between revisions of "Turkish and Azerbaijani"
Revision as of 09:32, 25 April 2011
Todo
Trmorph
- Fix punctuation -- 3271696
- Proper names in caps
- Subcategorise proper names as far as possible
- Spaces in words in trmorph
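- subcategorise converb suffixes (see https://sourceforge.net/mailarchive/forum.php?thread_name=20110422162330.GM2542%40rug.nl&forum_name=apertium-turkic)
- subcategorise verbal noun suffixes (see https://sourceforge.net/mailarchive/forum.php?thread_name=20110422162330.GM2542%40rug.nl&forum_name=apertium-turkic)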
- Check words in the corpus which are analysed but are followed by an apostrophe and an unknown word (e.g. a case ending); perhaps they need to be added as proper nouns, e.g. Okyanusu'nun
- find a way of exporting trmorph for apertium into two files, one with lexemes (that we specify -- e.g. the ones in the bilingual dictionary), and the other with the morphophonology.
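A rough sketch of how the apostrophe check above could be started, assuming the corpus is available as plain text. It only lists capitalised stems that are followed by an apostrophe and a suffix, with their frequencies; whether the stem is analysed and the suffix is unknown still has to be checked against the analyser. The function name `apostrophe_stems` and the regular expression are illustrative, not part of trmorph or apertium.

```python
import re
from collections import Counter

# A run of letters, an apostrophe (straight or curly), another run of letters.
# [^\W\d_] matches any Unicode letter, so Turkish characters are covered.
APOSTROPHE_TOKEN = re.compile(r"\b([^\W\d_]+)['’]([^\W\d_]+)\b")


def apostrophe_stems(text):
    """Count capitalised stems that appear before an apostrophe + suffix."""
    counts = Counter()
    for stem, _suffix in APOSTROPHE_TOKEN.findall(text):
        if stem[0].isupper():  # proper-noun candidates only
            counts[stem] += 1
    return counts


if __name__ == "__main__":
    sample = "Hint Okyanusu'nun suları, Ankara'da ve Okyanusu'nun kıyısı."
    for stem, freq in apostrophe_stems(sample).most_common():
        print(freq, stem)
```

The most frequent stems in the output are the likeliest candidates for adding as proper nouns.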
Azmorph
- Add consonant harmony -- Azerbaijani, besides having double vowel harmony like Turkish, has consonant harmony: Q/K (as well as their voiced counterparts ğ/y) alternate according to the preceding vowel.
- Remove 040-exception_ben.fst -- Unlike Turkish, Azerbaijani doesn't have an irregular dative for men and sen (personal pronouns).
- Disambiguation is badly broken: for some unknown reason içerler is taken as a noun rather than as <v><t_aor><3pp>. Needs fixing ASAP
- Fix punctuation
- Proper names in caps -- works in trmorph
- Subcategorise proper names as far as possible
- Interrogative mi
- Add da/de cnjcoo as d<A>, since it follows vowel harmony
- subcategorise converb suffixes (See here)
- subcategorise verbal noun suffixes (see here)
- remove apostrophe from case endings after proper names etc.
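To illustrate the consonant harmony item above, here is a minimal sketch of the Q/K alternation on the infinitive suffix (almaq vs. gəlmək). It is not how azmorph implements it; the archiphoneme symbols `A` and `K` and the function `realise` are hypothetical names chosen for this example, stems are assumed to be lowercase, and the ğ/y variants that occur between vowels are not modelled.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eəiöü")


def last_vowel(stem):
    """Return the last vowel of a lowercase stem, or None if it has none."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def realise(stem, suffix):
    """Attach a suffix, resolving the archiphonemes A (a/ə) and K (q/k).

    Both are chosen from the backness of the stem's last vowel:
    back vowel -> a, q; front vowel -> ə, k.
    """
    vowel = last_vowel(stem)
    if vowel is None:
        raise ValueError("stem has no vowel: %r" % stem)
    back = vowel in BACK_VOWELS
    out = []
    for ch in suffix:
        if ch == "A":
            out.append("a" if back else "ə")
        elif ch == "K":
            out.append("q" if back else "k")
        else:
            out.append(ch)
    return stem + "".join(out)


if __name__ == "__main__":
    for stem in ["al", "gəl", "oxu", "gör"]:
        print(stem, "->", realise(stem, "mAK"))
```

The same lookup would apply to any other suffix written with `K`, which is why a single archiphoneme rule is likely simpler than listing both variants of every suffix.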
Other
- Finish adding closed categories
- Fix punctuation
- Numerals in the bidix from corpus
- add proper support for compound numerals in the bidix
- Finish adding nouns from the CSV list
- try some sentences/paragraphs and fix disambiguation/transfer errors
- write a script which generates stem lists for azmorph from the bilingual dictionary -- only stems which can be added automatically (some cannot, e.g. those with a final <k>)
- continue adding words from the missing list
- write up test cases on the wiki
- find a girlfriend for the boss
- the corpus is here, you'll need to clean it.
- make a script to trim the trmorph lexicon to the bidix... we don't want to analyse anything more than we can translate!
- FIX: yemek <v> does a weird thing with the vowel
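Both script items above (generating stem lists from the bilingual dictionary, and trimming the lexicon to the bidix) start from the same step: pulling the lemmas out of the bidix. A minimal sketch follows, assuming the standard apertium bidix XML layout (`<e><p><l>…</l><r>…</r></p></e>`, with `<s n="…"/>` for tags and `<b/>` for spaces). The sample entries, the tag names in them and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the usual bidix layout; real entries will differ.
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>ev<s n="n"/></l><r>ev<s n="n"/></r></p></e>
    <e><p><l>gel<s n="v"/></l><r>gəl<s n="v"/></r></p></e>
    <e><p><l>ne<b/>zaman<s n="adv"/></l><r>nə<b/>vaxt<s n="adv"/></r></p></e>
  </section>
</dictionary>"""


def side_lemma(side):
    """Return (lemma, first tag) for one <l> or <r> element."""
    parts = [side.text or ""]
    first_tag = None
    for child in side:
        if child.tag == "b":  # <b/> marks a space inside a multiword lemma
            parts.append(" ")
        elif child.tag == "s" and first_tag is None:
            first_tag = child.get("n")
        parts.append(child.tail or "")
    return "".join(parts).strip(), first_tag


def bidix_stems(xml_text, side="l"):
    """Collect (lemma, first tag) pairs from one side of the bidix."""
    root = ET.fromstring(xml_text)
    stems = set()
    for pair in root.iter("p"):
        elem = pair.find(side)
        if elem is not None:
            lemma, tag = side_lemma(elem)
            if lemma:
                stems.add((lemma, tag))
    return stems


def trim_lexicon(lexicon, stems):
    """Keep only the lexicon entries that also occur in the bidix."""
    return [entry for entry in lexicon if entry in stems]


if __name__ == "__main__":
    for lemma, tag in sorted(bidix_stems(SAMPLE, side="r")):
        print(lemma, tag)
```

Stems that cannot be added automatically would still need to be filtered out by hand or by an extra rule before the list is fed to azmorph.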
Overgeneration
^ye<v><t_imp><2p>$
→ yesenize/yiyin/yiyiniz
^şükret<v><cv>$
→ şükredincə/şükredip/şükrediyor/şükredəcək/şükredəli/...
^<3s><dir>$
→ midir/mudur/müdür/mıdır
^Ye<v><pass><t_cont><3s>$
→ Yeniliyor/Yeniyor
^ye<v><t_imp><2s>$
→ ye/yesənə
^ölç<v><pass><abil><neg><t_aor><3s>$
→ ölçüləbilmir/ölçüləmir
^bir<num><D_sAr><adv>$
→ bir'ər/birər
See also
Further reading
- Vügar Sultanzade (????) Turkish - Azerbaijani Dictionary of Interlingual Homonyms and Paronyms