Difference between revisions of "Kazakh and Tatar"
		
		
		
		
		
		
Older revision by Firespeaker (talk | contribs), newer revision by Firespeaker (talk | contribs)
Line 47 (only the changed lines are listed; "-" marks the older revision, "+" the newer one):
+ :: This is a problem with lexc, not twol (added after "# АКШ-*тың НАТО-*ның"; signed Firespeaker, 06:15, 20 August 2012 (UTC))
- # (kaz) процесс, процесі,
+ # (kaz) процесс, процесі/процессі, процесінің/процессінің
- :: Wait, seriously??
Revision as of 06:15, 20 August 2012
This is a language pair translating between Kazakh and Tatar.
General TODO
See /Work_plan.
- Declension of Tatar nouns ending with -и.
- Set up the bidix-with-context.sh script (see apertium-kaz-tat/dev/bidix; it seems to be very useful, but requires another script from spectie)
- Add some of the short wikipedia-article-like texts I have for evaluation into texts (should be ~200 words).
- Implement cont. class for compound/multiword nouns which already have possessive ending (<px3sp>), e.g. Қытай Халық Республикасы.
- This continuation class should link only to CASE (but consider that some of them can have plural form: ишегаллары).
 
- Add "ярты", "ярым" and "чирек" as numerals, but don't link them to common numerals cont. class.
- (Lexical selection rule): сондай-ақ > шулай-ук
- Fix roman numerals:
- add them to tat.lexc too;
- change LEXICON NUM-ROMAN to something like this: %<num%>%<ord%>: # ;
 
- Add transfer rule(s) to handle instrumental case of all parts-of-speech which are subject to substantivation, not only of nouns (this is one of the things which make testvoc results look bad)
- A separate cont.class for verbs which have causative forms ending with -дыр/-дер
- A "location-cases" cont. class for some of the postpositions and location adverbs (e.g. "бире")
- 'Natinfl cont. class in tat.lexc
- Fix "дыр<mod_ind>" thing (it doesn't pass bidix right now)
- Pronouns
- check cont. classes (note: if it looks like an overgeneration, and I am not sure about it, overgenerate in both lexc's)
- translate pronouns from kaz.lexc, add them to bidix and add equivalents into tat.lexc
- ^нигез/ни<prn><itg><px2pl><nom>
 
- Determiners
- "unify" cont. classes and tags
- add stems
 
- Adjectives
- personal clitics after adjectives are not implemented yet
 
- Translating between classes
- General stuff
- Copula suffixes: Kazakh has -ø = p3.sp; Tatar has -ø = p3.sp and -lAr = p3.pl. Translating kaz->tat poses no problem; when translating tat->kaz, pl has to become sp.
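The tat->kaz direction of the copula note above can be sketched as a simple tag rewrite. This is only an illustration in Python, not the pair's actual transfer rules: the function name and the list-of-tags representation are assumptions made for this sketch, and only the tag names p3, sp and pl come from the note. It is meant to be applied to copula tags only.

```python
def tat_to_kaz_copula_tags(tags):
    """Collapse Tatar p3.pl copula agreement to Kazakh p3.sp.

    `tags` is a list of tag names without angle brackets
    (a hypothetical representation chosen for this sketch).
    """
    if "p3" not in tags:
        # Only the third-person copula differs between the two languages.
        return list(tags)
    # Kazakh has a single third-person copula form, tagged sp.
    return ["sp" if tag == "pl" else tag for tag in tags]


print(tat_to_kaz_copula_tags(["p3", "pl"]))  # ['p3', 'sp']
```

In the kaz->tat direction no rewrite is needed, since Kazakh p3.sp maps directly onto Tatar p3.sp.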
 
- Current: ^миллион<num><subst><dat>$ --> миллионге. Should be: ^миллион<num><subst><dat>$ --> миллионға
- Current: ^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлесгенде. Should be: ^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлескенде
- Deletion of soft sign "ь" before vowels in Tatar (see comments at the end of the apertium-tat/apertium-tat.tat.twol file)
- Kazakh: ^ойна<v><tv><ifi><p1><pl>$ --> ойнадык. Should be: ойнадыҚ
- (tat) *аенда
- *жатқандығын
- безнекенеме (accusative case before clitics); безнекенгәме
- *журналистерді - *журналистеріне - *журналистерді - something like т:0 <=> :с/:0 _ %{L%}:/:0 (r40597)
 
- (kaz) *Назарбаевтың (r40594)
- АКШ-*тың НАТО-*ның
- This is a problem with lexc, not twol —Firespeaker 06:15, 20 August 2012 (UTC)
 
- (kaz) *организмдер / организм<n><pl><nom> = организмдар (r40597)
- (kaz) процесс, процесі/процессі, процесінің/процессінің
- (kaz) автомобиль<n><attr>, *автомобильдер // автомобиль<n><pl><nom> = автомобильлер
- As far as I can tell, автомобильдер is the most common form. The form автомобилдер also seems to be used, but doesn't look super formal, and автомобильлер seems to only be attested in "Kazakh" because Nissan seems to like to write in Noğay for its Kazakh-speaking audience. —Firespeaker 06:10, 20 August 2012 (UTC)
 
International vocabulary
*терроризмге *массивіндегі *террорлық *Факті *кодекстің *терроризмге *Полицейлер *журналистерді «*АНТИТЕРРОРЛЫҚ » *полицейлер *антитеррорлық *режим *полицейлер *журналистерді *автоматты *автобустар *полицейлер *журналистеріне *сайттың *технологиялар *компьютер *мобильді *техникаларға *интернет *объектілерін *радиациялық *сантехник *проблемасы *веб-*сайттар *позитивті *алгебра *коалициялық *иммиграциялық *дипломатиялық *стратегиялық *станциясында
Proper nouns
*Гонконгтан
Discuss first
- There is only one formal form (<frm>) in Tatar, which can be both sg and plural. But in Kazakh there are two forms. Should I pretend as if in Tatar it *were* the same and duplicate the same form with a different tag or should I handle it in transfer?
- Consider турындагы - should it still be tagged as postposition?
- How to handle verbs with inner inflection? (Sometimes a Kazakh verb is translated with a multiword and vice versa, e.g. әуреле > башын әйләндер.)
Part-of-speech related TODO's and DONE's can be found here:
To run tests, use the aq-regtest utility from the Apertium-quality tools, e.g.:
aq-regtest -d . kaz-tat http://wiki.apertium.org/wiki/Special:Export/Kazakh_and_Tatar/Postadvebs
Done
- But keep an eye on this
- Numerals
- kaz <num><subst>(<px3>) in fractions[1] = tat <num><subst>(<px3>)
- kaz <num><coll><advl> = tat <num><coll>
- kaz <num><coll><subst> = tat <num><subst>
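The three numeral correspondences above can be read as a small lookup table over tag sequences. The following Python sketch illustrates that reading only; it is not how the pair implements them. The function name, the rule table, and the assumption that the numeral tags directly follow the lemma are all hypothetical. Only the tags are rewritten; the lemma is left untouched, since lemma translation is the bidix's job.

```python
import re

# kaz tag prefix -> tat tag prefix, taken from the list above.
# <num><subst>(<px3>) is the same on both sides, so it needs no rule.
NUMERAL_RULES = [
    (("num", "coll", "advl"), ("num", "coll")),
    (("num", "coll", "subst"), ("num", "subst")),
]


def kaz_to_tat_numeral_tags(analysis):
    """Rewrite the numeral tags of one analysis such as 'екі<num><coll><advl>'."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^<>]+)>", analysis)
    for src, dst in NUMERAL_RULES:
        if tuple(tags[:len(src)]) == src:
            # Replace the matched prefix and keep any trailing tags (e.g. case).
            tags = list(dst) + tags[len(src):]
            break
    return lemma + "".join("<%s>" % tag for tag in tags)


print(kaz_to_tat_numeral_tags("екі<num><coll><advl>"))  # екі<num><coll>
```

Analyses that match no rule, such as a plain <num><subst><px3sp> sequence, pass through unchanged.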
 
Notes
- [1] Currently, whether it is in fractions or not is not taken into account.

