Difference between revisions of "Kazakh and Tatar/TODO"
(→misc: add one more) |
Firespeaker (talk | contribs) (→Kazakh) |
Revision as of 17:21, 8 February 2013
General TODO
See Kazakh and Tatar/Work_plan and Kazakh and Tatar/Remaining unanalysed forms
- 0 itself and numbers containing it aren't analyzed (in both directions)
- Declension of Tatar nouns ending with -и.
- A separate cont. class for verbs which have causative forms ending with -дыр/-дер
- Isn't this the default for <v><iv>?
- A "location-cases" cont. class for some of the postpositions and location adverbs (e.g. "бире")
- What do you mean? —Firespeaker 16:20, 6 February 2013 (UTC)
- Better disambiguation
Might be twol, might not be, but JNW needs to go through this stuff and figure out the issues.
Kazakh
misc
- *жатқандығын
- Is this not a lexc problem? —Firespeaker 16:26, 6 February 2013 (UTC)
- безнекенеме (accusative case before clitics); безнекенгәме
- Is this Kazakh? —Firespeaker 16:26, 6 February 2013 (UTC)
Currently generated incorrect form(s) | Unanalyzed correct form(s) | Comments |
---|---|---|
^қаубы/қауіп<n><px3sp><nom> | 294 *қаупі | қауіп:қау%{y%}п N1 ; ! "danger" қауіп:қауіп N1 ; ! "danger" Dir/LR .gc қаупі=10,500 .gc қауіпі=1,560 .gc қаупы=27 .gc қауыпы=10 .gc қәуіп=11 |
^құғы/құқ<n><px3sp><nom>$ | 284 *құқы | Final consonant remains voiceless in intervocalic position. |
^жойу/жой<v><tv><ger><nom>$ | 215 *жою |
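The first column of these tables uses the ^form/analysis$ notation of the Apertium stream format. When going through many such entries it can help to split them mechanically. Below is a minimal sketch, assuming no escaped characters inside a unit; the names `UNIT` and `parse_units` are made up for this illustration and are not part of any Apertium tool.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
# Simplification: backslash-escaped "/", "^" or "$" inside a unit are not handled.
UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_units(text):
    """Yield (surface, analyses, known) for every ^...$ unit found in text.

    known is False when every analysis starts with "*", which is how
    unanalysed forms such as ^организмге/*организмге$ are marked.
    """
    for match in UNIT.finditer(text):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        known = bool(analyses) and not all(a.startswith("*") for a in analyses)
        yield surface, analyses, known
```

For example, `list(parse_units("^жиу/жи<v><tv><ger><nom>$"))` gives one tuple whose surface part is жиу and whose only analysis is жи<v><tv><ger><nom>.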
и phonology
Currently generated incorrect form(s) | Unanalyzed correct form(s) | Comments |
---|---|---|
^жиіліп/жи<v><tv><pass><gna_perf>/жи<v><tv><pass><prc_perf>$ | 35 *жиылып | Added in lexc as жи:жи V-TV ; ! "" . Tried changing it to жи:жи%{й%} V-TV ; ! "" , which makes жиып work but doesn't affect the gerund form. Not quite the right thing. |
^жиіп/жи<v><tv><gna_perf>/жи<v><tv><prc_perf>$ | 58 *жиып | |
^жиу/жи<v><tv><ger><nom>$ | жию |
done (but keep an eye on)
Current:^миллион<num><subst><dat>$ --> миллионге
Should be:^миллион<num><subst><dat>$ --> миллионға
Current:^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлесгенде
Should be:^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлескенде
Kazakh:^ойна<v><tv><ifi><p1><pl>$ --> ойнадык
Should be: ойнадыҚ
- *журналистерді - *журналистеріне - *журналистерді: something like т:0 <=> :с/:0 _ %{L%}:/:0 (r40597)
- (kaz) *Назарбаевтың (r40594)
- АКШ-*тың НАТО-*ның
- This is a problem with lexc, not twol —Firespeaker 06:15, 20 August 2012 (UTC)
- words with "[back vowel]...и[(cons)]" (i.e., borrowings): dealt with via %{☭%}
- (kaz) *организмдер / организм<n><pl><nom> = организмдар (r40597)
- Currently: ^Исраил<np><ant><m><gen>/Исраилдың$ and ^Исраил<np><ant><m><dat>/Исраилға$. Correct forms are Исраилдің and Исраилге respectively.
- Currently: ^Иерусалим<np><top><dat>/Иерусалимға$, Иерусалимдағы and Иерусалимның. Correct forms are Иерусалимге, Иерусалимдегі and Иерусалимнең respectively. In short, make them take front vowel affixes!
- (kaz) процесс, процесі/процессі, процесінің/процессінің
- (kaz) автомобиль<n><attr>, *автомобильдер // автомобиль<n><pl><nom> = автомобильлер
- As far as I can tell, автомобильдер is the most common form. The form автомобилдер also seems to be used, but doesn't look super formal, and автомобильлер seems to only be attested in "Kazakh" because Nissan seems to like to write in Noğay for its Kazakh-speaking audience. —Firespeaker 06:10, 20 August 2012 (UTC)
- The thing is that the form we are generating is автомобильлер. - Francis Tyers 07:08, 20 August 2012 (UTC) (r40704)
- ^организмге/*организмге$ *организмнің *организмнен (r40705)
- *тарихынан *тарихы ... (≤r40705)
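The "make them take front vowel affixes" item can be illustrated with a toy model in which the last vowel of the stem decides the harmony class and и counts as front. This is only a sketch of the intended outcome, not how the twol rules or the %{☭%} symbol actually work. The vowel sets, the list of stem-final consonants that trigger the voiceless suffix, and the function names `harmony` and `dative` are all assumptions made for this illustration.

```python
# Assumption: simplified vowel classes; и is treated as front, as wanted for borrowings.
FRONT = set("әеіөүэи")
BACK = set("аоұыуяюё")
# Assumption: stems ending in one of these take the voiceless dative -қа/-ке.
VOICELESS_FINAL = set("пфктсшщхцчқһбвгд")


def harmony(stem):
    """Return "front" or "back" according to the last vowel of the stem."""
    for ch in reversed(stem.lower()):
        if ch in FRONT:
            return "front"
        if ch in BACK:
            return "back"
    return "back"  # arbitrary default for stems without a listed vowel


def dative(stem):
    """Attach a dative suffix (-ға/-ге/-қа/-ке) under the toy rules above."""
    front = harmony(stem) == "front"
    if stem[-1].lower() in VOICELESS_FINAL:
        return stem + ("ке" if front else "қа")
    return stem + ("ге" if front else "ға")
```

Under these assumptions `dative("Исраил")` and `dative("Иерусалим")` come out with -ге, while `dative("миллион")` keeps -ға, matching the forms listed above. Native words in which и behaves as a back vowel are exactly where this toy rule would fail.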
Tatar
- (tat) generates укыу
- (tat) generates айендә
- Deletion of soft sign "ь" before vowels in Tatar (see comments at the end of the apertium-tat/apertium-tat.tat.twol file)
- (tat) *аенда, generating музейе instead of correct музее
apertium-tat$ echo "^йөр<v><iv><gpr_impf>$" | hfst-proc -g tat.autogen.hfst
>>йөрә торган
- ^хокукын/*хокукын$ <-- ^хокукын<n><px3sp><acc>$ See ^құқын/*құқын$ issue above.
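To re-check several analyses at once, the hfst-proc command shown above can be wrapped in a small script. The sketch below assumes hfst-proc is on the PATH and that the generator file is tat.autogen.hfst in the current directory, as in that command. The function names are hypothetical, and the generator output is returned as printed, without interpreting it.

```python
import subprocess


def run_hfst_proc(analysis, fst="tat.autogen.hfst"):
    """Send one analysis through `hfst-proc -g` and return its output, stripped."""
    done = subprocess.run(
        ["hfst-proc", "-g", fst],
        input=analysis + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


def generate_all(analyses, run=run_hfst_proc):
    """Map each analysis string to whatever the generator prints for it.

    `run` can be swapped for a stub, so the bookkeeping can be tried
    without a compiled transducer.
    """
    return {analysis: run(analysis) for analysis in analyses}
```

Calling `generate_all(["^йөр<v><iv><gpr_impf>$"])` inside apertium-tat would then return a dictionary with one entry holding the generator's output for that analysis.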
Other
International vocabulary
*терроризмге *массивіндегі *террорлық *Факті *кодекстің *терроризмге *Полицейлер *журналистерді «*АНТИТЕРРОРЛЫҚ » *полицейлер *антитеррорлық *режим *полицейлер *журналистерді *автоматты *автобустар *полицейлер *журналистеріне *сайттың *технологиялар *компьютер *мобильді *техникаларға *интернет *объектілерін *радиациялық *сантехник *проблемасы *веб-*сайттар *позитивті *алгебра *коалициялық *иммиграциялық *дипломатиялық *стратегиялық *станциясында
Proper nouns
*Гонконгтан
Discuss first
- There is only one formal form (<frm>) in Tatar, which can be both singular and plural, but in Kazakh there are two forms. Should I pretend that Tatar had the same distinction and duplicate the form with a different tag, or should I handle it in transfer?
- Consider турындагы - should it still be tagged as a postposition?
- How to handle verbs with inner inflection? Sometimes a Kazakh verb is translated with a multiword expression and vice versa, e.g. әуреле > башын әйләндер.
Part-of-speech related TODO's and DONE's can be found here:
To run tests, use the aq-regtest utility from the Apertium-quality tools, e.g.
aq-regtest -d . kaz-tat http://wiki.apertium.org/wiki/Special:Export/Kazakh_and_Tatar/Postadvebs
Done
- But keep an eye on this
- Numerals
- kaz <num><subst>(<px3>) in fractions[1] = tat <num><subst>(<px3>)
- kaz <num><coll><advl> = tat <num><coll>
- kaz <num><coll><subst> = tat <num><subst>
Notes
- ↑ Currently whether it is in fractions or not is not taken into account