Difference between revisions of "Kazakh and Tatar"
Revision as of 22:40, 11 June 2012
This is a language pair translating between Kazakh and Tatar.
General TODO
See /Work_plan.
- Declension of Tatar nouns ending in -и.
- Set up the bidix-with-context.sh script (see apertium-kaz-tat/dev/bidix; seems to be very useful, requires another script from spectie). (done)
- Add some of the short Wikipedia-article-like texts I have for evaluation into texts (should be ~200 words). (done)
- Implement a continuation class for compound/multiword nouns which already have a possessive ending (<px3sp>), e.g. Қытай Халық Республикасы.
  - This continuation class should link only to CASE (but consider that some of them can have a plural form: ишегаллары).
- Add "ярты", "ярым" and "чирек" as numerals, but don't link them to common numerals cont. class.
- (Lexical selection rule): сондай-ақ > шулай-ук
- Fix Roman numerals:
  - add them to tat.lexc too;
  - change LEXICON NUM-ROMAN to something like this: %<num%>%<ord%>: # ;
Twol related stuff
- Current: ^миллион<num><subst><dat>$ --> миллионге
  Should be: ^миллион<num><subst><dat>$ --> миллионға
- Current: ^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлесгенде
  Should be: ^сөйле<v><tv><coop><ger_past><loc>$ --> сөйлескенде
- Deletion of the soft sign "ь" before vowels in Tatar (see comments at the end of the apertium-tat/apertium-tat.tat.twol file)
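Both Kazakh corrections in this list depend on how the suffix-initial velar is chosen: by the backness of the stem's last unambiguous vowel and by the voicing of the stem-final consonant. The Python sketch below only illustrates the expected outputs. It is not the twol rule set, and the letter sets, the default to back harmony, and the function names (is_back, velar, dative, ger_past_loc) are all simplifying assumptions made for illustration.

```python
# Toy model of the two expected surface forms; NOT the actual twol rules.
# Assumption: "и" and "у" are treated as neutral and skipped when
# looking for the vowel that decides harmony.
BACK = set("аоұы")
FRONT = set("әөүіе")
# Assumption: simplified set of voiceless stem-final consonants.
VOICELESS = set("кқпстфхцчшщһ")


def is_back(stem):
    """Return True if the last unambiguous vowel of the stem is back."""
    for ch in reversed(stem):
        if ch in BACK:
            return True
        if ch in FRONT:
            return False
    return True  # assumption: default to back harmony if nothing decides


def velar(stem):
    """Pick the suffix-initial velar: ғ/қ after back stems, г/к after front."""
    voiceless = stem[-1] in VOICELESS
    if is_back(stem):
        return "қ" if voiceless else "ғ"
    return "к" if voiceless else "г"


def dative(stem):
    """Attach the dative suffix (velar + harmonising vowel)."""
    return stem + velar(stem) + ("а" if is_back(stem) else "е")


def ger_past_loc(stem):
    """Attach past gerund (velar + vowel + н) followed by locative (д + vowel).

    The locative is hard-coded to start with "д" because the gerund ends in "н".
    """
    vowel = "а" if is_back(stem) else "е"
    return stem + velar(stem) + vowel + "н" + "д" + vowel


print(dative("миллион"))       # last deciding vowel is "о", so back harmony
print(ger_past_loc("сөйлес"))  # stem ends in voiceless "с", front harmony
```

Under these assumptions, dative("миллион") gives миллионға and ger_past_loc("сөйлес") gives сөйлескенде, matching the "Should be" forms.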
Discuss first
- There is only one formal form (<frm>) in Tatar, which can be both sg and plural. But in Kazakh there are two forms. Should I pretend as if in Tatar it *were* the same and duplicate the same form with a different tag or should I handle it in transfer?
Part-of-speech related TODO's and DONE's can be found here:
To run tests, use the aq-regtest utility from the Apertium-quality tools, e.g.:
aq-regtest -d . kaz-tat http://wiki.apertium.org/wiki/Special:Export/Kazakh_and_Tatar/Postadvebs
Done
- But keep an eye on this
- Numerals
  - kaz <num><subst>(<px3>) in fractions[1] = tat <num><subst>(<px3>)
  - kaz <num><coll><advl> = tat <num><coll>
  - kaz <num><coll><subst> = tat <num><subst>
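
The numeral correspondences can be read as a small tag-rewriting table. The Python sketch below is only a way to check that reading on sample analyses; it is not how the language pair implements the mapping. The function name kaz_to_tat_tags and the placeholder lemma X are hypothetical.

```python
# Illustration of the numeral tag correspondences; a sketch, not the
# pair's actual implementation.
RULES = [
    ("<num><coll><advl>", "<num><coll>"),
    ("<num><coll><subst>", "<num><subst>"),
    # kaz <num><subst>(<px3>) maps to itself, so it needs no rule here.
]


def kaz_to_tat_tags(analysis):
    """Rewrite the first matching Kazakh tag sequence to its Tatar counterpart."""
    for src, dst in RULES:
        if src in analysis:
            return analysis.replace(src, dst, 1)
    return analysis  # no rule applies: tags are kept unchanged


# "X" is a placeholder lemma, not a real dictionary entry.
print(kaz_to_tat_tags("^X<num><coll><advl>$"))
print(kaz_to_tat_tags("^X<num><coll><subst><dat>$"))
print(kaz_to_tat_tags("^X<num><subst><px3>$"))
```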
Notes
- [1] Currently, whether the form occurs in a fraction or not is not taken into account.