Kazakh and Tatar/Diary

From Apertium

Monday, 28th May 2012

Checking & refactoring clitics

Some clitics appear only after certain forms (e.g. "шы<mod_foc>" in Kazakh, which expresses politeness, attaches only to 2nd person singular forms). And vice versa: some forms can take only certain clitics (imperative forms take only "чы" and "сана" in Tatar).

I moved the above clitics into a separate lexicon and linked the imperative forms to it, so there is no overgeneration now (and life is a bit easier for spectie's "testvocing" PCs).
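The effect of the separate lexicon can be illustrated with a toy model in Python (not real lexc). This is only a sketch: the lexicon name CLIT-IMP and the form-class labels are hypothetical, and the general CLIT lexicon holds a placeholder instead of real entries. Only "чы" and "сана" come from the entry above.

```python
# Toy model of lexc-style continuation classes (an illustration, not real lexc).
# "CLIT-IMP" and the form-class names are hypothetical labels for this sketch.
CLITIC_LEXICONS = {
    "CLIT-IMP": ["чы", "сана"],        # Tatar clitics named in the diary entry
    "CLIT": ["<general-clitic>"],      # placeholder for all other clitics
}

# Which clitic lexicons each class of word forms may continue into.
CONTINUATIONS = {
    "imperative": ["CLIT-IMP"],
    "other": ["CLIT"],
}


def allowed_clitics(form_class):
    """Return every clitic reachable from the given form class."""
    clitics = []
    for lexicon in CONTINUATIONS[form_class]:
        clitics.extend(CLITIC_LEXICONS[lexicon])
    return clitics


def overgenerates(form_class, clitic):
    """True if attaching this clitic to this form class would be overgeneration."""
    return clitic not in allowed_clitics(form_class)
```

With this layout, overgenerates("other", "чы") is True, while overgenerates("imperative", "сана") is False.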

In Tatar some new clitics were added as well.

Tuesday, 29th May 2012

Checking & refactoring clitics (cont.)

The question of whether %+ғана%<postadv%>:% %{G%}ана # ; ! "only" in the CLIT continuation class was correct led to a discussion: should we handle the harmonization of such words in the transducer (that is, matching them to the previous word), or can the post-generator take care of that?
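To make the "matching to the previous word" idea concrete, here is a minimal Python sketch of what such a harmonization step would have to do, wherever it ends up living. It assumes a simplified rule: the archiphoneme {G} surfaces as "қ" after a voiceless consonant and as "ғ" otherwise. The function names and the voiceless-consonant set are assumptions of this sketch, not part of apertium-kaz-tat.

```python
# Simplified assumption: these letters count as voiceless consonants.
VOICELESS = set("пфктсшщхцчқһ")


def realise_particle(prev_word, particle="{G}ана"):
    """Pick the surface form of {G} from the last letter of the previous word."""
    if not prev_word:
        raise ValueError("the particle needs a preceding word")
    last = prev_word[-1].lower()
    g = "қ" if last in VOICELESS else "ғ"
    return particle.replace("{G}", g)


def attach(prev_word):
    """Return the previous word followed by the harmonized particle."""
    return f"{prev_word} {realise_particle(prev_word)}"
```

Under this simplified rule, attach("кітап") gives "кітап қана" and attach("бала") gives "бала ғана".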

Another thing is that some Tatar modal particles do not vary depending on the previous word (e.g. "бит"), but I have put them into the CLIT continuation class (as all the other modal particles were there). This might be wrong.

I learned a lot of new stuff :), but the possible changes to the CLIT lexicon were left for later.

Some work on postadverbs

See Postadverbs

Wednesday, 30th May 2012

Had to study for a "zachet" (a pass/fail exam), so not much done, but:

Went over numerals again, some additions

Started categorizing postpositions depending on what case they govern

Their "case governance" often differs between the two languages, so some transfer rules will be required.

I'll need help to set up coverage-measuring scripts and to learn how I can testvoc only certain POS's.
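As a starting point, naive coverage can be computed from analyser output in the Apertium stream format, where each lexical unit looks like ^surface/analysis$ and an unknown word is marked with an asterisk (^surface/*surface$). The Python sketch below is a simplification: it ignores escaped characters inside lexical units, the function name is made up for this example, and the analyses in the sample string are illustrative only.

```python
import re

# One lexical unit: everything between ^ and $ (escaped characters are ignored).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream):
    """Return the share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    # Unknown words carry an asterisk right after the slash: ^word/*word$
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


# Illustrative input: one analysed token, one unknown token.
sample = "^бала/бала<n><nom>$ ^xyz/*xyz$"
print(coverage(sample))
```

The same function could be fed the analyser's output for a whole corpus to get a rough coverage figure.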

Also I think that I need another story :) Testing things on a parallel text well before the midterm is a good idea anyway.

Wednesday, 30th May 2012

Postpositions

Categorized Kazakh postpositions and translated them into Tatar. Wasn't sure about four or five, and put them down in dev/bidix/postpositions.todo.txt. Also in kaz.lexc there are some postpositions which seem to be POS-miscategorized (see ! To be checked).

Categorized postpositions in tat.lexc, added some more stuff.

Tomorrow I am going to translate these additional Tatar ones (to boost coverage, since they seem to be quite frequent!) and work on transfer rules if they are needed.

Thursday, 31st May 2012 - Sunday, 3rd June 2012

Let's review what has been happening to apertium-kaz-tat in the last few days (with every new day it is getting harder to remember the changes).

I haven't written here for a couple of days because I wanted to finish the POS's I was supposed to finish according to the workplan, and only then happily announce it. What I learned from this is to keep track of changes as they occur: going back even a few days takes more time than summing up what you have done in a few short sentences right after the "working hours".

Monday, 4th June 2012

Conjunctions

Translated what was already in the lexc files, added some more. Wasn't sure about the ones in dev/bidix/conjunctions.todo.txt, gonna go back to them later (they are rather archaic).


There was a question about how I should classify conjunctions (co-ordinating, sub-ordinating or adverbial). The first two are distinguished by all grammars I have, so I mostly just followed them while adding stems into CCLEX and CSLEX.

I might have misunderstood "adverbial conjunctions" [although I think I haven't :)]. This lexicon holds words which aren't pure conjunctions but are derived from other parts of speech (<prn>, <prn>+<post>, <det><n> etc.). They connect two sentences but also function as a part of one of them (in contrast to co-ordinating and sub-ordinating conjunctions, which are autonomous), semantically stand in for the other sentence, and can appear in the middle of the sentence (set off by commas).

If I understood it correctly, adverbial conjunctions are what Tatar grammars call "мөнәсәбәтле сүзләр" (Russian "относительные слова", i.e. "relative words"; see pp. 341-351 of Volume 2 of the Academic Grammar).

I also plan to add all the other so-called "transition words" to this lexicon (e.g. "беренчедән" = "firstly" etc.).


The grammars I have also describe the conjunctive use of interrogative-demonstrative pronoun pairs (calling them "союзные слова", i.e. "conjunctive words"). E.g.: "Кем эшләми, шул ашамый" ("Whoever does not work does not eat"). But I think that adding all these pronouns once more as correlative conjunctions wouldn't make any sense for the Kazakh-Tatar pair (or any other Turkic pair).