Russian morphological transducer (Морфологический трансдуктор русского языка)

From Apertium
Revision as of 19:40, 10 April 2013 by Francis Tyers (talk | contribs) (спра́шиваемый)

Decisions

  • Monosyllables that can be stressed get their stress mark, even if they are indeclinable.
  • Indeclinable words get their full paradigm, and disambiguation is left up to the disambiguator.
  • Adverbs are added separately from adjectives, even when there is an obvious derivation.
  • Adverbs get their comparative as part of the adverbial paradigm where possible.
  • Personal pronouns are added with their lemma as their nominative form: я, мы, ты, вы, он, она, оно, они.
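The decision to give indeclinable words a full paradigm can be pictured with a small Python sketch. The tag names, the function `indeclinable_paradigm` and the example noun пальто are illustrative assumptions, not the transducer's actual tagset or lexicon.

```python
from itertools import product

# Hypothetical tag inventory, for illustration only.
CASES = ["nom", "gen", "dat", "acc", "ins", "prp"]
NUMBERS = ["sg", "pl"]

def indeclinable_paradigm(lemma):
    """Return every (number, case) analysis, all sharing one surface form."""
    return {(number, case): lemma for number, case in product(NUMBERS, CASES)}

paradigm = indeclinable_paradigm("пальто")
for (number, case), surface in paradigm.items():
    # Every cell is filled, so choosing among them is the disambiguator's job.
    print(number, case, surface)
```

Every cell of the paradigm exists but carries the same surface form, so the analyser returns all of them and the disambiguator picks one in context.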

Comments

  • Indeclinables. I take it that either you could say that they have no forms at all or that they have full paradigms where all the forms are identical, right? What is more desirable computationally? I think that conceptually this is experienced more like the latter, with a lot of syncretic forms. So in a sense I would vote for the latter. It is like the syncretism of fish in English with a) My poor little fish named Ben died yesterday vs. b) Look at all those fish! The lack of a sg vs. pl distinction morphologically does not prevent us from experiencing the difference grammatically when we have syncretism. We will have a big issue with several hundred biaspectual verbs in Russian. These can express both perfective and imperfective; however, in context they are NEVER ambiguous, at least that is what native linguists always claim... At any rate, we should have a consistent policy for uninflected nouns and biaspectual verbs, etc. Actually, come to think of it, there is an uninflected verb in Russian: na! It has only that form and it is interpreted as an imperative, 'here, take it', although you can get a plural: nate! (600+ attestations in RNC).
  • I personally would vote for interpreting вечером as both an adverb and the instrumental sg of вечер. Obviously it will mostly be the adverb, but the noun is always possible.
  • About the adverbs that are regularly formed from adjectives -- what is the convention with other languages that do this? What are the tradeoffs? What do I lose or gain by making one decision over the other? It might be easier to have them as separate entries and then one can also separate the comparative forms for the adjective from those for the adverb since they would come in different entries, and that might be cleaner.
  • About the pronouns: My intuition would be to separate them and have a different entry for singular vs. plural. In other words, I would have one entry each for я, мы, ты, вы, он, она, оно, они. I think that is conceptually easier. Each of those would inflect only in one number, but I think that would make more sense for users than looking for нам under я.

Comparison

There are a couple of issues that need to be taken seriously. One is that the (synthetic) comparative forms of adjectives and adverbs are nearly always (maybe always?) identical. This is something that a native speaker would be aware of and it would be desirable somehow to represent this in our grammar too.

However, there is a lot of noise in the system and the adverbs are not strictly just forms of the adjectives either.

The synthetic comparative adjectives are like the short forms of adjectives in that they do not inflect for case and they also appear in predicate position, not attributive. There are analytic comparative forms that have regular adjective endings and appear in attributive position (более красивый, самый красивый).

The synthetic comparative can also serve as a superlative, but only in predicate position, and only in collocation with всех, всего. Since it requires the collocation with всех, всего, it is not really synthetic. There is also the analytic superlative with самый, but that one is only attributive.

There is also the issue of the pseudosuperlatives prefixed in наи-, but I don't know as I have the energy for those right now...

So we have:

  • быстрый -- adjective ‘fast’
  • быстро -- adverb ‘fast’
  • быстрее -- ‘faster’ synthetic comparative adjective for predicate position, plus comparative adverb
  • более быстрый -- ‘faster’ analytic comparative adjective, primarily used for attributive position, but might also leak to predicative
  • более быстро -- ‘faster’ analytic comparative adverb (less common than быстрее)
  • самый быстрый -- ‘fastest’ analytic attributive superlative adjective
  • быстрее всех, всего -- ‘fastest of all’ analytic predicative superlative adjective and adverb
  • наиболее быстрый -- ‘fastest’ analytic superlative adjective, relatively rarely found
  • наиболее быстро -- ‘fastest’ analytic superlative adverb, very rarely found
  • быстрейший -- ‘fastest’ synthetic attributive superlative adjective, but it can have the meaning ‘very’ instead of ‘most’
  • наибыстрейший -- ‘fastest’ synthetic attributive superlative adjective, but it can have the meaning ‘very’ instead of ‘most’. These last two are rather synonymous.
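The analytic forms in the list above are plain concatenations, which a minimal Python sketch makes explicit. The function name `comparison_forms` and its labels are hypothetical. Only citation forms are built, so agreement of самый and the adjective with a head noun is left out.

```python
def comparison_forms(adjective, adverb, synthetic_comparative):
    """Assemble the analytic forms from the list above by concatenation.

    Only citation forms (masculine nominative singular) are produced.
    """
    return {
        "comparative adjective": f"более {adjective}",
        "comparative adverb": f"более {adverb}",
        "superlative adjective (самый)": f"самый {adjective}",
        "superlative adjective (наиболее)": f"наиболее {adjective}",
        "superlative adverb (наиболее)": f"наиболее {adverb}",
        # The predicative superlative reuses the synthetic comparative.
        "predicative superlative": [
            f"{synthetic_comparative} всех",
            f"{synthetic_comparative} всего",
        ],
    }

forms = comparison_forms("быстрый", "быстро", "быстрее")
for label, form in forms.items():
    print(label, "->", form)
```

The synthetic forms (быстрее, быстрейший, наибыстрейший) are passed in or omitted rather than derived, since their formation is not described here.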


---

The issue of comparative forms is tied to the types of adjectives that one has. There are two not entirely discrete categories that are relevant: qualitative (like: good, big, smart) and relational (like: silver, wooden, university). Comparative (and superlative) forms are most strongly associated with qualitative adjectives, whereas relational adjectives are less likely to form comparatives, although some are possible in metaphorical use. So one can have a better job, a bigger book, a smarter student, but it is less likely to find things like a silverer spoon, a woodener desk, or a universityer course. But you can get things like this occasionally.

Here is a corpus example: При ветре кустики ершились, выворачивая листья, становились ещё серебрянее. [И. Грекова. Фазан (1984)] ‘The bushes got tousled by the wind, the leaves turned over and became even silverer/more silvery.’ Note that this is metaphorical because the leaves are not literally covered in silver, they just look that way. So while there is a strong tendency to form the comparatives from the qualitative adjectives, it is also the case that they can leak over to the relational adjectives. So we would need to be able to analyze comparatives from all adjectives (and their adverbs, where they exist), though we might not need to generate them for all.

But there is another note to make here. Here I am speaking impressionistically: all this would need to be verified by taking a closer look at the data. I believe that the two categories of adjectives also have a rather different status in the language. Qualitative adjectives are something closer to a closed class and probably a smaller group, though probably also with higher frequency. Russian (and probably most languages) is not actively creating new members for this category. The formation of relational adjectives is on the contrary very productive. Any time a chemist comes up with a new substance or a group of people form a new organization, one can create relational adjectives to describe things made from or associated with the new nouns. I don't know how small the qualitative group is, but I suspect that it is not very large, at most a couple hundred. The relational group is potentially infinite.

But the boundaries between them are extremely fuzzy. Some 20+ years ago I had an MA student who wrote a thesis about qualitative vs. relational adjectives in Russian and Czech, looking at various possible criteria, and of course, though many examples are fairly clear, there are also a good number that are less clear and it looks more like a continuum than two really discrete categories. But that is true of almost any supposed textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.

Impersonal and reflexive

Missing genitive plural

There are, for example, some morphological peculiarities that are tied to stress patterns. You have already seen that the second locative and the NPl in stressed -а show this connection. There are others. One of these is something I thought I should alert you to right away because you are going to hit it very soon. It is the mystery of the missing GPl form with about 40 nouns in Russian. Most of these words are fairly low frequency, but one is of very high frequency, and it is in our little lexicon: мечта ‘dream’. I know someone who is working on this problem and here is the state of the art:

All the lexemes with gaps in the GPl are:

  • feminine or neuter nouns ending in a vowel in the NSg
  • nouns whose stems end in a non-palatalized consonant
  • nouns that have stress on the ending rather than the stem throughout the paradigm

According to the pattern, they should have a zero ending, but they just don't have a form at all. Here are some of the words: мечта, фита, балда, брюзга, дно, тамада, тахта, пурга, баба-яга, кочерга, мольба, лапта, кума, юла, корма, казна, ушко, озерцо, очко, тетива, фата, башка, айва, раба, хула, тьма, клюка, хвала
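The three properties above can be checked mechanically, as in the following Python sketch. The `Noun` record and the function `fits_gap_profile` are hypothetical, and the hardness test is an assumption of this sketch: it only looks at the spelling (a hard-capable consonant followed by -а or -о). The properties are necessary, not sufficient: many nouns fit the profile and still have a regular GPl, so the actual gaps have to be listed in the lexicon.

```python
from dataclasses import dataclass

# Spelling-based approximation of "stem ends in a non-palatalized consonant".
HARD_STEM_FINALS = set("бвгджзклмнпрстфхцш")
HARD_ENDINGS = set("ао")  # NSg vowels written after a hard stem

@dataclass
class Noun:
    nsg: str               # nominative singular, without stress mark
    gender: str            # "m", "f" or "n"
    ending_stressed: bool  # stress on the ending throughout the paradigm

def fits_gap_profile(noun: Noun) -> bool:
    """True if the noun shares the properties of the nouns lacking a GPl."""
    if noun.gender not in ("f", "n"):
        return False
    if len(noun.nsg) < 2 or noun.nsg[-1] not in HARD_ENDINGS:
        return False
    if noun.nsg[-2] not in HARD_STEM_FINALS:
        return False
    return noun.ending_stressed

print(fits_gap_profile(Noun("мечта", "f", True)))
print(fits_gap_profile(Noun("книга", "f", False)))
```

A lexicon builder could use such a check to flag candidates for manual review before generating a zero-ending GPl form.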

Verbs

Participle forms

Lemma | Aspect | Trans. | Z | Suff. | Pref. | pprs.adv | pp.adv | pprs.actv | pp.actv | pprs.pasv | pp.pasv | -ся
гуля́ть | impf | iv | 1a | - | - | гуля́я | (→) гуля́в | гуля́ющий | гуля́вший | - | - | -
у́жинать | impf | iv | 1a | - | - | у́жиная | (→) у́жинав | у́жинающий | у́жинавший | - | - | -
отдыха́ть | impf | iv | 1a | - | - | отдыха́я | (→) отдыха́в | отдыха́ющий | отдыха́вший | - | - | -
быва́ть | impf | iv | 1a | - | - | быва́я | (→) быва́в | быва́ющий | быва́вший | - | - | -
обе́дать | impf | iv | 1a | - | - | обе́дая | (→) обе́дав | обе́дающий | обе́давший | - | - | -
умира́ть | impf | iv | 1a | - | - | умира́я | (→) умира́в | умира́ющий | умира́вший | - | - | -
чита́ть | impf | tv | 1a | - | - | чита́я | (→) чита́в | чита́ющий | чита́вший | чита́емый | (→) чи́танный | inf, pres.p3.*, past.*
ду́мать | impf | tv | 1a | - | - | ду́мая | (→) ду́мав | ду́мающий | ду́мавший | ду́маемый | (→) ду́манный | inf, pres.p3.*, past.*
слу́шать | impf | tv | 1a | - | - | слу́шая | (→) слу́шав | слу́шающий | слу́шавший | слу́шаемый | (→) слу́шанный | inf, pres.p3.*, past.*
де́лать | impf | tv | 1a | - | - | де́лая | (→) де́лав | де́лающий | де́лавший | де́лаемый | (→) де́ланный | inf, pres.p3.*, past.*
зна́ть | impf | tv | 1a | - | - | зна́я | (→) зна́в | зна́ющий | зна́вший | зна́емый | (→) зна́нный | inf, pres.p3.*, past.*
начина́ть | impf | tv | 1a | - | - | начина́я | (→) начина́в | начина́ющий | начина́вший | начина́емый | (→) начина́нный | inf, pres.p3.*, past.*
жела́ть | impf | tv | 1a | - | - | жела́я | (→) жела́в | жела́ющий | жела́вший | жела́емый | (→) жела́нный | inf, pres.p3.*, past.*
разгова́ривать | impf | iv | 1a | - | - | разгова́ривая | (→) разгова́ривав | разгова́ривающий | разгова́ривавший | - | - | -
хвата́ть¹ | impf | iv | 1a | - | - | хвата́я | (→) хвата́в | хвата́ющий | хвата́вший | хвата́емый | - | inf, pres.p3.*, past.*
хвата́ть² | impf | iv | 1a | - | - | - | - | - | - | - | - | inf, pres.p3.*, past.*
надева́ть | impf | tv | 1a | - | - | надева́я | (→) надева́в | надева́ющий | надева́вший | надева́емый | надёванный | inf, pres.p3.*, past.*
спра́шивать | impf | tv | 1a | - | - | спра́шивая | (→) спра́шивав | спра́шивающий | спра́шивавший | спра́шиваемый | - | inf, pres.p3.*, past.*
забыва́ть | impf | tv | 1a | - | - | забыва́я | (→) забыва́в | забыва́ющий | забыва́вший | забыва́емый | - | inf, pres.p3.*, past.*
пока́зывать | impf | tv | 1a | - | - | пока́зывая | (→) пока́зывав | пока́зывающий | пока́зывавший | пока́зываемый | - | inf, pres.p3.*, past.*
расска́зывать | impf | tv | 1a | - | - | расска́зывая | (→) расска́зывав | расска́зывающий | расска́зывавший | расска́зываемый | - | inf, pres.p3.*, past.*
открыва́ть | impf | tv | 1a | - | - | открыва́я | (→) открыва́в | открыва́ющий | открыва́вший | открыва́емый | - | inf, pres.p3.*, past.*
зараба́тывать | impf | tv | 1a | - | - | зараба́тывая | (→) зараба́тывав | зараба́тывающий | зараба́тывавший | зараба́тываемый | - | inf, pres.p3.*, past.*
предлага́ть | impf | tv | 1a | - | - | предлага́я | (→) предлага́в | предлага́ющий | предлага́вший | предлага́емый | - | inf, pres.p3.*, past.*
помога́ть | impf | tv | 1a | - | - | помога́я | (→) помога́в | помога́ющий | помога́вший | - | - | -
переезжа́ть | impf | iv | 1a | - | - | переезжа́я | (→) переезжа́в | переезжа́ющий | переезжа́вший | - | - | -
уезжа́ть | impf | iv | 1a | - | - | уезжа́я | (→) уезжа́в | уезжа́ющий | уезжа́вший | - | - | -
понима́ть | impf | iv | 1a | - | - | понима́я | (→) понима́в | понима́ющий | понима́вший | понима́емый | - | inf, pres.p3.*, past.*
покупа́ть | impf | tv | 1a | - | - | покупа́я | (→) покупа́в | покупа́ющий | покупа́вший | покупа́емый | - | inf, pres.p3.*, past.*
игра́ть | impf | tv | 1a | - | - | игра́я | (→) игра́в | игра́ющий | игра́вший | игра́емый | и́гранный | inf, pres.p3.*, past.*
чу́вствовать | impf | tv | 2a | - | - | чу́вствуя | (→) чу́вствовав | чу́вствующий | чу́вствовавший | чу́вствуемый | (→) чу́вствованный | inf, pres.p3.*, past.*
гото́вить | impf | tv | 4a | - | - | гото́вя | (→) гото́вив | гото́вящий | гото́вивший | гото́вимый | (→) гото́вленный | inf, pres.p3.*, past.*
тра́тить | impf | tv | 4a | - | - | тра́тя | (→) тра́тив | тра́тящий | тра́тивший | тра́тимый | (→) тра́ченный | inf, pres.p3.*, past.*
ви́деть | impf | tv | 5a | - | - | ви́дя | (→) ви́дев | ви́дящий | ви́девший | ви́димый | (→) ви́денный | inf, pres.p3.*, past.*
стоя́ть | impf | iv | 5b | - | - | сто́я | (→) стоя́в | стоя́щий | стоя́вший | - | - | -


Present gerund (<pprs><adv>): Only from imperfective verbs. Formed by deleting -Vт from the 3pl present and adding -я/-а. Stress is the same as in the 1sg present. 13 exceptional verbs form the present gerund from the infinitive instead, e.g. дава́я.
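The present gerund rule can be sketched in Python as follows. The function name `present_gerund` is hypothetical. The choice of -а after ж, ч, ш, щ follows the general Russian spelling rule and is an assumption of this sketch. Stress placement and the 13 exceptional verbs are not handled, so stress marks are simply stripped.

```python
STRESS = "\u0301"          # combining acute accent used as the stress mark
VOWELS = set("аеёиоуыэюя")
HUSHERS = set("жчшщ")      # assumption: -а rather than -я is written after these

def present_gerund(third_plural):
    """Delete -Vт from a 3pl present form and add -я/-а (stress ignored)."""
    form = third_plural.replace(STRESS, "")
    if len(form) < 3 or form[-1] != "т" or form[-2] not in VOWELS:
        raise ValueError(f"not a 3pl present form: {third_plural!r}")
    stem = form[:-2]
    suffix = "а" if stem[-1] in HUSHERS else "я"
    return stem + suffix

for form in ["читают", "чувствуют", "готовят", "слышат"]:
    print(form, "->", present_gerund(form))
```

A fuller version would take the stress position from the 1sg present and consult a list of the exceptional verbs before applying the rule.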

Past active participle (<pp><actv>): Formed from both perfective and imperfective verbs. If the masc sg past ends in -л, delete it and add -вший; otherwise, add -ший. The stress is the same as in the masc sg past. The participle declines like a 4a adjective (no short forms). There are a few exceptional forms like мётший (т is missing in the past) and заня́вший (stress is from the infinitive). Reflexive participles end in -ся.
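A minimal Python sketch of the past active participle rule follows. The function name and the small `EXCEPTIONS` table are hypothetical. The table holds only the irregular form мётший mentioned above and is not a complete list. Because the suffix is appended to the masc sg past, any stress mark in the input is carried over unchanged.

```python
# Hypothetical exception table: masc sg past -> irregular participle.
EXCEPTIONS = {"мёл": "мётший"}

def past_active_participle(masc_sg_past):
    """Build the past active participle from the masc sg past form."""
    if masc_sg_past in EXCEPTIONS:
        return EXCEPTIONS[masc_sg_past]
    if masc_sg_past.endswith("л"):
        return masc_sg_past[:-1] + "вший"  # читал -> читавший
    return masc_sg_past + "ший"            # нёс -> нёсший

for past in ["читал", "нёс", "мёл"]:
    print(past, "->", past_active_participle(past))
```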

Past gerund (<pp><adv>): Formed from both perfective and imperfective verbs, but in practice nearly only from perfective verbs. Formed by removing -й from the past active participle; if the resulting form ends in -вши, it has an alternate in -в: уви́девши, уви́дев. If the verb is reflexive, the form is always -вшись.
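The past gerund rule can be sketched the same way. The function name `past_gerunds` is hypothetical. It takes the past active participle without -ся and a flag for reflexive verbs, and returns all variants the rule allows.

```python
def past_gerunds(participle, reflexive=False):
    """Return the past gerund variants for a past active participle in -ший."""
    if not participle.endswith("ший"):
        raise ValueError(f"not a past active participle: {participle!r}")
    long_form = participle[:-1]             # drop -й: увидевший -> увидевши
    if reflexive:
        return [long_form + "сь"]           # reflexives have no short alternate
    if long_form.endswith("вши"):
        return [long_form[:-2], long_form]  # увидев, увидевши
    return [long_form]

print(past_gerunds("увидевший"))
print(past_gerunds("вернувший", reflexive=True))
```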

Present passive participle (<pprs><pasv>): Only formed from imperfective transitive verbs from certain morphological classes: verbs with 1pl in V+ем (where the е is NOT stressed). It is rarely formed from other verbs. The participle declines like a 1a adjective and has both long and short forms.
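The class restriction can be sketched in Python as below. The function name is hypothetical. The formation step (adding -ый to the 1pl present, as in чита́ем, чита́емый) is standard school grammar rather than something stated above. The sketch expects stress to be marked with a combining acute accent, so a stressed ending -е́м does not match the plain suffix -ем. It returns `None` for verbs outside the regular class.

```python
STRESS = "\u0301"          # combining acute accent used as the stress mark
VOWELS = set("аеёиоуыэюя")

def present_passive_participle(first_plural, imperfective=True, transitive=True):
    """Return 1pl + -ый for the regular class (V + unstressed -ем), else None."""
    if not (imperfective and transitive):
        return None
    if not first_plural.endswith("ем"):    # fails for stressed -е́м and for -ём
        return None
    stem = first_plural[:-2].rstrip(STRESS)
    if not stem or stem[-1] not in VOWELS:  # the ending must follow a vowel
        return None
    return first_plural + "ый"

print(present_passive_participle("чита\u0301ем"))
print(present_passive_participle("пи\u0301шем"))
```

Forms that the table lists for other classes, such as гото́вимый and ви́димый, fall under "rarely formed from other verbs" and would need to be listed individually.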

Past passive participle (<pp><pasv>): Regularly formed from perfective transitive verbs. Rarely also formed from imperfective transitive verbs that do not contain an imperfectivising suffix. The formation rules are numerous and complicated; the participles decline as 1a adjectives with both long and short forms.