Difference between revisions of "Bosnian-Croatian-Montenegrin-Serbian and Macedonian"

From Apertium

* Remove the 'Adj' reading if a word is ambiguous between 'Adj' and 'Prop', the previous word is 'U', and the following word is not part of an NP (e.g. a pronoun or a finite verb)
** ''U Bugarskoj je do sada dijagnosticirano samo 366 slučajeva, a u Hrvatskoj 341.''



Revision as of 17:30, 19 July 2011

Progress of the work in the bonding period

So far, a new dictionary has been started from scratch, with some paradigms added from the grammar of Croatian, along with some closed word categories. The most extensive work has been done on masculine noun paradigms, and it seems that most of the remaining work will also go into nouns. Adjectives are inherently more work, but they show less variation. Verb paradigms have been added to the dictionary for the present, aorist, imperfect and future I tenses (future I being the combination of the clitic 'ću' with the infinitive).

Todo

Testing framework
  • Set up pending/regression tests framework
  • Set up testvoc
  • Set up corpus/generation-test

Serbo-Croatian dictionary

The reflex of yat:

  • Add two additional modes to the monodix (ek/ijek) so that lemmas containing yat can be analysed both as ekavian and as ijekavian
  • Update the makefile and the xslt machinery so that this works
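
The idea behind the two modes can be sketched in a few lines of Python. This is only an illustration, not the xslt machinery mentioned above: the markers "Ě" (long yat) and "ě" (short yat) and the function name are assumptions made for the example, not notation used in the monodix.

```python
# Hypothetical markers: "Ě" = long yat, "ě" = short yat (not Apertium notation).
REFLEXES = {
    "ek": {"Ě": "e", "ě": "e"},        # ekavian: both reflexes are "e"
    "ijek": {"Ě": "ije", "ě": "je"},   # ijekavian: long -> "ije", short -> "je"
}


def expand_yat(lemma: str, mode: str) -> str:
    """Replace every yat marker in the lemma with its reflex for the given mode."""
    table = REFLEXES[mode]
    return "".join(table.get(ch, ch) for ch in lemma)


if __name__ == "__main__":
    for marked in ("mlĚko", "město"):
        print(marked, expand_yat(marked, "ek"), expand_yat(marked, "ijek"))
```

A single marked lemma then yields both surface variants, which is what lets one dictionary entry serve both modes.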

Verbs(Marked for aspect, transitivity and reflexivity)

  • Most of them come from a list extracted from the verbs in the mk monodix (the list has been added almost completely; about a dozen verbs are missing)
  • Verb paradigm names are fashioned to contain information about the verb ("aspect_transitivity_paradig/m__vblex"), which makes input more convenient
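
Because the paradigm name encodes the verb's properties, they can be recovered mechanically. The sketch below assumes the layout shown above (aspect, transitivity, stem/ending pattern, then "__vblex"); the concrete name "imperf_tv_gleda/ti__vblex" and the tag values in it are hypothetical examples, not entries taken from the dictionary.

```python
from typing import NamedTuple


class VerbParadigm(NamedTuple):
    aspect: str
    transitivity: str
    pattern: str  # stem/ending pattern, e.g. "gleda/ti"
    pos: str


def parse_paradigm_name(name: str) -> VerbParadigm:
    """Split 'aspect_transitivity_pattern__pos' into its parts."""
    head, pos = name.rsplit("__", 1)
    aspect, transitivity, pattern = head.split("_", 2)
    return VerbParadigm(aspect, transitivity, pattern, pos)


if __name__ == "__main__":
    # Hypothetical paradigm name, used only for illustration.
    print(parse_paradigm_name("imperf_tv_gleda/ti__vblex"))
```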

Adjectives:

  • marked for definiteness
  • entered as quadruples (positive, comparative, superlative, absolute superlative)

Nouns:

  • masculine : a great deal of paradigms covered
  • feminine : some general cases
  • neuter : some general cases

Closed word categories:

    • prepositions (including the ones of type s/sa and k/ka, which need to be postprocessed in generation)
    • conjunctions
    • interjections
    • particles
    • pronouns (personal, reflexive, possessive, interrogative, relative, demonstrative (pronoun and adjective), indefinite, negative, ...)
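
The post-processing needed for prepositions of the s/sa and k/ka type can be sketched as follows. This is a simplified illustration under the assumption that the long form is chosen only from the first letter of the following word (s, z, š, ž for "sa"; k, g for "ka"); real usage has further cases (for example "sa mnom"), and the function name is hypothetical.

```python
# Simplified trigger letters; an assumption for illustration only.
TRIGGERS = {"s": "szšž", "k": "kg"}


def choose_preposition(short: str, following: str) -> str:
    """Return the short or long form of 's'/'k' based on the next word."""
    first = following[:1].lower()
    if first and first in TRIGGERS[short]:
        return short + "a"
    return short


if __name__ == "__main__":
    for prep, word in (("s", "sestrom"), ("s", "bratom"), ("k", "kući"), ("k", "moru")):
        print(choose_preposition(prep, word), word)
```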

Other:

  • Add the paradigms from the grammar of Croatian (the one by Barić, Lončarić, Malić, Pavešić, Peti, Zečević, Znika) to the sh monodix [in progress]
  • Obtain a grammar of Serbian, for reference on differences

Macedonian dictionary
  • Add determiner forms for some pronouns (e.g. demonstratives, possessives), i.e. things that can modify nouns

Bilingual dictionary
  • Update the pronoun entries; the symbols in the monodix have been adjusted to correspond more closely to the analysis in the Macedonian monodix

Transfer rules

Transfer rules

Three stage transfer is used.

Some problems:

Genitive constructions, where it is hard to distinguish partitive from possessive:

  • Čaša vode = Чаша вода [partitive]
  • Pilotkinje Vojske Srbije = Пилотките на Српската војска [possessive]

Tricky sequences (instrumental adjectives):

  • "...upravljanje velikom, jakom 'pticom' " == "...управување со голема, силна 'птица' ".

Only one preposition is needed on the Macedonian side for the whole chain of instrumental words, and the chain can be of any length.
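
The single-preposition requirement can be illustrated with a small sketch. It assumes that tokens are already-translated word forms paired with their source case tag, that commas do not break a chain, and that a bare instrumental chain is introduced by "со"; the token format and the function name are hypothetical and do not reflect Apertium's transfer rule syntax.

```python
def insert_single_preposition(tokens):
    """Insert "со" once before each uninterrupted run of instrumental tokens.

    tokens: list of (target_word, source_case) pairs; punctuation has case None.
    """
    out = []
    in_chain = False
    for word, case in tokens:
        if case == "ins":
            if not in_chain:
                out.append("со")  # one preposition for the whole chain
                in_chain = True
        elif word != ",":
            in_chain = False      # anything but a comma ends the chain
        out.append(word)
    return out


if __name__ == "__main__":
    phrase = [("управување", "nom"), ("голема", "ins"), (",", None),
              ("силна", "ins"), ("птица", "ins")]
    print(" ".join(insert_single_preposition(phrase)))
```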

Disambiguation rules

In all cases, modifiers (adjectives, numerals, pronouns) agree with the noun in number, case and gender, so sequences like these were used to disambiguate:

  • ...sa njenim toplim nježnim rukama - A sequence in instrumental
  • ...sa svojih toplih nježnih ruku - A sequence in genitive

Nominative or accusative:

  • Select accusative if there is a preceding accusative preposition
  • Select accusative if there is a preceding transitive verb (direct object rule)
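
As a rough sketch, the two rules above amount to a check on the preceding word. The preposition set is an illustrative subset, the tag names ("pr", "vblex", "tv") are assumed for the example, and falling back to nominative when neither rule fires is an assumption of this sketch, not something stated in the rules.

```python
# Illustrative subset of prepositions that govern the accusative.
ACC_PREPOSITIONS = {"kroz", "niz", "uz"}


def select_nom_or_acc(previous):
    """previous: (lemma, set_of_tags) for the preceding word, or None."""
    if previous is None:
        return "nom"  # assumed default
    lemma, tags = previous
    if "pr" in tags and lemma in ACC_PREPOSITIONS:
        return "acc"  # preceding accusative preposition
    if "vblex" in tags and "tv" in tags:
        return "acc"  # direct object of a transitive verb
    return "nom"      # assumed default


if __name__ == "__main__":
    print(select_nom_or_acc(("kroz", {"pr"})))
    print(select_nom_or_acc(("vidjeti", {"vblex", "tv"})))
    print(select_nom_or_acc(("spavati", {"vblex", "iv"})))
```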

Genitive or accusative:

  • Disambiguate based on prepositions (the only preposition that governs both cases is "u")
  • Select accusative if there is a preceding transitive verb (direct object rule)

Dative or locative:

  • Select dative if there is no preceding preposition or modifier in dative
  • Select dative if there is a preceding dative preposition
  • Select locative if there is a preceding locative preposition
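
The dative/locative choice can be sketched in the same spirit. The preposition sets are small illustrative subsets, the first rule is read here simply as "no preceding preposition", and the function returns None when no rule applies; all of these are assumptions of the sketch.

```python
# Illustrative subsets only.
DAT_PREPOSITIONS = {"k", "ka", "nasuprot"}
LOC_PREPOSITIONS = {"u", "na", "o", "po", "pri"}


def select_dat_or_loc(preposition):
    """preposition: lemma of the preceding preposition, or None if there is none."""
    if preposition is None:
        return "dat"   # locative is always prepositional
    if preposition in DAT_PREPOSITIONS:
        return "dat"
    if preposition in LOC_PREPOSITIONS:
        return "loc"
    return None        # no rule applies; leave the ambiguity in place


if __name__ == "__main__":
    for prep in (None, "ka", "u", "kroz"):
        print(prep, select_dat_or_loc(prep))
```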

Instrumental (unambiguous in singular):

  • Select instrumental if there is a preceding instrumental preposition
  • In plural identical to Dative/Locative
    • (easily disambiguated from Locative, since the latter is entirely prepositional)
    • Possible problems in the plural with instrumental/non-instrumental, though not yet encountered
      • "Ljudima sam pomeo pod" - 'To the people' or 'using the people'
      • "Ploviti morima" - To sail 'to the seas' or 'across seas' (or 'using the seas')

Numbers:

  • Numbers 2-4 govern noun phrases differently (remnants of the dual), in two variants:
    • "s trima lijepim ženama" (with three beautiful women). The number and the rest of the phrase take the dual forms (Nom, Acc, Voc => Gen.Sg; Gen => Gen.Pl (triju lijepih žena); Dat, Loc, Ins => Ins.Pl (trima lijepim ženama)). This variant is more literary.
    • "s tri lijepe žene" (the number is in a frozen form and the rest of the phrase takes the genitive; the actual meaning is determined from context or prepositions). This variant is closer to actual speech.
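
The case mapping of the literary variant can be written down as a lookup table. This is a minimal sketch of the correspondences listed above; the function name and the short tag strings are hypothetical, and numbers outside 2-4 are deliberately left uncovered.

```python
# Phrase case -> (case, number) of the form actually used after 2-4,
# following the literary variant described above.
FORM_AFTER_2_TO_4 = {
    "nom": ("gen", "sg"),
    "acc": ("gen", "sg"),
    "voc": ("gen", "sg"),
    "gen": ("gen", "pl"),
    "dat": ("ins", "pl"),
    "loc": ("ins", "pl"),
    "ins": ("ins", "pl"),
}


def form_after_number(number: int, phrase_case: str):
    """Return the (case, number) form used after 2, 3 or 4, else None."""
    if number in (2, 3, 4):
        return FORM_AFTER_2_TO_4[phrase_case]
    return None  # other numbers are not covered by this sketch


if __name__ == "__main__":
    print(form_after_number(3, "ins"))  # as in "s trima lijepim ženama"
    print(form_after_number(3, "gen"))  # as in "triju lijepih žena"
```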

See also

External links

Further reading

  • Ivan Todorović "Disambiguation of Serbian sentences with Unitex".

References