Difference between revisions of "Bosnian-Croatian-Montenegrin-Serbian and Macedonian"

From Apertium
Line 9: Line 9:
==Notes==


;Bosnian-Croatian-Montenegrin-Serbian morphological lexicon
;Testing framework
* TODO: Set up corpus/generation-test

;Serbo-Croatian dictionary
The reflex of yat:
* Added two additional modes to the monodix (ek/ijek) so that lemmas containing yat can be analysed both as ekavian and ijekavian
Line 46: Line 43:
* Čaša vode == Чаша вода [partitive]
* Pilotkinje Vojske Srbije == Пилотките на Српската војска [possessive]
* TODO: After the lexicon in the analyser becomes sufficiently large, grep out all "(Noun) (Noun + Genitive)" occurrences in e.g. the Wikipedia corpus, and find pairs which:
** '''More frequently appear as partitive'''
** '''More frequently appear as possessive'''
** Can appear either way
Afterwards, ignore the last group and make two categories of noun pairs from the first two.

Tricky sequences (instrumental adjectives):
* "...upravljanje '''velikom''', '''jakom''' ''''pticom'''' " == "...управување '''со''' '''голема''', '''силна''' ''''птица'''' ".
Line 64: Line 67:
* Anina čaša == чашата на Ана (Ana's glass)
* Anina ruka == раката на Ана (Ana's hand)
* (possible TODO in the mk analyser: add an analysis for possessive adjectives; none currently exists)
Preposition 's' when standing with genitive does not translate as 'со':
* ...sam poletjela s avionske piste...==...полетав '''од''' авионската писта... and not ...'''со''' авионската писта..., which would have an instrumental meaning
Line 103: Line 107:
* [[/Regression tests|Regression tests]]
* [[/Final_report|Final report]]

;TODOs
* Set up corpus/generation-test


==External links==
Line 120: Line 127:
* SEELRC [http://www.seelrc.org:8080/grammar/pdf/stand_alone_bcs.pdf Bosnian / Croatian / Serbian Reference Grammar]
* SEELRC [http://www.seelrc.org:8080/grammar/pdf/stand_alone_macedonian.pdf Macedonian Grammar]
[[Category:Serbo-Croatian and Macedonian|*]]
[[Category:Bosnian-Croatian-Montenegrin-Serbian and Macedonian|*]]
[[Category:Bosnian-Croatian-Montenegrin-Serbian]]
[[Category:Macedonian]]

Latest revision as of 05:34, 22 August 2017

Source

https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-sh-mk

Notes

Bosnian-Croatian-Montenegrin-Serbian morphological lexicon

The reflex of yat:

  • Added two additional modes to the monodix (ek/ijek) so that lemmas containing yat can be analysed both as ekavian and ijekavian
  • Updated the makefile and the xslt machinery so that this works
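
The idea behind the two modes can be sketched as follows. This is only an illustration, not the actual makefile/XSLT machinery: the markers `{Ě}` (long yat) and `{ě}` (short yat) and the function name `expand_yat` are hypothetical, and the sketch ignores the positional exceptions that real ijekavian reflexes have.

```python
# Hypothetical notation: "{Ě}" marks a long yat, "{ě}" a short yat in a lemma.
# Ekavian realises both as "e"; ijekavian uses "ije" (long) and "je" (short).
YAT_REFLEXES = {
    "ek": {"{Ě}": "e", "{ě}": "e"},
    "ijek": {"{Ě}": "ije", "{ě}": "je"},
}


def expand_yat(lemma: str, mode: str) -> str:
    """Return the surface lemma for the given mode ("ek" or "ijek")."""
    for marker, reflex in YAT_REFLEXES[mode].items():
        lemma = lemma.replace(marker, reflex)
    return lemma


if __name__ == "__main__":
    for mode in ("ek", "ijek"):
        print(mode, expand_yat("ml{Ě}ko", mode), expand_yat("p{ě}sma", mode))
```

A single marked-up entry thus yields both variants, which is what lets one lexicon serve both analysis modes.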

Verbs (marked for aspect, transitivity and reflexivity):

  • Most of them are from a list extracted from the verbs in the mk monodix
  • Verb paradigm names are fashioned to contain information about the verb ("aspect_transitivity_paradig/m__vblex"), which makes entering new verbs more convenient

Adjectives:

  • marked for definiteness
  • entered as quadruples (positive, comparative, superlative, absolute superlative)

Other:

  • Added the paradigms from the grammar of Croatian (the one by Barić, Lončarić, Malić, Pavešić, Peti, Zečević, Znika) to the sh monodix

Macedonian dictionary

  • Added determiner forms for some pronouns (e.g. demonstratives, possessives, etc.), i.e. words that can modify nouns
  • Added some words from closed categories (adverbs, ...)

Bilingual dictionary

  • Updated the pronoun entries; the symbols in the monodix have been adjusted to correspond more closely to the analysis in the Macedonian monodix

Transfer rules

Three-stage transfer is used.

Some problems:

Genitive constructions, where it is hard to distinguish partitive from possessive:

  • Čaša vode == Чаша вода [partitive]
  • Pilotkinje Vojske Srbije == Пилотките на Српската војска [possessive]
  • TODO: After the lexicon in the analyser becomes sufficiently large, grep out all "(Noun) (Noun + Genitive)" occurrences in e.g. the Wikipedia corpus, and find pairs which:
    • More frequently appear as partitive
    • More frequently appear as possessive
    • Can appear either way

Afterwards, ignore the last group (pairs that can appear either way) and make two categories of noun pairs from the first two.
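
A minimal sketch of how the pairs could be sorted once they have been collected. It assumes that each occurrence has already been labelled as partitive or possessive by some other means (the notes do not say how); the function name `classify_pairs`, the 0.8 threshold and the sample observations are all made up for illustration.

```python
from collections import Counter

LABELS = ("partitive", "possessive")


def classify_pairs(observations, min_ratio=0.8):
    """Split (head, genitive) noun pairs into three sets by dominant reading.

    observations: iterable of (head_lemma, genitive_lemma, label) tuples,
    where label is "partitive" or "possessive" (assumed to be given).
    """
    counts = {}
    for head, dep, label in observations:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        counts.setdefault((head, dep), Counter())[label] += 1

    partitive, possessive, ambiguous = set(), set(), set()
    for pair, c in counts.items():
        total = c["partitive"] + c["possessive"]
        if c["partitive"] / total >= min_ratio:
            partitive.add(pair)
        elif c["possessive"] / total >= min_ratio:
            possessive.add(pair)
        else:
            ambiguous.add(pair)  # this group is the one to ignore afterwards
    return partitive, possessive, ambiguous


if __name__ == "__main__":
    toy = [  # toy data, not corpus counts
        ("čaša", "voda", "partitive"),
        ("čaša", "voda", "partitive"),
        ("pilotkinja", "vojska", "possessive"),
    ]
    print(classify_pairs(toy))
```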

Tricky sequences (instrumental adjectives):

  • "...upravljanje velikom, jakom 'pticom' " == "...управување со голема, силна 'птица' ".

Only one preposition is needed for the whole chain, which can be of any length.

  • What we do here is just make long chunks in the t1x, e.g. in this case: ADJ ADJ CM ADJ NOM
    These sequences will probably be fairly infrequent though... - Francis Tyers 09:05, 20 July 2011 (UTC)
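
The single-preposition behaviour can be illustrated outside the t1x with a short sketch. It is not the transfer rule itself: the token representation, the tag names (`adj.ins`, `n.ins`, `cm`, `pr`) and the function `insert_so` are assumptions made for the example, and the tokens are shown already translated with the source case still attached.

```python
def insert_so(tokens):
    """Insert "со" once before each chain of instrumental adjectives/nouns.

    tokens: list of (surface, tag) pairs. Commas (tag "cm") do not break
    a chain; any other non-instrumental token does. No preposition is
    inserted if one (tag "pr") already precedes the chain.
    """
    out = []
    in_chain = False
    for i, (surface, tag) in enumerate(tokens):
        is_ins = tag in ("adj.ins", "n.ins")
        if is_ins and not in_chain:
            prev_tag = tokens[i - 1][1] if i > 0 else None
            if prev_tag != "pr":
                out.append("со")
            in_chain = True
        elif not is_ins and tag != "cm":
            in_chain = False
        out.append(surface)
    return out


if __name__ == "__main__":
    chain = [
        ("управување", "n"),
        ("голема", "adj.ins"),
        (",", "cm"),
        ("силна", "adj.ins"),
        ("птица", "n.ins"),
    ]
    print(" ".join(insert_so(chain)))
```

Because the chain is tracked with a flag rather than a fixed pattern, the sketch handles chains of any length, whereas fixed-length chunks in the t1x need one pattern per length.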

Change of gender in translation: ...vožnja<f> zrakoplovom bila je odlučujuća<f>... == ...возење<nt> со авион беше решавачко<nt>. The noun and the agreeing adjective are potentially too far apart to be matched.

Obligatory clitic with definite object

  • I saw the man == Го видов човекот

The clitics must precede the finite verb, in this order:

  • subjunctive-negative-mood-aux-ethical dative-dative object-accusative object-verb
  • ...да не ќе сум си му го дал... == that I won't have given it to him...
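
The ordering above amounts to sorting the cluster by slot. A minimal sketch, assuming each word has already been assigned one of the slots (the slot names and the function `order_cluster` are made up for the example):

```python
# Slot order as listed above; the identifiers themselves are hypothetical.
CLITIC_ORDER = [
    "subjunctive",
    "negative",
    "mood",
    "aux",
    "ethical_dative",
    "dative_object",
    "accusative_object",
    "verb",
]
RANK = {slot: i for i, slot in enumerate(CLITIC_ORDER)}


def order_cluster(items):
    """Sort (form, slot) pairs into the required clitic order."""
    return [form for form, slot in sorted(items, key=lambda it: RANK[it[1]])]


if __name__ == "__main__":
    scrambled = [
        ("дал", "verb"),
        ("го", "accusative_object"),
        ("да", "subjunctive"),
        ("му", "dative_object"),
        ("сум", "aux"),
        ("не", "negative"),
        ("си", "ethical_dative"),
        ("ќе", "mood"),
    ]
    print(" ".join(order_cluster(scrambled)))
```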

Possessive genitive to на constructions:

  • Anina čaša == чашата на Ана (Ana's glass)
  • Anina ruka == раката на Ана (Ana's hand)
  • (possible TODO in the mk analyser: add an analysis for possessive adjectives; none currently exists)

Preposition 's' when standing with genitive does not translate as 'со':

  • ...sam poletjela s avionske piste...==...полетав од авионската писта... and not ...со авионската писта..., which would have an instrumental meaning
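
In other words, the translation of 's' depends on the case it governs. A tiny sketch of that lookup, assuming the case of the following noun phrase has already been disambiguated (the case labels and the function name `translate_s` are hypothetical):

```python
# Macedonian equivalent of the preposition "s"/"sa", keyed by governed case.
S_TRANSLATIONS = {
    "gen": "од",  # source/origin: "s avionske piste" == "од авионската писта"
    "ins": "со",  # instrumental/comitative meaning
}


def translate_s(governed_case: str) -> str:
    """Return the Macedonian preposition for "s" given the case it governs."""
    return S_TRANSLATIONS[governed_case]


if __name__ == "__main__":
    print(translate_s("gen"), translate_s("ins"))
```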

Disambiguation rules

In all cases, modifiers (adjectives, numerals, pronouns) agree with the noun in number, case and gender, so sequences like these were used to disambiguate:

  • ...sa njenim toplim nježnim rukama - A sequence in instrumental
  • ...sa svojih toplih nježnih ruku - A sequence in genitive

Nominative or accusative:

  • Select accusative if there's a preceding accusative preposition
  • Select accusative if there's a preceding transitive verb (direct object rule)

Genitive or accusative:

  • Disambiguate based on prepositions (the only preposition that takes both cases is "u")
  • Select accusative if there's a preceding transitive verb (direct object rule)

Dative or locative:

  • Select dative if there's no preceding preposition or modifier in dative
  • Select dative if there's a preceding dative preposition
  • Select locative if there's a preceding locative preposition
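
The dative/locative rules can be read as a small decision function. The sketch below is an illustration only, not the actual disambiguation rules: the preposition lists are deliberately short, incomplete assumptions, and the function `select_dat_or_loc` looks only at the preceding preposition, ignoring modifiers.

```python
# Illustrative, incomplete preposition lists (assumptions for this sketch).
DATIVE_PREPOSITIONS = {"k", "ka", "prema"}
LOCATIVE_PREPOSITIONS = {"o", "na", "u", "po", "pri"}


def select_dat_or_loc(preceding_preposition):
    """Choose "dat" or "loc" for a form that is ambiguous between the two.

    preceding_preposition: lemma of the preposition before the phrase,
    or None if there is none. Returns None when no rule applies.
    """
    if preceding_preposition is None:
        return "dat"  # locative is entirely prepositional
    if preceding_preposition in DATIVE_PREPOSITIONS:
        return "dat"
    if preceding_preposition in LOCATIVE_PREPOSITIONS:
        return "loc"
    return None  # leave the ambiguity for other rules


if __name__ == "__main__":
    for prep in (None, "prema", "na", "bez"):
        print(prep, select_dat_or_loc(prep))
```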

Instrumental (unambiguous in singular):

  • Select instrumental if there's a preceding instrumental preposition
  • In plural identical to Dative/Locative
    • (easily disambiguated from Locative, since the latter is entirely prepositional)
    • Possible problems in the plural with instrumental/non-instrumental, though not yet encountered
      • "Ljudima sam pomeo pod" - 'To the people' or 'using the people'
      • "Ploviti morima" - To sail 'to the seas' or 'across seas' (or 'using the seas')

Numbers:

  • Numbers 2-4 govern noun phrases differently (remnants of dual), two variants:
    • "s trima lijepim ženama" (with three beautiful women). The number and the rest of the phrase take the dual forms: Nom, Acc, Voc => Gen.Sg; Gen => Gen.Pl (triju lijepih žena); Dat, Loc, Ins => Ins.Pl (trima lijepim ženama). This variant is more literary.
    • "s tri lijepe žene" (the number is in a frozen form and the rest of the phrase takes the genitive; the actual meaning is determined from context or prepositions). This variant is closer to actual speech.

See also

TODOs
  • Set up corpus/generation-test

External links

Further reading

  • Ivan Todorović "Disambiguation of Serbian sentences with Unitex".
