Bosnian-Croatian-Montenegrin-Serbian and Macedonian
Contributor: Firespeaker
Latest revision as of 05:34, 22 August 2017
Source
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-sh-mk
Notes
- Bosnian-Croatian-Montenegrin-Serbian morphological lexicon
The reflex of yat:
- Added two additional modes to the monodix (ek/ijek) so that lemmas containing yat can be analysed both as ekavian and ijekavian
- Updated the makefile and the xslt machinery so that this works
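To illustrate what the two modes do, here is a minimal Python sketch that expands a single yat-marked lemma into its ekavian and ijekavian forms. The placeholders `{Y}` (long yat) and `{y}` (short yat) and the function names are hypothetical; the actual monodix uses its own markup, processed by the makefile and XSLT, which is not reproduced here. The sketch also ignores special cases such as short yat after *r*.

```python
# Hypothetical placeholders: "{Y}" marks a long yat, "{y}" a short yat.
# Ekavian reflex is "e" for both; ijekavian is "ije" (long) or "je" (short).
YAT_REFLEXES = {
    "ek": {"{Y}": "e", "{y}": "e"},
    "ijek": {"{Y}": "ije", "{y}": "je"},
}


def expand_yat(lemma: str, mode: str) -> str:
    """Return the lemma with every yat placeholder replaced for the given mode."""
    result = lemma
    for placeholder, reflex in YAT_REFLEXES[mode].items():
        result = result.replace(placeholder, reflex)
    return result


def both_variants(lemma: str) -> dict:
    """Expand one yat-marked lemma into its ekavian and ijekavian forms."""
    return {mode: expand_yat(lemma, mode) for mode in YAT_REFLEXES}


print(both_variants("ml{Y}ko"))  # {'ek': 'mleko', 'ijek': 'mlijeko'}
print(both_variants("v{y}ra"))   # {'ek': 'vera', 'ijek': 'vjera'}
```

Lemmas without a yat placeholder pass through unchanged in both modes.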
Verbs (marked for aspect, transitivity and reflexivity):
- Most of them are from a list extracted from the verbs in the mk monodix
- Verb paradigm names are constructed to encode information about the verb ("aspect_transitivity_paradig/m__vblex"), which makes input more convenient
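Because the paradigm name follows the fixed pattern "aspect_transitivity_paradig/m__vblex", its fields can be recovered mechanically. The following minimal Python sketch assumes exactly that pattern; the abbreviations `impf` and `tv` and the example name `impf_tv_čit/ati__vblex` are hypothetical and not taken from the actual monodix.

```python
from typing import NamedTuple


class VerbParadigm(NamedTuple):
    aspect: str
    transitivity: str
    example: str  # example lemma, "/" marks the stem boundary
    pos: str


def parse_paradigm_name(name: str) -> VerbParadigm:
    """Split a name like 'aspect_transitivity_paradig/m__vblex' into its fields."""
    head, pos = name.rsplit("__", 1)  # part of speech follows the double underscore
    aspect, transitivity, example = head.split("_", 2)  # example may itself contain "_"
    return VerbParadigm(aspect, transitivity, example, pos)


print(parse_paradigm_name("impf_tv_čit/ati__vblex"))
# VerbParadigm(aspect='impf', transitivity='tv', example='čit/ati', pos='vblex')
```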
Adjectives:
- marked for definiteness
- entered as quadruples (positive, comparative, superlative, absolute superlative)
Other:
- Added the paradigms from the grammar of Croatian (the one by Barić, Lončarić, Malić, Pavešić, Peti, Zečević, Znika) to the sh monodix
- Macedonian dictionary
- Added determiner forms for some pronouns (e.g. demonstratives, possessives), i.e. forms that can modify nouns
- Added some words from closed categories (adverbs, ...)
- Bilingual dictionary
- Updated the pronoun entries; the symbols in the monodix have been adjusted to correspond more closely to the analysis in the Macedonian monodix
- Transfer rules
Transfer rules
Three-stage transfer (t1x, t2x, t3x) is used.
Some problems:
Genitive constructions: distinguishing the partitive from the possessive reading:
- Čaša vode == Чаша вода (a glass of water) [partitive]
- Pilotkinje Vojske Srbije == Пилотките на Српската војска (the female pilots of the Serbian Army) [possessive]
- TODO: once the analyser's lexicon is sufficiently large, grep all "(Noun) (Noun + Genitive)" occurrences in a corpus such as Wikipedia, and find pairs which:
- More frequently appear as partitive
- More frequently appear as possessive
- Can appear either way
Then ignore the last case and split the remaining noun pairs into two categories (partitive and possessive).
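The TODO above could start from analyser output in the Apertium stream format (`^surface/lemma<tags>/...$`). The following is a minimal Python sketch under several assumptions: nouns carry the `<n>` tag and the genitive the `<gen>` tag, escaped characters in the stream are not handled, and the partitive/possessive counts passed to `split_pairs` come from some separate labelling step (manual annotation, for instance) that the page does not specify. The 0.75 threshold for "more frequently" is an arbitrary placeholder.

```python
import re
from collections import Counter

# One lexical unit in the Apertium stream: ^surface/analysis1/analysis2$
# (simplified: escaped characters such as "\^" or "\$" are not handled).
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_stream(text):
    """Return a list of tokens, each a list of (lemma, set_of_tags) analyses."""
    tokens = []
    for _surface, analyses in LU_RE.findall(text):
        readings = []
        for analysis in analyses.split("/"):
            lemma = analysis.split("<", 1)[0]
            readings.append((lemma, set(TAG_RE.findall(analysis))))
        tokens.append(readings)
    return tokens


def noun_genitive_pairs(text):
    """Count (head noun lemma, genitive noun lemma) bigrams in analysed text."""
    tokens = parse_stream(text)
    counts = Counter()
    for left, right in zip(tokens, tokens[1:]):
        heads = [lemma for lemma, tags in left if "n" in tags]
        gens = [lemma for lemma, tags in right if {"n", "gen"} <= tags]
        if heads and gens:
            counts[(heads[0], gens[0])] += 1
    return counts


def split_pairs(labelled, min_share=0.75):
    """Split pairs into partitive and possessive sets by majority reading.

    `labelled` maps a pair to a Counter of readings, e.g.
    Counter(partitive=8, possessive=1). Pairs whose majority share is
    below `min_share` ("can appear either way") are dropped.
    """
    partitive, possessive = set(), set()
    for pair, readings in labelled.items():
        total = sum(readings.values())
        if total == 0:
            continue
        if readings["partitive"] / total >= min_share:
            partitive.add(pair)
        elif readings["possessive"] / total >= min_share:
            possessive.add(pair)
    return partitive, possessive


sample = "^Čaša/čaša<n><f><sg><nom>$ ^vode/voda<n><f><sg><gen>/voda<n><f><pl><nom>$"
print(noun_genitive_pairs(sample))  # Counter({('čaša', 'voda'): 1})
```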
Tricky sequences (instrumental adjectives):
- "...upravljanje velikom, jakom 'pticom'" == "...управување со голема, силна 'птица'" (...handling a big, strong 'bird')
Only one preposition (со) is needed for the whole chain, which can be of any length.
- What we do here is just make long chunks in the t1x, e.g. in this case: ADJ ADJ CM ADJ NOM
- These sequences will probably be fairly infrequent though... - Francis Tyers 09:05, 20 July 2011 (UTC)
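The effect of such a long chunk can be illustrated outside the t1x. This is a minimal Python sketch, not the actual transfer rule: tokens are assumed to be already translated and carry simplified, hypothetical category labels (`adj_ins`, `cm`, `n_ins`). A maximal run of instrumental adjectives and commas ending in an instrumental noun gets a single со in front of it.

```python
def chunk_instrumental(tokens):
    """Insert one 'со' before each chain (ADJ.ins | CM)* N.ins.

    `tokens` is a list of (target_word, category) pairs; the chain must
    start with an adjective or be a bare instrumental noun.
    """
    out = []
    i = 0
    while i < len(tokens):
        j = i
        # Extend over instrumental adjectives and the commas between them.
        while j < len(tokens) and tokens[j][1] in ("adj_ins", "cm"):
            j += 1
        starts_ok = j == i or tokens[i][1] == "adj_ins"
        if j < len(tokens) and tokens[j][1] == "n_ins" and starts_ok:
            out.append("со")  # one preposition for the whole chain
            out.extend(word for word, _ in tokens[i:j + 1])
            i = j + 1
        else:
            out.append(tokens[i][0])
            i += 1
    return out


tokens = [("управување", "n"), ("голема", "adj_ins"), (",", "cm"),
          ("силна", "adj_ins"), ("птица", "n_ins")]
print(" ".join(chunk_instrumental(tokens)))  # управување со голема , силна птица
```

Matching the whole chain at once is what makes a single preposition possible, whatever the number of adjectives.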
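Change of gender in translation: ...vožnja<f> zrakoplovom bila je odlučujuća<f>... == ...возење<nt> со авион беше решавачко<nt>... (...the ride in the plane was decisive...). The agreeing adjective is potentially too far from the noun to be matched by a single rule.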
Obligatory clitic with a definite object:
- I saw the man == Го видов човекот (the clitic го doubles the definite object човекот)
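The doubling can be sketched as choosing the short accusative clitic by the object's gender and number and placing it before the verb. This minimal Python sketch assumes the gender and number are already known from the analysis; the function name and the simplified verb-object input are hypothetical, and other clitics in the cluster are ignored.

```python
# Short accusative clitics in Macedonian, keyed by (gender, number).
ACC_CLITIC = {
    ("m", "sg"): "го",
    ("nt", "sg"): "го",
    ("f", "sg"): "ја",
    ("m", "pl"): "ги",
    ("nt", "pl"): "ги",
    ("f", "pl"): "ги",
}


def double_object(verb, obj, gender, number, definite):
    """Return 'verb object', prefixed by the accusative clitic if the object is definite."""
    words = [verb, obj]
    if definite:
        words.insert(0, ACC_CLITIC[(gender, number)])
    return " ".join(words)


print(double_object("видов", "човекот", "m", "sg", True))  # го видов човекот
print(double_object("видов", "жената", "f", "sg", True))   # ја видов жената
```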
The clitics must precede the finite verb, in this order:
- subjunctive (да), negation (не), mood (ќе), auxiliary (сум), ethical dative (си), dative object (му), accusative object (го), verb (дал)
- ...да не ќе сум си му го дал... == that I won't have given it to him...
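Since the order is fixed, the cluster can be built by sorting its members by slot. A minimal Python sketch, assuming each word has already been assigned one of the slots listed above; the slot labels are hypothetical names, not tags from the monodix.

```python
# Slot order of the pre-verbal clitic cluster, as listed above.
SLOT_ORDER = [
    "subjunctive",        # да
    "negative",           # не
    "mood",               # ќе
    "aux",                # сум
    "ethical_dative",     # си
    "dative_object",      # му
    "accusative_object",  # го
    "verb",               # дал
]
SLOT_RANK = {slot: rank for rank, slot in enumerate(SLOT_ORDER)}


def order_cluster(items):
    """Sort (slot, word) pairs into the fixed clitic order and join the words."""
    ordered = sorted(items, key=lambda item: SLOT_RANK[item[0]])
    return " ".join(word for _, word in ordered)


shuffled = [("verb", "дал"), ("accusative_object", "го"), ("negative", "не"),
            ("aux", "сум"), ("subjunctive", "да"), ("dative_object", "му"),
            ("mood", "ќе"), ("ethical_dative", "си")]
print(order_cluster(shuffled))  # да не ќе сум си му го дал
```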
Possessive genitive to на constructions:
- Anina čaša == чашата на Ана (Ana's glass)
- Anina ruka == раката на Ана (Ana's hand)
- (possible TODO in the mk analyser: add an analysis for possessive adjectives, which currently does not exist)
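One way to picture this rule: the possessed noun is marked definite and moved first, followed by на and the possessor, with the definite form (чаша to чашата) left to the generator. The following minimal Python sketch emits Apertium-style lexical units; it assumes the lemmas are already translated and that the possessor lemma is available from the analysis of the possessive adjective. The tag names (`def`, `pr`, `np`, `ant`) are illustrative and may not match the pair's actual symbols.

```python
def lu(lemma, tags):
    """Format one Apertium-style lexical unit, e.g. ^чаша<n><f><sg>$."""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


def possessive_to_na(possessor, noun, noun_tags):
    """Rewrite 'possessive adjective + noun' as 'definite noun + на + possessor'."""
    return " ".join([
        lu(noun, list(noun_tags) + ["def"]),  # the generator produces чашата
        lu("на", ["pr"]),
        lu(possessor, ["np", "ant"]),
    ])


print(possessive_to_na("Ана", "чаша", ["n", "f", "sg"]))
# ^чаша<n><f><sg><def>$ ^на<pr>$ ^Ана<np><ant>$
```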
The preposition 's' does not translate as 'со' when it governs the genitive:
- ...sam poletjela s avionske piste... == ...полетав од авионската писта... (...I took off from the runway...), and not ...со авионската писта..., which would have an instrumental meaning
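In other words, the translation of 's' depends on the case of its complement rather than on the preposition alone. A minimal Python sketch of that lookup, covering only the two cases discussed here; the function name is hypothetical, and a real rule would read the case from the tags of the following noun phrase.

```python
# Macedonian translation of Serbo-Croatian "s"/"sa", chosen by the case it governs.
S_BY_CASE = {
    "ins": "со",  # instrumental: 'with'
    "gen": "од",  # genitive: 'from', as in s avionske piste
}


def translate_s(case):
    """Return the Macedonian preposition for 's' followed by the given case."""
    try:
        return S_BY_CASE[case]
    except KeyError:
        raise ValueError(f"no translation of 's' recorded for case {case!r}") from None


print(translate_s("gen"))  # од
```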
Disambiguation rules
In all cases, modifiers (adjectives, numbers, pronouns) agree with their noun in number, case and gender, so sequences like these were used to disambiguate:
- ...sa njenim toplim nježnim rukama (with her warm, gentle hands): a sequence in the instrumental
- ...sa svojih toplih nježnih ruku (from one's own warm, gentle hands): a sequence in the genitive
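Agreement-based disambiguation amounts to intersecting the readings of every word in the phrase. A minimal Python sketch, assuming each word's readings are given as a set of (case, number, gender) triples. The readings in the usage example are simplified for illustration: *ruku* is ambiguous between accusative singular and genitive plural, and the agreeing modifiers leave only the genitive plural.

```python
from functools import reduce


def agreeing_readings(phrase):
    """Intersect the (case, number, gender) readings of all words in a noun phrase.

    `phrase` is a non-empty list of sets, one per modifier or noun; only
    readings shared by every word survive.
    """
    return reduce(lambda acc, readings: acc & readings, phrase)


gen_pl_any = {("gen", "pl", g) for g in ("m", "f", "nt")}
ruku = {("acc", "sg", "f"), ("gen", "pl", "f")}
# svojih, toplih and nježnih all get the same simplified readings here.
print(agreeing_readings([gen_pl_any, gen_pl_any, gen_pl_any, ruku]))  # {('gen', 'pl', 'f')}
```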
Nominative or accusative:
- Select accusative if there's a preceding accusative preposition
- Select accusative if there's a preceding transitive verb (direct object rule)
Genitive or accusative:
- Disambiguate based on prepositions (the two sets of prepositions only overlap in "u")
- Select accusative if there's a preceding transitive verb (direct object rule)
Dative or locative:
- Select dative if there's no preceding preposition or modifier in the dative
- Select dative if there's a preceding dative preposition
- Select locative if there's a preceding locative preposition
Instrumental (unambiguous in the singular):
- Select instrumental if there's a preceding instrumental preposition
- In the plural identical to the dative/locative
- (easily disambiguated from the locative, since the latter is entirely prepositional)
- Possible problems in the plural with instrumental/non-instrumental readings, though not yet encountered:
- "Ljudima sam pomeo pod" (I swept the floor): 'to the people' or 'using the people'
- "Ploviti morima" (to sail the seas): 'to the seas' or 'across the seas' (or 'using the seas')
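The rules above can be condensed into one context-based selection function. This is a minimal Python sketch, not the pair's actual constraint grammar: it looks only at the nearest preceding preposition or verb (intervening modifiers are assumed to be skipped), the context labels `"prep"` and `"transitive_verb"` are hypothetical, and the set of cases each preposition governs must be supplied by the caller.

```python
def select_case(candidates, prev_kind=None, prep_cases=frozenset()):
    """Narrow a set of candidate cases using the preceding context.

    prev_kind: "prep", "transitive_verb" or None (no relevant context).
    prep_cases: cases the preceding preposition can govern (when prev_kind == "prep").
    """
    cands = set(candidates)
    if prev_kind == "prep":
        # Keep only cases the preposition can govern; fall back if none match.
        narrowed = cands & set(prep_cases)
        return narrowed or cands
    # The locative is entirely prepositional, so drop it when no preposition precedes.
    if len(cands) > 1:
        cands.discard("loc")
    # Direct object rule.
    if prev_kind == "transitive_verb" and "acc" in cands:
        return {"acc"}
    return cands


print(select_case({"nom", "acc"}, "transitive_verb"))  # {'acc'}
print(select_case({"dat", "loc"}))                     # {'dat'}
```

Note that a plural form ambiguous between dative, locative and instrumental with no preceding preposition still comes out as dative or instrumental, which is the unresolved plural case described above.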
Numbers:
- Numbers 2-4 govern noun phrases differently (a remnant of the dual); there are two variants:
- "s trima lijepim ženama" (with three beautiful women). The numeral and the rest of the phrase take the dual forms: Nom/Acc/Voc take the Gen.Sg form, Gen takes Gen.Pl (triju lijepih žena), and Dat/Loc/Ins take Ins.Pl (trima lijepim ženama). This variant is more literary.
- "s tri lijepe žene": the numeral is in a frozen form and the rest of the phrase is in the genitive; the actual case is determined from context or prepositions. This variant is closer to actual speech.
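The mapping for the rest of the phrase after 2-4 can be written as a small lookup from the syntactic case to the form actually used. A minimal Python sketch; the function name and case labels are hypothetical. For the colloquial variant it returns the Gen.Sg-shaped form (as in *lijepe žene*), which is an interpretation of "the rest of the phrase gets genitive".

```python
def paucal_form(case, literary=True):
    """Return the (case, number) form of the phrase following the numerals 2-4.

    Literary variant: Nom/Acc/Voc -> Gen.Sg, Gen -> Gen.Pl, Dat/Loc/Ins -> Ins.Pl.
    Colloquial variant: the numeral is frozen and the phrase keeps the
    genitive form regardless of the syntactic case.
    """
    if not literary:
        return ("gen", "sg")
    if case in ("nom", "acc", "voc"):
        return ("gen", "sg")
    if case == "gen":
        return ("gen", "pl")
    if case in ("dat", "loc", "ins"):
        return ("ins", "pl")
    raise ValueError(f"unknown case: {case!r}")


print(paucal_form("ins"))                  # ('ins', 'pl'), as in trima lijepim ženama
print(paucal_form("ins", literary=False))  # ('gen', 'sg'), as in tri lijepe žene
```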
See also
- TODOs
- Set up corpus/generation-test
External links
- Wikipedia: Differences in standard Bosnian, Croatian and Serbian
- Hrvatski jezični portal (Croatian language portal): word definitions with inflection (find the definition and click on "izvedeni oblici", i.e. 'derived forms')
- Macedonian<->Serbian online dictionary
- Word definitions for Macedonian
- Блаже Конески - Историја на македонскиот јазик (Blaže Koneski, History of the Macedonian Language; in Macedonian, Cyrillic)
Further reading
- Ivan Todorović "Disambiguation of Serbian sentences with Unitex".