Dependency parsing for Turkic
Firespeaker (→A potential option for Turkic: with Kyrgyz spelling, not Kazakh...)
Latest revision as of 01:32, 6 March 2024
Introduction
For the first version we intend as shallow an analysis as the standard allows; e.g. different kinds of nmod (possession, adverbials, etc.) will not be distinguished. For later versions we intend to deepen the analysis.
acl: Clausal modifier of a noun
acl stands for finite and non-finite clauses that modify a nominal. The acl relation contrasts with the advcl relation, which is used for adverbial clauses that modify a predicate. The head of the acl relation is the noun that is modified, and the dependent is the head of the clause that modifies the noun.
In Turkic, acl will often be used for (relative) clauses headed by verbal adjectives (gpr_).
Arcs: acl (адам → жүгіретін)
Үйге жүгіретін адам мені шошытты.
to.home run.GPR man me.ACC startled.
"The man running home startled me." / "The man who is running home startled me."
Gerunds in indefinite genitive
We also use acl for gerunds that modify nouns.
Arcs: acl (бостандығына → құру)
Әр адамның бейбіт жиналыстар және ассоциацияларды құру бостандығына құқығы бар.
every man.GEN peaceful assemblies and association.PL.ACC build.GER.GEN freedom.SG3.DAT right.SG3 existing.
"Every person has the right to freedom of [building associations] and [peaceful assemblies]."
Conditionals with болса
Secondary predication
This relation is also used for optional depictives. The adjective is taken to modify the nominal of which it provides a secondary predication. See xcomp for further discussion of resultatives and depictives.
advcl: adverbial clause modifier
Adverbial clause modifiers (advcl) are subordinate clauses that are not complements. Non-complement infinitival or temporal clauses and non-complement participles that modify verbs are also marked as advcl.
In Turkic, verbal adverbs (gna_) take this label if they modify a main verb.
Arcs: advcl (main verb → номчааш)
Ном номчааш, ол кижиниң чуртталгазын шуптузун билип алдым.
book.NOM read.GNA.PAST, that person.GEN life.3.ACC all.3.ACC know.PRC.PERF take.PAST.1SG
Arcs: advcl (белдем → укыгач)
Китапны укыгач, ул кешенең тормышы турында барысын да белдем.
book.ACC read.GNA.PAST, that person.GEN life.3.NOM about all.3.ACC know.PAST.1SG
"Having read the book, I found out everything about that person's life."
Note that unless the "subordinate" clause has its own subject, its subject is understood to be the same as that of the main clause, but the main-clause subject is not directly connected to the subordinate verb.
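The acl/advcl choice for non-finite verb forms described above can be sketched as a small rule. This is a minimal illustration, not part of any existing tool: the form names `gpr` (verbal adjective), `ger` (gerund) and `gna` (verbal adverb) follow the glosses on this page, while the head categories `"n"`/`"v"` and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical form names, following the glosses used on this page:
# gpr = verbal adjective, ger = gerund, gna = verbal adverb.
NOUN_MODIFYING_FORMS = {"gpr", "ger"}
VERB_MODIFYING_FORMS = {"gna"}


def clause_modifier_label(form: str, head_category: str) -> Optional[str]:
    """Suggest acl or advcl for a non-finite verb form, given the category
    ("n" for noun, "v" for verb) of the word it modifies.

    Returns None when neither rule applies, leaving the case for manual review.
    """
    if form in NOUN_MODIFYING_FORMS and head_category == "n":
        return "acl"    # relative clauses and gerunds modifying a noun
    if form in VERB_MODIFYING_FORMS and head_category == "v":
        return "advcl"  # verbal adverbs modifying the main verb
    return None
```

For жүгіретін (gpr) modifying адам, `clause_modifier_label("gpr", "n")` gives `"acl"`; for номчааш (gna) modifying the main verb, `clause_modifier_label("gna", "v")` gives `"advcl"`.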
Comparison
We also use advcl for the comparator in comparison constructions like "X is bigger than Y". In Turkic, "than Y" is expressed with the ablative case, and the ablative-marked Y depends on the adjective.
advmod: adverb modifier
The dependency type advmod is used for adverb modifiers of verbs, nominals and adverbs alike.
Arcs: advmod (dependent удавас "soon")
Света келзе, мени удавас чедип келир деп дамчыдыңар.
Sveta come.COND, I.ACC soon reach.PRC.PERF come.PRC.AOR COMP tell.IMP.PL
Arcs: advmod (dependent тиздән "soon")
Света килсә, <name of the speaker> тиздән кайтып җитә деп әйтегез.
Sveta come.COND, soon return.PRES.3sg. COMP tell.IMP.PL
"If Sveta comes, tell her I'll return soon."
amod: adjectival modifier
Nouns may take adjectival modifiers, which are marked with the dependency type amod. It is also possible for an adjective to take another adjective as a modifier. (These adjectival modifiers are generally expressed with -ly adverbs in English.)
Arcs: amod (номнар → солун)
Мергенде солун номнар бар.
Mergen.LOC interesting book.PL existing.
Arcs: amod (китаплар → кызыклы)
Мәргәндә кызыклы китаплар бар.
Mergen.LOC interesting book.PL existing.
"Mergen has some interesting books."
The label amod is also used for ordinal numbers, which, when written in digits, may not be overtly marked.
Arcs: amod (жылдан → 1968), nmod
1968 жылдан бастап Ширазда театр фестивалы өткізіліп тұрды.
1968th year.ABL starting Shiraz.LOC theatre festival.3sg take.place.CAUS AUX.
"From 1968 onwards, a theatre festival was held in Shiraz."
It is also used for locative nouns in -DAGI.
(Note: This is a provisional classification, pending discussion.)
*appos: appositional modifier
An appositional modifier of a noun is a nominal immediately following the first noun that serves to define or modify that noun. It includes examples in parentheses, as well as defining abbreviations in one of these structures.
*aux: auxiliary
An auxiliary of a clause is a non-main verb of the clause, e.g. one of тур-, кел-, ал-, etc. When a verb is used as an auxiliary, the main verb is the participle (prc_).
auxpass: unused
case: case
The dependency type case is used for the postposition in postpositional phrases. The head of a postpositional phrase is the nominal, not the postposition, so that postpositional phrases are analysed like nominal modifiers without a postposition (e.g. when local "cases" are used). To the same end, the type case is used in combination with the type nmod, which is also used for nominal modifiers when no adposition is present (see nmod).
Note that case is not used with auxiliary nouns (sometimes called "postpositions") in the form N¹.gen N².poss.case; for those, nmod should be used (following the treatment in English of prepositional constructions like "in front of").
Arcs: case (кадайым → биле)
Мен кадайым-биле киноже чорук баар мен.
I wife-with cinema.ALL go AUX.AOR.
"I will go to the cinema with my wife."
Arcs: case (чугаалааның → ёзугаар)
Бис бүгү чүвени сээң чугаалааның ёзугаар кылган бис
We all thing.ACC you.GEN say.GER.2SG.NOM according.to do.PAST.1PL
"We did everything according to what you said."
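Converting an analysis in which the postposition heads its phrase into the nominal-headed analysis above amounts to swapping the two words' roles. The sketch below assumes a simple dictionary representation (token index → (head index, label), with 0 as the root) and a hypothetical input label `pcomp` for the postposition's complement. It should only be given true postpositions, not the auxiliary nouns that take nmod.

```python
from typing import Dict, Iterable, Tuple

Arc = Tuple[int, str]  # (head index, relation label); head 0 is the root


def rehead_postpositions(heads: Dict[int, Arc], postpositions: Iterable[int]) -> Dict[int, Arc]:
    """Make the nominal, not the postposition, the head of each postpositional phrase.

    The nominal takes over the postposition's attachment with the label nmod,
    and the postposition attaches to the nominal with the label case.
    Postpositions with anything other than exactly one dependent are left unchanged.
    """
    new_heads = dict(heads)
    for post in postpositions:
        dependents = [dep for dep, (head, _) in heads.items() if head == post]
        if len(dependents) != 1:
            continue  # unusual structure: leave it for manual review
        nominal = dependents[0]
        outer_head, _ = heads[post]
        new_heads[nominal] = (outer_head, "nmod")
        new_heads[post] = (nominal, "case")
    return new_heads
```

For кадайым-биле (tokenized as кадайым + биле), кадайым ends up attached to the verb with nmod and биле attached to кадайым with case; other dependents of кадайым stay where they are.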
cc: coordinating conjunction
For more on coordination, see the conj relation. A cc is the relation between the first conjunct and the coordinating conjunction delimiting another conjunct. (Note: different dependency grammars have different treatments of coordination. We take the last conjunct as the head of the coordination.)
Arcs: conj and cc linking азат ... және ... тең; conj and cc linking қадір-қасиеті ... мен ... құқықтары
Барлық адамдар тумысынан азат және қадір-қасиеті мен құқықтары тең болып дүниеге келеді.
All people birth.3.ABL free and dignity.3 and rights.3 equal being world.DAT come.PRES.3
"All people are born free and equal in dignity and rights."
*ccomp: clausal complement
Non-finite complements (with acc)
Arcs: root, subj, ccomp (жариялады → өзгеретінін)
Кейінірек ФИФА ротация принципі өзгеретінін жариялады
Later FIFA rotation principle change.IMPF.ACC declare.PAST
"Later FIFA declared that the rotation principle was changing."
Reported speech (with де-)
Arcs: conj (between шық and іш), ccomp (демепті → іш)
«Төрге шық, тамақ іш», - демепті.
place.DAT go.IMP food.NOM drink.IMP say.NEG.IFI.EVID.3SG
"'Go back to the tör and eat!' they did not say."
cmpnd: compound
cmpnd is used for noun compounds. Each noun should modify the next noun in the compound, in order to respect the branching structure.
Most uses of attr will be tagged with cmpnd.
Nouns in the izafet construction (e.g. with a possessive on the final noun) should not get the cmpnd tag.
Arcs: cmpnd (айдан → март), cmpnd (айга → сентябрь)
Мартан-оол март айдан сентябрь айга чедир Кызылга чурттап турган .
Martan-ool March month.ABL September month.DAT until Kyzyl.DAT live.PRC.PERF stand.PAST
Мартан-оол март аеннан сентябрь аена кадәр Кызылда яшәгән .
Martan-ool March month.3.ABL September month.3.DAT until Kyzyl.LOC live.PAST.3SG
(Note the 3rd-person possessives, hence no cmpnd labels.)
"Martan-ool was living in Kyzyl from March until September."
The cmpnd label should also be used for strings of numerals.
conj: conjunct
A conjunct is the relation between two elements connected by a coordinating conjunction, such as and, or, etc. We treat coordination asymmetrically: the head of the relation is the last conjunct, and all the other conjuncts depend on it via the conj relation. Note that this differs from the UD practice of making the first conjunct the head. See here for a discussion on this.
Arcs: conj and cc linking азат ... және ... тең; conj and cc linking қадір-қасиеті ... мен ... құқықтары
Барлық адамдар тумысынан азат және қадір-қасиеті мен құқықтары тең болып дүниеге келеді.
All people birth.3.ABL free and dignity.3 and rights.3 equal being world.DAT come.PRES.3
"All people are born free and equal in dignity and rights."
Warning: If two sentences are joined with a comma and there is no relation between them, the relation should be parataxis.
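Treebanks that follow the UD convention (first conjunct as head) can be converted to the convention used here. The following is a minimal sketch under simplifying assumptions: coordination is flat (no nested coordination), every conj arc points at the first conjunct, trees are stored as token index → (head index, label) with 0 as the root, and cc and shared dependents are left where they are, since their attachment under this scheme may need separate handling.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, relation label); head 0 is the root


def last_conjunct_as_head(heads: Dict[int, Arc]) -> Dict[int, Arc]:
    """Convert first-conjunct-headed (UD-style) coordination to last-conjunct-headed.

    Assumes flat coordination in which every conj arc points at the first
    conjunct. cc arcs and other dependents are left untouched.
    """
    new_heads = dict(heads)
    # Group conjuncts by the first conjunct they attach to.
    groups: Dict[int, List[int]] = {}
    for dep, (head, label) in heads.items():
        if label == "conj":
            groups.setdefault(head, []).append(dep)
    for first, others in groups.items():
        conjuncts = sorted([first] + others)
        last = conjuncts[-1]
        # The last conjunct inherits the coordination's external attachment.
        new_heads[last] = heads[first]
        # Every earlier conjunct now depends on the last one.
        for conjunct in conjuncts[:-1]:
            new_heads[conjunct] = (last, "conj")
    return new_heads
```

For a phrase like "apples(1) and(2) pears(3) and(4) plums(5) fell(6)", the subject arc moves from token 1 to token 5, and tokens 1 and 3 become conj dependents of token 5.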
cop: copula
A copula is the relation between the complement of a copular verb and the copular verb "to be" (only). (We normally take a copula as a dependent of its complement.)
The copula "be" is not treated as the head of a clause, but rather as the dependent of a lexical predicate.
In Turkic the copula is either бол or э. Third-person copula forms in the present tense are not shown in the surface forms, but may be included by morphological analysers. (In the following, · denotes a contraction boundary which is not present in the orthography.)
- Aorist copula (-Ø suffix)
Arcs: root (херек), subj (херек → ном)
Меңээ ном херек
I.DAT book necessary
Arcs: root (херек), subj (херек → ном), cop (херек → ø)
Меңээ ном херек·ø
I.DAT book necessary·is
"I need a book"
- Existentials with "бар" and "чок"
Arcs: cop (бар → ø)
Бо бажыңда он үш квартира бар·ø
This house.LOC ten three flat existing.are
"There are thirteen flats in this house."
- Aorist copula (with personal suffix)
Arcs: cop (ынак → мен)
Кызылга ынак мен
Kyzyl.DAT favourite.am
"Kyzyl is my favourite"
- Aorist evidential copula (-DIr suffix)
Arcs: root, cop (on ·дир), det (институттуң → Бо), nmod (директору → институттуң), subj
Бо институттуң директору Мерген·дир
This institute.GEN director.3SG Mergen·is
"The director of this institute is Mergen."
- Freestanding copula with "бол"
Мээң аас-кежик чогумдан шупту чүве болган .
I.GEN happiness not.1SG.ABL all thing was
"All of my troubles were due to the fact that I have no joy."
- Use of "бол" without predicate
Arcs: root (болуптур)
Эрте заманда Эрназар деген киши болуптур.
"Long ago there was a man called Ernazar."
- Subjectless use of "бол"
Arcs: root (болмайды), ccomp (болмайды → шегуге)
шылым шегуге болмайды .
"Smoking is not allowed."
csubj: unused
csubjpass: unused
*x (dep): unspecified dependency
det: determiner
The relation determiner (det) holds between a nominal head and its determiner. Most commonly, a word of POS det will have the relation det, and vice versa.
Arcs: det
Баяғыда біреу той жасапты , тойға көп кісі жиналыпты , Қожа да келіпті .
Long.ago someone feast make.PAST , feast.DAT a.lot people get.together.PAST , Koža also come.PAST.EVID
"A long time ago someone had a feast, a lot of people came to the feast, and Koža also came."
disc: discourse element
This is used for interjections and other discourse words and elements (which are not clearly linked to the structure of the sentence, except in an expressive way).
The disc label is used for clitic words, including the question particle (ма-, ба-, etc.).
disl: unused
*obj: direct object
The direct object of a verb is the noun phrase that denotes the entity acted upon.
In Turkic languages the direct object will be marked with either the acc (if definite) or nom (if indefinite) case.
expl: unused
*barb (foreign): foreign words
goeswith: unused
*arg (iobj): argument which is not the direct object
*list: list
*mark: marker
mwe: unused
name: name
Multiword named entities are marked as name (Владимир Карбый-оолович Чооду): the last element (Чооду) is the head, and all the other elements are attached to the one to their right with the relation name.
Arcs: x (between директору and Аракчаа), name (Адар-оолович → Роберт), name (Аракчаа → Адар-оолович)
Культура бажыңының директору Роберт Адар-оолович Аракчаа.
Culture house.3SG.GEN director.3SG Robert Adar-oolovič Arakčaa.
"The director of the cultural centre is Robert Adar-oolovič Arakčaa."
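The name rule is mechanical enough to express directly. A sketch, assuming the name occupies consecutive token positions starting at a given index; the function (a hypothetical helper) only produces the internal name arcs, and attaching the head (the last element) to the rest of the sentence is left to the caller.

```python
from typing import Dict, List, Tuple


def name_arcs(name_tokens: List[str], start: int) -> Dict[int, Tuple[int, str]]:
    """Attach every element of a multiword name to the element on its right.

    Tokens are assumed to occupy positions start, start + 1, ...; the last
    element is the head of the name and gets no arc here.
    """
    arcs = {}
    for offset in range(len(name_tokens) - 1):
        position = start + offset
        arcs[position] = (position + 1, "name")
    return arcs


# Роберт Адар-оолович Аракчаа as tokens 4-6 of the example sentence above
print(name_arcs(["Роберт", "Адар-оолович", "Аракчаа"], 4))
# {4: (5, 'name'), 5: (6, 'name')}
```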
neg: unused
*nmod: nominal modifier
nmod is a noun (or noun phrase) functioning as a non-core (oblique) argument or adjunct. This means that it functionally corresponds to an adverbial when it attaches to a verb, adjective or other adverb. But when attaching to a noun, it corresponds to an attribute, or genitive complement (the terms are less standardized here).
*subj (nsubj): (nominal) subject
nsubjpass: unused
*nummod: numeric modifier
A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity.
Arcs: nummod (рет → екі)
Бразилия өз жерінде чемпионатты екі рет өткізген бесінші ел болды (Мексика, Италия, Франция және Германиядан кейін).
Brazil self land.3sg.LOC championship.ACC two time hold.GPR fifth country was (Mexico, Italy, France and Germany.ABL after)
"Brazil was the fifth country to host the championship twice on its own soil (after Mexico, Italy, France and Germany)."
Ordinals should not get this relation (see amod).
*parataxis: parataxis
Side-by-side sentences
When two sentences share no relation but are written together as a single sentence (delimited by a comma, semicolon or similar), we use the relation parataxis.
Arcs: parataxis (linking the two clauses)
Футболдан әлем чемпионаты 2014 — ФИФА-ның 20-шы футболдан әлем чемпионаты, финалдық кезеңі 2014 жылдың 12 маусым мен 13 шілде күндері аралығында Бразилияда өтті.
"The 2014 Football World Championship is FIFA's 20th football world championship; its final stage took place in Brazil between 12 June and 13 July 2014."
*punct: punctuation
*relcl: Relative clause modifier
remnant: remnant
The remnant relation is used to provide a satisfactory treatment of ellipsis (in the case of gapping and stripping, where a predicational or verbal head gets elided) without having to postulate empty nodes in the basic representation.
UD adopts an analysis in which, in ellipsis, each remnant corresponds to a correlate in a preceding clause. The remnant relation connects each remnant to its correlate in the basic dependency representation. This is sufficient to reconstruct the predicate-argument structure in the enhanced representation.
Arcs: two remnant arcs connecting corresponding elements of the two clauses
Ашылу матчы Сан-Паулуда, ал финалы Рио-де-Жанейродағы Маракана стадионында орын алды.
Opening match São-Paulo.LOC, and final.3SG Rio-de-Janeiro.LOC.ATTR Maracanã stadium.LOC place.take.
"The opening match took place in São Paulo and the final took place at the Maracanã Stadium in Rio de Janeiro."
*reparandum: overridden disfluency
*root: root
*vocative: vocative
*xcomp: open clausal complement
An open clausal complement (xcomp) of a verb or an adjective is a predicative or clausal complement that cannot have its own subject. The reference of the subject is necessarily determined by an argument external to the xcomp (normally by the object of the next higher clause, if there is one, or else by the subject of the next higher clause). This is often referred to as obligatory control. These complements are always non-finite, and they are complements (arguments of the higher verb or adjective) rather than adjuncts/modifiers, such as a purpose clause. The name xcomp is borrowed from Lexical-Functional Grammar.
Arcs: obj (табуға → Азаматты), xcomp (dependent табуға)
Әлі де болса Азаматты табуға әрекет·етіп жүр .
Azamat.ACC find.GER.DAT trying AUX.
"... trying to find Azamat."
Particular questions
conj vs. parataxis vs. remnant
conj:
- if there is an explicit coordinator (жана, ал, биракъ, мен, etc.), use the conj relation.
parataxis:
- relation between a word and other elements, such as a sentential parenthetical or a clause after a “:” or a “;”, placed side by side without any explicit coordination, subordination, or argument relation with the head word. Parataxis is a discourse-like equivalent of coordination.
- used for a pair of what could have been standalone sentences, but which are being treated together as a single sentence. They may be joined by punctuation such as a colon or comma, or not delimited by punctuation at all.
- used for reported speech in the structure "xxx yyy" деп ... The "xxx yyy" is in parataxis with деп.
- used for news article bylines "London (BBC)"
- clause interjections.
remnant:
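The conj/parataxis criterion above can be approximated by checking for an explicit coordinator between the two clauses. This is only a heuristic sketch: the coordinator set is the open-ended list given above, a word-form match cannot tell coordinating мен from homographs such as the pronoun мен "I", and the function does not detect the ellipsis that calls for remnant.

```python
from typing import Iterable

# Coordinators listed above; the list is open-ended ("etc."),
# so this set is an assumption to be extended per language.
COORDINATORS = {"жана", "ал", "биракъ", "мен"}


def clause_link_label(tokens_between: Iterable[str]) -> str:
    """Return "conj" if an explicit coordinator separates the two clauses,
    otherwise "parataxis" (comma, semicolon, colon, or nothing at all)."""
    if any(token.lower() in COORDINATORS for token in tokens_between):
        return "conj"
    return "parataxis"
```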
Testing for argument status
In Turkic languages, traditional tests for whether a constituent is the argument of a verb or the adjunct of a verb (or predicate) don't work well. Knowing whether something is an argument or an adjunct ("is required or not") is crucial in dependency grammars, since it determines whether a constituent gets e.g. an obj or nmod label. This page describes an alternative method that may perform better.
Why traditional tests fail
One traditional test of whether a constituent is an argument or an adjunct is the "do-so" test (*I ate apples, and Bill did so oranges.). Turkic and structurally similar languages, like Ewen, lack a "do so" construction, so this test doesn't work in Turkic.
Another traditional test of whether a constituent is an argument or adjunct is the grammaticality of the sentence when the constituent is left out (*Jill really likes). This doesn't work in Turkic either, since any argument ("no exceptions") can be left out if already present [in the right way] in discourse. While it will sound like information is missing when predicates are used on their own with none of their arguments, contexts can usually be found in which most of these sound perfectly grammatical.
A potential option for Turkic
One test that seems to show something approaching argument status is an "additional information"-as-a-new-sentence test. In this test, you leave out one of the constituents of the original sentence and provide it in a second sentence together with a repeated copy of the verb.
For example, take the following sentence:
Кечээ мугалим студентке китеп(ти) берген.
yesterday teacher student-to book(the) gave.
Using this test, you can create the following sentences, marked for grammaticality in spoken [not literary] Kyrgyz. (Note that the grammaticality marking refers to the whole utterance, independent of other pragmatics, and not to either of the individual sentences, whether in combination with the other or alone, or in some other context.)
Мугалим студентке китеп(ти) берген. Кечээ берген.
teacher student-to book(the) gave. yesterday gave.
*Кечээ студентке китеп(ти) берген. Мугалим берген.
yesterday student-to book(the) gave. teacher gave.
Кечээ мугалим китеп(ти) берген. Студентке берген.
yesterday teacher book(the) gave. student-to gave.
?*Кечээ мугалим студентке берген. Китеп(ти) берген.
yesterday teacher student-to gave. book(the) gave.
The ungrammatical sentences show that мугалим and китеп are arguments and must be included with the original sentence. The grammaticality of moving кечээ and студентке out of the original sentence shows that they are probably adjuncts.
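Building the test utterances is mechanical, so they can be generated before being put to speakers; the grammaticality judgements themselves still have to come from native speakers. A sketch, assuming the sentence is given as a list of constituents (in lowercase, in their original order) plus the verb; both function names are hypothetical.

```python
from typing import List, Tuple


def sentence_case(text: str) -> str:
    """Capitalise only the first character (str.capitalize would lowercase the rest)."""
    return text[:1].upper() + text[1:]


def extraction_variants(constituents: List[str], verb: str) -> List[Tuple[str, str]]:
    """For each constituent, build the two-sentence test utterance: the original
    sentence without that constituent, then the constituent followed by a
    repeated copy of the verb."""
    variants = []
    for i, moved in enumerate(constituents):
        remaining = constituents[:i] + constituents[i + 1:]
        first = sentence_case(" ".join(remaining + [verb])) + "."
        second = sentence_case(f"{moved} {verb}") + "."
        variants.append((moved, f"{first} {second}"))
    return variants


for moved, utterance in extraction_variants(["кечээ", "мугалим", "студентке", "китеп(ти)"], "берген"):
    print(f"{moved}: {utterance}")
```

This reproduces the four utterances above, without their grammaticality marks.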
One approach: oblique if not core
This approach lists some tests for core arguments, and if a given noun doesn't match one of the tests, then it's considered oblique (non-core).
Tests for core include:
- Is it a nominative subject?
- Justification: nominative subjects agree in person/number with the verb, whether overt or not.
- Is it an accusative object?
- Justification: accusative objects can be promoted to nominative subjects when the verb is passivised.
- Is it a genitive subject in a subordinate clause?
- Actual test:
- Justification:
- Is it a demoted dative/accusative/ablative subject?
- Actual test: can a sentence be created in which the noun is in nominative, the verb has one (or two) fewer causative morphemes, and the relationship between the verb and this noun preserved?
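The "oblique if not core" procedure is essentially a decision list. The sketch below encodes the tests listed above as boolean judgements supplied by the annotator (the field and function names are hypothetical). Since the genitive-subject test is not yet specified, it is represented by a plain flag. Anything that fails every test is oblique, which in this scheme would receive nmod.

```python
from dataclasses import dataclass


@dataclass
class Nominal:
    case: str                                   # e.g. "nom", "acc", "gen", "dat", "abl"
    agrees_with_verb: bool = False              # person/number agreement with the verb
    promotable_by_passive: bool = False         # becomes a nominative subject under passivisation
    subordinate_genitive_subject: bool = False  # test not yet specified; annotator's judgement
    demoted_causee: bool = False                # passes the causative-removal test


def core_or_oblique(n: Nominal) -> str:
    """Return "core" if any of the listed tests succeeds, otherwise "oblique"."""
    if n.case == "nom" and n.agrees_with_verb:
        return "core"  # nominative subject
    if n.case == "acc" and n.promotable_by_passive:
        return "core"  # accusative object
    if n.case == "gen" and n.subordinate_genitive_subject:
        return "core"  # genitive subject of a subordinate clause
    if n.case in {"dat", "acc", "abl"} and n.demoted_causee:
        return "core"  # demoted subject of a causativised verb
    return "oblique"
```

For the dative студентке in the example sentence, none of the tests applies, so `core_or_oblique(Nominal("dat"))` returns `"oblique"`, which agrees with the result of the additional-information test.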