User talk:Popcorndude/Recursive Transfer
Contents
- 1 General comments and things to look at
- 2 Linguistic/transfer phenomena
- 2.1 Serbo-Croatian clitics
- 2.2 Object incorporation
- 2.3 Constituent reordering
- 2.4 NP-internal reordering
- 2.5 Optional NP-internal reordering
- 2.6 Ambiguous rules
- 2.7 Valency (order -> cases)
- 2.8 Valency (cases -> order)
- 2.9 Functional mismatch
- 2.10 Beheadening headless constructions
- 2.11 Inferring focus from order / focus-dependent ordering
- 2.12 Part of speech mismatches
- 2.13 Nominal versus complementised subordinate clauses
- 2.14 Adjectival versus complementised relative clauses
- 2.15 Choosing correct verbal adjective in Turkish
- 2.16 English verb phrase nominalisation
- 2.17 Complete rephrasings
- 2.18 multiple lexical units
- 2.19 Agreement from a different element
- 2.20 One word to three
- 2.21 V2 stuff
- 2.22 Kyrgyz conditionals
- 3 Implementation Ideas
General comments and things to look at
- GLR
- PCFGs
Reading list
- Particular systems
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.625.910&rep=rep1&type=pdf
- https://www.aclweb.org/anthology/W/W08/W08-0311.pdf
- http://www.cis.upenn.edu/~xtag/koreantag/nasr-et-al-1997.ps
- MT linguistics
- http://olst.ling.umontreal.ca/pdf/MelcukWanner2006.pdf
- http://users.umiacs.umd.edu/~bonnie/Publications/Attic/Dorr1994g.pdf
Linguistic/transfer phenomena
Serbo-Croatian clitics
Serbo-Croatian closely observes Wackernagel's Law: clitics (unstressed function words) are placed in second position in every clause. The first element may be a single word or a whole noun phrase, so "That man (has) said" can be either Taj je čovjek rekao or Taj čovjek je rekao.
Taj je čovjek rekao.
That is man said.
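A minimal sketch of what a transfer step would have to do here, assuming the clause has already been chunked into constituents and the clitic has been pulled out. The name `place_clitic` and the `split_first` flag are made up for illustration; nothing like this exists yet.

```python
def place_clitic(constituents, clitic, split_first=False):
    """Put `clitic` in second position of a clause.

    constituents: list of constituents, each a list of words.
    split_first: if True, the clitic follows the first word of the first
    constituent; otherwise it follows the whole first constituent.
    """
    if not constituents or not constituents[0]:
        raise ValueError("clause needs a non-empty first constituent")
    first, rest = constituents[0], constituents[1:]
    if split_first:
        words = [first[0], clitic] + first[1:]
    else:
        words = first + [clitic]
    for constituent in rest:
        words.extend(constituent)
    return " ".join(words)


clause = [["Taj", "čovjek"], ["rekao"]]
print(place_clitic(clause, "je", split_first=True))
print(place_clitic(clause, "je", split_first=False))
```

The point is that the rule needs to see both the word boundary and the NP boundary, so the clitic cannot simply be treated as a child of one fixed node.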
Object incorporation
Yupik:
I am going to put crowberries in -> pagunghalighnaqaqa
pagunghagh- -ligh- -naqe- -a- -qa
crowberry -put.in- -going.to- TRN.IND S1SG.O3PL
Chukchi:
“Cıkwaŋaqaj chased (after) the reindeer in the other encampment.” -> Гаӄорапэнратԓэн Сыкваӈаӄай рэмкык
га-ӄора-пэнр-ат-ԓэн Сыкваӈаӄай рэмк-ык
PERF-reindeer-chase-PERF-S3SG Cıkwaŋaqaj folk-LOC
Constituent reordering
NP-internal reordering
Optional NP-internal reordering
Ambiguous rules
X de Y
-> Y X: memoria de traducción -> translation memory
-> Y's X: hermana de mi vecina -> my neighbour's sister
-> X of Y: constitución de 1812 -> constitution of 1812
Valency (order -> cases)
Valency (cases -> order)
Functional mismatch
Adverbials needing an extra morpheme to attributivise:
euskarazko esaldiak -> phrases in Basque
euskara-z-ko esaldi-ak
Basque-INS-ko phrase-DET.PL

arabadaki çocuklar -> the children in the car
araba-da-ki çocuk-lar
car-LOC-ki child-PL
Beheadening headless constructions
arabalardakileri gördüm -> I saw [the ones] in the cars
araba-lar-da-ki-ler-i gör-dü-m
car-PL-LOC-[ki-PL-ACC] see-PAST-SG1
Inferring focus from order / focus-dependent ordering
Part of speech mismatches
Juan suele leer mucho -> Juan usually reads a lot
Juan solía leer mucho -> Juan used to read a lot
Mis amigos suelen leer mucho -> My friends usually read a lot

soler-PRES X -> usually X
soler-IMPF X -> used to X
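The two soler rules could be written down roughly like this. It is only a sketch: the tense labels PRES and IMPF come from the rules above, while the `INF` label, the table and the function name are assumptions for illustration.

```python
# What replaces "soler", and which form the main verb X ends up in.
SOLER_RULES = {
    "PRES": ("usually", "PRES"),  # adverb; the tense moves onto X ("reads")
    "IMPF": ("used to", "INF"),   # auxiliary; X stays infinitive ("read")
}


def transfer_soler(soler_tense, verb_lemma):
    """Return [(word, form), ...] for the English side of 'soler-TENSE X'."""
    marker, verb_form = SOLER_RULES[soler_tense]
    return [(marker, None), (verb_lemma, verb_form)]


print(transfer_soler("PRES", "read"))
print(transfer_soler("IMPF", "read"))
```

In the present-tense case the verb turns into an adverb and its tense and agreement have to be carried over to the main verb, which is the part a plain word-for-word rule cannot express.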
Nominal versus complementised subordinate clauses
(Kyrgyz)
Мен Мураттын бараарын көрдүм
I Murat.GEN go-VN-POSS.3-ACC see-PAST-1SG
(Turkish)
Ben gördüm ki Murat gidecek.
I see-PAST-1SG that Murat go-FUT(-3)
Adjectival versus complementised relative clauses
(Kyrgyz)
мен кечээ ага берген белек
I yesterday 3-DAT give-VAdj gift
"the gift that I gave him/her yesterday"
Choosing correct verbal adjective in Turkish
Turkish has a choice between two past/perfect verbal adjectives depending on whether the subject or an internal argument (or adjunct?) is being extracted from the verb phrase.
-(j)An for subject extraction:
kedi gören kız
cat see-VADJ girl
"the girl that saw a cat"

kız gören kedi
girl see-VADJ cat
"the cat that saw a girl"
-DIK+POSS for extraction of other arguments:
kedinin gördüğü kız
cat-GEN see-VADJ girl
"the girl that a cat saw"

kızın gördüğü kedi
girl-GEN see-VADJ cat
"the cat that a girl saw"
Most other Turkic languages only have one option for both, e.g. Kyrgyz:
-GAn:
мышык көргөн кыз
cat see-VADJ girl
"the girl that saw a cat" OR "the girl that the cat saw"

кыз көргөн мышык
girl see-VADJ cat
"the cat that saw a girl" OR "the cat that a girl saw"
Note that all nouns used in these examples are in nominative / bare (=indefinite) accusative case, which is what leads to the ambiguity in Kyrgyz. If the first noun in both Kyrgyz examples were marked as accusative, then they would be unambiguous examples of subject extraction, with an object parsed as definite (as opposed to the current indefinite).
How do we transfer between these in Turkic?
How do we transfer either of these to/from English?
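A partial sketch for the Kyrgyz to Turkish direction, using only the case observation made above. The function name and the case labels `acc` and `bare` are made up for illustration, and how to pick between the two candidates in the ambiguous case is left open.

```python
def turkish_vadj_candidates(first_noun_case):
    """Which Turkish verbal adjective(s) a Kyrgyz -GAn clause could map to.

    first_noun_case: "acc" if the first noun is overtly accusative,
    "bare" if it is nominative / bare (indefinite) accusative.
    """
    if first_noun_case == "acc":
        # The marked object means the head noun was the subject.
        return ["-(j)An"]
    if first_noun_case == "bare":
        # Could be subject or non-subject extraction: keep both options.
        return ["-(j)An", "-DIK+POSS"]
    raise ValueError(f"case not covered by this sketch: {first_noun_case}")


print(turkish_vadj_candidates("acc"))
print(turkish_vadj_candidates("bare"))
```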
English verb phrase nominalisation
- "I was interested in what she spoke about"
- = "Ал эмне жөнүндө сүйлөшкөнүнө кызыгып жаттым."
Complete rephrasings
Мугалим студенттердин баштарын айлантты.
teacher.NOM student-PL-GEN head-PL-POSS.3-ACC spin-CAUS-PAST.3
"The teacher confused the students." / "The teacher made the students confused."
literally: "The teacher made the students' heads spin."

A simpler example:

Менин башым айланып жатат.
my head-POSS.1SG.NOM spin-INF PROG-NPST-3.
"I'm confused."
literally: "My head is spinning."
There are a number of mismatches like this between various Turkic languages and English. For this one a direct translation may be acceptable, but for others it works less well: "one's feelings drop" = "feel down"; "take to one's neck" = "accept/recognise/admit".
In all of these examples the Turkic verb is 3rd person, while the English verb takes whatever person its subject has. The person of the possessed noun in Turkic corresponds to the person of the subject in English.
multiple lexical units
Multiple lexical units (mlus) need to be considered mainly on the target side of translation: if they occur in the source, they will already be treated as two separate words by the time they reach transfer.
Мен мугалиммин
^Мен/Мен<prn><pers><p1><sg><nom>$ ^мугалиммин/мугалим<n><nom>+э<cop><aor><p1><sg>$
"I am a teacher."
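To make the "two separate words" point concrete, here is a rough sketch that splits the analysis above at `+`. It assumes exactly one analysis per `^...$` unit and no escaped characters, which the real stream format does not guarantee; `parse_stream` is a made-up helper, not an existing Apertium function.

```python
import re

# One lexical unit: ^surface/analysis$ (single analysis assumed).
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_stream(text):
    """Return [(surface, [(lemma, [tags]), ...]), ...] for each unit."""
    units = []
    for surface, analysis in LU.findall(text):
        parts = []
        for piece in analysis.split("+"):  # "+" joins the sub-readings
            lemma = piece.split("<", 1)[0]
            parts.append((lemma, TAG.findall(piece)))
        units.append((surface, parts))
    return units


line = ("^Мен/Мен<prn><pers><p1><sg><nom>$ "
        "^мугалиммин/мугалим<n><nom>+э<cop><aor><p1><sg>$")
for surface, parts in parse_stream(line):
    print(surface, parts)
```

The second unit comes out as a noun plus a copula, so a transfer rule sees three items for two surface words.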
Agreement from a different element
The desiderative ("I want to [VERB]") is formed in some Turkic languages with a verbal noun (or similar) with possessive(-like) morphology as the subject of either a main verb / auxiliary or an adjective+copula. In transfer to a language like English, Spanish, French, or Turkish, the person of this possessed form needs to be marked in the main verb's agreement, along with tense and the like from the verbs at the end.
Yangi kitabimni senga ko'rsatkim bor.
new<adj> book<n><px1sg><acc> you<prn><pers><p2><sg><dat> show<v><tv><vn><p1><sg> existing<adj>+COP<cop><npst><p3><sg>.

Яна кетабымны сиңа күрсәтәсем килә.
new<adj> book<n><px1sg><acc> you<prn><pers><p2><sg><dat> show<v><tv><vn><p1><sg> arrive<vaux><npst><p3><sg>.

Жаңа кітабымды саған көрсеткім келіп жатыр.
Жаңы китебимди сага көрсөткүм келип атат.
new<adj> book<n><px1sg><acc> you<prn><pers><p2><sg><dat> show<v><tv><vn><p1><sg> arrive<vaux><inf> PROG<vaux><npst><p3><sg>.
Translating to or from the following:
"I want to show you my new book."
"Yo quiero mostrarte mi libro nuevo."
"Je veux te montrer mon livre nouveau."

Yeni kitabımı sana göstermek istiyorum.
new<adj> book<n><px1sg><acc> you<prn><pers><p2><sg><dat> show<v><tv><inf> want<v><prog><npst><p1><sg>.
Past tense in Uzbek:
Yangi kitabimni senga ko'rsatkim bor edi.
new<adj> book<n><px1sg><acc> you<prn><pers><p2><sg><dat> show<v><tv><vn><p1><sg> existing<adj> COP<cop><pst><p3><sg>.
"I wanted to show you my new book."
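A sketch of the agreement transfer described in this section, working directly on glosses like the ones above. The token representation, the tag sets and the function name are assumptions for illustration; the idea is just that person and number come from the verbal noun while tense comes from the final verb.

```python
PERSON = {"p1", "p2", "p3"}
NUMBER = {"sg", "pl"}
TENSE = {"npst", "pst"}


def desiderative_to_want(tokens):
    """tokens: [(lemma, [tags]), ...] in source order.

    Returns the target main verb "want" and the infinitive it governs.
    """
    vn = next(t for t in tokens if "vn" in t[1])  # the possessed verbal noun
    final = tokens[-1]                            # auxiliary or copula
    person = next(tag for tag in vn[1] if tag in PERSON)
    number = next(tag for tag in vn[1] if tag in NUMBER)
    tense = next(tag for tag in final[1] if tag in TENSE)
    return ("want", ["v", tense, person, number]), (vn[0], ["v", "inf"])


uzbek_past = [
    ("new", ["adj"]),
    ("book", ["n", "px1sg", "acc"]),
    ("you", ["prn", "pers", "p2", "sg", "dat"]),
    ("show", ["v", "tv", "vn", "p1", "sg"]),
    ("existing", ["adj"]),
    ("COP", ["cop", "pst", "p3", "sg"]),
]
print(desiderative_to_want(uzbek_past))
```

Note that the 3rd person marking on the final verb is deliberately ignored, and so is the 2nd person on the pronoun.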
One word to three
E.g., "мындай" to "this type of"
Мындай китеп окудуңбу?
this.type.of book read<v><tv><ifi><p2><sg>+бы<qst>
"Have you read this type of book?"
V2 stuff
E.g., Yiddish
Similar to Serbo-Croatian clitics above, or the Russian thing.
Kyrgyz conditionals
See [1]. Determining which modal verb to insert based on tags at the other end of the sentence. Rough solution: [2]
Implementation Ideas
Using Bison or something like it might be faster than writing a custom parser, and reusing an existing component would also remove one source of error. On the other hand, it would be really nice to allow rules to handle situations like
        S
       / \
      /   VP
     /    /\
    /    V  NP
  (N)       /\
   |       /  \
   |      /    \
   |    Adj ^   N
   |________|
Here the subject is stuck in the middle of another NP, and I'm really not sure how to deal with that in Yacc (except maybe by manually reinserting the subject into the input stream when the object is parsed, but that seems like a bad idea). If we write a custom parser, we could make the Reduce operation able to produce more than one node as output, so a rule for the above could be something like
NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};
or something more general like
NP.$case * -> adj.$case * n.$case {3 1(gender=3.gender)} {2};
# match an adjective and noun with the same case marking, separated by another word
# copy the gender marking from the noun to the adjective and output in N-Adj order
# then deal with the other word
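A rough sketch of a Reduce step that emits more than one node, using the first rule (`NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};`). The `Node` class, the dictionary encoding of the rule and the example words are all made up for illustration, and the `$case` variables and gender copying of the second rule are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    case: str
    word: str = ""
    children: list = field(default_factory=list)


# Hypothetical encoding of: NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};
RULE = {
    "pattern": [("adj", "acc"), ("n", "nom"), ("n", "acc")],
    "outputs": [("NP", "nom", [2]), ("NP", "acc", [3, 1])],  # 1-based
}


def reduce_once(stack, rule):
    """If the top of the stack matches, replace it with all output nodes."""
    n = len(rule["pattern"])
    top = stack[-n:]
    if len(top) < n or [(t.label, t.case) for t in top] != rule["pattern"]:
        return False
    stack[-n:] = [
        Node(label, case, children=[top[i - 1] for i in order])
        for label, case, order in rule["outputs"]
    ]
    return True


def leaves(node):
    """Words under a node, in output order."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


stack = [Node("adj", "acc", "big"), Node("n", "nom", "girl"), Node("n", "acc", "book")]
reduce_once(stack, RULE)
print([(node.label, node.case, leaves(node)) for node in stack])
```

After the reduce the stack holds two NPs, with the subject pulled out of the middle and the object in N-Adj order, which is the behaviour that seems hard to get from Yacc.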
Questions:
- Should the parser generate a C file and compile like Bison does or should it just generate a rule table and load that from a file?
- To what extent is it possible and desirable to put parts of this data in the monolingual repositories?
- If this were possible to the fullest extent it would substantially decrease the total number of rules that need to be written since the Catalan rules could then be reused in every pair that includes Catalan.
- This would probably require every language to be parsing to more or less the same abstract syntax tree.
- In any event, there are probably lexical things that affect syntax and would have to be pair-specific.
Popcorndude (talk) 18:03, 8 March 2019 (CET)
- Recursive transfer talks about glue rules and I think the simplest way to implement that would probably be to not require that the input stream reduce to a single node. That is, an input like "det n det n" could reduce to "NP NP" and then just be output like that without it being a problem that it doesn't get to a root node.
- There's also a mention of converting left-recursive grammars to right-recursive ones, and if that's just talking about rules like "X -> y X" then maybe it would make sense to have a notation for arbitrarily many of a term which is then compiled to a left-recursive rule.
- Popcorndude (talk) 01:49, 9 March 2019 (CET)
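A tiny sketch of the "don't require a single root" idea from the first point above. It is a greedy shift-reduce loop over POS tags with a made-up one-rule grammar, with no lookahead or conflict handling, so it only illustrates the output side.

```python
RULES = [("NP", ["det", "n"])]  # hypothetical grammar: NP -> det n


def parse_partial(tags, rules=RULES):
    """Shift-reduce over tags; return whatever is left on the stack."""
    stack = []
    for tag in tags:
        stack.append(tag)  # shift
        reduced = True
        while reduced:     # reduce for as long as any rule matches the top
            reduced = False
            for lhs, rhs in rules:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
    return stack


print(parse_partial("det n det n".split()))
print(parse_partial("det n v".split()))
```

The parser just hands back every node remaining on the stack, so "det n det n" comes out as two NPs and an unparsed word is passed through as-is, without any explicit glue rule.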