Difference between revisions of "User talk:Popcorndude/Recursive Transfer"


Revision as of 02:28, 9 March 2019

General comments and things to look at

  • GLR
  • PCFGs


Reading list

Particular systems
MT linguistics

Linguistic/transfer phenomena

Serbo-Croatian clitics

Serbo-Croatian closely observes Wackernagel's Law, under which clitics (unstressed function words) are placed in second position in all clauses. The first element may be a single word or a whole noun phrase: "Taj je čovjek rekao" or "Taj čovjek je rekao", both meaning "That man (has) said".

   Taj  je čovjek rekao.
   That is man    said.
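
The two placements described above can be sketched in a few lines of Python. This is only an illustration: the function `place_clitic` and its arguments are hypothetical, and it assumes the first constituent has already been identified.

```python
def place_clitic(first_phrase, clitic, rest, after_first_word=True):
    """Put a clitic in second position.

    first_phrase: words of the clause-initial constituent (assumed known)
    after_first_word: True  -> clitic follows the first word
                      False -> clitic follows the whole first phrase
    """
    if after_first_word:
        words = first_phrase[:1] + [clitic] + first_phrase[1:] + rest
    else:
        words = first_phrase + [clitic] + rest
    return " ".join(words)


print(place_clitic(["Taj", "čovjek"], "je", ["rekao"]))
print(place_clitic(["Taj", "čovjek"], "je", ["rekao"], after_first_word=False))
```

The first call gives the order with the clitic splitting the noun phrase, the second the order with the clitic after the full phrase.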

Object incorporation

Yupik:

I am going to put crowberries in

->

pagunghalighnaqaqa

pagunghagh- -ligh-   -naqe-      -a-     -qa
crowberry   -put.in- -going.to-  TRN.IND  S1SG.O3PL

Chukchi:

“Cıkwaŋaqaj chased (after) the reindeer in the other encampment.”

-> 

Гаӄорапэнратԓэн Сыкваӈаӄай рэмкык
га-ӄора-пэнр-ат-ԓэн Сыкваӈаӄай рэмк-ык
PERF-reindeer-chase-PERF-S3SG Cıkwaŋaqaj folk-LOC

Constituent reordering

NP-internal reordering

Optional NP-internal reordering

Ambiguous rules


X de Y -> Y X             memoria de traducción -> translation memory 
       -> Y's X           hermana de mi vecina -> my neighbour's sister
       -> X of Y          constitución de 1812  -> constitution of 1812
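
One possible way to hold a rule like this is a single pattern with several candidate outputs, leaving the choice between them to a later step. The sketch below is an assumption about representation only: `AMBIGUOUS_RULES` and `candidates` are hypothetical names, X and Y refer to the source-side slots, and the slot fillers are assumed to be translated already.

```python
# Hypothetical sketch: one source pattern, several candidate output templates.
AMBIGUOUS_RULES = {
    ("X", "de", "Y"): ["{Y} {X}", "{Y}'s {X}", "{X} of {Y}"],
}


def candidates(pattern, **slots):
    """Return every candidate output for a pattern, in rule order."""
    return [template.format(**slots) for template in AMBIGUOUS_RULES[pattern]]


for option in candidates(("X", "de", "Y"), X="sister", Y="my neighbour"):
    print(option)
```

Something else (weights, lexical lists, a language model) would still have to pick among the candidates; the sketch does not attempt that.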

Valency (order -> cases)

Valency (cases -> order)

Functional mismatch

Adverbials needing an extra morpheme to attributivise:

euskarazko     esaldiak -> phrases in Basque
euskara-z-ko   esaldi-ak
Basque-INS-ko  phrase-DET.PL 


arabadaki      çocuklar -> the children in the car
araba-da-ki    çocuk-lar 
car-LOC-ki     child-PL

Beheadening headless constructions

arabalardakileri         gördüm        -> I saw [the ones] in the cars
araba-lar-da-ki-ler-i    gör-dü-m
car-PL-LOC-[ki-PL-ACC]   see-PAST-SG1

Inferring focus from order / focus-dependent ordering

Part of speech mismatches

Juan suele leer mucho -> Juan usually reads a lot
Juan solía leer mucho -> Juan used to read a lot
Mis amigos suelen leer mucho -> My friends usually read a lot

soler-PRES X -> usually X
soler-IMPF X -> used to X
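
The two rules above could be sketched as follows. The function name `transfer_soler` and the tag strings are illustrative assumptions, not actual Apertium tags; the point is that in the present tense the auxiliary turns into an adverb and its tense moves to the main verb.

```python
def transfer_soler(soler_tense, verb_lemma):
    """Map 'soler-<tense> X' to a list of (lemma, tags) target units."""
    if soler_tense == "PRES":
        # verb -> adverb; the main verb now carries the present tense
        return [("usually", "adv"), (verb_lemma, "vblex.pres")]
    if soler_tense == "IMPF":
        # verb -> "used to" + infinitive
        return [("used", "vbmod.past"), ("to", "pr"), (verb_lemma, "vblex.inf")]
    raise ValueError(f"no rule for soler-{soler_tense}")


print(transfer_soler("PRES", "read"))
print(transfer_soler("IMPF", "read"))
```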

Nominal versus complementised subordinate clauses

(Kyrgyz)

Мен Мураттын  бараарын          көрдүм
I   Murat.GEN go-VN-POSS.3-ACC  see-PAST-1SG

(Turkish)

Ben gördüm          ki   Murat gidecek.
I   see-PAST-1SG  that Murat go-FUT(-3)

Choosing correct verbal adjective in Turkish

English verb phrase nominalisation

  • "I was interested in what she spoke about"
  • = "Ал эмне жөнүндө сүйлөшкөнүнө кызыгып жаттым."

Implementation Ideas

Using Bison or something like it might be faster than writing a custom parser, and reusing an existing component would also remove one source of errors. On the other hand, it would be really nice to allow rules to handle situations like

           S
          / \
         /   VP
        /    /\
       /    V  NP
      (N)      /\
       |      /  \
       |     /    \
       |   Adj  ^  N
       |________|

Here the subject is stuck in the middle of another NP, which I'm really not sure how to deal with in Yacc (except maybe by manually reinserting the subject into the input stream when the object is parsed, but that seems like a bad idea). With a custom parser, we could make it so that the Reduce operation can produce more than one node as output, so a rule for the above could be something like

NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};

or something more general like

NP.$case * -> adj.$case * n.$case {3 1(gender=3.gender)} {2};
# match an adjective and noun with the same case marking, separated by another word
# copy the gender marking from the noun to the adjective and output in N-Adj order
# then deal with the other word
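
A Reduce step that emits more than one node might look roughly like the Python sketch below. It hard-codes the first rule (NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1}) as a tuple; the `Node` type, the tuple layout, and `reduce_top` are all hypothetical, and variables such as $case and feature copying are left out.

```python
from typing import NamedTuple


class Node(NamedTuple):
    label: str             # e.g. "adj.acc" or "NP.nom"
    children: tuple = ()   # child nodes, in output order


# (result labels, pattern, one tuple of 1-based pattern positions per result)
RULE = (("NP.nom", "NP.acc"), ("adj.acc", "n.nom", "n.acc"), ((2,), (3, 1)))


def reduce_top(stack, rule):
    """If the top of the stack matches the pattern, replace it with the
    rule's result nodes (possibly several). Return True on a match."""
    results, pattern, outputs = rule
    n = len(pattern)
    matched = stack[-n:]
    if tuple(node.label for node in matched) != pattern:
        return False
    del stack[-n:]
    for label, positions in zip(results, outputs):
        stack.append(Node(label, tuple(matched[p - 1] for p in positions)))
    return True


stack = [Node("adj.acc"), Node("n.nom"), Node("n.acc")]
reduce_top(stack, RULE)
print([node.label for node in stack])
```

After the reduction the stack holds two nodes: an NP.nom wrapping the subject noun, and an NP.acc whose children are the object noun followed by the adjective.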

Questions:

  • Should the parser generate a C file and compile like Bison does or should it just generate a rule table and load that from a file?
  • To what extent is it possible and desirable to put parts of this data in the monolingual repositories?
    • If this were possible to the fullest extent it would substantially decrease the total number of rules that need to be written since the Catalan rules could then be reused in every pair that includes Catalan.
    • This would probably require every language to be parsing to more or less the same abstract syntax tree.
    • In any event, there are probably lexical things that affect syntax and would have to be pair-specific.

Popcorndude (talk) 18:03, 8 March 2019 (CET)

Recursive transfer talks about glue rules and I think the simplest way to implement that would probably be to not require that the input stream reduce to a single node. That is, an input like "det n det n" could reduce to "NP NP" and then just be output like that without it being a problem that it doesn't get to a root node.
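
As a rough illustration of not requiring a single root, here is a toy shift-reduce loop in Python that simply returns whatever is left on the stack. The one-rule grammar and the name `reduce_all` are assumptions made for the example.

```python
RULES = [("NP", ("det", "n"))]  # hypothetical toy grammar


def reduce_all(tokens, rules=RULES):
    """Shift tokens one at a time, reducing whenever the top of the stack
    matches a rule; return the remaining stack, root node or not."""
    stack = []
    for token in tokens:
        stack.append(token)
        reduced = True
        while reduced:
            reduced = False
            for label, pattern in rules:
                n = len(pattern)
                if tuple(stack[-n:]) == pattern:
                    stack[-n:] = [label]
                    reduced = True
                    break
    return stack


print(reduce_all("det n det n".split()))
```

Here "det n det n" ends up as two sibling NP nodes, which would then be output in order without any further rule joining them.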
There's also a mention of converting left-recursive grammars to right-recursive ones, and if that's just talking about rules like "X -> y X" then maybe it would make sense to have a notation for arbitrarily many of a term which is then compiled to a left-recursive rule.
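
A repetition notation could be expanded at compile time along these lines. The notation "X -> y+" and the function `expand_repetition` are hypothetical; the sketch only shows the expansion into a base rule plus a left-recursive rule.

```python
def expand_repetition(lhs, term):
    """Expand the hypothetical notation 'lhs -> term+' into plain rules,
    each written as (left-hand side, right-hand side tuple)."""
    return [
        (lhs, (term,)),      # base case: a single term
        (lhs, (lhs, term)),  # left-recursive case: one more term on the right
    ]


for rule in expand_repetition("AdjList", "adj"):
    print(rule)
```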
Popcorndude (talk) 01:49, 9 March 2019 (CET)