Difference between revisions of "Subreadings in Constraint Grammar"

Revision as of 08:21, 16 May 2013

This is now implemented in vislcg3: http://beta.visl.sdu.dk/cg3/chunked/subreadings.html

Why we need sub-readings

Typical input with sub-readings:

^foobar/foo+bar/fubar/flue+barge$

Right now, only the last sub-reading is used. In the above example, vislcg3 treats the input as if it were

^foobar/bar/fubar/barge$

This works well for compounds, where the part before the + is mostly inconsequential, but not for other multiword expressions, where the earlier sub-readings matter. (Also, mapping tags are currently only put on the last sub-reading.)
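To make the "last sub-reading only" behaviour concrete, here is a minimal Python sketch. It is not vislcg3 or cg-proc code; the function names are made up for illustration, and the naive splitting on / and + assumes the input contains no escaped separators.

```python
def parse_cohort(cohort: str):
    """Split '^surface/reading/...$' into (surface, [[sub-reading, ...], ...]).

    Hypothetical helper: assumes no escaped '/' or '+' inside lemmas.
    """
    if not (cohort.startswith("^") and cohort.endswith("$")):
        raise ValueError("expected a cohort of the form ^surface/reading/...$")
    surface, *readings = cohort[1:-1].split("/")
    # Each reading is a '+'-joined chain of sub-readings.
    return surface, [reading.split("+") for reading in readings]


def last_subreading_view(cohort: str) -> str:
    """Rebuild the cohort keeping only the final sub-reading of every reading."""
    surface, readings = parse_cohort(cohort)
    return "^" + "/".join([surface] + [subs[-1] for subs in readings]) + "$"


cohort = "^foobar/foo+bar/fubar/flue+barge$"
print(parse_cohort(cohort))
print(last_subreading_view(cohort))
```

The second call reproduces the reduced cohort shown above, which is why anything before the + is invisible to the rules in that mode.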

Wait, can't we just split on the + with pretransfer before sending this to cg-proc?

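No, because we first have to disambiguate between the readings of e.g. ^foobar/foo+bar/fubar/flue+barge$. Splitting would not work there: the readings have different numbers of sub-readings, so it is unclear what the split cohort would even look like.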

What we need

We may need to refer to an earlier sub-reading in order to disambiguate
We may want to put a mapping tag on an earlier sub-reading
And of course we want to be able to refer to the last as in the current situation

Referring to the final sub-reading

Northern Sámi postpositions take genitive.

Input fragment:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>$ 
^vuostá/vuostá<Po>/vuostá<Pr>/vuostá<N><Sg><Nom>$

Correct output:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen><@→P>$        # war.power.GEN
^vuostá/vuostá<Po><@←ADVL>$                                        # against.PO

If the input noun were unambiguously nominative, the Po reading should not be selected, so we might have a rule somewhere with

REMOVE (Po) IF (-1 (Nom)) ;

but if this matched non-final sub-readings, we would get the wrong tagging here. Currently, non-final sub-readings are ignored, so the sme-nob CGs work fine (as do the nn-nb ones for compounding there).
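The difference can be illustrated with a small Python sketch that checks for a tag either in the final sub-reading only or in any sub-reading. The helper names are hypothetical and the tag extraction is a simplification; this is not how vislcg3 matches tags internally.

```python
import re

# Matches Apertium-style tags such as <N> or <Nom>.
TAG = re.compile(r"<([^>]+)>")


def subreading_tags(reading: str):
    """Return one tag set per sub-reading of a '+'-joined reading."""
    return [set(TAG.findall(sub)) for sub in reading.split("+")]


def has_tag_final(reading: str, tag: str) -> bool:
    """Look at the final sub-reading only (the behaviour described above)."""
    return tag in subreading_tags(reading)[-1]


def has_tag_anywhere(reading: str, tag: str) -> bool:
    """Look at every sub-reading, final or not."""
    return any(tag in tags for tags in subreading_tags(reading))


readings = [
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>",
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>",
]
print([has_tag_final(r, "Nom") for r in readings])     # [False, False]
print([has_tag_anywhere(r, "Nom") for r in readings])  # [True, True]
```

With the "anywhere" check, both readings of soahtefámu would count as Nom because of the compound's first part, so a rule like the one above would wrongly remove the Po reading.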

Referring to non-final sub-readings

Input:

^D'an/Da<pr>+an<det><def><sp>$
^emgann/emgann<n><m><sg>$ 
^ez/e<vpart><obj>/ael<n><m><pl>/mont<vblex><pri><p2><sg>/monet<vblex><pri><p2><sg>/e<pr>+da<det><pos><mf><sp>$
^an/an<det><def><sp>/mont<vblex><pri><p1><sg>/monet<vblex><pri><p1><sg>$

Correct output:

^D'an/Da<pr><@ADVL→>+an<det><def><sp><@→N>$       # to.the
^emgann/emgann<n><m><sg><@P←>$                    # battle
^ez/e<vpart><obj><@Pcle>$                         # PART
^an/mont<vblex><pri><p1><sg><@+FMAINV>$           # I.go

We want to refer to the <pr> sub-reading when mapping emgann as @P← (possibly also in disambiguation).
We want to MAP an @ADVL→ tag on the <pr> sub-reading (also a @→N tag on the determiner). These sub-readings are split into two units by pretransfer.

VISL CG-3 syntax

One option is to keep the existing default, where rules refer only to the last sub-reading unless a sub-reading is mentioned explicitly. For some languages, though, you might prefer the first sub-reading. VISL CG-3 caters to both preferences. From the manual:

The order of which is the primary reading vs. sub-readings depends on the grammar SUBREADINGS setting:

     SUBREADINGS = RTL ; # Default, right-to-left
     SUBREADINGS = LTR ; # Alternate, left-to-right

Then, to refer to a non-final sub-reading in the default RTL mode, we could say

 ADD (@ADV←) TARGET (n) IF (-2/1 (pr)) (-2 (n)) ;

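This requires that the cohort two positions to the left has a main reading tagged n and a next-to-final sub-reading tagged pr. Seen from the third cohort, these contextual tests would hold if the input were e.g. the following (for the rule itself to apply, that third cohort would also need an n reading):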

 ^forsooth/for<pr>+sooth<n>/forsooth<adv>$ ^he/prpers<prn>$ ^be/be<vblex>$

Since there are only two sub-readings here, index 1 and index -1 pick out the same sub-reading, so we could equally ask that the sub-reading farthest from the head be pr, with the same effect:

 ADD (@ADV←) TARGET (n) IF (-2/-1 (pr)) (-2 (n)) ;

Parallel to regular CG word indexes, 0 is the "head". In RTL mode, this is the last sub-reading. Positive numbers count away from the head, so 1 is the sub-reading immediately to the left of it; negative numbers count from the far end, so -1 is the leftmost sub-reading. For three sub-readings, that gives us the following indexing:

   ^foo<tags>+bar<tags>+fie<tags>$
      2        1         0
     -1       -2        -3

For LTR mode, the left sub-reading is the head with index 0, and counts go the other way:

   ^foo<tags>+bar<tags>+fie<tags>$
      0        1         2
     -3       -2        -1
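The two index diagrams above can be reproduced with a short Python sketch. The function name and its list-based representation are assumptions made for illustration; it models only the numbering shown in the diagrams, not the actual VISL CG-3 implementation.

```python
def pick_subreading(subreadings, depth, mode="RTL"):
    """Return the sub-reading at the given depth, or None if out of range.

    subreadings: sub-readings in surface (left-to-right) order.
    depth: 0 is the head, positive values count away from the head,
           negative values count from the end farthest from the head.
    mode: "RTL" (head is the last sub-reading) or "LTR" (head is the first).
    """
    if mode not in ("RTL", "LTR"):
        raise ValueError("mode must be 'RTL' or 'LTR'")
    # Order the list so that the head always sits at position 0.
    ordered = list(reversed(subreadings)) if mode == "RTL" else list(subreadings)
    if not -len(ordered) <= depth < len(ordered):
        return None
    return ordered[depth]


subs = ["foo<tags>", "bar<tags>", "fie<tags>"]
for mode in ("RTL", "LTR"):
    print(mode, [pick_subreading(subs, d, mode) for d in (0, 1, 2, -1, -2, -3)])

# The forsooth example: in RTL mode, depth 1 and depth -1 coincide.
print(pick_subreading(["for<pr>", "sooth<n>"], 1), pick_subreading(["for<pr>", "sooth<n>"], -1))
```

Putting the head at position 0 of a reordered list means ordinary Python indexing then gives both the positive and the negative numbering for free.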

To ADD the tag to the non-final sub-reading itself, use the SUB:N keyword after ADD:

 ADD SUB:-1 (@→V) TARGET (pr) IF (*1 (v)) ;

Not implemented yet(?)

We might also want to say "require any main- or sub-reading to be tagged pr":

 ADD (@P←) TARGET (n) IF (-1/*0 (pr)) ;

