Subreadings in Constraint Grammar
Revision as of 16:23, 15 August 2013
This is now implemented in vislcg3: http://beta.visl.sdu.dk/cg3/chunked/subreadings.html
Why we need sub-readings
Typical input with sub-readings:
^foobar/foo+bar/fubar/flue+barge$
Right now, only the last sub-reading is used: in the above example, vislcg3 treats the input as if it were
^foobar/bar/fubar/barge$
This works well for compounds, where the material before the + is mostly inconsequential, but not so well for other multiword expressions. (Also, mapping tags are currently only put on the last sub-reading.)
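The flattening described above can be sketched in a few lines of Python. This is a minimal illustration, assuming well-formed cohorts whose lemmas contain no "/" or "+" characters; `keep_last_subreading` is a hypothetical helper, not how vislcg3 works internally.

```python
def keep_last_subreading(cohort: str) -> str:
    """Rewrite a cohort so that every reading keeps only its final sub-reading."""
    cohort = cohort.strip()
    if not (cohort.startswith("^") and cohort.endswith("$")):
        raise ValueError("expected a ^...$ cohort")
    surface, *readings = cohort[1:-1].split("/")
    # The final sub-reading is whatever follows the last '+' in each reading.
    flattened = [reading.split("+")[-1] for reading in readings]
    return "^" + "/".join([surface] + flattened) + "$"


print(keep_last_subreading("^foobar/foo+bar/fubar/flue+barge$"))
# ^foobar/bar/fubar/barge$
```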
- Wait, can't we just split on the + with pretransfer before sending this to cg-proc?
- No, because we first have to disambiguate between the readings of e.g. ^foobar/foo+bar/fubar/flue+barge$. It is not even clear what that cohort would look like if split, so splitting first wouldn't work.
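One way to see the problem: the ambiguous readings of a single cohort do not agree on how many units they would split into, so there is no single token sequence left to disambiguate over. A small Python illustration; `split_units` is a hypothetical helper that simply splits on "+".

```python
def split_units(reading: str) -> list[str]:
    """Hypothetical helper: the units a reading would become if split on '+'."""
    return reading.split("+")


cohort = "^foobar/foo+bar/fubar/flue+barge$"
surface, *readings = cohort[1:-1].split("/")
for reading in readings:
    print(reading, "->", len(split_units(reading)), "unit(s)")
# foo+bar -> 2 unit(s)
# fubar -> 1 unit(s)
# flue+barge -> 2 unit(s)
```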
What we need
- We may need to refer to a non-main sub-reading in order to disambiguate
- We may want to put a mapping tag on a non-main sub-reading
- And of course we want to be able to refer to the main sub-reading
Referring to the final sub-reading
Northern Sámi postpositions take genitive.
Input fragment:
^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>$ ^vuostá/vuostá<Po>/vuostá<Pr>/vuostá<N><Sg><Nom>$
Correct output:
^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen><@→P>$   # war.power.GEN
^vuostá/vuostá<Po><@←ADVL>$   # against.PO
If the input noun were unambiguously nominative, the Po reading should not be selected, so we might have a rule somewhere with
REMOVE (Po) IF (-1 (Nom)) ;
but if this matched non-final sub-readings, the <Nom> tag on the non-final soahti sub-reading would satisfy the test, and the correct Po reading would be removed here. Currently, non-final sub-readings are ignored, so the sme-nob CGs work fine (as do the nn-nb ones, for compounds).
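The difference can be checked with a small Python sketch of a non-careful (-1 (Nom)) test against the noun cohort above, once looking only at final sub-readings and once at all of them. This only illustrates the matching logic under simplifying assumptions (no set definitions, no careful or barrier tests); it is not vislcg3's implementation, and the helper names are hypothetical.

```python
import re


def parse_readings(cohort):
    """Return each reading of a cohort as a list of sub-readings, each a set of tags."""
    _surface, *readings = cohort.strip()[1:-1].split("/")
    return [[set(re.findall(r"<([^>]+)>", sub)) for sub in reading.split("+")]
            for reading in readings]


def has_tag(cohort, tag, final_only=True):
    """Non-careful test: does any reading of the cohort carry `tag`?"""
    for subs in parse_readings(cohort):
        candidates = subs[-1:] if final_only else subs
        if any(tag in sub for sub in candidates):
            return True
    return False


noun = ("^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>"
        "/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>$")
print(has_tag(noun, "Nom"))                    # False: the Po reading survives
print(has_tag(noun, "Nom", final_only=False))  # True: the Po reading would be removed
```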
Referring to non-final sub-readings
Input:
^D'an/Da<pr>+an<det><def><sp>$ ^emgann/emgann<n><m><sg>$ ^ez/e<vpart><obj>/ael<n><m><pl>/mont<vblex><pri><p2><sg>/monet<vblex><pri><p2><sg>/e<pr>+da<det><pos><mf><sp>$ ^an/an<det><def><sp>/mont<vblex><pri><p1><sg>/monet<vblex><pri><p1><sg>$
Correct output:
^D'an/Da<pr><@ADVL→>+an<det><def><sp><@→N>$   # to.the
^emgann/emgann<n><m><sg><@P←>$   # battle
^ez/e<vpart><obj><@Pcle>$   # PART
^an/mont<vblex><pri><p1><sg><@+FMAINV>$   # I.go
- We want to refer to the <pr> sub-reading when mapping emgann as @P← (possibly also in disambiguation).
- We want to MAP an @ADVL→ tag on the <pr> sub-reading (also a @→N tag on the determiner). These sub-readings are split into two units by pretransfer.
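To make the goal concrete, here is a small Python sketch, assuming plain string handling of the stream format, that puts the two mapping tags on their own sub-readings and then splits the reading into one unit per sub-reading. `add_tag` and `split_into_units` are hypothetical helpers; the splitting is a simplified stand-in and does not claim to reproduce pretransfer's exact output.

```python
def add_tag(reading, position, tag):
    """Append <tag> to the sub-reading at list position `position` (0 = leftmost)."""
    subs = reading.split("+")
    subs[position] += f"<{tag}>"
    return "+".join(subs)


def split_into_units(reading):
    """Simplified stand-in for splitting on '+': one ^...$ unit per sub-reading."""
    return " ".join(f"^{sub}$" for sub in reading.split("+"))


reading = "Da<pr>+an<det><def><sp>"
reading = add_tag(reading, 0, "@ADVL→")  # preposition sub-reading
reading = add_tag(reading, 1, "@→N")     # determiner sub-reading
print(reading)                    # Da<pr><@ADVL→>+an<det><def><sp><@→N>
print(split_into_units(reading))  # ^Da<pr><@ADVL→>$ ^an<det><def><sp><@→N>$
```

Because each tag is attached to its own sub-reading, both survive the split; a tag placed only on the final sub-reading would end up on the determiner alone.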
VISL CG-3 syntax
One alternative is to keep the default behaviour of referring only to the last sub-reading unless sub-readings are mentioned explicitly. But for some languages, you might want to prefer the first sub-reading. VISL CG-3 caters to both preferences. From the manual:
The order of which is the primary reading vs. sub-readings depends on the grammar SUBREADINGS setting:
SUBREADINGS = RTL ; # Default, right-to-left
SUBREADINGS = LTR ; # Alternate, left-to-right
Then, to refer to a non-final sub-reading in the default RTL mode, we could say
ADD (@ADV←) TARGET (n) IF (-2/1 (pr)) (-2 (n)) ;
to require that the cohort two positions to the left has the main reading n and, as its next-to-final sub-reading (index 1), pr. The context tests of this rule would match if the input were e.g.
^forsooth/for<pr>+sooth<n>/forsooth<adv>$ ^he/prpers<prn>$ ^be/be<vblex>$
Since we only have two sub-readings here, we could also ask that the last (deepest) sub-reading, index -1, be pr, with the same effect:
ADD (@ADV←) TARGET (n) IF (-2/-1 (pr)) (-2 (n)) ;
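A Python sketch of how the three context tests above could be evaluated against the forsooth cohort. It assumes the cohort two positions to the left has already been located, and that in RTL mode index 0 is the final sub-reading, positive indexes count leftwards from it, and negative indexes count from the leftmost one. The helper names are hypothetical, not vislcg3 functions.

```python
import re


def tags(sub):
    """Set of tags on one sub-reading, e.g. 'for<pr>' -> {'pr'}."""
    return set(re.findall(r"<([^>]+)>", sub))


def subreading(reading, index, mode="RTL"):
    """Return sub-reading `index` of a reading, or None if it does not exist."""
    subs = reading.split("+")
    if mode == "RTL":
        subs.reverse()  # assumption: in RTL the head (index 0) is the rightmost part
    if not -len(subs) <= index < len(subs):
        return None
    return subs[index]  # negative indexes count from the far end


def context_test(cohort, index, tag, mode="RTL"):
    """Non-careful test: some reading has `tag` on sub-reading `index`."""
    _surface, *readings = cohort[1:-1].split("/")
    for reading in readings:
        sub = subreading(reading, index, mode)
        if sub is not None and tag in tags(sub):
            return True
    return False


forsooth = "^forsooth/for<pr>+sooth<n>/forsooth<adv>$"
print(context_test(forsooth, 0, "n"))    # True: (-2 (n))
print(context_test(forsooth, 1, "pr"))   # True: (-2/1 (pr))
print(context_test(forsooth, -1, "pr"))  # True: (-2/-1 (pr))
```

With only two sub-readings, index 1 and index -1 pick out the same part (for<pr>); with three or more they would differ.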
Parallel to regular CG word indexes, 0 is the "head", i.e. the main sub-reading. In RTL mode, this is the last (rightmost) sub-reading; positive numbers count leftwards from it, so 1 is the sub-reading immediately to its left. Negative numbers count from the other end, so -1 is the deepest (leftmost) sub-reading. For three sub-readings, that gives us the following indexing:
^foo<tags>+bar<tags>+fie<tags>$
sub-reading:  foo  bar  fie
positive:       2    1    0
negative:      -1   -2   -3
For LTR mode, the leftmost sub-reading is the head with index 0, and counts go the other way:
^foo<tags>+bar<tags>+fie<tags>$
sub-reading:  foo  bar  fie
positive:       0    1    2
negative:      -3   -2   -1
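Both tables follow one pattern: the positive index is the distance from the head, and the negative index is that distance minus the number of sub-readings. A small Python sketch that reproduces the two tables above under this reading of them; `index_table` is a hypothetical function, not part of vislcg3.

```python
def index_table(parts, mode="RTL"):
    """Map each sub-reading (listed left to right) to its (positive, negative) index."""
    n = len(parts)
    table = {}
    for pos, part in enumerate(parts):  # pos counts from the left
        # Distance from the head: the head is rightmost in RTL, leftmost in LTR.
        distance = n - 1 - pos if mode == "RTL" else pos
        table[part] = (distance, distance - n)
    return table


print(index_table(["foo", "bar", "fie"], "RTL"))
# {'foo': (2, -1), 'bar': (1, -2), 'fie': (0, -3)}
print(index_table(["foo", "bar", "fie"], "LTR"))
# {'foo': (0, -3), 'bar': (1, -2), 'fie': (2, -1)}
```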
To ADD the tag to the non-final sub-reading itself, use the SUB:N keyword after ADD:
ADD SUB:-1 (@→V) TARGET (pr) IF (*1 (v)) ;
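A minimal Python sketch of what SUB:-1 is meant to do to a single reading. It assumes that the TARGET set is tested against the same sub-reading the tag is added to, follows the RTL/LTR index tables above, and ignores the (*1 (v)) context test. `add_sub` is a hypothetical helper, not part of vislcg3.

```python
import re


def add_sub(reading, index, tag, target, mode="RTL"):
    """Sketch of ADD SUB:index: if sub-reading `index` carries `target`, append <tag> to it."""
    subs = reading.split("+")
    order = list(range(len(subs)))
    if mode == "RTL":
        order.reverse()  # order[0] is now the list position of the rightmost part (the head)
    if not -len(subs) <= index < len(subs):
        return reading   # no such sub-reading: leave the reading unchanged
    pos = order[index]
    if target in re.findall(r"<([^>]+)>", subs[pos]):
        subs[pos] += f"<{tag}>"
    return "+".join(subs)


print(add_sub("Da<pr>+an<det><def><sp>", -1, "@→V", "pr"))
# Da<pr><@→V>+an<det><def><sp>
```

Targeting index 0 instead would test the determiner head, which has no pr tag, so the reading would be left unchanged.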
Not implemented yet(?)
We might also want to say "require any main- or sub-reading to be tagged pr":
ADD (@P←) TARGET (n) IF (-1/*0 (pr)) ;