Difference between revisions of "Subreadings in Constraint Grammar"

From Apertium
Revision as of 08:50, 12 October 2011

Current situation

Typical input with sub-readings:

^foobar/foo+bar/fubar/flue+barge$

Right now, only the last sub-reading is used; in the above example, vislcg3 treats it as if it were

^foobar/bar/fubar/barge$

This works well for compounds, where the material before the + is mostly inconsequential, but less well for other multiword expressions. (Also, mapping tags are currently only put on the last sub-reading.)
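
As a rough illustration of the behaviour described above (not vislcg3's actual code), the following Python sketch keeps only the last sub-reading of each reading. The function names are hypothetical, and the parsing is deliberately naive: it assumes no escaped characters and no + or / inside tags.

```python
def split_cohort(cohort):
    """Split '^surface/reading1/reading2$' into (surface, [readings]).

    Assumption: no escaped '/', '^' or '$' characters in the cohort.
    """
    body = cohort.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("not a cohort: " + cohort)
    surface, *readings = body[1:-1].split("/")
    return surface, readings


def keep_last_subreading(cohort):
    """Drop everything before the last '+' in each reading.

    Assumption: '+' only ever separates sub-readings (it does not
    occur inside tags in this input).
    """
    surface, readings = split_cohort(cohort)
    heads = [reading.split("+")[-1] for reading in readings]
    return "^" + "/".join([surface] + heads) + "$"


print(keep_last_subreading("^foobar/foo+bar/fubar/flue+barge$"))
# ^foobar/bar/fubar/barge$
```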

Wait, can't we just split on the + with pretransfer before sending this to cg-proc?
No, because we first have to disambiguate between the readings of e.g. ^foobar/foo+bar/fubar/flue+barge$ (what would that even look like if split? It wouldn't work).

What we need

  • We may need to refer to an earlier sub-reading in order to disambiguate
  • We may want to put a mapping tag on an earlier sub-reading
  • And of course we want to be able to refer to the last as in the current situation

Referring to the final sub-reading

Northern Sámi postpositions take genitive.

Input fragment:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>$ 
^vuostá/vuostá<Po>/vuostá<Pr>/vuostá<N><Sg><Nom>$

Correct output:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen><@→P>$        # war.power.GEN
^vuostá/vuostá<Po><@←ADVL>$                                        # against.PO

If the input noun were unambiguously nominative, the Po reading should not be selected, so we might have a rule somewhere with

REMOVE Po IF (-1 (Nom)) ;

but if this matched non-final sub-readings, we would get the wrong tagging here. Currently, non-final sub-readings are ignored, so the sme-nob CGs work fine (as do the nn-nb ones for compounding there).
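
To make the difference concrete, here is a small Python sketch (an illustration only, with hypothetical helper names) that checks whether any reading of the noun cohort carries Nom, first looking only at the final sub-reading and then at all sub-readings. It assumes that a + outside angle brackets always separates sub-readings.

```python
import re


def subreadings(reading):
    """Split a reading on '+' signs that are not inside a <tag>."""
    return re.split(r"\+(?![^<]*>)", reading)


def tags(subreading):
    """Return the tag names of one sub-reading, without angle brackets."""
    return re.findall(r"<([^>]+)>", subreading)


def cohort_has_tag(readings, tag, head_only=True):
    """True if some reading carries `tag`.

    With head_only=True only the final sub-reading is inspected,
    which mirrors the current behaviour described in the text.
    """
    for reading in readings:
        parts = subreadings(reading)
        if head_only:
            parts = parts[-1:]
        if any(tag in tags(part) for part in parts):
            return True
    return False


noun_readings = [
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>",
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>",
]

print(cohort_has_tag(noun_readings, "Nom", head_only=True))   # False
print(cohort_has_tag(noun_readings, "Nom", head_only=False))  # True
```

With head_only=False the context (-1 (Nom)) would match on the first part of the compound, and the Po reading would be removed wrongly.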

Referring to non-final sub-readings

Input:

^D'an/Da<pr>+an<det><def><sp>$
^emgann/emgann<n><m><sg>$ 
^ez/e<vpart><obj>/ael<n><m><pl>/mont<vblex><pri><p2><sg>/monet<vblex><pri><p2><sg>/e<pr>+da<det><pos><mf><sp>$
^an/an<det><def><sp>/mont<vblex><pri><p1><sg>/monet<vblex><pri><p1><sg>$

Correct output:

^D'an/Da<pr><@ADVL→>+an<det><def><sp><@→N>$       # to.the
^emgann/emgann<n><m><sg><@P←>$                    # battle
^ez/e<vpart><obj><@Pcle>$                         # PART
^an/mont<vblex><pri><p1><sg><@+FMAINV>$           # I.go

  • We want to refer to the <pr> sub-reading when mapping emgann as @P← (possibly also in disambiguation).
  • We want to MAP an @ADVL→ tag on the <pr> sub-reading (also a @→N tag on the determiner). These sub-readings are split into two units by pretransfer.

Possible syntax

One alternative is to keep as the default behaviour that we always refer to only the last sub-reading unless explicitly mentioning sub-readings.

Then, to refer to a non-final sub-reading, we could say

MAP (@P←) TARGET (n) IF (-1/-1 (pr)) ;

to say that we require the next-to-final sub-reading of the cohort to the left to be a preposition.

Parallel to regular CG word indexes, 0 is the "head" (the last sub-reading), while -1 is one sub-reading to the left of that. Positive numbers would read from the left, so 1 is the first sub-reading from the left. For three sub-readings, that gives us the following indexing:

   ^sublem1<tags>+sublem2<tags>+sublem3<tags>$
      1             2              3
     -2            -1              0
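
The indexing scheme in the table can be sketched in Python as follows. This is only an illustration of the proposal; the function name is hypothetical, and returning None for an out-of-range index is an assumption about what a failed lookup should do.

```python
def resolve_subreading(subreadings, index):
    """Return the sub-reading at the proposed index, or None.

    0 is the head (last sub-reading), -1 is one to the left of it,
    and positive numbers count from the left starting at 1.
    """
    n = len(subreadings)
    if index > 0:
        pos = index - 1          # 1 is the leftmost sub-reading
    else:
        pos = n - 1 + index      # 0 is the head, negatives go leftwards
    if not 0 <= pos < n:
        return None              # assumption: out of range means no match
    return subreadings[pos]


parts = ["sublem1<tags>", "sublem2<tags>", "sublem3<tags>"]
for index in (1, 2, 3, -2, -1, 0):
    print(index, resolve_subreading(parts, index))
```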


We might also want to say "require any main- or sub-reading to be tagged pr":

MAP (@P←) TARGET (n) IF (-1/*0 (pr)) ;
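
A minimal Python sketch of how the sub-reading part of such a contextual test might be evaluated on the Breton example. The names are hypothetical, and the string "*0" is used here as a stand-in for the proposed /*0 position, read as "any sub-reading, including the head".

```python
import re

ANY = "*0"  # stand-in for the proposed "/*0" (assumption: any sub-reading)


def subreadings(reading):
    """Split a reading on '+' signs that are not inside a <tag>."""
    return re.split(r"\+(?![^<]*>)", reading)


def has_tag(subreading, tag):
    """True if the sub-reading carries the given tag."""
    return tag in re.findall(r"<([^>]+)>", subreading)


def subreading_matches(reading, index, tag):
    """Check `tag` on the sub-reading selected by the proposed index."""
    parts = subreadings(reading)
    if index == ANY:
        return any(has_tag(part, tag) for part in parts)
    # 0 is the head, negatives count leftwards, positives from the left.
    pos = index - 1 if index > 0 else len(parts) - 1 + index
    return 0 <= pos < len(parts) and has_tag(parts[pos], tag)


left_cohort_reading = "Da<pr>+an<det><def><sp>"

print(subreading_matches(left_cohort_reading, -1, "pr"))   # True
print(subreading_matches(left_cohort_reading, 0, "pr"))    # False
print(subreading_matches(left_cohort_reading, ANY, "pr"))  # True
```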


To MAP to a non-final sub-reading, we could then say

MAP /-1 (@ADVL→) TARGET (pr) IF (1* (n)) ;
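
The effect of mapping onto a chosen sub-reading can be sketched in Python like this. It is a hypothetical helper that illustrates the intended output only, not how vislcg3 would implement it.

```python
import re


def map_tag(reading, index, tag):
    """Append a mapping tag to the sub-reading at the proposed index.

    Assumptions: '+' outside angle brackets separates sub-readings;
    0 is the head, -1 the sub-reading to its left, 1 the leftmost.
    """
    parts = re.split(r"\+(?![^<]*>)", reading)
    pos = index - 1 if index > 0 else len(parts) - 1 + index
    if not 0 <= pos < len(parts):
        raise IndexError("no sub-reading at index %d" % index)
    parts[pos] += "<" + tag + ">"
    return "+".join(parts)


reading = map_tag("Da<pr>+an<det><def><sp>", -1, "@ADVL→")
print(reading)
# Da<pr><@ADVL→>+an<det><def><sp>

print(map_tag(reading, 0, "@→N"))
# Da<pr><@ADVL→>+an<det><def><sp><@→N>
```

The second call shows that the head can still be mapped as before, giving the reading from the correct output in the Breton example.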

Some file

SECTION

SUBSTITUTE ("од") ("од:5") ("од") (-1 (adj));


^помладо/adj<pref><comp>+млад<adj><nt><sg><nom><ind>$ ^од/од<pr>$ ^30/30<num>$^./.<sent>$
MAP (@+FMAINV) TARGET VerbFin ;

^n'eus/ne<adv>+bezañ<vblex><pri><impers><sp>/ne<adv>+kaout<vblex><pri><p1><pl>$ ^kador/kador<n><f><sg>$ ^ebet/ebet<adv>$^./.<sent>$