Difference between revisions of "Subreadings in Constraint Grammar"

From Apertium

Revision as of 08:23, 6 October 2011

Current situation

Typical input with sub-readings:

^foobar/foo+bar/fubar/flue+barge$

Right now, only the last sub-reading is used. In the above example, vislcg3 treats the input as if it were

^foobar/bar/fubar/barge$

This works well for compounds, where the part before the + is mostly inconsequential, but not so well for other multiword expressions. (Also, mapping tags are currently only put on the last sub-reading.)
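To make the current behaviour concrete, here is a minimal Python sketch that rebuilds a cohort as if each reading consisted only of its last sub-reading, which is what vislcg3 is described as doing above. The function name is hypothetical, and this only models the effect; it is not how vislcg3 or cg-proc are implemented. It also assumes no escaped characters and no + inside tags.

```python
def last_subreading_view(cohort: str) -> str:
    """Rebuild a cohort as if every reading were only its last sub-reading."""
    body = cohort.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a cohort: {cohort!r}")
    surface, *readings = body[1:-1].split("/")
    # Naive split: assumes no '+' occurs inside a tag such as <@+FMAINV>.
    heads = [reading.split("+")[-1] for reading in readings]
    return "^" + "/".join([surface, *heads]) + "$"


print(last_subreading_view("^foobar/foo+bar/fubar/flue+barge$"))
# ^foobar/bar/fubar/barge$
print(last_subreading_view("^D'an/Da<pr>+an<det><def><sp>$"))
# ^D'an/an<det><def><sp>$
```

The second call shows the problem for multiwords: the <pr> part of D'an simply disappears from what the grammar can see.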

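Wait, can't we just split on the + with pretransfer before sending this to cg-proc?
No, because we first have to disambiguate between the readings of e.g. ^foobar/foo+bar/fubar/flue+barge$. It is not even clear what that cohort would look like if split: foo+bar and flue+barge would become two units each while fubar stays one, so it wouldn't work.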
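The mismatch can be seen by counting how many units each reading would turn into. A small Python sketch with a hypothetical helper name; the naive split on + is fine here only because these example readings have no tags.

```python
def units_per_reading(cohort: str) -> dict[str, int]:
    """Count how many units each reading of a cohort would become if split on '+'."""
    _surface, *readings = cohort.strip()[1:-1].split("/")
    # Naive split: fine for these tag-less example readings.
    return {reading: len(reading.split("+")) for reading in readings}


counts = units_per_reading("^foobar/foo+bar/fubar/flue+barge$")
print(counts)  # {'foo+bar': 2, 'fubar': 1, 'flue+barge': 2}
# The readings disagree on how many units there are, so no single split works.
print(len(set(counts.values())) == 1)  # False
```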

What we need

  • We may need to refer to an earlier sub-reading in order to disambiguate
  • We may want to put a mapping tag on an earlier sub-reading
  • And of course we want to be able to refer to the last sub-reading, as in the current situation

Referring to the final sub-reading

Northern Sámi postpositions take genitive.

Input fragment:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>$ 
^vuostá/vuostá<Po>/vuostá<Pr>/vuostá<N><Sg><Nom>$

Correct output:

^soahtefámu/soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen><@→P>$        # war.power.GEN
^vuostá/vuostá<Po><@←ADVL>$                                        # against.PO

If the input noun were unambiguously nominative, the Po reading should not be selected, so we might have a rule somewhere with

REMOVE (Po) IF (-1 (Nom)) ;

but if this also matched non-final sub-readings, it would fire on soahti<N><Sg><Nom><Cmp> and we would get the wrong tagging here. Currently, non-final sub-readings are ignored, so the sme-nob CGs work fine (as do the nn-nb ones for compounds).
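A minimal Python sketch of why this matters: it evaluates the context (-1 (Nom)) against the readings of soahtefámu twice, once looking only at the last sub-reading and once at every sub-reading. The helper names are hypothetical and this models only the context test, not vislcg3's rule engine. Splitting on + is safe here because none of these tags contain a +.

```python
import re


def tags(subreading: str) -> set[str]:
    """Return the tags of one sub-reading, e.g. 'fápmu<N><Sg><Acc>' -> {'N', 'Sg', 'Acc'}."""
    return set(re.findall(r"<([^>]*)>", subreading))


def cohort_has_tag(readings: list[str], tag: str, any_subreading: bool) -> bool:
    """True if some reading carries `tag`, checking only the last sub-reading
    (the current behaviour) or every sub-reading (the problematic alternative)."""
    for reading in readings:
        subs = reading.split("+")
        looked_at = subs if any_subreading else subs[-1:]
        if any(tag in tags(s) for s in looked_at):
            return True
    return False


# Readings of ^soahtefámu$, the cohort at position -1 relative to ^vuostá$.
soahtefamu = [
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Acc>",
    "soahti<N><Sg><Nom><Cmp>+fápmu<N><Sg><Gen>",
]

print(cohort_has_tag(soahtefamu, "Nom", any_subreading=False))  # False: Po survives
print(cohort_has_tag(soahtefamu, "Nom", any_subreading=True))   # True: Po wrongly removed
```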

Referring to non-final sub-readings

Input:

^D'an/Da<pr>+an<det><def><sp>$
^emgann/emgann<n><m><sg>$ 
^ez/e<vpart><obj>/ael<n><m><pl>/mont<vblex><pri><p2><sg>/monet<vblex><pri><p2><sg>/e<pr>+da<det><pos><mf><sp>$
^an/an<det><def><sp>/mont<vblex><pri><p1><sg>/monet<vblex><pri><p1><sg>$

Correct output:

^D'an/Da<pr><@ADVL→>+an<det><def><sp><@→N>$       # to.the
^emgann/emgann<n><m><sg><@P←>$                    # battle
^ez/e<vpart><obj><@Pcle>$                         # PART
^an/mont<vblex><pri><p1><sg><@+FMAINV>$           # I.go

  • We want to refer to the <pr> sub-reading when mapping emgann as @P← (possibly also in disambiguation).
  • We want to MAP an @ADVL→ tag on the <pr> sub-reading (and an @→N tag on the determiner). These sub-readings are split into two units by pretransfer.
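Since pretransfer splits the sub-readings into separate units, a mapping tag only survives on a unit if it was put on that sub-reading. The sketch below is a simplified, hypothetical imitation of just the + split described above (it is not apertium-pretransfer's code). It assumes the surface form has already been dropped and does not treat a + inside a tag such as <@+FMAINV> as a boundary.

```python
def split_lexical_unit(unit: str) -> list[str]:
    """Split '^a<x>+b<y>$' into ['^a<x>$', '^b<y>$'].

    A '+' inside a tag (as in <@+FMAINV>) is not treated as a boundary.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    parts, current, depth = [], [], 0
    for ch in body[1:-1]:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "+" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [f"^{p}$" for p in parts]


# With mapping tags on both sub-readings, each unit keeps its own tag.
print(split_lexical_unit("^Da<pr><@ADVL→>+an<det><def><sp><@→N>$"))
# ['^Da<pr><@ADVL→>$', '^an<det><def><sp><@→N>$']

# With today's behaviour (tags only on the last sub-reading), Da<pr> ends up untagged.
print(split_lexical_unit("^Da<pr>+an<det><def><sp><@→N>$"))
# ['^Da<pr>$', '^an<det><def><sp><@→N>$']
```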

Possible syntax

One alternative is to keep the current default behaviour: rules refer only to the last sub-reading unless they explicitly mention sub-readings.

Then, to refer to a non-final sub-reading, we could say

MAP (@P←) TARGET (n) IF (-1 (pr) + {-1}) ;

to say that we require the next-to-final sub-reading of the cohort to the left to be a preposition. What we're doing here is intersecting the (pr) set with a special set that restricts it to a particular sub-reading (the syntax of that set could of course be something other than {N}).

By analogy with regular CG cohort positions (where 0 is the current cohort and -1 the one to its left), 0 is the "head" (the last sub-reading), while -1 is one sub-reading to the left of that. Positive numbers would count from the left, so 1 is the first sub-reading from the left. For three sub-readings, that gives us the following indexing:

   ^sublem1<tags>+sublem2<tags>+sublem3<tags>$
      1             2              3
     -2            -1              0
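A small Python sketch of this indexing scheme and of the intersection (pr) + {-1}, assuming the semantics described above. The function names are hypothetical. Treating a reading that has no sub-reading at the requested index as a non-match is an assumption the proposal does not cover.

```python
import re


def resolve_index(n_subreadings: int, index: int) -> int:
    """Turn a proposed sub-reading index into a 0-based list position.

    0 is the last sub-reading (the head), negative numbers count leftwards
    from it, and positive numbers count from the left starting at 1.
    """
    pos = n_subreadings - 1 + index if index <= 0 else index - 1
    if not 0 <= pos < n_subreadings:
        raise IndexError(f"no sub-reading {index} among {n_subreadings}")
    return pos


def subreading_has_tag(reading: str, tag: str, index: int) -> bool:
    """Model of '(tag) + {index}': does the sub-reading at `index` carry `tag`?"""
    subs = reading.split("+")  # naive: assumes no '+' inside tags
    try:
        sub = subs[resolve_index(len(subs), index)]
    except IndexError:
        return False  # assumption: a missing sub-reading simply fails to match
    return tag in re.findall(r"<([^>]*)>", sub)


# The three-sub-reading table: 1 2 3 and -2 -1 0 name the same positions.
print([resolve_index(3, i) for i in (1, 2, 3)])    # [0, 1, 2]
print([resolve_index(3, i) for i in (-2, -1, 0)])  # [0, 1, 2]

dan = "Da<pr>+an<det><def><sp>"  # the reading of ^D'an$, at -1 from ^emgann$
print(subreading_has_tag(dan, "pr", -1))  # True: next-to-final is Da<pr>
print(subreading_has_tag(dan, "pr", 0))   # False: the head is an<det><def><sp>
```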


We might also want to say "require any main- or sub-reading to be tagged pr":

MAP (@P←) TARGET (n) IF (-1 (pr) + {*0}) ;
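One possible reading of what {*0} should express, sketched in Python: a match if any sub-reading, final or not, of any reading of the cohort carries the tag, contrasted with today's head-only check. The helper names are hypothetical and the exact semantics of {*0} are an assumption. The ez cohort from the Breton example shows the difference.

```python
import re


def tags_of(subreading: str) -> list[str]:
    """Tags of one sub-reading, e.g. 'e<pr>' -> ['pr']."""
    return re.findall(r"<([^>]*)>", subreading)


def any_subreading_has_tag(readings: list[str], tag: str) -> bool:
    """Model of '(tag) + {*0}': true if any sub-reading, final or not,
    of any reading of the cohort carries `tag`."""
    return any(tag in tags_of(sub) for reading in readings for sub in reading.split("+"))


def head_has_tag(readings: list[str], tag: str) -> bool:
    """Current behaviour: only the last sub-reading of each reading is checked."""
    return any(tag in tags_of(reading.split("+")[-1]) for reading in readings)


ez = [
    "e<vpart><obj>",
    "ael<n><m><pl>",
    "mont<vblex><pri><p2><sg>",
    "monet<vblex><pri><p2><sg>",
    "e<pr>+da<det><pos><mf><sp>",
]
print(head_has_tag(ez, "pr"))            # False: no reading has a <pr> head
print(any_subreading_has_tag(ez, "pr"))  # True: e<pr> is a non-final sub-reading
```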


To MAP a tag onto a non-final sub-reading, we could then say

MAP (@ADVL→) TARGET (pr) + {-1} IF (1* (n)) ;
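A minimal Python sketch of what this MAP rule would do to a single reading, assuming the indexing from the table above. The function name is hypothetical, and the context (1* (n)) is not modelled: it is simply assumed to hold, since emgann to the right of D'an is tagged <n>.

```python
import re


def map_to_subreading(reading: str, mapping_tag: str, tag: str, index: int) -> str:
    """Model of 'MAP (mapping_tag) TARGET (tag) + {index}' on a single reading:
    append the mapping tag to the sub-reading at `index` if it carries `tag`."""
    subs = reading.split("+")  # naive: assumes no '+' inside tags
    n = len(subs)
    # Indexing as in the table above: 0 is the head, negatives count left, positives from 1.
    pos = n - 1 + index if index <= 0 else index - 1
    if 0 <= pos < n and tag in re.findall(r"<([^>]*)>", subs[pos]):
        subs[pos] += f"<{mapping_tag}>"
    return "+".join(subs)


dan = "Da<pr>+an<det><def><sp>"
# Context (1* (n)) is assumed to hold: ^emgann$ to the right is <n>.
print(map_to_subreading(dan, "@ADVL→", "pr", -1))  # Da<pr><@ADVL→>+an<det><def><sp>
print(map_to_subreading(dan, "@ADVL→", "pr", 0))   # unchanged: the head is not <pr>
```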

Some file

SECTION

SUBSTITUTE ("од") ("од:5") ("од") (-1 (adj));


^помладо/adj<pref><comp>+млад<adj><nt><sg><nom><ind>$ ^од/од<pr>$ ^30/30<num>$^./.<sent>$
MAP (@+FMAINV) TARGET VerbFin ;

^n'eus/ne<adv>+bezañ<vblex><pri><impers><sp>/ne<adv>+kaout<vblex><pri><p1><pl>$ ^kador/kador<n><f><sg>$ ^ebet/ebet<adv>$^./.<sent>$