Difference between revisions of "Constructing a TSX file with a Constraint Grammar"

From Apertium
 
#redirect[[Apertium and Constraint Grammar]]
{{TOCD}}
 
Constraint Grammar (CG) is a method of POS-tagging ambiguous text. The apertium-tagger has a basic form of this in addition to probabilistic tagging. It should therefore be possible to use an existing CG to write a new TSX (tagger definition file), or to improve an existing one, by converting its constraints into sets of forbid/enforce rules.
 
 
==Terminology==
 
 
* cohort — set of analyses for a given surface form.
 
 
==Labels==
 
 
Coarse tag "labels" in Constraint Grammar (CG) are specified as either a {{sc|list}} or a {{sc|set}}. Sometimes, however, these are not complete sets, so they may need to be combined.
 
 
For example:
 
 
<pre>
 
LIST A-N-CC = A N CC ;
 
LIST A-pos = (A Pos) ;
 
LIST %etter/fram/opp% = ("etter" Pr) ("fram" Pr) ("frem" Pr) ("opp" Pr) ;
 
</pre>
 
 
These are three lists, which can be expressed in TSX format as follows:
 
 
<pre>
 
<def-label name="A-N-CC">
 
<tags-item tags="adj.*"/>
 
<tags-item tags="n.*"/>
 
<tags-item tags="cnjcoo"/>
 
</def-label>
 
<def-label name="A-pos">
 
<tags-item tags="adj.pos.*"/>
 
</def-label>
 
<def-label name="%etter/fram/opp%">
 
<tags-item lemma="etter" tags="pr"/>
 
<tags-item lemma="fram" tags="pr"/>
 
<tags-item lemma="frem" tags="pr"/>
 
<tags-item lemma="opp" tags="pr"/>
 
</def-label>
 
</pre>
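The conversion above is mechanical enough to sketch in code. The following is a minimal Python sketch, not an existing Apertium tool: the function names, the <code>TAG_MAP</code> table and the <code>INFLECTED</code> set are all assumptions made for illustration. The tag mapping is only what can be read off the example above, and the rule for when to append <code>.*</code> is a guess based on the same example.

```python
import re
from xml.sax.saxutils import quoteattr

# Assumed CG -> Apertium tag mapping, read off the example above.
TAG_MAP = {"A": "adj", "N": "n", "CC": "cnjcoo", "Pr": "pr", "Pos": "pos"}
# Assumption: patterns starting with these categories get a trailing ".*".
INFLECTED = {"adj", "n"}


def parse_list(line):
    """Parse 'LIST name = items ;' into (name, items).

    Each item is a list of tokens: a bare tag gives one token,
    a parenthesised group gives one token per element.
    """
    m = re.match(r"\s*LIST\s+(\S+)\s*=\s*(.*?)\s*;\s*$", line)
    if m is None:
        raise ValueError("not a LIST line: %r" % line)
    name, body = m.groups()
    items = []
    for group, bare in re.findall(r"\(([^)]*)\)|(\S+)", body):
        items.append(group.split() if group else [bare])
    return name, items


def to_tags_item(tokens):
    """Render one CG list item as a TSX <tags-item/> element."""
    lemma = None
    tags = []
    for tok in tokens:
        if tok.startswith('"') and tok.endswith('"'):
            lemma = tok.strip('"')      # quoted tokens are lemmas
        else:
            tags.append(TAG_MAP[tok])   # KeyError flags an unmapped CG tag
    pattern = ".".join(tags)
    if tags and tags[0] in INFLECTED:
        pattern += ".*"
    attrs = ""
    if lemma is not None:
        attrs += " lemma=" + quoteattr(lemma)
    attrs += " tags=" + quoteattr(pattern)
    return "<tags-item%s/>" % attrs


def list_to_def_label(line):
    """Convert one CG LIST line into a TSX <def-label> block."""
    name, items = parse_list(line)
    rows = ["<def-label name=%s>" % quoteattr(name)]
    rows += ["  " + to_tags_item(toks) for toks in items]
    rows.append("</def-label>")
    return "\n".join(rows)


print(list_to_def_label('LIST %etter/fram/opp% = ("etter" Pr) ("fram" Pr) ;'))
```

An unmapped CG tag raises a <code>KeyError</code> rather than being passed through silently, so gaps in the mapping table show up early.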
 
 
and so on. Note that converting every list in this way may cause some problems, so it might be best to start with only the labels for ambiguous tags.
 
 
==Constraints==
 
 
Constraint Grammar uses a series of hand-written constraints in order to POS-tag ambiguous words.
 
 
===Forbid rules===
 
 
The operation analogous to a ''forbid rule'' is {{sc|remove}}.
 
 
===Enforce rules===
 
 
The operation analogous to an ''enforce rule'' is {{sc|select}}, which "selects a reading, if it contains a TARGETed tag. In practice, selection is equivalent to a removal of all other readings."
 
 
<pre>
 
# 2866
 
SELECT (A Sg Neu Indef) IF
 
(0 %rundt%)
 
(1 Det-Qnt)
 
;
 
</pre>
 
 
This means: enforce <code>adj.sg.nt.indef</code> if the lemma of the word is "rundt" and the lexical unit to the right (position 1) is a quantifier, <code>det.qnt</code>.
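The behaviour of this rule can be illustrated with a small Python sketch. It is a simplified model of {{sc|select}}, not the vislcg implementation: the <code>select</code> function, the dictionary layout of a reading and the example readings are all made up for illustration. It assumes that a contextual test succeeds when any reading in the tested cohort matches, and that the rule does nothing if no reading contains the target tags.

```python
def select(cohorts, i, target, tests):
    """Apply a simplified SELECT to the cohort at position i.

    cohorts: list of cohorts, each a list of readings (dicts with
             "lemma" and "tags" keys; this layout is an assumption).
    target:  set of tags a reading must contain to be kept.
    tests:   list of (offset, predicate) contextual tests; offset 0 is
             the cohort itself, 1 the cohort to the right, -1 to the left.
    Returns True if the rule applied, False otherwise.
    """
    for offset, matches in tests:
        j = i + offset
        if not 0 <= j < len(cohorts):
            return False                      # context falls outside the sentence
        if not any(matches(r) for r in cohorts[j]):
            return False                      # contextual test failed
    kept = [r for r in cohorts[i] if target <= r["tags"]]
    if not kept:
        return False                          # no reading carries the target tags
    cohorts[i] = kept                         # equivalent to removing all other readings
    return True


def is_rundt(reading):
    return reading["lemma"] == "rundt"


def is_det_qnt(reading):
    return {"det", "qnt"} <= reading["tags"]


# Made-up readings, for illustration only.
cohorts = [
    [
        {"lemma": "rund", "tags": {"adj", "sg", "nt", "indef"}},
        {"lemma": "rundt", "tags": {"pr"}},
        {"lemma": "rundt", "tags": {"adv"}},
    ],
    [{"lemma": "noen", "tags": {"det", "qnt", "pl"}}],
]

applied = select(cohorts, 0, {"adj", "sg", "nt", "indef"},
                 [(0, is_rundt), (1, is_det_qnt)])
print(applied, cohorts[0])
```

After the call, only the adjective reading remains in the first cohort, which is the sense in which selection is equivalent to removing all other readings.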
 
 
To convert this into Apertium format, one would need to take all of the coarse tags which are not <code>det.qnt</code> and turn them into label sequences inside a <code>forbid</code> block, as below:
 
 
<pre>
 
<forbid>
 
<label-sequence>
 
<label-item label="%rundt%"/>

<label-item label="A-pos"/>
 
</label-sequence>
 
 
...
 
 
</forbid>
 
</pre>
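Since a <code>forbid</code> block is just a list of label sequences, it can be generated rather than written by hand. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> module; <code>forbid_block</code> is a hypothetical helper name, and deciding which sequences to forbid for a given CG rule is left to the caller. The single sequence passed in here is the one shown in the example above.

```python
import xml.etree.ElementTree as ET


def forbid_block(sequences):
    """Render label sequences as a TSX <forbid> element.

    sequences: list of label sequences, each a list of label names
               that must already be defined with <def-label>.
    Returns the XML as a string.
    """
    forbid = ET.Element("forbid")
    for seq in sequences:
        label_seq = ET.SubElement(forbid, "label-sequence")
        for label in seq:
            # Each label becomes an empty <label-item> element.
            ET.SubElement(label_seq, "label-item", {"label": label})
    return ET.tostring(forbid, encoding="unicode")


xml_text = forbid_block([["%rundt%", "A-pos"]])
print(xml_text)
```

Building the elements through a library keeps every <code>label-item</code> properly closed, which is easy to get wrong when writing long blocks by hand.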
 
 
===Prefer tags===
 
 
==Further reading==
 
 
* [http://beta.visl.sdu.dk/cg2_howto.html VISL: Basic how-to for vislcg]
 
 
[[Category:Documentation]]
 

Latest revision as of 10:28, 23 March 2009