Difference between revisions of "Constructing a TSX file with a Constraint Grammar"

Revision as of 21:07, 2 March 2008

Constraint Grammar (CG) is a method of POS-tagging ambiguous text. The apertium-tagger has a basic form of this in addition to probabilistic tagging. It should be possible to use an existing CG to write a new TSX (tagger definition file), or to improve an existing one, by using the CG to create sets of forbid/enforce rules.

Terminology

cohort — a surface form of a word, along with its analyses (possible lexical units).

Apertium equivalent: ^words/word<n><pl>/word<vblex><pres><p3><sg>$

baseform — the lemma of a word.
reading — a single analysis of a word.

Apertium equivalent: ^word<n><pl>$

wordform — a surface form of a word.

Labels

Coarse tag "labels" in Constraint Grammar (CG) are specified either as a LIST or as a SET. Sometimes, however, these are not complete sets, so they may need to be combined.
For example:

LIST A-N-CC = A N CC ;
LIST A-pos = (A Pos) ;
LIST %etter/fram/opp% = ("etter" Pr) ("fram" Pr) ("frem" Pr) ("opp" Pr) ;

These are three lists, which can be expressed in TSX format as below:

  <def-label name="A-N-CC">
    <tags-item tags="adj.*"/>
    <tags-item tags="n.*"/>
    <tags-item tags="cnjcoo"/>
  </def-label>
  <def-label name="A-pos">
    <tags-item tags="adj.pos.*"/>
  </def-label>
  <def-label name="%etter/fram/opp%">
    <tags-item lemma="etter" tags="pr"/>
    <tags-item lemma="fram" tags="pr"/>
    <tags-item lemma="frem" tags="pr"/>
    <tags-item lemma="opp" tags="pr"/>
  </def-label>

etc. Note that this may cause some problems, so it might be best to attempt this using only ambiguous tags to start with.
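The conversion above is mechanical enough to script. The following is a minimal sketch in Python, not an existing Apertium tool: the names TAG_MAP, OPEN_TAGS, convert_item and convert_list are hypothetical, the tag mapping covers only the tags used in the examples above, and the rule for when to append ".*" is an assumption inferred from those examples. It handles LIST lines only (not SET) and assumes lemmas contain no spaces.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical CG -> Apertium tag mapping, limited to the examples above.
TAG_MAP = {"A": "adj", "N": "n", "CC": "cnjcoo", "Pr": "pr", "Pos": "pos"}
# Assumption: patterns starting with these tags get a trailing ".*",
# as in the hand-written examples (adj.*, n.*, adj.pos.*).
OPEN_TAGS = {"adj", "n"}


def convert_item(item):
    """Turn one LIST item, e.g. A, (A Pos) or ("etter" Pr), into a <tags-item/>."""
    lemma, tags = None, []
    for part in item.strip("()").split():
        if part.startswith('"'):
            lemma = part.strip('"')       # quoted part is the baseform
        else:
            tags.append(TAG_MAP[part])    # KeyError for unmapped tags
    pattern = ".".join(tags)
    if tags and tags[0] in OPEN_TAGS:
        pattern += ".*"
    attrs = f"tags={quoteattr(pattern)}"
    if lemma is not None:
        attrs = f"lemma={quoteattr(lemma)} {attrs}"
    return f"  <tags-item {attrs}/>"


def convert_list(line):
    """Turn a single CG 'LIST name = items ;' line into a <def-label> element."""
    match = re.match(r"\s*LIST\s+(\S+)\s*=\s*(.*?)\s*;", line)
    if match is None:
        raise ValueError("not a LIST definition: " + line)
    name, body = match.groups()
    # Items are either parenthesised groups or bare tags.
    items = re.findall(r"\([^)]*\)|\S+", body)
    rows = [convert_item(item) for item in items]
    return "\n".join([f"<def-label name={quoteattr(name)}>", *rows, "</def-label>"])


print(convert_list("LIST A-N-CC = A N CC ;"))
```

A tag that is missing from TAG_MAP raises a KeyError rather than being guessed, so gaps in the mapping show up immediately.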

Constraints

Constraint Grammar uses a series of hand-written constraints in order to POS-tag ambiguous words.

Forbid rules

The CG operation analogous to a forbid rule is REMOVE.

# 3526
"<bare>" REMOVE (CS) IF
        (-1 CS)
;

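This rule applies to the wordform "bare", which can be a subordinating conjunction, verb or adverb. It removes the subordinating conjunction (CS) reading of "bare" when the preceding word is also a subordinating conjunction. The TSX below expresses the narrower case of the string "bare bare" where both lexical units are subordinating conjunctions: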

  <forbid>
    <label-sequence>
      <label-item label="bare-CS"/>
      <label-item label="bare-CS"/>
    </label-sequence>
  </forbid>

Presuming we have a label definition of:

  <def-label name="bare-CS">
    <tags-item lemma="bare" tags="conjsub"/>
  </def-label>
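When many REMOVE rules are converted this way, writing the forbid block by hand becomes repetitive. The following Python sketch generates the block from a list of label sequences. The function name forbid_block is hypothetical, and the sketch assumes the labels it is given are already defined with def-label elsewhere in the TSX file.

```python
from xml.sax.saxutils import quoteattr


def forbid_block(sequences):
    """Build a <forbid> element from an iterable of label-name sequences."""
    lines = ["<forbid>"]
    for sequence in sequences:
        lines.append("  <label-sequence>")
        for label in sequence:
            # quoteattr escapes the name and adds the surrounding quotes.
            lines.append(f"    <label-item label={quoteattr(label)}/>")
        lines.append("  </label-sequence>")
    lines.append("</forbid>")
    return "\n".join(lines)


# The "bare bare" rule above, as a single two-label sequence.
print(forbid_block([("bare-CS", "bare-CS")]))
```

Each tuple becomes one label-sequence, so several converted rules can be collected in one list and emitted together.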

Enforce rules

The CG operation analogous to an enforce rule is SELECT, which "selects a reading, if it contains a TARGETed tag. In practice, selection is equivalent to a removal of all other readings."

# 2866
SELECT (A Sg Neu Indef) IF
        (0 %rundt%)
        (1 Det-Qnt)
;

This means: enforce adj.sg.nt.indef if the lemma of the word is "rundt" and the lexical unit to the right (position 1, the next word) is a quantifier, det.qnt.
To convert this into Apertium format, one would need to take all of the coarse tags which are not det.qnt and make them into label sequences, as below:

  <forbid>
    <label-sequence>
      <label-item label="%rundt%"/>
      <label-item label="A-pos"/>
    </label-sequence>

    ...

  </forbid>

Prefer tags

Further reading

vislcg3 documentation (single page)
VISL: Basic how-to for vislcg (vislcg2)

Note that vislcg3 is the version which is actively developed.
