Constraint Grammar
Constraint Grammar is a tool that can be used to POS-tag ambiguous text. There are free constraint grammars developed outside the Apertium project for: Norwegian (the Oslo-Bergen tagger), Sámi languages (from Giellatekno) and Faroese (also from Giellatekno).
Terminology
- See also: Apertium stream format
- cohort: a surface form of a word together with all of its analyses (readings); in Apertium terms, an ambiguous lexical unit.
  - Apertium equivalent: ^words/word<n><pl>/word<vblex><pres><p3><sg>$
- baseform: the lemma of a word.
- reading: a single analysis of a word.
  - Apertium equivalent: ^word<n><pl>$
- wordform: a surface form of a word.
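To make the mapping between these terms and the Apertium stream format concrete, here is a minimal Python sketch that splits one lexical unit into its wordform and its readings. The function name `parse_cohort` is hypothetical (it is not part of Apertium or CG), and the sketch assumes the unit contains no escaped `/`, `<`, `^` or `$` characters, which the real stream format allows.

```python
import re

def parse_cohort(unit):
    """Split one lexical unit (a cohort) into (wordform, readings).

    Each reading is a (baseform, tags) pair.
    Assumption: no backslash-escaped special characters in the unit.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    # The first field is the surface form; every later field is one reading.
    wordform, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        baseform = analysis.split("<", 1)[0]          # text before the first tag
        tags = re.findall(r"<([^<>]+)>", analysis)    # tag names without brackets
        readings.append((baseform, tags))
    return wordform, readings

wordform, readings = parse_cohort("^words/word<n><pl>/word<vblex><pres><p3><sg>$")
print(wordform)   # words
for baseform, tags in readings:
    print(baseform, tags)
```

Here the cohort for the wordform "words" has two readings that share the baseform "word"; disambiguation means reducing that list to a single reading.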
Note on parentheses
The use of parentheses to distinguish between tags and lists/sets seems to be the main confusing point for people learning CG. If we have the morphological tags tag1 and tag2, then we can have rules like this:
LIST set1 = tag1 ;
LIST set2 = (tag1 tag2) ; # matches a word with both tag1 and tag2
LIST set3 = tag1 tag2 ; # matches a word with tag1 or tag2
LIST word = "hello" ;
SELECT:1a (tag1) (1 word) ;
SELECT:1b set1 (1 word) ; # equivalent to 1a
SELECT:2a (tag1 tag2) (1 word) ;
SELECT:2b set2 (1 word) ; # equivalent to 2a
SELECT:3a tag1 (1 word) ;
SELECT:3b tag2 (1 word) ;
SELECT:3c set3 (1 word) ; # equivalent to 3a and 3b combined
SELECT:1c set1 (1 ("hello")) ; # equivalent to 1a (or 1b)
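The difference between `set2` and `set3` above can be modelled in a few lines of Python. This is only an illustrative sketch of the matching semantics described in the comments, not how CG itself is implemented: a LIST is represented as a list of items, each item is a tuple of tags, and the helper name `matches` is hypothetical.

```python
def matches(list_items, reading_tags):
    """Return True if any item of the LIST matches the reading.

    An item (one tuple) matches only if ALL of its tags are on the reading,
    which mirrors a parenthesised group such as (tag1 tag2).
    """
    tags = set(reading_tags)
    return any(set(item) <= tags for item in list_items)

set1 = [("tag1",)]             # LIST set1 = tag1 ;
set2 = [("tag1", "tag2")]      # LIST set2 = (tag1 tag2) ;  both tags required
set3 = [("tag1",), ("tag2",)]  # LIST set3 = tag1 tag2 ;    either tag suffices

only_tag1 = ["tag1"]
both_tags = ["tag1", "tag2"]

print(matches(set2, only_tag1))  # False: tag2 is missing
print(matches(set3, only_tag1))  # True: tag1 alone is enough
print(matches(set2, both_tags))  # True: both tags are present
```

Seen this way, parentheses inside a LIST group tags that must co-occur on one reading, while whitespace between items separates alternatives.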
Languages using CG in Apertium
When is CG needed?
Some of the CG rules written for the language pairs above could instead be written as forbid rules in the TSX format used by apertium-tagger. If all the rules for your language pair can be expressed in the .tsx format, you can opt for a simpler design that has no CG module.
See also
- Apertium and Constraint Grammar -- installation and use
- Introduksjon til føringsgrammatikk -- a HOWTO, in Norwegian Bokmål
- Rule-based finite-state disambiguation -- GSoC 2012 project by User:Krvoje, a "CG light" (or, a more apertiummy CG) with rules in XML compiled to an FST
External links
- VISL CG-3 Development Information + documentation and downloads
- Basic Tutorial for VISL CG-3
- cg-mode for emacs, gives basic syntax highlighting and indentation
- Kevin Donnelly's CG tutorial
- Hulden M, Francom J (2012) Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars, Proc. LREC 2012, pp. 2114-2117. Shows how 20 hours (very little time!) of writing disambiguation rules gives substantial improvements. Some of the rules shown could also be implemented in the TSX format used by apertium-tagger.

