Constraint Grammar
Constraint Grammar (CG) is a rule-based formalism that can be used to disambiguate text in which words have more than one possible analysis, for example for POS tagging. There are free constraint grammars developed outside the Apertium project for: Norwegian (the Oslo-Bergen tagger), Sámi languages (from Giellatekno) and Faroese (also from Giellatekno).
Terminology
- See also: Apertium stream format
- cohort: a surface form of a word together with all of its analyses (possible lexical units); in Apertium terms, an ambiguous lexical unit.
  - Apertium equivalent: ^words/word<n><pl>/word<vblex><pres><p3><sg>$
- baseform: the lemma of a word.
- reading: a single analysis of a word.
  - Apertium equivalent: ^word<n><pl>$
- wordform: a surface form of a word.
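To make the mapping between the CG terms and the Apertium stream format concrete, here is a minimal Python sketch that splits one ambiguous lexical unit into a wordform and its readings, each with a baseform and tags. The names `Reading`, `Cohort` and `parse_cohort` are hypothetical and not part of Apertium or CG, and the sketch assumes a simple lexical unit with no escaped characters and no joined or multiword analyses.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    baseform: str      # the lemma of this analysis
    tags: List[str]    # morphological tags, without the angle brackets


@dataclass
class Cohort:
    wordform: str            # the surface form
    readings: List[Reading]  # one entry per analysis


def parse_cohort(lexical_unit: str) -> Cohort:
    """Split a simple '^surface/analysis/analysis$' unit into CG-style parts.

    Assumption: no escaped '/', '<' or '$' inside the unit.
    """
    if not (lexical_unit.startswith("^") and lexical_unit.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    # The first field is the wordform; every later field is one reading.
    wordform, *analyses = lexical_unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        baseform = analysis.split("<", 1)[0]
        tags = re.findall(r"<([^>]+)>", analysis)
        readings.append(Reading(baseform, tags))
    return Cohort(wordform, readings)


cohort = parse_cohort("^words/word<n><pl>/word<vblex><pres><p3><sg>$")
for reading in cohort.readings:
    print(cohort.wordform, reading.baseform, reading.tags)
```

In this picture, disambiguation amounts to reducing `cohort.readings` to the readings that fit the context.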
Note on parentheses
Parentheses, and the distinction between tags and lists/sets, seem to be the main point of confusion for people learning CG. If we have the morphological tags tag1 and tag2, then we can have rules like this:
LIST set1 = tag1 ;
LIST set2 = (tag1 tag2) ; # matches a word with both tag1 and tag2
LIST set3 = tag1 tag2 ; # matches a word with tag1 or tag2
LIST word = "hello" ;

SELECT:rule1a (tag1) (1 word) ;
SELECT:rule1b set1 (1 word) ; # equivalent to rule1a
SELECT:rule2a (tag1 tag2) (1 word) ;
SELECT:rule2b set2 (1 word) ; # equivalent to rule2a
SELECT:rule3a (tag1) (1 word) ; # an inline tag needs parentheses; a bare name is read as a set name
SELECT:rule3b (tag2) (1 word) ;
SELECT:rule3c set3 (1 word) ; # equivalent to rule3a and rule3b combined

SELECT:rule1c set1 (1 ("hello")) ; # equivalent to rule1a (or rule1b)
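The difference between a list of separate tags and a parenthesised group can also be modelled outside CG. The following Python sketch is only an illustration of the matching logic described above, not how any CG implementation works internally: the function `list_matches` and the tuple encoding are assumptions made for this example, and it ignores everything else a CG list can contain (baseforms, wordforms, regular expressions).

```python
def list_matches(list_items, reading_tags):
    """Return True if a reading with reading_tags matches the modelled LIST.

    Each item in list_items is a tuple of tags:
      - a bare tag in the LIST is a 1-tuple, e.g. ("tag1",)
      - a parenthesised group is a longer tuple, e.g. ("tag1", "tag2")
    The LIST matches if ANY item has ALL of its tags in the reading.
    """
    tags = set(reading_tags)
    return any(set(item) <= tags for item in list_items)


# Hypothetical encodings of the three LISTs from the example above.
set1 = [("tag1",)]              # LIST set1 = tag1 ;
set2 = [("tag1", "tag2")]       # LIST set2 = (tag1 tag2) ;
set3 = [("tag1",), ("tag2",)]   # LIST set3 = tag1 tag2 ;

only_tag1 = ["tag1"]
only_tag2 = ["tag2"]
both_tags = ["tag1", "tag2"]

print(list_matches(set2, only_tag1))  # group needs both tags
print(list_matches(set2, both_tags))
print(list_matches(set3, only_tag2))  # either tag is enough
```

Under this model, items inside a list behave as alternatives (OR), while tags inside one pair of parentheses must all be present on the same reading (AND).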
Languages using CG in Apertium
See also
- Apertium and Constraint Grammar -- installation and use
- Introduksjon til føringsgrammatikk ("Introduction to Constraint Grammar") -- a HOWTO, in Norwegian Bokmål