Difference between revisions of "Constraint Grammar"

Revision as of 10:03, 1 October 2013

Terminology

See also: Apertium stream format

cohort — a surface form of a word together with all of its analyses (possible lexical units); in Apertium terms, an ambiguous lexical unit.

Apertium equivalent: ^words/word<n><pl>/word<vblex><pres><p3><sg>$

baseform — the lemma of a word.
reading — a single analysis of a word.

Apertium equivalent: ^word<n><pl>$

wordform — a surface form of a word.
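To make the terms above concrete, here is a minimal Python sketch that splits the Apertium-style example into a wordform and its readings. The function name `parse_cohort` and the dictionary layout are hypothetical, chosen only for illustration; the sketch also assumes no escaped characters (such as `\/`) and no multiword or joined analyses, which real Apertium streams may contain.

```python
import re


def parse_cohort(lexical_unit):
    """Split an ambiguous lexical unit (^wordform/analysis/...$) into its parts.

    Simplifying assumption: no escaped '/', '^', '$' or '<' in the input.
    """
    body = lexical_unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")

    # The first field is the wordform; every following field is one reading.
    wordform, *analyses = body[1:-1].split("/")

    readings = []
    for analysis in analyses:
        baseform = analysis.split("<", 1)[0]        # text before the first tag
        tags = re.findall(r"<([^<>]+)>", analysis)  # tag names without brackets
        readings.append({"baseform": baseform, "tags": tags})

    return {"wordform": wordform, "readings": readings}


cohort = parse_cohort("^words/word<n><pl>/word<vblex><pres><p3><sg>$")
for reading in cohort["readings"]:
    print(cohort["wordform"], reading["baseform"], reading["tags"])
```

In this model the whole returned dictionary corresponds to a cohort, `wordform` to the surface form, each entry of `readings` to one reading, and `baseform` to the lemma of that reading.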

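Note on parentheses

The use of parentheses to distinguish between tags and lists/sets is a common source of confusion for people learning CG. Inside a LIST, parentheses group tags that must all be present in the same reading, while tags listed without parentheses are alternatives. If we have the morphological tags tag1 and tag2, then we can have rules like this: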

LIST set1 = tag1 ;
LIST set2 = (tag1 tag2) ; # matches a word with both tag1 and tag2
LIST set3 = tag1 tag2 ;   # matches a word with tag1 or tag2
LIST word = "hello" ;

SELECT:1a (tag1) (1 word) ;
SELECT:1b  set1  (1 word) ;   # equivalent to 1a

SELECT:2a (tag1 tag2) (1 word) ;
SELECT:2b  set2       (1 word) ;   # equivalent to 2a

SELECT:3a (tag1) (1 word) ;   # inline tag; a bare tag1 would be read as a set name
SELECT:3b (tag2) (1 word) ;
SELECT:3c set3 (1 word) ;   # equivalent to 3a and 3b combined

SELECT:1c  set1  (1 ("hello")) ; # equivalent to 1a (or 1b)
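The matching behaviour of the three LIST definitions above can be modelled with a small Python sketch. This is a toy model for illustration only, not how vislcg3 is implemented: a set is represented as a list of alternatives, each alternative is a tuple of tags that must all occur in the reading, and the helper name `matches` is hypothetical.

```python
def matches(tag_set, reading_tags):
    """Return True if any alternative in tag_set is fully contained in the reading."""
    reading = set(reading_tags)
    return any(set(alternative) <= reading for alternative in tag_set)


set1 = [("tag1",)]              # LIST set1 = tag1 ;
set2 = [("tag1", "tag2")]       # LIST set2 = (tag1 tag2) ;  both tags required
set3 = [("tag1",), ("tag2",)]   # LIST set3 = tag1 tag2 ;    either tag suffices

only_tag1 = ["tag1"]
only_tag2 = ["tag2"]
both_tags = ["tag1", "tag2"]

print(matches(set2, only_tag1))  # False: tag2 is missing
print(matches(set2, both_tags))  # True
print(matches(set3, only_tag2))  # True: tag2 alone is enough
```

Under this model, parentheses inside a LIST correspond to one multi-tag tuple, while tags written without parentheses each become their own one-tag alternative.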

Languages using CG in Apertium

When is CG needed?

Some of the CG rules written for the language pairs above could instead be expressed as forbid rules in the TSX format used by apertium-tagger. If all the rules for your language pair can be written in the .tsx format, you can opt for a simpler design without a CG module in that language pair.

External links

VISL CG-3 Development Information + documentation and downloads
Basic Tutorial for VISL CG-3
cg-mode for emacs, gives basic syntax highlighting and indentation
Kevin Donnelly's CG tutorial
Hulden M, Francom J (2012) Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars, Proc. LREC 2012, pp. 2114-2117. Shows how 20 hours (very little time!) of writing disambiguation rules gives substantial improvements. Some of the rules shown may also be implemented in the TSX format used by apertium-tagger.

See also

Introduksjon til føringsgrammatikk, a HOWTO in Norwegian Bokmål
Rule-based finite-state disambiguation, a GSoC 2012 project by User:Krvoje: a "CG light" (or a more apertiummy CG) with rules in XML compiled to an FST
Emacs#CG, an Emacs mode for editing and testing CG grammars
