Difference between revisions of "Constraint Grammar"

Line 64: Line 64:
  Currently some of the CG rules written in the above language pairs may be written as forbid rules in the [[TSX format]] used by apertium-tagger. If the rules for your language pair can be written in the .tsx format, you can go for an easier design without a CG module in that language pair.
+
+ ==Editor support==
+ * [http://beta.visl.sdu.dk/cg3ide.html CG-3 IDE] – the official vislcg3 CG IDE
+ * [https://github.com/goavki/syntxfile_gedit_CG/ Gedit] syntax highlighting (also for any other editor that uses gtksourceview)
+ * [[Emacs#CG|Emacs]] emacs mode for editing and testing CG grammars (highlighting + IDE-like features)

  ==See also==
Line 70: Line 75:
  * [[Introduksjon til føringsgrammatikk]] -- a HOWTO, in Norwegian bokmål
  * [[Rule-based finite-state disambiguation]] -- GsoC 2012 project by [[User:Krvoje]], a "CG light" (or, a more apertiummy CG) with rules in XML compiled to an FST
- * [[Emacs#CG]] emacs mode for editing and testing CG grammars
  * [[Constraint Grammar/Speed]] – some tips on speeding up your rules
  * [[Constraint Grammar/Optimisation]] – ideas on how to optimise the vislcg3 engine

Revision as of 09:21, 4 December 2016


Constraint Grammar (CG) is a tool that can be used to POS-tag ambiguous text. Free constraint grammars developed outside the Apertium project exist for Norwegian (the Oslo-Bergen tagger), the Sámi languages (from Giellatekno), Faroese (also from Giellatekno), and Finnish (by Fred Karlsson).

Terminology

See also: Apertium stream format
  • cohort: a word form together with all of its possible readings.
    Apertium equivalent: ^words/word<n><pl>/word<vblex><pres><p3><sg>$
  • baseform: the lemma of a word.
  • reading: a single analysis of a word.
    Apertium equivalent: ^word<n><pl>$
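
To make the mapping between the CG terms and the Apertium stream format concrete, here is a minimal Python sketch that splits one lexical unit into a cohort and its readings. The names `Reading`, `Cohort` and `parse_lexical_unit` are invented for this illustration and are not part of Apertium or vislcg3. The sketch also assumes a simple unit with no escaped characters and no special symbols, so it is not a full stream-format parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Reading:
    baseform: str   # the lemma
    tags: list      # morphological tags, in order


@dataclass
class Cohort:
    wordform: str   # the surface form as it appears in the text
    readings: list  # one Reading per possible analysis


TAG_RE = re.compile(r"<([^<>]+)>")


def parse_lexical_unit(unit: str) -> Cohort:
    """Parse one ^surface/analysis/...$ unit (simplified: no escapes)."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    # The first field is the surface form; every later field is one reading.
    wordform, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        baseform = analysis.split("<", 1)[0]
        readings.append(Reading(baseform, TAG_RE.findall(analysis)))
    return Cohort(wordform, readings)


cohort = parse_lexical_unit("^words/word<n><pl>/word<vblex><pres><p3><sg>$")
for reading in cohort.readings:
    print(cohort.wordform, reading.baseform, reading.tags)
```

In this model, disambiguation means removing entries from `cohort.readings` until, ideally, one reading is left.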

Note on parentheses

The use of parentheses to distinguish between tags and lists/sets seems to be the main confusing point for people learning CG. If we have the morphological tags tag1 and tag2, then we can have rules like this:

LIST set1 = tag1 ;
LIST set2 = (tag1 tag2) ; # matches a word with both tag1 and tag2
LIST set3 = tag1 tag2 ;   # matches a word with tag1 or tag2
LIST word = "hello" ;

SELECT:1a (tag1) (1 word) ;
SELECT:1b  set1  (1 word) ;   # equivalent to 1a

SELECT:2a (tag1 tag2) (1 word) ;
SELECT:2b  set2       (1 word) ;   # equivalent to 2a

SELECT:3a (tag1) (1 word) ;   # a bare name is read as a set name, so the inline tag needs parentheses
SELECT:3b (tag2) (1 word) ;
SELECT:3c  set3  (1 word) ;   # equivalent to 3a and 3b combined

SELECT:1c  set1  (1 ("hello")) ; # equivalent to 1a (or 1b)
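
The AND/OR distinction in the LIST definitions above can be modelled in a few lines of Python. This is only a toy model of the matching semantics, not how vislcg3 is implemented. The function name `list_matches` and the tuple representation are assumptions made for this sketch.

```python
def list_matches(list_items, reading_tags):
    """Return True if any item of a LIST matches the reading.

    Each item is a tuple of tags. A bare tag is a 1-tuple; a parenthesised
    composite tag is a longer tuple whose tags must all be on the reading.
    """
    tags = set(reading_tags)
    return any(all(tag in tags for tag in item) for item in list_items)


set1 = [("tag1",)]              # LIST set1 = tag1 ;
set2 = [("tag1", "tag2")]       # LIST set2 = (tag1 tag2) ;  both tags required
set3 = [("tag1",), ("tag2",)]   # LIST set3 = tag1 tag2 ;    either tag suffices

only_tag1 = ["tag1"]
only_tag2 = ["tag2"]
both_tags = ["tag1", "tag2"]

print(list_matches(set2, only_tag1))  # composite tag needs tag2 as well
print(list_matches(set2, both_tags))
print(list_matches(set3, only_tag2))
```

The outer `any` corresponds to the whitespace-separated items of a LIST (OR), and the inner `all` corresponds to the tags inside one pair of parentheses (AND).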

Languages using CG in Apertium

and many others. The following languages currently (as of 2014-06-27) have CGs of over 100 rules:

When is CG needed?

Some of the CG rules written for the language pairs above could instead be written as forbid rules in the TSX format used by apertium-tagger. If all the rules for your language pair can be expressed in the .tsx format, you can opt for a simpler design without a CG module in that language pair.

Editor support

  • CG-3 IDE – the official vislcg3 CG IDE
  • Gedit syntax highlighting (also for any other editor that uses gtksourceview)
  • Emacs: an Emacs mode for editing and testing CG grammars (highlighting plus IDE-like features)

See also

External links