Constraint Grammar/Speed

From Apertium

(Unhammer moved page CG/Speed to Constraint Grammar/Speed)
Latest revision as of 12:56, 23 March 2022

Tips on how to speed up your Constraint Grammar.

Put popular rules first

If a cohort is fully disambiguated (down to one reading) early on, CG won't have to bother with that cohort later at all.

For example, if you have two rules that target nouns, but one of them only matches when some rare verb appears in the context, try putting that rule after the other one (as long as you still get the correct disambiguation!).
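A minimal sketch of this ordering in vislcg3 (CG-3) syntax, assuming Apertium-style tags such as n, det and vblex. The set names and the lemma "blorf" are hypothetical placeholders standing in for "some rare verb". Both rules target nouns, and the frequent one comes first:

```
DELIMITERS = "<.>" "<!>" "<?>" ;

LIST N = n ;
LIST Det = det ;
# Hypothetical lemma of a rare verb (placeholder, not a real word)
LIST RareVerb = "blorf" ;

SECTION

# Frequent case first: after a determiner, pick the noun reading.
# Cohorts fully disambiguated here need not be bothered with later.
SELECT N IF (-1 Det) ;

# Rare case second: by now most noun cohorts are already
# unambiguous, so this rule has far fewer cohorts to test.
SELECT N IF (1 RareVerb) ;
```

If swapping the two rules changes the output on your test corpus, keep whichever order disambiguates correctly; speed comes second.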

Avoid slow rule types

Regexes

Heavy use of regular-expression tags (such as "foo.*bar"r) is typically slow.
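Where a regex only ever needs to cover a handful of known words, one option is to list those baseforms explicitly instead. A sketch in vislcg3 syntax; the baseforms "foobar" and "foozbar" and the set names are hypothetical. Note that the explicit list is not equivalent to the regex, which also matches any other baseform of that shape:

```
DELIMITERS = "<.>" "<!>" "<?>" ;

# Regex version: checking this set means running a regular
# expression against the baseforms of the readings tested.
LIST FooBarRe = "foo.*bar"r ;

# Plain-tag version: hypothetical baseforms, compared as
# ordinary tags without a regex engine.
LIST FooBar = "foobar" "foozbar" ;

SECTION

# Regex context (potentially slow), kept here for comparison:
# REMOVE (n) IF (1 FooBarRe) ;

# The same rule rewritten with the plain list:
REMOVE (n) IF (1 FooBar) ;
```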

(*)

A rule like

"<foo>" ADD (bar) (*) IF …

is much slower than

"<foo>" ADD (bar) ("<foo>") IF …

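This is presumably because a rule whose target is (*) has to be tried on every cohort, whereas with ("<foo>") as the target vislcg3 can skip cohorts that have a different wordform. (vislcg3 might optimise this away in the future.)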


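Speedy alternatives to CG

  • apertium-tagger is a lot faster than CG, but only supports the same types of rules that the tagger can learn (e.g. select or remove this bigram).
  • fomacg (see User:David Nemeskey/GSOC progress 2013) is a project to create a finite-state version of CG, but it is currently at the research stage.
  • Future versions of vislcg3 (see Constraint Grammar/Optimisation) :-)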

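See also

  • More tips on p. 48 onwards of http://www.hf.uio.no/iln/om/organisasjon/tekstlab/aktuelt/arrangementer/arkiv/CG08/CG3_Oslo.pdf#48
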
– disambiguation gain: SELECT > REMOVE, frequent rules first
  (also: avoid "ghost" checking rules), "heavy" sets first, POS
  targets vs. word targets, target frequency (not used)
– processing cost:
  ● rules length (in number of contexts)
  ● global > local contexts
  ● NOT/C > simple check
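The notes above are terse. One reading of the processing-cost items, sketched below in vislcg3 syntax, is that a rule with a single, local, plain context is cheaper to check than one with more contexts, a global (*) position, or NOT/C modifiers. The sets and rules are hypothetical and only illustrate cost, not good linguistics:

```
DELIMITERS = "<.>" "<!>" "<?>" ;

LIST Det = det ;
LIST N = n ;
LIST V = vblex ;

SECTION

# Cheaper: one context, a fixed position (-1), a simple check.
REMOVE V IF (-1 Det) ;

# More expensive: two contexts, a global scan to the right (*1)
# negated with NOT, and a careful (C) check that the previous
# cohort is unambiguously a determiner.
REMOVE V IF (NOT *1 N) (-1C Det) ;
```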