Ideas for Google Summer of Code/Apertium FST CG

The purpose of this task is to create a replacement for Constraint Grammar (CG) as the first step of Apertium disambiguation, before the part-of-speech tagger.

Currently, many language pairs use Constraint Grammar as a pre-disambiguator for the Apertium tagger, which allows finer-grained constraints than would otherwise be possible. However, the current implementation of CG is much slower than most other modules in the Apertium pipeline, and its syntax is very different from that of other Apertium modules (dictionaries, lexical selection, transfer rules, etc.).

There have been a few attempts to create FST versions of CG (see User:David_Nemeskey/GSOC_progress_2013), but they have not succeeded. The hypothesis is that a simpler version of CG that supports CG's main features (feature parity is not required) would see better adoption and integration within the Apertium pipeline.

Coding Challenge

  1. Extract common use cases of Constraint Grammar in Apertium languages
  2. Create a prototype in a scripting language that allows for simple disambiguation rules (select/remove)
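
As a rough illustration of what the prototype in step 2 might look like, here is a minimal Python sketch of select/remove rules with a single positional context test. Everything in it is an assumption made for illustration: the names Reading, Cohort, Rule and apply_rule are hypothetical, the rule format is not CG syntax, and the input is built by hand rather than read from the Apertium stream format.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    lemma: str
    tags: frozenset  # morphological tags, e.g. {"n", "sg"}


@dataclass
class Cohort:
    surface: str
    readings: list  # list of Reading; more than one means ambiguous


@dataclass
class Rule:
    action: str                       # "select" or "remove"
    target: frozenset                 # tags a reading must carry to be targeted
    offset: int = 0                   # relative position of the context cohort; 0 = no context
    context: frozenset = frozenset()  # tags some reading of the context cohort must carry


def context_holds(rule, sentence, i):
    """Check the (optional) single positional context test of a rule."""
    if rule.offset == 0:
        return True
    j = i + rule.offset
    if not 0 <= j < len(sentence):
        return False
    return any(rule.context <= r.tags for r in sentence[j].readings)


def apply_rule(rule, sentence):
    """Apply one rule to every ambiguous cohort, never deleting the last reading."""
    for i, cohort in enumerate(sentence):
        if len(cohort.readings) < 2 or not context_holds(rule, sentence, i):
            continue
        hits = [r for r in cohort.readings if rule.target <= r.tags]
        rest = [r for r in cohort.readings if not rule.target <= r.tags]
        if rule.action == "select" and hits:
            cohort.readings = hits
        elif rule.action == "remove" and hits and rest:
            cohort.readings = rest


# Hand-built example: "walk" is ambiguous between noun and verb.
sentence = [
    Cohort("the", [Reading("the", frozenset({"det", "def"}))]),
    Cohort("walk", [Reading("walk", frozenset({"n", "sg"})),
                    Reading("walk", frozenset({"vblex", "inf"}))]),
]

# Roughly "remove the verb reading if the previous word can be a determiner".
rule = Rule("remove", frozenset({"vblex"}), offset=-1, context=frozenset({"det"}))
apply_rule(rule, sentence)
print([sorted(r.tags) for r in sentence[1].readings])  # [['n', 'sg']]
```

The sketch deliberately leaves out most of what real CG offers (sets, barriers, careful contexts, linked tests, rule sections); which of those are needed is what step 1 is meant to find out.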