Difference between revisions of "Ideas for Google Summer of Code/Rule-based finite-state disambiguation"


Revision as of 15:25, 4 March 2012

Apertium currently has only a bigram/trigram part-of-speech tagger. The objective of this task is to implement a rule-based disambiguation framework for Apertium whose rules can be expressed as a finite-state transducer. One option is to write the rules as constraint rules in a new XML-based file format.

Tasks

  • Define an XML format for writing finite-state constraint rules.
  • Write a compiler which turns these rules into a binary finite-state representation.
  • Write a processor which applies these rules to an Apertium input stream.
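
To make the three tasks more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `<rules>`, `<remove>` and `<left>` elements are a hypothetical rule format, not an existing Apertium one, and the rules are interpreted directly instead of being compiled into a binary finite-state representation. It only shows what a single constraint rule ("remove a reading carrying one tag when the unit to the left unambiguously carries another") might do to ambiguous input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file: drop <vaux> readings directly after a determiner.
RULES_XML = """
<rules>
  <remove tag="vaux">
    <left tag="det"/>
  </remove>
</rules>
"""


def load_rules(xml_text):
    """Return a list of (tag_to_remove, required_left_tag) pairs."""
    root = ET.fromstring(xml_text)
    return [(rule.get("tag"), rule.find("left").get("tag"))
            for rule in root.findall("remove")]


def tags(reading):
    """Extract the set of tag names from a reading such as 'can<n><sg>'."""
    return set(re.findall(r"<([^<>]+)>", reading))


def apply_rules(sentence, rules):
    """Apply the rules left to right.

    `sentence` is a list with one list of readings per lexical unit.
    """
    result = [list(readings) for readings in sentence]
    for i in range(1, len(result)):
        left = result[i - 1]
        for remove_tag, left_tag in rules:
            # The context matches only if every left reading carries the tag.
            if left and all(left_tag in tags(r) for r in left):
                kept = [r for r in result[i] if remove_tag not in tags(r)]
                if kept:  # never delete the last remaining reading
                    result[i] = kept
    return result


sentence = [["the<det><def><sp>"], ["can<n><sg>", "can<vaux><pres>"]]
disambiguated = apply_rules(sentence, load_rules(RULES_XML))
```

With the rule above, the second unit keeps only its noun reading. A real implementation would replace `apply_rules` with a compiled transducer, but the safeguard of never removing the last reading is likely to be worth keeping in any design.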

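Coding challenge

  • Write a stream processor (see Apertium stream format) for the output of lt-proc that parses the stream character by character, respecting superblanks.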
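
A character-by-character reader for lt-proc output could be sketched as follows. This is an illustrative Python sketch, not a reference implementation: it assumes the usual stream conventions (lexical units written as `^surface/reading1/reading2$`, superblanks enclosed in `[` and `]`, and a backslash escaping the following character), and the names `LexicalUnit` and `parse_stream` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    surface: str
    readings: list = field(default_factory=list)


def parse_stream(text):
    """Split a stream into ('blank', str) and ('lu', LexicalUnit) items."""
    items = []
    blank = []            # characters seen outside lexical units
    fields = None         # '/'-separated fields of the current unit, or None
    in_superblank = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: copy the backslash and its target verbatim.
            pair = text[i:i + 2]
            if fields is not None:
                fields[-1] += pair
            else:
                blank.append(pair)
            i += 2
            continue
        if in_superblank:
            # Nothing inside a superblank is interpreted, not even ^ or $.
            blank.append(ch)
            if ch == "]":
                in_superblank = False
        elif fields is not None:
            if ch == "/":
                fields.append("")
            elif ch == "$":
                items.append(("lu", LexicalUnit(fields[0], fields[1:])))
                fields = None
            else:
                fields[-1] += ch
        elif ch == "[":
            in_superblank = True
            blank.append(ch)
        elif ch == "^":
            if blank:
                items.append(("blank", "".join(blank)))
                blank = []
            fields = [""]
        else:
            blank.append(ch)
        i += 1
    if blank:
        items.append(("blank", "".join(blank)))
    return items


stream = "[<b>]^the/the<det><def><sp>$ ^can/can<n><sg>/can<vaux><pres>$[</b> ^x$]"
items = parse_stream(stream)
```

Keeping blanks and superblanks as items of their own means that a processor can write them back unchanged after disambiguating the lexical units between them.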

Frequently asked questions

Previous GSOC projects