Ideas for Google Summer of Code/Use preferences in pair

From Apertium

Language pairs can now have any number of linguistic or stylistic preferences that the user can choose between. Before, we had only fixed sets, e.g. British vs American English, and each set of preferences required compiling a new pipeline. These days it is possible to turn individual spelling choices on or off, like "-ize" vs "-ise", or even word-specific ones like "encyclopaedia", "manoeuvre" vs "encyclopedia", "maneuver", all within a single pipeline, as long as the pipeline is set up to allow this. This GSoC task involves setting up an existing pipeline to allow this kind of variation.

The new preference system is already used in nob→nno and cat→spa, but other language pairs could have preferences enabled as well. This requires first figuring out what preference variation is possible and useful, then systematising it, and finally enabling it in the language pair by turning hard restrictions into ambiguity plus selectors: we remove LR/RL restrictions and merge paradigms so that the competing variants coexist, and add simple CG (Constraint Grammar) rules to pick the form the user requested.
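The "ambiguity plus selectors" idea can be illustrated with a small conceptual model. The sketch below is plain Python, not Apertium dictionary or CG syntax; the names `Candidate`, `LEXICON`, `select` and `translate`, and the preference tags `ize`, `ise`, `e` and `ae`, are all hypothetical and chosen only for illustration. It shows one lexicon that keeps every variant (the ambiguity) and a selector that picks a variant based on the preferences the user switched on, falling back to the first variant when no preference applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    form: str        # surface form that could be generated
    tags: frozenset  # hypothetical preference tags that select this form


# One "merged" lexicon: every entry keeps all of its variants instead of
# hard-coding a single one per compiled pipeline. The first variant is
# treated as the default.
LEXICON = {
    "organize": [
        Candidate("organize", frozenset({"ize"})),
        Candidate("organise", frozenset({"ise"})),
    ],
    "encyclopedia": [
        Candidate("encyclopedia", frozenset({"e"})),
        Candidate("encyclopaedia", frozenset({"ae"})),
    ],
}


def select(candidates, prefs):
    """Pick the first candidate matching an active preference, else the default."""
    for cand in candidates:
        if cand.tags & prefs:
            return cand.form
    return candidates[0].form


def translate(words, prefs):
    """Resolve each word against the lexicon; unknown words pass through unchanged."""
    return [select(LEXICON[w], prefs) if w in LEXICON else w for w in words]


if __name__ == "__main__":
    sentence = ["organize", "the", "encyclopedia"]
    print(translate(sentence, set()))          # defaults only
    print(translate(sentence, {"ise"}))        # one preference on
    print(translate(sentence, {"ise", "ae"}))  # two independent preferences
```

In this toy model each preference can be toggled independently at run time, which mirrors the point above: the variation lives in one pipeline rather than in one compiled pipeline per fixed set of choices.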


Documentation about preference variation:


Coding challenge

  • initial documentation of possible preferences in a pair of your choice (which doesn't already have preferences enabled)
  • enable a single bidix preference
  • enable a single generator preference