Ideas for Google Summer of Code/Use preferences in pair

From Apertium
Revision as of 09:31, 4 March 2024 by Unhammer

Language pairs can now have any number of linguistic or stylistic preferences for the user to choose between. Previously, we had only fixed sets, e.g. British vs American English, and each set of preferences required compiling a separate pipeline. It is now possible to turn individual spelling choices on or off, such as "-ize" vs "-ise", or even word-specific ones like "encyclopaedia", "manoeuvre" vs "encyclopedia", "maneuver", all within a single pipeline, as long as the pipeline is set up to allow this. This GSoC task involves setting up an existing pipeline to allow this kind of variation.

The new preference system is used in nob→nno and cat→spa, but other language pairs could have preferences enabled as well. This requires first figuring out what preference variation is possible and useful, systematising it, and then enabling it in the language pair by turning hard restrictions into ambiguity plus selectors: we remove LR/RL restrictions and merge paradigms so that the variants become alternatives within one pipeline, then add simple CG rules to pick the form the user requested.
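The idea of "ambiguity plus selectors" can be illustrated with a small conceptual sketch in Python. This is not Apertium code and does not reflect the actual dictionary or CG syntax; the names `Candidate`, `choose_form`, the variant labels `ize`/`ise` and the preference set are all hypothetical, chosen only to show how one pipeline can carry every variant and defer the choice to a user preference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible output form, labelled with a hypothetical variant name."""
    form: str
    variant: str


# Instead of compiling one pipeline per spelling standard, a single pipeline
# keeps all variants as ambiguous candidates (illustrative data only).
CANDIDATES = {
    "organize": [Candidate("organize", "ize"), Candidate("organise", "ise")],
}


def choose_form(candidates, preferences):
    """Pick the candidate whose variant the user enabled.

    Falls back to the first candidate (treated here as the default)
    when no enabled preference matches.
    """
    for cand in candidates:
        if cand.variant in preferences:
            return cand.form
    return candidates[0].form


if __name__ == "__main__":
    print(choose_form(CANDIDATES["organize"], {"ise"}))
    print(choose_form(CANDIDATES["organize"], set()))
```

In the real pipeline, the role of `choose_form` is played by the CG rules mentioned above, and the candidate list corresponds to the ambiguity left after removing LR/RL restrictions; see the documentation linked below for the actual mechanism.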

Some documentation:

https://wiki.apertium.org/wiki/Dialectal_or_standard_variation#Overlapping_variants

https://github.com/apertium/apertium/issues/118