Html-tools/GCI suggests and stuff

Revision as of 18:00, 1 December 2015 by TommiPirinen (talk | contribs) (started describing stuff)

This page collects ideas for GCI (GSoC?) and similar programmes: functionality for html-tools along the lines of a "Suggest a better translation" feature.

  1. Unknown words

When you translate text with http://apertium.org/, some translations contain words marked with stars, for example:

These are words and these nonwords foo. -> Estos son palabras y estos *nonwords *foo.

These words are missing from somewhere in Apertium. Currently the web page collects some of them. I would like them to become links to a page with a form where you can enter details about the word.
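As a rough illustration of the idea, here is a minimal Python sketch that turns starred words in a translation into links. It is not how html-tools works today: the `/suggest` URL, its `pair` and `word` query parameters, and the function name `linkify_unknown_words` are all made up for this example, and a real implementation would probably live in the JavaScript front end.

```python
import html
import re
from urllib.parse import quote

# Hypothetical address of the suggestion form; no such page exists yet.
SUGGEST_URL = "/suggest"

# A star at the start of a token, followed by the unknown word itself.
UNKNOWN_WORD = re.compile(r"(?<!\w)\*(\w+)")


def linkify_unknown_words(translation: str, pair: str) -> str:
    """Return HTML in which every *word is a link to the suggestion form."""
    parts = []
    pos = 0
    for match in UNKNOWN_WORD.finditer(translation):
        # Escape the ordinary text before the starred word.
        parts.append(html.escape(translation[pos:match.start()]))
        word = match.group(1)
        href = f"{SUGGEST_URL}?pair={quote(pair)}&word={quote(word)}"
        parts.append(f'<a href="{html.escape(href)}">{html.escape(word)}</a>')
        pos = match.end()
    # Escape whatever follows the last starred word.
    parts.append(html.escape(translation[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(linkify_unknown_words("Estos son palabras y estos *nonwords *foo.", "eng-spa"))
```

The star is dropped from the visible text here and the word itself becomes the link; keeping the star inside the link would be an equally reasonable choice.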

Things to do:

  1. The most basic implementation can be very simple: a few text boxes, with the form either recorded or mailed somewhere.
  2. It is possible to find out where the word is missing from, e.g. whether it is missing from the English (monolingual) dictionary or from the English-Spanish (bilingual) dictionary. Finding out which is which may require working with APY.
  3. On http://apertium.org there are only *words, I think, but it is possible to get @words and #words too; try http://turkic.apertium.org. I personally would like these to be different clickable links to other forms (see this doc for what the symbols mean).
  4. When a word is missing from a monolingual dictionary, the user may need to classify it before it can be used. For some languages it is enough to say whether it is a noun, verb, adjective or something else, but for others you need to know the *inflectional pattern* or *paradigm*. To help with this classification there are web apps like https://github.com/flammie/paradigm.abumatran.eu that could be integrated into these links (this task requires some research work as well).
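To make items 2 and 3 a bit more concrete, the following Python sketch sorts marked words by their symbol so that each kind could be sent to a different form. The meanings given to `*`, `@` and `#` follow my understanding of the usual Apertium convention and should be checked against the doc mentioned above; the names `MARKER_KINDS`, `MissingWord` and `find_missing_words` are invented for the example, and nothing here talks to APY.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed meanings of the markers; verify against the Apertium documentation.
MARKER_KINDS = {
    "*": "unknown",       # not found in the source-language (monolingual) dictionary
    "@": "untranslated",  # analysed, but missing from the bilingual dictionary
    "#": "ungenerated",   # translated, but the target-language form could not be generated
}

# One of the three markers at the start of a token, followed by the word.
MARKED_WORD = re.compile(r"(?<!\w)([*@#])(\w+)")


@dataclass
class MissingWord:
    marker: str  # the symbol found in the output
    word: str    # the word without its marker
    kind: str    # which form the word should be routed to


def find_missing_words(translation: str) -> List[MissingWord]:
    """Collect every marked word in the translation together with its kind."""
    return [
        MissingWord(m.group(1), m.group(2), MARKER_KINDS[m.group(1)])
        for m in MARKED_WORD.finditer(translation)
    ]


if __name__ == "__main__":
    for item in find_missing_words("Estos son *foo y @bar con #baz."):
        print(item.marker, item.word, item.kind)
```

A form handler could then use `kind` to decide which questions to ask, for instance asking for a part of speech or paradigm only when the word is missing from the monolingual dictionary.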

Other things:

  • "Suggest a better translation" features are known to be prone to vandalism, so user signup / login / authentication may be needed. This may also be needed for copyright reasons.