Ideas for Google Summer of Code/Bidix lookup and maintenance
Things to have in the interface:
- Paradigm generation
  - probably not editable (yet)
  - get it by expanding monolingual transducer
  - See also: https://github.com/apertium/apertium-paradigmatrix
- Translations
  - from parsing bidix
  - deciding which form to display might be an interesting challenge
  - indicate default translations by parsing .lrx
  - See also: https://github.com/apertium/apertium-html-tools/issues/105
- Phrases
  - get these from .lsx and maybe also .lrx
- Reverse lookups
  - for every translation, what other words would translate to it?
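As a rough illustration of the "Translations" and "Reverse lookups" items, here is a minimal Python sketch that reads a bidix-style XML string and builds a forward and a reverse index. The sample entries and the names `SAMPLE_BIDIX`, `side_to_entry` and `load_bidix` are made up for this example. The sketch deliberately ignores direction restrictions on `<e>`, paradigm references, regex entries and `<g>` groups, all of which a real tool would need to handle.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data in the usual bidix layout (<e><p><l/><r/></p></e>).
SAMPLE_BIDIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>dog<s n="n"/></l><r>perro<s n="n"/></r></p></e>
    <e><p><l>hound<s n="n"/></l><r>perro<s n="n"/></r></p></e>
    <e><p><l>hot<b/>dog<s n="n"/></l><r>perrito<b/>caliente<s n="n"/><s n="m"/></r></p></e>
    <e><i>Madrid<s n="np"/></i></e>
  </section>
</dictionary>"""


def side_to_entry(side):
    """Return (lemma, tags) for an <l>, <r> or <i> element."""
    parts = [side.text or ""]
    tags = []
    for child in side:
        if child.tag == "s":      # symbol, e.g. <s n="n"/>
            tags.append(child.get("n"))
        elif child.tag == "b":    # <b/> stands for a space inside a lemma
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts).strip(), tuple(tags)


def load_bidix(xml_text):
    """Build forward (left -> right) and reverse (right -> left) indexes."""
    forward = defaultdict(list)
    reverse = defaultdict(list)
    root = ET.fromstring(xml_text)
    for entry in root.iter("e"):
        pair = entry.find("p")
        identity = entry.find("i")
        if pair is not None:
            left = side_to_entry(pair.find("l"))
            right = side_to_entry(pair.find("r"))
        elif identity is not None:
            # <i> means both sides are identical.
            left = right = side_to_entry(identity)
        else:
            continue  # entry types this sketch does not handle
        forward[left[0]].append(right)
        reverse[right[0]].append(left)
    return forward, reverse


forward, reverse = load_bidix(SAMPLE_BIDIX)
print(forward["dog"])     # translations of "dog"
print(reverse["perro"])   # other words that translate to "perro"
```

Keeping both indexes in memory makes the reverse lookup a plain dictionary access. Whether that scales to a full language pair, and how .lrx rules would be layered on top to mark default translations, is left open here.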
Ranking translations probably doesn't exhaust the information that can be extracted from .lrx. We might as well display the rest and make it editable too, even if it is information that doesn't normally appear in a dictionary.
If the user edits something and doesn't have a GitHub account, a bot should make the PR for them (maybe? are we worried about spam?). If they do have an account, they should log in using OAuth. It also shouldn't be too hard to make an offline mode so that people can just use this as a local editor.
It would be really great if this could eventually integrate with Easy dictionary maintenance, which is essentially the monolingual equivalent of this.
Coding Challenge
A webpage that takes two language codes (abc and xyz) and a surface form in abc, and displays a list of the surface forms in xyz of all its translations, sorted by part of speech.
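The display half of the challenge could look something like the following Python sketch. It assumes the hard part (analysing the abc surface form, looking it up in the bidix, and generating xyz surface forms) has already produced a list of `(surface_form, part_of_speech)` pairs. The function names `group_by_pos` and `render_html` and the sample data are hypothetical.

```python
from collections import defaultdict
from html import escape


def group_by_pos(translations):
    """Group (surface_form, pos) pairs by part of speech.

    Returns a list of (pos, sorted_forms) tuples ordered by pos,
    with duplicate surface forms removed.
    """
    grouped = defaultdict(set)
    for surface, pos in translations:
        grouped[pos].add(surface)
    return [(pos, sorted(forms)) for pos, forms in sorted(grouped.items())]


def render_html(source_form, grouped):
    """Render the grouped translations as a small HTML definition list."""
    lines = [f"<h1>{escape(source_form)}</h1>", "<dl>"]
    for pos, forms in grouped:
        lines.append(f"  <dt>{escape(pos)}</dt>")
        lines.extend(f"  <dd>{escape(form)}</dd>" for form in forms)
    lines.append("</dl>")
    return "\n".join(lines)


# Hypothetical output of the analysis/lookup/generation step for "dog".
sample = [("perro", "n"), ("seguir", "vblex"), ("perros", "n"), ("perro", "n")]
print(render_html("dog", group_by_pos(sample)))
```

Sorting here is alphabetical by tag name, which is only a placeholder. A real page would probably want a fixed, human-friendly ordering of parts of speech.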
Further Reading
- Dictionary design issues for Athabaskan languages: