
Geriaoueg

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.
Browsing a Breton website with Geriaoueg pop-up vocabulary hints in French.

Geriaoueg is a set of scripts that use Apertium morphological analysers (see list of dictionaries) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "Vocab" tool. The software can be found in apertium-tools/geriaoueg. It currently works with most web browsers, with the exception of Internet Explorer.

To set up the software for a new pair of languages, you will need a morphological analyser that outputs Apertium-format analyses (e.g. built with lttoolbox or HFST) and a bilingual tab-separated wordlist. It is hoped that this will make Apertium-format resources useful to more people and be a step toward building a full machine translation system.
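
To illustrate how the two resources fit together, here is a minimal Python sketch. It is not Geriaoueg's actual code. It assumes the analyser output is in the Apertium stream format (`^surface/lemma<tags>/...$`, with `*` marking unknown words) and that the wordlist has one `source<TAB>target` pair per line. The function names and the Breton-French sample data are hypothetical, and escaped characters in the stream format are ignored for brevity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: backslash-escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]+)((?:/[^/$]+)+)\$")


def load_wordlist(lines):
    """Read 'source<TAB>target' lines into a dict: lemma -> list of translations."""
    wordlist = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        wordlist.setdefault(source, []).append(target)
    return wordlist


def lemmas(analysed_text):
    """Yield (surface form, list of distinct lemmas) for each lexical unit."""
    for match in LEXICAL_UNIT.finditer(analysed_text):
        surface = match.group(1)
        found = []
        for analysis in match.group(2).split("/")[1:]:
            if analysis.startswith("*"):
                continue  # the analyser did not recognise this word
            lemma = analysis.split("<", 1)[0]  # drop the <tag> sequence
            if lemma not in found:
                found.append(lemma)
        yield surface, found


def vocabulary_hints(analysed_text, wordlist):
    """Map each surface form to the translations of its lemmas, if any."""
    hints = {}
    for surface, lemma_list in lemmas(analysed_text):
        translations = []
        for lemma in lemma_list:
            for target in wordlist.get(lemma, []):
                if target not in translations:
                    translations.append(target)
        if translations:
            hints[surface] = translations
    return hints


# Made-up sample data for illustration only
wordlist = load_wordlist(["kazh\tchat\n", "levr\tlivre\n"])
analysed = "^kazh/kazh<n><m><sg>$ ^levrioù/levr<n><m><pl>$ ^xyz/*xyz$"
hints = vocabulary_hints(analysed, wordlist)
```

Looking up by lemma rather than by surface form is what lets a small wordlist cover inflected forms: the plural `levrioù` is matched through its lemma `levr`.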

There has also been work on browser plugins with similar functionality.

Todo

  • Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, Wikipedia
  • Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
  • Make it work with Internet Explorer
  • Extend it to support more languages (at least all of the trunk/ languages — approx. 20)
  • Internationalisation of the interface (.po-ise all strings)
  • Make it prettier (see for example BBC Vocab and Lingro)
  • An option to highlight only the words above or below a certain frequency threshold, controlled by something like a slider. Beginners could have all the words highlighted, while more advanced readers could have only the most infrequent words highlighted.
  • A way to support Basque, Kyrgyz and Sámi, and to handle compounds
  • Make it optionally read in dictionaries in lttoolbox format.
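
The frequency-threshold item in the list above could be sketched roughly as follows. This is only one possible approach, not an existing Geriaoueg feature: the function name is hypothetical, the frequency table is assumed to come from some corpus count, and words missing from the table are treated as having frequency 0 (that is, as the rarest).

```python
from collections import Counter


def words_to_highlight(words, frequencies, max_frequency):
    """Return the words whose corpus frequency is at or below max_frequency.

    Words absent from the frequency table count as frequency 0.
    """
    return {w for w in words if frequencies.get(w, 0) <= max_frequency}


# Toy frequency table built from a made-up sentence
frequencies = Counter("the cat saw the dog and the cat".split())
page_words = ["the", "cat", "dog", "zebra"]

advanced = words_to_highlight(page_words, frequencies, 1)  # rare words only
beginner = words_to_highlight(page_words, frequencies, 3)  # everything
```

A slider in the interface would then simply set `max_frequency`: a high value highlights every word for beginners, and a low value leaves only infrequent words highlighted.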

External links
