
Ideas for Google Summer of Code/Geriaoueg vocabulary assistant

From Apertium


Geriaoueg is a program that provides "popup" vocabulary assistance, similar to BBC Vocab or Lingro. Currently it only works with Breton–French, Welsh–English and Spanish–Breton. This task is to extend it to work with any language pair in our SVN (e.g. by supporting both lttoolbox and HFST) and to make it process and display broken or non-standard HTML more reliably.
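This page does not describe how Geriaoueg itself is implemented, so the following is only a hypothetical Python sketch of the general idea behind popup glosses on possibly broken HTML. It walks the page with the standard library's tolerant `html.parser`, copies markup through unchanged, and wraps each word found in a glossary in a `<span>` whose `title` attribute most browsers show as a tooltip on hover. The toy `GLOSSARY`, the `gloss` class and the `annotate` function are assumptions for illustration; a real version would look words up with an Apertium analyser and bilingual dictionary instead of a hard-coded dict.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical toy Welsh -> English glossary. A real setup would look words
# up with an Apertium analyser and bilingual dictionary instead.
GLOSSARY = {"cath": "cat", "ci": "dog"}


class GlossAnnotator(HTMLParser):
    """Re-emit HTML, wrapping glossary words in <span title="...">."""

    def __init__(self, glossary):
        super().__init__(convert_charrefs=True)
        self.glossary = glossary
        self.out = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.raw_depth += 1
        self.out.append(self.get_starttag_text())  # copy the tag verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)  # leave script/style bodies untouched
            return
        # A capturing group makes re.split keep the words, at odd indices.
        for i, part in enumerate(re.split(r"(\w+)", data)):
            gloss = self.glossary.get(part.lower()) if i % 2 else None
            if gloss:
                self.out.append(
                    f'<span class="gloss" title="{escape(gloss)}">'
                    f"{escape(part)}</span>"
                )
            else:
                # The parser unescaped entities such as &amp;, so re-escape.
                self.out.append(escape(part, quote=False))


def annotate(html_text, glossary=GLOSSARY):
    """Return html_text with every glossary word wrapped in a tooltip span."""
    parser = GlossAnnotator(glossary)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(annotate("<p>ci<b>cath</p>"))  # the unclosed <b> is passed through as-is
```

Because the parser never builds a tree, unbalanced tags in the input are simply copied to the output rather than causing an error, which is one cheap way of coping with broken HTML.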

Tasks

See also: Geriaoueg
  • Make it parse HTML better: there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, RFE/RL and Wikipedia
  • Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
  • Make it work with Internet Explorer
  • Extend it to support more languages (at least all of the languages in trunk/, approximately 40)
  • Internationalise the interface (move all user-facing strings into gettext .po files)
  • Make it prettier (see for example BBC Vocab and Lingro)
  • An option to highlight only the words above or below a certain frequency threshold, set with a sliding bar. Beginners could have all the words highlighted, while more advanced readers could have only the least frequent words highlighted.
  • Find a way to make it work for Basque, Kyrgyz and Sámi, and to handle compound words
  • Make it optionally read in dictionaries in lttoolbox format.
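A minimal Python sketch of how the frequency-threshold option could decide what to highlight, assuming a per-language list that maps each word form to its frequency rank (1 = most frequent). The toy `FREQ_RANK` table and the `words_to_highlight` function are hypothetical and not part of Geriaoueg. The slider value is read as a minimum rank, and only words at least that rare are highlighted.

```python
import re

# Hypothetical frequency ranks (1 = most frequent word). A real list would be
# built from a corpus for each language Geriaoueg supports.
FREQ_RANK = {"the": 1, "of": 2, "house": 350, "cat": 900, "ephemeral": 15000}


def words_to_highlight(text, min_rank, ranks=FREQ_RANK):
    """Return the words in text whose frequency rank is at least min_rank.

    min_rank is the slider value: 1 highlights every word (beginners), while
    a large value keeps only the rarest words (more advanced readers). Words
    missing from the list are treated as rare, so they are always highlighted.
    """
    highlighted = []
    for word in re.findall(r"\w+", text):
        rank = ranks.get(word.lower(), float("inf"))
        if rank >= min_rank:
            highlighted.append(word)
    return highlighted


sentence = "The cat of the ephemeral house"
print(words_to_highlight(sentence, 1))    # every word
print(words_to_highlight(sentence, 500))  # ['cat', 'ephemeral']
```

Using ranks rather than raw counts is one option that makes the slider scale less dependent on corpus size. For highly inflected languages such as Basque or Kyrgyz, the lookup would probably need to use lemmas from an analyser rather than surface forms.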

Coding challenge

  • Set up Geriaoueg, and demonstrate it working.

Frequently asked questions

  • None yet, ask us something! :)

