Ideas for Google Summer of Code/Geriaoueg vocabulary assistant

From Apertium

Geriaoueg is a program that provides "popup" vocabulary assistance, similar to BBC Vocab or Lingro. It currently works only with Breton–French, Welsh–English and Spanish–Breton. This task is to extend Geriaoueg so that it works with any language pair in our SVN (e.g. by supporting both lttoolbox and HFST), and so that it processes and displays non-standard or broken HTML more reliably.
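The core of the HTML task is to copy a page through unchanged while wrapping known words in an element that the popup code can hook into. The sketch below shows one way to do this in Python with the standard library's html.parser, which does not require well-formed input and simply reports tags and text as it meets them. The class name GlossAnnotator, the in-memory glossary dict and the "gloss" CSS class are hypothetical; a real implementation would get its translations from the Apertium dictionaries, not from a dict.

```python
import re
from html import escape
from html.parser import HTMLParser

WORD_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns


class GlossAnnotator(HTMLParser):
    """Hypothetical helper: re-emit HTML, wrapping glossary words in <span title=...>."""

    # Raw-text elements whose content must never be annotated.
    SKIP = {"script", "style"}

    def __init__(self, glossary):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.glossary = glossary  # hypothetical: lowercase word -> translation
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # script/style text arrives raw, so copy it raw
            return
        pos = 0
        for m in WORD_RE.finditer(data):
            # Text was entity-decoded, so re-escape it on output.
            self.out.append(escape(data[pos:m.start()], quote=False))
            word = m.group()
            gloss = self.glossary.get(word.lower())
            if gloss is None:
                self.out.append(escape(word, quote=False))
            else:
                self.out.append(
                    f'<span class="gloss" title="{escape(gloss)}">'
                    f"{escape(word, quote=False)}</span>"
                )
            pos = m.end()
        self.out.append(escape(data[pos:], quote=False))


def annotate(html_text, glossary):
    parser = GlossAnnotator(glossary)
    parser.feed(html_text)
    parser.close()  # flush any trailing text left in the buffer
    return "".join(parser.out)
```

For example, annotate('<p>Demat <b>bed</p>', {"demat": "bonjour", "bed": "monde"}) wraps both words even though the <b> element is never closed, because the parser echoes tags as it sees them instead of building a tree.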
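Supporting both lttoolbox and HFST is simpler if Geriaoueg consumes the analyser's output rather than the dictionary files: in analysis mode, lt-proc and hfst-proc both emit the Apertium stream format, in which each lexical unit looks like ^surface/analysis1/analysis2$ and unknown words are marked with *. The sketch below parses that format in Python. It is illustrative only: it ignores backslash-escaped characters and superblanks, and the function names are hypothetical.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2...$
# Simplification: backslash escapes (e.g. \/ or \$) are not handled.
LU_RE = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def parse_stream(stream):
    """Return a list of (surface form, [analyses]) pairs from analyser output."""
    units = []
    for m in LU_RE.finditer(stream):
        surface = m.group(1)
        # group(2) is "/a1/a2..."; drop the empty string before the first "/".
        analyses = m.group(2).split("/")[1:]
        units.append((surface, analyses))
    return units


def lemmas(analysis):
    """Lemmas in one analysis; '+' joins the parts of a multi-part analysis.

    'de<pr>+el<det><def><m><sg>' -> ['de', 'el']; an unknown word '*xyz' -> [].
    """
    if analysis.startswith("*"):
        return []
    return [part.split("<", 1)[0] for part in analysis.split("+")]
```

Because the popup code would only see (surface, analyses) pairs, the choice between lttoolbox and HFST reduces to which command produces the stream.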

Tasks

See also: Geriaoueg
  • Make it parse HTML better. A list of websites it is expected to parse will be provided, e.g. BBC News, VilaWeb, VOA, RFE/RL and Wikipedia
  • Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
  • Make it work with Internet Explorer
  • Extend it to support more languages (at least all of the languages in trunk/, approximately 20)
  • Internationalise the interface (move all user-visible strings into gettext .po files)
  • Make it prettier (see for example BBC Vocab and Lingro)
  • Add an option to highlight only the words above or below a certain frequency threshold, controlled by something like a slider. Beginners can have all words highlighted, while more advanced readers can have only the least frequent words highlighted.
  • Make it work for Basque, Kyrgyz and Sámi, including support for compounds
  • Make it optionally read in dictionaries in lttoolbox format.
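For the character-encoding task, a common approach is to trust an explicit charset first (from the HTTP Content-Type header, then from a <meta> tag) and only then guess. The sketch below assumes the caller has already extracted the header's charset, if any; the function name and the 4096-byte scan limit are hypothetical choices, not taken from Geriaoueg.

```python
import re
from typing import Optional, Tuple

# Matches <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w.:-]+)', re.IGNORECASE)


def decode_page(raw: bytes, header_charset: Optional[str] = None) -> Tuple[str, str]:
    """Return (decoded text, encoding used), trying declared charsets first."""
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    m = META_CHARSET_RE.search(raw[:4096])  # hypothetical limit: meta tags sit near the top
    if m:
        candidates.append(m.group(1).decode("ascii"))
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            pass  # unknown label or bytes invalid in this encoding: try the next one
    # ISO-8859-1 maps every byte to a character, so this last resort never fails.
    return raw.decode("iso-8859-1"), "iso-8859-1"
```

Trying UTF-8 before ISO-8859-1 works because ISO-8859-x text with accented letters is rarely valid UTF-8. Note that browsers following the WHATWG Encoding Standard treat the label "iso-8859-1" as windows-1252, so matching browser behaviour exactly may require mapping that label.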
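A minimal sketch of the frequency-threshold option, assuming a precomputed frequency list that maps each word to its rank (1 = most frequent). The function name, the rank dictionary and the slider semantics (words ranked at or above the slider value are treated as known) are hypothetical, and the Breton ranks in the example below are invented purely for illustration.

```python
import math


def words_to_highlight(tokens, rank, max_known_rank):
    """Return the tokens a reader at this level should get popups for.

    rank: hypothetical dict, lowercase word -> frequency rank (1 = most frequent).
    max_known_rank: slider value; words with rank <= this are assumed known.
    Words missing from the list are treated as rare and always highlighted.
    """
    return [t for t in tokens if rank.get(t.lower(), math.inf) > max_known_rank]
```

With rank = {"ar": 1, "ha": 2, "ti": 50, "louarn": 4000} and tokens ["Ar", "louarn", "ha", "ti", "kazh"], a slider value of 0 highlights every token, while a value of 100 leaves only "louarn" and the unlisted "kazh". Treating unlisted words as rare errs on the side of showing a popup.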
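For internationalising the interface, the usual gettext workflow is to wrap every user-visible string in a translation call, extract the strings into a .pot template with xgettext, translate them in per-language .po files, and compile those to .mo files with msgfmt. The sketch below shows the lookup side in Python; the domain name "geriaoueg" and the locale/ directory layout are assumptions, and Geriaoueg's own code base may use a different language with its own gettext binding.

```python
import gettext


def get_translator(ui_lang, localedir="locale"):
    """Return a gettext function for ui_lang.

    Looks for <localedir>/<ui_lang>/LC_MESSAGES/geriaoueg.mo (hypothetical domain).
    fallback=True returns the untranslated English strings if no catalogue exists.
    """
    t = gettext.translation(
        "geriaoueg", localedir=localedir, languages=[ui_lang], fallback=True
    )
    return t.gettext
```

Typical use is _ = get_translator("br") followed by _("Translate") wherever a string is shown; until a Breton catalogue is installed, the call simply returns "Translate".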

Coding challenge

  • Set up Geriaoueg, and demonstrate it working.

Frequently asked questions

Previous GSOC projects