Difference between revisions of "Ideas for Google Summer of Code/Geriaoueg vocabulary assistant"

Revision as of 11:28, 20 February 2012

Geriaoueg is a program that provides "popup" vocabulary assistance, something like BBC Vocab or Lingro. It currently works only with Breton--French, Welsh--English and Spanish--Breton. The task is to extend it to work with any language pair in our SVN (e.g. by supporting both lttoolbox and HFST) and to make it process and display non-standard or broken HTML more reliably.

Tasks

See also: Geriaoueg
  • Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, RFE/RL, Wikipedia
  • Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
  • Make it work with Internet Explorer
  • Extend it to support more languages (at least all of the trunk/ languages — approx. 20)
  • Internationalisation of the interface (.po-ise all strings)
  • Make it prettier (see for example BBC Vocab and Lingro)
  • An option to highlight only the words above or below a certain frequency threshold, controlled by something like a sliding bar. Beginners could have all the words highlighted, while more advanced readers could have only the most infrequent words highlighted.
  • A way to make Basque, Kyrgyz and Sámi work, and to handle compounds
  • Make it optionally read in dictionaries in lttoolbox format.
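
Two of the tasks above, handling several character encodings and highlighting by frequency threshold, can be sketched briefly in Python. This is only an illustration under stated assumptions, not how Geriaoueg is implemented: the function names `decode_page` and `words_to_highlight`, the shape of the frequency table (a plain word-to-count mapping) and the rule that unknown words count as frequency 0 are all hypothetical.

```python
import re


def decode_page(raw: bytes) -> str:
    """Decode page bytes as UTF-8, falling back to ISO-8859-1.

    Assumption: only these two encodings are tried; a real page may
    declare its encoding in an HTTP header or a <meta> tag instead.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1")


def words_to_highlight(text: str, freq: dict, max_freq: int) -> list:
    """Return the distinct words whose frequency is at most max_freq.

    freq is a hypothetical word -> corpus count table. Words missing
    from the table are treated as frequency 0, i.e. always highlighted.
    """
    selected = []
    for word in re.findall(r"\w+", text.lower()):
        if freq.get(word, 0) <= max_freq and word not in selected:
            selected.append(word)
    return selected


if __name__ == "__main__":
    freq = {"the": 1000, "cat": 50, "purrs": 2}
    page = decode_page("The cat purrs".encode("utf-8"))
    # A low threshold, as an advanced reader might choose on the slider.
    print(words_to_highlight(page, freq, 10))
```

Moving the slider would correspond to changing `max_freq`: a very high value highlights every word for beginners, while a low value leaves only the rarest words for advanced readers.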

Coding challenge

Frequently asked questions

Previous GSOC projects