Ideas for Google Summer of Code/Geriaoueg vocabulary assistant
Geriaoueg is a program that provides "popup" vocabulary assistance, similar to BBC Vocab or Lingro. Currently it only works with Breton--French, Welsh--English and Spanish--Breton. The task is to extend Geriaoueg so that it works with any language pair in our SVN (e.g. with support for both lttoolbox and HFST) and so that it processes and displays non-standard or broken HTML more reliably.
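As a rough illustration of what coping with broken HTML could involve, the sketch below pulls the words out of a page using the tolerant parser in Python's standard library. It does not reflect how Geriaoueg is actually implemented; the names `WordExtractor` and `extract_words` are hypothetical, and splitting on `\w+` is only a stand-in for real tokenisation.

```python
import re
from html.parser import HTMLParser


class WordExtractor(HTMLParser):
    """Collect words from text nodes, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}  # elements whose text should never be looked up

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        # Stray closing tags are common in broken pages, so never go below zero.
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Naive tokenisation (an assumption): runs of word characters.
            self.words.extend(re.findall(r"\w+", data))


def extract_words(html):
    """Return the visible words of an HTML string, in document order."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    return parser.words
```

The parser does not raise on unclosed or stray tags, so a call such as `extract_words("<p>Demat <b>an holl<p>bed")` still yields the words of the text nodes. Rewriting the page with popups attached to each word is a separate and harder step that this sketch does not attempt.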
Tasks
- See also: Geriaoueg
- Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, RFE/RL, Wikipedia
- Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
- Make it work with Internet Explorer
- Extend it to support more languages (at least all of the trunk/ languages — approx. 40)
- Internationalisation of the interface (.po-ise all strings)
- Make it prettier (see for example BBC Vocab and Lingro)
- An option to highlight only the words above or below a certain frequency threshold, controlled by something like a sliding bar. Beginners could have all the words highlighted, while more advanced readers could have only the most infrequent words highlighted.
- Find a way to make it work with Basque, Kyrgyz and Sámi, and with compounds
- Make it optionally read in dictionaries in lttoolbox format.
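The frequency threshold idea from the list above could be sketched as follows. This is only an illustration under stated assumptions: the function name `words_to_highlight` is hypothetical, the frequency table is assumed to be a plain mapping from lowercased word to corpus count, and words missing from the table are treated as rare.

```python
def words_to_highlight(words, frequency, max_frequency):
    """Return the words whose corpus frequency is at or below max_frequency.

    `frequency` maps lowercased words to counts (an assumed format).
    Words absent from the table count as frequency 0, i.e. as rare.
    """
    return {w for w in words if frequency.get(w.lower(), 0) <= max_frequency}
```

A slider in the interface would then simply set `max_frequency`: a very high value highlights every word, which suits beginners, while a low value leaves only the most infrequent words highlighted for advanced readers.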
Coding challenge
- Set up Geriaoueg, and demonstrate it working.
Frequently asked questions
- none yet, ask us something! :)