Difference between revisions of "Ideas for Google Summer of Code/Geriaoueg vocabulary assistant"

From Apertium
 

Latest revision as of 19:31, 24 February 2014

Geriaoueg is a program that provides "popup" vocabulary assistance, similar to BBC Vocab or Lingro. Currently it only works with Breton-French, Welsh-English and Spanish-Breton. The task is to extend Geriaoueg so that it works with any language pair in our SVN (e.g. by supporting both lttoolbox and HFST) and processes and displays broken or non-standard HTML more reliably.

Tasks

See also: Geriaoueg
  • Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, RFE/RL, Wikipedia
  • Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
  • Make it work with Internet Explorer
  • Extend it to support more languages (at least all of the trunk/ languages — approx. 40)
  • Internationalisation of the interface (.po-ise all strings)
  • Make it prettier (see for example BBC Vocab and Lingro)
  • An option to highlight only the words above or below a certain frequency threshold, controlled by something like a slider. Beginners can have all the words highlighted, while more advanced readers can have only the least frequent words highlighted.
  • A way to make Basque, Kyrgyz and Sámi work, and to handle compounds
  • Make it optionally read in dictionaries in lttoolbox format.
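
The frequency-threshold option in the list above could be prototyped roughly as follows. This is only a sketch, not Geriaoueg's actual code: the function names (`tokenize`, `build_frequency_table`, `words_to_highlight`) and the `max_count` parameter are hypothetical, and the frequencies are assumed to be raw counts taken from some reference corpus.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of Unicode word characters.
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercased word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_frequency_table(corpus_text):
    """Count how often each word occurs in a reference corpus."""
    return Counter(tokenize(corpus_text))


def words_to_highlight(page_text, freq, max_count):
    """Return the distinct words of page_text whose corpus count is at
    most max_count, in order of first appearance.

    max_count plays the role of the slider position (hypothetical):
    a high value highlights nearly everything (beginners), a low value
    highlights only rare or unseen words (advanced readers).
    """
    seen = set()
    result = []
    for word in tokenize(page_text):
        if word not in seen and freq.get(word, 0) <= max_count:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    freq = build_frequency_table("the cat sat on the mat the cat slept")
    page = "The cat chased the mouse"
    print(words_to_highlight(page, freq, max_count=1))
    print(words_to_highlight(page, freq, max_count=3))
```

Words missing from the frequency table are treated as having a count of zero, so they are always highlighted. A real implementation would probably want frequencies per language and might work on lemmas rather than surface forms.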

Coding challenge

  • Set up Geriaoueg, and demonstrate it working.

Frequently asked questions

  • none yet, ask us something! :)

See also