Difference between revisions of "Geriaoueg"
Revision as of 14:29, 15 March 2011
Geriaoueg is a set of scripts which use Apertium morphological analysers (see list of dictionaries) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "Vocab" tool. The software can be found in SVN under apertium-tools/geriaoueg. It currently works with most web browsers, with the exception of Internet Explorer.
In order to set up the software for a new pair of languages you will need a morphological analyser which outputs Apertium format analyses (e.g. with lttoolbox or HFST) and a bilingual tab-separated wordlist. It is hoped that this will make Apertium format resources more useful to more people, and be a step toward building a full machine translation system.
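As a rough illustration of how the two resources fit together, the sketch below reads analyses in the Apertium stream format (^surface/lemma<tags>$), strips the tags to recover the lemmas, and looks them up in a tab-separated wordlist. It is not taken from the Geriaoueg scripts: the function names, the wordlist column order (source lemma, then translation) and the Welsh sample data are all assumptions made for the example, and escaped characters and multiword units are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
# A morphological tag such as <n> or <pl>
TAG = re.compile(r"<[^>]*>")


def load_wordlist(lines):
    """Build a {source lemma: [translations]} map from tab-separated lines.

    Assumed layout (hypothetical): source lemma, a tab, then the translation.
    """
    wordlist = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        wordlist.setdefault(source, []).append(target)
    return wordlist


def lemmas(analysed_text):
    """Yield (surface form, [lemmas]) for each lexical unit in the text."""
    for match in LEXICAL_UNIT.finditer(analysed_text):
        surface = match.group(1)
        found = []
        for analysis in match.group(2).split("/"):
            if analysis.startswith("*"):
                continue  # the analyser marks unknown words with '*'
            lemma = TAG.sub("", analysis)
            if lemma and lemma not in found:
                found.append(lemma)
        yield surface, found


def gloss(analysed_text, wordlist):
    """Map each surface form to the translations of its lemmas, if any."""
    result = {}
    for surface, found in lemmas(analysed_text):
        translations = []
        for lemma in found:
            for translation in wordlist.get(lemma, []):
                if translation not in translations:
                    translations.append(translation)
        if translations:
            result[surface] = translations
    return result


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_wordlist = load_wordlist(["cath\tcat\n", "du\tblack\n"])
    sample_analysis = "^cathod/cath<n><f><pl>$ ^du/du<adj>$ ^xyz/*xyz$"
    print(gloss(sample_analysis, sample_wordlist))
```

Looking words up by lemma rather than by surface form is what lets a short wordlist cover inflected forms, which is why the analyser is needed in addition to the wordlist.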
Todo
- Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, Wikipedia
- Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
- Make it work with Internet Explorer
- Extend it to support more languages (at least all of the trunk/ languages — approx. 20)
- Internationalisation of the interface (.po-ise all strings)
- Make it prettier (see for example BBC Vocab and Lingro)
- An option to highlight only the words above or below a certain frequency threshold, controlled by something like a sliding bar. Beginners could have all the words highlighted, while more advanced readers could have only the most infrequent words highlighted.
- A way to make Basque and Sámi work, and to handle compounds
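The frequency option in the list above could be prototyped along the following lines. This is only a sketch of one possible approach, not existing Geriaoueg code: the function name, the frequency table and the decision to treat words missing from the table as frequency 0 (and therefore always highlighted) are assumptions.

```python
def words_to_highlight(tokens, frequencies, max_frequency):
    """Return the distinct lower-cased tokens whose frequency is at most
    max_frequency, in order of first appearance.

    Words missing from the frequency table are treated as frequency 0
    (an assumption of this sketch), so they are always selected.
    """
    selected = []
    for token in tokens:
        key = token.lower()
        if frequencies.get(key, 0) <= max_frequency and key not in selected:
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Hypothetical corpus counts, for illustration only.
    counts = {"the": 1000, "cat": 50, "purrs": 2}
    page = ["The", "cat", "purrs", "loudly"]
    # A beginner setting: no upper limit, so every word is highlighted.
    print(words_to_highlight(page, counts, float("inf")))
    # An advanced setting: only rare words are highlighted.
    print(words_to_highlight(page, counts, 5))
```

A sliding bar in the interface would then only need to change the value passed as max_frequency.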