Difference between revisions of "Geriaoueg"

Latest revision as of 09:27, 4 April 2021

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

Browsing a Breton website with Geriaoueg pop-up vocabulary hints in French.

Geriaoueg is a set of scripts that use Apertium morphological analysers (see list of dictionaries) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "Vocab" tool. The software can be found in apertium-tools/geriaoueg. It currently works with most web browsers, with the exception of Internet Explorer.

To set up the software for a new pair of languages, you need a morphological analyser that outputs analyses in Apertium format (e.g. one built with lttoolbox or HFST) and a bilingual tab-separated wordlist. The hope is that this makes Apertium-format resources useful to more people and serves as a step toward building a full machine translation system.
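As a rough illustration of how these two resources fit together, the sketch below reads analyses in Apertium stream format (`^surface/lemma<tags>$`), extracts the lemmas, and looks them up in a tab-separated wordlist. It is not Geriaoueg's actual code: the function names are hypothetical, the sample analyses and tags are made up for the example rather than taken from a real analyser, and escaped characters and multiword units are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: backslash-escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def load_wordlist(lines):
    """Read 'source<TAB>target' lines into a dict of lemma -> translations."""
    wordlist = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        wordlist.setdefault(source, []).append(target)
    return wordlist


def lemmas(analyses):
    """Yield the lemma of each analysis, skipping unknown-word markers (*)."""
    for analysis in analyses.split("/"):
        if analysis.startswith("*"):
            continue  # the analyser did not recognise this word
        yield analysis.split("<", 1)[0]


def vocabulary_hints(analysed_text, wordlist):
    """Map each surface form to the translations of its known lemmas."""
    hints = {}
    for match in LEXICAL_UNIT.finditer(analysed_text):
        surface, analyses = match.group(1), match.group(2)
        translations = []
        for lemma in lemmas(analyses):
            for translation in wordlist.get(lemma, []):
                if translation not in translations:
                    translations.append(translation)
        if translations:
            hints[surface] = translations
    return hints


# Illustrative Breton-to-French data (assumed, not real analyser output).
wordlist = load_wordlist(["levr\tlivre\n", "bras\tgrand\n"])
analysed = "^levrioù/levr<n><m><pl>$ ^bras/bras<adj>$ ^xyz/*xyz$"
print(vocabulary_hints(analysed, wordlist))
```

With this sample data, `levrioù` and `bras` receive hints, while the unknown word `xyz` is left without one. Looking up lemmas rather than surface forms is what lets a small wordlist cover inflected forms found on a page.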

There has also been work on browser plugins with similar functionality.

Todo

Make it parse HTML better — there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, Wikipedia
Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
Make it work with Internet Explorer
Extend it to support more languages (at least all of the trunk/ languages — approx. 20)
Internationalisation of the interface (.po-ise all strings)
Make it prettier (see for example BBC Vocab and Lingro)
An option to highlight only the words above/under a certain frequency threshold, controlled by something like a sliding bar. Beginners can have all the words highlighted, while more advanced readers can have only the most infrequent words highlighted.
A way to make Basque, Kyrgyz and Sámi work, and to handle compounds
Make it optionally read in dictionaries in lttoolbox format.
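The frequency-threshold item in the list above could be prototyped along the following lines. This is only a sketch under stated assumptions: the `frequencies` table (word to corpus count) is hypothetical, words missing from it are treated as having frequency 0, and the function name is invented for the example.

```python
def words_to_highlight(tokens, frequencies, max_frequency):
    """Return the tokens whose corpus frequency is at or below max_frequency.

    Words absent from the frequency table count as frequency 0, so they
    are always highlighted.
    """
    return [
        token
        for token in tokens
        if frequencies.get(token.lower(), 0) <= max_frequency
    ]


# Hypothetical frequency table and page text.
frequencies = {"the": 1000, "cat": 50, "ocelot": 2}
tokens = ["The", "cat", "ocelot"]

# A beginner sets the slider high and sees every word highlighted.
print(words_to_highlight(tokens, frequencies, 10**9))
# An advanced reader lowers it and sees only the rarest words.
print(words_to_highlight(tokens, frequencies, 10))
```

A slider in the interface would then only need to change `max_frequency` and re-run the filter.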

External links

Geriaoueg: http://elx.dlsi.ua.es/geriaoueg/
https://github.com/vigneshv59/geriaoueg-firefox (GCI 2014 project)
https://github.com/vigneshv59/geriaoueg-chrome (GCI 2014 project)
https://github.com/GrammarSoft/proofing-webext may also have useful code to parse HTML, specifically skipNonText and the rest of that file.

@@ Line 1: / Line 1: @@
+{{Github-unmigrated-tool}}
 [[Image:Pantallazo-Geriaoueg.png|thumb|right|Browsing a Breton website with Geriaoueg pop-up vocabulary hints in French.]]
-'''Geriaoueg''' is a set of scripts which use apertium morphological analysers (see [[list of dictionaries]]) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "[http://www.bbc.co.uk/cymru/vocab/ Vocab]" tool. The software can be found in [[SVN]] under <code>apertium-tools/geriaoueg</code>, currently it works with most web browsers with the exception of Internet Explorer.
+'''Geriaoueg''' is a set of scripts which use apertium morphological analysers (see [[list of dictionaries]]) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "[http://www.bbc.co.uk/cymru/vocab/ Vocab]" tool. The software can be found in [https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/geriaoueg/ <code>apertium-tools/geriaoueg</code>], currently it works with most web browsers with the exception of Internet Explorer.
-In order to set up the software for a new pair of languages you will need a morphological analyser which outputs Apertium format analyses (e.g. with [[lttoolbox]] or [[sfst]]) and a bilingual tab-separated wordlist. It is hoped that this will make Apertium format resources more useful to more people, and be a step toward building a full machine translation system.
+In order to set up the software for a new pair of languages you will need a morphological analyser which outputs Apertium format analyses (e.g. with [[lttoolbox]] or [[HFST]]) and a bilingual tab-separated wordlist. It is hoped that this will make Apertium format resources more useful to more people, and be a step toward building a full machine translation system.
+There has also been work on browser plugins with similar functionality.
+==Todo==
+* Make it parse HTML better &mdash; there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, Wikipedia
+* Make it deal with different character encodings (at least ISO-8859-x and UTF-8)
+* Make it work with Internet Explorer
+* Extend it to support more languages (at least all of the trunk/ languages &mdash; approx. 20)
+* Internationalisation of the interface (.po-ise all strings)
+* Make it prettier (see for example BBC Vocab and Lingro)
+* An option which will only highlight the words above/under a certain frequency freshhold, like a sliding bar. For beginners they can have all the words highlighted, but for more advanced readers, they can have only the most infrequent words.
+* A way to have Basque, Kyrgyz and Sámi work, and compounds
+* Make it optionally read in dictionaries in [[lttoolbox]] format.
 ==External links==
 * [http://elx.dlsi.ua.es/geriaoueg/ Geriaoueg]
+* https://github.com/vigneshv59/geriaoueg-firefox (gci 2014 project)
+* https://github.com/vigneshv59/geriaoueg-chrome (gci 2014 project)
+* https://github.com/GrammarSoft/proofing-webext may also have useful code to parse HTML, specifically [https://github.com/GrammarSoft/proofing-webext/blob/master/js/shared.js#L159 skipNonText] and rest of that file.
 [[Category:Tools]]
