Difference between revisions of "User:OverPowered/GSoC2021Proposal"

Latest revision as of 16:27, 20 May 2021

Google Summer of Code 2021 Proposal: A Reworked Apertium Browser Plugin


My proposal is to develop the Apertium Browser Plugin mentioned in the Project Ideas for GSoC.

The current Geriaoueg plugin is out of date: the official link given in the wiki is unreachable, and the 2014 version on GitHub is unusable on both Firefox and Chrome.

The extension I plan on making will have three main functionalities:

  • Translating a word or phrase that the user hovers over on a website
  • Translating between an existing language pair in the extension pop-up
  • Translating an entire webpage at a time


I've recorded the weekly progress of this project in its Progress Report (https://wiki.apertium.org/wiki/User:OverPowered/GSoC2021_Progress_Report).


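Contact Details

  • Name: Omkar Prabhune
  • IRC: op
  • Wiki: OverPowered (https://wiki.apertium.org/wiki/User:OverPowered)
  • Official Email: omkar.prabhune19@vit.edu
  • Personal (Google) Email: omkar.prabhune.317@gmail.com
  • GitHub: OverPoweredDev (https://github.com/OverPoweredDev)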

About Me

I’m an undergraduate Computer Science student at VIT, Pune. I’ve been interested in Natural Language Processing and Linguistics for a couple of years now, having worked on basic projects like a Rasa-based chatbot and specialised semantic searches before.

Apart from those, I also have decent experience working on browser plugins; some examples of that here and here.

Why am I interested in Apertium?

As someone from India, where it is the norm to be trilingual or even quadrilingual before the age of 20, I’ve grown to appreciate languages in general, both natural and programming. Apertium, with its distinctive approach of not relying on popular deep-learning-based solutions and its focus on preserving linguistic diversity for endangered languages, is an organisation I’ve wanted to contribute to for a while.

Hence, I’ve been working on projects in Natural Language Processing for a couple of years as an undergraduate. That said, I’m not yet confident enough to bring a complete language pair up to release in this short timeline, but I still want to contribute.

Having made a couple of extensions for Chrome and Firefox before, I am confident that I can deliver a finished product with all features within the given timeframe.

So, to answer the question in the title: both because of my excitement for the topic and because of my ability to contribute to this project.


Proposal

Which task am I interested in? What do I plan to do?

My proposal is to develop the Apertium Browser Plugin mentioned in the Project Ideas for GSoC.

The current Geriaoueg plugin is out of date: the official link given in the wiki is unreachable, and the 2014 version on GitHub is unusable on both Firefox and Chrome.

The extension I plan on making will have three main functionalities:

  • Translating a word or phrase that the user hovers over on a website
  • Translating between an existing language pair in the extension pop-up
  • Translating an entire webpage at a time


Benefits to the Community / Why should Apertium sponsor this?

Currently, the Apertium platform can be used either offline or directly on its website. A browser extension gives end users another platform to use. Moreover, for translation an extension is arguably better than a website, since it is easier to use and operates within the webpage itself.

Also, considering that there is much less time in this year’s GSoC, an extension is a project that is definitely deliverable and will greatly enhance the usability of the Apertium project. Compared to something longer and more time-intensive, like bringing a new language pair to release, I can guarantee that this proposal can be developed into a product within the given timeline.


Implementation Plan

Most of the work will be done in the extension/plugin itself, with Apertium APy doing the real heavy lifting, i.e. language identification and translation. Because of this, the project is easier to fit into this year’s shorter GSoC timeline.

The three main types of POST requests I will be using to make this are:

  • /identifyLang - to first identify which language we’re translating from
  • /translate - for translating words the user hovers on
  • /translateDoc - for translating entire webpages (I might possibly have to use /translate for that as well, depending on how well /translateDoc plays with most websites)
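
As a rough sketch of how the extension might prepare these requests, the helpers below build the URL and form-encoded body for /translate and /identifyLang, and pick the most likely language from a score table. The base URL, the parameter names (`langpair`, `q`) and the `{ code: score }` response shape are assumptions for illustration and should be checked against the APy instance actually used.

```typescript
// Hypothetical base URL; the proposal does not name the APy instance to use.
const APY_BASE = "https://apy.example.org";

// Build a form-encoded string from key/value pairs, percent-encoding each part.
function toQuery(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

// Describe a /translate call. `langpair` is assumed to take the form "source|target".
function translateRequest(text: string, from: string, to: string): { url: string; body: string } {
  return { url: APY_BASE + "/translate", body: toQuery({ langpair: from + "|" + to, q: text }) };
}

// Describe an /identifyLang call for the same kind of text.
function identifyLangRequest(text: string): { url: string; body: string } {
  return { url: APY_BASE + "/identifyLang", body: toQuery({ q: text }) };
}

// Pick the highest-scoring language from an assumed { code: score } response.
// Returns null when the table is empty.
function mostLikelyLanguage(scores: Record<string, number>): string | null {
  let best: string | null = null;
  for (const code of Object.keys(scores)) {
    if (best === null || scores[code] > scores[best]) best = code;
  }
  return best;
}
```

The returned `url` and `body` can then be handed to whatever HTTP call the extension uses for its POST requests; keeping the request construction separate makes it easy to test without a network.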


Note: To translate entire web pages more effectively, as <TinoDidriksen> suggested, it might be more efficient and future-proof to tag all inline elements that need translating with a class in a completely separate transport HTML document, and then pass this new document to /translateDoc.
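
A minimal sketch of the transport-document idea follows, assuming the page text has already been collected into a list of segments. The class name `apertium-translate` and the `data-id` attribute are hypothetical choices, not something fixed by the proposal, and the regular-expression reader is only for illustration; inside a real extension a DOM parser would be the more robust way to read the translated document back.

```typescript
// Hypothetical class name used to mark translatable segments.
const MARK_CLASS = "apertium-translate";

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a standalone transport document: one marked <span> per segment,
// with data-id recording the segment's position in the input list.
function buildTransportDocument(segments: string[]): string {
  const spans = segments.map(
    (segment, index) =>
      '<span class="' + MARK_CLASS + '" data-id="' + index + '">' + escapeHtml(segment) + "</span>"
  );
  return "<!DOCTYPE html><html><body>" + spans.join("\n") + "</body></html>";
}

// Read the marked segments back out of a transport document, keyed by data-id.
// The text is returned still HTML-escaped, exactly as it appears in the markup.
function readTransportDocument(html: string): Record<string, string> {
  const pattern = new RegExp('<span class="' + MARK_CLASS + '" data-id="(\\d+)">([^<]*)</span>', "g");
  const result: Record<string, string> = {};
  let match = pattern.exec(html);
  while (match !== null) {
    result[match[1]] = match[2];
    match = pattern.exec(html);
  }
  return result;
}
```

Because each span carries its own id, the translated text can be written back into the original page even if the translation service reorders or reformats the surrounding markup.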

Another addition to the project is context-based hover translation. The idea is to build upon the feature added in last year’s GSoC, Markup handling with wordbound blanks. However, instead of using this to maintain markup formatting through documents, we bind the original word through the translation pipeline and then display this context-based translation upon hover.

This can also be implemented in the reverse direction i.e. upon translating an entire document, the extension can display the original word that it was translated from.
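
To illustrate the binding idea in both directions, the sketch below tags each source word with an index, and then pairs every original word with whatever text comes back under the same index. The `[[t:w:N]]word[[/]]` marker shape only imitates the wordbound-blank idea and is an assumption; the exact syntax accepted by the Apertium pipeline should be taken from its documentation.

```typescript
// Split on whitespace and drop empty pieces.
function splitWords(text: string): string[] {
  return text.split(/\s+/).filter((word) => word.length > 0);
}

// Wrap each word in a marker carrying its index.
// The "[[t:w:N]]word[[/]]" shape is an assumption for illustration.
function bindWords(text: string): string {
  return splitWords(text)
    .map((word, index) => "[[t:w:" + index + "]]" + word + "[[/]]")
    .join(" ");
}

// Recover { index: translated text } from pipeline output that kept the markers.
function readBoundWords(output: string): Record<string, string> {
  const pattern = /\[\[t:w:(\d+)\]\]([\s\S]*?)\[\[\/\]\]/g;
  const result: Record<string, string> = {};
  let match = pattern.exec(output);
  while (match !== null) {
    result[match[1]] = match[2];
    match = pattern.exec(output);
  }
  return result;
}

// Pair every original word with its in-context translation ("" if the marker was lost).
function alignWords(source: string, output: string): { original: string; translated: string }[] {
  const translated = readBoundWords(output);
  return splitWords(source).map((original, index) => {
    const found = translated[String(index)];
    return { original: original, translated: found !== undefined ? found : "" };
  });
}
```

The same table serves both directions: on an untranslated page the hover shows `translated` for the word under the cursor, and on a fully translated page it shows `original`.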


As for the extension itself, when it is first installed it will ask users which language they want to translate to. This will be saved as the default target language when showing translations on hover or in the extension pop-up.

After this initial setup, the extension mainly listens for two events: the user either hovers over a word, or uses the pop-up to translate a custom phrase or the entire web page.

Hover

  • In this case, the extension first finds the exact word hovered on, using jQuery’s mouseover() method to find the relevant div and then the mouse coordinates to find the exact word within it.
  • The language of the word is then found using the /identifyLang functionality of the API.
  • The word is subsequently translated using the /translate functionality.
  • The hovering text above the word will display the language it was translated from, as well as its meaning.
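
The word-finding step can be sketched as a pure function, assuming the browser has already turned the mouse coordinates into a text node and a character offset (for example through `document.caretRangeFromPoint` in Chromium-based browsers or `document.caretPositionFromPoint` in Firefox). The function name and the boundary character set are illustrative choices, and the simple boundary rule ignores scripts that are written without spaces.

```typescript
// Characters treated as word boundaries; a deliberate simplification.
const BOUNDARY = /[\s.,;:!?()\[\]{}"]/;

// Return the word that covers `offset` in `text`, or null when the offset is
// out of range or points at whitespace or punctuation.
function wordAtOffset(text: string, offset: number): string | null {
  if (offset < 0 || offset >= text.length || BOUNDARY.test(text.charAt(offset))) {
    return null;
  }
  // Walk left until the previous character is a boundary.
  let start = offset;
  while (start > 0 && !BOUNDARY.test(text.charAt(start - 1))) start--;
  // Walk right until the next character is a boundary.
  let end = offset + 1;
  while (end < text.length && !BOUNDARY.test(text.charAt(end))) end++;
  return text.slice(start, end);
}
```

For the text `Hola, mundo cruel.`, an offset that lands anywhere inside `mundo` returns `mundo`, while an offset on the comma returns null so that no request is sent.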

Pop-Up

  • Here the user selects the languages they are translating from and to
  • Beyond that, it is fairly similar to the actual website: one input text bar, one output area, and two dropdowns to select the languages
  • Naturally, there will be a Detect Language option for the input language
  • A button below this will give the option to translate the entire web page
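
One possible shape for validating the pop-up input is sketched below, assuming the extension already holds a list of available language pairs, however it is obtained. The field names `sourceLanguage` and `targetLanguage`, the `"auto"` value standing for Detect Language, and the message wording are all assumptions for illustration.

```typescript
interface LanguagePair {
  sourceLanguage: string;
  targetLanguage: string;
}

interface PopupInput {
  text: string;
  from: string; // "auto" stands for the Detect Language option (hypothetical value)
  to: string;
}

// Return null when the input can be sent for translation,
// otherwise a short message to show in the pop-up.
function validatePopupInput(input: PopupInput, pairs: LanguagePair[]): string | null {
  if (input.text.trim().length === 0) {
    return "Enter some text to translate.";
  }
  if (input.from === input.to) {
    return "Source and target languages must differ.";
  }
  if (input.from === "auto") {
    // The source is not known yet, so only check that the target is reachable at all.
    const reachable = pairs.some((pair) => pair.targetLanguage === input.to);
    return reachable ? null : "No available pair translates into " + input.to + ".";
  }
  const supported = pairs.some(
    (pair) => pair.sourceLanguage === input.from && pair.targetLanguage === input.to
  );
  return supported ? null : "The pair " + input.from + " to " + input.to + " is not available.";
}
```

Running this check before any request is sent keeps obviously invalid input (empty text, unsupported pairs) from reaching the API.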


Deliverable

A browser plugin, supported on at least Firefox and Chrome, that is able to translate individual words and entire webpages (to be tested on websites like Wikipedia, BBC News, The Economic Times, and even social media like Facebook, Reddit, Twitter, etc.).


Timeline

Phase 1

Community Bonding Period

(May 17 - June 7)

  • Understand better how the Apertium API works
  • Identify parts of the Geriaoueg extension that still work
  • Check Source Code of similar extensions
  • Design tests, experiments and evaluation procedures for an extension (which sites it should be able to read, etc.)
  • Write a workflow diagram of the improved extension, what background processes it will have and what permissions it will need
Week 1

(June 7 - June 14)

Set up a basic extension on Chromium that can:

  • Detect the word hovered on (on websites like Wikipedia, news websites, etc.)
  • Show a basic pop-up when clicked on
Week 2

(June 14 - June 21)

  • If needed, this is the time to tweak APy functionalities slightly, but this should not be necessary
  • Apart from that, actually hook up the extension to use the API functionalities to translate in the pop-up
  • Properly set up the html pop-up of the extension
  • Validate input given to pop-up
  • Enable different options for translation
Week 3

(June 21 - June 28)

  • Start work on hover functionality
  • Design the hovering text-box and the details that will be shown on it
  • Implement translation features for the hover function too
Week 4

(June 28 - July 5)

  • Start work on the translate entire document feature
  • Experiment using the /translateDoc functionality for the .html page
  • If this doesn’t work, translate the document section by section
Week 5

(July 5 - July 12)

  • While the extension was developed in Chromium, this week is for making sure there are no problems running what has been implemented so far on other browsers (in order: Chrome, Firefox, Edge, Brave)
  • Final Touch Ups on the current functionalities of the extension
  • Start working on documentation

Deliverable #1

Browser plugin that can translate words hovered on or typed into its input pop-up. Usable on the most popular browsers (Chrome, Firefox, Edge, Brave).

Phase 2

Week 6

(July 16 - July 23)

  • Start work on inline-gist translation i.e. the context-based hover translation
  • Experiment using the wordbound blanks made in last year’s GSoC
Week 7

(July 23 - July 30)

  • Continue the work started in the previous week
  • If it has been implemented in one direction, then work on getting it working in the reverse too
Week 8

(July 30 - August 6)

  • During this week, the tests prepared in the Community Bonding Period will be used to test all the functionalities implemented so far.
  • If needed, changes will be made but the project should be up and running at this point.
  • Apart from that, write documentation for the extension
Week 9

(August 6 - August 13)

The extension should be completely functional by now. All that’s left at this point will be:

  • Minor bugfixes
  • UI redesign
  • Small optimisations in code
Week 10

(August 13 - August 16)

Intentionally kept free, so as to sort out any issues that crop up before this and cause any unforeseen delay

Deliverable #2

The project will be complete at this point. A browser plugin, supported on at least Firefox and Chrome, that is able to translate individual words and entire webpages (tested on websites like Wikipedia, BBC News, The Economic Times, and even social media like Facebook, Reddit, Twitter, etc.).


Other Summer Plans

I have none, so I’ll be free to work on this full time.