User:Mono/GSoC 2017

Revision as of 01:40, 8 March 2018 by Sushain (talk | contribs)

Apertium is a free/open-source platform for rule-based machine translation and language technology that aims to support lesser-resourced and marginalized languages. The current Apertium interface is already pretty awesome. However, adding a few more features, such as webpage translation, a spellchecker interface and dictionary lookup, would make the platform even more awesome. My GSoC project revolved around implementing these features and making the interface more robust.

I would like to thank my mentors Sushain, Jonathan, Xavivars, Unhammer and TinoDidriksen, and the entire Apertium community, for helping and guiding me throughout this project. None of what was accomplished would have been remotely possible without their support. I would also like to thank the Google Summer of Code community for providing a platform where I could learn, build my skill set and quench my thirst for open-source contribution.

The rest of this page describes the work I accomplished during GSoC 2017.

Webpage Translation mode

An interface that lets the user enter a URL, choose a source language and a destination language, and translate the webpage. This feature has been successfully completed as part of my GSoC project! Both the frontend and the backend for this feature have been merged into the main project.

Code

Backend

Frontend

Documentation

Backend

URL: /translatePage
Function: Translates a webpage
Parameters:
  • langpair: language pair to use for translation
  • url: URL of the webpage to be translated
Output: Returns the translated webpage
Example:
curl -Ss 'http://localhost:2737/translatePage?langpair=eng|spa&url=http://facebook.com'
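The same request can be built programmatically. The following is a minimal Python sketch, assuming a local APy instance on port 2737 as in the curl example; the helper name build_translate_page_url is hypothetical and not part of APy. It only constructs the request URL, percent-encoding the `|` in the language pair and the target URL so they survive as query parameters.

```python
from urllib.parse import urlencode

# Assumed base address of a local APy instance, taken from the curl example.
APY_BASE = "http://localhost:2737"


def build_translate_page_url(langpair: str, url: str, base: str = APY_BASE) -> str:
    """Return a /translatePage request URL with both parameters percent-encoded.

    This helper is illustrative only; it is not part of APy itself.
    """
    query = urlencode({"langpair": langpair, "url": url})
    return f"{base}/translatePage?{query}"


print(build_translate_page_url("eng|spa", "http://facebook.com"))
# prints http://localhost:2737/translatePage?langpair=eng%7Cspa&url=http%3A%2F%2Ffacebook.com
```

The resulting string can be passed to any HTTP client.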

output

Frontend

ENABLED_MODES: an array of the enabled interfaces, a non-empty subset of ['translation', 'analyzation', 'generation', 'sandbox']

  • translation lookup turns on webpage translation mode.

The backend for this mode was merged through this commit. The frontend was merged through this PR. A screenshot of the current state of the interface can be found here.

Future Work

Use a form handler when submitting URLs for translation. The related issues are described in this comment.


SpellChecker mode

Checks the spelling of the input text for a given language and suggests alternatives for misspelled words.

Code

Backend

Frontend

Documentation

Backend

URL: /speller
Function: Performs spellchecking on a given text for a given language
Parameters:
  • lang: language to perform spellchecking for
  • q: text to perform spellchecking on
Output: Returns the spellchecking results
Example:
curl -Ss 'http://localhost:2737/speller?lang=hin&q=माय' | ascii2uni -a U -q

[{"sugg": [["काय", "1.000000"], ["चाय", "1.000000"], ["राय", "1.000000"], ["हाय", "1.000000"], ["साय", "1.000000"], ["मा", "1.000000"], ["वाय", "1.000000"], ["दाय", "1.000000"], ["गाय", "1.000000"], ["जाय", "1.000000"]], "known": false, "token": "माय"}]
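A client consuming this response might reduce it to a map from each unknown token to its suggested replacements. The sketch below assumes the response always has the shape shown above (a list of objects with "token", "known" and "sugg" keys, where each suggestion is a [word, weight] pair); the function name unknown_tokens is hypothetical, and the sample is a shortened copy of the response above.

```python
import json

# Shortened copy of the sample response shown above.
SAMPLE_RESPONSE = (
    '[{"sugg": [["काय", "1.000000"], ["चाय", "1.000000"]], '
    '"known": false, "token": "माय"}]'
)


def unknown_tokens(response_text: str) -> dict:
    """Map each token the speller does not know to its suggestions, in order."""
    result = {}
    for entry in json.loads(response_text):
        if not entry["known"]:
            # Each suggestion is a [word, weight] pair; keep only the word.
            result[entry["token"]] = [word for word, _weight in entry["sugg"]]
    return result


print(unknown_tokens(SAMPLE_RESPONSE))
```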

Frontend

ENABLED_MODES: an array of the enabled interfaces, a non-empty subset of ['translation', 'analyzation', 'generation', 'sandbox', 'speller']

  • speller turns on spell checking mode.

A screenshot of the current state of the interface can be found here.

Future Work

Improve the logic that maps the suggestions returned by the backend for each token to the corresponding text on the frontend.


Dictionary Lookup mode

An interface that generates all forms of a given word. It renders the definitions of a given word for a given language pair after translating them.

Code

Backend

Frontend

Documentation

Backend

URL: /dictionaryLookup
Function: Generates dictionary forms of a given word
Parameters:
  • langpair: language pair to use for translation
  • q: word to perform dictionary lookup on
Output: Returns the possible forms of the word after translation
Example:
curl -Ss 'http://localhost:2737/dictionaryLookup?langpair=eng|spa&q=light'
{"vblex": ["encender", "iluminar"], "n": ["luz"], "adj": ["ligero", "claro"]}
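To illustrate how such a response could be rendered, here is a small Python sketch that turns the sample output into display lines. It assumes the keys are Apertium part-of-speech tags (vblex for verb, n for noun, adj for adjective); the TAG_NAMES table and the format_lookup helper are hypothetical and are not taken from the frontend code.

```python
import json

# Sample response copied from the example above.
SAMPLE_RESPONSE = (
    '{"vblex": ["encender", "iluminar"], "n": ["luz"], "adj": ["ligero", "claro"]}'
)

# Readable names for the tags in the sample; unknown tags are shown as-is.
TAG_NAMES = {"vblex": "verb", "n": "noun", "adj": "adjective"}


def format_lookup(response_text: str) -> list:
    """Return one display line per part of speech, in response order."""
    entries = json.loads(response_text)
    lines = []
    for tag, words in entries.items():
        label = TAG_NAMES.get(tag, tag)
        lines.append(f"{label}: {', '.join(words)}")
    return lines


for line in format_lookup(SAMPLE_RESPONSE):
    print(line)
```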

Frontend

ENABLED_MODES: an array of the enabled interfaces, a non-empty subset of ['translation', 'analyzation', 'generation', 'sandbox']

  • translation lookup turns on dictionary lookup mode.

A screenshot of the current state of the interface can be found here.

Future Work

The pending tasks with respect to dictionary lookup mode are discussed in this comment.


Suggestions Interface

An interface that lets the user insert suggestions on the wiki page.

Code

Frontend

Backend

Future Work

Work on this feature had only just begun. Focus was first put on completing the three features above before progressing on this one, so there is no documentation for this feature as part of my project. Future tasks would involve enhancing both the frontend and the backend code, testing the functionality and then creating a pull request.


Installation Notification

1. A notification that appears when a request made to APy takes longer than a threshold time.
2. The notification also appears when the average duration of recent requests exceeds a certain threshold, indicating that the servers may be overloaded at that time and that the user could set up APy locally instead.
3. At any point, we maintain a queue of request durations with a fixed maximum size. Once the queue is full, we dequeue the oldest duration before enqueuing the duration of the latest request. This yields a moving average and helps determine whether the load on the server has reduced.
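The bounded queue and moving average described above can be sketched as follows. This is an illustrative Python version, not the frontend's actual JavaScript; the class name RequestTimer and all threshold values are assumptions chosen for the example.

```python
from collections import deque


class RequestTimer:
    """Tracks recent request durations and decides when to notify the user."""

    def __init__(self, max_size=5, single_threshold=3.0, average_threshold=2.0):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.durations = deque(maxlen=max_size)
        self.single_threshold = single_threshold
        self.average_threshold = average_threshold

    def record(self, duration: float) -> bool:
        """Store a duration (in seconds) and return True if a notification is due."""
        self.durations.append(duration)
        average = sum(self.durations) / len(self.durations)
        return duration > self.single_threshold or average > self.average_threshold


timer = RequestTimer(max_size=3, single_threshold=5.0, average_threshold=2.0)
for seconds in (1.0, 1.0, 4.5, 0.5):
    print(seconds, timer.record(seconds))
```

Because old durations fall out of the queue, a burst of slow requests stops triggering the notification once enough fast requests have followed it.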

Code

1. One issue observed here was that a variable, apyRequestStartTime, stored the timestamp at which an AJAX request was made through the callApy() method, but was not cleared after the request completed. Thus, when an AJAX request that was not made through callApy() completed, it used the start timestamp of the previous request, and the difference between its completion time and that stale timestamp almost always exceeded the threshold. This erroneously displayed the notification.
The following patch resolved the above issue.

Code
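The idea behind the fix for the stale timestamp can be sketched in Python as below. This is not the actual patch: the RequestClock class is hypothetical and only mirrors the role of apyRequestStartTime, showing that clearing the start time after each request prevents untracked requests from reusing it.

```python
class RequestClock:
    """Holds the start time of the request currently being tracked, if any."""

    def __init__(self):
        self.start_time = None

    def start(self, now: float) -> None:
        """Called when a tracked request begins."""
        self.start_time = now

    def finish(self, now: float):
        """Return the elapsed time, or None if no tracked request is pending."""
        if self.start_time is None:
            return None
        elapsed = now - self.start_time
        # Clear the timestamp so a later, untracked request cannot reuse it.
        self.start_time = None
        return elapsed


clock = RequestClock()
clock.start(10.0)
print(clock.finish(10.5))  # tracked request: elapsed time
print(clock.finish(60.0))  # untracked request: nothing to measure
```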


POST vs. GET

1. Initially, the AJAX requests used the GET method to retrieve data from the backend.
2. GET was used along with JSONP to allow cross-domain requests. However, this produced a 414 (Request-URI Too Large) error when the input was large, resulting in failed requests.
3. This issue was resolved by using the POST method when the request size exceeds a threshold, and the GET method otherwise.
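The selection logic in the last step can be sketched as follows. This is an illustrative Python version of logic that lives in the JavaScript frontend; the function name choose_method and the 2000-character threshold are assumptions, not the values used in the project.

```python
from urllib.parse import urlencode

# Hypothetical limit on the full request URL length; the real threshold may differ.
MAX_GET_LENGTH = 2000


def choose_method(base_url: str, params: dict, threshold: int = MAX_GET_LENGTH):
    """Return (method, url, body): GET with a query string, or POST with a body."""
    query = urlencode(params)
    full_url = f"{base_url}?{query}"
    if len(full_url) > threshold:
        # Too long for a URL: send the encoded parameters in the request body.
        return ("POST", base_url, query)
    return ("GET", full_url, None)


short = choose_method("http://localhost:2737/translate", {"langpair": "eng|spa", "q": "hello"})
long_ = choose_method("http://localhost:2737/translate", {"langpair": "eng|spa", "q": "a" * 5000})
print(short[0], long_[0])
```

Short inputs keep the simpler GET path, while large inputs avoid the URL length limit by moving the parameters into the request body.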

Code


Language Dropdown going offscreen Issue

1. The source and destination language dropdowns used to go off-screen when the browser window was resized. This prevented users from choosing the language of their choice.
2. This issue was fixed by dynamically determining the available space in the browser window (triggered on resize) and adjusting the number of columns so that the languages fit inside the viewport.
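The column adjustment can be illustrated with a small calculation. The sketch below is Python rather than the frontend's JavaScript, and the function name, the fixed column width and the maximum column count are all assumptions made for the example.

```python
def columns_that_fit(available_width: int, column_width: int,
                     language_count: int, max_columns: int = 4) -> int:
    """Return how many dropdown columns fit in the available width.

    At least one column is always shown, and never more columns than
    there are languages or than the assumed maximum.
    """
    fit = max(1, available_width // column_width)
    return min(fit, max_columns, max(1, language_count))


# Recomputed on every resize event in a real interface.
for width in (1000, 500, 100):
    print(width, columns_that_fit(width, column_width=200, language_count=40))
```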

Code


LTR/RTL alignment of languages in dropdown

1. In spite of setting a left-to-right or right-to-left orientation for the language display names, the browser did not render them as expected.
2. A patch was created that applies the necessary styling to the display names, along with the styling of other associated UI elements, to achieve the correct rendering.

Code


Interface breaks when cookies are disabled issue

1. The Apertium interface used to break when cookies were disabled.
2. This was because the interface interacts with the browser's localStorage, and when cookies are disabled the browser prohibits this interaction. This case was not handled in the code.
3. The issue was resolved by handling the exception that occurs when cookies are disabled.
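The pattern of the fix, catching the storage exception and falling back to a default, can be sketched in Python as below. Everything here is a stand-in: StorageUnavailable plays the role of the exception the browser raises, BlockedStorage and MemoryStorage imitate localStorage with cookies disabled and enabled, and the key names are hypothetical.

```python
class StorageUnavailable(Exception):
    """Stand-in for the error raised when storage access is blocked."""


class BlockedStorage:
    """Imitates storage access with cookies disabled: every read fails."""

    def get_item(self, key):
        raise StorageUnavailable("storage access is disabled")


class MemoryStorage:
    """Imitates working storage backed by a dictionary."""

    def __init__(self, items=None):
        self._items = dict(items or {})

    def get_item(self, key):
        return self._items.get(key)


def safe_get_item(storage, key, default=None):
    """Read a value, returning the default if storage is blocked or the key is unset."""
    try:
        value = storage.get_item(key)
    except StorageUnavailable:
        return default
    return default if value is None else value


print(safe_get_item(BlockedStorage(), "recentSrcLangs", []))
print(safe_get_item(MemoryStorage({"locale": "eng"}), "locale"))
```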

Code


Improve detectLanguage() functionality

1. The detectLanguage() method did not call the autoDstSelectLang() method to automatically select a destination language after the language of a given text was identified.

Code


Prevent the requests when input is empty

1. The backend handlers returned an internal server error when requests were made with empty inputs or when any of the required arguments were missing.
2. Validation was added for several features, including the Analyzer, Generator, Detect Language and APy Sandbox.
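This kind of validation can be sketched as a check that runs before any processing. The Python function below is a framework-independent illustration, not the code in the patch; the function name validate_request and the choice of a 400 status for invalid input are assumptions.

```python
def validate_request(args: dict, required: list):
    """Return (status, message) for a request's arguments.

    A parameter counts as missing if it is absent, None, or only whitespace.
    The 400 status code is an assumed choice for reporting invalid input.
    """
    missing = [name for name in required if not (args.get(name) or "").strip()]
    if missing:
        return 400, "Missing or empty parameters: " + ", ".join(missing)
    return 200, "OK"


print(validate_request({"lang": "hin", "q": ""}, ["lang", "q"]))
print(validate_request({"lang": "hin", "q": "माय"}, ["lang", "q"]))
```

Rejecting the request up front gives the caller a clear error instead of an unhandled failure deeper in the handler.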

Code


Improvement of Functionalities

1. The swap button did not swap the source and destination languages on smaller screens.
2. The translate button did not call the translate() method on smaller screens.
3. The Detect Language button was active on the docTranslation interface, even though the detection it performed was for the input text on the translateText interface.
4. The appropriate translate() method is now called based on the interface from which it is invoked.
5. Container animation issues were fixed. When the interface was switched between containers rapidly, the animation used to break and render a blank screen.
6. The language selectors used to overlap with the swap button for a certain set of recent source languages.
7. A button was added that takes the user to the top of the webpage.
8. Occurrences of "APY" were restyled as "APy".
9. The Translate, Analyze and Request buttons were aligned with their respective textareas on the interface.
10. The translate() method is now executed as soon as any of the source or destination languages is changed, so that it also runs on the docTranslation interface.
The above issues were resolved through the following patches:


Code


Miscellaneous Issues

1. Mark unknown checkbox to be sent with docTranslation interface.
2. Textarea sizes getting restored on page resize.
Pull requests have been created to solve the above issues.


Code


Important Links

1. Apertium Wiki

2. Apertium Web Interface

3. Apertium html-tools GitHub

4. Apertium APy GitHub

5. Apertium html-tools forked repo GitHub

6. Apertium APy forked repo GitHub

7. Commits to master (pull requests that got merged):
Frontend:

Backend:

8. Issues opened by me:
Frontend:

Backend:

9. Pull requests by me:
Frontend:

Backend: