Difference between revisions of "Web Dictionary Maintenance"
(90 intermediate revisions by 3 users not shown) | |||
Line 5: | Line 5: | ||
= Introduction = |
= Introduction = |
||
Apertium over the years still has deficiencies so that lay people can contribute to increase the base of words in dictionaries, even with simple contributions. Collaboration is the key to developing a tool that serves a huge range of users throughout the world. We believe that it is possible to engage more users and thus give greater impact to the tool within the communities where there is involvement. We will deal in this document with requirements on non-functional and technical objectives regarding the tool proposal. |
Apertium over the years still has deficiencies so that lay people can contribute to increase the base of words in dictionaries, even with simple contributions. Collaboration is the key to developing a tool that serves a huge range of users throughout the world. We believe that it is possible to engage more users and thus give greater impact to the tool within the communities where there is involvement. We will deal in this document with requirements on non-functional and technical objectives regarding the tool proposal. |
||
<sub>O Apertium com o passar dos anos ainda apresenta deficiências para que pessoas leigas possam contribuir para o aumento da base de palavras em dicionários, mesmo que com contribuições simples. A colaboração é a chave para o desenvolvimento de uma ferramenta que serve a uma gama enorme de usuários ao longo do mundo. Acreditamos que é possível engajar mais usuários e com isso dar maior impacto da ferramenta dentro das comunidades em que existe envolvimento. Trataremos neste documento sobre requisitos sobre objetivos não funcionais e técnicos sobre a proposta de ferramenta. |
|||
</sub></span> |
|||
;Original Ideias |
;Original Ideias |
||
Line 20: | Line 18: | ||
: '''Unofficial Mentor:''' Alessio Miranda Junior<br /> |
: '''Unofficial Mentor:''' Alessio Miranda Junior<br /> |
||
: '''Telegram/Whatsapp:''' Alessio: +55 (31) 9.8888-7770<br /> |
: '''Telegram/Whatsapp:''' Alessio: +55 (31) 9.8888-7770<br /> |
||
: '''E-mail:''' alessio@cefetmg.br or viniciussnogueira13@gmail.com<br /> |
: '''E-mail:''' alessio@cefetmg.br or vinicius_snogueira@hotmail.com / viniciussnogueira13@gmail.com<br /> |
||
: '''IRC:''' AlessioJr<br /> |
: '''IRC:''' viniciussn or AlessioJr<br /> |
||
: '''GTalk:''' alessiojunin@gmail.com |
: '''GTalk:''' alessiojunin@gmail.com |
||
=Description= |
=Description= |
||
==Abstract:== |
==Abstract:== |
||
The dictionaries complexity and size makes its modification extremely hard and time-consuming, and, along with the need for programming knowledge, keeps away potential contributors. A tool that facilitates the work of dictionaries developers, eliminate computer knowledge (XML, Git, etc) and that allows collaboration between members of the community is of extreme necessity because, with the removal of these barriers, a larger group of people will have the necessary requirements to participate in the development of dictionaries, thus increasing the translation capacity of Apertium. |
|||
Even with several tools that help the way you manage and create dictionaries, Apertium is far from the normal users, who have a huge contribution potential. Or even create a simple way to create contributions to recurring users with advanced knowledge. |
Even with several tools that help the way you manage and create dictionaries, Apertium is far from the normal users, who have a huge contribution potential. Or even create a simple way to create contributions to recurring users with advanced knowledge. Although it is a bold plan, and there are several possible and desirable requirements, we will describe some basic premises for this step and that must be respected and aligned with the apertium developer community. |
||
Although it is a bold plan, and there are several possible and desirable requirements, we will describe some basic premises for this step and that must be respected and aligned with the apertium developer community. |
|||
<span style="color:red"><sub> |
|||
Mesmo com várias ferramentas que auxiliam a forma de gerir e criar dicionários ainda o apertium se mostra distante dos usuários normais que tem um enorme potencial de contribuição. Ou mesmo criar uma forma simples de criar contribuições para usuários recorrentes com conhecimento avançado. |
|||
Apesar de ser um plano ousado, e existirem vários requisitos possíveis e desejáveis, vamos descrever algumas premisas básicas para este etapa e que devem ser respeitadas e devem estar alinhadas com a comunidade de desenvolvedores apertium.</sub></span> |
|||
==Definitions== |
==Definitions== |
||
:* Lay users do not need to know Apertium's internal structure |
:* Lay users do not need to know Apertium's internal structure. |
||
:* Intermediate users need to have a knowledge of dictionary management methodology |
:* Intermediate users need to have a knowledge of dictionary management methodology. |
||
:* Advanced users are those who know the structure of Apertium. |
:* Advanced users are those who know the structure of Apertium. |
||
:* Communities are user groups that merge characteristics to the database extension of a dictionary of a specific language. |
:* Communities are user groups that merge characteristics to the database extension of a dictionary of a specific language. |
||
:* Official dictionaries are dictionaries managed and with seal of the apertium community. |
:* Official dictionaries are dictionaries managed and with seal of the apertium community. |
||
:* Unofficial dictionaries are dictionaries run by an independent community |
:* Unofficial dictionaries are dictionaries run by an independent community. |
||
:* Test dictionaries are deprived of users to test their contributions |
:* Test dictionaries are deprived of users to test their contributions. |
||
==Objectives:== |
==Objectives:== |
||
:* The major goal of the project is to develop a web tool to facilitate the management of the Apertium (XML's) database of integrators for novice and advanced users. |
:* The major goal of the project is to develop a web tool to facilitate the management of the Apertium (XML's) database of integrators for novice and advanced users. |
||
:* New communities of contributions should be able to organize independently of a central command. The command exists but will be distributed. |
:* New communities of contributions should be able to organize independently of a central command. The command exists but will be distributed. |
||
:* Lay users who feel motivated to contribute, should have an interface that promotes ease in contributing even with limited possibilities |
:* Lay users who feel motivated to contribute, should have an interface that promotes ease in contributing even with limited possibilities. |
||
:* The development of dictionaries should be collaborative and distributed. Each dictionary should have a management community, but a user may have the freedom to disagree and create a new version maintaining a history of the author. |
|||
</sub></span> |
|||
:* Any tool should maintain the traditional Apertium structure and not be interoperable between existing tools. |
|||
:* The development of dictionaries should be collaborative and distributed. Each dictionary should have a management community, but a user may have the freedom to disagree and create a new version maintaining a history of the author. <span style="color:red"><sub>O desenvolvimento de dicionários deve ser colaborativo e distribuído. Cada dicionário deve ter uma comunidade que faz a gestão, mas um usuário pode ter a liberdade de discordar e criar uma nova versão mantendo um histórico do autor.</sub></span> |
|||
:* There must be a methodology for communities to maintain the quality of their dictionaries. Although they are distributed the goal is to create a unified and mature dictionary. |
|||
:* Any tool should maintain the traditional apertium structure and not be interoperable between existing tools. <span style="color:red"><sub>Qualquer ferramenta deve manter a estrutura tradicional do apertium e não ser interoperãvel entre as ferramentas já existentes.</sub></span> |
|||
:* There must be a methodology for communities to maintain the quality of their dictionaries. Although they are distributed the goal is to create a unified and mature dictionary. <span style="color:red"><sub>Deve haver uma metodologia para que as comunidades possam manter a qualidade de seus dicionários. Embora eles sejam distribuídos o objetivo é criar um dicionário unificado e maduro.</sub></span> |
|||
==Questions:== |
|||
:* -- <span style="color:red"><sub>--</sub></span> |
|||
:* Dixtools is available? Problems? |
|||
:* Any Problem to use Gitlab (OpenSource) Like projetos.a2portais.com.br or gitlab |
|||
:* Gitlab api? free version? |
|||
=Technical Objectives:= |
=Technical Objectives:= |
||
:* Develop, initially, monolingual dictionaries but keeping the particular format of each file. |
:* Develop, initially, monolingual dictionaries but keeping the particular format of each file. |
||
:* Minimize the direct manipulation of XML files, providing features that reduce this need. |
:* Minimize the direct manipulation of XML files, providing features that reduce this need. |
||
:* Making use of |
:* Making use of GitLab as an administrative and control tool. |
||
== Git, GitLab and Apertium == |
|||
:* <span style="color:red"><sub>Avoid direct manipulation of XML files, making the work of dictionary developers easier and consequently increasing their productivity;</sub></span> |
|||
[[File:Git_flow_demonstration.jpg|thumb|250px|right|Git Flow demonstration]] |
|||
:* <span style="color:red"><sub>Allow users with no programming background to contribute to the development and maintenance of the dictionaries;</sub></span> |
|||
As the Apertium files are textual, we had the idea of seeking productivity and using existing and consolidated tools in the world. |
|||
:* <span style="color:red"><sub>Facilitate collaboration between community members in the development of the dictionaries.</sub></span> |
|||
The proposal is to create a friendly web interface for lay users, but use git's Back-End as version control and do dictionary management in GitLab as it is an administrative tool that already guarantees incredible power for remote collaboration. |
|||
To demonstrate this relationship we will highlight some concepts of Git. Over time there may be several TimeLines: |
|||
:* --. |
|||
:* The Official Branch (Master) has the official version of the apertium managed by official developers or maintainers. |
|||
:* Users have the power to create new Branches and develop unofficial dictionaries in parallel, creating communities that may from time to time suggest modifications to Brach Master. |
|||
:* These requests can be accepted or rejected by maintainers, and this format is recursive and other developers can create unofficial copies of unofficial ones. |
|||
== Git, GitLab and Appertium == |
|||
:* Integration requests will use the Merge Request concepts present in tools such as GitLab that will be the BackEnd for advanced users. |
|||
We are planning to use Git as an administrative and version control tool. |
|||
:* All dictionaries releases will be in a single repository and available to any member of the community to continue the work. |
|||
:* Each user will have your own branch to work on, which also will be available to the entire community to contribute to it. |
|||
:* The users will be able to make merge requests with other branches (official or unofficial). |
|||
:* At first, we will use GitLab to manage this merges requests. The team who maintain each dictionary will judge if the modifications are valids or not. |
|||
=Application-GSOC2019= |
=Application-GSOC2019= |
||
===Abstract:=== |
===Abstract:=== |
||
The idea is to develop the Web tool so that lay users contribute to dictionaries and use Git management with GitLab to manage these changes. |
|||
* The focus will be on monolingual dictionaries and bilingual direct relationships. |
|||
• Create a alternative form to edit dix files with GUI resources. |
|||
* Use the Git methodology and propose a guideline or flow for collaborative dictionary management. |
|||
• Develop, initially, monolingual dictionaries but keeping the particular format of each file. |
|||
* Do not change the current XML format of dictionaries. |
|||
• Minimize the direct manipulation of XML files, providing features that reduce this need. |
|||
• Making use of DixTools to keep code reuse. |
|||
===Why is it you are interested in machine translation?=== |
|||
My . |
|||
===Why |
===Why is it you are interested in machine translation and Apertium?=== |
||
During the graduation course, Professor Aléssio demonstrated the need and opportunity to contribute with open source projects. |
|||
I’ve . |
|||
He introduced me to Apertium and in my final graduation work I started reading and talking about it. |
|||
We started discussing techniques for managing collaboration with the community and currently do not have |
|||
as much knowledge about machine translation machines but I have experience in developing applications |
|||
and web using collaboration with Git and GitLab. |
|||
===Which of the published tasks are you interested in?=== |
===Which of the published tasks are you interested in?=== |
||
[http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Easy_dictionary_maintenance Easy dictionary maintenance] |
|||
Easy. |
|||
===Why should Google and Apertium sponsor it?=== |
===Why should Google and Apertium sponsor it?=== |
||
This project will have an impact in all languages. Facilitating dictionary manipulation will benefit everyone, |
|||
I . |
|||
allowing a wider range of people to contribute to existing dictionaries and start new language projects. |
|||
===How and who will it benefit in society?=== |
===How and who will it benefit in society?=== |
||
This application will benefit the entire Apertium community. |
|||
I . |
|||
Dictionary developers will have a tool to speed up their work and an easy-to-use tool will attract new contributors. |
|||
As a result, Apertium's translation capacity will be increased, bringing better results to the end user. |
|||
===What do you plan to do?=== |
===What do you plan to do?=== |
||
We are planning to create a web GUI that will facilitate common tasks of creating/modifying dictionaries and allow lay users to contribute to a dictionary. |
|||
The application will be connected to a Git repository, which will be used as an administrative and version control tool. |
|||
We. |
|||
'''Development Paradigm:''' MVC Paradigm<br /> |
|||
'''Program Language:''' Java |
'''Program Language:''' Java, JavaScript<br /> |
||
'''Persistence:''' XML (Apertium XML Files)<br /> |
'''Persistence:''' XML (Apertium XML Files)<br /> |
||
'''Framworks:''' |
'''Framworks:''' Angular<br /> |
||
===Stages/Milestones=== |
===Stages/Milestones=== |
||
Line 116: | Line 112: | ||
! Stage |
! Stage |
||
! Description |
! Description |
||
|- |
|||
| Community Bonding Period |
|||
| Technology analysis |
|||
|Analyze and find a good way to manipulate XML in a database. |
|||
Testing and choosing the best alternative. |
|||
UML diagrams of the project. |
|||
|- |
|- |
||
| 1, 2 |
| 1, 2 |
||
| -- |
| -- |
||
|Database modeling and creation. |
|||
| --. |
|||
Implementation of GitConnector. |
|||
|- |
|- |
||
| 2, 3 |
| 2, 3 |
||
| -- |
| -- |
||
| WebAPI development. |
|||
| --. |
|||
| |
|||
GUI prototype developement for monolingual dictionary: Symbols, Paradigms and Lemmas. |
|||
|- |
|||
| 3, 4 |
|||
| -- |
|||
| GUI prototype developement for bilingual dictionary. |
|||
|- |
|- |
||
| '''Prototype''' |
| '''Prototype''' |
||
| Milestone 1 |
| Milestone 1 |
||
| First version of GUI for community analysis. |
|||
| -- |
|||
|- |
|- |
||
| 5, 6 |
| 5, 6 |
||
| -- |
| -- |
||
| Development of WebDictionaryController (monolingual and bilingual dictionary). |
|||
| --. |
|||
|- |
|- |
||
| 7 |
| 7, 8 |
||
| -- |
| -- |
||
| Adjustments and adding new features with community feedback. |
|||
| --. |
|||
|- |
|- |
||
| |
| 9 |
||
| -- |
| -- |
||
| Functionalities integrations, test and fixes. |
|||
| --. |
|||
|- |
|- |
||
| '''Prototype''' |
| '''Prototype''' |
||
| Milestone 2 |
| Milestone 2 |
||
| First functional release for community analysis. |
|||
| -- |
|||
|- |
|- |
||
| |
| 10, 11 |
||
| -- |
| -- |
||
| Adjustments and adding new features with community feedback. |
|||
| --. |
|||
|- |
|- |
||
| |
| 12 |
||
| -- |
|||
| --. |
|||
|- |
|||
| 11 |
|||
| Pré-Release |
| Pré-Release |
||
| Fix remain bugs, final adjustments and documentation in Wiki |
|||
| -- |
|||
|- |
|||
| '''Prototype''' |
|||
| -- |
|||
| |
|||
|- |
|||
| 12 |
|||
| Makeup |
|||
| -- |
|||
|- |
|- |
||
| |
| |
||
| '''Final Release''' |
| '''Final Release''' |
||
|Final version realease. |
|||
| |
|||
|} |
|} |
||
===Presentation=== |
===Presentation=== |
||
My name is Vinicius, a Brazilian Student at . |
My name is Vinicius, a Brazilian Student at Federal Center for Technological Education of Minas Gerais - Brazil. |
||
===Resume of Skills=== |
===Resume of Skills=== |
||
Line 189: | Line 188: | ||
Professional Skills: |
Professional Skills: |
||
* Language developer: Java, Phyton, |
* Language developer: C, Java, JavaScript, PHP, Phyton, PL/SQL. |
||
Projects for Summer: |
Projects for Summer: |
||
Line 196: | Line 195: | ||
* Planning to work in GSOC |
* Planning to work in GSOC |
||
=Project= |
=Project= |
||
==Why?== |
==Why?== |
||
:* The dictionaries complexity and size makes its modification extremely hard and time-consuming, and, along with the need for programming knowledge, keeps potential contributors away. A tool that facilitates the work of dictionaries developers, eliminate the need of computer knowledge (XML, Git, etc) and that allows collaboration between members of the community is of extreme necessity because, with the removal of these barriers, a larger group of people will have the necessary requirements to participate in the development of dictionaries, thus increasing the translation capacity of Apertium. |
|||
:* --. |
|||
:* --. |
|||
:* --. |
|||
:* --. |
|||
==How can use?== |
|||
:* --. |
|||
:* --. |
|||
:* --. |
|||
:* --. |
|||
==What its the plan?== |
==What its the plan?== |
||
:* We are planning to create a web GUI that allows the user to develop the basic tasks of dictionaries and translation pairs manipulation in an easy and practical way. |
|||
* -- |
|||
:* We will use Git as an administrative and control tool, allowing the user to contribute to the work of other community members or start your own project. |
|||
==Components ans Technologies== |
|||
* -- |
|||
[[File:ComponentDiagramWebDictionary.jpg|thumb|700px|right|Components Diagram]] |
|||
* Actors |
|||
** Lay User |
|||
** Dictionary Admin |
|||
* UserBrowser |
|||
* -- |
|||
** WebInterface with AngularJS - GUI to easily manipulate the dictionary XML file. |
|||
* Web Dicionary Maintence (APP) |
|||
* -- |
|||
** WebAPI - Interface between user's GUI and server. See: [LINK] |
|||
*** |
|||
** Temporary Database - Temporary database in which the XML file will be loaded with the crosses made. |
|||
** JavaGit Connector - API to execute necessary Git functions. |
|||
** WebDictionary Controller |
|||
* |
* Git |
||
** GitRepository - Repository containing the dictionaries |
|||
** GitLab - Repository management tool |
|||
** GitHub - Repository management tool |
|||
[[File:SequenceDiagramWebDictionary.jpg|thumb|700px|right|Components Diagram]] |
|||
==How it Works?== |
==How it Works?== |
||
We will develop a WebInterface to lay users add words or suggest modifications. |
|||
* -- |
|||
We are planning to use Git as an administrative and version control tool. |
|||
* All dictionaries releases will be in a single repository and available to any member of the community to continue the work. |
|||
* -- |
|||
* Each user will have your own branch to work on. |
|||
* The users will be able to make merge requests with the official branch. |
|||
* At first, we will use GitLab (or similar tool) to manage this merges requests. The team who maintain each dictionary will judge if the modifications are valid or not. |
|||
=Example of Interaction - Adding new lemma to a dictionary= |
|||
* -- |
|||
<br /> |
|||
<br /> |
|||
<br /> |
|||
<br /> |
|||
{| class="wikitable" border="1" |
|||
|- |
|||
! Stage |
|||
! Description |
|||
! Image |
|||
|- |
|||
| 1 |
|||
| ''' Choosing an official dictionary and importing it ''' |
|||
The Official Dictionaries page display all available dictionaries in the git repository. |
|||
Each version correspond to a tag in the master branch. |
|||
* -- |
|||
Once imported, the user will name this dictionary and it will appear on the My Dictionaries page. |
|||
* -- |
|||
| [[File:Prototype1.png|thumb|600px|center|Choosing an official dictionary and importing it]] |
|||
==Components ans Tecnologies== |
|||
[[File: |
[[File:Git1.png|thumb|600px|center|Master branch]] |
||
[[File:Prototype2.png|thumb|600px|center|My Dictionaries page]] |
|||
* -- |
|||
|- |
|||
| 2 |
|||
| ''' Loading the dictionary file ''' , |
|||
On the Language File page, the user selects one of their dictionaries and the dictionary XML will be formatted to be temporarily saved to a database. Also on this page, the user can edit the dictionary alphabet and see some statistics about it. |
|||
The Symbols, Paradigms, and Lemmas pages refer to the dictionary selected here. |
|||
| [[File:Prototype3.png|thumb|600px|center|Loading the dictionary file]] |
|||
|- |
|||
| 3 |
|||
| '''Verifying if the lemmas already exists''' |
|||
On the Lemmas - Search page, the user will be able to see all the inflections of the dictionary lemmas and check if the lemma he wants to add already exists. |
|||
| [[File:Prototype4.png|thumb|600px|center|Verifying if the lemmas already exists]] |
|||
|- |
|||
| 4.1 |
|||
| ''' Creating the new lemma - Step 1 ''' |
|||
In the Lemmas - Create page, the user will follow three steps to add a new lemma. |
|||
In the first step, the user will define the lemma and the lemma root. In addition, he will give some examples of inflections and symbols to receive suggestions on which paradigm to use. |
|||
* -- |
|||
| [[File:Prototype5.png|thumb|600px|center|text-top|Creating the new lemma - Step 1]] |
|||
|- |
|||
| 4.2 |
|||
| ''' Creating the new lemma - Step 2 ''' |
|||
In the second step, the user will see paradigm suggestions (or can see all paradigms). When a paradigm is selected, all inflections of the created lemma will be shown to the user. If none of the paradigms fit, the user will be directed to create a new paradigm. |
|||
* -- |
|||
| [[File:Prototype6.png|thumb|600px|center|Creating the new lemma - Step 2]] |
|||
|- |
|||
| 4.3 |
|||
| ''' Creating the new lemma - Step 3 ''' |
|||
In the third and last step, the user will confirm and save the change. At this point, a branch will be create for this dictionary in the git repository (if she doesn’t already exists) and a commit with this changes will be done. |
|||
* -- |
|||
| [[File:Prototype7.png|thumb|600px|right|Creating the new lemma - Step 3]] |
|||
[[File:Git2.png|thumb|600px|center|New branch created in Git]] |
|||
|- |
|||
| 5 |
|||
| '''Merge Request''' |
|||
On My Dictionaries page, the user can create a merge request that will be judge by the master branch maintainer. |
|||
* -- |
|||
| [[File:Prototype8.png|thumb|600px|center|Merge Request]] |
|||
==Example of Iteraction== |
|||
|- |
|||
[[File:SequenceDiagramWebDictionary.jpg|thumb|500px|right|Example Sequence Diagram]] |
|||
| 6 |
|||
| '''Official dictionary keeper judges the merge request''' |
|||
In GitLab (or similar tool), the dictionary maintainer will see the list of merge requests. |
|||
At each request, the maintainer can view all commits and accepts only a few of them. |
|||
* -- |
|||
At each commit, he will judge the changes. |
|||
* -- |
|||
When he accepts the merge request, the user's branch is deleted. |
|||
* -- |
|||
|[[File:Git5.png|thumb|600px|center|Merge request list]] |
|||
[[File:Git6.png|thumb|600px|center|Commits from current merge request]] |
|||
[[File:Git7.png|thumb|600px|center|Changes]] |
|||
[[File:Git4.png|thumb|600px|center|GitLab Graph]] |
|||
|- |
|||
| 7 |
|||
| '''Feedback''' |
|||
On Merge Requests page, the user can see all request he have made and their status. |
|||
| [[File:Prototype9.png|thumb|600px|center|Feedback]] |
|||
|- |
|||
| |
|||
| '''Final Release''' |
|||
| |
|||
|} |
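Steps 4.1 and 4.2 of the interaction above (the user gives example inflections and receives paradigm suggestions) could be approached as in the following minimal sketch. It assumes a paradigm can be reduced to a list of suffixes appended to the lemma root. The class name, method names, and the sample paradigm table are hypothetical illustrations, not part of Apertium or DixTools, and real paradigms also carry symbols that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: paradigm names and suffix lists are illustrative only.
class ParadigmSuggestSketch {

    /** Returns the names of paradigms whose suffixes can produce every example form. */
    static List<String> suggest(String stem, List<String> examples,
                                Map<String, List<String>> paradigms) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> paradigm : paradigms.entrySet()) {
            boolean coversAll = true;
            for (String form : examples) {
                boolean produced = false;
                for (String suffix : paradigm.getValue()) {
                    if (form.equals(stem + suffix)) {
                        produced = true;
                        break;
                    }
                }
                if (!produced) {
                    coversAll = false;
                    break;
                }
            }
            if (coversAll) {
                matches.add(paradigm.getKey());
            }
        }
        return matches;
    }

    /** Lists every form the stem takes under one paradigm (the Step 2 preview). */
    static List<String> inflect(String stem, List<String> suffixes) {
        List<String> forms = new ArrayList<>();
        for (String suffix : suffixes) {
            forms.add(stem + suffix);
        }
        return forms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> paradigms = new LinkedHashMap<>();
        paradigms.put("house__n", Arrays.asList("", "s"));
        paradigms.put("bus__n", Arrays.asList("", "es"));

        // The user typed the root "cat" and the example forms "cat" and "cats".
        System.out.println(suggest("cat", Arrays.asList("cat", "cats"), paradigms)); // prints [house__n]
        System.out.println(inflect("cat", paradigms.get("house__n")));               // prints [cat, cats]
    }
}
```

If no paradigm covers all the examples, `suggest` returns an empty list, which corresponds to the case where the user is directed to create a new paradigm.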
|||
[[Category:GSoC 2019 student proposals]] |
|||
* -- |
Latest revision as of 21:41, 6 April 2019
Introduction
Even after years of development, Apertium still makes it hard for lay people to help grow the word base of its dictionaries, even with simple contributions. Collaboration is the key to developing a tool that serves a wide range of users around the world. We believe it is possible to engage more users and thereby increase the tool's impact in the communities that get involved. This document covers the non-functional requirements and technical objectives of the proposed tool.
- Original Ideas
- http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
- http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Easy_dictionary_maintenance
- Original GSOC2019 Application
- http://wiki.apertium.org/wiki/User:Alessiojr/Easy_dictionary_-_Application-GSOC2019
- Student Information
- Student: Vinícius Silva Nogueira
- Unofficial Mentor: Alessio Miranda Junior
- Telegram/Whatsapp: Alessio: +55 (31) 9.8888-7770
- E-mail: alessio@cefetmg.br or vinicius_snogueira@hotmail.com / viniciussnogueira13@gmail.com
- IRC: viniciussn or AlessioJr
- GTalk: alessiojunin@gmail.com
Description
Abstract:
The complexity and size of the dictionaries make modifying them extremely hard and time-consuming and, together with the need for programming knowledge, this keeps potential contributors away. A tool that facilitates the work of dictionary developers, removes the need for technical knowledge (XML, Git, etc.), and allows collaboration between community members is badly needed: with these barriers removed, a larger group of people will be able to take part in dictionary development, thus increasing the translation capacity of Apertium.
Even with several tools that help manage and create dictionaries, Apertium remains out of reach for ordinary users, who have huge potential to contribute. The tool should also give recurring users with advanced knowledge a simple way to contribute. Although this is a bold plan with many possible and desirable requirements, we describe here some basic premises for this stage, which must be respected and aligned with the Apertium developer community.
Definitions
- Lay users do not need to know Apertium's internal structure.
- Intermediate users need to know the dictionary management methodology.
- Advanced users are those who know the structure of Apertium.
- Communities are groups of users who join efforts to extend the word base of a dictionary for a specific language.
- Official dictionaries are dictionaries managed and endorsed by the Apertium community.
- Unofficial dictionaries are dictionaries run by an independent community.
- Test dictionaries are private dictionaries in which users test their contributions.
Objectives:
- The main goal of the project is to develop a web tool that makes it easier for both novice and advanced users to manage Apertium's XML dictionary database.
- New communities of contributors should be able to organize independently of a central authority. Authority still exists, but it is distributed.
- Lay users who feel motivated to contribute should have an interface that makes contributing easy, even if it offers limited possibilities.
- Dictionary development should be collaborative and distributed. Each dictionary should have a community that manages it, but a user is free to disagree and create a new version that preserves the authorship history.
- Any tool should keep the traditional Apertium structure and must not break interoperability with existing tools.
- There must be a methodology that lets communities maintain the quality of their dictionaries. Although the dictionaries are distributed, the goal is to converge on a unified and mature dictionary.
Questions:
- Is DixTools available? Are there known problems with it?
- Is there any problem with using GitLab (open source), such as projetos.a2portais.com.br or gitlab?
- GitLab API? Is it available in the free version?
Technical Objectives:
- Initially, support monolingual dictionaries, keeping the particular format of each file.
- Minimize direct manipulation of XML files by providing features that reduce the need for it.
- Use GitLab as an administrative and control tool.
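As an illustration of the second objective, the sketch below shows one way a back end could turn three form fields (lemma, root, paradigm name) into a monolingual dictionary entry, so that the contributor never edits XML by hand. It uses only the JDK's DOM and transformer APIs. The class and method names are hypothetical and are not part of Apertium or DixTools. The `<e lm="..."><i>...</i><par n="..."/></e>` layout follows the usual shape of entries in Apertium `.dix` files, and the values in `main` are sample data.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: not part of Apertium or DixTools.
class DixEntrySketch {

    /** Builds one dictionary entry element and returns it serialized as XML text. */
    static String buildEntry(String lemma, String stem, String paradigm) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // <e lm="lemma">
            Element entry = doc.createElement("e");
            entry.setAttribute("lm", lemma);

            // <i>stem</i>: the part of the word shared by all inflections
            Element identity = doc.createElement("i");
            identity.setTextContent(stem);
            entry.appendChild(identity);

            // <par n="paradigm"/>: reference to the inflection paradigm
            Element par = doc.createElement("par");
            par.setAttribute("n", paradigm);
            entry.appendChild(par);

            doc.appendChild(entry);

            // Serialize without the <?xml ...?> header so the fragment can be
            // placed inside an existing dictionary section.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not build dictionary entry", ex);
        }
    }

    public static void main(String[] args) {
        // Sample values only; a real tool would read them from the web form.
        System.out.println(buildEntry("cat", "cat", "house__n"));
    }
}
```

Building the element through the DOM rather than by string concatenation means special characters typed by the user (for example `&`) are escaped by the serializer, which keeps the dictionary file well-formed.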
Git, GitLab and Apertium
Because Apertium files are plain text, we want to gain productivity by relying on existing, well-established tools. The proposal is to create a friendly web interface for lay users while using Git as the back end for version control and managing dictionaries in GitLab, an administrative tool that already provides strong support for remote collaboration.
To illustrate this relationship, we highlight some Git concepts. Over time there may be several timelines:
- The official branch (master) holds the official version of Apertium, managed by official developers or maintainers.
- Users can create new branches and develop unofficial dictionaries in parallel, forming communities that may from time to time suggest modifications to the master branch.
- Maintainers can accept or reject these requests. The scheme is recursive: other developers can create unofficial copies of unofficial dictionaries.
- Integration requests will use the merge request concept found in tools such as GitLab, which will be the back end for advanced users.
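The branching rules above can be summarized in a small in-memory model. This is only a sketch of the workflow under stated assumptions: it does not call Git or GitLab, and every name in it (`MergeFlowSketch`, `fork`, `requestMerge`, `judge`, the branch names) is hypothetical. It shows the three ideas from the list: an official master branch, recursive unofficial copies, and merge requests that a maintainer accepts or rejects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; a real tool would delegate to Git and GitLab.
class MergeFlowSketch {

    enum Status { OPEN, ACCEPTED, REJECTED }

    /** A request to integrate one branch into another. */
    static final class MergeRequest {
        final String sourceBranch;
        final String targetBranch;
        Status status = Status.OPEN;

        MergeRequest(String sourceBranch, String targetBranch) {
            this.sourceBranch = sourceBranch;
            this.targetBranch = targetBranch;
        }
    }

    // Maps each branch to the branch it was copied from (master has no parent).
    private final Map<String, String> parents = new LinkedHashMap<>();
    private final List<MergeRequest> requests = new ArrayList<>();

    MergeFlowSketch() {
        parents.put("master", null); // the official branch
    }

    /** Creates an unofficial copy of any existing branch, official or not. */
    void fork(String parent, String name) {
        if (!parents.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown branch: " + parent);
        }
        if (parents.containsKey(name)) {
            throw new IllegalArgumentException("Branch already exists: " + name);
        }
        parents.put(name, parent);
    }

    String parentOf(String branch) {
        return parents.get(branch);
    }

    /** Opens a merge request between two existing branches. */
    MergeRequest requestMerge(String source, String target) {
        if (!parents.containsKey(source) || !parents.containsKey(target)) {
            throw new IllegalArgumentException("Both branches must exist");
        }
        MergeRequest request = new MergeRequest(source, target);
        requests.add(request);
        return request;
    }

    /** The maintainer of the target branch accepts or rejects an open request. */
    void judge(MergeRequest request, boolean accept) {
        if (request.status != Status.OPEN) {
            throw new IllegalStateException("Request was already judged");
        }
        request.status = accept ? Status.ACCEPTED : Status.REJECTED;
    }

    public static void main(String[] args) {
        MergeFlowSketch flow = new MergeFlowSketch();
        flow.fork("master", "community-pt");             // unofficial dictionary
        flow.fork("community-pt", "community-pt-slang"); // copy of an unofficial one

        MergeRequest request = flow.requestMerge("community-pt-slang", "community-pt");
        flow.judge(request, true);
        System.out.println(request.sourceBranch + " -> " + request.targetBranch
                + ": " + request.status);
    }
}
```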
Application-GSOC2019
Abstract:
The idea is to develop a web tool that lets lay users contribute to dictionaries, using Git and GitLab to manage these changes.
- The focus will be on monolingual dictionaries and direct bilingual relationships.
- Use the Git methodology and propose a guideline or flow for collaborative dictionary management.
- Do not change the current XML format of dictionaries.
Why is it you are interested in machine translation and Apertium?
During my undergraduate course, Professor Aléssio showed me the need and the opportunity to contribute to open source projects. He introduced me to Apertium, and in my final undergraduate project I started reading and talking about it. We started discussing techniques for managing collaboration with the community. I currently do not have much knowledge about machine translation, but I have experience developing applications, including web applications, collaboratively with Git and GitLab.
Which of the published tasks are you interested in?
Easy dictionary maintenance
Why should Google and Apertium sponsor it?
This project will have an impact on all languages. Making dictionaries easier to manipulate will benefit everyone, allowing a wider range of people to contribute to existing dictionaries and start new language projects.
How and who will it benefit in society?
This application will benefit the entire Apertium community. Dictionary developers will have a tool that speeds up their work, and an easy-to-use tool will attract new contributors. As a result, Apertium's translation capacity will increase, bringing better results to the end user.
What do you plan to do?
We plan to create a web GUI that makes the common tasks of creating and modifying dictionaries easier and allows lay users to contribute to a dictionary. The application will be connected to a Git repository, which will be used as an administrative and version control tool.
Programming Languages: Java, JavaScript
Persistence: XML (Apertium XML Files)
Frameworks: Angular
Stages/Milestones
Week | Stage | Description |
---|---|---|
Community Bonding Period | Technology analysis | Analyze and find a good way to manipulate XML in a database. Test and choose the best alternative. Draw UML diagrams of the project. |
1, 2 | -- | Database modeling and creation. Implementation of GitConnector. |
2, 3 | -- | WebAPI development. GUI prototype development for the monolingual dictionary: Symbols, Paradigms and Lemmas. |
3, 4 | -- | GUI prototype development for the bilingual dictionary. |
Prototype | Milestone 1 | First version of the GUI for community analysis. |
5, 6 | -- | Development of WebDictionaryController (monolingual and bilingual dictionaries). |
7, 8 | -- | Adjustments and new features based on community feedback. |
9 | -- | Feature integration, testing and fixes. |
Prototype | Milestone 2 | First functional release for community analysis. |
10, 11 | -- | Adjustments and new features based on community feedback. |
12 | Pre-Release | Fix remaining bugs, final adjustments and documentation in the Wiki. |
Final Release | Final version release. |
Presentation
My name is Vinicius. I am a Brazilian student at the Federal Center for Technological Education of Minas Gerais, Brazil.
Summary of Skills
Apertium Knowledge:
* Study on developing a collaborative tool for Apertium.
* Study of the current Apertium process for creating a language pair, from the user's point of view.
* Analysis of the characteristics of Aléssio's 2010 project and its adaptation to the present day.
Academic Skills:
* Undergraduate student in Computer Engineering, Federal Center for Technological Education of Minas Gerais, Brazil
Professional Skills:
* Programming languages: C, Java, JavaScript, PHP, Python, PL/SQL.
Projects for Summer:
* No professional activities planned.
* Planning to work on GSoC.
Project
Why?
- The complexity and size of the dictionaries make modifying them extremely hard and time-consuming and, together with the need for programming knowledge, keep potential contributors away. A tool that facilitates the work of dictionary developers, eliminates the need for technical knowledge (XML, Git, etc.) and allows collaboration between community members is sorely needed: with these barriers removed, a larger group of people will be able to take part in dictionary development, increasing the translation capacity of Apertium.
What is the plan?
- We are planning to create a web GUI that lets the user carry out the basic tasks of manipulating dictionaries and translation pairs in an easy and practical way.
- We will use Git as an administrative and version control tool, allowing users to contribute to the work of other community members or to start their own project.
Components and Technologies
- Actors
  - Lay User
  - Dictionary Admin
- UserBrowser
  - WebInterface with AngularJS - GUI to easily manipulate the dictionary XML file.
- Web Dictionary Maintenance (APP)
  - WebAPI - Interface between user's GUI and server. See: [LINK]
  - Temporary Database - Temporary database into which the XML file will be loaded, with its cross-references resolved.
  - JavaGit Connector - API to execute the necessary Git functions.
  - WebDictionary Controller
  - WebAPI - Interface between user's GUI and server. See: [LINK]
- Git
  - GitRepository - Repository containing the dictionaries
  - GitLab - Repository management tool
  - GitHub - Repository management tool
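Before a dictionary can be loaded into the Temporary Database, its XML has to be parsed. The following is only a minimal sketch of that first step, using the standard Java XML parser and the usual monolingual dictionary element names (`sdef`, `pardef`, `section`, `e`). The class `DixStats`, its method names and the tiny sample dictionary are assumptions made for illustration and are not part of the proposal; the counts it produces are the kind of figures a statistics view could show.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the proposal: parses a monolingual
// dictionary and counts its symbols, paradigms and lemmas.
class DixStats {
    // Tiny illustrative dictionary (made-up content, usual element names).
    static final String SAMPLE =
        "<dictionary>"
      + "<alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>"
      + "<sdefs><sdef n='n'/><sdef n='sg'/><sdef n='pl'/></sdefs>"
      + "<pardefs><pardef n='house__n'>"
      + "<e><p><l></l><r><s n='n'/><s n='sg'/></r></p></e>"
      + "<e><p><l>s</l><r><s n='n'/><s n='pl'/></r></p></e>"
      + "</pardef></pardefs>"
      + "<section id='main' type='standard'>"
      + "<e lm='house'><i>house</i><par n='house__n'/></e>"
      + "<e lm='cat'><i>cat</i><par n='house__n'/></e>"
      + "</section>"
      + "</dictionary>";

    static Map<String, Integer> count(String dixXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(dixXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed dictionary file", e);
        }
        Map<String, Integer> stats = new LinkedHashMap<>();
        stats.put("symbols", doc.getElementsByTagName("sdef").getLength());
        stats.put("paradigms", doc.getElementsByTagName("pardef").getLength());

        // Only <e> elements directly inside a <section> are lemma entries;
        // <e> elements inside a <pardef> describe inflections instead.
        int lemmas = 0;
        NodeList sections = doc.getElementsByTagName("section");
        for (int i = 0; i < sections.getLength(); i++) {
            NodeList children = sections.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child instanceof Element && "e".equals(child.getNodeName())) {
                    lemmas++;
                }
            }
        }
        stats.put("lemmas", lemmas);
        return stats;
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE)); // prints {symbols=3, paradigms=1, lemmas=2}
    }
}
```

A real implementation would read the file from the user's working copy and write the parsed entries to the database instead of only counting them.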
How it Works?
We will develop a WebInterface so that lay users can add words or suggest modifications. We are planning to use Git as an administrative and version control tool.
- All dictionary releases will be kept in a single repository, available to any community member who wants to continue the work.
- Each user will have their own branch to work on.
- Users will be able to open merge requests against the official branch.
- At first, we will use GitLab (or a similar tool) to manage these merge requests. The team that maintains each dictionary will judge whether the modifications are valid.
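The per-user branch workflow described above can be sketched as the sequence of Git commands a connector would have to run when a user saves a change. This is only an illustration under stated assumptions: the class `UserBranchPlan`, the `users/<name>` branch naming scheme, the remote name `origin` and the example file name are hypothetical and not fixed by the proposal. The code builds the command lines without executing them, and opening the merge request itself is left to GitLab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical planner, not part of the proposal: lists the Git commands
// needed to save one change on a user's own branch.
class UserBranchPlan {
    // Assumed naming convention: users/<sanitized user name>.
    static String branchFor(String user) {
        String safe = user.trim().toLowerCase(Locale.ROOT)
                          .replaceAll("[^a-z0-9._-]+", "-");
        return "users/" + safe;
    }

    static List<List<String>> commands(String user, boolean branchExists,
                                       String dixFile, String message) {
        String branch = branchFor(user);
        List<List<String>> plan = new ArrayList<>();
        if (branchExists) {
            // Reuse the branch created by an earlier change.
            plan.add(Arrays.asList("git", "checkout", branch));
        } else {
            // First change by this user: branch off the official master branch.
            plan.add(Arrays.asList("git", "checkout", "-b", branch, "master"));
        }
        plan.add(Arrays.asList("git", "add", dixFile));
        plan.add(Arrays.asList("git", "commit", "-m", message));
        // Publish the branch so a merge request can be opened against master.
        plan.add(Arrays.asList("git", "push", "-u", "origin", branch));
        return plan;
    }

    public static void main(String[] args) {
        // Example file name and message are made up for the demonstration.
        for (List<String> cmd : commands("Maria Silva", false,
                                         "apertium-por.por.dix", "Add lemma casa")) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Keeping the plan as data rather than running it directly makes the workflow easy to test and leaves open whether the commands are executed through a Git library or the command line.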
Example of Interaction - Adding new lemma to a dictionary
Stage | Description | Image |
---|---|---|
1 | Choosing an official dictionary and importing it. The Official Dictionaries page displays all dictionaries available in the Git repository. Each version corresponds to a tag in the master branch. Once a dictionary is imported, the user names it and it appears on the My Dictionaries page. | |
2 | Loading the dictionary file. On the Language File page, the user selects one of their dictionaries, and the dictionary XML is converted and saved temporarily to a database. Also on this page, the user can edit the dictionary alphabet and see some statistics about it. The Symbols, Paradigms, and Lemmas pages refer to the dictionary selected here. | |
3 | Verifying whether the lemma already exists. On the Lemmas - Search page, the user can see all the inflections of the dictionary's lemmas and check whether the lemma they want to add already exists. | |
4.1 | Creating the new lemma - Step 1. On the Lemmas - Create page, the user follows three steps to add a new lemma. In the first step, the user defines the lemma and the lemma root. In addition, they give some examples of inflections and symbols in order to receive suggestions on which paradigm to use. | |
4.2 | Creating the new lemma - Step 2. In the second step, the user sees the paradigm suggestions (or can browse all paradigms). When a paradigm is selected, all inflections of the new lemma are shown to the user. If none of the paradigms fits, the user is directed to create a new paradigm. | |
4.3 | Creating the new lemma - Step 3. In the third and last step, the user confirms and saves the change. At this point, a branch is created for this dictionary in the Git repository (if it does not already exist) and a commit with these changes is made. | |
5 | Merge Request. On the My Dictionaries page, the user can create a merge request, which will be judged by the maintainer of the master branch. | |
6 | Official dictionary keeper judges the merge request. In GitLab (or a similar tool), the dictionary maintainer sees the list of merge requests. For each request, the maintainer can view all commits, judge the changes in each one and accept only some of them. When the merge request is accepted, the user's branch is deleted. | |
7 | Feedback. On the Merge Requests page, the user can see all the requests they have made and their status. | |
Final Release |