User:Mathematic-alpha/proposal
== Information ==

This project was accepted for the 2019 edition of Google Summer of Code.

Progress on the work is tracked on [http://wiki.apertium.org/wiki/User:Mathematic-alpha/gsoc-progress this page].

Thanks

== Contact Information ==

'''Name:''' Ngadou Yopa Sylvestre Ronald

'''Display name:''' Ngadou Yopa

'''Location:''' [https://en.wikipedia.org/wiki/Buea Malingo Street, Buea, Cameroon]

'''E-mail:''' [mailto:yopasylvestre@gmail.com yopasylvestre@gmail.com] ([mailto:mathalpha26@gmail.com mathalpha26@gmail.com])

'''IRC:''' math-alpha (m-alpha)

'''GitHub:''' [https://github.com/math-alpha math-alpha]

'''GitLab:''' [https://gitlab.com/mathematic-alpha mathematic-alpha]

'''Telegram:''' [https://t.me/ngadou @ngadou]

'''Website:''' http://ngadou.me/portfolio

'''Time Zone:''' UTC +1:00 (Central Africa)

'''School/Degree:''' B.Eng. in Computer Engineering, Faculty of Engineering and Technology, Buea, Cameroon

'''Expected Graduation Year:''' December 2021

== Why am I interested in machine translation? ==

A language is a method of human communication, spoken or written, that uses words in a structured and conventional way. For a community to integrate into an evolving world, it needs an interface for communicating with other cultures.

I study computer engineering and I am highly interested in AI and mathematics. Machine translation draws on both fields, hence my interest.

== Why am I interested in Apertium? ==

Apertium is a free/open-source rule-based machine translation platform and one of the few open-source organizations dedicated to NLP. I highly appreciate the community and the developers doing excellent work in machine translation.

*Because Apertium is free/open-source software.
*Because its community is strongly committed to under-resourced and minoritised/marginalised languages.
*Because a lot of good work has been done, and is still being done, in it.
*Because it produces not only machine translation but also free resources that can be used for other purposes, e.g. dictionaries, morphological analysers, spellcheckers, etc.

== Which of the published tasks are you interested in? What do you plan to do? ==

'''Adopt an unreleased language pair:''' I would like to develop the Mə̀dʉ̂mbɑ̀-French pair, which is currently in the nursery, and build a minimal user interface for contributing to it.

== My proposal ==

=== Title ===

Adopt an unreleased language pair with a minimal user interface

=== Major goals ===

*Improving the Mə̀dʉ̂mbɑ̀-French language pair to reach about 91% coverage of the publicly available Mə̀dʉ̂mbɑ̀ corpus
**Mə̀dʉ̂mbɑ̀ to French
**French to Mə̀dʉ̂mbɑ̀

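Coverage in the goal above means the share of corpus tokens that the Mə̀dʉ̂mbɑ̀ analyser recognises. As a rough illustration, and not the pair's actual tooling, the Python sketch below reads analyser output in the Apertium stream format, where each lexical unit looks like <code>^surface/analysis$</code> and unknown words are marked with a leading <code>*</code>. It reports the coverage together with the most frequent unknown words, which is the list the workplan proposes to work through in decreasing order of frequency. The function name <code>coverage_report</code> is hypothetical, and escaped characters such as <code>\^</code> or <code>\/</code> inside lexical units are deliberately not handled.

```python
import re
from collections import Counter

# One lexical unit in the Apertium stream: ^surface/analysis1/analysis2$
# Assumption: escaped characters (\^, \$, \/) never occur in the input.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage_report(stream: str, top: int = 10):
    """Return (coverage, most frequent unknown surface forms) for analyser output."""
    total = 0
    unknown = Counter()
    for unit in LEXICAL_UNIT.findall(stream):
        surface, _, analyses = unit.partition("/")
        total += 1
        # The analyser marks words it cannot analyse as ^word/*word$
        if analyses.startswith("*"):
            unknown[surface] += 1
    if total == 0:
        return 0.0, []
    known = total - sum(unknown.values())
    return known / total, unknown.most_common(top)


if __name__ == "__main__":
    # Placeholder tokens, not real Mə̀dʉ̂mbɑ̀ words
    sample = "^a/a<n>$ ^b/*b$ ^b/*b$ ^c/c<vblex>$"
    cov, missing = coverage_report(sample)
    print(f"coverage: {cov:.1%}")
    print("most frequent unknown words:", missing)
```

Run over the analyser output for the whole corpus, the second value is the frequency-ranked list of words to add next.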
{{comment|Do you have a coding challenge related to this pair? That would be very important. —[[User:Firespeaker|Firespeaker]] ([[User talk:Firespeaker|talk]]) 15:13, 7 April 2019 (CEST)}}

{{comment|Yes since I have worked on it in GCI. I can still bootstrap a new pair if it is imperative —[[User:Mathematic-alpha|Mathematic-alpha]] ([[User talk:Mathematic-alpha|talk]]) 21:55, 8 April 2019 (CEST)}}

{{comment|:Bootstrapping a new pair isn't useful if one already exists, but the coding challenge is indeed imperative! —[[User:Firespeaker|Firespeaker]] ([[User talk:Firespeaker|talk]]) 23:12, 12 April 2019 (CEST)}}

{{comment| Noted}}

*Developing a minimal interface for adding words and transfer rules
** Demo https://github.com/math-alpha/contribution
** Interface for adding words
** Interface for adding transfer rules
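Behind the interface for adding words, each submission would eventually have to become an entry in the pair's bilingual dictionary (bidix), an XML file whose <code>&lt;e&gt;</code> entries pair a left-hand lemma with a right-hand lemma, each followed by its tags. The sketch below shows that step with Python's standard <code>xml.etree.ElementTree</code>. It assumes Mə̀dʉ̂mbɑ̀ is on the left side and French on the right, and it copies the same tag list to both sides, which is a simplification (real entries often carry different tags per side). The helper name <code>bidix_entry</code> is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def bidix_entry(byv_lemma: str, fra_lemma: str, tags: Sequence[str]) -> str:
    """Serialise one <e> bidix entry (hypothetical helper for the word-adding form)."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    # <l> holds the left-hand side (assumed Mə̀dʉ̂mbɑ̀), <r> the right-hand side (French)
    for side, lemma in (("l", byv_lemma), ("r", fra_lemma)):
        element = ET.SubElement(pair, side)
        element.text = lemma  # ElementTree escapes &, < and > when serialising
        for tag in tags:
            ET.SubElement(element, "s", n=tag)
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # "byv_word" is a placeholder, not a real Mə̀dʉ̂mbɑ̀ lemma
    print(bidix_entry("byv_word", "maison", ["n"]))
```

Generating the XML from form fields, rather than asking contributors to type it, is what keeps the interface usable for people who are not comfortable editing dictionary files by hand.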

The rationale for a GUI is to encourage more people to contribute: most people I meet are not comfortable working in a terminal with command-line tools.

During the post-application period, I plan to study the apertium-ambiguous package in more detail and to learn more about building applications on the GitHub platform. Given the limited time of GSoC, I think it is better to invest in something that can serve as a basis for bringing in more non-technical contributors, while using this pair as a case study.
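For the backend planned in the workplan (GitHub authentication, pushes and pulls), one possible route is GitHub's REST "contents" endpoint, which creates or updates a single file through a PUT request carrying a commit message, the new file content in Base64 and, for updates, the SHA of the blob being replaced. The Python sketch below only builds such a request with the standard library and does not send it. The function name, the owner, repository and file names in the usage example are hypothetical, and the token handling (OAuth flow, storage) is left out.

```python
import base64
import json
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def build_update_request(owner: str, repo: str, path: str, new_text: str,
                         message: str, token: str, sha: Optional[str] = None,
                         branch: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT /repos/{owner}/{repo}/contents/{path} request."""
    body = {
        "message": message,  # commit message
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
    }
    if sha is not None:
        body["sha"] = sha  # blob SHA of the file being replaced; needed for updates
    if branch is not None:
        body["branch"] = branch  # otherwise the repository's default branch is used
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical repository and file names, for illustration only
    req = build_update_request("example-user", "apertium-byv-fra",
                               "apertium-byv-fra.byv-fra.dix", "<dictionary/>",
                               "Add new bidix entry", token="YOUR_TOKEN", sha="abc123")
    print(req.get_method(), req.full_url)
```

Sending the request is then a matter of calling <code>urllib.request.urlopen(req)</code>; keeping construction separate from sending makes the payload easy to check before anything touches the repository.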

=== Reasons why Google and Apertium should sponsor it ===

As mentioned above, the Apertium community is strongly committed to under-resourced and minoritised/marginalised languages, and Google supports such work in its own way through programs like GSoC and GCI.

There are many local cultural movements in Africa that aim to develop their languages and open them up to the world, but they rarely build on a scientific basis. This project would mark a starting point, or proof of concept, for machine translation in Cameroon and could have a strong positive impact on language development.

=== Online translations ===

Several independent organisations have developed dictionaries for Mə̀dʉ̂mbɑ̀, such as [https://glosbe.com/en/byv Glosbe], [https://translation.babylon-software.com/english/Medumba/ Babylon-Software], [https://resulam.com/ghomala-5/ Resulam] and others. The problem is that they use a "naive" approach: they do no PoS tagging and have no transfer rules.

=== Workplan ===

{|class="wikitable"
! style="width: 10%" | Week
! style="width: 15%" | Dates
! style="width: 36%" | Goals
! style="width: 13%" | Bidix<br>(excluding<br>proper names)
! style="width: 13%" | WER
! style="width: 13%" | Coverage
|-
! Post-application period
| style="text-align:center" | 10 April - 26 May
|
* Find more language resources (Diktionary et al.)
* Study in more detail [[Using weights for ambiguous rules]]
| style="text-align:center" | ~6,000
| style="text-align:center" | fra > byv ~30%
| style="text-align:center" | ~88%
|-
! 1
| style="text-align:center" | 27 May - 2 June
| Improving the Mə̀dʉ̂mbɑ̀ monodix<br/>Adding prn, pr, cnj*, basic adv to the bidix
| style="text-align:center" | ~9,000
| style="text-align:center" |
| style="text-align:center" |
|-
! 2
| style="text-align:center" | 3 June - 9 June
| Adding n, adj, adv to the bidix from the French dictionary
| style="text-align:center" | ~12,000
| style="text-align:center" |
| style="text-align:center" | ~86.0%
|-
! 3
| style="text-align:center" | 10 June - 16 June
| Adding vblex to the bidix from the French dictionary<br/>Beginning to add missing words in decreasing order of frequency fra > byv
| style="text-align:center" | ~14,000
| style="text-align:center" |
| style="text-align:center" | ~89%
|-
! 4
| style="text-align:center" | 17 June - 23 June
| Adding words<br/>Transfer rules fra > byv
| style="text-align:center" | ~15,000
| style="text-align:center" |
| style="text-align:center" | ~90%
|-
! 5
| style="text-align:center" | 24 June - 30 June
| Adding words<br/>Deliverable #1: Mə̀dʉ̂mbɑ̀ to French translator<br/>'''First evaluation''' (28 June)
| style="text-align:center" | ~16,000
| style="text-align:center" | (WP) 15.0%
| style="text-align:center" | ~90.5%
|-
! 6
| style="text-align:center" | 1 July - 7 July
| Adding words<br/>Transfer rules fra > byv<br/>Begin testvoc fra > byv
| style="text-align:center" | ~17,000
| style="text-align:center" |
| style="text-align:center" | ~91%
|-
! 7
| style="text-align:center" | 8 July - 14 July
| Adding words<br/>Transfer rules fra > byv<br/>Testvoc fra > byv
| style="text-align:center" | ~18,000
| style="text-align:center" |
| style="text-align:center" | ~91.5%
|-
! 8
| style="text-align:center" | 15 July - 21 July
| Developing the GUI for adding words
| style="text-align:center" | -
| style="text-align:center" | -
| style="text-align:center" | -
|-
! 9
| style="text-align:center" | 22 July - 28 July
| Developing the backend code (GitHub authentication, pushes and pulls)<br/>'''Second evaluation''' (26 July)
| style="text-align:center" | -
| style="text-align:center" | (WP) 7.2%
| style="text-align:center" | -
|-
! 10
| style="text-align:center" | 29 July - 4 August
| Adding missing words in decreasing order of frequency byv > fra<br/>Transfer rules byv > fra<br/>Testvoc byv > fra
| style="text-align:center" | ~18,500
| style="text-align:center" | (WP) 6.6%
| style="text-align:center" | ~92.0%
|-
! 11
| style="text-align:center" | 5 August - 11 August
| Adding words<br/>Transfer rules byv > fra<br/>Testvoc byv > fra
| style="text-align:center" | ~19,500
| style="text-align:center" | fra > byv (WP) 15.0%
| style="text-align:center" | ~93.0%
|-
! 12
| style="text-align:center" | 12 August - 18 August
| Final improvements to the UI
| style="text-align:center" | -
| style="text-align:center" | -
| style="text-align:center" | -
|-
! 13
| style="text-align:center" | 19 August - 25 August
| Final improvements to the dictionaries<br/>Deliverable #3: Mə̀dʉ̂mbɑ̀ to French translator plus minimal GUI<br/>'''Final evaluation''' (26 August)
| style="text-align:center" | ~19,500
| style="text-align:center" | <15%
| style="text-align:center" | ~93.5%
|}

{{comment|Why does coverage go down in some cases? —[[User:Firespeaker|Firespeaker]] ([[User talk:Firespeaker|talk]]) 15:12, 7 April 2019 (CEST)}}

{{comment|Mistake. I am sorry —[[User:Mathematic-alpha|Mathematic-alpha]] ([[User talk:Mathematic-alpha|talk]]) 21:59, 8 April 2019 (CEST)}}
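The WER column refers to word error rate: the minimum number of word substitutions, insertions and deletions needed to turn the system output into a reference translation, divided by the number of words in the reference. Apertium has its own evaluation script (apertium-eval-translator); the Python below is only a minimal sketch of the metric itself, computed with the usual dynamic-programming edit distance over whitespace-separated words. The reference is commonly a post-edited version of the system's own output, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance(reference, hypothesis) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion (reference word missing)
                         cur[j - 1] + 1,      # insertion (extra output word)
                         prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("le chat dort", "le chien dort"))
```

One substituted word out of three reference words gives a WER of one third; lower is better, which is why the WER targets in the table decrease over the summer.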

=== List your skills and give evidence of your qualifications ===

I am a level 2 computer engineering student and I have the skills needed to work on a software project.

Mə̀dʉ̂mbɑ̀ is my mother tongue. I am fluent in French and English, since Cameroon is officially bilingual and I was educated in a bilingual setting. I am also a student of the Kǔm Vʉ̌ Mə̀dʉ̂mbɑ̀ (CEPOM: Comité d'Etude et de Production des Œuvres Bamiléké Mə̀dʉ̂mbɑ̀), so I have the command of Mə̀dʉ̂mbɑ̀ this project requires.

I have been working on Apertium since 2016, with some breaks due to school. In 2016 I created the Mə̀dʉ̂mbɑ̀-French pair, which I worked on during GCI 2016 (where I was selected as a finalist). I was a mentor in, and strongly involved with, the 2018 edition of GCI.

== List any non-Summer-of-Code plans you have for the Summer ==

I can guarantee at least 70 hours of work per week from the end of June onwards, and as I love this kind of work, I expect to put in even more. Before then, I can commit only 35 hours per week because of my second-semester exams.

[[Category:GSoC 2019 student proposals]]

'''GSoC 2019 timeline:'''
*'''May 27:''' Coding officially begins!
*'''June 24, 18:00 UTC:''' Mentors and students can begin submitting Phase 1 evaluations
*'''June 28, 18:00 UTC:''' Phase 1 evaluation deadline
*'''Work period:''' Students work on their project with guidance from mentors
*'''July 22, 18:00 UTC:''' Mentors and students can begin submitting Phase 2 evaluations
*'''July 26, 18:00 UTC:''' Phase 2 evaluation deadline
*'''Work period:''' Students continue working on their project with guidance from mentors
*'''August 19 - 26, 18:00 UTC:''' Final week: students submit their final work product and their final mentor evaluation
Latest revision as of 20:32, 11 May 2019