Difference between revisions of "User:Anakuznetsova/Proposta"
== Work Plan ==
{| class="wikitable" style="text-align: left;"
|-
! style="width:10em;" | Week (dates)
! style="width:10em;" | Coverage goal
! style="width:10em;" | Coverage reached
! style="width:10em;" | Testvoc
! style="width:10em;" | Evaluation
! style="width:10em;" | WER
! style="width:20em;" | Grammar categories (goals)
|-
! colspan="7" style="border: 1px solid black; padding: 6px; background: #ffdead;" | Community Bonding
|-
| '''Week 1 (24/04 - 29/04)'''
|
|
|
|
|
|
* Studying Apertium documentation;
* Exploring .dix and other bilingual dictionary formats (how they work);
* Discussing a more detailed work plan with mentors (based on the results of the Post-Application Period).
|-
| '''Week 2 (30/04 - 06/05)'''
| 30 - 35% (for community bonding)
| 23%
|
|
|
|
NOUNS
* <n><pl>
* <n><post>
* <n><top> (incomplete)
* Biform nouns (kinship terms)
Personal Pronouns

Possessive Determiners

Demonstrative Determiners
|-
| '''Week 3 (07/05 - 13/05)'''
| 30 - 35% (for community bonding)
| 29%
|
|
|
|
* Added nouns, verbs, adjectives to the morphological analyzer
* Added nouns to the bilingual dictionary
|-
! colspan="7" style="border: 1px solid black; padding: 6px; background: #ffdead;" | Stage I. Morphological Analyzer and Bilingual Dictionary
|-
| '''Week 1 (14/05 - 20/05)'''
| 50 - 55%
| 47%
|
|
|
|
NOUNS
* Find a list of triform nouns
ADJECTIVES
* <s><adj><dist> Distributive Determiners (los ambos niños)</s>
* <s><adj><ind> Indefinite Adjectives (todos, muchos, pocos)</s>
PRONOUNS
* <s><pro> (proclitic) Personal pronouns as direct complement (reflexives)</s>
* <s><prn><tn> Personal pronouns as indirect complement</s>
|-
| '''Week 2 (21/05 - 27/05)'''
| 60%
| 59%
|
|
|
|
PRONOUNS
* <s>Possessive Pronouns <prn><pos><tn> (different from Possessive Suffixes), tonic forms (separated by a space)</s>
* <s><prn><dem> Demonstrative Pronouns</s>
* <s><prn><ind> Indefinite Pronouns (outro, nadie, cualquiera)</s>
* <s><prn><itg> Interrogative Pronouns</s>
* <s><adv> Adverbial suffixes</s>
|-
| '''Week 3 (28/05 - 03/06)'''
| 65%
| 65%
|
|
|
|
ADVERBS
* <s>List of adverbs</s>
* <s>Interrogative Suffixes (added to interrogative adverbs)</s>
* <s>Comparatives <comp>, Superlatives</s>
|-
| '''Week 4 (04/06 - 10/06)'''
| 70%
| 70%
|
|
|
|
VERBS
* <s>Prefixes of Number and Person</s>
* <s>Pronominal verbs</s>
* <s>Irregular verbs</s>
* <s>Defective verbs (natural phenomena)</s>
|-
| '''Week 5 (11/06 - 17/06)'''
| 75%
| 76%
|
|
|
|
VERBAL ACCIDENTS
* <s>Prefixes of Number and Person for Proper and Predicative verbs</s>
* <s>Optatives for proper and predicative verbs</s>
* <s>Interrogative accidents (interrogative adverbs)</s>
* <s>Negative forms</s>
|-
| '''Week 6 (18/06 - 24/06)'''
| 80%
|
|
|
|
|
VERBAL ACCIDENTS

Time accidents:
* <s><pres></s>
* <s><past></s>
* <s><pret><imperf> imperfect</s>
* <s><pret><perf></s>
|-
| '''Week 7 (25/06 - 01/07)'''
| 85%
|
|
|
|
|
VERBAL ACCIDENTS

Time accidents:
* <s><pret><perf></s>
* <s><pret> pluperfect</s>
* <s>future accidents</s>
|-
| '''Week 8 (02/07 - 08/07)'''
| 85 - 90%
|
|
|
|
|
VERBAL ACCIDENTS

Voice accidents:
* <s>active</s>
* <s>passive</s>
* <s>reciprocal</s>
* <s>coactive</s>
* <s>objective</s>
* <s>subjunctive</s>
* mixed
|-
| '''Week 9 (09/07 - 15/07)'''
| 85 - 90%
| 85%
|
|
|
|
<s>Mode accidents</s>
|-
! colspan="7" style="border: 1px solid black; padding: 5px; background: #ffdead;" | Stage II. Lexical Transfer
|-
| '''Weeks 10 - 11 (16/07 - 29/07)'''
| 93% (?)
| 89.3%
|
|
|
| Lexical transfer rules (in progress, not finished)
|-
| '''Week 12 (30/07 - 05/08)'''
|
|
|
|
|
|
Preparing code for the final evaluation.
|}
[[Category:GSoC_2018_student_proposals|Anakuznetsova]] |
Latest revision as of 12:10, 31 July 2018
Contact Information
Anastasia Kuznetsova
E-mail: menina.indigena.17@gmail.com
GitHub: ana-kuznetsova
Phone number: +7 916 804 79 55
Timezone: UTC+3
Why is it that you are interested in the Apertium project?
I am very interested in the Apertium project because it pays attention not only to high-resource languages with many speakers around the world, but also to low-resource languages. It is not easy to find an online dictionary or translator for such languages, although one is sometimes needed. Machine translation based on neural networks prevails nowadays, but such methods are often hard to apply to low-resource languages because corpora, especially large or annotated ones, are lacking. Apertium instead relies on finite-state transducers, which are a good solution for machine translation of small languages.
Which of the published tasks are you interested in? What do you plan to do?
I am planning to adopt a new language pair for Apertium: Guarani - Spanish/Portuguese.
Reasons why Google and Apertium should sponsor it
Guarani is one of the most widely spoken indigenous languages of southern South America. It is spoken by 6 million people in Paraguay (where it is one of the official languages), Brazil, Argentina and Bolivia. Guarani translators are available online, but there is no rule-based translator with morphological analysis, which could give more plausible results than translators built from Spanish/Portuguese - Guarani parallel corpora. So I believe we can improve the quality of translation by applying a rule-based model.
A description of how and who it will benefit in society
Adopting the aforementioned language pair will benefit both Guarani and Spanish/Portuguese speakers, especially in regions where Guarani is spoken by small indigenous communities. As a social anthropologist, I conducted fieldwork in the aldeia (village) Boa Vista in São Paulo state, Brazil, where the local Guarani have problems communicating in Portuguese with the local Portuguese-speaking population. Only a part of the Guarani Mbya community can communicate fluently in Portuguese (others understand it, but cannot communicate normally because they are embarrassed by their pronunciation and their limited Portuguese vocabulary), so this translator would be of great importance not only to them, but also to other similar communities. It could also ease communication between the Indigenous residents and the governmental staff of the aldeias, who do not have a good command of Guarani.
The Guarani - Spanish/Portuguese pair could also be used by professional linguists, interpreters, philologists and others interested in the topic.
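Work Plan
Week (dates) | Coverage goal | Coverage reached | Testvoc | Evaluation | WER | Grammar categories (goals) |
---|---|---|---|---|---|---|
Community Bonding | | | | | | |
Week 1 (24/04 - 29/04) | | | | | | Studying Apertium documentation; exploring .dix and other bilingual dictionary formats (how they work); discussing a more detailed work plan with mentors (based on the results of the Post-Application Period) |
Week 2 (30/04 - 06/05) | 30 - 35% (for community bonding) | 23% | | | | NOUNS: <n><pl>; <n><post>; <n><top> (incomplete); biform nouns (kinship terms). Personal Pronouns. Possessive Determiners. Demonstrative Determiners |
Week 3 (07/05 - 13/05) | 30 - 35% (for community bonding) | 29% | | | | Added nouns, verbs, adjectives to the morphological analyzer; added nouns to the bilingual dictionary |
Stage I. Morphological Analyzer and Bilingual Dictionary | | | | | | |
Week 1 (14/05 - 20/05) | 50 - 55% | 47% | | | | NOUNS: find a list of triform nouns. ADJECTIVES: ~~<adj><dist> Distributive Determiners (los ambos niños)~~; ~~<adj><ind> Indefinite Adjectives (todos, muchos, pocos)~~. PRONOUNS: ~~<pro> (proclitic) Personal pronouns as direct complement (reflexives)~~; ~~<prn><tn> Personal pronouns as indirect complement~~ |
Week 2 (21/05 - 27/05) | 60% | 59% | | | | PRONOUNS: ~~Possessive Pronouns <prn><pos><tn> (different from Possessive Suffixes), tonic forms (separated by a space)~~; ~~<prn><dem> Demonstrative Pronouns~~; ~~<prn><ind> Indefinite Pronouns (outro, nadie, cualquiera)~~; ~~<prn><itg> Interrogative Pronouns~~; ~~<adv> Adverbial suffixes~~ |
Week 3 (28/05 - 03/06) | 65% | 65% | | | | ADVERBS: ~~List of adverbs~~; ~~Interrogative Suffixes (added to interrogative adverbs)~~; ~~Comparatives <comp>, Superlatives~~ |
Week 4 (04/06 - 10/06) | 70% | 70% | | | | VERBS: ~~Prefixes of Number and Person~~; ~~Pronominal verbs~~; ~~Irregular verbs~~; ~~Defective verbs (natural phenomena)~~ |
Week 5 (11/06 - 17/06) | 75% | 76% | | | | VERBAL ACCIDENTS: ~~Prefixes of Number and Person for Proper and Predicative verbs~~; ~~Optatives for proper and predicative verbs~~; ~~Interrogative accidents (interrogative adverbs)~~; ~~Negative forms~~ |
Week 6 (18/06 - 24/06) | 80% | | | | | VERBAL ACCIDENTS, time accidents: ~~<pres>~~; ~~<past>~~; ~~<pret><imperf> imperfect~~; ~~<pret><perf>~~ |
Week 7 (25/06 - 01/07) | 85% | | | | | VERBAL ACCIDENTS, time accidents: ~~<pret><perf>~~; ~~<pret> pluperfect~~; ~~future accidents~~ |
Week 8 (02/07 - 08/07) | 85 - 90% | | | | | VERBAL ACCIDENTS, voice accidents: ~~active~~; ~~passive~~; ~~reciprocal~~; ~~coactive~~; ~~objective~~; ~~subjunctive~~; mixed |
Week 9 (09/07 - 15/07) | 85 - 90% | 85% | | | | ~~Mode accidents~~ |
Stage II. Lexical Transfer | | | | | | |
Weeks 10 - 11 (16/07 - 29/07) | 93% (?) | 89.3% | | | | Lexical transfer rules (in progress, not finished) |
Week 12 (30/07 - 05/08) | | | | | | Preparing code for the final evaluation. |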
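The coverage and WER columns of the work plan can be illustrated with a small sketch. It is not the project's actual evaluation tooling. It assumes that coverage means the share of tokens that receive at least one analysis, that the analyser output is in the Apertium stream format (where an unknown word is echoed with a leading asterisk, as in ^foo/*foo$), and that WER is the usual word-level edit distance divided by the reference length. The function names naive_coverage and word_error_rate are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: escaped characters inside units are not handled in this sketch)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown words carry an analysis that starts with an asterisk.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the processed prefix of ref and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)
```

With placeholder words, naive_coverage("^a/a<n>$ ^b/b<n><pl>$ ^c/*c$ ^d/d<prn>$") gives 0.75, because three of the four units are analysed, and word_error_rate("the house is big", "the house big") gives 0.25, because one of the four reference words is missing.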
List your skills and give evidence of your qualifications
I hold a B.A. in Social Anthropology and Ethnology and am currently studying Computational Linguistics in the M.A. program of the National Research University ‘Higher School of Economics’ (Moscow).
Skills: Python (some experience with Keras and TensorFlow in addition to the standard Python libraries), R, some knowledge of SQL. I have used Bash a little; I don’t have much experience with it yet, but I am eager to learn.
Experience: Currently involved in an M.A. educational project on popular science text research that includes NLP tasks such as domain-specific named entity recognition, extraction of readability metrics, text similarity, and text clustering. Also involved in a project on domain-specific sentiment analysis. Familiar with HFST (did coursework on a finite-state transducer for the Chuvash language).
Natural languages: Russian (Native), English (Advanced), Portuguese (Advanced), Spanish and Lithuanian (Intermediate), basic knowledge of Maori (can read).
List any non-Summer-of-Code plans you have for the Summer
1. In June I have my university summer exam session (approximately 1 - 15 June), so I won’t be able to dedicate myself fully to the task. I will probably be able to spend only 15 hours/week on it.
2. (Probably?) In July I am planning to go to Brazil for the 18th World Congress of the International Union of Anthropological and Ethnological Sciences (IUAES) in Florianópolis (16 - 20 July). During these five days I will be available for only 15 hours of coding, but afterwards I will switch back to a normal 40 hours a week, even though I will stay in Brazil.
3. In August I will probably visit my parents, but it won’t affect the schedule.