User:Pedromanic/Gsoc2024Proposal
Contents
- 1 Contact Information
- 2 Background
- 3 Skills
- 4 Why am I interested in Apertium?
- 5 Which of the published tasks am I interested in? What do I plan to do?
- 6 Proposal
- 7 Why should Google and Apertium sponsor it?
- 8 How and who will it benefit in society?
- 9 Coding Challenge
- 10 Work plan
- 11 Other Summer Plans
Contact Information
- Name: Pedro Manicardi Soares
- Email: pedromanic@usp.br
- Github: https://github.com/PedroManicardi
- IRC: pedro_manic
- Timezone: UTC-2
Background
I am a fourth-year computer engineering student at the University of São Paulo in Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basic)
Skills
- C
- C++
- XML
- Python
- Linux
Why am I interested in Apertium?
- Apertium has an active community, with people who help when you run into difficulties.
- I want to learn about and contribute to open-source projects and understand how they work.
- Apertium is an NLP (natural language processing) community, and I have specialized in this field during my undergraduate studies.
Which of the published tasks am I interested in? What do I plan to do?
I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.
Proposal
Deliverables
Why should Google and Apertium sponsor it?
Correct capitalization will improve the experience of millions of people who use MT. This project will contribute to the community, and the rules have the potential to be reused for more language pairs. Moreover, by sponsoring this project, Google would be contributing to open-source development.
How and who will it benefit in society?
Coding Challenge
I implemented the modules for extracting and restoring capitalization for the es-pt language pair. The code can be found here. The program includes the capitalization handling module, which capitalizes the first word of a sentence.
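As a rough illustration of the idea behind the challenge, here is a minimal Python sketch. It is not the submitted code and does not use Apertium's stream format or tools: the function names `extract_case`, `restore_case` and `capitalize_sentences` are hypothetical, and the sketch assumes plain text in which sentences end with `.`, `!` or `?`.

```python
import re

# Hypothetical helpers for illustration only; not part of Apertium.

def extract_case(word: str) -> str:
    """Describe the capitalization of a source word as a simple label."""
    if len(word) > 1 and word.isupper():
        return "upper"   # e.g. "ONU"
    if word[:1].isupper():
        return "title"   # e.g. "Madrid"
    return "lower"


def restore_case(word: str, pattern: str) -> str:
    """Apply a previously extracted label to a (translated) word."""
    if pattern == "upper":
        return word.upper()
    if pattern == "title":
        return word[:1].upper() + word[1:]
    return word


# Start of text or end of a sentence, optional opening punctuation
# (such as Spanish "¿" and "¡"), then the first word character.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([¿¡\"'(]*)(\w)")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of every sentence in plain text."""
    def fix(match: re.Match) -> str:
        return match.group(1) + match.group(2) + match.group(3).upper()
    return _SENTENCE_START.sub(fix, text)
```

For example, `restore_case("casa", extract_case("Casa"))` gives `"Casa"`. Keeping extraction and restoration as separate steps mirrors the split described above, so the case information can travel alongside the word while it is translated in lowercase.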
Work plan
Community bonding period (May 1 - 26):
- Get to know Apertium's tools and community better
- Study how capitalization is currently handled
- Learn more about XML tools
- Identify issues with the current implementation
Work Period (May 27 - August 26)
Week 1 (May 27 - June 02):
- Introducing caps-restoration for the language pair es-pt.
- Gaining a better understanding of the capitalization rules.
Week 2 (June 03-09):
- Implementing the existing rules.
- Adding more rules targeting common scenarios.
Week 3 (June 10-16):
- Continuing to create new capitalization rules.
- Beginning to transition capitalization to monolingual modules.
Week 4 (June 17-23):
- Creating tests.
- Correcting errors.
- Updating documentation.
- Deliverable #1
Week 5 (June 24-30):
Week 6 (July 01-07):
Week 7 (July 12-18):
Week 8 (July 19-25):
- Deliverable #2
Weeks 9-10 (July 26 - August 07):
Weeks 11-12 (August 08 - 19):
Project completed
Other Summer Plans
I will have final exams in the last two weeks of June, so I will be able to dedicate 10 hours per week during that period. For the rest of the term I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.