User:Pedromanic/Gsoc2024Proposal
Contents
- 1 Contact Information
- 2 Background
- 3 Skills
- 4 Why am I interested in Apertium?
- 5 Which of the published tasks am I interested in? What do I plan to do?
- 6 Proposal
- 7 Why should Google and Apertium sponsor it?
- 8 How and who will it benefit in society?
- 9 Coding Challenge
- 10 Work plan
- 11 Other Summer Plans
Contact Information
- Name: Pedro Manicardi Soares
- Email: pedromanic@usp.br
- Github: https://github.com/PedroManicardi
- IRC: pedro_manic
- Timezone: UTC-2
Background
I am a fourth-year computer engineering student at the University of São Paulo in Brazil.
Native language: Portuguese
Other languages: English (advanced), Spanish (basic)
Skills
- C
- C++
- XML
- Python
- Linux
Why am I interested in Apertium?
- Apertium has an active community, with people willing to help when you get stuck.
- I want to learn about and contribute to open-source projects, and understand how they work.
- Apertium works in natural language processing (NLP), the field I have specialized in during my undergraduate studies.
Which of the published tasks am I interested in? What do I plan to do?
I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.
Proposal
Deliverables
Why should Google and Apertium sponsor it?
Correct capitalization will improve the machine translation (MT) experience of the millions of people who read and write Spanish and Portuguese. This project will contribute to the community, and its rules could potentially be reused for other language pairs. Moreover, Google's sponsorship would directly support open-source development.
How and who will it benefit in society?
Coding Challenge
I implemented modules for extracting and restoring capitalization for the es-pt language pair. The code can be found here. The program includes the capitalization handling module, which sets the first word of a sentence to uppercase.
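To make "extracting and restoring capitalization" concrete, here is a minimal sketch of the general idea in Python: record the case pattern of each source token, then reapply it to the corresponding target token. It is not the code from the challenge and not Apertium's API; the names `Case`, `extract_case`, `restore_case` and `transfer_case` are hypothetical, and the sketch assumes plain token lists with a one-to-one source/target alignment.

```python
from enum import Enum


class Case(Enum):
    """Hypothetical labels for the case pattern of a single token."""
    LOWER = "lower"  # "casa"
    TITLE = "title"  # "Casa"
    UPPER = "upper"  # "CASA"
    OTHER = "other"  # "iPhone", numbers, punctuation, ...


def extract_case(token: str) -> Case:
    """Classify the capitalization pattern of a source-language token."""
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return Case.OTHER
    if all(c.isupper() for c in letters):
        # A single capital ("A") is ambiguous; treat it as TITLE so that a
        # longer translation (e.g. "Ao") is not forced into all caps.
        return Case.TITLE if len(letters) == 1 else Case.UPPER
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return Case.TITLE
    if all(c.islower() for c in letters):
        return Case.LOWER
    return Case.OTHER


def _upper_first_letter(token: str) -> str:
    """Uppercase the first alphabetic character, skipping punctuation."""
    for i, c in enumerate(token):
        if c.isalpha():
            return token[:i] + c.upper() + token[i + 1:]
    return token


def restore_case(token: str, case: Case) -> str:
    """Apply a previously extracted case pattern to a target-language token."""
    if case is Case.UPPER:
        return token.upper()
    if case is Case.TITLE:
        return _upper_first_letter(token)
    if case is Case.LOWER:
        return token.lower()
    return token  # OTHER: keep the translator's output unchanged


def transfer_case(source: list[str], target: list[str]) -> list[str]:
    """Copy case patterns token by token (assumes a 1:1 alignment)."""
    return [restore_case(t, extract_case(s)) for s, t in zip(source, target)]
```

For example, `transfer_case(["Hola", "MUNDO", "del"], ["Olá", "mundo", "do"])` returns `["Olá", "MUNDO", "do"]`. A real module would have to work on Apertium's stream format and cope with tokens that split or merge during translation, which this sketch deliberately ignores.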
Work plan
Community bonding period (May 1 - 26):
- Get to know Apertium's tools and community better
- Study how capitalization is currently implemented
- Learn more about XML tools
- Identify issues with the current implementation
Work Period (May 27 - August 26)
Week 1 (May 27 - June 02):
- Introducing caps restoration for the es-pt language pair.
- Gaining a better understanding of the capitalization rules.
Week 2 (June 03-09):
- Implementing the existing rules.
- Adding more rules targeting common scenarios.
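One common scenario such rules would likely need to cover is Spanish opening punctuation: a sentence may begin with "¿" or "¡", so the first letter to capitalize is not the first character. The following is a hypothetical sketch of a sentence-initial capitalization rule that skips such openers; the names `SENTENCE_END`, `OPENERS` and `capitalize_sentences`, and the exact character sets, are illustrative assumptions rather than existing Apertium rules.

```python
# Hypothetical rule configuration: characters that end a sentence, and
# opening punctuation allowed before the first word of a sentence.
SENTENCE_END = ".!?"
OPENERS = "¿¡\"'«“(["


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of each sentence in ``text``."""
    result = []
    at_start = True  # the beginning of the text also starts a sentence
    for ch in text:
        if at_start and ch.isalpha():
            result.append(ch.upper())
            at_start = False
            continue
        result.append(ch)
        if ch in SENTENCE_END:
            at_start = True
        elif at_start and not (ch.isspace() or ch in OPENERS):
            # A digit or other symbol opens the sentence (e.g. the "5"
            # in "3.5"), so there is no initial letter to capitalize.
            at_start = False
    return "".join(result)
```

With this rule, `capitalize_sentences("¿qué tal? bien.")` gives `"¿Qué tal? Bien."`. Abbreviations such as "etc." would wrongly trigger a capital on the next word, so a fuller rule set would need an exception list.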
Week 3 (June 10-16):
- Continuing to create new capitalization rules.
- Beginning to transition capitalization to monolingual modules.
Week 4 (June 17-23):
- Creating tests.
- Correcting errors.
- Updating documentation.
- Deliverable #1
Week 5 (June 24-30):
Week 6 (July 01-07):
Week 7 (July 08-14):
Week 8 (July 15-21):
- Deliverable #2
Weeks 9-10 (July 22 - August 04):
Weeks 11-12 (August 05-18):
Project completed
Other Summer Plans
I will have final exams in the last two weeks of June, so during that time I will dedicate 10 hours per week. For the rest of the period before July, I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.