User:Pedromanic/Gsoc2024Proposal


Contact Information


Background

I am a fourth-year computer engineering student at the University of São Paulo in Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basics)

Skills

  • C
  • C++
  • XML
  • Python
  • Linux

Why am I interested in Apertium?

  • Apertium has an active community whose members help when you run into difficulty.
  • I want to learn how open-source projects work and contribute to them.
  • Apertium is an NLP community, and I have specialized in this field during my undergraduate studies.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.

Proposal

Deliverables

Why should Google and Apertium sponsor it?

Correct capitalization will improve the experience of millions of people who use machine translation. This project will contribute to the community, and its rules could potentially be reused for other language pairs. In addition, Google's sponsorship will contribute to open-source development.

How and who will it benefit in society?

Coding Challenge

I implemented the modules for extracting and restoring capitalization for the es-pt language pair. The code can be found here. The program includes the capitalization handling module, so it capitalizes the first word of each sentence.
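As an illustration only, the extract-and-restore idea can be sketched in Python. This is not the linked challenge code and does not use the Apertium stream format or any Apertium tool; it works on plain strings, and the function names `extract_case`, `restore_case`, and `capitalize_sentences` are hypothetical names chosen for this sketch.

```python
import re


def extract_case(word: str) -> str:
    """Classify a source word's capitalization (hypothetical helper)."""
    if word == word.lower():
        return "lower"
    if word[0].isupper() and word[1:] == word[1:].lower():
        return "title"
    if word == word.upper():
        return "upper"
    return "mixed"


def restore_case(word: str, case: str) -> str:
    """Apply a previously extracted capitalization class to a target word."""
    if case == "title":
        return word[:1].upper() + word[1:]
    if case == "upper":
        return word.upper()
    # "lower" and "mixed" leave the target word untouched in this sketch.
    return word


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of each sentence.

    A sentence start is assumed to be the beginning of the text or the
    first letter after '.', '!' or '?' plus whitespace, optionally
    preceded by the Spanish opening marks '¿' or '¡'.
    """
    pattern = re.compile(r"(^|[.!?]\s+)([¿¡]?)([^\W\d_])")
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text
    )
```

For example, `restore_case("madri", extract_case("Madrid"))` carries the title case of the Spanish source word over to the Portuguese target, and `capitalize_sentences` handles the sentence-initial rule separately from the word-level transfer.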

Work plan

Community bonding period (May 1 - 26):

  • Get to know Apertium's tools and community better
  • See how capitalization is currently implemented
  • Learn more about XML tools
  • Identify issues with the current implementation

Work Period (May 27 - August 26)

Week 1 (May 27 - June 02):

  • Introducing caps-restoration for the language pair es-pt.
  • Understanding the capitalization rules better.

Week 2 (June 03-09):

  • Implementing the existing rules.
  • Adding more rules targeting common scenarios.

Week 3 (June 10-16):

  • Continuing to create new capitalization rules.
  • Beginning to transition capitalization to monolingual modules.

Week 4 (June 17-23):

  • Creating tests.
  • Correcting errors.
  • Updating documentation.
  • Deliverable #1

Week 5 (June 24-30):

Week 6 (July 01-07):

Week 7 (July 12-18):

Week 8 (July 19-25):

  • Deliverable #2

Week 9,10 (July 26 - August 07):

Week 11,12 (August 08 - 19):


Project completed

Other Summer Plans

I will have final exams in the last two weeks of June, so I will be able to dedicate 10 hours per week during that period. Before the exams, I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.