Difference between revisions of "User:Pedromanic/Gsoc2024Proposal"


Revision as of 04:06, 2 April 2024

Contact Information


Background

I am a fourth-year computer engineering student at the University of São Paulo, Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basic)

Skills

  • C
  • C++
  • XML
  • Python
  • Linux

Why am I interested in Apertium?

  • Apertium has an active community, with people ready to help when you run into difficulty.
  • I want to learn from and contribute to open-source projects and understand how they work.
  • Apertium is an NLP community, and I have specialized in this field during my undergraduate studies.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.

Proposal

Deliverables

Why should Google and Apertium sponsor it?

Correct capitalization will improve the experience of millions of people who use MT. This project will contribute to the community, and the rules could potentially be reused for other language pairs. In addition, by sponsoring it, Google would be contributing to open-source development.

How and who will it benefit in society?

Coding Challenge

I implemented the modules for extracting and restoring capitalization for the es-pt language pair. The code can be found here. The program includes the capitalization handling module, so it capitalizes the first word of each sentence.
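
To illustrate the idea of extracting and restoring capitalization, here is a minimal Python sketch. It is not the coding-challenge code linked above and it does not use Apertium's stream format or tools. It works on a plain list of tokens, and all function names are hypothetical, chosen only for this example.

```python
def extract_case(word: str) -> str:
    """Classify a word's capitalization as 'upper', 'title', or 'lower'."""
    letters = [c for c in word if c.isalpha()]
    if not letters:
        return "lower"
    # A single capital letter is treated as 'title', not 'upper'.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return "upper"
    if letters[0].isupper():
        return "title"
    return "lower"


def upper_first_letter(word: str) -> str:
    """Uppercase the first alphabetic character, skipping marks such as '¿'."""
    for i, c in enumerate(word):
        if c.isalpha():
            return word[:i] + c.upper() + word[i + 1:]
    return word


def apply_case(word: str, case: str) -> str:
    """Restore a capitalization class (from extract_case) onto a word."""
    if case == "upper":
        return word.upper()
    if case == "title":
        return upper_first_letter(word)
    return word


def capitalize_sentence_starts(tokens: list[str]) -> list[str]:
    """Capitalize the first word of every sentence in a token list."""
    result = []
    at_start = True
    for tok in tokens:
        if at_start and any(c.isalpha() for c in tok):
            tok = upper_first_letter(tok)
            at_start = False
        result.append(tok)
        # Assumption: sentences end only at these punctuation tokens.
        if tok in {".", "!", "?"}:
            at_start = True
    return result
```

For example, the case class extracted from a Spanish source word can be applied to its lowercased Portuguese translation with `apply_case("onu", extract_case("ONU"))`, and `capitalize_sentence_starts` can then be run over the whole translated token list. A real module would also need to handle proper nouns, abbreviations, and word reordering between the two languages, none of which this sketch covers.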

Work plan

Community bonding period (May 1 - 26):

  • Get to know Apertium's tools and community better
  • See how capitalization is currently handled
  • Learn more about XML tools
  • Identify issues with the current implementation

Work Period (May 27 - August 26)

  • Week 1:
  • Week 2:
  • Week 3:
  • Week 4:
  • Deliverable #1
  • Week 5:
  • Week 6:
  • Week 7:
  • Week 8:
  • Deliverable #2
  • Week 9:
  • Week 10:
  • Week 11:
  • Week 12:
  • Project completed

Other Summer Plans

I will have final exams in the last two weeks of June, so I will be able to dedicate only 10 hours per week during that period. For the rest of the time before my vacation, I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.