Difference between revisions of "User:Pedromanic/Gsoc2024Proposal"

From Apertium

Latest revision as of 04:35, 2 April 2024

Contact Information

  • Name: Pedro Manicardi Soares
  • Email: pedromanic@usp.br
  • GitHub: https://github.com/PedroManicardi
  • IRC: pedro_manic
  • Timezone: UTC-2

Background

I am a fourth-year computer engineering student at the University of São Paulo in Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basic)

Skills

  • C
  • C++
  • XML
  • Python
  • Linux

Why am I interested in Apertium?

  • Apertium has an active community, with people willing to help when you run into difficulties.
  • I want to learn how open-source projects work and contribute to them.
  • Apertium is an NLP community, and I have specialized in this field during my undergraduate studies.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.

Proposal

Deliverables

Why should Google and Apertium sponsor it?

Correct capitalization will improve the experience of millions of people who use MT. This project will contribute to the community, and its rules could potentially be reused for other language pairs. Moreover, by sponsoring it, Google will contribute to open-source development.

How and who will it benefit in society?

Coding Challenge

I implemented the modules for extracting and restoring capitalization for the es-pt language pair. The code can be found at https://github.com/PedroManicardi/capitalization-restoration. The program includes the capitalization handling module, which sets the first word of each sentence to uppercase.
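
The idea of extracting capitalization, carrying it alongside lowercased text, and restoring it afterwards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 'AA'/'Aa'/'aa' labels, and the whitespace tokenization are hypothetical and are not taken from the linked repository or from any Apertium API.

```python
# Hypothetical sketch: these helpers are illustrative only and do not
# reproduce the linked repository or any Apertium API.

SENTENCE_END = (".", "!", "?")


def capitalize_first_letter(word):
    """Uppercase the first alphabetic character, skipping leading
    punctuation such as the Spanish opening marks."""
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word


def case_label(tok):
    """Classify a token as 'AA' (all caps), 'Aa' (initial capital) or 'aa'."""
    letters = [ch for ch in tok if ch.isalpha()]
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return "AA"
    if letters and letters[0].isupper():
        return "Aa"
    return "aa"


def extract_caps(sentence):
    """Return the lowercased tokens plus one case label per token."""
    tokens = sentence.split()
    return [t.lower() for t in tokens], [case_label(t) for t in tokens]


def restore_caps(tokens, labels):
    """Reapply the stored case labels to lowercased tokens."""
    out = []
    for tok, label in zip(tokens, labels):
        if label == "AA":
            out.append(tok.upper())
        elif label == "Aa":
            out.append(capitalize_first_letter(tok))
        else:
            out.append(tok)
    return " ".join(out)


def capitalize_sentence_starts(tokens):
    """Capitalize the first word of the text and of every sentence that
    follows a token ending in '.', '!' or '?'."""
    out = []
    at_start = True
    for tok in tokens:
        out.append(capitalize_first_letter(tok) if at_start else tok)
        at_start = tok.endswith(SENTENCE_END)
    return out


tokens, labels = extract_caps("O Brasil e a ONU")
print(restore_caps(tokens, labels))
print(" ".join(capitalize_sentence_starts("¿cómo estás? estoy bien.".split())))
```

Treating punctuation as part of the token keeps the sketch short; a real module would work on the analysed stream of the pipeline rather than on whitespace-separated words.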

Work plan

Community bonding period (May 1 - 26):

  • Get to know Apertium's tools and community better
  • Study how capitalization is currently implemented
  • Learn more about XML tools
  • Identify issues with the current implementation

Work Period (May 27 - August 26)

Week 1 (May 27 - June 02):

  • Introducing caps-restoration for the language pair es-pt.
  • Gaining a better understanding of the capitalization rules.

Week 2 (June 03-09):

  • Implementing the existing rules.
  • Adding more rules targeting common scenarios.

Week 3 (June 10-16):

  • Continuing to create new capitalization rules.
  • Beginning to transition capitalization to monolingual modules.

Week 4 (June 17-23):

  • Creating tests.
  • Correcting errors.
  • Updating documentation.
  • Deliverable #1

Week 5 (June 24-30):

Week 6 (July 01-07):

Week 7 (July 12-18):

Week 8 (July 19-25):

  • Deliverable #2

Weeks 9-10 (July 26 - August 07):

Weeks 11-12 (August 08 - 19):


Project completed

Other Summer Plans

I will have final exams in the last two weeks of June, so I will be able to dedicate 10 hours per week during that period. For the rest of the period, I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.