User:Pedromanic/Gsoc2024Proposal

From Apertium

Latest revision as of 04:35, 2 April 2024

Contact Information


Background

A fourth-year computer engineering student at the University of São Paulo, Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basic)

Skills

  • C
  • C++
  • XML
  • Python
  • Linux

Why am I interested in Apertium?

  • Apertium has an active community, with people willing to help when you run into difficulties.
  • I want to learn about and contribute to open-source projects, and understand how they work.
  • Apertium is an NLP (natural language processing) community, and I specialized in this field during my undergraduate studies.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.

Proposal

Deliverables

Why should Google and Apertium sponsor it?

Correct capitalization will improve the experience of millions of people who use machine translation (MT). The project will also benefit the community, since the rules could potentially be adapted to other language pairs. In addition, by sponsoring it, Google would be contributing to open-source development.

How and who will it benefit in society?

Coding Challenge

I implemented modules for extracting and restoring capitalization for the es-pt language pair. The code can be found here (https://github.com/PedroManicardi/capitalization-restoration). The program includes the capitalization handling module, which capitalizes the first word of each sentence.
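The general idea of extracting and restoring capitalization can be illustrated with a small Python sketch. It is not the code from the linked repository: the function names (`extract_case`, `restore_case`, `capitalize_sentences`), the three-way case classification and the sentence-boundary regex are hypothetical choices made for illustration, and the sketch works on plain text rather than on Apertium's stream format.

```python
"""Hypothetical sketch of capitalization extraction and restoration.

Not the code from the linked repository; names and the three-way case
classification ('upper', 'title', 'lower') are illustrative assumptions.
"""
import re


def extract_case(word: str) -> str:
    """Classify a source word's capitalization as 'upper', 'title' or 'lower'."""
    # Single letters such as "A" count as title case, not all-caps.
    if len(word) > 1 and word.isupper():
        return "upper"
    if word[:1].isupper():
        return "title"
    return "lower"


def restore_case(word: str, pattern: str) -> str:
    """Re-apply a pattern recorded by extract_case to a translated word."""
    if pattern == "upper":
        return word.upper()
    if pattern == "title":
        return word[:1].upper() + word[1:]
    return word


# Assumed sentence start: beginning of the text, or '.', '!' or '?' followed
# by whitespace. Spanish opening marks (¡, ¿), « and ( may precede the word.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([¡¿«(]*)(\w)")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first character of the first word of every sentence."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text
    )
```

For example, `capitalize_sentences("ela chegou. ¿quién es?")` returns `"Ela chegou. ¿Quién es?"`, and `restore_case("onu", extract_case("ONU"))` returns `"ONU"`. A module running inside an Apertium pipeline would also have to deal with abbreviations and the stream format's markup, which this sketch ignores.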

Work plan

Community bonding period (May 1 - 26):

  • Get to know Apertium's tools and its community better
  • Study how capitalization is currently implemented
  • Learn more about XML tools
  • Identify issues with the current implementation

Work Period (May 27 - August 26)

Week 1 (May 27 - June 02):

  • Introducing capitalization restoration for the es-pt language pair.
  • Gaining a better understanding of the capitalization rules.

Week 2 (June 03-09):

  • Implementing the existing rules.
  • Adding more rules targeting common scenarios.

Week 3 (June 10-16):

  • Continuing to create new capitalization rules.
  • Beginning to move capitalization handling into the monolingual modules.

Week 4 (June 17-23):

  • Creating tests.
  • Correcting errors.
  • Updating documentation.
  • Deliverable #1

Week 5 (June 24-30):

Week 6 (July 01-07):

Week 7 (July 12-18):

Week 8 (July 19-25):

  • Deliverable #2

Weeks 9-10 (July 26 - August 07):

Weeks 11-12 (August 08 - 19):


Project completed

Other Summer Plans

I will have final exams during the last two weeks of June, so I will be able to dedicate only 10 hours per week during that period. For the rest of May and June, I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.