User:Pedromanic/Gsoc2024Proposal
Latest revision as of 04:35, 2 April 2024
Contents
- 1 Contact Information
 - 2 Background
 - 3 Skills
 - 4 Why am I interested in Apertium?
 - 5 Which of the published tasks am I interested in? What do I plan to do?
 - 6 Proposal
 - 7 Why should Google and Apertium sponsor it?
 - 8 How and who will it benefit in society?
 - 9 Coding Challenge
 - 10 Work plan
 - 11 Other Summer Plans
 
Contact Information
- Name: Pedro Manicardi Soares
 - Email: pedromanic@usp.br
 - Github: https://github.com/PedroManicardi
 - IRC: pedro_manic
 - Timezone: UTC-2
 
Background
I am a fourth-year computer engineering student at the University of São Paulo, Brazil.
Native Language: Portuguese
Other Languages: English (advanced), Spanish (basic)
Skills
- C
- C++
- XML
- Python
- Linux
 
Why am I interested in Apertium?
- Apertium has an active community whose members help when you run into difficulties.
- I want to learn about and contribute to open-source projects and understand how they work.
- Apertium is an NLP community, and I specialized in this field during my undergraduate studies.
 
Which of the published tasks am I interested in? What do I plan to do?
I am interested in the task "Add Capitalization Handling Module to a Language Pair". My goal is to add capitalization rules to the Spanish-Portuguese (es-pt) pair.
Proposal
Deliverables
Why should Google and Apertium sponsor it?
Correct capitalization will improve the experience of millions of people who use MT. This project will contribute to the community, and its rules could potentially be reused for other language pairs. Furthermore, by sponsoring it, Google will contribute to open-source development.
How and who will it benefit in society?
Coding Challenge
I implemented modules for extracting and restoring capitalization for the es-pt language pair. The code can be found at https://github.com/PedroManicardi/capitalization-restoration. The program includes the capitalization handling module, which sets the first word of each sentence to uppercase.
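The idea of extracting and restoring capitalization can be illustrated with a minimal Python sketch. This is not the code from the repository above; the function names (`extract_case`, `apply_case`, `capitalize_sentences`) and the three case labels are assumptions chosen for illustration, and the sentence-boundary rule is deliberately simple.

```python
import re


def extract_case(word: str) -> str:
    """Classify a word's capitalization as 'upper', 'title', or 'lower'.

    Hypothetical helper: the labels are illustrative only.
    """
    if len(word) > 1 and word.isupper():
        return "upper"
    if word[:1].isupper():
        return "title"
    return "lower"


def apply_case(word: str, case: str) -> str:
    """Restore a previously extracted capitalization pattern on a word."""
    if case == "upper":
        return word.upper()
    if case == "title":
        return word[:1].upper() + word[1:]
    return word


# A sentence is assumed to start at the beginning of the text or after
# '.', '!' or '?' followed by whitespace; inverted Spanish punctuation
# ('¿', '¡') in front of the first word is skipped.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([¿¡]*)(\w)")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the first word of each sentence."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text
    )


if __name__ == "__main__":
    # Carry the case of a source word over to its lowercased translation.
    print(apply_case("casa", extract_case("Casa")))
    print(capitalize_sentences("hola. ¿cómo estás? bien."))
```

Separating extraction from restoration means the case pattern can be carried alongside a word while it is translated in lowercase and then reapplied to the output. A real module would also need to handle abbreviations, proper nouns, and other cases that this sketch ignores.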
Work plan
Community bonding period (May 1 - 26):
- Get to know Apertium's tools and community better
- See how capitalization handling is currently implemented
- Learn more about XML tools
- Identify issues with the current implementation
 
Work Period (May 27 - August 26)
Week 1 (May 27 - June 02):
- Introducing caps-restoration for the es-pt language pair.
- Gaining a better understanding of the capitalization rules.
 
Week 2 (June 03-09):
- Implementing the existing rules.
 - Adding more rules targeting common scenarios.
 
Week 3 (June 10-16):
- Continuing to create new capitalization rules.
 - Beginning to transition capitalization to monolingual modules.
 
Week 4 (June 17-23):
- Creating tests.
 - Correcting errors.
 - Updating documentation.
 - Deliverable #1
 
Week 5 (June 24-30):
Week 6 (July 01-07):
Week 7 (July 12-18):
Week 8 (July 19-25):
- Deliverable #2
 
Weeks 9 and 10 (July 26 - August 07):
Weeks 11 and 12 (August 08 - 19):
Project completed
Other Summer Plans
I will have final exams in the last two weeks of June, so I will be able to dedicate 10 hours per week during that period. For the remaining weeks before my vacation I can dedicate 30 hours per week. I will be on vacation in July and August, so I can devote 40 hours per week then.