Difference between revisions of "User:Jalopeura/GSOC2010Application"

From Apertium
Latest revision as of 16:58, 9 April 2010

Name

Sean Healy

E-mail address

Gmail: sean.max

Hotmail: jalopeura

Other information that may be useful to contact you

IRC: SeanH

Why is it you are interested in machine translation?

I recently went back to school after eight years as a professional programmer. I decided to combine my professional abilities with the interest I've always had in languages and go into Natural Language Processing. I am currently in the first year of my Master's degree program.

Why is it that you are interested in the Apertium project?

I am interested in seeing how people outside my own academic program are doing machine translation. Because Apertium is open source, I can work on it without being a student at a particular university or an employee of a particular company.

Which of the published tasks are you interested in? What do you plan to do?

Title

French-Portuguese language pair for Apertium

Why Google and Apertium should sponsor it

A new language pair is always good for Apertium's visibility; as one Apertium contributor put it, language pairs are Apertium's "bread and butter". French is the third most widely spoken Romance language, after Spanish and Portuguese, so among the Romance languages, pairs involving French would seem to be the next logical target for Apertium.

How and who will it benefit in society

French is the third most widely used language in the European Union, after English and German (http://wapedia.mobi/en/Languages_of_the_European_Union?t=3.). Simply put, more information is available in French than in Portuguese in the EU. An open-source machine translation system from French to Portuguese would be helpful for Portuguese speakers.

Work plan

Community bonding period (26.04-23.05): Familiarize myself with the Apertium dictionary and transfer rule formats

  • Week 1 (24.05-30.05): Generate dictionaries using crossdics
  • Week 2 (31.05-06.06): Verify 100% coverage for closed categories and inflection paradigms and 80% coverage otherwise in French monolingual dictionary
  • Week 3 (07.06-13.06): Verify 100% coverage for closed categories and inflection paradigms and 80% coverage otherwise in Portuguese monolingual dictionary
  • Week 4 (14.06-20.06): Verify all words from monolingual dictionaries are present in bilingual dictionary (using testvoc); copy transfer rules from Spanish-French as a starting point.
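The coverage targets in Weeks 2 and 3 can be measured as the share of corpus tokens that the morphological analyzer recognizes. The following is a minimal sketch under several assumptions. The analyzer output is taken to be in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked as `^word/*word$`. The function name `naive_coverage` is hypothetical. The regular expression also ignores escaped characters and superblanks.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: the text contains no escaped "^", "$" or "/" characters.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Return the fraction of lexical units that received an analysis.

    Unknown words are assumed to carry a '*' right after the first slash,
    e.g. ^xyzzy/*xyzzy$.
    """
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # Split off the surface form; the remainder holds the analyses.
        _surface, _, analyses = unit.partition("/")
        if not analyses.startswith("*"):
            known += 1
    return known / len(units)


sample = "^Le/le<det><def><m><sg>$ ^chat/chat<n><m><sg>$ ^xyzzy/*xyzzy$"
print(f"{naive_coverage(sample):.2%}")  # 66.67%
```

This counts tokens rather than distinct words, so frequent words weigh more. That is usually what a coverage target over running text means, but a per-type count would be a small change.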

Deliverable #1: Dictionaries and first ("Spanishesque") version of translator

  • Week 5 (21.06-27.06): Transfer rules
  • Week 6 (28.06-04.07): Transfer rules
  • Week 7 (05.07-11.07): Transfer rules
  • Week 8 (12.07-18.07): Transfer rules

Deliverable #2: Second version of translator

  • Week 9 (19.07-25.07): Test on large blocks of text; debug rules and dictionaries, add entries as necessary
  • Week 10 (26.07-01.08): Continuation of testing
  • Week 11 (02.08-08.08): Generate statistics (correction rates); documentation
  • Week 12 (09.08-15.08): Final evaluation
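For the Week 11 statistics, one common way to express a correction rate is word error rate (WER). WER is the word-level edit distance between the raw machine translation and a post-edited version of it, divided by the length of the post-edited text. The sketch below assumes that definition and uses a hypothetical function name, `word_error_rate`. It is a plain Python implementation, not Apertium's own evaluation tooling.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the first i-1 hypothesis words
    # and the first j reference words (starts with i-1 = 0).
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


# MT output vs. post-edited text: one missing word out of five.
print(word_error_rate("o gato come peixe", "o gato come o peixe"))  # 0.2
```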


List your skills and give evidence of your qualifications

I have the following language skills relevant to my project idea:

French: Minored in it, good explicit knowledge of grammar, but until recently not much practice in speaking it with native speakers. However, I have been studying in France for the last six months and steadily improving.

Portuguese (Brazilian): Lived for three years with a Brazilian roommate while taking Portuguese classes; we spoke mostly Portuguese in the apartment.

As far as programming goes, I know both Perl and PHP. I was a professional programmer for eight years before returning to school and have experience with other technologies, but these two seem the most relevant to the project.

I have contributed to multiple Perl modules, both through mailing list discussions and through code. I have also been following the development of the Haiku operating system; I have not yet contributed any code to Haiku, but I have written programs for it.

List any non-Summer-of-Code plans you have for the Summer

I have a large due June 2, and I must present it at the end of June, so I will have other obligations during Weeks 1, 2 and 6. I foresee no difficulties in finding 30 hours during Weeks 2 and 6, but during Week 1 I may be unable to spend a full 30 hours on this project. I have no other outside constraints on my time during the 12 weeks of GSOC 2010.