Difference between revisions of "User:Jalopeura/GSOC2010Application"

Revision as of 08:01, 8 April 2010

In progress -- NOT a final draft!

Name

Sean Healy

E-mail address

Gmail: sean.max

Hotmail: jalopeura

Other information that may be useful to contact you

IRC: SeanH

Why is it you are interested in machine translation?

I recently went back to school after eight years as a professional programmer. I decided to combine my professional abilities with the interest I've always had in languages and go into Natural Language Processing. I am currently in the first year of my Master's degree program.

Why is it that you are interested in the Apertium project?

I am interested in seeing how people outside of my particular academic program are doing Machine Translation. Apertium's open-source nature means I can work on it without being a student at a particular university or an employee of a particular company.

Which of the published tasks are you interested in? What do you plan to do?

Title

French-Portuguese language pair for Apertium

Why Google and Apertium should sponsor it

A new language pair is always good for Apertium's visibility; as one Apertium contributor put it, language pairs are Apertium's "bread and butter". French is the third most widely spoken Romance language, after Spanish and Portuguese. As such, within the domain of Romance languages, pairings with French would seem to be the next logical target for Apertium.

How and who will it benefit in society

French is the third most widely used language in the European Union, after English and German (http://wapedia.mobi/en/Languages_of_the_European_Union?t=3.). Simply put, more information is available in French than in Portuguese within the EU, so an open-source machine translation system from French to Portuguese would help Portuguese speakers access it.

Work plan

Tasks to complete:

Convert bilingual dictionary to Apertium format
Create French monolingual dictionary from existing pairs
  Add words from bilingual dictionary not already present
  Verify coverage
Create Portuguese monolingual dictionary from existing pairs
  Add words from bilingual dictionary not already present
  Verify coverage
Create transfer rules
Test/debug

Deliverables: Dictionaries for this pair

Final deliverable: Functioning language pair

List your skills and give evidence of your qualifications

I have the following other language skills appropriate to my project idea:

French: Minored in it, good explicit knowledge of grammar, but until recently not much practice in speaking it with native speakers. However, I have been studying in France for the last six months and steadily improving.

Portuguese (Brazilian): Lived for three years with a Brazilian roommate while taking Portuguese classes; we spoke mostly Portuguese in the apartment.

As for programming, I know both Perl and PHP. I was a professional programmer for eight years before returning to school, and have experience with additional technologies, but these seem the most relevant to the project.

I have contributed to multiple Perl modules, both through mailing list discussions and through code contributions. I have also been following the development of the Haiku operating system. I have not yet contributed any code to Haiku itself, but I have done programming in the OS.

List any non-Summer-of-Code plans you have for the Summer

I have a large due June 2, and I must present it at the end of June, so I will have other obligations during Weeks 1, 2 and 6. I foresee no difficulties in finding 30 hours during Weeks 2 and 6, but during Week 1 I may be unable to spend a full 30 hours on this project. I have no other outside constraints on my time during the 12 weeks of GSOC 2010.