Difference between revisions of "User talk:Rlopez/Application GSoC-2014"
Revision as of 10:49, 20 March 2014
Contents
- 1 Contact information
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in the Apertium project?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 List your skills and give evidence of your qualifications
- 6 List any non-Summer-of-Code plans you have for the Summer
- 7 About me
Contact information
Name: Roque Enrique López Condori
Email: rlopezc27@gmail.com
IRC: Roque
Personal page: http://maskaygroup.com/rlopez/
Github repo: https://github.com/rlopezc27
Assembla repo: https://www.assembla.com/profile/roque27
Why is it you are interested in machine translation?
I am a master's student majoring in Natural Language Processing, and I enjoy many tasks in this area. Machine translation is a task with various challenges. Nowadays, with the growth of the internet, it is very common to find non-standard text containing spelling mistakes, internet abbreviations, etc. Such text is a big challenge for machine translation. I am very interested in obtaining better translations through good preprocessing of this kind of text.
Why is it that you are interested in the Apertium project?
I think Apertium is one of the most important open-source NLP projects. I am very interested in NLP and have worked on several projects in this area. Unfortunately, I have not yet had the opportunity to contribute to any open-source project, and I think Apertium is the right place to start. In addition, Apertium has many tasks that would be great to work on. I would really like to participate in this organization because I have a strong desire to contribute to the open-source community.
Which of the published tasks are you interested in? What do you plan to do?
I am interested in the “Improving support for non-standard text input” task. I plan to work with English, Spanish and Portuguese.
Description
Why should Google and Apertium sponsor this project?
Machine translation systems are fragile when working with non-standard input such as spelling mistakes and internet abbreviations. This kind of input reduces machine translation performance. Currently Apertium has no preprocessing module to normalize this type of non-standard input. With a good text normalization module, Apertium could significantly increase its translation quality and produce more human-like translations. This module would also be helpful for other NLP tasks.
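To make the idea of normalizing non-standard input concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the module proposed in this application or anything that exists in Apertium: the abbreviation table, the function names and the "collapse three or more repeated characters" heuristic are all hypothetical choices.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration);
# a real module would load per-language resources instead.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "pls": "please",
    "thx": "thanks",
}

# Matches a character repeated three or more times, e.g. the "oooo" in "soooo".
REPEATS = re.compile(r"(.)\1{2,}")


def squeeze_repeats(token: str) -> str:
    """Collapse each run of 3+ identical characters to a single character."""
    return REPEATS.sub(r"\1", token)


def normalize(text: str) -> str:
    """Normalize whitespace-separated tokens; punctuation handling is omitted."""
    out = []
    for token in text.split():
        squeezed = squeeze_repeats(token)
        # Look up the lowercased token; keep the token unchanged if unknown.
        out.append(ABBREVIATIONS.get(squeezed.lower(), squeezed))
    return " ".join(out)


print(normalize("thx u r soooo gr8"))
```

The heuristic is deliberately crude: collapsing a run to one character turns "coool" into "col", so a realistic module would need a dictionary or language model to choose between candidates.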
How will it benefit society, and who will benefit?
English, Spanish and Portuguese are among the most widely spoken languages. Implementing this module will improve translations and benefit Apertium users who are learning these languages.
Work Plan
Coding Challenge
I have finished the Coding Challenge and, in addition to English, I added support for two more languages (Spanish and Portuguese). All the instructions and explanations are in my GitHub repo.
Community Bonding Period
- Familiarization with the Apertium tools and community.
- Find and analyze language resources for English, Spanish and Portuguese.
- Make the preparations needed for the implementation.
Week Plan
GSoC Week | Tasks |
---|---|
Week 1 | A |
Week 2 | A |
Week 3 | A |
Week 4 | A |
First Deliverable | |
Week 5 | A |
Week 6 | A |
Week 7 | A |
Week 8 | A |
Second Deliverable | |
Week 9 | A |
Week 10 | A |
Week 11 | A |
Week 12 | A |
Finalization | |
List your skills and give evidence of your qualifications
I am currently a second-year master's student majoring in Natural Language Processing. After my graduation I worked for a little over two years on three NLP projects that are directly related to this project. Through my jobs and studies I have gained the following skills:
Programming Skills: During my undergraduate studies, I took courses on Python, Java and C++ programming. I finished my undergraduate degree ranked in the top 5. In my second and third jobs I used Python as my main programming language. In my master's, I use Python even more frequently. Some of my work is in my GitHub repo.
Natural Language Processing: I have worked on three NLP projects. The main topics include text processing, sentiment analysis, text classification and summarization.
Spanish, Portuguese and English: I am a native Spanish speaker. I have been living in Brazil for over a year, which has improved my Portuguese. As for English, I studied it for two years.
I develop all of my own software under free licenses and make an effort to work in groups as often as possible. Unfortunately, however, I can't claim much experience with open-source projects.
List any non-Summer-of-Code plans you have for the Summer
Between May 19 and August 18, my activities will be focused mainly on the GSoC project and on progress in my master's work. This year I have no courses in my master's program, so my tasks are related to research activities. I do not intend to travel; I will stay in São Paulo, Brazil.
About me
I studied at San Agustin National University in Perú, where I earned my BA (Hons) degree in System Engineering. After my studies I worked for two years. In the first year, I worked on a research project on automatic summarization of medical records; as a result of this work, I published some papers on medical record classification. In the second year I was a member of the Lindexa startup (a Natural Language Processing startup), which was one of 10 startups that won the Wayra-Peru 2011 competition.
From Perú I moved to Brazil, where I am doing my master's in Computer Science at São Paulo University (http://www.nilc.icmc.usp.br/nilc/index.php). My research topic is opinion summarization. Last year in Brazil, I worked at the DicionarioCriativo startup, an online dictionary that relates words, concepts, phrases, quotes, images and other content by semantic field.
This is my first year applying to GSoC. I hope to be part of your team this summer.