Latest revision as of 00:36, 13 April 2012
Google Summer of Code 2012 Application - adopting a new language pair fr-ro
Contact information
Name: Alexandru Blanda
Email: blanda.alexandru@gmail.com
Alternative Email: ioan.blanda@cti.pub.ro
IRC: blanda (on #apertium)
Phone: +40 745628343
Why are you interested in machine translation?
As a Computer Science student, I believe that Artificial Intelligence is one of the most intriguing paths one can take, and I consider machine translation to be one of its most important applications. My interest also stems from my final-year thesis, which I will write next year on a project related to natural language processing and pattern recognition. Taking part in the development of a machine translation system would therefore be a valuable and highly useful experience.
Why are you interested in the Apertium project?
I was first drawn to the topic itself: programming and linguistics make a great combination. As stated above, I have a special interest in natural language processing, and before entering university I took part in several extracurricular activities related to linguistics. I have also found that Apertium has comprehensive, well-maintained documentation that is of great help, and that the community responds quickly to any problems I run into. These aspects made Apertium my choice for applying to GSoC 2012.
Which of the published tasks are you interested in?
Adopting an orphaned language pair: French-Romanian (fr-ro)
Why should Google and Apertium sponsor it?
In my opinion, the language pair I would like to develop would be a valuable addition to Apertium's set of language pairs. Since some work has already been done on the pair, the project starts with a major advantage, which increases its chances of success. Moreover, the selected language pair is not well represented in other free machine translation systems (see resources [1] and [2] for the systems considered), so I believe that a fr-ro pair would be a very useful release.
Who will it benefit, and how?
Because I intend to write thorough documentation, I believe the language pair could serve as a tutorial for developing other language pairs inside the Apertium project, such as Italian-Romanian or Aromanian-Romanian. I also believe that this project will offer a valuable free educational and cultural tool, in the spirit of open source and of the Apertium organization.
What do you plan to do?
I plan to work on several aspects during the project:
- clean up and repair the existing code (tags, paradigm definitions, transfer rules);
- add entries to the bilingual dictionary, as well as to the Romanian and French monolingual dictionaries;
- create scripts for multiple purposes, such as adding data and testing;
- work on transfer rules;
- solve disambiguation problems;
- produce comprehensive documentation, so that the language pair can be easily maintained.
Work already done
- installed Apertium and prepared the environment needed for developing an Apertium project;
- familiarized myself with the Apertium system by working on the coding challenge;
- got to know the community and its communication channels;
- read part of the available documentation.
Proposed schedule
Before the coding period
- practice working with the Apertium system;
- stay in touch with the community in order to find the best solutions to emerging questions and problems;
- search for online resources on both languages;
- design and try to implement auxiliary tools and scripts that would be useful to the project;
During the coding period
Weeks 1-2 (21.05-03.06):
- investigate the monolingual dictionaries and the bilingual dictionary
- clean up and repair problems in the dictionaries, both in the entries and in the dictionary structure (tags, paradigm definitions)
- enhance the structure with the needed information
- write scripts for testing entries in the monolingual dictionaries
- documentation: comments on data files and scripts produced
Deliverable 1: Improved monolingual and bilingual dictionaries
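The monolingual dictionary testing scripts planned for weeks 1-2 could start from something like the following minimal Python sketch. It assumes the analyser output is in the Apertium stream format (`^surface/analysis$`, with words the analyser does not know written as `^xyz/*xyz$`) and reports the share of lexical units that received an analysis, along with the surface forms that did not. The function name `coverage_report` is hypothetical, and the regular expression deliberately ignores escaped `^` and `$` characters for simplicity.

```python
import re
from collections import Counter

# A lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (escaped \^ and \$ inside units are not handled in this sketch).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage_report(stream: str) -> tuple[float, Counter]:
    """Return the share of analysed lexical units and the unknown surface forms."""
    known = 0
    unknown: Counter = Counter()
    for unit in LEXICAL_UNIT.findall(stream):
        surface, _, analyses = unit.partition("/")
        # The analyser marks words it cannot analyse with '*', e.g. ^xyz/*xyz$
        if analyses.startswith("*"):
            unknown[surface] += 1
        else:
            known += 1
    total = known + sum(unknown.values())
    return (known / total if total else 0.0), unknown


if __name__ == "__main__":
    sample = "^le/le<det><def><m><sg>$ ^chat/chat<n><m><sg>$ ^xyz/*xyz$^./.<sent>$"
    ratio, missing = coverage_report(sample)
    print(f"coverage: {ratio:.0%}", dict(missing))
```

In practice the input would be a French or Romanian corpus passed through the pair's analyser with `lt-proc`; the name of the compiled analyser file depends on the pair's build setup and is not assumed here. Sorting the unknown forms by count gives a natural priority list for new dictionary entries.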
Weeks 3-5 (04.06-24.06):
- add new entries to the monolingual ro dictionary
- add new entries to the monolingual fr dictionary
- add new entries to the bilingual dictionary
- write scripts for adding data
- write scripts for testing translations in both directions (fr-ro and ro-fr)
- begin work on transfer rules
- documentation: comments on data files and scripts produced
Deliverable 2: Complete monolingual dictionaries and bilingual dictionary (in time for the midterm evaluation)
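For the data-adding scripts planned for weeks 3-5, a hypothetical helper could turn a tab-separated word list into bilingual dictionary entries. The sketch below assumes the usual `.dix` entry layout, for example `<e><p><l>chat<s n="n"/></l><r>pisică<s n="n"/></r></p></e>`, with `<b/>` standing for blanks inside multiword lemmas, and writes only a single part-of-speech tag per entry. Real fr-ro entries will often need more tags (for example gender), so the names `bidix_entry` and `entries_from_tsv` and the input format are illustrative assumptions.

```python
from xml.sax.saxutils import escape


def bidix_entry(fr_lemma: str, ro_lemma: str, pos: str) -> str:
    """Build one <e> element for the fr-ro bilingual dictionary.

    Only a single part-of-speech tag is written; gender and other tags
    would have to be added for real entries.
    """
    def side(lemma: str) -> str:
        # Escape XML special characters, then write blanks in multiwords as <b/>.
        return escape(lemma).replace(" ", "<b/>") + f'<s n="{pos}"/>'

    return f"<e><p><l>{side(fr_lemma)}</l><r>{side(ro_lemma)}</r></p></e>"


def entries_from_tsv(lines):
    """Yield entries from lines formatted as 'fr_lemma<TAB>ro_lemma<TAB>pos'."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fr_lemma, ro_lemma, pos = line.split("\t")
        yield bidix_entry(fr_lemma, ro_lemma, pos)


if __name__ == "__main__":
    word_list = ["# fr\tro\tpos", "chat\tpisică\tn", "pomme de terre\tcartof\tn"]
    for entry in entries_from_tsv(word_list):
        print(entry)
```

Generating entries from a reviewed word list keeps the XML consistent, but each generated entry should still be checked by hand (and run through the monolingual tests) before it is committed.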
Weeks 6-9 (25.06-22.07):
- investigate the existing transfer rule files (search for possible errors or improvements)
- identify and write the needed rules
- focus on frequency: handle the most frequent constructions first
- structural transfer (insertion, deletion, substitution, reordering)
- advanced structural transfer
- identify types of disambiguation problems in both languages
- solve disambiguation problems
Deliverable 3: Updated transfer rule files
Weeks 10-12 (23.07-12.08):
- manual and automated testing of the system
- run generation tests
- run vocabulary tests (testvoc)
- solve problems that may arise after testing
- write documentation: readme files and logs
Deliverable 4: Release-quality fr-ro language pair and documentation (in time for the suggested "pencils down" date)
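The vocabulary tests in weeks 10-12 look for the error markers Apertium places in its output: `*` for a word unknown to the analyser, `@` for a word missing from the bilingual dictionary, and `#` for a form the generator could not produce. The following is a simplified Python sketch of such a check, not Apertium's own testvoc scripts; the function name `classify_errors` is hypothetical, and it assumes the markers only appear at the start of whitespace-separated tokens.

```python
import re

# Apertium output markers: '*' unknown word, '@' missing from the bilingual
# dictionary, '#' generation failure.
MARKERS = {"*": "unknown", "@": "bidix", "#": "generation"}

# A marker at the start of a whitespace-separated token, followed by the word.
MARKED_TOKEN = re.compile(r"(?<!\S)([*@#])(\S+)")


def classify_errors(output: str) -> dict[str, list[str]]:
    """Group marked words in translated text by error type."""
    errors = {kind: [] for kind in MARKERS.values()}
    for marker, word in MARKED_TOKEN.findall(output):
        errors[MARKERS[marker]].append(word)
    return errors


if __name__ == "__main__":
    sample = "Pisica @manger #mânca *xyz ."
    for kind, words in classify_errors(sample).items():
        print(kind, words)
```

Grouping by marker type points each failure to the file that needs fixing: `@` entries to the bilingual dictionary, `#` entries to the target-language generator, and `*` entries to the source-language analyser.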
Week 13 (13.08-19.08):
- improve the documentation and make last-minute modifications to the code
- evaluate the resulting language pair
Final submission
Bio
I am a third-year undergraduate student in Computer Science at the Faculty of Automatic Control and Computers, Polytechnic University of Bucharest.
I have mainly worked with C/C++, Java and Python, and I also have basic knowledge of PHP, JavaScript, XML, HTML and CSS. I have also worked in Octave (MATLAB) and with some functional and rule-based programming languages (Haskell, CLIPS, Scheme). On topics more closely related to machine translation, I took courses on formal languages, finite-state machines and automata theory; as an example application, I implemented a parser using flex.
I have not previously worked on an established open-source project. However, I am very familiar with the open-source philosophy, mainly because I have worked mostly with open-source tools and technologies, and also because my university actively encourages open-source activities. In addition, I started an open-source project for a contest, which is still at a very early stage (a brief description is available here: http://ceata.org/proiecte/get-involved/wiki).
Non-GSoC activities
GSoC 2012 will be my primary commitment this summer. My only non-GSoC activity during the coding period will be my final exams, which end on 30 May (a little over a week of overlap: 21-30 May). However, I am confident that I will be able to dedicate at least 35 hours per week to the Apertium project during the summer.
Resources
[1] http://en.wikipedia.org/wiki/Machine_translation#Applications
[2] http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications
Coding challenge
https://apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/dutzy/