User:Dpenas/GSOC 2013 Application - Plain-text formats for application data

From Apertium

Revision as of 19:04, 30 April 2013

Contact information

Name: Darío Penas Sabín

E-mail address: dario.penas[at]udc.es

More private contact information can be provided to the mentor.

Why is it that you are interested in machine translation?

I’m a computer engineering student, and programming is one of the activities I have always enjoyed the most. Last year I was involved in a natural language processing project using tools such as NLTK and, even though it’s a really demanding area of study, I had a lot of fun learning new things.

Besides, I’m from Galicia, an autonomous community in the northwest of Spain with its own language, so I understand the problems and limitations that less widely spoken languages face.

Why is it that you are interested in the Apertium project?

I’m really impressed that an open source project like Apertium has put so much effort into providing its services for almost every language it can, including those with few speakers. It’s inspiring to know that there will always be a group of people working hard on “smaller” languages when big companies and even institutions don’t care about them. This, together with the reason I gave above, made Apertium an attractive option for getting involved in Google Summer of Code.

Also, I’ve been an active free software user for over five years, and I would love to participate in an open source project like this one and contribute to the community.

Which of the published tasks are you interested in? What do you plan to do?

I’m planning to do the proposed idea: “Plain-text formats for Apertium data”.

Work that I have already done

I’ve installed and used Apertium to get familiar with how it works. I’ve also joined the IRC channel and subscribed to the mailing list to get to know the mentors and the community better. I’ve read the papers on interNostrum [1] and MorphTrans [2], and I’ve learned XSLT and written some small examples to practise with it.

During this process I had some questions about how the system works, so I contacted the mentor Mikel Forcada, who gave me useful information.

Work plan

- Week 1-3: Since this will probably be the most difficult part, I'll use some of the time before the coding period to get familiar with the file formats and start writing the compiler. By the end of week 3 it should be able to transform MorphTrans-style .mt1, .mt2 and .mt3 input into XML.

- Week 3-4: Compiler that will do the reverse: take the XML and generate MorphTrans-style input.

- Deliverable #1: The compiler previously mentioned.

- Week 5: Implement the changes agreed on with the Apertium community in week 4.

- Week 6: Compiler that will convert .dix files into the interNostrum-style format (the second task).

- Week 7-8: Finish that compiler and start on the reverse direction: converting the interNostrum-style format into .dix.

- Deliverable #2: The completed first compiler (from .dix into the interNostrum-style format) and a "working beta" of the reverse one.

- Week 9-10: Fix the errors in the "beta" compiler and finish it by the end of week 10.

- Week 11-12: Finishing the documentation, final testing, etc.
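
To illustrate the kind of round-trip conversion planned for weeks 6-10, here is a minimal Python sketch. It rests on several assumptions: the one-line syntax `lemma<tag> : lemma<tag>` is made up for illustration and is not the actual interNostrum format; only simple bilingual entries (`<e><p><l>...</l><r>...</r></p></e>` with lemma text followed by `<s n="..."/>` tags) are handled, so paradigms, `<b/>`, `<sdefs>` and restrictions are ignored; and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text syntax for one bilingual entry: "cat<n> : gato<n>"
SEPARATOR = " : "
SIDE_RE = re.compile(r"^([^<>]*)((?:<[^<>]+>)*)$")

SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>cat<s n="n"/></l><r>gato<s n="n"/></r></p></e>
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/><s n="f"/></r></p></e>
  </section>
</dictionary>"""


def side_to_text(side):
    """Render an <l> or <r> element as 'lemma<tag1><tag2>' (simple entries only)."""
    tags = "".join("<%s>" % s.get("n") for s in side.findall("s"))
    return (side.text or "") + tags


def text_to_side(name, text):
    """Parse 'lemma<tag1><tag2>' back into an <l> or <r> element."""
    match = SIDE_RE.match(text)
    if match is None:
        raise ValueError("cannot parse entry side: %r" % text)
    side = ET.Element(name)
    side.text = match.group(1)
    for tag in re.findall(r"<([^<>]+)>", match.group(2)):
        ET.SubElement(side, "s", n=tag)
    return side


def dix_to_text(dix_source):
    """Turn every <p> pair of a bilingual .dix string into one text line."""
    root = ET.fromstring(dix_source)
    lines = []
    for pair in root.iter("p"):
        left = side_to_text(pair.find("l"))
        right = side_to_text(pair.find("r"))
        lines.append(left + SEPARATOR + right)
    return lines


def text_to_dix(lines):
    """Rebuild a minimal .dix <dictionary> string from lines made by dix_to_text."""
    root = ET.Element("dictionary")
    section = ET.SubElement(root, "section", id="main", type="standard")
    for line in lines:
        left, right = line.split(SEPARATOR)
        pair = ET.SubElement(ET.SubElement(section, "e"), "p")
        pair.append(text_to_side("l", left))
        pair.append(text_to_side("r", right))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for line in dix_to_text(SAMPLE_DIX):
        print(line)
```

Keeping both directions in one module makes the round trip easy to test: converting a dictionary to text and back should reproduce the same entries.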

Extra tasks:

- If I finish the two tasks sooner than expected, I would be glad to start working on, or helping with, another area of the project.

Skills and qualifications

I’m in the 4th year of a computer engineering degree. I’m comfortable programming in C, Java and Python, as well as using Bison and Flex. I’ve also done some work in Pascal, Fortran, OCaml, Matlab and Coq. This year I took a course on natural language, and together with some friends I developed a program that, given a simple question, looks it up on Google, evaluates the web results and obtains the (more or less) correct answer(s) [3]. We've talked about developing the idea further in the future to obtain better results.

I've also worked with compilers and written some programs using Flex/Bison, which can be found here [4] and here [5].

Non-Summer-of-Code plans

Google Summer of Code will be my main commitment for the whole summer. My university exams finish on 7 June, so I might lose some time in the first week getting to know my mentor, reading the documentation, etc. However, I'll still be able to dedicate around 30 to 40 hours to the project that week.