User:Jmcejuela/GSoC11Application


I am a Master's student in Computer Science at the Technical University of Munich (TUM), currently in my fourth and last semester and about to start my Master's thesis. As I announced on the mailing list, my intention is to combine my thesis and the GSoC project into a single endeavor (if this is possible for both my university and Google). I want to do both, but they overlap entirely in the European/German academic calendar, and since each requires full-time commitment it would be very difficult to do them independently.

Having a solid background in transducers and their mathematical foundations, I want to work extensively on transducers in my project; this is my strongest motivation. I come from a more training/learning-oriented world, whereas Apertium is rule-based, and my thesis should extend the work of the GSoC project to meet the greater effort and academic requirements of a master's thesis (exactly 6 months at TUM). For my project I therefore expand and elaborate on an idea discussed with Jimregan on using transducers as a replacement for flag diacritics, as used in hfst, and include a part on automatic topology learning to generate such transducers. In addition, I suggest an idea of my own, which mostly involves topology learning and weight training using one of the corpora listed on your corpora page, the Southeast European Times, which I find particularly interesting because of its aligned structure across multiple languages.

If you accept my proposal (or one of them), the organization of such a combined thesis/project would probably be as follows: Hasan Ibne Akram would be my official thesis advisor at TUM, while one of you would be my official mentor for the GSoC project. Please tell me if you would also like to be my official thesis advisor; we would have to discuss such an arrangement.

  • Name: Juan Miguel Cejuela
  • Email: juanmi@jmcejuela.com
  • Citizenship: Spanish, European Union
  • Location: Munich, Germany
  • Position: MSc Computer Science student at Technical University of Munich.
  • irc, skype, twitter, ...: jmcejuela


Why is it you are interested in machine translation?

As my background and skills show (see below), my work and research have led me directly to this. Although I have not yet worked directly in machine translation, I have had a strong interest in it for many years, and now I'd love to invest the effort and time of my master's thesis to finally get my hands dirty with it. I'm well acquainted with many tools that are used in machine translation, including transducers, automata, HMMs, grammatical parsers, programming language parsers, text mining, stemmers, string edit distance algorithms, ...

Besides, I'm an avid language learner myself and currently speak Spanish, English, and German, apart from programming languages, of course. I find languages fascinating because they frame and make possible communication between humans, between computers, and maybe one day between humans and computers. Also, although all languages are, by analogy with the computer science world, Turing-complete, in practice the way to convey a given idea differs greatly from language to language, and some languages are better suited to particular concepts. Furthermore, understanding and translating languages well plays a crucial role in the development of this already globalized world.


Bottom line: I'd love to work on machine translation.


Why is it that you are interested in the Apertium project?

Which of the published tasks are you interested in? What do you plan to do?

TODO: Since I am obliged to extend the GSoC project for my thesis, I will try to delimit the two officially as far as I can and describe both here.

Include a proposal, including

   * a title,
   * reasons why Google and Apertium should sponsor it,
   * a description of how and who it will benefit in society,
   * and a detailed work plan (including, if possible, a brief schedule with milestones and deliverables).

Include time needed to think, to program, to document and to disseminate.

Background & Skills

List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.


Other Commitments

For the next 6 to 7 months I have no other important commitments, and I will focus entirely on my thesis/project.