User:Littleowl/Littleowl ff

From Apertium

Revision as of 12:06, 9 April 2010

Apertium: Format filters (LaTeX) - DRAFT

GSoC on-line application

Abstract

My proposal is to make it possible to translate LaTeX documents with Apertium by writing two format filters, Apertium-deslatex and Apertium-relatex, which convert between the LaTeX format and the Apertium stream format.

I have excluded the MediaWiki format from my proposal because of the complexity and difficulties discussed on the project's mailing list. However, I would be very keen to include other formats, such as PDF, provided they fit within the scope and deadlines of the task.

Content

Name: Carles Sanz Casañas

E-mail address: carles.sanz@pangea.org

Other information that may be useful to contact you:

Why is it that you are interested in machine translation?

I live in Catalonia, where there are two official languages, Catalan and Spanish. As a result, documentation is always written in Catalan or Spanish, or even in English for international purposes. I believe that machine translation systems are key tools for improving communication within and between organizations, both in Catalonia and around the world.

Why is it that you are interested in the Apertium project?

I am really interested in Apertium because it is an open-source platform built for exactly this purpose. I also like the democratic spirit of open-source projects, and I would be very excited to have the opportunity to collaborate on one.

Which of the published tasks are you interested in? What do you plan to do?

Title

Format filters (LaTeX)

Why Google and Apertium should sponsor it

It will make Apertium capable of handling at least one more format: documents marked up in LaTeX.

Apertium translates documents through its own intermediate format, the Apertium stream format [1]. It can currently handle texts in RTF, HTML, DOCX, WXML, PPTX, XLSX, XpressTag and ODT formats, each by means of a format definition file.

For example, HTML is handled by a pair of programs that follow such a specification: a deformatter (Apertium-deshtml), which converts HTML into the Apertium stream format, and a reformatter (Apertium-rehtml), which converts the translated stream back into HTML [2].
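To make the deformatter/reformatter idea concrete, here is a minimal Python sketch of such a pair for a toy subset of HTML. It is not the real Apertium-deshtml (which follows a format specification and handles far more); the reserved-character set, the rule that only `\`, `[` and `]` are escaped inside superblanks, and the names `deformat` and `reformat` are assumptions for illustration. Tags are moved into `[...]` superblanks, which the translator passes through untouched, and reserved characters in the text are escaped with a backslash.

```python
import re

# Assumption: characters with a special meaning in the stream format, which
# must be backslash-escaped when they appear in translatable text.
RESERVED = set("\\[]^$/<>@{}")

# Toy tag matcher: ignores comments, CDATA, entities and scripts.
TAG = re.compile(r"<[^>]*>")


def escape_text(text: str) -> str:
    """Backslash-escape every reserved character in translatable text."""
    return "".join("\\" + c if c in RESERVED else c for c in text)


def superblank(markup: str) -> str:
    """Wrap formatting in [...]; inside it only \\, [ and ] are escaped (assumption)."""
    inner = "".join("\\" + c if c in "\\[]" else c for c in markup)
    return "[" + inner + "]"


def deformat(html: str) -> str:
    """Toy deformatter: tags become superblanks, everything else is escaped text."""
    out, pos = [], 0
    for m in TAG.finditer(html):
        out.append(escape_text(html[pos:m.start()]))
        out.append(superblank(m.group()))
        pos = m.end()
    out.append(escape_text(html[pos:]))
    return "".join(out)


def reformat(stream: str) -> str:
    """Toy reformatter: drop unescaped superblank brackets and undo escapes."""
    out, i = [], 0
    while i < len(stream):
        c = stream[i]
        if c == "\\" and i + 1 < len(stream):
            out.append(stream[i + 1])  # escaped character: keep it literally
            i += 2
        elif c in "[]":
            i += 1  # superblank delimiter: drop it
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(deformat("<p>Hello <b>world</b></p>"))
# [<p>]Hello [<b>]world[</b>][</p>]
```

The property that matters is that `reformat(deformat(x)) == x` for any input, so anything the translator changes lies only in the text between superblanks.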

The main idea of this proposal is to apply the same structure and rules to LaTeX documents, creating Apertium-deslatex and Apertium-relatex to convert between the LaTeX format specification [3] and the Apertium stream format.
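The central decision Apertium-deslatex has to make is which parts of a LaTeX source are formatting and which are translatable text. The following Python sketch of a hypothetical `segment_latex` function makes that decision for a small subset of LaTeX: commands, braces, comments and inline math are treated as formatting, escaped characters such as `\%` stay with the surrounding text, and the braced argument of a hypothetical list of commands (`\begin`, `\end`, `\label`, `\ref`, `\cite` and a few others) is never translated. It assumes no nested braces in those arguments and ignores display math, verbatim environments and optional `[...]` arguments.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # ("format" | "text", verbatim slice of the source)

# Hypothetical: commands whose first {argument} is never translatable text.
NON_TEXT_ARG = {"begin", "end", "label", "ref", "cite", "usepackage", "documentclass"}
# Characters that LaTeX writes as \% \& \$ \# \_ \{ \} and that belong to the text.
ESCAPED = set("%&$#_{}")


def segment_latex(src: str) -> List[Segment]:
    segs: List[Segment] = []

    def emit(kind: str, s: str) -> None:
        if not s:
            return
        if segs and segs[-1][0] == kind:
            segs[-1] = (kind, segs[-1][1] + s)  # merge adjacent segments of the same kind
        else:
            segs.append((kind, s))

    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == "%":  # comment: formatting up to and including the newline
            j = src.find("\n", i)
            j = n if j == -1 else j + 1
            emit("format", src[i:j])
            i = j
        elif c == "$":  # inline math $...$ (toy: no $$ or \( \) handling)
            j = src.find("$", i + 1)
            j = n if j == -1 else j + 1
            emit("format", src[i:j])
            i = j
        elif c == "\\":
            if i + 1 < n and src[i + 1] in ESCAPED:  # e.g. 50\% stays in the sentence
                emit("text", src[i:i + 2])
                i += 2
                continue
            j = i + 1
            while j < n and src[j].isalpha():  # control word such as \section
                j += 1
            if j == i + 1:  # control symbol such as \\ or \,
                j = min(i + 2, n)
            name = src[i + 1:j]
            if name in NON_TEXT_ARG and j < n and src[j] == "{":
                k = src.find("}", j)
                j = n if k == -1 else k + 1  # toy: no nested braces
            emit("format", src[i:j])
            i = j
        elif c in "{}":  # group delimiters are formatting
            emit("format", c)
            i += 1
        else:
            emit("text", c)
            i += 1
    return segs


for kind, s in segment_latex("\\section{Introduction} See \\cite{knuth} for 50\\% of it."):
    print(kind, repr(s))
```

Because every segment is a verbatim slice of the input, joining the segments reproduces the source exactly. A real deformatter would then write the formatting segments as superblanks and escape the text segments, and Apertium-relatex would reverse that step.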

How, and whom, it will benefit in society

As mentioned above, Apertium can currently handle many formats, but not LaTeX. LaTeX is widely used in academia, in industry and by other professionals [4]. The new filters would allow these users to translate their LaTeX documents automatically with Apertium.

Work plan

  • Week 1: Study of the Apertium stream format
  • Week 2: Analysis of existing deformatter and reformatter examples
  • Week 3: Specification of the format filter to be added (LaTeX)
  • Week 4: Basic integration of the new format filter into Apertium
  • Deliverable #1 goal: Specification and analysis of the LaTeX format and its basic integration into Apertium
  • Week 5: Integration of the new format filter + testing
  • Week 6: Integration of the new format filter + testing
  • Week 7: Integration of the new format filter + testing
  • Week 8: Integration of the new format filter + testing
  • Deliverable #2 goal: Integration of the full format filter specified previously
  • Week 9: Testing and revision
  • Week 10: Pre-release
  • Week 11: Documentation: user guide
  • Week 12: General documentation of the project
  • Project completed goal: Release of Apertium-deslatex and Apertium-relatex

List your skills and give evidence of your qualifications

I hold a degree in Computer Science and Engineering from the Barcelona School of Informatics (www.fib.upc.edu). I also have a postgraduate degree in Open Source from the Technical University of Catalonia Foundation (www.fundacio.upc.edu).

I am currently studying for a Master in Business Administration at IESE Business School in Barcelona. Previously, I worked for an open-source company for two years. My aim in doing the Master is to further develop my business administration skills so that I can contribute successfully to the open-source community from the private sector in the future.

I have a strong background in open-source projects. First, my final degree project at the Barcelona School of Informatics was an open-source project, which received awards from the Government of Catalonia and the Computer Science and Engineering Association of Catalonia. Second, I worked for an open-source company for two years where, among many other projects, we carried out the first migration of a town council in Catalonia to open-source software. During these periods I gained excellent skills in scripting languages such as Perl and PHP and in web development. I have also used LaTeX for academic work.

List any non-Summer-of-Code plans you have for the Summer

I am currently unemployed and looking forward to working on a Summer of Code project before my Master resumes this September. I will therefore be fully available to work on this GSoC task.

References

[1] Apertium Project. Apertium stream format. http://wiki.apertium.org/wiki/Apertium_stream_format

[2] Apertium Project. Format handling. http://wiki.apertium.org/wiki/Format_handling

[3] Wikipedia. LaTeX. http://en.wikipedia.org/wiki/LaTeX

[4] The Comprehensive TeX Archive Network (The CTAN team). What are TeX, LaTeX, and friends?. http://www.ctan.org/what_is_tex.html