User:Jimregan/Luis' email - English


Hello,

I am writing to all the students who have shown an interest in developing a post-editing tool and have already sent me a draft proposal for the project. The purpose of this email is to develop some of the ideas I have in mind for the project a little further, both as a source of ideas and to help you flesh out your proposals.

I have divided it into three parts: a general description, resources available for integration, and articles to consult.

If you have any questions, send me an email. Whenever possible I will try to be online both in the morning and in the afternoon, so that I can follow your work on the proposals. When we have something more concrete, it would be a good idea to send it, together with a short presentation, to the mailing list you were given earlier, so that the other mentors can give us feedback on the proposals.

Good luck, Luis

P.S.: Bear in mind that this email has been sent to all of you. Copying and pasting from it would produce several proposals containing the same paragraphs. The idea is to digest these points and use them to write your own proposal around these same features and resources, or around others that occur to you.

1) General description

- Previous proposals for a post-editing tool for Apertium have focused on encoding linguistic information statically and integrating it automatically into the Apertium pipeline. None of them gave the user the chance to intervene live in how these rules are applied. In this project, we propose integrating a graphical interface for semi-automatic post-editing into the Apertium pipeline. It would operate on the unformatted translation (see the Apertium documentation on the wiki: 3.6 de-formatter and re-formatter) and would allow real-time human interaction, with the system presenting the user with useful linguistic information from different sources, configurable by the user, for post-editing. The tool should also be possible to disable dynamically, so that the translation produced by the engine is returned directly, without human post-editing.

- As we have said, the central idea is to make it possible to integrate linguistic resources that are useful for post-editing Apertium's translations. Apertium already supports translation memories as a step before machine translation. Our project will therefore not be a typical post-editing tool built around translation memories. Instead, it will integrate a set of linguistic resources for some predetermined languages, on a platform that lets users plug in their own resources, for example their reference dictionaries. To do this, the user will have to supply information such as how to access the definition of a word in their reference dictionary. Starting from this base, and to complete the tool, we can focus on three languages: en, es and ca. For each of these three languages we will identify a set of linguistic resources to integrate, in collaboration with the language technicians of the Servei Lingüístic (SL) of the UOC.

- Every year the SL of the UOC translates around X pages of text in Spanish, Catalan and English. To handle these translations, the language technicians take Apertium's output as a draft translation and post-edit it into the final translation. In this process they use a series of linguistic resources for each language (dictionaries, style guides, query corpora, etc.). Combined with the technician's own experience, these resources supply the information needed to polish the draft produced by the system. The SL technicians will also play an important role in the usability of the tool, by specifying which features it should have. For example, they can judge whether a view that displays the original document and the draft translation being worked on side by side would be useful.
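Two of the requirements above are that the post-editing step can be switched off dynamically and that users can register their own reference dictionaries. The following minimal Python sketch shows one way they could be modelled. It assumes three things that do not come from Apertium itself: a hypothetical `PostEditStage` sitting between translation and re-formatting, a `review` callback standing in for the interactive graphical interface, and a `{word}` placeholder convention for the user-supplied dictionary URL templates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.parse import quote


@dataclass
class ReferenceDictionary:
    """A user-registered dictionary: a name plus a URL template.

    The '{word}' placeholder is a hypothetical convention for this sketch;
    it is replaced by the URL-encoded word being looked up.
    """
    name: str
    url_template: str

    def lookup_url(self, word: str) -> str:
        return self.url_template.format(word=quote(word))


@dataclass
class PostEditStage:
    """Hypothetical stage between the translation engine and the re-formatter."""
    enabled: bool = True
    dictionaries: Dict[str, List[ReferenceDictionary]] = field(default_factory=dict)

    def register(self, lang: str, dictionary: ReferenceDictionary) -> None:
        # Users may attach any number of dictionaries per language code.
        self.dictionaries.setdefault(lang, []).append(dictionary)

    def definition_links(self, lang: str, word: str) -> List[str]:
        # One lookup URL per registered dictionary for this language.
        return [d.lookup_url(word) for d in self.dictionaries.get(lang, [])]

    def run(self, mt_output: str, review: Callable[[str], str]) -> str:
        # When disabled, the engine's translation passes through untouched.
        if not self.enabled:
            return mt_output
        # Otherwise the interactive review step returns the post-edited text.
        return review(mt_output)
```

For example, registering `ReferenceDictionary("my-dict", "https://dictionary.example.org/search?q={word}")` for `"en"` (the URL is a placeholder) makes `definition_links("en", "white flag")` return a link ending in `q=white%20flag`. Because the switch is a plain flag checked on every call, the tool can be turned off without rebuilding the pipeline.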


2) Resources available for integration:

- Spell checkers: at first sight, running a spell checker on Apertium's output may seem pointless, since the dictionary entries contain no spelling errors. However, the spell checker can be applied to the unknown words of the source language. This lets us detect typos, suggest corrected words in the source language, and offer their possible translations in the target language. Suppose we translate from Spanish into English a sentence meaning "We waved the white flag from the beginning" in which the word bandera ("flag") has been mistyped as banera. We obtain the translation "We waved the white *banera from the beginning", where the word "*banera" is marked with an asterisk as unknown in Spanish. If we apply a spell checker to the Spanish text (one that computes edit distances between words and offers the corresponding suggestions), we get a suggestion indicating that the user probably meant bandera, i.e. "flag". This information can be combined with the Apertium dictionaries (or, if the word is not found there, with Google's translation service). The post-editing tool can then offer "flag" in the output text as an alternative to "*banera", and the user decides whether to accept the replacement.

- Grammar checkers: integration of LanguageTool

- Online dictionaries: RAE, DIEC, Merriam-Webster, etc. (the SL technicians will have a lot to say here). One direct application of dictionaries is resolving ambiguities. Apertium offers an operating mode in which, for an ambiguous word in the source text, the different translation alternatives are marked in the output text. Showing the definition of each alternative in a dynamic, non-intrusive way (for example, when the mouse pointer hovers over it) can speed up the lexical selection the user has to make.

- Online queryable corpora: many resources available online today let users query their corpora, so that the uses of a word or expression can be looked up in reference corpora. One example of such services is SpringerOnLine.

- Drawing on the SL technicians' experience to correct errors that Apertium inevitably makes with certain linguistic phenomena. In this respect, the UOC SL's style guides for Catalan and Spanish are also valuable resources from which linguistic information can be extracted and integrated.

- Apertium-view/viewer/tolk: the Apertium platform includes several applications, integrated with the engine's pipeline, that visualise the information the system handles. Apertium-view, for example, could serve as a basis for building the environment of the post-editing tool.
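The spell-checker idea (unknown word, then source-language suggestion, then target-language candidate for the user to validate) can be sketched in a few lines of Python. This is a minimal sketch, not the behaviour of any existing spell checker. It relies on Apertium's asterisk marking of unknown words, as in the example above, and uses plain Levenshtein distance. The source-language word list and the small bilingual dictionary passed in are hypothetical stand-ins for a real spell checker's lexicon and Apertium's bilingual dictionary, and the Google fallback is left out.

```python
import re
from typing import Dict, List, Tuple

# Apertium marks source words it does not know with a leading asterisk.
UNKNOWN_WORD = re.compile(r"\*(\w+)")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: List[str], max_distance: int = 1) -> List[str]:
    """Source-language words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word.lower(), w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


def post_edit_suggestions(
    mt_output: str, source_lexicon: List[str], bilingual: Dict[str, str]
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """For each *unknown word, list (source correction, translation) pairs.

    The pairs are only proposals: the user still validates any replacement.
    """
    result = []
    for match in UNKNOWN_WORD.finditer(mt_output):
        unknown = match.group(1)
        candidates = [(src, bilingual[src])
                      for src in suggest(unknown, source_lexicon)
                      if src in bilingual]
        result.append((unknown, candidates))
    return result
```

With a toy lexicon `["bandera", "blanca", "principio"]` and the mappings `bandera → flag`, `blanca → white`, `principio → beginning`, the sentence "We waved the white *banera from the beginning" yields `[("banera", [("bandera", "flag")])]`. Here "bandera" is one insertion away from "banera", and the other words are further than the threshold allows.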
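For the dictionary bullet, the hover behaviour could be as simple as the Python sketch below, assuming (hypothetically) that the interface is rendered as HTML. The sketch also assumes the translation alternatives have already been extracted from Apertium's output; the exact marking format is not covered here. `define` is a hypothetical lookup backed by whichever dictionaries the user has configured. The standard HTML `title` attribute is one of the least intrusive ways to show a definition on mouse-over.

```python
import html
from typing import Callable, List


def render_alternatives(alternatives: List[str], define: Callable[[str], str]) -> str:
    """Render each translation alternative as an HTML span.

    The definition goes into the span's title attribute, which browsers show
    as a tooltip when the mouse pointer hovers over the word.
    """
    spans = []
    for alt in alternatives:
        spans.append('<span class="alt" title="{}">{}</span>'.format(
            html.escape(define(alt), quote=True),  # definitions may contain quotes
            html.escape(alt)))
    return " / ".join(spans)
```

For the Spanish word "banco", for instance, the alternatives "bank" and "bench" would each carry their own definition, and the user would pick one without leaving the text.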
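The corpus bullet names online services rather than a specific API, so the following Python sketch uses a local plain-text corpus as a stand-in. It produces the keyword-in-context (KWIC) lines a concordancer typically shows when looking up how a word is used. The function name and the `width` parameter are illustrative choices, not part of any existing service.

```python
import re
from typing import List


def kwic(corpus: str, query: str, width: int = 20) -> List[str]:
    """Keyword-in-context lines for each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(corpus):
        left = corpus[max(0, m.start() - width):m.start()]
        right = corpus[m.end():m.end() + width]
        # Right-align the left context so the matched words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines
```

Each returned line centres one occurrence of the query between its left and right context. This lets the post-editor compare uses of a word or expression at a glance.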

3) Articles on post-editing machine translation output:

- Tutorial on MT post-editing: http://www.mt-archive.info/MTS-2009-OBrien-ppt.pdf
- What is MT post-editing?: http://www.box.net/shared/dgfec2tmf5 / http://www.box.net/shared/s1xhg3eioy
- Article on the usefulness of MT post-editing: http://accurapid.com/journal/42mt.htm