User:Jimregan/Luis' email - English

From Apertium
Revision as of 10:40, 3 April 2010 by Jimregan (talk | contribs) (oops. Just the part for transation)

Hello,

I am writing this message to all of you students who are interested in developing the post-editing tool and who have already sent me a first proposal for the project. The idea of this message is to develop a bit further some of the ideas I have in mind for the project, so that it can serve as inspiration and you can elaborate your proposals a bit more.

I have divided it into three parts: a general description, available resources to integrate, and articles to consult.

If you have any questions, send me a message. I will try to connect intermittently, both in the morning and in the afternoon, to keep working on your proposals. When we have something more settled, it would be a good idea to send it to the list I gave you earlier, together with a mini presentation, so that other mentors can give us feedback on the proposals.

Good luck, Luis

  • PS: Bear in mind that I am sending this to all of you, and that cut-and-paste would end up producing several proposals with the same paragraphs, so the right thing to do is to digest them and, starting from them, elaborate your own proposal on these same functionalities and resources, or on others that occur to you.

1) General description

- Previous proposals to develop a post-editing tool for Apertium have focused on encoding linguistic information statically and integrating it automatically into the Apertium pipeline, without giving the user the possibility of acting live on the application of these rules. In this project, we propose integrating into the Apertium pipeline a graphical interface for semi-automatic post-editing that operates on the unformatted translation (see the Apertium documentation on the wiki: 3.6, de-formatter and re-formatter) and that enables real-time human interaction, in which the system presents the user with linguistic information useful for the post-editing at hand, drawn from different user-configurable sources. The tool will have to allow for being disabled dynamically, so that the translation offered by the engine is obtained directly, without human post-editing.

- The central idea is, as we have said, to enable the integration of linguistic resources that are useful for post-editing Apertium's translations. Apertium already incorporates the possibility of using translation memories as a step prior to machine translation. Our project will therefore not be the typical post-editing tool that integrates the use of translation memories. Instead, it will consist of the integration of a set of linguistic resources for certain languages, and its result will be a platform that allows users to integrate their own resources, for example their reference dictionaries. To do so, the user will have to provide information such as how to access the definition of a word in their reference dictionary. From this base, and to complete the tool, we can focus on three languages: en, sp and ca. For each of these three languages we will identify (in collaboration with the language specialists of the Servei Lingüístic of the UOC) a set of linguistic resources to integrate.

- The SL of the UOC translates around X pages of text every year in Spanish, Catalan and English. To handle these translations, the language specialists use the output of Apertium as a draft translation and post-edit it to arrive at the final translation. In this process they use a series of linguistic resources for each language (dictionaries, style guides, reference corpora, etc.) which, added to the experience of the language specialist, provide the information needed to polish the draft that the system offers. With regard to the usability of a post-editing tool, the specialists of the SL will also play an important role in specifying which features the tool should have (for example, whether it is useful to offer a view in which the original document and the draft translation being worked on can be seen simultaneously).
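
As a rough illustration of how a user might describe "how to access the definition of a word" in a reference dictionary, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DictionaryResource` class, its fields, the `resources_for` helper and the `dictionary.example` host are hypothetical and are not part of Apertium.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class DictionaryResource:
    """A user-supplied reference dictionary (hypothetical structure)."""
    name: str
    language: str      # language code as used in this message: "en", "sp", "ca"
    url_template: str  # must contain a "{word}" placeholder

    def lookup_url(self, word: str) -> str:
        """Build the address where the definition of `word` can be found."""
        return self.url_template.format(word=quote(word))


def resources_for(language, resources):
    """Return the resources the user has configured for one language."""
    return [r for r in resources if r.language == language]


# Example configuration; dictionary.example is a placeholder host.
USER_RESOURCES = [
    DictionaryResource("My English dictionary", "en",
                       "https://dictionary.example/en/{word}"),
    DictionaryResource("El meu diccionari", "ca",
                       "https://dictionary.example/ca/{word}"),
]

if __name__ == "__main__":
    for resource in resources_for("en", USER_RESOURCES):
        print(resource.name, resource.lookup_url("flag"))
```

With a description of this kind, the tool would only need to know the language of a word to decide which of the user's resources to query.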


2) Available resources that could be integrated:

- Spell checkers: using a spell checker on the output of Apertium may at first seem pointless, since the dictionary entries do not contain spelling errors. However, the spell checker can be applied to the unknown words of the source language, so that we detect typographical errors and can suggest words in the source language and their potential translations in the target language. For example, if we translate the sentence "Ondeamos la banera blanca desde el principio" from Spanish to English, we will obtain the translation "We waved the white *banera from the beginning", where the word 'banera' is marked with an asterisk as unknown in Spanish. If we apply a spell checker (one that implements edit distances between words and the resulting suggestions) to the Spanish text, we will obtain a suggestion indicating that the word the user wanted to use was possibly "bandera". We can combine this information with the Apertium dictionaries (or, if the word is not found there, with Google's translation service) so that the post-editing tool offers "flag" in the output text as an alternative to "*banera", and it is the user who validates the replacement.

- Grammar checkers: integration of LanguageTool

- Online dictionaries: RAE, DIEC, Merriam-Webster, ... (here the specialists of the SL will have a lot to say). One of the direct applications of dictionaries is ambiguity resolution. Apertium offers a mode of operation in which, for an ambiguous word in the source text, it marks the different translation alternatives in the output text. Presenting the definition of each of the translation alternatives in a dynamic, non-intrusive way (for example, when the mouse pointer passes over each of them) can speed up the lexical selection that the user has to carry out.

- Corpora that can be queried online: nowadays there are many resources available online that allow their text corpora to be queried, so that the uses of a word or expression can be looked up in reference corpora. An example of these services is SpringerOnLine

- Use of the experience of the SL specialists to correct errors in linguistic phenomena that Apertium inevitably gets wrong. In this respect, the style guides of the SL of the UOC for Catalan and Spanish are also valuable resources from which to extract linguistic information for integration.

- Apertium-view/viewer/tolk: within the Apertium platform there are several applications that are integrated into the engine's pipeline and offer a visualisation of the information the system handles. Apertium-view, for example, could serve as a base on which to develop the environment of the post-editing tool.
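
The spell-checker idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: unknown words are taken to be those prefixed with an asterisk in the output, as in the *banera example; the edit distance used is plain Levenshtein distance; and `TOY_BIDIX` is a made-up four-entry lexicon standing in for the Apertium dictionaries, not their real format or API.

```python
import re

# Assumption: unknown words appear in the output prefixed with "*".
UNKNOWN = re.compile(r"\*(\w+)")

# Toy Spanish-to-English lexicon, invented for this example.
TOY_BIDIX = {
    "bandera": "flag",
    "bañera": "bathtub",
    "barrera": "barrier",
    "blanca": "white",
}


def unknown_words(translation):
    """Return the words marked as unknown in a translation."""
    return UNKNOWN.findall(translation)


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=1):
    """Source-language candidates within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


def alternatives(translation, bidix=TOY_BIDIX):
    """Map each unknown word to (suggested source word, translation) pairs."""
    return {word: [(s, bidix[s]) for s in suggest(word, bidix)]
            for word in unknown_words(translation)}


if __name__ == "__main__":
    print(alternatives("We waved the white *banera from the beginning"))
```

With this toy lexicon, both "bandera" and "bañera" are one edit away from "banera", which is one reason the final choice is best left to the user rather than applied automatically.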

3) Articles to consult on post-editing the output of a machine translation system:

- Tutorial on MT post-editing: http://www.mt-archive.info/MTS-2009-OBrien-ppt.pdf
- What is MT post-editing?: http://www.box.net/shared/dgfec2tmf5 / http://www.box.net/shared/s1xhg3eioy
- Article on the usefulness of MT post-editing: http://accurapid.com/journal/42mt.htm