Evaluating with Wikipedia

From Apertium
Revision as of 11:36, 24 March 2012 by Bech (Category:Documentation in English)

One way to improve your MT system, and at the same time to improve and add content to Wikipedias, is to use Wikipedias as a test bed. You can translate text from one Wikipedia to another, then either post-edit it yourself or wait for, or ask, other people to post-edit it. A nice feature is that MediaWiki (the software Wikipedia is based on) lets you view diffs between versions of a page (see the 'history' tab).

This strategy is beneficial both to Wikipedia and to Apertium. Wikipedia gets new articles in languages which might not otherwise have them, and Apertium gets information on how we can improve the software. It is important to note that Wikipedia is a community effort, and that people can rightly be concerned about machine translation. To get an idea of why, put yourself in the place of people who have to fix a lot of "hit and run" SYSTRAN translations, with little time and not much patience.

Guidelines

  • Don't just start translating texts and waiting for people to fix them. The first thing you should do is create an account on the Wikipedia, and then find the "Community notice board". Ask there how regular contributors would feel about you using the Wikipedia for tests. The community notice board should be linked from the front page. It might be called something like "La tavèrna" in Occitan, or "Geselshoekie" in Afrikaans. When you ask them, make the following clear:
      • This is free software / open source machine translation.
      • You would like to help the community, and are doing these translations both to help their Wikipedia expand its range of articles and to improve the translation software.
      • The translations will be added only with the consent of the community; you do not intend to flood them with poorly translated articles.
      • The translations will be added by a human, not by a bot.
      • Ask them if there are any subjects that they would prefer you to cover; perhaps they have a page of "requested translations".
  • One way of looking at it might be as a non-native speaker of the language trying to learn the language. Point out that the initial translation will be done by machine, and that you will then try to fix the translation, but that you would be grateful if other people fixed anything you miss.

An example of the kind of conversation you might have is found here.

How to translate

To make the result more useful, when you create the page, first paste in the unedited machine translation output. Save the page with an edit summary saying that you're still working on it. Then proceed to post-edit the output. After you've finished, save the page again. If you go to the history tab at the top of the page and choose "Compare selected versions", you will see the differences (diff) between the machine translation and the post-edited output. This gives a good indication of how good the original Apertium output was.
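
If you keep local copies of the two saved versions, the same comparison can be roughly quantified offline. The following is a minimal sketch using Python's standard difflib module; the function name post_edit_stats and the sample sentences are made up for illustration and are not part of Apertium or MediaWiki. It counts how many words the post-editor had to change, which is only a crude proxy for translation quality.

```python
import difflib


def post_edit_stats(mt_output: str, post_edited: str) -> dict:
    """Compare raw MT output with its post-edited version, word by word.

    Hypothetical helper for illustration; not an Apertium or MediaWiki tool.
    """
    mt_words = mt_output.split()
    pe_words = post_edited.split()

    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = difflib.SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)

    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For each edited span, count the longer of the two sides.
            changed += max(i2 - i1, j2 - j1)

    return {
        "mt_words": len(mt_words),
        "post_edited_words": len(pe_words),
        "changed_words": changed,
        # 1.0 means the post-editor changed nothing.
        "similarity": matcher.ratio(),
    }


if __name__ == "__main__":
    # Made-up sample sentences standing in for the two page revisions.
    raw = "the cat sit on mat"
    fixed = "the cat sits on the mat"
    print(post_edit_stats(raw, fixed))
```

A low number of changed words relative to the length of the text suggests the raw output needed little post-editing. The changed spans themselves are usually more informative than the totals, since they point at the errors worth fixing in the translator.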

Current collaborations

If you'd like to know more about contributing to Wikipedia with Apertium, you can ask the people below: