From Apertium


Lextor exists, but it is not turned on. It was found that using Lextor did not improve translation quality over the 'baseline' of simply choosing the most frequent or most general translation. This is why it is turned off.


There are many approaches to lexical selection; we are studying them and hope to implement something in the future, although we have nothing concrete planned for now.

I think Felipe was thinking of something based on hierarchical decision lists, but he may be able to offer a more in-depth reply.

Words and expressions

Since properly translated words and expressions are the key to translation quality, I think the modules that handle word and expression recognition and translation are the key to the final quality.

A human translator encounters, for example, the word coach. The word can mean a trainer or a vehicle in which we travel. If the context (the sentence) in which coach appears contains words that indicate the text is about travelling (horses, motors, trains, way, and the like), it is likely we are talking about travel; if the words indicate sport or a working environment (training, leading people, success, and the like), we are talking about a trainer.
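As a sketch of this idea, one could score each sense of coach by counting how many of its indicator words appear in the sentence. The sense names and indicator sets below are invented for illustration; they are not taken from any Apertium module.

```python
# Toy sense-indicator lists for the ambiguous word "coach" (illustrative only).
SENSE_INDICATORS = {
    "trainer": {"training", "team", "player", "success", "sport"},
    "vehicle": {"horse", "motor", "train", "way", "travel", "road"},
}

def pick_sense(sentence):
    """Return the sense whose indicator words overlap the sentence most,
    or None if no indicator word appears at all."""
    words = set(sentence.lower().split())
    scores = {sense: len(words & indicators)
              for sense, indicators in SENSE_INDICATORS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(pick_sense("the coach praised his team after the training session"))  # trainer
print(pick_sense("the coach was pulled by a horse along the road"))         # vehicle
```

When no indicator word is present the function deliberately returns None, which is the case where the fallbacks discussed below would take over.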

If it is not possible to determine the meaning from the sentence alone, the whole text must be considered. If all else fails, we must fall back to the statistically most likely meaning.
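That fallback order (sentence first, then the whole text, then the statistically most frequent meaning) can be sketched as follows. Here pick_sense stands for any context scorer that returns None when it finds no evidence, and the default sense is an assumed frequency ranking, not real data.

```python
MOST_FREQUENT_SENSE = "trainer"  # assumed frequency ranking, for illustration

def disambiguate(sentence, full_text, pick_sense):
    """Try the sentence, then the whole text, then the statistical default."""
    return (pick_sense(sentence)
            or pick_sense(full_text)
            or MOST_FREQUENT_SENSE)

# Toy scorer: only the word "horse" counts as evidence, for the "vehicle" sense.
toy = lambda context: "vehicle" if "horse" in context else None
print(disambiguate("no clue here", "a horse pulled it", toy))  # vehicle
print(disambiguate("nothing", "still nothing", toy))           # trainer
```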

In fact, the above is the only way I can imagine that works. Therefore I am really curious how lextor is actually intended to work, using real example words like coach and real example texts. The better things are documented, the greater the chance that we get a working solution. The wiki is a very good medium for documentation, and I suggest adding more thorough documentation on word disambiguation, with understandable step-by-step examples.

It is possible that lextor's approach is viable and it simply needs much larger corpora than we used to give it for training. Maybe the minimum corpus size and coverage need to be specified.

You write "run all possible disambiguations". What does that mean in the case of coach? Take texts where coach is a trainer and remember all the surrounding words, then take texts where coach is a means of travel and note all the words there. Finally, take the text to translate and check all the words in it (first just the sentence) to see which version it is more similar to?
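The procedure just asked about can be sketched like this: from texts where the sense of coach is known, collect the surrounding words per sense, then classify a new text by which sense's collected words it shares more of. The tiny "corpora" and sense labels below are invented examples, not real training data; a real system would also need to down-weight very common words like "the".

```python
from collections import Counter

def train(labelled_texts):
    """labelled_texts: (sense, text) pairs from a sense-tagged corpus.
    Returns, for each sense, counts of the words seen around it."""
    profiles = {}
    for sense, text in labelled_texts:
        profiles.setdefault(sense, Counter()).update(text.lower().split())
    return profiles

def classify(text, profiles):
    """Pick the sense whose collected context words this text shares most."""
    words = set(text.lower().split())
    scores = {sense: sum(counts[w] for w in words)
              for sense, counts in profiles.items()}
    return max(scores, key=scores.get)

corpus = [
    ("trainer", "the coach led the team through training to success"),
    ("trainer", "a good coach knows every player on the team"),
    ("vehicle", "the coach was drawn by four horses along the old way"),
    ("vehicle", "travellers boarded the coach at the station"),
]
profiles = train(corpus)
print(classify("the team thanked the coach after training", profiles))  # trainer
```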

However, the above approach needs large aligned corpora of both languages, and I cannot imagine anything else that works. Hence my request to clarify such questions.