Difference between revisions of "Lexical selection in target language"

From Apertium
[[Category:Development]]

Revision as of 12:13, 30 October 2008

With apertium-multiple-translations it is possible to get ambiguous text output from transfer. It comes in the following form:

Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase {constantly|steadily|constant|steady} in the {recent|last} years , and be your government wishing to promote that objective
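The bracketed format can be split into fixed text and lists of options before any ranking is done. The following is a minimal Python sketch, not code from apertium-multiple-translations. The function name `split_ambiguous` is hypothetical, and the sketch assumes that groups are never nested and that `|` only ever appears as an option separator.

```python
import re

# Matches one ambiguity group such as "{recent|last}".
# Assumption: groups are not nested.
GROUP = re.compile(r"\{([^{}]*)\}")


def split_ambiguous(text):
    """Split text into segments.

    Fixed text is returned as a str, an ambiguity group as a list of options.
    """
    segments = []
    pos = 0
    for match in GROUP.finditer(text):
        fixed = text[pos:match.start()]
        if fixed:
            segments.append(fixed)
        segments.append(match.group(1).split("|"))
        pos = match.end()
    if text[pos:]:
        segments.append(text[pos:])
    return segments


example = ("after increase {constantly|steadily|constant|steady} "
           "in the {recent|last} years")
for segment in split_ambiguous(example):
    print(segment)
```

Keeping the fixed text and the options in one ordered list makes it easy to walk the sentence from left to right later on.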

This output is useful as it stands, but it can also be ranked using a target-language model. Each of the options given is acceptable, but some sound more fluent than others. First, calculate a very basic n-gram model, for example over 1- to 5-grams from a corpus. The result might look something like this, with each line giving the count, the n-gram order, and the n-gram:

$ cat test.ngrams | head
3086,1,last
1157,2,the last
1128,1,recent
703,1,recently
501,2,last year
301,2,in recent
277,2,recent years
250,2,the recent
231,1,constantly
225,3,in the last
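A count file in this shape could be produced along the following lines. This is only a sketch: the tool that generated `test.ngrams` is not shown on this page, the names `count_ngrams` and `format_ngrams` are hypothetical, and the corpus is treated as a single stream of whitespace-separated tokens with no sentence boundaries.

```python
from collections import Counter


def count_ngrams(tokens, max_order=5):
    """Count every n-gram of order 1..max_order in a list of tokens."""
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[" ".join(tokens[i:i + order])] += 1
    return counts


def format_ngrams(counts):
    """Return 'count,order,ngram' lines, most frequent first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{count},{len(ngram.split())},{ngram}" for ngram, count in ordered]


# Toy corpus, for illustration only.
corpus = "in the last years the last year was good".split()
for line in format_ngrams(count_ngrams(corpus, max_order=3))[:5]:
    print(line)
```

Ties are broken alphabetically here so that the output is deterministic; the real file may order ties differently.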

Then run the ambiguous text through a ranker, which scores each option within a window of ambiguity, that is, the text leading up to and including the option:

231.0 Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase constantly
30.0 Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase steadily
177.0 Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase constant
31.0 Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase steady
1703.0  in the recent
6075.0  in the last

At each stage, choose the highest-scoring option and construct the final sentence as you go along:

Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase constantly in the last years , and be your government wishing to promote that objective

For comparison, the original output:

Language is everyone declaring clear be the capacity to get education through the medium Welsh after increase constant in the last years , and be your government wishing to promote that objective

This is a minor improvement, but the approach could be refined with more work.
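The whole procedure, from ambiguous text to a single chosen sentence, can be sketched in Python as follows. The scoring rule is an assumption: the page does not give the ranker's formula, so this sketch simply sums the counts of every n-gram found in the window, and its scores will not match the figures shown above. The names `load_counts`, `window_score` and `rank` are hypothetical.

```python
import re
from collections import Counter

# Matches one ambiguity group such as "{recent|last}" (assumed not nested).
GROUP = re.compile(r"\{([^{}]*)\}")


def load_counts(lines):
    """Parse 'count,order,ngram' lines into a Counter keyed by n-gram."""
    counts = Counter()
    for line in lines:
        count, _order, ngram = line.strip().split(",", 2)
        counts[ngram] = int(count)
    return counts


def window_score(tokens, counts, max_order=5):
    """Assumed scoring rule: sum the counts of all n-grams in the window."""
    total = 0
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            total += counts[" ".join(tokens[i:i + order])]
    return float(total)


def rank(text, counts):
    """Greedily pick the best-scoring option for each group, left to right."""
    output = []
    pos = 0
    for match in GROUP.finditer(text):
        # The window is the fixed text since the previous group plus the option.
        fixed = text[pos:match.start()]
        options = match.group(1).split("|")
        best = max(options,
                   key=lambda o: window_score((fixed + o).split(), counts))
        output.append(fixed + best)
        pos = match.end()
    output.append(text[pos:])
    return "".join(output)


# The ten n-gram lines listed on this page.
NGRAM_LINES = [
    "3086,1,last", "1157,2,the last", "1128,1,recent", "703,1,recently",
    "501,2,last year", "301,2,in recent", "277,2,recent years",
    "250,2,the recent", "231,1,constantly", "225,3,in the last",
]

ambiguous = ("after increase {constantly|steadily|constant|steady} "
             "in the {recent|last} years")
print(rank(ambiguous, load_counts(NGRAM_LINES)))
```

With only the ten counts listed on this page, the sketch picks "constantly" and "last", the same choices as in the sentence above. Because each choice is made greedily within its own window, words that follow an option (such as "years") do not influence the decision, which is one obvious place where more work could help.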