User:Francis Tyers/Experiments
==TODO==

* Do LER in/out-of-domain testing for the en-es setup with news commentary.
* Do BLEU in/out-of-domain testing for the en-es setup with news commentary.
* mk-en: why is the TLM LER/BLEU so much better?
** (partial) answer: 0-context rules (e.g. defaults) were not applying properly. Fixed by running in series. This "solves" the LER issue.
** (partial) answer: preposition selection is much better. We could try running with ling-default preps.
* Do pairwise bootstrap resampling for each of best baseline + best rules (see the sketch after this list).
** (done) for parallel
* Why do the eu-es rules not improve over freq?
** (partial) answer: some rules do not apply because of tag wankery. See line #129774 in the test corpus. Need to define better how tags work. Perhaps only include tags where they are ambiguous?
* Why do the Breton numbers for monolingual rules not approach TLM?
** Because the crispiness is too low.
* Why do the results get worse when we add more data?
** Because the crispiness is too low.
* Rerun the mk-en experiments with fractional counts.
* Run the br-fr test with huge data.
* Try decreasing the C with corpus size.
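A minimal sketch of the pairwise bootstrap resampling mentioned above (Koehn-style), assuming per-sentence scores for the two systems are already available; <code>paired_bootstrap</code> and the score lists are illustrative names, and for corpus BLEU one would properly resample the per-sentence n-gram statistics rather than averaging sentence scores:

<pre>
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=42):
    """Fraction of resamples in which system A beats system B.

    scores_a / scores_b: per-sentence scores (e.g. sentence-level BLEU,
    or 1 - LER) for the two systems, aligned by sentence index.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]       # resample sentences with replacement
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins_a += 1
    return wins_a / samples

# e.g. if paired_bootstrap(best_rules, best_baseline) >= 0.95, the
# improvement is significant at p < 0.05 (one-sided).
</pre>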
==Corpus stats==

{| class=wikitable
! Pair !! Corpus !! Lines !! Words (src) !! SL cov. !! Extracted !! Extracted (%) !! Lines (train) !! Lines (test) !! Lines (dev) !! Uniq. tokens with >1 trad. !! Avg. trad./word
|-
| br-fr || oab || 57,305 || 702,328 || 94.47% || 4,668 || 8.32 || 2,668 || 1,000 || 1,000 || 603 || 1.07
|-
| en-es || europarl || 1,467,708 || 30,154,098 || 98.08% || 312,162 || 22.18 || 310,162 || 1,000 || 1,000 || 2,082 || 1.08
|-
| eu-es || opendata.euskadi.net || 765,115 || 10,190,079 || 91.70% || 87,907 || 11.48 || 85,907 || 1,000 || 1,000 || 1,806 || 1.30
|-
| mk-en || setimes || 190,493 || 4,259,338 || 92.17% || 19,747 || 10.94 || 17,747 || 1,000 || 1,000 || 13,134 || 1.86
|-
| sh-mk || setimes || || || || || || || || || ||
|}
==Evaluation corpus==

===Out of domain===

{| class=wikitable
! Pair !! Lines !! Words (L1) !! Words (L2) !! Ambig. tokens !! Ambig. types !! Ambig. token/type !! % ambig. !! Avg. trad./word
|-
| en-es || 434 || 9,463 || 10,280 || 619 || 303 || 2.04 || 6.54% || -
|}

===In domain===

{| class=wikitable
! Pair !! Lines !! Words (L1) !! Words (L2) !! Ambig. tokens !! Ambig. types !! Ambig. token/type !! % ambig. !! Avg. trad./word
|-
| br-fr || 1,000 || 13,854 || 13,878 || 1,163 || 372 || 3.13 || 8.39% || -
|-
| en-es || 1,000 || 19,882 || 20,944 || 1,469 || 337 || 4.35 || 7.38% || -
|-
| eu-es || 1,000 || 7,967 || 11,476 || 1,360 || 412 || 3.30 || 17.07% || -
|-
| mk-en || 1,000 || 13,441 || 14,228 || 3,872 || 1,289 || 3.00 || 28.80% || -
|}
* % ambig. = percentage of SL tokens with more than one translation.
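A rough sketch of how the ambiguity columns above could be computed, assuming a mapping from each SL token to its set of possible translations; the <code>translations</code> dict and corpus format are assumptions, not the actual scripts used:

<pre>
def ambiguity_stats(corpus, translations):
    """corpus: list of tokenised SL sentences; translations: SL token -> set of TL translations."""
    tokens = [tok for sent in corpus for tok in sent]
    ambig = [t for t in tokens if len(translations.get(t, ())) > 1]   # ambiguous tokens
    types = set(ambig)                                                # ambiguous types
    return {
        'ambig_tokens': len(ambig),
        'ambig_types': len(types),
        'token_type': len(ambig) / max(len(types), 1),   # "Ambig. token/type"
        'pct_ambig': 100.0 * len(ambig) / len(tokens),   # "% ambig."
    }
</pre>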
==EAMT-style results==

===Out of domain===

====LER====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| en-es || — [44.5, 52.0] || — [34.7, 41.9] || 667 [24.7, 31.9] || 630 [21.4, 28.4] || 2881 [20.2, 27.2] || 2728 [20.2, 27.2] || 1683 [20.7, 27.6] || 1578 [20.7, 27.6] || 1242 [20.7, 27.6] || 1197 [20.7, 27.6]
|}
====BLEU====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| en-es || [0.1885, 0.2133] || [0.1953, 0.2201] || [0.1832, 0.2067] || [0.1832, 0.2067] || [0.1831, 0.2067] || [0.1830, 0.2067] || [0.1828, 0.2063] || [0.1828, 0.2063] || [0.1828, 0.2063] ||
|}
===In domain===

====LER====
is the "crispiness" ratio, the amount of times an alternative translation is seen in a given context compared to the default translation. So, a of 2.0 means that the translation appears twice as frequently as the default.
{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| br-fr || — [58.9, 64.8] || — [44.2, 50.5] || 168 [54.8, 60.7] || 115 [28.5, 34.1] || 221 [27.8, 33.3] || 213 [27.6, 33.0] || 159 [26.3, 31.8] || 150 [26.1, 31.6] || 135 [27.2, 32.8] || 135 [27.2, 32.8]
|-
| en-es || — [21.0, 25.3] || — [15.1, 18.9] || 667 [20.7, 25.1] || 630 [7.2, 10.0] || 2881 [5.9, 8.6] || 2728 [6.0, 8.6] || 1683 [5.7, 8.3] || 1578 [5.7, 8.3] || 1242 [6.0, 8.5] || 1197 [5.9, 8.6]
|-
| eu-es || — [41.1, 46.6] || — [38.8, 44.2] || 697 [47.8, 53.0] || 598 [16.5, 20.8] || 2253 [20.2, 24.7] || 2088 [17.2, 21.7] || 1382 [16.8, 21.0] || 1266 [16.1, 20.4] || 1022 [15.9, 20.2] || 995 [16.0, 20.3]
|-
| mk-en || — [42.4, 46.3] || — [27.1, 30.8] || 1385 [28.8, 32.6] || 1079 [19.0, 22.2] || 1684 [18.5, 21.5] || 1635 [18.6, 21.6] || 1323 [19.1, 22.2] || 1271 [19.0, 22.0] || 1198 [19.1, 22.1] || 1079 [19.1, 22.1]
|}
====BLEU====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| br-fr || — [0.1247, 0.1420] || — [0.1397, 0.1572] || 168 [0.1325, 0.1503] || 115 [0.1344, 0.1526] || 221 [0.1367, 0.1551] || 213 [0.1367, 0.1549] || 159 [0.1374, 0.1554] || 150 [0.1364, 0.1543] || 135 [0.1352, 0.1535] || 135 [0.1352, 0.1535]
|-
| en-es || — [0.2151, 0.2340] || — [0.2197, 0.2384] || 667 [0.2148, 0.2337] || 630 [0.2208, 0.2398] || 2881 [0.2217, 0.2405] || 2728 [0.2217, 0.2406] || 1683 [0.2217, 0.2407] || 1578 [0.2217, 0.2407] || 1242 [0.2217, 0.2407] || 1197 [0.2217, 0.2408]
|-
| eu-es || — [0.0873, 0.1038] || — [0.0921, 0.1093] || 697 [0.0870, 0.1030] || 598 [0.0972, 0.1149] || 2253 [0.0965, 0.1142] || 2088 [0.0971, 0.1147] || 1382 [0.0971, 0.1148] || 1266 [0.0971, 0.1148] || 1022 [0.0973, 0.1150] || 995 [0.0973, 0.1150]
|-
| mk-en || — [0.2300, 0.2511] || — [0.2976, 0.3230] || 1385 [0.2337, 0.2563] || 1079 [0.2829, 0.3064] || 1684 [0.2838, 0.3071] || 1635 [0.2834, 0.3067] || 1323 [0.2825, 0.3058] || 1271 [0.2827, 0.3059] || 1198 [0.2827, 0.3059] || 1079
|}
==Learning monolingually (winner-takes-all)==

Setup:

* SL side of the training corpus
* All possibilities translated and scored
* Absolute winners taken
* Rules generated by counting n-grams in the same way as with the parallel corpus, only no alignment is needed, as it works like an annotated corpus (see the sketch after this list)
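A rough sketch of this winner-takes-all extraction; <code>translate_variants</code> and <code>tlm_score</code> stand in for the actual translation and target-language-model scoring steps, and the <code>choices</code> attribute on a variant is likewise an assumption for illustration:

<pre>
from collections import defaultdict

def winner_takes_all(sl_sentences, translate_variants, tlm_score):
    """For each sentence, take the single best-scoring translation variant
    and count its lexical choices as if they came from an annotated corpus."""
    counts = defaultdict(lambda: defaultdict(int))   # counts[(word, context)][translation]
    for sent in sl_sentences:
        variants = translate_variants(sent)          # all combinations of lexical choices, translated
        best = max(variants, key=tlm_score)          # absolute winner under the TL model only
        # best.choices is assumed to be a list of (sl_word, chosen_translation) pairs
        for i, (word, translation) in enumerate(best.choices):
            # count a small n-gram context around the focus word, e.g. +/- 1 word
            context = tuple(w for w, _ in best.choices[max(0, i - 1):i + 2])
            counts[(word, context)][translation] += 1
    return counts
</pre>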
===Out of domain===

====LER====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| en-es || [44.5, 52.0] || [34.7, 41.9] || [24.7, 31.9] || [30.2, 37.9] || [30.2, 37.9] || [29.2, 37.0] || [29.3, 36.8] || [29.0, 36.4] || [29.1, 36.5] ||
|}
====BLEU====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| en-es || [0.1885, 0.2133] || [0.1953, 0.2201] || [0.1832, 0.2067] || [0.1806, 0.2042] || [0.1806, 0.2042] || [0.1808, 0.2043] || [0.1810, 0.2046] || [0.1809, 0.2045] || [0.1809, 0.2045] ||
|}
===In domain===

====LER====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| br-fr || — [58.9, 64.8] || — [44.2, 50.5] || 168 [54.8, 60.7] || 115 || 261 [53.5, 59.2] || 247 [52.1, 58.2] || 172 [54.3, 60.2] || 165 [52.7, 58.4] || 138 [50.5, 56.3] || 136 [50.6, 56.6]
|-
| en-es || — [21.0, 25.3] || — [15.1, 18.9] || 667 [20.7, 25.1] || ? ? || 2595 [15.0, 19.0] || 2436 [15.1, 19.1] || 1520 [13.7, 17.6] || 1402 [13.6, 17.3] || 1065 [13.9, 17.7] || 1024 [13.9, 17.8]
|-
| eu-es || — [41.1, 46.6] || — [38.8, 44.2] || ? [47.8, 53.0] || ? || 2631 [40.9, 46.4] || 2427 [40.9, 46.5] || 1186 [40.7, 46.1] || 1025 [40.7, 46.2] || 685 [40.5, 45.9] || 641 [40.5, 45.9]
|-
| mk-en || — [42.4, 46.3] || — [27.1, 30.8] || 1385 [28.8, 32.6] || ? || 1698 [27.8, 31.5] || 1662 [27.8, 31.4] || 1321 [27.8, 31.4] || 1285 [27.8, 31.4] || 1186 [27.7, 31.4] || 1180 [27.7, 31.4]
|}
====BLEU====

{| class=wikitable
! Pair !! freq !! tlm !! ling !! alig !! rules (c>1.5) !! rules (c>2.0) !! rules (c>2.5) !! rules (c>3.0) !! rules (c>3.5) !! rules (c>4.0)
|-
| br-fr || — [0.1247, 0.1420] || — [0.1397, 0.1572] || 168 [0.1325, 0.1503] || 115 || 261 [0.1250, 0.1425] || 247 [0.1252, 0.1429] || 172 [0.1240, 0.1412] || 165 [0.1243, 0.1416] || 138 [0.1255, 0.1429] || 136 [0.1255, 0.1429]
|-
| en-es || — [0.2151, 0.2340] || — [0.2197, 0.2384] || 667 [0.2148, 0.2337] || ? || 2595 [0.2180, 0.2371] || 2436 [0.2180, 0.2372] || 1520 [0.2190, 0.2380] || 1402 [0.2190, 0.2381] || 1065 [0.2189, 0.2380] || 1024 [0.2189, 0.2380]
|-
| eu-es || — [0.0873, 0.1038] || — [0.0921, 0.1093] || ? [0.0870, 0.1030] || ? || 2631 [0.0875, 0.1040] || 2427 [0.0878, 0.1042] || 1186 [0.0878, 0.1043] || 1025 [0.0878, 0.1043] || 685 [0.0879, 0.1043] || 641 [0.0879, 0.1043]
|-
| mk-en || — [0.2300, 0.2511] || — [0.2976, 0.3230] || 1385 [0.2567, 0.2798] || || 1698 [0.2694, 0.2930] || 1662 [0.2695, 0.2931] || 1321 [0.2696, 0.2935] || 1285 [0.2696, 0.2935] || 1186 [0.2696, 0.2934] || 1180 [0.2696, 0.2934]
|}
==Learning monolingually (fractional counts)==

Setup:

* SL side of the training corpus
* All possibilities translated and scored
* Probabilities normalised into fractional counts (i.e. sum them to get a total, then divide each probability by the total)
* Log probs converted into normal probs using exp10()
* Rules generated by counting the fractions from the translated file (see the sketch after this list)
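A minimal sketch of the fractional-count step, assuming each ambiguous word in context comes with a list of (translation, log10 probability) pairs; the function and variable names are illustrative:

<pre>
from collections import defaultdict

# counts[(word, context)][translation] accumulates fractional counts
counts = defaultdict(lambda: defaultdict(float))

def fractional_counts(scored_variants):
    """scored_variants: list of (translation, log10_prob) pairs for one
    ambiguous word in one context; returns {translation: fraction}."""
    probs = {t: 10.0 ** lp for t, lp in scored_variants}   # exp10(): log prob -> prob
    total = sum(probs.values())
    return {t: p / total for t, p in probs.items()}        # normalise so fractions sum to 1

def add_fractions(word, context, scored_variants):
    """Each candidate translation contributes its fraction instead of a whole count."""
    for translation, frac in fractional_counts(scored_variants).items():
        counts[(word, context)][translation] += frac
</pre>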
===In domain===

====LER====

====BLEU====

===Out of domain===

====LER====

====BLEU====
==MaxEnt==

===With alignments===

{| class=wikitable
! Pair !! alig !! rule-best !! ME (>5) !! ME (>3)
|-
| br-fr || 33.4 || 31.5 || 31.8 || 29.9
|-
| mk-en || 19.9 || 19.8 || 18.9 || 17.8
|-
| eu-es || 18.5 || 17.9 || 17.4 || 19.9
|-
| en-es || 8.6 || 7.0 || 6.3 || 6.3
|}
===With fractional counts===

{| class=wikitable
! Pair !! alig !! rule-best !! ME (>5) !! ME (>3) !! ME (>1) !! ME (>0)
|-
| br-fr || 43.4 || 43.1 || 61.9 || 46.2 || 48.2 || 49.9
|-
| mk-en || 29.5 || || || || ||
|-
| eu-es || 41.2 || || 43.9 || 44.4 || ||
|-
| en-es || 11.9 || 11.7 || 11.4 || 11.9 || ||
|}
==Notes==

* Experiments in Domain Adaptation for Statistical Machine Translation
* Semi-supervised model adaptation for statistical machine translation
* Domain Adaptation for Statistical Machine Translation with Monolingual Resources
** "We found that the largest gain (25% relative) is achieved when in-domain data are available for the target language. A smaller performance improvement is still observed (5% relative) if source adaptation data are available. We also observed that the most important role is played by the LM adaptation, while the adaptation of the TM and RM gives consistent but small improvement."