User:Francis Tyers/Experiments
TODO
- Do LER in-/out-of-domain testing for the en-es setup with news commentary.
- Do BLEU in-/out-of-domain testing for the en-es setup with news commentary.
- mk-en: why are the TLM LER and BLEU scores so much better?
  - (partial) answer: 0-context rules (e.g. defaults) were not applying properly. Fixed by running the rules in series; this "solves" the LER issue.
  - (partial) answer: preposition selection is much better. We could try running with ling-default prepositions.
- Do pairwise bootstrap resampling for each of the best baseline + best rules (a paired bootstrap sketch follows this list).
  - (done) for parallel
- Why do the eu-es rules not improve over freq?
  - (partial) answer: some rules do not apply because of tag wankery. See line #129774 in the test corpus. Need to define better how tags work; perhaps only include tags where ambiguous?
- Why do the Breton numbers for monolingual rules not approach TLM?
  - Because the crispiness is too low.
- Why do the results get worse when we add more data?
  - Because the crispiness is too low.
- Rerun the mk-en experiments with fractional counts.
- Run the br-fr test with huge data.
- Try decreasing the c threshold with corpus size.
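A minimal sketch of the paired bootstrap resampling mentioned above, assuming per-sentence error counts are already available for the two systems on the same test set (the variable names, toy data and resample count are illustrative, not taken from the actual experiments):

```python
import random

def paired_bootstrap(errors_a, errors_b, n_resamples=1000, seed=0):
    """Paired bootstrap test: how often does system A beat system B
    when the test set is resampled with replacement?

    errors_a / errors_b: per-sentence error counts (e.g. lexical
    selection errors) for the same test sentences, in the same order.
    """
    assert len(errors_a) == len(errors_b)
    rng = random.Random(seed)
    n = len(errors_a)
    wins_a = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(errors_a[i] for i in idx) < sum(errors_b[i] for i in idx):
            wins_a += 1
    return wins_a / n_resamples

# Hypothetical usage: one error count per test sentence for each system.
baseline = [2, 0, 1, 3, 1, 0, 2]
rules    = [1, 0, 1, 2, 1, 0, 1]
print("rules beat baseline in %.1f%% of resamples"
      % (100 * paired_bootstrap(rules, baseline)))
```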
Corpus stats
Pair | Corpus | Lines | Words (src) | SL cov. | Extracted | Extracted (%) | Lines (train) | Lines (test) | Lines (dev) | Uniq. tokens with >1 trad. | Avg. trad. / word
---|---|---|---|---|---|---|---|---|---|---|---
br-fr | oab | 57,305 | 702,328 | 94.47% | 4,668 | 8.32 | 2,668 | 1,000 | 1,000 | 603 | 1.07
en-es | europarl | 1,467,708 | 30,154,098 | 98.08% | 312,162 | 22.18 | 310,162 | 1,000 | 1,000 | 2,082 | 1.08
eu-es | opendata.euskadi.net | 765,115 | 10,190,079 | 91.70% | 87,907 | 11.48 | 85,907 | 1,000 | 1,000 | 1,806 | 1.30
mk-en | setimes | 190,493 | 4,259,338 | 92.17% | 19,747 | 10.94 | 17,747 | 1,000 | 1,000 | 13,134 | 1.86
sh-mk | setimes | | | | | | | | | |
Evaluation corpus
Out of domain
Pair | Lines | Words (L1) | Words (L2) | Ambig. tokens | Ambig. types | Ambig token/type | % ambig | Av. trad/word |
---|---|---|---|---|---|---|---|---|
en-es | 434 | 9,463 | 10,280 | 619 | 303 | 2.04 | 6.54% | - |
In domain
Pair | Lines | Words (L1) | Words (L2) | Ambig. tokens | Ambig. types | Ambig token/type | % ambig | Av. trad/word |
---|---|---|---|---|---|---|---|---|
br-fr | 1,000 | 13,854 | 13,878 | 1,163 | 372 | 3.13 | 8.39% | - |
en-es | 1,000 | 19,882 | 20,944 | 1,469 | 337 | 4.35 | 7.38% | - |
eu-es | 1,000 | 7,967 | 11,476 | 1,360 | 412 | 3.30 | 17.07% | - |
mk-en | 1,000 | 13,441 | 14,228 | 3,872 | 1,289 | 3.00 | 28.80% | - |
- % ambig = percentage of SL tokens with more than one possible translation
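A minimal sketch of how the ambiguity figures above could be computed, assuming a per-token list of candidate translations from the bilingual dictionary is available (the data layout below is illustrative only):

```python
# Hypothetical input: for each SL token in the test corpus, the set of
# candidate translations offered by the bilingual dictionary.
tokens = [("dog", {"perro"}), ("bank", {"banco", "orilla"}),
          ("bank", {"banco", "orilla"}), ("light", {"luz", "ligero"})]

ambig_tokens = [t for t, trads in tokens if len(trads) > 1]
ambig_types = {t for t, trads in tokens if len(trads) > 1}

print("Ambig. tokens:", len(ambig_tokens))
print("Ambig. types:", len(ambig_types))
print("Ambig. token/type: %.2f" % (len(ambig_tokens) / len(ambig_types)))
print("%% ambig: %.2f%%" % (100.0 * len(ambig_tokens) / len(tokens)))
```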
EAMT-style results
Out of domain
LER
In the tables below, the integer in a cell is the number of rules in the rule set (where applicable); the bracketed range is the confidence interval of the score.

Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
en-es | — [44.5, 52.0] | — [34.7, 41.9] | 667 [24.7, 31.9] | 630 [21.4, 28.4] | 2881 [20.2, 27.2] | 2728 [20.2, 27.2] | 1683 [20.7, 27.6] | 1578 [20.7, 27.6] | 1242 [20.7, 27.6] | 1197 [20.7, 27.6]
BLEU
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
en-es | [0.1885, 0.2133] | [0.1953, 0.2201] | [0.1832, 0.2067] | [0.1832, 0.2067] | [0.1831, 0.2067] | [0.1830, 0.2067] | [0.1828, 0.2063] | [0.1828, 0.2063] | [0.1828, 0.2063] |
In domain
LER
is the "crispiness" ratio, the amount of times an alternative translation is seen in a given context compared to the default translation. So, a of 2.0 means that the translation appears twice as frequently as the default.
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
br-fr | — [58.9, 64.8] | — [44.2, 50.5] | 168 [54.8, 60.7] | 115 [28.5, 34.1] | 221 [27.8, 33.3] | 213 [27.6, 33.0] | 159 [26.3, 31.8] | 150 [26.1, 31.6] | 135 [27.2, 32.8] | 135 [27.2, 32.8]
en-es | — [21.0, 25.3] | — [15.1, 18.9] | 667 [20.7, 25.1] | 630 [7.2, 10.0] | 2881 [5.9, 8.6] | 2728 [6.0, 8.6] | 1683 [5.7, 8.3] | 1578 [5.7, 8.3] | 1242 [6.0, 8.5] | 1197 [5.9, 8.6]
eu-es | — [41.1, 46.6] | — [38.8, 44.2] | 697 [47.8, 53.0] | 598 [16.5, 20.8] | 2253 [20.2, 24.7] | 2088 [17.2, 21.7] | 1382 [16.8, 21.0] | 1266 [16.1, 20.4] | 1022 [15.9, 20.2] | 995 [16.0, 20.3]
mk-en | — [42.4, 46.3] | — [27.1, 30.8] | 1385 [28.8, 32.6] | 1079 [19.0, 22.2] | 1684 [18.5, 21.5] | 1635 [18.6, 21.6] | 1323 [19.1, 22.2] | 1271 [19.0, 22.0] | 1198 [19.1, 22.1] | 1079 [19.1, 22.1]
BLEU
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
br-fr | — [0.1247, 0.1420] | — [0.1397, 0.1572] | 168 [0.1325, 0.1503] | 115 [0.1344, 0.1526] | 221 [0.1367, 0.1551] | 213 [0.1367, 0.1549] | 159 [0.1374, 0.1554] | 150 [0.1364, 0.1543] | 135 [0.1352, 0.1535] | 135 [0.1352, 0.1535]
en-es | — [0.2151, 0.2340] | — [0.2197, 0.2384] | 667 [0.2148, 0.2337] | 630 [0.2208, 0.2398] | 2881 [0.2217, 0.2405] | 2728 [0.2217, 0.2406] | 1683 [0.2217, 0.2407] | 1578 [0.2217, 0.2407] | 1242 [0.2217, 0.2407] | 1197 [0.2217, 0.2408]
eu-es | — [0.0873, 0.1038] | — [0.0921, 0.1093] | 697 [0.0870, 0.1030] | 598 [0.0972, 0.1149] | 2253 [0.0965, 0.1142] | 2088 [0.0971, 0.1147] | 1382 [0.0971, 0.1148] | 1266 [0.0971, 0.1148] | 1022 [0.0973, 0.1150] | 995 [0.0973, 0.1150]
mk-en | — [0.2300, 0.2511] | — [0.2976, 0.3230] | 1385 [0.2337, 0.2563] | 1079 [0.2829, 0.3064] | 1684 [0.2838, 0.3071] | 1635 [0.2834, 0.3067] | 1323 [0.2825, 0.3058] | 1271 [0.2827, 0.3059] | 1198 [0.2827, 0.3059] | 1079
Learning monolingually (winner-takes-all)
Setup:
- SL side of the training corpus
- All possibilities translated and scored
- Absolute winners taken
- Rules generated by counting n-grams in the same way as with the parallel corpus; no alignment is needed because the scored output behaves like an annotated corpus (see the sketch below).
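A minimal sketch of the winner-takes-all counting described above, assuming each ambiguous SL token has already been translated with every candidate lexical selection and scored by a target-language model (the input format and function names here are illustrative):

```python
from collections import defaultdict

def winner_takes_all(candidates):
    """candidates: list of (translation_choice, tlm_score) for one
    ambiguous SL token in one sentence. Returns the single best choice,
    or None if there is a tie (no absolute winner)."""
    best = max(candidates, key=lambda c: c[1])
    ties = [c for c in candidates if c[1] == best[1]]
    return best[0] if len(ties) == 1 else None

# Hypothetical scored data: (SL word, n-gram context, scored candidates).
scored = [
    ("party", ("the", "_", "ended"), [("fiesta", -12.3), ("partido", -15.1)]),
    ("party", ("political", "_", "won"), [("fiesta", -18.0), ("partido", -11.2)]),
]

# Count (word, context, winning translation) exactly as for a parallel
# corpus, except the "annotation" comes from the TLM winner rather than
# from word alignment.
ngram_counts = defaultdict(int)
for word, context, candidates in scored:
    winner = winner_takes_all(candidates)
    if winner is not None:
        ngram_counts[(word, context, winner)] += 1

print(dict(ngram_counts))
```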
Out of domain
LER
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
en-es | [44.5, 52.0] | [34.7, 41.9] | [24.7, 31.9] | [30.2, 37.9] | [30.2, 37.9] | [29.2, 37.0] | [29.3, 36.8] | [29.0, 36.4] | [29.1, 36.5] |
BLEU
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
en-es | [0.1885, 0.2133] | [0.1953, 0.2201] | [0.1832, 0.2067] | [0.1806, 0.2042] | [0.1806, 0.2042] | [0.1808, 0.2043] | [0.1810, 0.2046] | [0.1809, 0.2045] | [0.1809, 0.2045] |
In domain
LER
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
br-fr | — [58.9, 64.8] | — [44.2, 50.5] | 168 [54.8, 60.7] | 115 | 261 [53.5, 59.2] | 247 [52.1, 58.2] | 172 [54.3, 60.2] | 165 [52.7, 58.4] | 138 [50.5, 56.3] | 136 [50.6, 56.6]
en-es | — [21.0, 25.3] | — [15.1, 18.9] | 667 [20.7, 25.1] | ? ? | 2595 [15.0, 19.0] | 2436 [15.1, 19.1] | 1520 [13.7, 17.6] | 1402 [13.6, 17.3] | 1065 [13.9, 17.7] | 1024 [13.9, 17.8]
eu-es | — [41.1, 46.6] | — [38.8, 44.2] | ? [47.8, 53.0] | ? | 2631 [40.9, 46.4] | 2427 [40.9, 46.5] | 1186 [40.7, 46.1] | 1025 [40.7, 46.2] | 685 [40.5, 45.9] | 641 [40.5, 45.9]
mk-en | — [42.4, 46.3] | — [27.1, 30.8] | 1385 [28.8, 32.6] | ? | 1698 [27.8, 31.5] | 1662 [27.8, 31.4] | 1321 [27.8, 31.4] | 1285 [27.8, 31.4] | 1186 [27.7, 31.4] | 1180 [27.7, 31.4]
BLEU
Pair | freq | tlm | ling | alig | rules (c>1.5) | rules (c>2.0) | rules (c>2.5) | rules (c>3.0) | rules (c>3.5) | rules (c>4.0)
---|---|---|---|---|---|---|---|---|---|---
br-fr | — [0.1247, 0.1420] | — [0.1397, 0.1572] | 168 [0.1325, 0.1503] | 115 | 261 [0.1250, 0.1425] | 247 [0.1252, 0.1429] | 172 [0.1240, 0.1412] | 165 [0.1243, 0.1416] | 138 [0.1255, 0.1429] | 136 [0.1255, 0.1429]
en-es | — [0.2151, 0.2340] | — [0.2197, 0.2384] | 667 [0.2148, 0.2337] | ? | 2595 [0.2180, 0.2371] | 2436 [0.2180, 0.2372] | 1520 [0.2190, 0.2380] | 1402 [0.2190, 0.2381] | 1065 [0.2189, 0.2380] | 1024 [0.2189, 0.2380]
eu-es | — [0.0873, 0.1038] | — [0.0921, 0.1093] | ? [0.0870, 0.1030] | ? | 2631 [0.0875, 0.1040] | 2427 [0.0878, 0.1042] | 1186 [0.0878, 0.1043] | 1025 [0.0878, 0.1043] | 685 [0.0879, 0.1043] | 641 [0.0879, 0.1043]
mk-en | — [0.2300, 0.2511] | — [0.2976, 0.3230] | 1385 [0.2567, 0.2798] | | 1698 [0.2694, 0.2930] | 1662 [0.2695, 0.2931] | 1321 [0.2696, 0.2935] | 1285 [0.2696, 0.2935] | 1186 [0.2696, 0.2934] | 1180 [0.2696, 0.2934]
Learning monolingually (fractional counts)
Setup:
- SL side of the training corpus
- All possibilities translated and scored
- Probabilities normalised into fractional counts (i.e. summed to get a total, then each probability divided by the total).
- Log probabilities converted into normal probabilities using exp10().
- Rules generated by counting the fractions from the translated file (see the sketch below).
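A minimal sketch of the fractional-count normalisation described above, assuming each candidate translation comes with a base-10 log probability from the scorer (the input layout and names are illustrative, not from the actual scripts):

```python
from collections import defaultdict

def fractional_counts(candidates):
    """candidates: list of (translation_choice, log10_prob).
    Converts log10 probabilities back to probabilities with 10**x,
    then normalises them so they sum to 1."""
    probs = [(choice, 10 ** lp) for choice, lp in candidates]
    total = sum(p for _, p in probs)
    return [(choice, p / total) for choice, p in probs]

# Hypothetical scored data, as in the winner-takes-all setup, but now
# every candidate contributes its fraction instead of only the winner.
scored = [
    ("party", ("the", "_", "ended"), [("fiesta", -12.3), ("partido", -13.3)]),
]

frac_ngram_counts = defaultdict(float)
for word, context, candidates in scored:
    for choice, frac in fractional_counts(candidates):
        frac_ngram_counts[(word, context, choice)] += frac

print(dict(frac_ngram_counts))  # fiesta ~0.909, partido ~0.091
```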