User:Francis Tyers/Experiments

From Apertium
Jump to navigation Jump to search

TODO[edit]

  • Do LER in/out domain testing for the en-es setup with news commentary.
  • Do BLEU in/out domain testing for the en-es setup with news commentary.
  • mk-en: why is TLM LER/BLEU so much better ?
    • (partial) answer: 0-context rules (e.g. defaults) not applying properly. Fixed by running in series. This "solves" the LER issue.
    • (partial) answer: preposition selection is much better. We could try running with ling-default preps.
  • Do pairwise bootstrap resampling for each of best baseline + best rules
    • (done) for parallel
  • why do eu-es rules not improve over freq ?
    • (partial) answer: some rules do not apply because of tag wankery. See line #129774 in the test corpus. Need to define better how tags work. Perhaps only include tags where ambiguous ?
  • why do breton numbers for monolingual rules not approach TLM ?
    • because of crispiness being too low.
  • why when we add more data, do the results get worse ?
    • because of crispiness being too low.
  • rerun the mk-en stuff with frac counts.
  • run br-fr test with huge data.
  • try decreasing the C with corpus size.

Corpus stats[edit]

Pair Corpus Lines W. (src) SL cov. Extracted Extracted (%) L. (train) L. (test) L (dev) Uniq. tokens >1 trad. Avg. trad / word
br-fr oab 57,305 702,328 94.47% 4,668 8.32 2,668 1,000 1,000 603 1.07
en-es europarl 1,467,708 30,154,098 98.08% 312,162 22.18 310,162 1,000 1,000 2,082 1.08
eu-es opendata.euskadi.net 765,115 10,190,079 91.70% 87,907 11.48 85,907 1,000 1,000 1,806 1.30
mk-en setimes 190,493 4,259,338 92.17% 19,747 10.94 17,747 1,000 1,000 13,134 1.86
sh-mk setimes

Evaluation corpus[edit]

Out of domain[edit]

Pair Lines Words (L1) Words (L2) Ambig. tokens Ambig. types Ambig token/type % ambig Av. trad/word
en-es 434 9,463 10,280 619 303 2.04 6.54% -

In domain[edit]

Pair Lines Words (L1) Words (L2) Ambig. tokens Ambig. types Ambig token/type % ambig Av. trad/word
br-fr 1,000 13,854 13,878 1,163 372 3.13 8.39% -
en-es 1,000 19,882 20,944 1,469 337 4.35 7.38% -
eu-es 1,000 7,967 11,476 1,360 412 3.30 17.07% -
mk-en 1,000 13,441 14,228 3,872 1,289 3.00 28.80% -
  •  % ambig = number of SL tokens with >1 translation

EAMT-style results[edit]

Out of domain[edit]

LER[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
en-es
[44.5, 52.0]

[34.7, 41.9]
667
[24.7, 31.9]
630
[ 21.4 , 28.4 ]
2881
[20.2, 27.2]
2728
[20.2, 27.2]
1683
[20.7, 27.6]
1578
[20.7, 27.6]
1242
[20.7, 27.6]
1197
[20.7, 27.6]

BLEU[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
en-es [0.1885, 0.2133] [0.1953, 0.2201] [0.1832, 0.2067] [0.1832, 0.2067] [0.1831, 0.2067] [0.1830, 0.2067] [ [0.1828, 0.2063] [0.1828, 0.2063] [0.1828, 0.2063]

In domain[edit]

LER[edit]

is the "crispiness" ratio, the amount of times an alternative translation is seen in a given context compared to the default translation. So, a of 2.0 means that the translation appears twice as frequently as the default.

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
br-fr
[58.9, 64.8]

[44.2, 50.5]
168
[54.8, 60.7]
115
[28.5, 34.1]
221
[27.8, 33.3]
213
[27.6, 33.0]
159
[26.3, 31.8]
150
[26.1, 31.6]
135
[27.2, 32.8]
135
[27.2, 32.8]
en-es
[21.0, 25.3]

[15.1, 18.9]
667
[20.7, 25.1
630
[7.2, 10.0]
2881
[5.9, 8.6]
2728
[6.0, 8.6]
1683
[5.7, 8.3]
1578
[5.7, 8.3]
1242
[6.0, 8.5]
1197
[5.9, 8.6]
eu-es
[41.1, 46.6]

[38.8, 44.2]
697
[47.8, 53.0]
598
[16.5, 20.8]
2253
[20.2, 24.7]
2088
[17.2, 21.7]
1382
[16.8, 21.0]
1266
[16.1, 20.4]
1022
[15.9, 20.2]
995
[16.0, 20.3]
mk-en
[42.4, 46.3]

[27.1, 30.8]
1385
[28.8, 32.6]
1079
[19.0, 22.2]
1684
[18.5, 21.5]
1635
[18.6, 21.6]
1323
[19.1, 22.2]
1271
[19.0, 22.0]
1198
[19.1, 22.1]
1079
[19.1, 22.1]

BLEU[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
br-fr
[0.1247, 0.1420]

[0.1397, 0.1572]
168
[0.1325, 0.1503]
115
[0.1344, 0.1526]
221
[0.1367, 0.1551]
213
[0.1367, 0.1549]
159
[0.1374, 0.1554]
150
[0.1364, 0.1543]
135
[0.1352, 0.1535]
135
[0.1352, 0.1535]
en-es
[0.2151, 0.2340]

[0.2197, 0.2384]
667
[0.2148, 0.2337]
630
[0.2208, 0.2398]
2881
[0.2217, 0.2405]
2728
[0.2217, 0.2406]
1683
[0.2217, 0.2407]
1578
[0.2217, 0.2407]
1242
[0.2217, 0.2407]
1197
[0.2217, 0.2408]
eu-es
[0.0873, 0.1038]

[0.0921, 0.1093]
697
[0.0870, 0.1030]
598
[0.0972, 0.1149]
2253
[0.0965, 0.1142]
2088
[0.0971, 0.1147]
1382
[0.0971, 0.1148]
1266
[0.0971, 0.1148]
1022
[0.0973, 0.1150]
995
[0.0973, 0.1150]
mk-en
[0.2300, 0.2511]

[0.2976, 0.3230]
1385
[0.2337, 0.2563]
1079
[0.2829, 0.3064]
1684
[0.2838, 0.3071]
1635
[0.2834, 0.3067]
1323
[0.2825, 0.3058]
1271
[0.2827, 0.3059]
1198
[0.2827, 0.3059]
1079

Learning monolingually (winner-takes-all)[edit]

Setup:

  • SL side of the training corpus
  • All possibilities translated and scored
  • Absolute winners taken
  • Rules generated by counting ngrams in the same way as with the parallel corpus, only no alignment needed as it works like an annotated corpus.

Out of domain[edit]

LER[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
en-es [44.5, 52.0] [34.7, 41.9] [24.7, 31.9] [30.2, 37.9] [30.2, 37.9] [29.2, 37.0] [29.3, 36.8] [29.0, 36.4] [29.1, 36.5]

BLEU[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
en-es [0.1885, 0.2133] [0.1953, 0.2201] [0.1832, 0.2067] [0.1806, 0.2042] [0.1806, 0.2042] [0.1808, 0.2043] [0.1810, 0.2046] [0.1809, 0.2045] [0.1809, 0.2045]

In domain[edit]

LER[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
br-fr
[58.9, 64.8]

[44.2, 50.5]
168
[54.8, 60.7]
115
261
[53.5, 59.2]
247
[52.1, 58.2]
172
[54.3, 60.2]
165
[52.7, 58.4]
138
[50.5, 56.3]
136
[50.6, 56.6]
en-es
[21.0, 25.3]

[15.1, 18.9]
667
[20.7, 25.1]
?
?
2595
[15.0, 19.0]
2436
[15.1, 19.1]
1520
[13.7, 17.6]
1402
[13.6, 17.3]
1065
[13.9, 17.7]
1024
[13.9, 17.8]
eu-es
[41.1, 46.6]

[38.8, 44.2]
?
[47.8, 53.0]
?
2631
[40.9, 46.4]
2427
[40.9, 46.5]
1186
[40.7, 46.1]
1025
[40.7, 46.2]
685
[40.5, 45.9]
641
[40.5, 45.9]
mk-en
[42.4, 46.3]

[27.1, 30.8]
1385
[28.8, 32.6]
?
1698
[27.8, 31.5]
1662
[27.8, 31.4]
1321
[27.8, 31.4]
1285
[27.8, 31.4]
1186
[27.7, 31.4]
1180
[27.7, 31.4]

BLEU[edit]

Pair freq tlm ling alig rules
(c>1.5)
rules
(c>2.0)
rules
(c>2.5)
rules
(c>3.0)
rules
(c>3.5)
rules
(c>4.0)
br-fr
[0.1247, 0.1420]

[0.1397, 0.1572]
168
[0.1325, 0.1503]
115
261
[0.1250, 0.1425]
247
[0.1252, 0.1429]
172
[0.1240, 0.1412]
165
[0.1243, 0.1416]
138
[0.1255, 0.1429]
136
[0.1255, 0.1429]
en-es
[0.2151, 0.2340]

[0.2197, 0.2384]
667
[0.2148, 0.2337]
?
2595
[0.2180, 0.2371]
2436
[0.2180, 0.2372]
1520
[0.2190, 0.2380]
1402
[0.2190, 0.2381]
1065
[0.2189, 0.2380]
1024
[0.2189, 0.2380]
eu-es
[0.0873, 0.1038]

[0.0921, 0.1093]
?
[0.0870, 0.1030]
?
2631
[0.0875, 0.1040]
2427
[0.0878, 0.1042]
1186
[0.0878, 0.1043]
1025
[0.0878, 0.1043]
685
[0.0879, 0.1043]
641
[0.0879, 0.1043]
mk-en
[0.2300, 0.2511]

[0.2976, 0.3230]
1385
[0.2567, 0.2798]
1698
[0.2694, 0.2930]
1662
[0.2695, 0.2931]
1321
[0.2696, 0.2935]
1285
[0.2696, 0.2935]
1186
[0.2696, 0.2934]
1180
[0.2696, 0.2934]


Learning monolingually (fractional counts)[edit]

Setup:

  • SL side of the training corpus
  • All possibilities translated and scored
  • Probabilities normalised into fractional counts (e.g. add them up to get a total, then divide each prob by the total).
    • log prob converted into normal prob using exp10()
  • Rules generated by counting fractions from the translated file.

In domain[edit]

LER[edit]

BLEU[edit]

Out of domain[edit]

LER[edit]

BLEU[edit]

MaxEnt[edit]

With alignments[edit]

Pair alig rule-best ME (>5) ME (>3)
br-fr 33.4 31.5 31.8 29.9
mk-en 19.9 19.8 18.9 17.8
eu-es 18.5 17.9 17.4 19.9
en-es 8.6 7.0 6.3 6.3

With fractional counts[edit]

Pair alig rule-best ME (>5) ME (>3) ME (>1) ME (>0)
br-fr 43.4 43.1 61.9 46.2 48.2 49.9
mk-en 29.5
eu-es 41.2 43.9 44.4
en-es 11.9 11.7 11.4 11.9

Notes[edit]