Comparison of part-of-speech tagging systems
Apertium would like to have really good part-of-speech tagging, but in many cases it falls below the state of the art (around 97% tagging accuracy). This page collects a comparison of the tagging systems available in Apertium and gives some ideas of what could be done to improve them.
The scripts used to generate these results are written in Python and are available from SVN under /branches/apertium-tagger/experiments/: https://svn.code.sf.net/p/apertium/svn/branches/apertium-tagger/experiments/
In the following two tables, values of the form x±y are the sample mean and standard deviation of the results of 10-fold cross-validation.
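As a rough sketch of how such summary figures can be derived from per-fold results (an illustration with made-up numbers, not the evaluation code from the SVN branch above), both the x±y form and the [low, high] form used in the last table reduce to a few lines of Python:

```python
import statistics

# Hypothetical per-fold recall scores from 10-fold cross-validation
# (made-up numbers, one score per held-out fold).
fold_scores = [93.9, 94.4, 92.8, 93.5, 94.1, 93.2, 95.0, 93.8, 94.6, 93.3]

mean = statistics.mean(fold_scores)
stdev = statistics.stdev(fold_scores)   # sample standard deviation (n - 1 denominator)
low, high = min(fold_scores), max(fold_scores)

print(f"{mean:.2f}±{stdev:.2f}")        # the x±y form used in the first two tables
print(f"[{low:.2f}, {high:.2f}]")       # the [low, high] form used in the last table
```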
In the following table, the values represent tagger recall (= [true positives] / [total tokens]):
System | Catalan (23,673) | Spanish (20,487) | Serbo-Croatian (20,071) | Russian (1,052) | Kazakh (13,714) | Portuguese (6,725) | Swedish (369) | Italian (5,201)
---|---|---|---|---|---|---|---|---
1st | 86.50 | 90.34 | 44.99±1.20 | 38.19 | 72.08 | 76.70 | 34.70 | 82.28±3.05 |
Bigram (unsup, 0 iters) | 88.96±1.12 | 88.49±1.54 | 47.31±1.24 | 81.41±5.78 | 79.16±3.12 | |||
Bigram (unsup, 50 iters) | 91.74±1.15 | 91.13±1.52 | 48.28±1.33 | 81.09±5.99 | 84.93±2.71 | |||
Bigram (unsup, 250 iters) | 91.51±1.16 | 90.85±1.48 | 48.05±1.47 | 80.31±6.60 | 84.52±2.78 | |||
Lwsw (0 iters) | 92.73±0.89 | 92.86±0.95 | 43.56±1.20 | 83.01±5.47 | 86.12±2.96 | |||
Lwsw (50 iters) | 92.98±0.85 | 93.01±1.02 | 45.09±1.15 | 82.70±5.76 | 86.07±2.68 | |||
Lwsw (250 iters) | 92.99±0.84 | 93.06±1.02 | 45.13±1.17 | 82.75±5.79 | 86.08±2.67 | |||
CG→1st | 88.05 | 91.10 | 64.01±1.04 | 39.81 | 81.56 | 87.99 | 42.90 | 83.29±3.07 |
CG→Bigram (unsup, 0 iters) | 91.83±1.03 | 91.39±1.42 | 60.37±1.45 | 86.77±6.33 | 81.31±3.10 | |||
CG→Bigram (unsup, 50 iters) | 93.16±1.39 | 92.53±1.29 | 60.91±1.65 | 87.48±6.16 | 86.11±2.46 | |||
CG→Bigram (unsup, 250 iters) | 92.99±1.38 | 92.50±1.23 | 60.88±1.66 | 87.20±6.72 | 86.01±2.59 | |||
CG→Lwsw (0 iters) | 93.17±1.08 | 92.72±1.09 | 59.93±1.46 | 86.60±6.20 | 85.64±2.83 | |||
CG→Lwsw (50 iters) | 93.37±1.02 | 92.74±1.16 | 60.38±1.57 | 86.54±6.21 | 85.55±2.72 | |||
CG→Lwsw (250 iters) | 93.38±1.05 | 92.77±1.18 | 60.42±1.53 | 86.54±6.20 | 85.54±2.72 | |||
Unigram model 1 | 93.86±1.13 | 93.96±0.98 | 63.96±0.92 | 39.11±8.91 | 80.63±3.87 | 86.00±6.63 | 46.48±5.78 | 89.37±1.63 |
Unigram model 2 | 93.90±1.09 | 93.69±0.94 | 67.51±0.67 | 40.36±8.59 | 82.19±3.70 | 87.13±6.23 | 47.12±8.29 | 89.23±0.97 |
Unigram model 3 | 93.88±1.08 | 93.67±0.94 | 67.47±0.64 | 40.36±8.59 | 82.45±3.80 | 87.11±6.13 | 47.12±8.29 | 89.00±0.95 |
Bigram (sup) | 96.00±0.87 | 95.47±1.07 | 55.26±0.87 | 88.07±6.50 | ||||
CG→Unigram model 1 | 94.34±1.11 | 94.73±0.88 | 68.42±0.69 | 40.71±9.39 | 84.54±3.29 | 88.42±6.55 | 46.84±5.48 | 89.04±1.45 |
CG→Unigram model 2 | 94.11±1.09 | 94.33±0.82 | 68.93±0.72 | 41.43±9.21 | 84.62±3.47 | 88.64±6.13 | 47.07±7.39 | 88.67±0.93 |
CG→Unigram model 3 | 94.09±1.08 | 94.31±0.81 | 68.88±0.72 | 41.43±9.21 | 84.71±3.54 | 88.63±6.07 | 47.07±7.39 | 88.45±0.94 |
CG→Bigram (sup) | 96.00±1.13 | 94.88±1.18 | 65.66±1.16 | 88.73±6.36 | ||||
Percep (coarsebigram) | 94.02±1.26 | 94.79±0.86 | 55.64±1.17 | 87.04±6.23 | 90.87±0.87 | |||
Percep (kaztags) | 93.66±0.76 | 94.28±0.93 | 70.44±0.92 | 91.41±2.09 | 87.07±6.16 | 99.70±0.96 | 90.64±1.13 | |
Percep (spacycoarsetags) | 95.06±1.01 | 95.23±0.66 | 56.34±1.21 | 87.32±6.22 | 90.96±0.76 | |||
Percep (spacyflattags) | 95.25±0.85 | 95.46±0.64 | 73.02±1.12 | 91.91±2.13 | 87.45±6.24 | 99.70±0.96 | 90.13±1.37 | |
Percep (unigram) | 93.59±0.77 | 94.09±0.96 | 70.11±0.97 | 91.08±2.13 | 87.16±6.22 | 99.70±0.96 | 90.23±0.95 | |
CG→Percep (coarsebigram) | 94.01±1.28 | 94.75±0.69 | 67.32±0.96 | 88.70±6.29 | 89.25±1.17 | |||
CG→Percep (kaztags) | 93.91±0.90 | 94.72±0.88 | 72.79±1.11 | 87.73±3.12 | 88.72±6.23 | 94.34±3.16 | 89.82±1.29 | |
CG→Percep (spacycoarsetags) | 94.93±1.12 | 95.16±0.78 | 67.81±1.11 | 88.83±6.13 | 89.88±1.03 | |||
CG→Percep (spacyflattags) | 95.19±0.98 | 95.40±0.66 | 72.80±0.76 | 87.62±2.83 | 88.85±6.21 | 94.34±3.16 | 89.34±1.24 | |
CG→Percep (unigram) | 93.87±0.92 | 94.73±0.77 | 72.42±0.86 | 87.52±3.09 | 88.81±6.28 | 94.34±3.16 | 89.39±1.24 |
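As a concrete illustration of the recall measure defined above this table, here is a minimal sketch; the input format (a list of (predicted, gold) tag pairs, one per token) is a simplifying assumption and not the format used by the experiment scripts:

```python
def tagger_recall(pairs):
    """Recall = correctly tagged tokens / total tokens.

    `pairs` is a hypothetical list of (predicted, gold) tag pairs,
    one pair per token in the test corpus.
    """
    if not pairs:
        return 0.0
    correct = sum(1 for predicted, gold in pairs if predicted == gold)
    return 100.0 * correct / len(pairs)

# Example: 3 of 4 tokens tagged correctly -> 75.0
print(tagger_recall([("n", "n"), ("v", "v"), ("adj", "n"), ("det", "det")]))
```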
In the following table, the values represent availability-adjusted tagger recall (= [true positives] / [words with a correct analysis from the morphological analyser]). This data is also available in box plot form.
System | Catalan (23,673) | Spanish (20,487) | Serbo-Croatian (20,071) | Russian (1,052) | Kazakh (13,714) | Portuguese (6,725) | Swedish (369) | Italian (5,201)
---|---|---|---|---|---|---|---|---
1st | 87.86 | 91.82 | 52.56±1.53 | 75.93 | 77.72 | 83.00 | 64.47 | 82.77±3.09 |
Bigram (unsup, 0 iters) | 90.35±1.17 | 89.95±1.45 | 55.27±1.63 | 89.72±2.06 | 79.64±3.11 | |||
Bigram (unsup, 50 iters) | 93.17±1.21 | 92.63±1.40 | 56.40±1.70 | 89.35±1.99 | 85.45±2.78 | |||
Bigram (unsup, 250 iters) | 92.94±1.22 | 92.35±1.33 | 56.13±1.87 | 88.45±2.51 | 85.03±2.87 | |||
Lwsw (0 iters) | 94.18±0.91 | 94.40±0.77 | 50.88±1.54 | 91.51±1.22 | 86.64±3.15 | |||
Lwsw (50 iters) | 94.44±0.81 | 94.54±0.83 | 52.67±1.46 | 91.14±1.62 | 86.59±2.82 | |||
Lwsw (250 iters) | 94.44±0.79 | 94.60±0.84 | 52.72±1.50 | 91.20±1.64 | 86.60±2.81 | |||
CG→1st | 89.44 | 92.60 | 74.77±1.32 | 79.10 | 87.95 | 95.22 | 79.70 | 83.79±3.08 |
CG→Bigram (unsup, 0 iters) | 93.27±1.10 | 92.90±1.30 | 70.52±1.71 | 95.61±1.77 | 81.80±3.08 | |||
CG→Bigram (unsup, 50 iters) | 94.62±1.49 | 94.05±1.13 | 71.15±1.94 | 96.41±1.38 | 86.63±2.51 | |||
CG→Bigram (unsup, 250 iters) | 94.45±1.48 | 94.03±1.09 | 71.11±1.95 | 96.06±2.05 | 86.53±2.62 | |||
CG→Lwsw (0 iters) | 94.63±1.08 | 94.25±0.91 | 70.00±1.74 | 95.43±1.52 | 86.16±2.97 | |||
CG→Lwsw (50 iters) | 94.83±1.01 | 94.27±0.97 | 70.53±1.86 | 95.36±1.54 | 86.07±2.79 | |||
CG→Lwsw (250 iters) | 94.84±1.03 | 94.30±0.99 | 70.58±1.81 | 95.36±1.53 | 86.06±2.79 | |||
Unigram model 1 | 95.33±1.05 | 95.51±0.84 | 74.72±1.43 | 77.54±6.51 | 87.03±3.03 | 94.74±2.44 | 89.26±7.32 | 89.91±1.93 |
Unigram model 2 | 95.37±1.04 | 95.23±0.77 | 78.87±1.05 | 80.06±6.11 | 88.72±2.76 | 96.01±1.70 | 89.82±7.70 | 89.77±1.23 |
Unigram model 3 | 95.35±1.03 | 95.22±0.79 | 78.82±1.06 | 80.06±6.11 | 88.99±2.83 | 95.99±1.52 | 89.82±7.70 | 89.54±1.25 |
Bigram (sup) | 97.50±0.93 | 97.04±0.86 | 64.55±1.33 | 97.03±1.75 | ||||
CG→Unigram model 1 | 95.82±1.06 | 96.30±0.68 | 79.92±0.95 | 80.56±6.70 | 91.25±2.01 | 97.42±1.76 | 90.00±6.99 | 89.58±1.75 |
CG→Unigram model 2 | 95.58±1.07 | 95.89±0.59 | 80.51±0.95 | 82.06±6.50 | 91.33±2.15 | 97.70±1.32 | 89.97±7.50 | 89.21±1.13 |
CG→Unigram model 3 | 95.56±1.05 | 95.86±0.60 | 80.46±0.99 | 82.06±6.50 | 91.43±2.26 | 97.69±1.28 | 89.97±7.50 | 88.98±1.18 |
CG→Bigram (sup) | 97.51±1.21 | 96.45±0.93 | 76.70±1.46 | 97.78±1.52 | ||||
Percep (coarsebigram) | 95.71±1.36 | 96.60±0.75 | 61.99±1.24 | 95.92±1.60 | 92.89±1.10 | |||
Percep (kaztags) | 95.34±0.77 | 96.08±0.69 | 78.47±0.99 | 91.41±2.08 | 95.95±1.69 | 99.70±0.96 | 92.67±1.31 | |
Percep (spacycoarsetags) | 96.76±1.06 | 97.05±0.56 | 62.77±1.29 | 96.22±1.52 | 92.99±0.93 | |||
Percep (spacyflattags) | 96.96±0.87 | 97.28±0.58 | 81.35±1.19 | 91.92±2.12 | 96.37±1.53 | 99.70±0.96 | 92.14±1.44 | |
Percep (unigram) | 95.27±0.76 | 95.89±0.74 | 78.11±1.03 | 91.08±2.12 | 96.05±1.64 | 99.70±0.96 | 92.24±1.11 | |
CG→Percep (coarsebigram) | 95.70±1.37 | 96.55±0.55 | 75.00±1.04 | 97.75±1.47 | 91.25±1.50 | |||
CG→Percep (kaztags) | 95.59±0.92 | 96.53±0.66 | 81.10±1.20 | 87.74±3.11 | 97.78±1.41 | 94.34±3.16 | 91.83±1.50 | |
CG→Percep (spacycoarsetags) | 96.64±1.17 | 96.98±0.64 | 75.54±1.31 | 97.90±1.30 | 91.89±1.20 | |||
CG→Percep (spacyflattags) | 96.90±1.02 | 97.22±0.51 | 81.10±0.86 | 87.62±2.82 | 97.92±1.38 | 94.34±3.16 | 91.34±1.42 | |
CG→Percep (unigram) | 95.55±0.92 | 96.54±0.52 | 80.68±0.93 | 87.52±3.08 | 97.87±1.47 | 94.34±3.16 | 91.38±1.40 |
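Availability-adjusted recall differs only in its denominator: tokens whose gold tag is absent from the analyser's output are excluded. Here is a minimal sketch, again with a hypothetical input format (triples of predicted tag, gold tag, and the set of analyses offered by the analyser), not the format used by the experiment scripts:

```python
def adjusted_recall(tokens):
    """Availability-adjusted recall: correctly tagged tokens divided by the
    number of tokens whose analyser output contains the gold tag.

    `tokens` is a hypothetical list of (predicted, gold, analyses) triples,
    where `analyses` is the set of tags offered by the morphological analyser.
    """
    available = [(p, g) for p, g, analyses in tokens if g in analyses]
    if not available:
        return 0.0
    correct = sum(1 for p, g in available if p == g)
    return 100.0 * correct / len(available)

# The second token's gold tag is missing from the analyser output, so it is
# excluded from the denominator: 2 correct out of 2 available -> 100.0
print(adjusted_recall([("n", "n", {"n", "v"}),
                       ("v", "adj", {"n", "v"}),
                       ("det", "det", {"det"})]))
```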
In the following table, the intervals represent the [low, high] values from 10-fold cross-validation.
Language | Sent | Tok | Amb | 1st | CG+1st | Unigram | CG+Unigram | apertium-tagger | CG+apertium-tagger
---|---|---|---|---|---|---|---|---|---
Catalan | 1,413 | 24,144 | ? | 81.85 | 83.96 | [75.65, 78.46] | [87.76, 90.48] | [94.16, 96.28] | [93.92, 96.16] |
Spanish | 1,271 | 21,247 | ? | 86.18 | 86.71 | [78.20, 80.06] | [87.72, 90.27] | [90.15, 94.86] | [91.84, 93.70] |
Serbo-Croatian | 1,190 | 20,128 | ? | 75.22 | 79.67 | [75.36, 78.79] | [75.36, 77.28] | ||
Russian | 451 | 10,171 | ? | 75.63 | 79.52 | [70.49, 72.94] | [74.68, 78.65] | n/a | n/a |
Kazakh | 403 | 4,348 | ? | 80.79 | 86.19 | [84.36, 87.79] | [85.56, 88.72] | n/a | n/a |
Portuguese | 119 | 3,823 | ? | 72.54 | 87.34 | [77.10, 87.72] | [84.05, 91.96] | ||
Swedish | 11 | 239 | ? | 72.90 | 73.86 | [56.00, 82.97] |
Sent = sentences, Tok = tokens, Amb = average ambiguity from the morphological analyser
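The Amb column is not filled in above. Assuming that "average ambiguity" means the mean number of analyses the morphological analyser returns per token (a common convention, but an assumption here), it could be computed along these lines:

```python
def average_ambiguity(analyses_per_token):
    """Mean number of analyses returned by the morphological analyser per token.

    `analyses_per_token` is a hypothetical list with one entry per token,
    each entry being the list of analyses the analyser returned for it.
    """
    if not analyses_per_token:
        return 0.0
    return sum(len(a) for a in analyses_per_token) / len(analyses_per_token)

# Three tokens with 1, 3 and 2 analyses respectively -> 2.0
print(average_ambiguity([["n"], ["n", "v", "adj"], ["det", "prn"]]))
```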
Systems
1st
: Selects the first analysis from the morphological analyser.
CG
: Uses the CG (from the monolingual language package in languages) to preprocess the input.
Unigram
: Lexicalised unigram tagger (a toy sketch is given after this list).
apertium-tagger
: Uses the bigram HMM tagger included with Apertium.
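Below is a toy sketch of a lexicalised unigram baseline of the kind described for the Unigram entry above: remember which analysis each surface form most often received in training, and fall back to the analyser's first analysis for unseen forms. It only illustrates the idea; it is not the implementation behind the Unigram model 1/2/3 rows in the tables.

```python
from collections import Counter, defaultdict

class LexicalisedUnigramTagger:
    """Toy lexicalised unigram tagger: for each surface form, count how often
    each analysis was seen in training and always emit the most frequent one;
    unseen forms fall back to the first analysis offered by the analyser."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, tagged_corpus):
        # tagged_corpus: iterable of (surface_form, gold_analysis) pairs
        for form, analysis in tagged_corpus:
            self.counts[form][analysis] += 1

    def tag(self, form, analyses):
        # analyses: analyses proposed by the morphological analyser, in order
        seen = self.counts.get(form)
        if seen:
            # restrict to analyses the analyser actually offers for this form
            candidates = [a for a in analyses if a in seen]
            if candidates:
                return max(candidates, key=lambda a: seen[a])
        return analyses[0]  # unseen form: first analysis, like the 1st baseline

tagger = LexicalisedUnigramTagger()
tagger.train([("la", "det"), ("la", "det"), ("la", "prn"), ("casa", "n")])
print(tagger.tag("la", ["prn", "det"]))   # -> "det"
print(tagger.tag("nueva", ["adj", "n"]))  # -> "adj" (unseen form)
```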
Corpora
The tagged corpora used in the experiments are found in the monolingual packages in languages, under the texts/ subdirectory.