Difference between revisions of "User:LA2"

Latest revision as of 03:43, 28 September 2013

LA2 is the username of Lars Aronsson, Sweden. More on Wikipedia (http://en.wikipedia.org/wiki/User:LA2) and the Apache OpenOffice wiki (http://wiki.openoffice.org/wiki/User:LA2). {{babel|sv|en-3|de-2|da-1|no-1}}

Diary

September 6, 2013: Tihomir Rangelov announces his thesis work, Developing Apertium-is-sv (http://skemman.is/en/item/view/1946/16376).

August 15, 2013: A collection of resources (http://hlt.sztaki.hu/resources/index.html) from Judit Ács, among them translations extracted from Wiktionary.

August 9, 2013: There is now a Danish parallel text of Gösta Berlings Saga (http://runeberg.org/berlidan/), so we have something to compare the machine translation against. The opening sentence should read "Endelig stod præsten på prækestolen" ("på", not "i").

August 4, 2013: The <pardef n="numeros"> in sv.dix matches and catches too much. For the input "34 good" it matches "34 g" as a numeral, and "ood" as an unrecognized word. In Bible cross references, 1 Mos. 5 becomes 1 m*os. 5, which is quite useless.

Why do "@för att" and "@så att" appear in the Danish translation? Shouldn't it be "for at" and "så at"? Do we need rules for that?

August 3, 2013: In my first SVN commit to Apertium (r46198), I add the Swedish verb form 'sade', past tense of 'säga'. Current coverage immediately increases by as much as half a percent. The second commit (r46199) fixes some errors in the beginning of Gösta Berlings saga, which now looks better: "Endelig stod præsten i prædikestolen. Forsamlingens hoveder løfter. Så, der var han altså! Det skulle ikke blive messefald denne søndagen så som den *förra og mange søndage @förut."

August 2, 2013: In trunk/apertium-sv-da/apertium-sv-da.sv.dix there are 426 <pardef> tags. <par n="..."> is used 838 times inside pardefs and 10,673 times outside of pardefs. The XML file contains no lines with more than one <par n="..."> tag. The paradigms most commonly used inside pardefs are:

Count Paradigm What does it do
550 S__case
283 S__voice
2 no-cp\Ø\S__case
2 cp-R
1 cp-L
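
The inside/outside split reported above can be reproduced with a line-based count. The following is a minimal sketch, not necessarily how the figures were obtained. It relies on the observation that no line holds more than one <par n="..."> tag, and it additionally assumes that the opening and closing pardef tags can be spotted line by line. The function name count_par_uses is made up for illustration.

```shell
# count_par_uses FILE
# Prints "inside=N outside=M": the number of lines containing <par n="...">
# within <pardef>...</pardef> blocks and in the rest of the file.
# Line-based, not a real XML parse: assumes at most one <par> per line.
count_par_uses() {
  awk '
    /<pardef[ >]/ { in_pardef = 1 }                       # entering a paradigm definition
    /<par n=/     { if (in_pardef) inside++; else outside++ }
    /<\/pardef>/  { in_pardef = 0 }                       # leaving it again
    END { printf "inside=%d outside=%d\n", inside, outside }
  ' "$1"
}
```

Run as count_par_uses apertium-sv-da.sv.dix. Piping the matching lines through sort and uniq -c would give per-paradigm counts like those in the tables.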

The paradigms most commonly used outside of pardefs are listed below. There are many duplicates and minor errors, which paradigm chopper and dixtools might fix.

Count Paradigm Part of speech Suffix What does it do
1306 bil__n n -en -ar -arna
1150 ackord__n n -et = -en
799 adressera__vblex vblex
679 accent__n n -en -er -erna
379 var__adv adv
377 t.ex.__abbr abbr
376 mask__n n exact duplicate of accent__n
353 Afrika__np np
351 aktiv__adj adj
340 frug/a__n n /a -a -an -or -orna
303 afrikansk__adj adj
236 Magnusson__np np
206 Trond__np np
177 mat__n n -en; uncountable
166 ack__ij ij
163 äpple__n n -t -n -na
157 adress__n n -en -er -erna, ending in -s
156 apotekar/e__n n /e -e -en -e -na
154 extra__adj adj
105 geometri__n n -n -er -erna
103 Eva__np np
98 adel__n n -n; uncountable
94 badhus__n n -et = -en; ending in -s
76 ed__n n exact duplicate of accent__n
70 beväpna/d__adj adj /d
66 abborr/e__n n /e -e -ar -arna
65 aktie__n n -n -r -rna
64 afghansk/a__n n /a -a -an; uncountable -- but afghanska should be countable, i.e. frug/a__n
61 på__pr pr
58 följ/a__vblex vblex /a
58 blod__n n -et; uncountable
55 koppar__n n duplicate of mat__n; should add -n
53 kolleg/a__n n /a -a -an -or/-er -orna/-erna; with errors
53 aktstycke__n n exact duplicate of äpple__n
48 nak/en__adj adj /en
47 hys/a__vblex vblex /a
45 hör/a__vblex vblex /a
45 diskret__adj adj
42 dragspel__n n exact duplicate of ackord__n
40 arton__num num
39 kalend/er__n n /er -er -ern -rar -rarna
36 cyk/el__n n /el -el -eln -lar -larna
36 blin/d__adj adj /d
31 oro__n n
31 första__det det
30 blygsam__adj adj
28 bageri__n n -et -er -erna
28 att__cnjsub cnjsub
27 Zeeland__np np
27 bro__n n -n -ar -arna
23 gram__n n -met = -men
22 pill/er__n n /er -er -ret -er -ren
22 Bjarnas/on__np np /on
20 laboratori/um__n n /um -um -et -er -erna
19 inre/da__vblex vblex /da
18 g/å__vblex vblex
18 Abbas__np np
17 samverkan__n n =; uncountable
17 m/an__n n /an -an -annen -än -ännen
17 glass__n n -en -ar -arna; ending in -s
16 order__n n -n = -na
16 ant/a__vblex vblex /a
16 altare__n n -t = -n; uses -ns instead of S__case, probably has errors; words that use this should use äpple__n instead. Altare follows a special pattern that needs a paradigm of its own, but it must be altar/e__n since the definite plural is altarna and compounds are altar-.
14 böter__n n -na; plural only; misleading comment
13 Wikipedia__np np
13 sek/el__n n /el -el -let -el -len
13 ankom/ma__vblex vblex /ma
12 teck/en__n n /en -en -net -en -nen
12 s/ätta__vblex vblex /ätta
12 lycka/s__vblex vblex /s
12 inneh/ålla__vblex vblex /ålla
12 april__n n -månaden -månader -månaderna
11 syre__n n -t; uncountable; erroneously marked as ut, should be nt
11 medg/e__vblex vblex /e
11 med/el__n n /el duplicate of sek/el__n; but (erroneously?) with separate sg and pl instead of sp
11 lat__adj adj
11 g/öra__vblex vblex /öra
11 eller__cnjcoo cnjcoo
10 s/e__vblex vblex /e
10 l/and__n n /and -and -andet -änder -ändera
10 Jonas__np np
10 Azorerna__np np
9 st/å__vblex vblex
9 när__cnjadv cnjadv
9 l/ida__vblex vblex /ida
9 alfabet__n n exact duplicate of ackord__n
8 stam__n n -men -mar -marna
8 heli/um__n n /um -um -umet/-et; uncountable
8 göm/ma__vblex vblex /ma
8 början__n n = = = ; similar to samverkan__n but also covers plural (do these words really have a plural?)
8 beskr/iva__vblex vblex /iva
8 avbr/yta__vblex vblex /yta
7 särskil/d__adj adj /d
7 politiker__n n -n = -na; uses separate sg, pl instead of sp
7 Paris__np np
7 l/åta__vblex vblex /åta
7 grun/d__adj adj /d
7 gla/d__adj adj /d
7 Europol__np np
7 bygg/a__vblex vblex /a
7 bi__n n -et -n -na
7 begr/ipa__vblex vblex /ipa
7 avsky__vblex vblex
7 använ/da__vblex vblex /da
7 ankar/e__n n /e -e -et -e -na
6 verk/an__n n /an -an -an -ning -ningar
6 tapp/er__adj adj /er
6 st/ad__n n /ad -ad -aden/-an -äder -äderna
6 sl/å__vblex vblex
6 l/ägga__vblex vblex /ägga
6 h/and__n n /and -and -anden -änder -änderna
6 försiktig/t__adv adv /t
6 fäst/a__vblex vblex /a
6 f/ar__n n /ar -ar -adern -äder -äderna; see also f/ader
6 dubb/el__adj adj /el
6 br/or__n n /or -or/-oder -odern/-orn -öder -öderna; erroneously specifies -röder
6 bris__n n -en -er/-ar -erna/-arna; erroneously sets S__case on sg ind (briss)
6 br/inna__vblex vblex /inna
6 b/ok__n n /ok -ok -oken -öcker -öckerna
5 yen__n n -en = -en
5 skr/ika__vblex vblex /ika
5 sex__n n -et; uncountable, ending in -s
5 m/or__n n /or -or -odern -ödrar -ödrarna; does not cover -oder
5 morg/on__n n /on -on -onen -nar -narna
5 ledam/ot__n n /ot -ot -oten -öter -öterna
5 kvicksilv/er__n n /er -er -ret; uncountable
5 kaffe__n n exact duplicate of syre__n; but correctly marked as nt, not erroneously as ut
5 hög/er__adj adj /er
5 gift__adj adj
5 bo__vblex vblex
5 anf/alla__vblex vblex /alla
5 Amsterdamfördraget__np np
4 s/on__n n /on -on -onen -öner -önerna
4 Schweiz__np np
4 konstla/d__adj adj /d
4 kilo__n n exact duplicate of äpple__n; slightly misleading comment
4 fri__adj adj
4 d/yka__vblex vblex /yka
4 dollar__n n
4 brän/na__vblex vblex /na
4 bott/en__n n /en
4 bokträ__n n
4 bj/uda__vblex vblex /uda
4 bel/ysa__vblex vblex /ysa
4 back/e__n n /e
4 ans__n n
4 anhörig__n n
3 vå/t__adj adj /t
3 Världshälsoorganisation__np np
3 vän__n n
3 tänk/a__vblex vblex /a
3 san/n__adj adj /n
3 ög/a__n n /a
3 mor/ot__n n /ot
3 lj/uga__vblex vblex /uga
3 l/ång__adj adj /ång
3 kis/el__n n /el
3 huv/ud__n n /ud
3 ha__vblex vblex
3 film__n n
3 f/å__vblex vblex
3 djung/el__n n /el
3 best/å__vblex vblex
3 besl/uta__vblex vblex /uta
3 antiken__n n
3 Allmogeförbund__np np
2 /vara__vbser vbser /vara
2 v/älja__vblex vblex /älja
2 /ung__adj adj /ung
2 undr/an__n n /an
2 undertecknad__prn prn
2 t/iga__vblex vblex /iga
2 sv/ärja__vblex vblex /ärja
2 str/ypa__vblex vblex /ypa
2 st/or__adj adj /or
2 stö/dja__vblex vblex /dja
2 som/mar__n n /mar
2 skam__n n
2 samm/a__prn prn /a
2 s/äga__vblex vblex /äga
2 rätt__adv adv
2 num/mer__n n /mer
2 nå__vblex vblex
2 n/att__n n /att
2 mytologi__n n
2 må__vblex vblex
2 mång/a__det det /a
2 lyck/a__vblex vblex /a
2 löp/a__vblex vblex /a
2 /liten__adj adj /liten
2 lire__n n
2 l/igga__vblex vblex /igga
2 långsam/t__adv adv /t
2 hur__adv adv
2 ham/mare__n n /mare
2 godis__n n
2 gäll/a__vblex vblex /a
2 frejda/d__adj adj /d
2 försäkr/an__n n /an
2 fl/yga__vblex vblex /yga
2 flest__det det
2 flera__det det
2 Finnbogadottir__np np
2 f/inna__vblex vblex /inna
2 fing/er__n n /er
2 erkän/na__vblex vblex /na
2 dr/a__vblex vblex /a
2 d/otter__n n /otter
2 de/n__prn prn /n
2 data__n n
2 chips__n n
2 b/onde__n n /onde
2 bo__n n
2 blues__n n
2 bl/i__vblex vblex /i
2 blå__adj adj
2 b/inda__vblex vblex /inda
2 b/ära__vblex vblex /ära
2 bane__n n
2 bal__n n
2 båda__prn prn
2 allting__prn prn
2 akademi__n n
2 advent__n n
1 /vi__prn prn /vi
1 vilk/en__prn prn /en
1 v/ilja__vaux vaux /ilja
1 v/eta__vblex vblex /eta
1 v/em__prn prn /em
1 varje__det det
1 varifrån__prn prn
1 varenda__prn prn
1 vår/__det det /
1 var/dera__prn prn /dera
1 var/annan__prn prn /annan
1 varan/dra__prn prn /dra
1 valfångstkommissionen__np np
1 väldig/t__adv adv /t
1 /väl__adv adv /väl
1 umg/ås__vblex vblex /ås
1 studerande__n n
1 st/jäla__vblex vblex /jäla
1 Stilla_havet__np np
1 spr/inga__vblex vblex /inga
1 spelrum__n n
1 sov/a__vblex vblex /a
1 /som__prn prn /som
1 somligas__det det
1 somliga__prn prn
1 solo__n n
1 Socialdemokraterna__np np
1 sl/ippa__vblex vblex /ippa
1 sl/åss__vblex vblex /åss
1 sk/ola__vbmod vbmod /ola
1 sk/juta__vblex vblex /juta
1 sk/ina__vblex vblex /ina
1 skil/ja__vblex vblex /ja
1 skäm/mas__vblex vblex /mas
1 sj/unka__vblex vblex /unka
1 sj/unga__vblex vblex /unga
1 själv__prn prn
1 s/itta__vblex vblex /itta
1 si/n__det det /n
1 s/imma__vblex vblex /imma
1 sig__prn prn
1 separa
1 samtliga__prn prn
1 saltvatt/en__n n /en
1 s/älja__vblex vblex /älja
1 så/dan__prn prn /dan
1 rym/ma__vblex vblex /ma
1 ros__n n
1 rö/d__adj adj /d
1 rät/t__adv adv /t
1 rä/tt__adv adv /tt
1 Putin__np np
1 procent__n n
1 pengar__n n
1 paresq
1 pardret
1 övrig__adj adj
1 öl__n n
1 ofta__adv adv
1 oelek__n n
1 Odyssé__np np
1 n/ysa__vblex vblex /ysa
1 nykom/men__adj adj /men
1 nyfö/dd__adj adj /dd
1 numeros
1 nöt__n n
1 Nordsjön__np np
1 /ni__prn prn /ni
1 när/a__vblex vblex /a
1 någ/on__prn prn /on
1 någ/on__det det /on
1 m/ycket__adv adv /ycket
1 mun__n n
1 mott__n n
1 min/nas__vblex vblex /nas
1 mi/n__det det /n
1 mil__n n
1 medi/a__n n /a
1 måst/a__vaux vaux /a
1 masala__n n
1 mars__n n
1 /man__prn prn /man
1 manisk__adj adj
1 mång/a__prn prn /a
1 /många__adj adj /många
1 mandrom__prn prn
1 l/yda__vblex vblex /yda
1 l/us__n n /us
1 /lite__adv adv /lite
1 l/e__vblex vblex /e
1 lev/a__vblex vblex /a
1 l/ågt__adv adv /ågt
1 l/åg__adj adj /åg
1 /la__adv adv /la
1 k/unna__vbmod vbmod /unna
1 kr/ypa__vblex vblex /ypa
1 kok/a__vblex vblex /a
1 kl/yva__vblex vblex /yva
1 karl__n n
1 /jag__prn prn /jag
1 ingenting__prn prn
1 ing/en__prn prn /en
1 ing/en__det det /en
1 hy__n n
1 hum/mer__n n /mer
1 h/on__prn prn /on
1 histori/a__n n /a
1 him/mel__n n /mel
1 het/a__vblex vblex /a
1 herr__n n
1 hen__prn prn
1 ha__vbhaver vbhaver
1 h/an__prn prn /an
1 gravid__adj adj
1 godtycke__n n
1 /god__adj adj /god
1 gläns/a__vblex vblex /a
1 gl/ädja__vblex vblex /ädja
1 get__n n
1 garage__n n
1 /gammal__adj adj /gammal
1 fr/ysa__vblex vblex /ysa
1 försvunn/en__adj adj /en
1 författar/en__prn prn /en
1 förf/ader__n n /ader
1 flop/p__n n /p
1 f/innas__vblex vblex /innas
1 fakt/um__n n /um
1 f/å__adv adv
1 er/__det det /
1 ens__det det
1 en__prn prn
1 e/n__num num /n
1 enk/el__adj adj /el
1 e/n__det det /n
1 end/a__prn prn /a
1 eg/en__prn prn /en
1 d/u__prn prn /u
1 dunk/el__adj adj /el
1 dr/icka__vblex vblex /icka
1 dr/aga__vblex vblex /aga
1 d/ö__vblex vblex
1 d/ölja__vblex vblex /ölja
1 di/n__det det /n
1 de/t__prn prn /t
1 dess/__det det /
1 deras__det det
1 d/e__prn prn /e
1 d/enna__prn prn /enna
1 de/nna__det det /nna
1 d/en__det det /en
1 delt/a__vblex vblex /a
1 contain/er__n n /er
1 coma
1 bull/er__n n /er
1 br/ista__vblex vblex /ista
1 b/öra__vblex vblex /öra
1 bokst/av__n n /av
1 b/ita__vblex vblex /ita
1 bil_cp__n n
1 b/e__vblex vblex /e
1 bekym/mer__n n /mer
1 begr/ava__vblex vblex /ava
1 avl/ida__vblex vblex /ida
1 Atlanten__np np
1 /äta__vblex vblex /äta
1 an/nan__prn prn /nan
1 all__prn prn
1 allmän__adj adj


August 1, 2013: Running lt-expand to generate a list of all Swedish words (surface forms) in the current sv.dix yields 62,836 words. Piping them through Aspell (as packaged in Ubuntu Linux) reports 12,922 as not in the Aspell dictionary. In most cases, these are correct words that ought to have been in Aspell. Words that actually seem to be wrong are:

  • bad spelling or foreign words: acceptable (English), andelsboligforening (Danish), återffinns (single f)
  • plural -s after -s: aidss, Anderss, Andrewss, arbetsplatss, Arnass (from Arnas?), arroganss, åss
  • bad noun forms: ändet, ändena, ankarlinen, ansikteen, ansikteer, ansvarigheter, ärendeen, ärendeet, aromar, atletar
    • Should be <e lm="ände"> <i>änd</i><par n="abborr/e__n"/></e>
    • SALDO has used <par n="kolleg/a"> which generates both -er and -or, but this is not applicable to lina, bila, dansbana, dosa, lucka, ... Maybe they should use <par n="frug/a__n"> instead.
  • bad adjective forms: anfåare
    • Because <e lm="anfådd" a="is"> <i>anfå</i><par n="lat__adj"/></e> -- should be anfådd on both sides
  • bad verb forms: antande (antagande, participle of anta), åtande (åtagande, participle of åta), återupptande (återupptagande, participle of återuppta), avtande (avtagande)
    • Because of errors in <pardef n="ant/a__vblex">, which have been there since 2009

Questionable words are:

  • comparative form of some adjectives: adekvatast, administrativare, akademiskare, anatomiskare, ändlösare, animaliskare, ansvarigare, åtskilligast, avsevärdare, avsiktligare
  • genitive -s after sch: affischs
  • plural of some nouns: ägandena, agerandena, åldrandena, anarkismerna, andor, användandena, äror, avocadoar, avsaknader, anskaffandena
  • participle form of some verbs: anbelangad, ansvarad
  • passive form of some verbs: angåtts, återstogs, avlidas, avvikas

(And that's only those starting with a, ä, å.)
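
A rough sketch of the lt-expand and Aspell check described above. It assumes that lt-expand prints one surface:analysis pair per line (with ":>:" or ":<:" for direction-restricted entries) and flags regular-expression entries with __REGEXP__, and that a Swedish Aspell dictionary is installed. The helper name surface_forms is hypothetical.

```shell
# surface_forms: read lt-expand output on stdin and print the unique
# surface forms, one per line. Regular-expression entries are skipped.
# Caveat: a surface form that itself contains ":" would be truncated.
surface_forms() {
  grep -v '__REGEXP__' | cut -d: -f1 | LC_ALL=C sort -u
}

# Intended use (needs lttoolbox and aspell with a Swedish dictionary):
#   lt-expand apertium-sv-da.sv.dix | surface_forms | aspell --lang=sv list
```

The words that aspell prints are the ones missing from its dictionary. As noted above, most of them are correct words, so the output still needs manual review.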

Words ending in -tion appear with paradigm accent__n (81 words; -en -er -erna), mask__n (30; -en -er -erna), mat__n (7; -en, uncountable), ed__n (6; -en -er -erna), koppar__n (6; -en), and Världshälsoorganisation__np (2). It turns out that:

  • Paradigms ed__n, mask__n, and accent__n are exact duplicates.
  • Paradigms koppar__n and mat__n are exact duplicates.
  • The noun "koppar" should have -n (preferred) and -en (less common), which is different from "mat".
  • Paradigms bi__n and bo__n are exact duplicates except for the error that bo__n has -na marked as "ind" (should be "def").
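
Exact duplicates like the ones listed above can be found mechanically. This is only a sketch under simplifying assumptions: every <pardef n="..."> and </pardef> tag sits on a line of its own, and two paradigms count as duplicates only when their bodies are textually identical once indentation is stripped. A differing comment or attribute order would hide a duplicate, and near-duplicates such as bi__n and bo__n are not reported. The function name duplicate_pardefs is invented for this example.

```shell
# duplicate_pardefs FILE
# Prints one line per group of paradigms with identical bodies,
# listing the paradigm names separated by spaces.
duplicate_pardefs() {
  awk '
    /<pardef n="/ {
      name = $0
      sub(/.*<pardef n="/, "", name)    # drop everything before the name
      sub(/".*/, "", name)              # drop everything after the name
      body = ""
      in_pardef = 1
      next
    }
    /<\/pardef>/ {
      if (in_pardef) {
        if (body in names) { names[body] = names[body] " " name; dup[body] = 1 }
        else               { names[body] = name; order[++n] = body }
        in_pardef = 0
      }
      next
    }
    in_pardef {
      line = $0
      gsub(/^[ \t]+|[ \t]+$/, "", line) # ignore indentation differences
      body = body line "\n"
    }
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in dup) print names[order[i]]
    }
  ' "$1"
}
```

Groups are printed in the order in which their first member appears in the file.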

Date Number of paradigms defined
2007-07-01 220
2008-01-01 222
2008-07-01 228
2009-01-01 228
2009-07-01 271
2010-01-01 274
2010-07-01 274
2011-01-01 274
2011-07-01 274
2012-01-01 274
2012-07-01 274
2013-01-01 300
2013-07-01 426

Paradigms introduced since 2012-07-01 (i.e. by PT, who uses a="is" for items copied from is-sv): Abbas__np, ack__ij, ackord_cp__n, advent__n, allmän__adj, Allmogeförbund__np, Amsterdamfördraget__np, anhörig__n, ans__n, Atlanten__np, Azorerna__np, back/e__n (duplicate of abborr/e__n; erroneously marked "nt"), bane__n, b/ära__vblex, bekym/mer__n, beväpna/d__adj, b/e__vblex, bil_cp__n, b/inda__vblex, b/ita__vblex, Bjarnas/on__np, blå__adj, bokträ__n, böter__n, bo__n (an erroneous duplicate of bi__n), br/oder__n, br/ista__vblex, bull/er__n, contain/er__n, cp-both\e\S__case, cp-both\Ø\'__case, cp-both\Ø_LR_s\'__case, cp-both\Ø_LR_s\S__case, cp-both\Ø\S__case, cp-both\s\'__case, cp-both\s\S__case, cp-L, cp-L\Ø\S__case, cp-R, data__n, deras__det, dess/__det, d/ölja__vblex, dollar__n, dr/aga__vblex, dr/a__vblex, dr/icka__vblex, dunk/el__adj (similar to dubb/el but has comparative), EMU__np, end/a__prn, enk/el__adj (erroneously with -er and -rare; otherwise similar to dunk/el__adj), ens__det, erkän/na__vblex, f/ar__n, fäst/a__vblex, f/innas__vblex, Finnbogadottir__np, flop/p__n, förf/ader__n, författar/en__prn, försäkr/an__n, försiktig/t__adv, försvunn/en__adj, ...

Paradigms changed: belys/a__vblex into bel/ysa__vblex (in order to add -ös, which is not applicable to upplysa), de__prn into d/e__prn, farbr/or__n into br/or__n

Some very common Swedish words currently missing in sv-da: bröder, detsamma, erhålles, fader, farao, huvud, höga, intet, mycken, ock, opp, sade, skull, stugan, sådana, sådant, taga, torde, varpå, vore, vuxit

The most common old verb forms (used before 1970): blevo, fingo, foro, gingo, kommo, lågo, stodo, sågo, voro, äro

Current coverage of some texts:

Date        Herr Arnes penningar  Gösta Berlings saga  Nils Holgerssons underbara resa  Röda rummet  Hemsöborna  Bibeln (1917)
Words       22,659                128,144              195,797                          91,434       42,874      805,005
2013-08-01  86.8 %                86.6 %               88.2 %                           86.0 %       84.2 %      84.6 %
2013-08-03  87.8 %                86.8 %               88.8 %                           86.3 %       84.3 %      85.4 %
2013-08-04  87.9 %                86.9 %               88.9 %                           86.3 %       84.4 %      85.7 %

Sample output (opening lines of Gösta Berlings saga, Hemsöborna, and Bibeln):

@Äntligen stod præsten i prædikestolen.

Han kom som et *yrväder en *aprilafton og havde
et *höganäskrus i en *svångrem om halsen.

  1.  I *begynnelsen skabte Gud himmel og jord.
  2.  Og jorden var @öde og tom, og mørke var over dybet, og Guds
      @Ande svævede over vandet.
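
A coverage figure of this kind can be approximated from translated text. The sketch below assumes that coverage means the share of whitespace-separated tokens that the analyser recognised, and that unknown tokens carry a leading "*" as in the sample lines. Tokens marked "@" were analysed but not translated, so they count as covered here. How the percentages in the table were actually computed is not stated, and the function name coverage is made up.

```shell
# coverage: read Apertium output on stdin and print the percentage of
# whitespace-separated tokens that do not start with "*".
coverage() {
  awk '
    { for (i = 1; i <= NF; i++) { total++; if ($i ~ /^\*/) unknown++ } }
    END { if (total) printf "%.1f %%\n", 100 * (total - unknown) / total }
  '
}
```

Fed the two Hemsöborna sample lines, it counts 16 tokens, 4 of them unknown, and prints 75.0 %.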

July 30, 2013: Length of translation dictionaries (see script and details below) in trunk over time, both as a graph and table.

Graph: Apertium-dictionary-growth.png

Date af-nl br-fr ca-it cy-en en-ca en-es en-gl eo-ca eo-en eo-es eo-fr es-an es-ca es-gl es-it es-pt es-ro eu-en eu-es fr-ca fr-es id-ms is-en mk-bg mk-en nn-nb oc-ca oc-es pt-ca pt-gl sh-mk sv-da hbs-slv kaz-tat sme-nob
2007-01-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007-02-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007-03-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8087 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007-04-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8048 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007-05-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14934 0 0 0 8123 0 0 0 0 0 10502 0 0 0 0 617 0 0 0
2007-06-01 0 0 0 8991 23491 0 0 0 0 0 0 0 10145 0 0 6540 14935 0 0 7861 13990 0 0 0 0 0 10502 0 0 0 0 617 0 0 0
2007-07-01 0 0 0 8991 23494 0 0 0 0 0 0 0 10146 0 0 6540 15031 0 14 7863 14995 0 0 0 0 0 10779 0 0 0 8 617 0 0 0
2007-08-01 0 0 0 9015 23494 0 0 0 0 0 0 0 10178 0 0 6540 15934 0 21323 7870 14995 0 0 0 0 0 11740 0 0 0 8 617 0 0 0
2007-09-01 0 0 0 8609 23494 0 0 0 0 0 0 0 10178 0 0 6540 16971 0 19812 7870 14998 0 0 0 0 0 12052 0 0 0 8 617 0 0 0
2007-10-01 0 0 0 8609 23494 0 0 0 0 0 0 0 10178 0 0 6540 17876 0 19814 7870 14999 0 0 0 0 0 12322 0 0 0 8 617 0 0 0
2007-11-01 0 0 0 8687 23494 0 0 0 0 0 0 0 10178 0 0 6540 20123 0 19250 7870 17420 0 0 0 0 0 12523 11428 0 0 8 617 0 0 0
2007-12-01 0 0 0 8687 23493 0 0 0 0 0 0 0 14821 0 0 6763 23231 0 19261 7870 22478 0 0 0 0 0 12586 12241 0 0 130 617 0 0 0
2008-01-01 0 0 0 8689 23563 0 0 17932 0 13150 0 0 14821 0 0 6763 24465 0 19261 7870 22377 0 0 0 0 0 13142 12297 0 0 911 617 0 0 0
2008-02-01 0 0 0 9361 23857 0 0 17932 0 13150 0 0 14823 0 0 6763 24467 0 19261 8380 22453 0 0 0 0 0 13882 12297 0 0 963 617 0 0 0
2008-03-01 0 0 0 9361 23637 0 0 17932 0 13150 0 0 17188 0 0 6763 24467 0 19261 8380 22885 0 0 0 0 0 14185 16666 0 0 970 867 0 0 0
2008-04-01 0 0 0 9361 23650 0 0 17932 0 13150 0 0 17188 0 0 6763 24467 0 19261 8382 22885 0 0 0 0 36454 18413 16663 7993 0 978 880 0 0 0
2008-05-01 0 902 0 9361 23651 0 0 17932 0 13150 0 0 17098 0 0 6763 24467 0 19261 8382 22885 0 0 0 0 36454 19019 17352 7993 0 978 1597 0 0 0
2008-06-01 0 902 0 9361 23651 0 0 17932 0 13150 0 0 17098 0 0 6763 24470 0 19503 8382 23205 0 0 0 0 36454 21560 18723 8026 0 978 1597 0 0 0
2008-07-01 0 902 0 9806 23651 0 0 17932 0 13150 0 0 17538 0 0 6793 24470 0 19420 8382 24162 0 0 0 0 36454 21564 18727 8026 5908 976 1597 0 0 0
2008-08-01 0 902 0 11198 23661 0 0 17932 0 13150 0 0 17606 0 0 6793 24470 0 19603 8382 24617 0 0 0 0 36454 21599 18777 8026 5908 976 1597 0 0 0
2008-09-01 0 902 0 11208 23672 0 0 17932 9050 13150 0 0 17606 0 0 6793 24470 0 19632 8382 25276 0 0 0 0 36454 21599 18777 8026 5908 976 1597 0 0 0
2008-10-01 0 902 0 11209 23694 0 0 17932 9430 13150 0 0 18196 0 0 6793 24459 0 19606 8382 26631 0 0 0 0 36462 21599 18777 8026 5908 976 1597 0 0 0
2008-11-01 0 902 0 11216 24099 12795 0 17932 8821 13150 0 0 19584 0 0 6793 24478 0 19611 8382 26652 0 0 0 0 36462 21599 18777 8026 5908 976 1597 0 0 0
2008-12-01 0 10220 0 11310 24251 12922 28506 17932 19630 13150 0 0 21908 0 8894 6875 24478 0 19627 8382 26748 0 0 0 0 36462 21599 18777 8026 5908 976 1597 0 0 0
2009-01-01 0 13933 0 11538 24296 17779 28506 17932 19677 13150 0 0 21934 0 9082 6875 24478 0 19627 8382 26748 0 0 0 0 36462 21599 18777 8026 5908 976 1597 0 0 0
2009-02-01 0 15563 0 11550 24350 17856 28506 17932 34727 13150 0 0 21935 0 9078 6875 24478 0 19627 8382 26766 0 0 0 0 36462 21599 18777 8026 5908 976 1597 0 0 0
2009-03-01 0 16235 0 11564 24387 17895 28506 17974 25557 13150 0 0 21938 0 9260 6875 24478 0 19694 8382 26774 0 330 0 0 36520 21599 18777 8026 5908 976 1597 0 0 0
2009-04-01 0 17827 0 11570 24405 17894 28506 18381 32068 13453 0 0 21938 0 9274 6875 24478 0 19694 8382 26906 0 349 0 0 36522 21599 18777 8026 5908 976 1597 0 0 0
2009-05-01 0 17831 0 11674 24419 17930 29222 19237 32087 13577 0 0 21940 5128 9272 6875 24478 0 19715 8382 26906 0 349 0 0 36542 21599 18777 8026 5908 976 4059 0 0 0
2009-06-01 0 18329 8835 11623 24431 18531 29222 20274 32088 13849 8647 11 31804 5128 9580 6875 24478 8646 19717 8382 26908 0 349 0 0 37368 21599 18777 8026 5908 976 5921 0 0 0
2009-07-01 0 19983 8835 11625 24449 19616 29222 20670 32088 13849 8647 11 31856 5128 9931 6875 24478 13535 19717 8382 26908 0 1341 0 0 45042 21599 18777 8026 5908 976 6350 0 0 0
2009-08-01 0 20259 9427 11625 24508 20045 29486 20801 32088 14280 8647 11 31856 5128 11443 6875 24478 13535 19720 8382 26914 0 5331 0 0 47127 21599 18777 8026 5908 976 9243 0 0 0
2009-09-01 0 20311 9427 11625 24509 20708 29486 20801 32083 14280 8647 11 31897 5128 11558 6875 24478 13522 19720 8382 26914 0 6744 0 0 50257 21599 18777 8026 5908 976 6854 0 0 0
2009-10-01 0 20515 9498 11625 24509 21142 29486 20822 32032 15085 8647 11 31920 5128 12563 6875 24478 13522 19725 8382 26914 0 10102 0 0 50831 21599 18777 8026 5908 976 6854 0 0 0
2009-11-01 0 20921 9498 11625 24510 21504 29486 20837 31959 17763 8647 11 31929 5128 12563 6875 24478 13522 19725 8382 26917 0 10141 0 0 51430 21599 18777 8026 5908 976 6855 0 0 0
2009-12-01 0 21349 9499 11625 24505 21624 29486 21367 31956 18159 38407 11 32056 5128 12563 6940 24478 13522 19734 10447 26917 0 10234 0 0 51452 21599 18777 8026 5908 976 6702 0 0 0
2010-01-01 0 21626 9499 11625 34108 21960 29486 21472 32332 18162 38407 11 32073 5128 12563 6940 24478 13522 19734 10447 26917 0 11500 0 0 51596 21599 18777 8026 5908 976 6701 0 0 0
2010-02-01 0 21937 9499 11625 34747 21976 29486 21472 32387 18162 38407 11 32073 5128 12564 6941 24478 13522 19734 10447 26918 0 11682 0 0 51803 21599 18777 8026 5908 976 6691 0 0 2775
2010-03-01 0 22203 9499 11625 34747 22414 29486 21472 32387 18162 38407 11 32092 5128 12667 6941 24478 13522 19734 10447 26918 0 26340 0 0 51820 21599 18777 8026 5908 976 6691 0 0 2816
2010-04-01 0 22906 9499 11625 34927 22425 29486 21481 32418 18162 34121 11 32335 5128 12667 6941 24478 13522 19734 10447 26918 0 26461 0 0 51826 21599 18777 8026 5908 976 6692 0 0 2945
2010-05-01 0 23461 9499 11625 34927 22441 29486 21481 32418 18162 34205 11 32351 5128 12667 7056 24478 13522 19734 10447 26917 0 26464 767 0 51867 21599 18777 8026 5908 976 6389 0 0 20469
2010-06-01 0 23549 9499 11625 34942 22450 29486 21588 32418 18201 34719 11 32358 5128 12667 7056 24478 13522 19734 10447 26917 0 26464 853 0 51876 21599 18777 8026 5908 976 6389 0 0 22281
2010-07-01 0 23521 9499 11625 35096 22969 29486 21610 32418 18203 34786 11 32385 5128 12670 7056 24478 13522 19734 10447 26917 0 26281 4065 0 51894 21599 18777 8026 5908 976 6389 0 0 22522
2010-08-01 0 23522 9499 11625 35433 23321 29486 21610 32419 18203 34787 11 32386 5128 12671 7056 24478 13522 19734 10447 26917 0 26281 3997 0 67513 21600 18777 8026 5908 976 6389 0 0 64756
2010-09-01 0 24217 9499 11625 35432 23322 29486 21610 32419 18203 34787 11 32386 5128 12671 7062 24478 13522 19734 10447 26917 0 26281 5480 0 67517 21600 18777 8026 5908 976 6389 0 0 66109
2010-10-01 0 24412 9499 11625 35537 23326 29486 21610 32419 18203 34835 3269 32391 5128 12676 8192 24478 13522 19734 10447 26917 0 26281 8203 10249 67522 21600 18777 8026 5908 976 6389 0 0 66351
2010-11-01 0 24961 9654 11625 35537 23343 29486 21636 32421 18210 34818 3390 33092 5128 12678 8192 24478 13522 19734 10447 26917 0 26281 8811 10265 67606 21600 18777 8026 5908 976 6389 0 0 66657
2010-12-01 3731 25141 9716 11625 35546 25941 29486 22005 32421 18481 34944 3330 33090 5128 12682 8192 24478 13522 19734 10447 26926 0 26281 8811 10265 67617 21600 18777 8026 5908 976 6389 0 0 66679
2011-01-01 6094 25638 9716 11625 35546 26400 29486 22040 32421 18542 37751 4642 33419 5128 12858 9217 24527 13522 19734 10447 27163 0 26281 8812 10268 67620 21600 18777 8026 5908 976 6389 0 0 66648
2011-02-01 6114 26409 9716 11625 35546 26407 29486 22065 32421 18562 42415 10252 33426 5128 12858 9217 24529 13522 19734 10447 27164 0 26281 8812 10268 67620 21600 18777 8026 5908 976 6389 0 0 66657
2011-03-01 6114 26450 9716 11625 35553 26408 29486 22065 32421 18562 43094 13708 33429 5128 12859 9356 24529 13526 19734 10447 27164 0 26281 8812 10268 67653 21600 18777 8026 5908 976 6389 0 0 66660
2011-04-01 6113 26499 9716 11625 35553 27543 29486 22065 32421 18562 43094 13917 33429 5128 12859 9357 24529 13526 19333 10448 27164 0 26281 8812 10268 67653 21600 18777 8026 5908 976 6390 0 0 66662
2011-05-01 6128 26533 9716 11625 35553 28092 29486 22066 32422 18562 43094 13922 33429 5128 12862 9621 24529 13526 19334 10448 27164 0 26281 8812 10268 67653 21600 18777 8026 5908 974 6390 0 0 66662
2011-06-01 6128 26547 9716 11625 35553 28246 29486 22066 32422 18562 43094 13922 33429 5128 12862 9621 24529 13541 19334 10451 27164 0 26281 8812 10268 67653 21600 18777 8026 5908 1029 6392 0 0 66662
2011-07-01 6128 26591 9716 11625 35563 28292 29486 22066 32422 18562 43094 13922 33429 5128 12862 9722 24529 13567 19334 10451 27164 0 26288 8812 10268 67653 21600 18777 8026 5908 4970 6392 0 0 66662
2011-08-01 6128 26604 9716 11625 35564 28309 29486 22102 32422 18658 43094 15236 33440 5128 12870 9722 24529 13567 19334 10451 27270 0 26294 8812 10268 68159 21600 18777 8026 5908 9256 6392 0 0 66662
2011-09-01 6128 26653 9716 11625 35563 28313 29486 43201 32422 45074 43109 15236 33443 5128 12870 9904 24529 13567 19334 10451 27271 0 26321 8812 10268 68159 21600 18777 8026 5908 13043 6392 0 0 66662
2011-10-01 6128 26667 9834 11625 35563 28349 29486 43202 32422 48403 43109 15236 33454 5128 12870 9904 24529 13635 19334 10451 27274 0 26325 8812 10268 68160 21600 18777 8026 5908 13044 6392 0 0 66662
2011-11-01 6137 26767 9834 11625 35589 30269 29486 43202 32422 48403 43109 15294 33458 9426 12870 9918 24531 14371 19334 10451 27274 0 26325 8812 10268 68160 21600 18777 8026 5908 13044 6392 0 0 66731
2011-12-01 6137 27127 9834 11625 35589 30314 29486 43202 32422 48403 43109 15294 33458 9426 12870 9963 24531 14378 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 5914 13044 6392 0 0 66707
2012-01-01 6137 27162 9834 11625 35589 30408 22235 43202 32422 48403 43109 15543 33459 9426 12870 9963 24531 14378 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 6155 13044 6392 0 0 66708
2012-02-01 6259 27170 9834 11625 35589 30653 22235 43201 32422 48403 43109 16998 33459 9426 12870 9965 24531 14378 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 6155 13046 6392 0 0 66708
2012-03-01 6289 27225 9834 11625 35589 30653 22235 43201 32422 48403 43109 17039 33459 9426 12870 9966 24531 14378 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 6155 13047 6392 0 0 66721
2012-04-01 6289 27242 9834 11625 35589 30651 22235 43201 32422 48403 43109 17039 33463 9426 12870 9968 24531 14383 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 6155 13053 6392 0 0 65716
2012-05-01 6289 27244 9834 11625 35589 30651 22235 43201 32422 48403 43109 17497 33464 9426 12870 10103 24531 14384 19334 10451 27274 0 26325 8812 10276 68160 21600 18777 8026 6155 13053 6392 213 777 65834
2012-06-01 6289 27269 9834 11625 35767 30651 22235 43201 32422 48403 43110 17497 33464 9426 12870 10103 24531 14392 19334 10451 27256 0 26325 8812 10276 68160 21600 18777 8026 6155 13053 6392 413 937 65825
2012-07-01 6289 27274 9835 11625 35767 30651 22235 43201 32422 48403 43110 17618 33464 9426 12870 10106 24531 14392 19334 10451 26984 0 26325 8812 32487 68191 21600 18777 8026 6155 13053 6392 8266 1640 65826
2012-08-01 6289 27361 9835 11625 35767 30653 22235 43201 32422 48403 43155 17625 33475 9426 12870 10106 24531 14392 19334 10451 26984 0 26325 8812 33407 68226 21600 18777 8026 6155 13053 6392 15548 4091 65826
2012-09-01 6289 27361 9835 11625 35767 30653 22235 43254 32422 48403 43155 17625 33475 9426 12870 10106 24531 14392 19334 10451 26984 12142 26325 8812 33402 68239 21600 18777 8026 6155 13053 6392 24971 7377 65826
2012-10-01 6289 27361 9835 11625 35767 30654 22235 43254 32422 48403 43155 17625 33554 9426 12870 10224 24531 14392 19334 10451 26984 12142 26325 8812 33406 68239 21600 18777 8026 6155 13053 6392 25011 7454 65836
2012-11-01 6289 27369 9835 11625 35767 30654 22235 43254 32422 48403 43155 17625 33567 9426 12870 10359 24531 14392 19334 10451 26984 12142 26325 8812 33406 69057 21600 18777 8026 6155 13055 6431 23827 7454 65836
2012-12-01 6289 27369 9835 11625 35767 30654 22235 43254 32422 48403 43155 17625 33567 9426 12870 10359 24531 14392 19334 10451 26984 12142 26325 8812 33407 69057 21600 18777 8026 6155 13055 6483 23827 7454 65836
2013-01-01 6289 27592 9835 11625 35793 30654 22235 43254 32422 48403 43155 17675 33574 9426 12870 10359 24531 14392 19334 10451 26984 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 6516 23827 7611 65836
2013-02-01 6289 27607 9835 11625 35793 30654 22235 43254 32422 48403 43155 17675 33574 9426 12870 10359 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 7081 23827 7926 65836
2013-03-01 6289 27888 9835 11625 35793 30654 22235 43254 32422 48403 43155 17675 33574 9426 12870 10359 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 7657 23827 7979 65836
2013-04-01 6289 27922 9835 11625 35793 30654 22235 43254 32422 48403 43155 17675 33574 9426 12870 10359 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 7829 24899 8447 65836
2013-05-01 6289 27932 9835 11625 35793 30749 22235 43254 32422 48403 43155 17675 33574 9426 12870 10359 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 8176 25064 9233 65837
2013-06-01 6289 27952 9835 11625 35793 30280 22235 43254 32422 48403 43155 17675 33593 9426 12870 10359 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 8176 25064 9233 65837
2013-07-01 6289 27982 9835 11625 35793 30280 22235 43254 32422 48403 43155 17675 33594 9426 12870 10386 24531 14392 19334 10451 26983 12142 26325 8812 33407 69059 21600 18777 8026 6155 13055 8223 25064 9190 65837

The table above was produced with this shell script:

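 # Run from the directory that contains the SVN checkout of trunk.
 cd trunk || exit 1
 for y in 07 08 09 10 11 12 13
   do
   for m in 01 02 03 04 05 06 07 08 09 10 11 12
     do
     # One wiki table row per month, starting with the date cell.
     echo '|-'
     echo "| 20$y-$m-01"
     for lang in apertium-??-?? apertium-???-???
       do
       pair=${lang#apertium-}
       printf '| '
       # Count the lines containing <l> in the bilingual dictionary as of
       # that date; a pair that did not exist yet yields 0.
       svn cat -r "{20$y-$m-01}" "$lang/$lang.$pair.dix" 2> /dev/null | grep -c '<l>'
       done
     done
   done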

That is, it only counts the number of lines containing '<l>' in files such as trunk/apertium-sv-da/apertium-sv-da.sv-da.dix, which gives a rough estimate of the dictionary's length rather than an accurate measure.

Longest translation dictionaries:

section pair translations
incubator de-en 168789
trunk nn-nb 66522
trunk sme-nob 63012
trunk eo-es 47273
nursery dan-nor 43687
trunk eo-fr 42872
trunk eo-ca 40407
nursery sme-smj 39385
trunk es-ast 34221
trunk en-ca 33370
trunk mk-en 33053
trunk eo-en 32053
incubator eo-hu 31423
trunk es-ca 31040
nursery ru-eo 30495
incubator eo-de 28540
trunk en-es 28303
incubator en-hi 27829
incubator sl-mk 24828
trunk is-en 23989
nursery no-en 23614
incubator ga-gv 22571
trunk fr-es 22188
incubator eo-pl 19330
incubator pl-eo 19311
trunk eu-es 18608
trunk br-fr 18343
incubator eo-nl 17954
staging fra-por 17622
trunk es-ro 17552
incubator en-it 17304
trunk en-gl 16703
trunk hbs-slv 16504
incubator sh-sl 16497
trunk es-an 16152
incubator mr-hi 15628
trunk eu-eu_bis 15262