User:Francis Tyers/Sandbox

From Apertium

Lexical selection

Information

  • Surface form -- tud etc.
  • Lemma -- den etc.
  • Category -- n.f etc.
  • Syntax -- @SUBJ etc.

Ideas

For some phenomena, linguistic knowledge works better or is easier to apply, and hand-written rules are also easier to hack on. For others, statistics work better, giving wider coverage at lower cost. The lexical selection module(s) should therefore allow both rules and statistics: rules for things we "know", statistics for those we don't.

Inferring rules from collocations

Rules as described below are already used in apertium-cy-en, apertium-br-fr and apertium-sme-smj. This stage would be the first pass of lexical selection.

  • The bilingual dictionary has several translations for each ambiguous word.
  • Rules are created to select between them based on context.
  • For each word in the bilingual dictionary, collocations (n-grams) are extracted from a source language corpus.

+ in, skyldi ég þá á munúð hyggja, þar sem  bóndi minn er einnig gamall?``
+ ,Drottinn hefir séð raunir mínar. Nú mun  bóndi minn elska mig.``
  þunguð og ól son. Þá sagði hún: ,,Nú mun  bóndi minn loks hænast að mér, því að é
   ,,Guð hefir gefið mér góða gjöf. Nú mun  bóndi minn búa við mig, því að ég hefi 
  af, þá haldi hann bótum uppi, slíkum sem  bóndi konunnar kveður á hann, og greiði
  l niður fyrir húsdyrum mannsins, þar sem  bóndi hennar var inni, og lá þar, uns b
-                                27  En er  bóndi hennar reis um morguninn og lauk 
+ kubúinn hafi soltið til þess að franskur  bóndi þurfi ekki
  • For each ambiguous word, these collocations are run through the translator once for each of its entries in the bilingual dictionary.
not the translator but just the bilingual dictionary? --Mlforcada 10:30, 10 October 2009 (UTC)
how wide is the window around the problem word? is it symmetrical? --Mlforcada 10:30, 10 October 2009 (UTC)


 
as  my farmer is
Now #remember  my farmer love
Now #remember  my farmer the lid
Now #remember  my farmer live
*slíkum as  the woman's farmer composes
as her farmer  was
But is her farmer  rose
to French  farmer need not
as  my husband is
Now #remember  my husband love
Now #remember  my husband the lid
Now #remember  my husband live
*slíkum as  the woman's husband composes
as her husband  was
But is her husband  rose
to French  husband need not
  • Translations are scored on a target language corpus. -- In some cases the training corpora for the target language model would need to be preprocessed, for example to give the word in its POS or syntactic context: n _farmer_ prn.pos, n _husband_ prn.pos etc. The number of target words would be limited to the number of correspondences in the bilingual dictionary.
What do you mean by the number of target words? --Mlforcada 10:30, 10 October 2009 (UTC)
Wouldn't it be similar to do this as in Sánchez-Martínez et al. (2008), that is, run all "disambiguations" through the dictionary and score the translations themselves? --Mlforcada 10:30, 10 October 2009 (UTC)
Vector Element0 : -6.13119,as  my farmer
Vector Element1 : -1.5997,as  my husband

Vector Element0 : -5.93468,Now remember my farmer
Vector Element1 : -3.19992,Now remember my husband

Vector Element0 : -6.13119,slíkum as my farmer
Vector Element1 : -1.5997,slíkum as my husband

Vector Element0 : -5.55918,as her farmer
Vector Element1 : -2.81087,as her husband

Vector Element0 : -5.58205,But is her farmer
Vector Element1 : -2.83373,But is her husband

Vector Element0 : -4.54752,to French farmer
Vector Element1 : -5.27222,to French husband
  • Where the difference in score between one translation and another reaches a threshold, a rule is created in the form of:
    • MAP (husband) ("bóndi") IF (1 ("minn"));
  • Morphology or syntax could also be included.
    • MAP (husband) ("bóndi") IF (1 PrnPos);
    • MAP (husband) ("bóndi") IF (-1 Genitive);
  • It would be interesting to see if rules can be learnt which use different discriminators (e.g. surface form, syntax) etc.
To select the winner, one could use a maximum-entropy approach in which the absence or presence of particular trigger words in the context would be treated as a feature. Then the winner would be chosen maximizing the probability. There is the work by Márquez et al. and also Armando Suárez's DLSI thesis. However, these fall quite far from being applicable in Apertium, so some engineering would be needed. --10:30, 10 October 2009 (UTC)
Another interesting question is: instead of rules, could you detect (in some cases) clear multiwords that would go directly into dictionaries? --Mlforcada 10:30, 10 October 2009 (UTC)
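The thresholding step above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name propose_rule, the threshold of 2.0 and the way the context is passed in are all hypothetical, and the scores are the ones shown in the listing above.

```python
def propose_rule(sl_lemma, scored, context, threshold=2.0):
    """Return a MAP rule string if the best-scoring translation beats the
    runner-up by at least `threshold`; otherwise return None.

    scored    -- dict: translation -> language model score (higher is better)
    context   -- (position, word) pair used as the rule's condition
    threshold -- hypothetical cut-off, not a value taken from the page
    """
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    if best_score - second_score < threshold:
        return None
    position, word = context
    return 'MAP (%s) ("%s") IF (%d ("%s"));' % (best, sl_lemma, position, word)


# Scores for "as my farmer" / "as my husband" from the listing above.
clear = {"farmer": -6.13119, "husband": -1.5997}
# Scores for "to French farmer" / "to French husband": too close to call.
unclear = {"farmer": -4.54752, "husband": -5.27222}

print(propose_rule("bóndi", clear, (1, "minn")))
# prints: MAP (husband) ("bóndi") IF (1 ("minn"));
print(propose_rule("bóndi", unclear, (-1, "franskur")))
# prints: None
```

With this cut-off the "franskur bóndi" case produces no rule, which leaves the decision to a default translation or to a later, statistical pass.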
Advantages
  • Fairly straightforward -- the rules can be created automatically in constraint grammar.
  • Human readable / editable.
  • Doesn't require parallel corpus -- although might work better with one.
  • Unsupervised.
Disadvantages
  • Many rules will be slow.
That is why probably it is a good idea to move as much inferred stuff as possible to the dictionary --Mlforcada 10:30, 10 October 2009 (UTC)
  • Might not work very well.
Relevant prior work
  • Jin Yang (1999) "Towards the Automatic Acquisition of Lexical Selection Rules"
  • Eckhard Bick (2005) "Dan2eng: Wide-Coverage Danish-English Machine Translation"
Examples

Pediñ can translate as 'prier' or 'inviter'. Used transitively it means 'inviter'; used intransitively it means 'prier'.

  • o huñvreal muioc'h eget o pediñ .
    • Leur *huñvreal plus que en train de prier .
  • Koulskoude e tiviz Francis pediñ e zaou vreur d'ober ...
    • Pourtant il décide Francis prier ses deux frères à faire ...
  • O fal a zo pediñ arzourien a bep seurt evel kizellerien
    • Leur objectif il est inviter des artistes de toute sorte comme les sculpteurs
  • ... bleunioù ha peadra da yac'haat o zreid hag o pediñ evito ...
    • ... de fleurs et des moyens à guérir leurs pieds et en train de prier pour eux ...
  • ha tu a oa bet d'al labourerien pediñ o familhoù hag o mignoned
    • ... et il y avait moyen été aux travailleurs prier leurs familles et leurs amis ...
  • Raktresoù all a zo ivez : pediñ skrivagnerien a-benn eskemm ganto
    • ... de Projets autres il est aussi : inviter des écrivains pour échanger avec eux ...
  • Sharon Stone eo bet an hini gwellañ evit pediñ an embregerien da zisammañ
    • *Sharon *Stone il a été les ceux le plus mieux pour prier les entrepreneurs à décharger ...

The current rule says: SUBSTITUTE (vblex) (vblex tv) ("pediñ" vblex) (1C NC);, that is "choose 'inviter' if the next word can only be a common noun". Obviously, this fails in the case of definite NPs such as o familhoù 'their families', where the next word is a determiner rather than a common noun.


To read
  • Hinrich Schütze "Automatic Word Sense Discrimination"
  • Hang Li and Cong Li "Word Translation Disambiguation Using Bilingual Bootstrapping"
  • E. Crestan "Which length for a Multi-level view of content for WSD"
  • Noah Coccaro "Towards better integration of semantic prediction in statistical language modelling"
  • David Vickrey "Word-sense disambiguation for machine translation"
  • Lucia Specia "Multilingual versus monolingual WSD"
  • Lucia Specia "Mining rules for WSD"
  • SUPERTAGS.

Pipeline

You need a tagged source language corpus:

^L'/El<det><def><mf><sg>$ ^origen/origen<n><m><sg>$ ^de/de<pr>$ ^l'/el<det><def><mf><sg>$ 
^àbac/àbac<n><m><sg>$ ^està/estar<vblex><pri><p3><sg>$ ^literalment/literalment<adv>$ 
^perdut/perdre<vblex><pp><m><sg>$ ^en/en<pr>$ ^el/el<det><def><m><sg>$ ^temps/temps<n><m><sp>$

You also need a list of ambiguities extracted from your bilingual dictionary:

time<n>:<:temps<n><:0>
weather<n>:<:temps<n><:1>
language<n>:<:llengua<n><:0>
tongue<n>:<:llengua<n><:1>
history<n>:<:història<n><:0>
story<n>:<:història<n><:1>
station<n>:<:estació<n><:0>
season<n>:<:estació<n><:1>

Only the first tag is taken into account.
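As a rough sketch of how such a list could be read, the snippet below parses each line into a table from source word to numbered translations. The regular expression and the name read_ambiguities are assumptions made for illustration; the actual scripts may read the file differently.

```python
import re
from collections import defaultdict

# target<tag>:<:source<tag><:index>, e.g. time<n>:<:temps<n><:0>
LINE = re.compile(r"^([^<]+)<([^>]+)>:<:([^<]+)<([^>]+)><:(\d+)>$")


def read_ambiguities(lines):
    """Map (source lemma, first tag) -> {translation number: target lemma}."""
    table = defaultdict(dict)
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank or malformed lines
        tl_lemma, _tl_tag, sl_lemma, sl_tag, index = m.groups()
        table[(sl_lemma, sl_tag)][int(index)] = tl_lemma
    return dict(table)


ambiguities = read_ambiguities([
    "time<n>:<:temps<n><:0>",
    "weather<n>:<:temps<n><:1>",
    "station<n>:<:estació<n><:0>",
    "season<n>:<:estació<n><:1>",
])
print(ambiguities[("temps", "n")])
```

Keying the table on the lemma plus its first tag mirrors the note above that only the first tag is taken into account.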

The script generate_sl_ambig_corpus.py generates all the possible paths through the test corpus: after the first tag of each ambiguous word it inserts the number of one translation equivalent (for example <n><:1>), and it numbers the sentences and paths so that they can be recombined later.

[1:0	].[] ^L'/El<det><def><mf><sg>$ ^origen/origen<n><m><sg>$ ^de/de<pr>$ 
^l'/el<det><def><mf><sg>$ ^àbac/àbac<n><m><sg>$ ^està/estar<vblex><pri><p3><sg>$ 
^literalment/literalment<adv>$ ^perdut/perdre<vblex><pp><m><sg>$ ^en/en<pr>$ 
^el/el<det><def><m><sg>$ ^temps/temps<n><:1><m><sp>$
[1:1	].[] ^L'/El<det><def><mf><sg>$ ^origen/origen<n><m><sg>$ ^de/de<pr>$ 
^l'/el<det><def><mf><sg>$ ^àbac/àbac<n><m><sg>$ ^està/estar<vblex><pri><p3><sg>$ 
^literalment/literalment<adv>$ ^perdut/perdre<vblex><pp><m><sg>$ ^en/en<pr>$ 
^el/el<det><def><m><sg>$ ^temps/temps<n><:0><m><sp>$
[2:0 ||	].[] ^El/El<det><def><m><sg>$ ^territori/territori<n><m><sg>$ 
^era/ser<vbser><past><p3><sg>$ ^habitat/habitar<vblex><pp><m><sg>$   ^des de/des de<pr>$ 
^temps/temps<n><:1><m><sp>$ ^per/per<pr>$ ^tribus/tribu<n><f><pl>$ ^la/el<det><def><f><sg>$ 
^llengua/llengua<n><:0><f><sg>$   ^dels quals/de<pr>+el qual<rel><an><m><pl>$ ^no/no<adv>$ 
^entenien/entendre<vblex><pii><p3><pl>$
[2:1 ||	].[] ^El/El<det><def><m><sg>$ ^territori/territori<n><m><sg>$ 
^era/ser<vbser><past><p3><sg>$ ^habitat/habitar<vblex><pp><m><sg>$   ^des de/des de<pr>$ 
^temps/temps<n><:1><m><sp>$ ^per/per<pr>$ ^tribus/tribu<n><f><pl>$ ^la/el<det><def><f><sg>$ 
^llengua/llengua<n><:1><f><sg>$   ^dels quals/de<pr>+el qual<rel><an><m><pl>$ ^no/no<adv>$ 
^entenien/entendre<vblex><pii><p3><pl>$
[2:2 ||	].[] ^El/El<det><def><m><sg>$ ^territori/territori<n><m><sg>$ 
^era/ser<vbser><past><p3><sg>$ ^habitat/habitar<vblex><pp><m><sg>$   ^des de/des de<pr>$ 
^temps/temps<n><:0><m><sp>$ ^per/per<pr>$ ^tribus/tribu<n><f><pl>$ ^la/el<det><def><f><sg>$ 
^llengua/llengua<n><:1><f><sg>$   ^dels quals/de<pr>+el qual<rel><an><m><pl>$ ^no/no<adv>$ 
^entenien/entendre<vblex><pii><p3><pl>$
[2:3 ||	].[] ^El/El<det><def><m><sg>$ ^territori/territori<n><m><sg>$ 
^era/ser<vbser><past><p3><sg>$ ^habitat/habitar<vblex><pp><m><sg>$   ^des de/des de<pr>$ 
^temps/temps<n><:0><m><sp>$ ^per/per<pr>$ ^tribus/tribu<n><f><pl>$ ^la/el<det><def><f><sg>$ 
^llengua/llengua<n><:0><f><sg>$   ^dels quals/de<pr>+el qual<rel><an><m><pl>$ ^no/no<adv>$ 
^entenien/entendre<vblex><pii><p3><pl>$
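The path generation shown above amounts to taking the Cartesian product of the translation numbers of every ambiguous word in a sentence. The following is a minimal sketch of that idea, not the code of generate_sl_ambig_corpus.py: the regular expression, the function name expand_paths and the order in which paths come out are assumptions, and the "[sentence:path ||" prefix is left out.

```python
import itertools
import re

# One lexical unit: ^surface/lemma<first tag><more tags>$
LU = re.compile(r"\^([^/$]+)/([^<$]+)<([^>]+)>((?:<[^>]+>)*)\$")


def expand_paths(sentence, n_translations):
    """Return one copy of `sentence` per combination of translation numbers.

    n_translations maps (lemma, first tag) -> number of translation
    equivalents listed for that word in the bilingual dictionary.
    """
    ambiguous = [m for m in LU.finditer(sentence)
                 if (m.group(2), m.group(3)) in n_translations]
    choices = [range(n_translations[(m.group(2), m.group(3))])
               for m in ambiguous]
    paths = []
    for combo in itertools.product(*choices):
        # Remember which translation number goes with which unit.
        picked = {m.start(): k for m, k in zip(ambiguous, combo)}

        def mark(m):
            if m.start() not in picked:
                return m.group(0)  # unambiguous unit, copy unchanged
            return "^%s/%s<%s><:%d>%s$" % (
                m.group(1), m.group(2), m.group(3),
                picked[m.start()], m.group(4))

        paths.append(LU.sub(mark, sentence))
    return paths


sentence = "^en/en<pr>$ ^el/el<det><def><m><sg>$ ^temps/temps<n><m><sp>$"
for number, path in enumerate(expand_paths(sentence, {("temps", "n"): 2})):
    print(number, path)
```

A sentence with two ambiguous words that each have two translations yields four paths, as in sentence 2 above; the growth is exponential in the number of ambiguous words, so long sentences may need a cap.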

These are then translated with the rest of the Apertium pipeline:

[1:0 ||	].[] The origin of the abacus is literally lost in the weather
[1:1 ||	].[] The origin of the abacus is literally lost in the time
[2:0 ||	].[] The territory was inhabited since weather for tribes 
  the language of which did not understand
[2:1 ||	].[] The territory was inhabited since weather for tribes 
  the tongue of which did not understand
[2:2 ||	].[] The territory was inhabited since time for tribes 
  the tongue of which did not understand
[2:3 ||	].[] The territory was inhabited since time for tribes 
  the language of which did not understand

All of the translations are passed to the irstlm-ranker, which assigns each whole sentence a probability:

-4.44739	||	[1:0 ||	].[] The origin of the abacus is literally lost in the weather
-2.98177	||	[1:1 ||	].[] The origin of the abacus is literally lost in the time
-5.05685	||	[2:0 ||	].[] The territory was inhabited   since weather for tribes 
  the language   of which did not understand
-6.05685	||	[2:1 ||	].[] The territory was inhabited   since weather for tribes 
  the tongue   of which did not understand
-3.05685	||	[2:2 ||	].[] The territory was inhabited   since time for tribes 
  the tongue   of which did not understand
-2.05685	||	[2:3 ||	].[] The territory was inhabited   since time for tribes 
  the language   of which did not understand
-2.80612	||	[3:0 ||	].[] When to the futile following year go back the good weather
-3.09621	||	[3:1 ||	].[] When to the futile following year go back the good time

From here, extract_candidate_phrases.py extracts the sentences whose alternative translations differ widely in probability.

$ cat ca.ranked.txt | python extract_candidate_phrases.py 
-4.44739	||	[1:0 ||	].[] The origin of the abacus is literally lost in the weather
-2.98177	||	[1:1 ||	].[] The origin of the abacus is literally lost in the time
-5.05685	||	[2:0 ||	].[] The territory was inhabited   since weather for tribes 
-6.05685	||	[2:1 ||	].[] The territory was inhabited   since weather for tribes 
-3.05685	||	[2:2 ||	].[] The territory was inhabited   since time for tribes 
-2.05685	||	[2:3 ||	].[] The territory was inhabited   since time for tribes 
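One plausible reading of this filtering step is sketched below: group the ranked lines by sentence number and keep a sentence only if its best and worst paths differ by some minimum amount. The name extract_candidates and the cut-off of 1.0 are hypothetical; the page does not say what extract_candidate_phrases.py actually uses, and the sketch assumes one ranked translation per line.

```python
import re
from collections import defaultdict

# score <tab> || <tab> [sentence:path || ...
RANKED = re.compile(r"^(-?\d+(?:\.\d+)?)\s*\|\|\s*\[(\d+):(\d+)")


def extract_candidates(lines, min_gap=1.0):
    """Keep the lines of every sentence whose scores span at least `min_gap`."""
    groups = defaultdict(list)
    for line in lines:
        m = RANKED.match(line)
        if m:
            groups[int(m.group(2))].append((float(m.group(1)), line.rstrip("\n")))
    kept = []
    for sentence_id in sorted(groups):
        scores = [score for score, _ in groups[sentence_id]]
        if max(scores) - min(scores) >= min_gap:
            kept.extend(line for _, line in groups[sentence_id])
    return kept


ranked = [
    "-4.44739\t||\t[1:0 ||\t].[] The origin of the abacus is literally lost in the weather",
    "-2.98177\t||\t[1:1 ||\t].[] The origin of the abacus is literally lost in the time",
    "-2.80612\t||\t[3:0 ||\t].[] When to the futile following year go back the good weather",
    "-3.09621\t||\t[3:1 ||\t].[] When to the futile following year go back the good time",
]
for line in extract_candidates(ranked):
    print(line)
```

With this cut-off sentence 1 is kept and sentence 3 is dropped, which matches the listing above.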

Then we generate rules using generate_candidate_rules.py:

SUBSTITUTE:r1  (n :0) (n :1) ("estació"ri) (0 ("estació"ri))  ; # c:  12
SUBSTITUTE:r2  (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri))  ; # c:  10
SUBSTITUTE:r3  (n :0) (n :1) ("temps"ri) (0 ("temps"ri))  ; # c:  9
SUBSTITUTE:r4  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri))  ; # c:  9
SUBSTITUTE:r5  (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri))  ; # c:  5
SUBSTITUTE:r6  (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri))  ; # c:  5
SUBSTITUTE:r7  (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri))  ; # c:  4
SUBSTITUTE:r8  (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri))  ; # c:  4
SUBSTITUTE:r9  (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri))  ; # c:  4
SUBSTITUTE:r10  (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri))  ; # c:  4
SUBSTITUTE:r11  (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("humit"ri))  ; # c:  3
SUBSTITUTE:r12  (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("de"ri))  ; # c:  3
SUBSTITUTE:r13  (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<humida>"ri))  ; # c:  3
SUBSTITUTE:r14  (n :0) (n :1) ("temps"ri) (-1 ("un"ri)) (0 ("temps"ri))  ; # c:  2
SUBSTITUTE:r15  (n :0) (n :1) ("temps"ri) (-1 ("<un>"ri)) (0 ("<temps>"ri))  ; # c:  2
SUBSTITUTE:r16  (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri))  ; # c:  2
SUBSTITUTE:r17  (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri))  ; # c:  2
SUBSTITUTE:r18  (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("humit"ri)) (2 ("i"ri))  ; # c:  2
SUBSTITUTE:r19  (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("de"ri)) (2 ("treball"ri))  ; # c:  2
SUBSTITUTE:r20  (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<humida>"ri)) (2 ("<i>"ri))  ; # c:  2
SUBSTITUTE:r21  (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<de>"ri))  ; # c:  2
SUBSTITUTE:r22  (n :0) (n :1) ("estació"ri) (0 ("<estacions>"ri))  ; # c:  2
SUBSTITUTE:r23  (n :0) (n :1) ("estació"ri) (-1 ("un"ri)) (0 ("estació"ri))  ; # c:  2
SUBSTITUTE:r24  (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) (1 ("sec"ri))  ; # c:  2
SUBSTITUTE:r25  (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) (1 ("humit"ri))  ; # c:  2
SUBSTITUTE:r26  (n :0) (n :1) ("estació"ri) (-1 ("<una>"ri)) (0 ("<estació>"ri))  ; # c:  2
SUBSTITUTE:r27  (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) (1 ("<seca>"ri))  ; # c:  2
SUBSTITUTE:r28  (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) (1 ("<humida>"ri))  ; # c:  2
SUBSTITUTE:r29  (n :0) (n :0) ("llengua"ri) (0 ("llengua"ri))  ; # c:  2
SUBSTITUTE:r30  (n :0) (n :0) ("llengua"ri) (0 ("<llengua>"ri))  ; # c:  2
SUBSTITUTE:r31  (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri)) (1 ("i"ri)) (2 ("fúria"ri))  ; # c:  1
SUBSTITUTE:r32  (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri)) (1 ("i"ri))  ; # c:  1
SUBSTITUTE:r33  (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri))  ; # c:  1
SUBSTITUTE:r34  (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri)) (1 ("<i>"ri)) (2 ("<fúria>"ri))  ; # c:  1
SUBSTITUTE:r35  (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri)) (1 ("<i>"ri))  ; # c:  1
SUBSTITUTE:r36  (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri))  ; # c:  1
SUBSTITUTE:r37  (n :0) (n :2) ("ràbia"ri) (-1 ("de"ri)) (0 ("ràbia"ri)) (1 ("i"ri))  ; # c:  1
SUBSTITUTE:r38  (n :0) (n :2) ("ràbia"ri) (-1 ("de"ri)) (0 ("ràbia"ri))  ; # c:  1
SUBSTITUTE:r39  (n :0) (n :2) ("ràbia"ri) (-1 ("<de>"ri)) (0 ("<ràbia>"ri)) (1 ("<i>"ri))  ; # c:  1
SUBSTITUTE:r40  (n :0) (n :2) ("ràbia"ri) (-1 ("<de>"ri)) (0 ("<ràbia>"ri))  ; # c:  1
SUBSTITUTE:r41  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("sever"ri)) (2 ("per"ri))  ; # c:  1
SUBSTITUTE:r42  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("sever"ri))  ; # c:  1
SUBSTITUTE:r43  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("obligar"ri)) (2 ("el"ri))  ; # c:  1
SUBSTITUTE:r44  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("obligar"ri))  ; # c:  1
SUBSTITUTE:r45  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("i"ri)) (2 ("el"ri))  ; # c:  1
SUBSTITUTE:r46  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("i"ri))  ; # c:  1
SUBSTITUTE:r47  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("ho"ri))  ; # c:  1
SUBSTITUTE:r48  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("fred"ri)) (2 ("emmagatzemar"ri))  ; # c:  1
SUBSTITUTE:r49  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("fred"ri))  ; # c:  1
SUBSTITUTE:r50  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("es"ri)) (2 ("anar"ri))  ; # c:  1
SUBSTITUTE:r51  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("es"ri))  ; # c:  1
SUBSTITUTE:r52  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("canviar"ri)) (2 ("moure"ri))  ; # c:  1
SUBSTITUTE:r53  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("canviar"ri))  ; # c:  1
SUBSTITUTE:r54  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("calent"ri)) (2 ("i"ri))  ; # c:  1
SUBSTITUTE:r55  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("calent"ri))  ; # c:  1
SUBSTITUTE:r56  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("atrapar"ri)) (2 ("a"ri))  ; # c:  1
SUBSTITUTE:r57  (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("atrapar"ri))  ; # c:  1
SUBSTITUTE:r58  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<sever>"ri)) (2 ("<per>"ri))  ; # c:  1
SUBSTITUTE:r59  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<sever>"ri))  ; # c:  1
SUBSTITUTE:r60  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<obligà>"ri)) (2 ("<la>"ri))  ; # c:  1
SUBSTITUTE:r61  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<obligà>"ri))  ; # c:  1
SUBSTITUTE:r62  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<i>"ri)) (2 ("<l'>"ri))  ; # c:  1
SUBSTITUTE:r63  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<i>"ri))  ; # c:  1
SUBSTITUTE:r64  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<ho>"ri))  ; # c:  1
SUBSTITUTE:r65  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<fred>"ri)) (2 ("<emmagatzemant>"ri))  ; # c:  1
SUBSTITUTE:r66  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<fred>"ri))  ; # c:  1
SUBSTITUTE:r67  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<es>"ri)) (2 ("<va>"ri))  ; # c:  1
SUBSTITUTE:r68  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<es>"ri))  ; # c:  1
SUBSTITUTE:r69  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<canvia>"ri)) (2 ("<mogut>"ri))  ; # c:  1
SUBSTITUTE:r70  (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<canvia>"ri))  ; # c:  1
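The listing suggests that every occurrence of an ambiguous word contributes several context windows (the word alone, with its left neighbour, with one or two right neighbours, and with both neighbours), once on lemmas and once on surface forms, and that the "c:" comment counts how often each context was seen. The sketch below reproduces that shape. It is inferred from the output only: the window set, the function names and the simplification that all occurrences share one winning translation are assumptions.

```python
from collections import Counter

# (leftmost offset, rightmost offset) relative to the ambiguous word.
WINDOWS = [(0, 0), (-1, 0), (0, 1), (0, 2), (-1, 1)]


def contexts(tokens, i):
    """Yield lemma and surface-form contexts around tokens[i].

    tokens is a list of (surface form, lemma) pairs.
    """
    for left, right in WINDOWS:
        if i + left < 0 or i + right >= len(tokens):
            continue  # window falls off the sentence
        span = range(left, right + 1)
        yield tuple((off, '"%s"' % tokens[i + off][1]) for off in span)
        yield tuple((off, '"<%s>"' % tokens[i + off][0]) for off in span)


def format_rule(n, tag, default, winner, lemma, context, count):
    """Format one rule in the style of the listing above."""
    tests = " ".join("(%d (%sri))" % (off, word) for off, word in context)
    return 'SUBSTITUTE:r%d  (%s :%d) (%s :%d) ("%s"ri) %s  ; # c:  %d' % (
        n, tag, default, tag, winner, lemma, tests, count)


def candidate_rules(occurrences, tag, default, winner, lemma):
    """Count contexts over all occurrences, most frequent first."""
    counts = Counter()
    for tokens, i in occurrences:
        counts.update(contexts(tokens, i))
    return [format_rule(n, tag, default, winner, lemma, context, count)
            for n, (context, count) in enumerate(counts.most_common(), start=1)]


occurrences = [
    ([("mal", "mal"), ("temps", "temps"), ("fred", "fred")], 1),
    ([("mal", "mal"), ("temps", "temps"), ("calent", "calent")], 1),
]
for rule in candidate_rules(occurrences, "n", 0, 1, "temps"):
    print(rule)
```

Many of the rules generated this way are redundant, since a context-free rule subsumes every longer rule for the same word, which is one reason the ranking step that follows is needed.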

We then rank these rules using rank-rules.sh, which takes each rule in turn and runs the whole ambiguous corpus through the translator with that rule applied. We also run the corpus once more with no rule at all, to obtain the baseline translation. We then rank each of the translations of each of the sentences produced under each of the rules.

The final ranked rule list is made with show-rule-ranking.sh by summing the scores and subtracting the baseline. A threshold may be given: for example, we can take the average difference (in this case 0.000013) and select the rules that score above it.

$ sh show-rule-ranking.sh ca-en.rules.txt resul/
0.000093	SUBSTITUTE:r9 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri)) ; # c: 4
0.000087	SUBSTITUTE:r10 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri)) ; # c: 4
0.000063	SUBSTITUTE:r123 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("amb"ri)) ; # c: 1
0.000060	SUBSTITUTE:r11 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("humit"ri)) ; # c: 3
0.000059	SUBSTITUTE:r19 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("de"ri)) (2 ("treball"ri)) ; # c: 2
0.000051	SUBSTITUTE:r27 (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) (1 ("<seca>"ri)) ; # c: 2
0.000051	SUBSTITUTE:r24 (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) (1 ("sec"ri)) ; # c: 2
0.000050	SUBSTITUTE:r13 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<humida>"ri)) ; # c: 3
0.000045	SUBSTITUTE:r133 (n :0) (n :1) ("estació"ri) (0 ("<estacions>"ri)) (1 ("<de>"ri)) (2 ("<treball>"ri)) ; # c: 1
0.000034	SUBSTITUTE:r28 (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) (1 ("<humida>"ri)) ; # c: 2
0.000034	SUBSTITUTE:r25 (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) (1 ("humit"ri)) ; # c: 2
0.000032	SUBSTITUTE:r129 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri)) (2 ("<(>"ri)) ; # c: 1
0.000032	SUBSTITUTE:r128 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri)) (2 ("<de>"ri)) ; # c: 1
0.000032	SUBSTITUTE:r120 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri)) (2 ("("ri)) ; # c: 1
0.000032	SUBSTITUTE:r119 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri)) (2 ("de"ri)) ; # c: 1
0.000031	SUBSTITUTE:r155 (n :0) (n :1) ("estació"ri) (-1 ("<una>"ri)) (0 ("<estació>"ri)) (1 ("<seca>"ri)) ; # c: 1
0.000031	SUBSTITUTE:r141 (n :0) (n :1) ("estació"ri) (-1 ("un"ri)) (0 ("estació"ri)) (1 ("sec"ri)) ; # c: 1
0.000028	SUBSTITUTE:r20 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<humida>"ri)) (2 ("<i>"ri)) ; # c: 2
0.000028	SUBSTITUTE:r18 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("humit"ri)) (2 ("i"ri)) ; # c: 2
0.000027	SUBSTITUTE:r99 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) (1 ("de"ri)) (2 ("terra"ri)) ; # c: 1
0.000027	SUBSTITUTE:r116 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sense"ri)) ; # c: 1
0.000027	SUBSTITUTE:r103 (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<de>"ri)) (2 ("<terra>"ri)) ; # c: 1
0.000026	SUBSTITUTE:r161 (n :0) (n :1) ("estació"ri) (-1 ("<en>"ri)) (0 ("<estacions>"ri)) (1 ("<de>"ri)) ; # c: 1
0.000026	SUBSTITUTE:r146 (n :0) (n :1) ("estació"ri) (-1 ("en"ri)) (0 ("estació"ri)) (1 ("de"ri)) ; # c: 1
0.000025	SUBSTITUTE:r34 (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri)) (1 ("<i>"ri)) (2 ("<fúria>"ri)) ; # c: 1
0.000025	SUBSTITUTE:r31 (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri)) (1 ("i"ri)) (2 ("fúria"ri)) ; # c: 1
0.000025	SUBSTITUTE:r159 (n :0) (n :1) ("estació"ri) (-1 ("<i>"ri)) (0 ("<estació>"ri)) (1 ("<seca>"ri)) ; # c: 1
0.000025	SUBSTITUTE:r144 (n :0) (n :1) ("estació"ri) (-1 ("i"ri)) (0 ("estació"ri)) (1 ("sec"ri)) ; # c: 1
0.000025	SUBSTITUTE:r126 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri)) (2 ("<també>"ri)) ; # c: 1
0.000025	SUBSTITUTE:r117 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri)) (2 ("també"ri)) ; # c: 1
0.000024	SUBSTITUTE:r39 (n :0) (n :2) ("ràbia"ri) (-1 ("<de>"ri)) (0 ("<ràbia>"ri)) (1 ("<i>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r37 (n :0) (n :2) ("ràbia"ri) (-1 ("de"ri)) (0 ("ràbia"ri)) (1 ("i"ri)) ; # c: 1
0.000024	SUBSTITUTE:r35 (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri)) (1 ("<i>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r32 (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri)) (1 ("i"ri)) ; # c: 1
0.000024	SUBSTITUTE:r165 (n :0) (n :1) ("estació"ri) (-1 ("<completa>"ri)) (0 ("<estació>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r164 (n :0) (n :1) ("estació"ri) (-1 ("<completa>"ri)) (0 ("<estació>"ri)) (1 ("<de>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r151 (n :0) (n :1) ("estació"ri) (-1 ("complet"ri)) (0 ("estació"ri)) ; # c: 1
0.000024	SUBSTITUTE:r150 (n :0) (n :1) ("estació"ri) (-1 ("complet"ri)) (0 ("estació"ri)) (1 ("de"ri)) ; # c: 1
0.000024	SUBSTITUTE:r131 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<de>"ri)) (2 ("<treball>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r130 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<humida>"ri)) (2 ("<acostumen>"ri)) ; # c: 1
0.000024	SUBSTITUTE:r121 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("humit"ri)) (2 ("acostumar"ri)) ; # c: 1
0.000023	SUBSTITUTE:r154 (n :0) (n :1) ("estació"ri) (-1 ("<una>"ri)) (0 ("<estació>"ri)) (1 ("<sense>"ri)) ; # c: 1
0.000023	SUBSTITUTE:r143 (n :0) (n :1) ("estació"ri) (-1 ("més"ri)) (0 ("estació"ri)) ; # c: 1
0.000023	SUBSTITUTE:r140 (n :0) (n :1) ("estació"ri) (-1 ("un"ri)) (0 ("estació"ri)) (1 ("sense"ri)) ; # c: 1
0.000023	SUBSTITUTE:r125 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<sense>"ri)) ; # c: 1
0.000023	SUBSTITUTE:r124 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<sense>"ri)) (2 ("<pluja>"ri)) ; # c: 1
0.000023	SUBSTITUTE:r115 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sense"ri)) (2 ("pluja"ri)) ; # c: 1
0.000022	SUBSTITUTE:r167 (n :0) (n :1) ("estació"ri) (-1 ("<amb>"ri)) (0 ("<estació>"ri)) ; # c: 1
0.000022	SUBSTITUTE:r162 (n :0) (n :1) ("estació"ri) (-1 ("<en>"ri)) (0 ("<estacions>"ri)) ; # c: 1
0.000022	SUBSTITUTE:r160 (n :0) (n :1) ("estació"ri) (-1 ("<i>"ri)) (0 ("<estació>"ri)) ; # c: 1
0.000022	SUBSTITUTE:r147 (n :0) (n :1) ("estació"ri) (-1 ("en"ri)) (0 ("estació"ri)) ; # c: 1
0.000021	SUBSTITUTE:r114 (n :0) (n :1) ("llengua"ri) (-1 ("<estreta>"ri)) (0 ("<llengua>"ri)) ; # c: 1
0.000021	SUBSTITUTE:r113 (n :0) (n :1) ("llengua"ri) (-1 ("<estreta>"ri)) (0 ("<llengua>"ri)) (1 ("<de>"ri)) ; # c: 1
0.000021	SUBSTITUTE:r108 (n :0) (n :1) ("llengua"ri) (-1 ("estret"ri)) (0 ("llengua"ri)) ; # c: 1
0.000021	SUBSTITUTE:r107 (n :0) (n :1) ("llengua"ri) (-1 ("estret"ri)) (0 ("llengua"ri)) (1 ("de"ri)) ; # c: 1
0.000020	SUBSTITUTE:r157 (n :0) (n :1) ("estació"ri) (-1 ("<més>"ri)) (0 ("<estacions>"ri)) ; # c: 1
0.000020	SUBSTITUTE:r156 (n :0) (n :1) ("estació"ri) (-1 ("<més>"ri)) (0 ("<estacions>"ri)) (1 ("<amb>"ri)) ; # c: 1
0.000020	SUBSTITUTE:r142 (n :0) (n :1) ("estació"ri) (-1 ("més"ri)) (0 ("estació"ri)) (1 ("amb"ri)) ; # c: 1
0.000020	SUBSTITUTE:r127 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<seca>"ri)) (2 ("<poc>"ri)) ; # c: 1
0.000020	SUBSTITUTE:r118 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("sec"ri)) (2 ("poc"ri)) ; # c: 1
0.000019	SUBSTITUTE:r166 (n :0) (n :1) ("estació"ri) (-1 ("<amb>"ri)) (0 ("<estació>"ri)) (1 ("<humida>"ri)) ; # c: 1
0.000019	SUBSTITUTE:r152 (n :0) (n :1) ("estació"ri) (-1 ("amb"ri)) (0 ("estació"ri)) (1 ("humit"ri)) ; # c: 1
0.000017	SUBSTITUTE:r40 (n :0) (n :2) ("ràbia"ri) (-1 ("<de>"ri)) (0 ("<ràbia>"ri)) ; # c: 1
0.000017	SUBSTITUTE:r38 (n :0) (n :2) ("ràbia"ri) (-1 ("de"ri)) (0 ("ràbia"ri)) ; # c: 1
0.000017	SUBSTITUTE:r135 (n :0) (n :1) ("estació"ri) (0 ("<estacions>"ri)) (1 ("<amb>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r98 (n :0) (n :1) ("temps"ri) (-1 ("<El>"ri)) (0 ("<temps>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r97 (n :0) (n :1) ("temps"ri) (-1 ("<El>"ri)) (0 ("<temps>"ri)) (1 ("<canvia>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r96 (n :0) (n :1) ("temps"ri) (-1 ("<el>"ri)) (0 ("<temps>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r95 (n :0) (n :1) ("temps"ri) (-1 ("<el>"ri)) (0 ("<temps>"ri)) (1 ("<es>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r94 (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri)) (1 ("<atrapà>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r93 (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri)) (1 ("<ho>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r92 (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri)) (1 ("<i>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r91 (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri)) (1 ("<obligà>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r90 (n :0) (n :1) ("temps"ri) (-1 ("<preparen>"ri)) (0 ("<temps>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r8 (n :0) (n :1) ("temps"ri) (-1 ("<mal>"ri)) (0 ("<temps>"ri)) ; # c: 4
0.000010	SUBSTITUTE:r89 (n :0) (n :1) ("temps"ri) (-1 ("<preparen>"ri)) (0 ("<temps>"ri)) (1 ("<fred>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r88 (n :0) (n :1) ("temps"ri) (-1 ("<un>"ri)) (0 ("<temps>"ri)) (1 ("<calent>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r87 (n :0) (n :1) ("temps"ri) (-1 ("<un>"ri)) (0 ("<temps>"ri)) (1 ("<sever>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r86 (n :0) (n :1) ("temps"ri) (-1 ("El"ri)) (0 ("temps"ri)) ; # c: 1
0.000010	SUBSTITUTE:r85 (n :0) (n :1) ("temps"ri) (-1 ("El"ri)) (0 ("temps"ri)) (1 ("canviar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r84 (n :0) (n :1) ("temps"ri) (-1 ("el"ri)) (0 ("temps"ri)) ; # c: 1
0.000010	SUBSTITUTE:r83 (n :0) (n :1) ("temps"ri) (-1 ("el"ri)) (0 ("temps"ri)) (1 ("es"ri)) ; # c: 1
0.000010	SUBSTITUTE:r82 (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri)) (1 ("atrapar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r81 (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri)) (1 ("ho"ri)) ; # c: 1
0.000010	SUBSTITUTE:r80 (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri)) (1 ("i"ri)) ; # c: 1
0.000010	SUBSTITUTE:r7 (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri)) ; # c: 4
0.000010	SUBSTITUTE:r79 (n :0) (n :1) ("temps"ri) (-1 ("mal"ri)) (0 ("temps"ri)) (1 ("obligar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r78 (n :0) (n :1) ("temps"ri) (-1 ("preparar"ri)) (0 ("temps"ri)) ; # c: 1
0.000010	SUBSTITUTE:r77 (n :0) (n :1) ("temps"ri) (-1 ("preparar"ri)) (0 ("temps"ri)) (1 ("fred"ri)) ; # c: 1
0.000010	SUBSTITUTE:r76 (n :0) (n :1) ("temps"ri) (-1 ("un"ri)) (0 ("temps"ri)) (1 ("calent"ri)) ; # c: 1
0.000010	SUBSTITUTE:r75 (n :0) (n :1) ("temps"ri) (-1 ("un"ri)) (0 ("temps"ri)) (1 ("sever"ri)) ; # c: 1
0.000010	SUBSTITUTE:r74 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<atrapà>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r73 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<atrapà>"ri)) (2 ("<a>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r72 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<calent>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r71 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<calent>"ri)) (2 ("<i>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r70 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<canvia>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r69 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<canvia>"ri)) (2 ("<mogut>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r68 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<es>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r67 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<es>"ri)) (2 ("<va>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r66 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<fred>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r65 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<fred>"ri)) (2 ("<emmagatzemant>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r64 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<ho>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r63 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<i>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r62 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<i>"ri)) (2 ("<l'>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r61 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<obligà>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r60 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<obligà>"ri)) (2 ("<la>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r59 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<sever>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r58 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<sever>"ri)) (2 ("<per>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r57 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("atrapar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r56 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("atrapar"ri)) (2 ("a"ri)) ; # c: 1
0.000010	SUBSTITUTE:r55 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("calent"ri)) ; # c: 1
0.000010	SUBSTITUTE:r54 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("calent"ri)) (2 ("i"ri)) ; # c: 1
0.000010	SUBSTITUTE:r53 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("canviar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r52 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("canviar"ri)) (2 ("moure"ri)) ; # c: 1
0.000010	SUBSTITUTE:r51 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("es"ri)) ; # c: 1
0.000010	SUBSTITUTE:r50 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("es"ri)) (2 ("anar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r4 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) ; # c: 9
0.000010	SUBSTITUTE:r49 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("fred"ri)) ; # c: 1
0.000010	SUBSTITUTE:r48 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("fred"ri)) (2 ("emmagatzemar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r47 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("ho"ri)) ; # c: 1
0.000010	SUBSTITUTE:r46 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("i"ri)) ; # c: 1
0.000010	SUBSTITUTE:r45 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("i"ri)) (2 ("el"ri)) ; # c: 1
0.000010	SUBSTITUTE:r44 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("obligar"ri)) ; # c: 1
0.000010	SUBSTITUTE:r43 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("obligar"ri)) (2 ("el"ri)) ; # c: 1
0.000010	SUBSTITUTE:r42 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("sever"ri)) ; # c: 1
0.000010	SUBSTITUTE:r41 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) (1 ("sever"ri)) (2 ("per"ri)) ; # c: 1
0.000010	SUBSTITUTE:r3 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) ; # c: 9
0.000010	SUBSTITUTE:r177 (n :0) (n :0) ("llengua"ri) (-1 ("<que>"ri)) (0 ("<llengua>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r176 (n :0) (n :0) ("llengua"ri) (-1 ("<que>"ri)) (0 ("<llengua>"ri)) (1 ("<materna>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r175 (n :0) (n :0) ("llengua"ri) (-1 ("altre"ri)) (0 ("llengua"ri)) ; # c: 1
0.000010	SUBSTITUTE:r174 (n :0) (n :0) ("llengua"ri) (-1 ("altre"ri)) (0 ("llengua"ri)) (1 ("que"ri)) ; # c: 1
0.000010	SUBSTITUTE:r173 (n :0) (n :0) ("llengua"ri) (-1 ("que"ri)) (0 ("llengua"ri)) ; # c: 1
0.000010	SUBSTITUTE:r172 (n :0) (n :0) ("llengua"ri) (-1 ("que"ri)) (0 ("llengua"ri)) (1 ("*materna"ri)) ; # c: 1
0.000010	SUBSTITUTE:r171 (n :0) (n :0) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<materna>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r170 (n :0) (n :0) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<que>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r169 (n :0) (n :0) ("llengua"ri) (0 ("llengua"ri)) (1 ("*materna"ri)) ; # c: 1
0.000010	SUBSTITUTE:r163 (n :0) (n :1) ("estació"ri) (-1 ("<de>"ri)) (0 ("<:>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r15 (n :0) (n :1) ("temps"ri) (-1 ("<un>"ri)) (0 ("<temps>"ri)) ; # c: 2
0.000010	SUBSTITUTE:r14 (n :0) (n :1) ("temps"ri) (-1 ("un"ri)) (0 ("temps"ri)) ; # c: 2
0.000010	SUBSTITUTE:r149 (n :0) (n :1) ("estació"ri) (-1 ("de"ri)) (0 (":"ri)) ; # c: 1
0.000010	SUBSTITUTE:r139 (n :0) (n :1) ("estació"ri) (0 (":"ri)) ; # c: 1
0.000010	SUBSTITUTE:r138 (n :0) (n :1) ("estació"ri) (0 (":"ri)) (1 ("."ri)) (2 ("Per a"ri)) ; # c: 1
0.000010	SUBSTITUTE:r137 (n :0) (n :1) ("estació"ri) (0 ("<:>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r136 (n :0) (n :1) ("estació"ri) (0 ("<:>"ri)) (1 ("<.>"ri)) (2 ("<Per a>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r132 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<de>"ri)) (2 ("<es>"ri)) ; # c: 1
0.000010	SUBSTITUTE:r122 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("de"ri)) (2 ("es"ri)) ; # c: 1
0.000009	SUBSTITUTE:r153 (n :0) (n :1) ("estació"ri) (-1 ("amb"ri)) (0 ("estació"ri)) ; # c: 1
0.000006	SUBSTITUTE:r30 (n :0) (n :0) ("llengua"ri) (0 ("<llengua>"ri)) ; # c: 2
0.000006	SUBSTITUTE:r134 (n :0) (n :1) ("estació"ri) (0 ("<estacions>"ri)) (1 ("<de>"ri)) ; # c: 1
0.000005	SUBSTITUTE:r168 (n :0) (n :0) ("llengua"ri) (0 ("llengua"ri)) (1 ("que"ri)) ; # c: 1
0.000000	SUBSTITUTE:r105 (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<celta>"ri)) (2 ("<d'>"ri)) ; # c: 1
0.000000	SUBSTITUTE:r101 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) (1 ("celta"ri)) (2 ("de"ri)) ; # c: 1
-0.000003	SUBSTITUTE:r111 (n :0) (n :1) ("llengua"ri) (-1 ("<la>"ri)) (0 ("<llengua>"ri)) (1 ("<celta>"ri)) ; # c: 1
-0.000003	SUBSTITUTE:r109 (n :0) (n :1) ("llengua"ri) (-1 ("el"ri)) (0 ("llengua"ri)) (1 ("celta"ri)) ; # c: 1
-0.000010	SUBSTITUTE:r145 (n :0) (n :1) ("estació"ri) (-1 ("i"ri)) (0 ("estació"ri)) ; # c: 1
-0.000010	SUBSTITUTE:r106 (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<celta>"ri)) ; # c: 1
-0.000010	SUBSTITUTE:r102 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) (1 ("celta"ri)) ; # c: 1
-0.000011	SUBSTITUTE:r36 (n :0) (n :2) ("ràbia"ri) (0 ("<ràbia>"ri)) ; # c: 1
-0.000011	SUBSTITUTE:r33 (n :0) (n :2) ("ràbia"ri) (0 ("ràbia"ri)) ; # c: 1
-0.000043	SUBSTITUTE:r158 (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) (1 ("<de>"ri)) ; # c: 1
-0.000056	SUBSTITUTE:r29 (n :0) (n :0) ("llengua"ri) (0 ("llengua"ri)) ; # c: 2
-0.000140	SUBSTITUTE:r21 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) (1 ("<de>"ri)) ; # c: 2
-0.000216	SUBSTITUTE:r12 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) (1 ("de"ri)) ; # c: 3
-0.000247	SUBSTITUTE:r148 (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) (1 ("de"ri)) ; # c: 1
-0.000393	SUBSTITUTE:r23 (n :0) (n :1) ("estació"ri) (-1 ("un"ri)) (0 ("estació"ri)) ; # c: 2
-0.000397	SUBSTITUTE:r26 (n :0) (n :1) ("estació"ri) (-1 ("<una>"ri)) (0 ("<estació>"ri)) ; # c: 2
-0.000630	SUBSTITUTE:r104 (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri)) (1 ("<de>"ri)) ; # c: 1
-0.000694	SUBSTITUTE:r22 (n :0) (n :1) ("estació"ri) (0 ("<estacions>"ri)) ; # c: 2
-0.001345	SUBSTITUTE:r6 (n :0) (n :1) ("estació"ri) (-1 ("<l'>"ri)) (0 ("<estació>"ri)) ; # c: 5
-0.001903	SUBSTITUTE:r5 (n :0) (n :1) ("estació"ri) (-1 ("el"ri)) (0 ("estació"ri)) ; # c: 5
-0.002080	SUBSTITUTE:r2 (n :0) (n :1) ("estació"ri) (0 ("<estació>"ri)) ; # c: 10
-0.002281	SUBSTITUTE:r100 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) (1 ("de"ri)) ; # c: 1
-0.002785	SUBSTITUTE:r1 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) ; # c: 12
-0.006686	SUBSTITUTE:r112 (n :0) (n :1) ("llengua"ri) (-1 ("<la>"ri)) (0 ("<llengua>"ri)) ; # c: 1
-0.009206	SUBSTITUTE:r110 (n :0) (n :1) ("llengua"ri) (-1 ("el"ri)) (0 ("llengua"ri)) ; # c: 1
-0.013721	SUBSTITUTE:r17 (n :0) (n :1) ("llengua"ri) (0 ("<llengua>"ri)) ; # c: 2
-0.020805	SUBSTITUTE:r16 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) ; # c: 2
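
The weighted rule list above is easier to inspect once it is loaded into a structure. The following is a minimal sketch, not part of any Apertium tool, that parses lines of this shape in Python. It rests on assumptions read off the listing rather than on a specification: each line is a weight, a tab, a rule named `rN`, a pair of tag lists such as `(n :0) (n :1)` (taken here to be the translation index being replaced and its replacement), the target lemma, a list of positional context tests, and a trailing `# c:` count. Following the usual Constraint Grammar convention, `"<temps>"` is treated as a surface form and `"temps"` as a lemma. The names `Rule`, `parse_rule` and `SAMPLE` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# One weighted rule line, e.g.
# 0.000010<TAB>SUBSTITUTE:r3 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) ; # c: 9
RULE_RE = re.compile(
    r'^(?P<weight>-?\d+\.\d+)\s+'
    r'SUBSTITUTE:(?P<name>r\d+)\s+'
    r'\(n :(?P<src>\d+)\)\s+\(n :(?P<dst>\d+)\)\s+'
    r'\("(?P<lemma>[^"]+)"ri\)\s+'
    r'(?P<context>.*?)\s*;\s*#\s*c:\s*(?P<count>\d+)\s*$'
)
# One positional context test, e.g. (-1 ("<l'>"ri))
CONTEXT_RE = re.compile(r'\((-?\d+) \("([^"]*)"ri\)\)')


class Rule(NamedTuple):
    weight: float
    name: str
    src: int                        # assumed: translation index being replaced
    dst: int                        # assumed: translation index selected
    lemma: str
    context: List[Tuple[int, str]]  # (relative position, token pattern)
    count: int                      # value after "# c:"


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one line of the listing; return None if it does not match."""
    m = RULE_RE.match(line.strip())
    if m is None:
        return None
    context = [(int(pos), tok) for pos, tok in CONTEXT_RE.findall(m['context'])]
    return Rule(float(m['weight']), m['name'], int(m['src']), int(m['dst']),
                m['lemma'], context, int(m['count']))


# A few lines copied from the listing above.
SAMPLE = [
    '0.000010\tSUBSTITUTE:r60 (n :0) (n :1) ("temps"ri) (0 ("<temps>"ri)) (1 ("<obligà>"ri)) (2 ("<la>"ri)) ; # c: 1',
    '0.000010\tSUBSTITUTE:r3 (n :0) (n :1) ("temps"ri) (0 ("temps"ri)) ; # c: 9',
    '-0.002785\tSUBSTITUTE:r1 (n :0) (n :1) ("estació"ri) (0 ("estació"ri)) ; # c: 12',
    '-0.020805\tSUBSTITUTE:r16 (n :0) (n :1) ("llengua"ri) (0 ("llengua"ri)) ; # c: 2',
]

if __name__ == '__main__':
    rules = [r for r in map(parse_rule, SAMPLE) if r is not None]
    # Group rules by target lemma and show the highest-weighted rule for each.
    by_lemma = {}
    for r in rules:
        by_lemma.setdefault(r.lemma, []).append(r)
    for lemma, group in by_lemma.items():
        top = max(group, key=lambda r: r.weight)
        print(lemma, top.name, top.weight, top.context)
```

With the rules in this form it is straightforward to filter by count, to compare the surface-form and lemma variants of the same context, or to drop rules whose weight is at or below zero, if that turns out to be a useful threshold.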