Difference between revisions of "Crossdics"

From Apertium
Jump to navigation Jump to search
m (B-A, not A-B)
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Crossdics : Génération d'une paire de langue à partir de 2 autres|En français]]

{{TOCD}}
{{TOCD}}
{{main|Building dictionaries}}
{{main|Building dictionaries}}
Line 28: Line 30:
First of all, copy linguistic data into folder "dics"
First of all, copy linguistic data into folder "dics"


* Bilingual dictionary A-B: <code>apertium-bb-aa.bb-aa.dix</code>
* Bilingual dictionary B-A: <code>apertium-bb-aa.bb-aa.dix</code>
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
Line 44: Line 46:
Use the '''apertium-dixtools''' script to cross the dictionaries:
Use the '''apertium-dixtools''' script to cross the dictionaries:


$ apertium-dixtools cross-param '''monA.dix''' -n '''bilAB.dix''' -n '''bilBC-dix''' '''monC.dix'''
$ apertium-dixtools cross-param '''monA.dix''' -n '''bilBA.dix''' -n '''bilBC-dix''' '''monC.dix'''


An example crossing '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair.
An example crossing '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair.
Line 52: Line 54:
</pre>
</pre>


: I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --[[User:Unhammer|unhammer]] ([[User talk:Unhammer|talk]]) 21:19, 14 April 2015 (CEST)
== Customizing cross actions ==

== Customising cross actions ==


By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.
By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.

Latest revision as of 13:29, 6 October 2017

En français

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc it will create a new language pair for aa-cc.

Installing[edit]

See apertium-dixtools.

Using apertium-crossdics[edit]

$ apertium-dixtools cross

Crossing dictionaries[edit]

Using a Linguistic Resources Document[edit]

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

Therefore, only 2 parameters are needed:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
  • sl-tl: source language (sl) and target language (tl).


Without a Linguistic Resources Document[edit]

First of all, copy linguistic data into folder "dics"

  • Bilingual dictionary B-A: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be in the left side, that is, dictionaries in the form B-A and B-C
  • use "-r" instead of "-n" if the dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilBA.dix -n bilBC-dix monC.dix

An example crossing es-ca and es-pt to get the ca-pt pair.

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --unhammer (talk) 21:19, 14 April 2015 (CEST)

Customising cross actions[edit]

By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

Troubleshooting[edit]

NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)

If you get this error, try and remove unused sections from the bilingual dictionaries until you have only the 'main' sections left.


java.lang.IllegalArgumentException
Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)

This means your java is too fresh. Get an older java (1.6 seems to work).

See also[edit]