Difference between revisions of "Crossdics"

From Apertium
m (B-A, not A-B)
 
(25 intermediate revisions by 6 users not shown)
Line 1: Line 1:
[[Crossdics : Génération d'une paire de langue à partir de 2 autres|En français]]

{{TOCD}}
{{main|Building dictionaries}}


'''Crossdics''' (part of [[apertium-dixtools]]) is a program that can be used to "cross" language pairs. That is, given language pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>.
== Download ==


== Installing ==
<pre>
See [[apertium-dixtools]].
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-crossdics
</pre>


== Software prerequisites ==
== Using apertium-crossdics ==


$ apertium-dixtools cross
You will need to install [http://ant.apache.org/ Ant] and [http://java.sun.com/javase/downloads/index.jsp Java Development Kit 6 (JDK6)]


== Crossing dictionaries ==
$ sudo apt-get install ant sun-java6-jdk


=== Using a Linguistic Resources Document ===
== Compiling ==


You can define a [[Linguistic Resources Document]] (LRD) and use it to indicate which dictionaries will be used for crossing:
<pre>
$ cd apertium-crossdics
$ ant jar
</pre>


$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl
== Installing ==
$ sudo ant install


Therefore, only 2 parameters are needed:
== Using apertium-crossdics ==
* '''my-linguistic-resources.xml''': a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
* '''sl-tl''': source language (sl) and target language (tl).


$ apertium-crossdics


=== Without a Linguistic Resources Document ===
== Crossing dictionaries ==


First of all, copy linguistic data into folder "dics"


* Bilingual dictionary A-B: <code>apertium-bb-aa.bb-aa.dix</code>
* Bilingual dictionary B-A: <code>apertium-bb-aa.bb-aa.dix</code>
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
Line 45: Line 44:




Use the '''dictools''' script to cross the dictionaries:
Use the '''apertium-dixtools''' script to cross the dictionaries:


$ apertium-crossdics '''monA.dix''' -n '''bilAB.dix''' -n '''bilBC-dix''' '''monC.dix'''
$ apertium-dixtools cross-param '''monA.dix''' -n '''bilBA.dix''' -n '''bilBC-dix''' '''monC.dix'''


An example for crossing '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair.
An example crossing '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair.


<pre>
<pre>
$ apertium-crossdics dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
</pre>
</pre>


: I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --[[User:Unhammer|unhammer]] ([[User talk:Unhammer|talk]]) 21:19, 14 April 2015 (CEST)
== Customizing cross actions ==

== Customising cross actions ==


By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.

==Troubleshooting==

;NullPointerException in crossing sections

<pre>
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
at dictools.cross.DicCross.crossSections(DicCross.java:342)
at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)
</pre>

If you get this error, try and remove unused sections from the bilingual dictionaries until you have only the 'main' sections left.


;java.lang.IllegalArgumentException: Comparison method violates its general contract!
<pre>Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeHi(TimSort.java:868)
at java.util.TimSort.mergeAt(TimSort.java:485)
at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
at java.util.TimSort.sort(TimSort.java:223)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
at dictools.cross.DicCross.actionCross(DicCross.java:729)
at dictools.cross.DicCross.doCross(DicCross.java:722)
at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
at dictools.ProcessDics.processArguments(ProcessDics.java:206)
at dictools.ProcessDics.main(ProcessDics.java:79)
at ProcessDics.main(ProcessDics.java:30)
</pre>
This means your java is too fresh. Get an older java (1.6 seems to work).


== See also ==
* [[Crossdics Example|Crossing language pairs: a full example]]
* [[Linguistic Resources Document|How to create a Linguistic Resources Document]]
* [[Cross Model|How to define a new cross schema]]
* [[List of language pairs|List of available language pairs]]
* [[Sort a dictionary|How to sort a dictionary]]
* [[Merge dictionaries|How to merge dictionaries]]
<!--
* [[Reverse a dictionary|How to reverse a bilingual dictionary]]
-->



[[Category:Documentation]]
[[Category:Tools]]
[[Category:Dixtools]]
[[Category:Development]]
[[Category:Documentation in English]]

Latest revision as of 13:29, 6 October 2017

En français

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc it will create a new language pair for aa-cc.

Installing

See apertium-dixtools.

Using apertium-crossdics

$ apertium-dixtools cross

Crossing dictionaries

Using a Linguistic Resources Document

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

Only two parameters are needed:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc.).
  • sl-tl: the source language (sl) and the target language (tl).


Without a Linguistic Resources Document

First, copy the linguistic data into a folder named "dics":

  • Bilingual dictionary B-A: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  • use "-r" instead of "-n" if a dictionary has to be reversed (for example, apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
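
The naming rules above can be checked before running the cross. The following is a minimal sketch for a POSIX shell; `is_bil_dix` and `is_mono_dix` are hypothetical helper names, not part of apertium-dixtools, and the patterns assume lowercase language codes of two or three letters.

```shell
# Hypothetical helpers (not part of apertium-dixtools): check that a file
# name follows the naming scheme described above. Any directory part of the
# path is ignored.

# Succeeds for bilingual names such as apertium-es-ca.es-ca.dix,
# where the pair after the first dot repeats the pair in the package name.
is_bil_dix() {
    printf '%s\n' "${1##*/}" |
        grep -q '^apertium-\([a-z]\{2,3\}\)-\([a-z]\{2,3\}\)\.\1-\2\.dix$'
}

# Succeeds for morphological names such as apertium-es-ca.ca.dix,
# where the language after the first dot is one of the two in the pair.
is_mono_dix() {
    name=${1##*/}
    printf '%s\n' "$name" |
        grep -q '^apertium-\([a-z]\{2,3\}\)-[a-z]\{2,3\}\.\1\.dix$' ||
    printf '%s\n' "$name" |
        grep -q '^apertium-[a-z]\{2,3\}-\([a-z]\{2,3\}\)\.\1\.dix$'
}
```

For instance, `is_bil_dix dics/apertium-es-ca.es-ca.dix` succeeds, while `is_bil_dix dics/apertium-es-ca.ca-es.dix` fails because the pair order does not match the package name.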


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilBA.dix -n bilBC.dix monC.dix

For example, crossing es-ca and es-pt to get the ca-pt pair:

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix

Note: I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --unhammer (talk) 21:19, 14 April 2015 (CEST)
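
Since the four file names follow from the three language codes, the command can be assembled mechanically. The sketch below only prints the command rather than running it; `cross_cmd` is a hypothetical helper, the files are assumed to live in ./dics under the naming scheme above, and the optional fourth argument is the cross model path mentioned in the note above.

```shell
# Hypothetical helper: print (not run) the cross-param command for crossing
# B-A and B-C into A-C. Arguments: A, B (the common language), C, and an
# optional path to a cross model file to append to the command.
cross_cmd() {
    a=$1 b=$2 c=$3 model=${4:-}
    printf 'apertium-dixtools cross-param %s -n %s -n %s %s' \
        "dics/apertium-$b-$a.$a.dix" \
        "dics/apertium-$b-$a.$b-$a.dix" \
        "dics/apertium-$b-$c.$b-$c.dix" \
        "dics/apertium-$b-$c.$c.dix"
    # Append the cross model only when one was given.
    if [ -n "$model" ]; then
        printf ' %s' "$model"
    fi
    printf '\n'
}
```

Calling `cross_cmd ca es pt` prints the es-ca/es-pt example command; once the printed line looks right, it can be piped to `sh` to run it.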

Customising cross actions

By default, the crossdics tool uses a basic cross model that defines only very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

Troubleshooting

NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)

If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
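
To see which sections a bilingual dictionary actually contains before editing it, something like the following may help. It is a rough sketch: `list_sections` is a hypothetical helper, and it assumes each opening `<section ...>` tag sits on one line with a double-quoted `id` attribute.

```shell
# Hypothetical helper: print the id of every <section> element in a .dix
# file, one per line. Assumes at most one opening <section ...> tag per line
# and a double-quoted id attribute.
list_sections() {
    sed -n 's/.*<section[^>]*id="\([^"]*\)".*/\1/p' "$1"
}
```

Running `list_sections dics/apertium-es-ca.es-ca.dix` would then list the section ids; anything other than `main` is a candidate for removal when working around this error.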


java.lang.IllegalArgumentException: Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)

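This means your Java version is too new: the exception is raised by TimSort, the default sort since Java 7, which rejects inconsistent comparison methods. Use an older Java (1.6 seems to work).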
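
A quick way to tell whether the Java on your PATH is newer than 1.6 is sketched below. `java_major` and `check_java` are hypothetical helpers, and the parsing assumes the usual `java -version` output with the version number in double quotes.

```shell
# Hypothetical helper: reduce a Java version string to its major version,
# e.g. "1.6.0_45" gives 6 and "11.0.2" gives 11.
java_major() {
    case $1 in
        1.*) v=${1#1.}; printf '%s\n' "${v%%[!0-9]*}" ;;
        *)   printf '%s\n' "${1%%[!0-9]*}" ;;
    esac
}

# Hypothetical helper: warn (and return non-zero) if the java on PATH is
# newer than Java 6, the version reported above to work.
check_java() {
    ver=$(java -version 2>&1 |
        sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if [ "$(java_major "$ver")" -gt 6 ] 2>/dev/null; then
        echo "Java $ver found; crossing is reported to work with 1.6" >&2
        return 1
    fi
}
```

If installing an older Java is not practical, Java 7 and later accept the JVM system property `java.util.Arrays.useLegacyMergeSort=true`, which restores the older sort, for example by setting `JAVA_TOOL_OPTIONS="-Djava.util.Arrays.useLegacyMergeSort=true"` in the environment before running the command. Whether that is sufficient for apertium-dixtools has not been verified here.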

See also