Difference between revisions of "Crossdics"

From Apertium
m (B-A, not A-B)
 
(17 intermediate revisions by 6 users not shown)
Line 1: Line 1:
  +
[[Crossdics : Génération d'une paire de langue à partir de 2 autres|En français]]
  +
 
{{TOCD}}
 
 
{{main|Building dictionaries}}
 
   
  +
'''Crossdics''' (part of [[apertium-dixtools]]) is a program that can be used to "cross" language pairs. That is, given language pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>.
== Download ==
 
 
<pre>
 
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-crossdics
 
</pre>
 
 
== Software prerequisites ==
 
 
You will need to install [http://ant.apache.org/ Ant] and [http://java.sun.com/javase/downloads/index.jsp Java Development Kit 6 (JDK6)]
 
 
$ sudo apt-get install ant sun-java6-jdk
 
 
== Compiling ==
 
 
<pre>
 
$ cd apertium-crossdics
 
$ ant jar
 
</pre>
 
   
 
== Installing ==
 
  +
See [[apertium-dixtools]].
$ sudo ant install
 
   
 
== Using apertium-crossdics ==
 
   
$ apertium-crossdics
+
$ apertium-dixtools cross
   
 
== Crossing dictionaries ==
 
Line 34: Line 19:
 
You can define a [[Linguistic Resources Document]] (LRD) and use it to indicate which dictionaries will be used for crossing:
 
   
$ apertium-crossdics -f my-linguistic-resources.xml sl-tl
+
$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl
   
 
Therefore, only 2 parameters are needed:
 
Line 40: Line 25:
 
* '''sl-tl''': source language (sl) and target language (tl).
 
   
Note that this form uses the <code>apertium-dictools</code> script (the <code>apertium-crossdics</code> script still uses the "old" form)
 
   
 
=== Without a Linguistic Resources Document ===
 
Line 46: Line 30:
 
First of all, copy linguistic data into folder "dics"
 
   
* Bilingual dictionary A-B: <code>apertium-bb-aa.bb-aa.dix</code>
+
* Bilingual dictionary B-A: <code>apertium-bb-aa.bb-aa.dix</code>
 
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
 
 
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
 
Line 60: Line 44:
   
   
Use the '''apertium-dictools''' script to cross the dictionaries:
+
Use the '''apertium-dixtools''' script to cross the dictionaries:
   
$ apertium-dictools cross-param '''monA.dix''' -n '''bilAB.dix''' -n '''bilBC-dix''' '''monC.dix'''
+
$ apertium-dixtools cross-param '''monA.dix''' -n '''bilBA.dix''' -n '''bilBC.dix''' '''monC.dix'''
   
 
An example crossing '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair.
 
   
 
<pre>
 
<pre>
$ apertium-dictools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
+
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
 
</pre>
 
</pre>
   
  +
: I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --[[User:Unhammer|unhammer]] ([[User talk:Unhammer|talk]]) 21:19, 14 April 2015 (CEST)
== Customizing cross actions ==
 
  +
 
== Customising cross actions ==
   
 
By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.
 
  +
  +
==Troubleshooting==
  +
  +
;NullPointerException in crossing sections
  +
 
<pre>
  +
[9] Crossing sections 'main' and 'main'
  +
  +
Exception in thread "main" java.lang.NullPointerException
  +
at dictools.cross.DicCross.crossSections(DicCross.java:342)
  +
at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)
 
</pre>
  +
  +
If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
  +
  +
  +
;java.lang.IllegalArgumentException: Comparison method violates its general contract!
  +
<pre>Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
  +
at java.util.TimSort.mergeHi(TimSort.java:868)
  +
at java.util.TimSort.mergeAt(TimSort.java:485)
  +
at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
  +
at java.util.TimSort.sort(TimSort.java:223)
  +
at java.util.TimSort.sort(TimSort.java:173)
  +
at java.util.Arrays.sort(Arrays.java:659)
  +
at java.util.Collections.sort(Collections.java:217)
  +
at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
  +
at dictools.cross.DicCross.actionCross(DicCross.java:729)
  +
at dictools.cross.DicCross.doCross(DicCross.java:722)
  +
at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
  +
at dictools.ProcessDics.processArguments(ProcessDics.java:206)
  +
at dictools.ProcessDics.main(ProcessDics.java:79)
  +
at ProcessDics.main(ProcessDics.java:30)
 
</pre>
  +
This means your Java version is too new. Use an older Java (1.6 seems to work).
   
 
== See also ==
 
  +
* [[Crossdics Example|Crossing language pairs: a full example]]
 
* [[Linguistic Resources Document|How to create a Linguistic Resources Document]]
 
 
* [[Cross Model|How to define a new cross schema]]
 
Line 80: Line 101:
 
* [[Sort a dictionary|How to sort a dictionary]]
 
 
* [[Merge dictionaries|How to merge dictionaries]]
 
<!--
 
 
* [[Reverse a dictionary|How to reverse a bilingual dictionary]]
 
* [[Reverse a dictionary|How to reverse a bilingual dictionary]]
-->
 
   
  +
[[Category:Documentation]]
 
[[Category:Tools]]
+
[[Category:Dixtools]]
[[Category:Development]]
+
[[Category:Documentation in English]]

Latest revision as of 13:29, 6 October 2017

In French

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc it will create a new language pair for aa-cc.

Installing

See apertium-dixtools.

Using apertium-crossdics

$ apertium-dixtools cross

Crossing dictionaries

Using a Linguistic Resources Document

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

This form needs only two parameters:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
  • sl-tl: source language (sl) and target language (tl).


Without a Linguistic Resources Document

First, copy the linguistic data into a folder named "dics":

  • Bilingual dictionary B-A: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  • use "-r" instead of "-n" if the dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
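
The naming rules above can be checked before crossing. The following is a minimal sketch, not part of apertium-dixtools: the function name `is_dix_name` is hypothetical, and it assumes that a morphological dictionary may carry either language of the pair (as in `apertium-es-pt.pt.dix`).

```shell
# Hypothetical helper (not shipped with apertium-dixtools).
# Succeeds if the file name looks like
#   apertium-xx-yy.xx-yy.dix   (bilingual)      or
#   apertium-xx-yy.xx.dix / apertium-xx-yy.yy.dix   (morphological; assumption)
is_dix_name() {
    name=${1##*/}                 # strip any directory part
    case $name in
        apertium-*-*.*.dix) ;;    # rough shape check
        *) return 1 ;;
    esac
    rest=${name#apertium-}        # "xx-yy.xx-yy.dix"
    pair=${rest%%.*}              # "xx-yy"
    middle=${rest#*.}             # "xx-yy.dix" or "xx.dix"
    middle=${middle%.dix}         # "xx-yy" or "xx"
    first=${pair%%-*}             # "xx"
    second=${pair#*-}             # "yy"
    case $middle in
        "$pair"|"$first"|"$second") return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `is_dix_name dics/apertium-es-ca.es-ca.dix` succeeds, while `is_dix_name bilBA.dix` fails. The check covers only the file name; it does not verify that the common language is on the left side.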


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilBA.dix -n bilBC.dix monC.dix

For example, to cross es-ca and es-pt to get the ca-pt pair:

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --unhammer (talk) 21:19, 14 April 2015 (CEST)
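
The command line above follows directly from the three language codes, so it can be assembled mechanically. The sketch below is illustrative only: `cross_cmd` is a hypothetical function, it assumes the dictionaries live in "dics" under the naming scheme described earlier, and it prints the command instead of running it. The optional fourth argument appends a cross model path, as in the comment above.

```shell
# Hypothetical wrapper (not part of apertium-dixtools).
# $1 = common language B, $2 = language A, $3 = language C,
# $4 = optional path to a cross model, appended to the end of the command
cross_cmd() {
    b=$1; a=$2; c=$3; model=${4:-}
    cmd="apertium-dixtools cross-param dics/apertium-$b-$a.$a.dix"
    cmd="$cmd -n dics/apertium-$b-$a.$b-$a.dix"
    cmd="$cmd -n dics/apertium-$b-$c.$b-$c.dix"
    cmd="$cmd dics/apertium-$b-$c.$c.dix"
    if [ -n "$model" ]; then
        cmd="$cmd $model"
    fi
    printf '%s\n' "$cmd"          # print only; the caller decides whether to run it
}
```

Calling `cross_cmd es ca pt` prints the es-ca/es-pt command shown above. Printing rather than executing makes it easy to inspect the file names first; the output can then be piped to `sh` once it looks right.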

Customising cross actions

By default, the crossdics tool uses a simple cross model with basic rules for crossing two sets of dictionaries. Some language pairs need more specific cross actions to be crossed correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

Troubleshooting

NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)

If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
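
To see which sections a bilingual dictionary contains before editing it, something like the following may help. It is a rough sketch: `list_sections` is a hypothetical helper, and it assumes each `<section ...>` start tag sits on a single line with a double-quoted `id` attribute.

```shell
# Hypothetical helper: print the id of every <section> in a .dix file.
# Assumes one <section ...> start tag per line and a double-quoted id attribute.
list_sections() {
    sed -n 's/.*<section[^>]*[[:space:]]id="\([^"]*\)".*/\1/p' "$1"
}
```

Running `list_sections dics/apertium-es-ca.es-ca.dix` prints one section id per line; anything other than `main` is a candidate for removal when working around this error.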


java.lang.IllegalArgumentException: Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)

This means your Java version is too new. Use an older Java (1.6 seems to work).
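
A quick way to tell whether the installed Java is newer than 1.6 is to look at its version string. The sketch below is only a convenience: `java_major` is a hypothetical helper that handles both the legacy "1.x" numbering and the newer "x.y.z" numbering.

```shell
# Hypothetical helper: extract the major version from a Java version string,
# e.g. "1.6.0_45" -> 6, "1.8.0_292" -> 8, "11.0.2" -> 11.
java_major() {
    v=${1#1.}            # legacy scheme: drop the leading "1."
    v=${v%%[._-]*}       # keep only the leading number
    printf '%s\n' "$v"
}

# Example use (commented out so the block has no side effects):
# ver=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
# if [ "$(java_major "$ver")" -gt 6 ]; then
#     echo "Java $ver may be too new for crossdics" >&2
# fi
```

As an untested alternative to downgrading: Java 7 and later sort with TimSort, which raises this exception for inconsistent comparators, and the JVM system property `java.util.Arrays.useLegacyMergeSort=true` switches back to the older merge sort. Assuming the `apertium-dixtools` script launches a standard HotSpot JVM, setting `JAVA_TOOL_OPTIONS="-Djava.util.Arrays.useLegacyMergeSort=true"` in the environment before running the command might avoid the error; this has not been verified against dixtools.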

See also