Difference between revisions of "Crossdics"

From Apertium
(apertium-tinylex quickstart)
m (B-A, not A-B)
 
(11 intermediate revisions by 6 users not shown)
[[Crossdics : Génération d'une paire de langue à partir de 2 autres|En français]]
{{TOCD}}
{{main|Building dictionaries}}

'''Crossdics''' (part of [[apertium-dixtools]]) is a program that can be used to "cross" language pairs. That is, given language pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>.

== Installing ==
See [[apertium-dixtools]].

== Using apertium-crossdics ==
$ apertium-dixtools cross

== Crossing dictionaries ==

=== Using a Linguistic Resources Document ===

You can define a [[Linguistic Resources Document]] (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

Only two parameters are needed:
* '''my-linguistic-resources.xml''': a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc.).
* '''sl-tl''': the source language (sl) and target language (tl).

=== Without a Linguistic Resources Document ===

First, copy the linguistic data into a folder named "dics":

* Bilingual dictionary B-A: <code>apertium-bb-aa.bb-aa.dix</code>
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
* Morphological dictionary C: <code>apertium-bb-cc.cc.dix</code>

Please note that:
* all dictionaries must be in the form:
** <code>apertium-xx-yy.xx-yy.dix</code> (bilingual dictionaries)
** <code>apertium-xx-yy.xx.dix</code> (morphological dictionaries)
* the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
* use "-r" instead of "-n" if the dictionary has to be [[Reverse a dictionary|reversed]] (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)

Use the '''apertium-dixtools''' script to cross the dictionaries:

$ apertium-dixtools cross-param '''monA.dix''' -n '''bilBA.dix''' -n '''bilBC.dix''' '''monC.dix'''

For example, to cross '''es-ca''' and '''es-pt''' to get the '''ca-pt''' pair:
   
 
<pre>
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
</pre>
: I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --[[User:Unhammer|unhammer]] ([[User talk:Unhammer|talk]]) 21:19, 14 April 2015 (CEST)

== Customising cross actions ==

By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.

== Troubleshooting ==

;NullPointerException in crossing sections
<pre>
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)
</pre>

If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.

;java.lang.IllegalArgumentException: Comparison method violates its general contract!
<pre>Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)
</pre>
This means your Java version is too new. Use an older Java version (1.6 seems to work).

== See also ==
* [[Crossdics Example|Crossing language pairs: a full example]]
* [[Linguistic Resources Document|How to create a Linguistic Resources Document]]
* [[Cross Model|How to define a new cross schema]]
* [[List of language pairs|List of available language pairs]]
* [[Sort a dictionary|How to sort a dictionary]]
* [[Merge dictionaries|How to merge dictionaries]]
* [[Reverse a dictionary|How to reverse a bilingual dictionary]]

[[Category:Dixtools]]
[[Category:Documentation in English]]

Latest revision as of 13:29, 6 October 2017

En français

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc it will create a new language pair for aa-cc.

Installing

See apertium-dixtools.

Using apertium-crossdics

$ apertium-dixtools cross

Crossing dictionaries

Using a Linguistic Resources Document

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

Only two parameters are needed:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc.).
  • sl-tl: the source language (sl) and target language (tl).


Without a Linguistic Resources Document

First, copy the linguistic data into a folder named "dics":

  • Bilingual dictionary B-A: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  • use "-r" instead of "-n" if the dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
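
The naming rules above can be checked before running the tool. The following is a minimal sketch and not part of apertium-dixtools: `expected_dics` and `missing_dics` are hypothetical helper names, and the default folder `dics` is taken from the instructions above.

```shell
# Hypothetical helper: print the four file names expected for
# common language $1 (B) and outer languages $2 (A) and $3 (C).
expected_dics() {
    b=$1; a=$2; c=$3
    printf '%s\n' \
        "apertium-$b-$a.$b-$a.dix" \
        "apertium-$b-$c.$b-$c.dix" \
        "apertium-$b-$a.$a.dix" \
        "apertium-$b-$c.$c.dix"
}

# Hypothetical helper: report which expected files are absent from
# the folder given as $4 (defaults to "dics").
missing_dics() {
    dir=${4:-dics}
    expected_dics "$1" "$2" "$3" | while IFS= read -r f; do
        [ -f "$dir/$f" ] || printf 'missing: %s\n' "$dir/$f"
    done
}
```

Running `missing_dics es ca pt` prints one line per absent file and nothing when all four dictionaries are in place.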


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilBA.dix -n bilBC.dix monC.dix

For example, to cross es-ca and es-pt to get the ca-pt pair:

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --unhammer (talk) 21:19, 14 April 2015 (CEST)
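
Because the four arguments follow from the naming convention, the command can be assembled from the language codes. This is only a sketch under that assumption: `cross_cmd` is a hypothetical helper that prints the command rather than running it. Its optional fifth argument is appended verbatim, which covers the case reported in the comment above where a cross model path had to be added at the end.

```shell
# Hypothetical helper: print the cross-param command for common
# language $1 (B), outer languages $2 (A) and $3 (C), folder $4
# (default "dics") and an optional trailing cross model path $5.
cross_cmd() {
    b=$1; a=$2; c=$3; dir=${4:-dics}; model=${5:-}
    set -- apertium-dixtools cross-param \
        "$dir/apertium-$b-$a.$a.dix" \
        -n "$dir/apertium-$b-$a.$b-$a.dix" \
        -n "$dir/apertium-$b-$c.$b-$c.dix" \
        "$dir/apertium-$b-$c.$c.dix"
    if [ -n "$model" ]; then
        set -- "$@" "$model"
    fi
    printf '%s\n' "$*"
}
```

`cross_cmd es ca pt` reproduces the es-ca/es-pt command shown above. Printing instead of executing makes it easy to inspect the paths first.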

Customising cross actions

By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

Troubleshooting

NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)

If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
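
To see which sections a bilingual dictionary contains before editing it, the section ids can be listed. This is a rough sketch, assuming each `<section ...>` opening tag sits on a single line and has a double-quoted `id` attribute. `list_sections` is a hypothetical helper, and `grep -o` is a common extension rather than strict POSIX.

```shell
# Hypothetical helper: print the id of every <section> element
# found in the .dix file given as $1, one id per line.
list_sections() {
    grep -o '<section[^>]*>' "$1" | sed -n 's/.*id="\([^"]*\)".*/\1/p'
}
```

For example, `list_sections dics/apertium-es-ca.es-ca.dix` would show whether anything other than `main` is present.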


java.lang.IllegalArgumentException: Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)

This means your Java version is too new. Use an older Java version (1.6 seems to work).
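
The trace ends in `Collections.sort`. From Java 7 on, it uses TimSort, which raises this exception when it detects an inconsistent comparator, whereas the earlier merge sort silently ignored the problem. Java 7 also introduced the system property `java.util.Arrays.useLegacyMergeSort` to request the old algorithm. The sketch below passes it through the `JAVA_TOOL_OPTIONS` environment variable. Whether the `apertium-dixtools` launcher and your JVM honour it is an assumption that has not been verified here, and `with_legacy_sort` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run the given command with the legacy merge
# sort requested via JAVA_TOOL_OPTIONS. Any value already set in
# JAVA_TOOL_OPTIONS is overridden for that command only.
with_legacy_sort() {
    env JAVA_TOOL_OPTIONS="-Djava.util.Arrays.useLegacyMergeSort=true" "$@"
}
```

It would be used as `with_legacy_sort apertium-dixtools cross-param ...` with the same arguments as before. A JVM that picks the variable up normally says so on standard error at start-up, which is a quick way to confirm the option was seen.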

See also