Difference between revisions of "Crossdics"
Latest revision as of 13:29, 6 October 2017
- Main article: Building dictionaries
Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc, it will create a new language pair for aa-cc.
Installing
See apertium-dixtools.
Using apertium-crossdics
$ apertium-dixtools cross
Crossing dictionaries
Using a Linguistic Resources Document
You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:
$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl
This form needs only two parameters:
- my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
- sl-tl: source language (sl) and target language (tl).
Without a Linguistic Resources Document
First, copy the linguistic data into a folder named "dics":
- Bilingual dictionary B-A: apertium-bb-aa.bb-aa.dix
- Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
- Morphological dictionary A: apertium-bb-aa.aa.dix
- Morphological dictionary C: apertium-bb-cc.cc.dix
Please note that:
- all dictionaries must be named in the form apertium-xx-yy.xx-yy.dix (bilingual dictionaries) or apertium-xx-yy.xx.dix (morphological dictionaries)
- the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
- use "-r" instead of "-n" if a dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
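The naming conventions above can be checked before crossing. The following is only a minimal sketch for a POSIX shell; the function name dix_kind is hypothetical and is not part of apertium-dixtools. It classifies a file purely by its name and does not look at the file contents.

```shell
# Hypothetical helper (not part of apertium-dixtools): classify a dictionary
# file by its name, following the naming conventions listed above.
dix_kind() {
    name=$(basename "$1")
    case "$name" in
        # apertium-xx-yy.xx-yy.dix: a hyphen appears after the first dot
        apertium-*-*.*-*.dix) echo "bilingual" ;;
        # apertium-xx-yy.xx.dix: a single language code after the first dot
        apertium-*-*.*.dix)   echo "morphological" ;;
        *)                    echo "unknown" ;;
    esac
}
```

For instance, `dix_kind dics/apertium-es-ca.es-ca.dix` reports a bilingual dictionary, while a file that does not follow the pattern is reported as unknown and would need renaming first.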
Use the apertium-dixtools script to cross the dictionaries:
$ apertium-dixtools cross-param monA.dix -n bilBA.dix -n bilBC.dix monC.dix
For example, to cross es-ca and es-pt to get the ca-pt pair:
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
- I was not able to do this without appending /usr/local/apertium-dixtools/schemas/cross-model.xml to the end of the command. --unhammer (talk) 21:19, 14 April 2015 (CEST)
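The argument order in the example above can be assembled mechanically from the three language codes. The sketch below is an illustration only: the function name cross_cmd is hypothetical, it assumes the files live in "dics" and follow the naming conventions described earlier, and it only prints the command instead of running it. The optional fourth argument is a cross model path, for setups where the command does not work without one (as reported in the note above).

```shell
# Hypothetical helper (not part of apertium-dixtools): print, but do not run,
# the cross-param command for dictionaries B-A and B-C stored in dics/.
# Usage: cross_cmd B A C [path-to-cross-model.xml]
cross_cmd() {
    b=$1; a=$2; c=$3; model=${4:-}
    cmd="apertium-dixtools cross-param dics/apertium-$b-$a.$a.dix"
    cmd="$cmd -n dics/apertium-$b-$a.$b-$a.dix"
    cmd="$cmd -n dics/apertium-$b-$c.$b-$c.dix"
    cmd="$cmd dics/apertium-$b-$c.$c.dix"
    # Append the cross model path only if one was given.
    if [ -n "$model" ]; then
        cmd="$cmd $model"
    fi
    echo "$cmd"
}
```

Calling `cross_cmd es ca pt` prints the same command as the es-ca / es-pt example; a dictionary that has to be reversed would still need "-r" in place of "-n" by hand.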
Customising cross actions
By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.
Troubleshooting
- NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'
Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)
If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
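To see which sections a bilingual dictionary contains before removing any, something like the following may help. It is a rough sketch, not a dixtools feature: the function name list_sections is hypothetical, it assumes each section is opened by a <section ... id="..."> tag written on a single line, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical helper (not part of apertium-dixtools): print the id of every
# <section> element in a .dix file, one per line.
list_sections() {
    # Extract each opening <section ...> tag, then keep only its id value.
    grep -o '<section[^>]*>' "$1" | sed -n 's/.*id="\([^"]*\)".*/\1/p'
}
```

Any id printed other than main is a candidate for removal when working around the exception above.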
- java.lang.IllegalArgumentException: Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)
This means your Java version is too new. Use an older Java (1.6 seems to work).