Crossdics
Revision as of 10:38, 29 April 2008
- Main article: Building dictionaries
Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc, it will create a new language pair for aa-cc.
Download
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-dixtools
Software prerequisites
You will need to install Ant and Java Development Kit 6 (JDK6):
$ sudo apt-get install ant sun-java6-jdk
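Before compiling, it can help to confirm that both tools are on the PATH. The following is a minimal sketch, not part of apertium-dixtools: the helper name `missing_tools` is hypothetical, and it assumes the JDK is detected through its `javac` compiler.

```shell
# Hypothetical helper (not part of apertium-dixtools): report which of the
# given commands cannot be found on the PATH.
# For this article the relevant tools are ant (Apache Ant) and javac (JDK).
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Print the missing tools (without the leading space) and fail;
  # print nothing and succeed when everything was found.
  if [ -n "$missing" ]; then
    echo "${missing# }"
    return 1
  fi
}
```

For example, `missing_tools ant javac || echo "install the missing tools first"` prints nothing when both tools are installed.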
Compiling
$ cd apertium-dixtools
$ ant jar
Installing
$ sudo ant install
Using apertium-crossdics
$ apertium-crossdics
Crossing dictionaries
Using a Linguistic Resources Document
You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:
$ apertium-crossdics -f my-linguistic-resources.xml sl-tl
Only two parameters are needed:
- my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
- sl-tl: source language (sl) and target language (tl).
Note that this form uses the apertium-dixtools script (the apertium-crossdics script still uses the "old" form).
Without a Linguistic Resources Document
First, copy the linguistic data into a folder named "dics":
- Bilingual dictionary A-B: apertium-bb-aa.bb-aa.dix
- Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
- Morphological dictionary A: apertium-bb-aa.aa.dix
- Morphological dictionary C: apertium-bb-cc.cc.dix
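Copying the four files can be scripted. The sketch below is an illustration only: `stage_dics` is a hypothetical helper, and it assumes each language pair has been checked out into its own directory named after the pair (for example `apertium-es-ca/`), all under one root directory.

```shell
# Hypothetical helper: copy the four dictionaries needed to cross A and C
# through the common language B into ./dics.
# Usage: stage_dics ROOT A B C
# Assumed layout: ROOT/apertium-B-A/... and ROOT/apertium-B-C/...
stage_dics() {
  root=$1 a=$2 b=$3 c=$4
  # Replace the positional parameters with the four required paths.
  set -- \
    "$root/apertium-$b-$a/apertium-$b-$a.$b-$a.dix" \
    "$root/apertium-$b-$c/apertium-$b-$c.$b-$c.dix" \
    "$root/apertium-$b-$a/apertium-$b-$a.$a.dix" \
    "$root/apertium-$b-$c/apertium-$b-$c.$c.dix"
  # Check every file first so that nothing is copied if one is missing.
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  mkdir -p dics && cp "$@" dics/
}
```

With the es-ca and es-pt pairs checked out side by side in the current directory, `stage_dics . ca es pt` would place the four files used in the example further down into `dics/`.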
Please note that:
- all dictionaries must be named in the form:
  - apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
  - apertium-xx-yy.xx.dix (morphological dictionaries)
- the common language (B) must be on the left side, that is, the bilingual dictionaries must be in the form B-A and B-C
- use "-r" instead of "-n" if a bilingual dictionary has to be reversed (e.g. from apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
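The naming rule and the choice between "-n" and "-r" can be checked mechanically. In the following sketch, `cross_flag` is a hypothetical function, not an apertium-dixtools command. It reads the language codes from a bilingual dictionary's file name and prints "-n" if the common language is already on the left side, or "-r" if the dictionary would need reversing. Names that do not follow the bilingual pattern are rejected.

```shell
# Hypothetical helper: print the flag to use for a bilingual dictionary.
# Usage: cross_flag COMMON_LANGUAGE FILE
cross_flag() {
  common=$1 file=${2##*/}      # drop any leading directory
  case $file in
    apertium-*-*.*-*.dix) ;;   # apertium-xx-yy.xx-yy.dix
    *) echo "not a bilingual dictionary name: $file" >&2; return 1 ;;
  esac
  pair=${file#apertium-}       # xx-yy.xx-yy.dix
  pair=${pair%%.*}             # xx-yy
  left=${pair%%-*}             # xx
  right=${pair#*-}             # yy
  # printf is used because bash's echo would treat "-n" as an option.
  if [ "$left" = "$common" ]; then
    printf '%s\n' "-n"
  elif [ "$right" = "$common" ]; then
    printf '%s\n' "-r"
  else
    echo "$file does not contain language $common" >&2
    return 1
  fi
}
```

For example, `cross_flag es dics/apertium-es-ca.es-ca.dix` prints `-n`, while `cross_flag bb apertium-aa-bb.aa-bb.dix` prints `-r`.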
Use the apertium-dixtools script to cross the dictionaries:
$ apertium-dixtools cross-param monA.dix -n bilAB.dix -n bilBC.dix monC.dix
For example, to cross es-ca and es-pt and obtain the ca-pt pair:
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
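Because the file names follow a fixed pattern, the command line can be derived from the three language codes. The sketch below only prints the command rather than running it. `cross_command` and its optional fourth argument (the dictionary folder, `dics` by default) are hypothetical. It also assumes both bilingual dictionaries are already in B-A and B-C form, so that `-n` applies to each.

```shell
# Hypothetical helper: print the cross-param command for crossing
# languages A and C through the common language B.
# Usage: cross_command A B C [DICS_DIR]
cross_command() {
  a=$1 b=$2 c=$3 dir=${4:-dics}
  # Argument order: monA, -n bilAB (B-A form), -n bilBC (B-C form), monC
  printf 'apertium-dixtools cross-param %s -n %s -n %s %s\n' \
    "$dir/apertium-$b-$a.$a.dix" \
    "$dir/apertium-$b-$a.$b-$a.dix" \
    "$dir/apertium-$b-$c.$b-$c.dix" \
    "$dir/apertium-$b-$c.$c.dix"
}
```

`cross_command ca es pt` prints the same command as the es-ca/es-pt example above. After checking it, it could be run with `sh -c "$(cross_command ca es pt)"`.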
Customizing cross actions
By default, the crossdics tool uses a simple cross model with very basic rules for crossing two sets of dictionaries. However, some language pairs need more specific cross actions in order to be crossed correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.