Difference between revisions of "Crossdics"

(apertium-tinylex quickstart)
(Restoring crossdics article)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
  +
{{main|Building dictionaries}}
   
  +
'''Crossdics''' (part of [[apertium-dixtools]]) is a program that can be used to "cross" language pairs. That is, given language pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>.
'''TinyLex''' is a J2ME (Java 2 Micro Edition) program for mobile devices which
 
looks up dictionary entries. It is free software and released under the
 
terms of the GNU General Public License v2.0.
 
   
== Requirements ==
+
== Download ==
   
  +
<pre>
* Ant
 
 
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-dixtools
* Java Development Kit 6 (JDK6)
 
  +
</pre>
* Netbeans (>6.0) (some libraries are needed to build the project)
 
* Mobile Device supporting J2ME MIDP 2.0
 
   
== Download ==
+
== Software prerequisites ==
  +
  +
You will need to install [http://ant.apache.org/ Ant] and the [http://java.sun.com/javase/downloads/index.jsp Java Development Kit 6 (JDK6)].
  +
  +
$ sudo apt-get install ant sun-java6-jdk
  +
  +
== Compiling ==
   
 
<pre>
 
<pre>
 
$ cd apertium-dixtools
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-mobile/apertium-tinylex
 
 
$ ant jar
 
</pre>
 
</pre>
   
== Build ==
+
== Installing ==
  +
$ sudo ant install
   
$ cd apertium-mobile
+
== Using apertium-crossdics ==
$ cd apertium-tinylex
 
   
  +
$ apertium-crossdics
It is recommended to open this project with Netbeans before (only the first time). This will create a 'private' directory inside 'nbproject' with some user properties (this is the easiest way to get them).
 
   
  +
== Crossing dictionaries ==
After that,
 
   
  +
=== Using a Linguistic Resources Document ===
$ ant jar-all
 
  +
  +
You can define a [[Linguistic Resources Document]] (LRD) and use it to indicate which dictionaries will be used for crossing:
  +
  +
$ apertium-crossdics -f my-linguistic-resources.xml sl-tl
  +
  +
This form needs only two parameters:
  +
* '''my-linguistic-resources.xml''': a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
  +
* '''sl-tl''': source language (sl) and target language (tl).
  +
  +
Note that this form is handled by the <code>apertium-dixtools</code> script; the <code>apertium-crossdics</code> script still uses the "old" form.
  +
  +
=== Without a Linguistic Resources Document ===
  +
  +
First, copy the linguistic data into a folder named "dics":
  +
  +
* Bilingual dictionary A-B: <code>apertium-bb-aa.bb-aa.dix</code>
  +
* Bilingual dictionary B-C: <code>apertium-bb-cc.bb-cc.dix</code>
  +
* Morphological dictionary A: <code>apertium-bb-aa.aa.dix</code>
  +
* Morphological dictionary C: <code>apertium-bb-cc.cc.dix</code>
  +
  +
  +
Please note that:
  +
* all dictionaries must be in the form:
  +
** <code>apertium-xx-yy.xx-yy.dix</code> (bilingual dictionaries)
  +
** <code>apertium-xx-yy.xx.dix</code> (morphological dictionaries)
  +
* the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  +
* use "-r" instead of "-n" if the dictionary has to be [[Reverse a dictionary|reversed]] (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
  +
  +
  +
Use the '''apertium-dixtools''' script to cross the dictionaries:
  +
  +
$ apertium-dixtools cross-param '''monA.dix''' -n '''bilAB.dix''' -n '''bilBC.dix''' '''monC.dix'''
  +
  +
For example, to cross '''es-ca''' and '''es-pt''' and obtain the '''ca-pt''' pair:
  +
  +
<pre>
  +
$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
  +
</pre>
   
== Running the application ==
+
== Customizing cross actions ==
   
  +
By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain [[List of language pairs|language pairs]] correctly. [[Cross Model|Defining a new cross schema]] with concrete pattern-action elements solves this problem.
Copy/send the .jar file to your mobile device.
 
   
  +
== See also ==
dist/en_es/en-es-apertium-tinylex-0.2.jar
 
  +
* [[Crossdics Example|Crossing language pairs: a full example]]
dist/es_ca/es-ca-apertium-tinylex-0.2.jar
 
  +
* [[Linguistic Resources Document|How to create a Linguistic Resources Document]]
dist/fr_ca/fr-ca-apertium-tinylex-0.2.jar
 
  +
* [[Cross Model|How to define a new cross schema]]
...
 
  +
* [[List of language pairs|List of available language pairs]]
  +
* [[Sort a dictionary|How to sort a dictionary]]
  +
* [[Merge dictionaries|How to merge dictionaries]]
  +
<!--
  +
* [[Reverse a dictionary|How to reverse a bilingual dictionary]]
  +
-->
   
 
[[Category:Documentation]]
 
[[Category:Documentation]]
  +
[[Category:Tools]]
 
[[Category:Development]]
 
[[Category:Development]]

Revision as of 10:38, 29 April 2008

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program that can be used to "cross" language pairs. That is, given language pairs aa-bb and bb-cc it will create a new language pair for aa-cc.

Download

$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-dixtools

Software prerequisites

You will need to install Ant and the Java Development Kit 6 (JDK6).

$ sudo apt-get install ant sun-java6-jdk
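
Before compiling, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of apertium-dixtools: the function name `missing_tools` is hypothetical, and it assumes the JDK is detected through its `javac` compiler.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with apertium-dixtools):
# print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report what still needs installing before building.
# Assumption: "javac" is used as the sign that a JDK is installed.
missing=$(missing_tools ant javac)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

If anything is reported as missing, install it with the apt-get command above before running ant.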

Compiling

$ cd apertium-dixtools
$ ant jar

Installing

$ sudo ant install

Using apertium-crossdics

$ apertium-crossdics

Crossing dictionaries

Using a Linguistic Resources Document

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-crossdics -f my-linguistic-resources.xml sl-tl

This form needs only two parameters:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
  • sl-tl: source language (sl) and target language (tl).

Note that this form is handled by the apertium-dixtools script; the apertium-crossdics script still uses the "old" form.
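
Since the second parameter packs two language codes into one argument, a wrapper script may want to validate it before calling the tool. The sketch below is only an illustration: `split_pair` is a hypothetical helper, and it assumes a pair is exactly two non-empty codes joined by a single hyphen.

```shell
# Hypothetical helper: split a language-pair argument such as "es-pt".
# Prints "SL TL" on success; returns 1 if the argument is not of the form xx-yy.
split_pair() {
    case "$1" in
        *-*-*|-*|*-|"") return 1 ;;                       # too many hyphens or an empty side
        *-*) printf '%s %s\n' "${1%%-*}" "${1#*-}" ;;     # text before and after the hyphen
        *) return 1 ;;                                    # no hyphen at all
    esac
}

# Usage (commented out so the sketch has no side effects):
# if codes=$(split_pair "$pair"); then
#     set -- $codes; echo "source=$1 target=$2"
# fi
```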

Without a Linguistic Resources Document

First, copy the linguistic data into a folder named "dics":

  • Bilingual dictionary A-B: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix
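
The four file names above follow one pattern, so they can be generated from the language codes. A small sketch, assuming B is the common language; the helper names `bil_dix`, `mon_dix` and `expected_dics` are hypothetical and not part of apertium-dixtools.

```shell
# Hypothetical helpers that build the file names expected in the "dics" folder.
# $1 = common language (B), $2 = the other language (A or C).
bil_dix() { printf 'apertium-%s-%s.%s-%s.dix\n' "$1" "$2" "$1" "$2"; }
mon_dix() { printf 'apertium-%s-%s.%s.dix\n' "$1" "$2" "$2"; }

# List the four files needed to cross B-A and B-C.
# $1 = B, $2 = A, $3 = C
expected_dics() {
    bil_dix "$1" "$2"
    bil_dix "$1" "$3"
    mon_dix "$1" "$2"
    mon_dix "$1" "$3"
}

# Usage (commented out): warn about files that are not yet in dics/.
# expected_dics es ca pt | while read -r f; do
#     [ -f "dics/$f" ] || echo "missing: dics/$f"
# done
```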


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  • use "-r" instead of "-n" if the dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)
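
The choice between "-n" and "-r" can be read off the file name, given the naming convention above. The following sketch assumes that convention holds; `cross_flag` is a hypothetical helper, not an apertium-dixtools command.

```shell
# Hypothetical helper: decide between -n and -r for a bilingual dictionary.
# $1 = common language (B), $2 = path to a file named apertium-xx-yy.xx-yy.dix
# Prints -n if B is already on the left, -r if it is on the right,
# and returns 1 if B does not appear in the pair at all.
cross_flag() {
    name=${2##*/}            # strip any directory part
    pair=${name#apertium-}   # xx-yy.xx-yy.dix
    pair=${pair%%.*}         # xx-yy
    case "$pair" in
        "$1"-*) printf '%s\n' "-n" ;;
        *-"$1") printf '%s\n' "-r" ;;
        *) return 1 ;;
    esac
}
```

printf is used rather than echo because some shells treat `echo -n` as an option and print nothing.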


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilAB.dix -n bilBC.dix monC.dix

For example, to cross es-ca and es-pt and obtain the ca-pt pair:

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
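
The command above can be assembled from the three language codes. This is a dry-run sketch that only prints the command; `cross_command` is a hypothetical helper, and it assumes all files live in dics/ and that neither bilingual dictionary needs reversing (so "-n" is used for both).

```shell
# Hypothetical helper: print (without running) the cross-param command.
# $1 = common language (B), $2 = language A, $3 = language C
# Assumptions: files are in dics/ and follow the naming convention;
# both bilingual dictionaries already have B on the left, hence -n twice.
cross_command() {
    b=$1 a=$2 c=$3
    printf 'apertium-dixtools cross-param %s -n %s -n %s %s\n' \
        "dics/apertium-$b-$a.$a.dix" \
        "dics/apertium-$b-$a.$b-$a.dix" \
        "dics/apertium-$b-$c.$b-$c.dix" \
        "dics/apertium-$b-$c.$c.dix"
}

# Reproduces the es-ca / es-pt example.
cross_command es ca pt
```

Printing instead of executing makes it easy to inspect the file names before running the real command.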

Customizing cross actions

By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

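See also

  • Crossing language pairs: a full example
  • How to create a Linguistic Resources Document
  • How to define a new cross schema
  • List of available language pairs
  • How to sort a dictionary
  • How to merge dictionaries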