Language pair packages

From Apertium

Revision as of 11:36, 15 August 2012

The English ⇆ Spanish package running as a standalone Java application. The same file could be used from other client applications like Apertium-Caffeine or Apertium-OmegaT.

Language pair packages are standalone JARs that can be run independently as well as used from other client applications like Apertium-Caffeine or Apertium-OmegaT. The only prerequisite for using them is Java 6 or later (apertium, lttoolbox and lttoolbox-java are NOT required), and they work on practically any platform (Linux, OS X, Windows and even Android!).

Internal structure

Since JAR files are nothing but renamed ZIP files, you can easily edit language pair packages to fit your needs. Note that the packages are ready to be used without any modification, so most users will gain nothing notable from editing them. Editing a package can still be useful, for instance, to reduce its file size by removing unnecessary content.

A typical language pair package has the following structure:

  • data/: Directory containing the language pair itself. You can extract it and use it with your local installation of Apertium.
  • transfer_classes/: Directory that contains the Java bytecode classes for transfer. These are only used when the package runs on standard Java, so you can delete this directory if you are going to use the package exclusively on Android.
  • org/: Directory that contains the lttoolbox-java engine, which makes the package self-executable. If you are not interested in this feature (presumably because you are going to use the package exclusively from client programs), you can delete it.
  • META-INF/: Directory that contains the MANIFEST.MF of this JAR, which is used by Java. It takes only a few bytes and can rarely be removed safely, so please don't touch it unless you know what you are doing.
  • classes.dex: Dalvik bytecode of the transfer classes, used by Android instead of the standard Java bytecode classes in transfer_classes/. If you are not going to use the package on Android, you can delete it.
  • modes: Text file, used by lttoolbox-java, that lists the paths of the modes available inside the package. It takes only a few bytes and can rarely be removed safely, so please don't touch it unless you know what you are doing.
  • README: Text file describing the content of the package.
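Because a package is an ordinary ZIP file, the removals described above can be scripted. The following is a minimal sketch. It assumes the Info-ZIP zip tool is installed. The function names removable_entries and slim_package are hypothetical and are not part of Apertium. Each target maps to the one entry that the list above says can be dropped for that use case. data/, META-INF/ and modes are never touched.

```bash
#!/usr/bin/env bash
# Sketch: strip parts of a language pair package that a given use case does
# not need. Assumes Info-ZIP `zip` is installed; function names are hypothetical.

# Print the ZIP entry pattern that can be removed for a target use case:
#   desktop - only standard Java is used, so the Dalvik bytecode is unneeded
#   android - only Android is used, so the Java bytecode classes are unneeded
#   client  - only client apps load the package, so the bundled
#             lttoolbox-java engine (self-execution) is unneeded
removable_entries() {
  case "$1" in
    desktop) echo 'classes.dex' ;;
    android) echo 'transfer_classes/*' ;;
    client)  echo 'org/*' ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Usage: slim_package JAR_FILE TARGET
# Removes the entries for TARGET in place, after saving a copy as JAR_FILE.bak.
slim_package() {
  local jar=$1 target=$2 pattern
  pattern=$(removable_entries "$target") || return 1
  cp -- "$jar" "$jar.bak"
  # zip expands the quoted wildcard itself, matching entries inside the archive
  zip -q -d "$jar" "$pattern"
}
```

For example, unzip -l apertium-eo-en.jar lists the contents of a package. slim_package apertium-eo-en.jar desktop would then remove classes.dex and keep the original as apertium-eo-en.jar.bak.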

Creating language pair packages

A simple bash script, apertium-pack-j, can be used to create language pair packages easily. It requires the latest version of lttoolbox-java as well as the Android SDK, whose locations must be specified by setting the LTTOOLBOX_JAVA_PATH and ANDROID_SDK_PATH environment variables. After that, you can run it with the paths of the mode files for which you want to generate the package as arguments, and the script will create a ready-to-use package. For instance, the following commands create a ready-to-use package for the Esperanto ⇆ English language pair, named apertium-eo-en.jar, on my machine:

export LTTOOLBOX_JAVA_PATH="/usr/local/share/apertium/lttoolbox.jar"
export ANDROID_SDK_PATH="/home/mikel/developer/android-sdk-linux"
./apertium-pack-j /usr/local/share/apertium/modes/eo-en.mode /usr/local/share/apertium/modes/en-eo.mode

As you can see, I simply specify the locations of lttoolbox-java and the Android SDK (http://developer.android.com/sdk/index.html) on my machine. I then pass the locations of eo-en.mode and en-eo.mode (the main modes of the Esperanto ⇆ English language pair) as arguments to apertium-pack-j (https://apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/artetxem/apertium-pack-j).
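apertium-pack-j depends both on the two environment variables and on the mode files passed to it. A small pre-flight check can therefore catch a wrong path before the build starts. This is a minimal sketch. The function name check_pack_prereqs is hypothetical, and it only checks that the given paths exist, not that they contain the right versions.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check before running apertium-pack-j.
# The function name is hypothetical; it only verifies that the paths exist.

# Usage: check_pack_prereqs MODE_FILE...
# Returns 0 if LTTOOLBOX_JAVA_PATH is an existing file, ANDROID_SDK_PATH is an
# existing directory and every mode file given exists. Otherwise it reports
# each problem on stderr and returns 1.
check_pack_prereqs() {
  local status=0 mode
  if [ ! -f "${LTTOOLBOX_JAVA_PATH:-}" ]; then
    echo "LTTOOLBOX_JAVA_PATH is not set to an existing file" >&2
    status=1
  fi
  if [ ! -d "${ANDROID_SDK_PATH:-}" ]; then
    echo "ANDROID_SDK_PATH is not set to an existing directory" >&2
    status=1
  fi
  if [ "$#" -eq 0 ]; then
    echo "no mode files given" >&2
    status=1
  fi
  for mode in "$@"; do
    if [ ! -f "$mode" ]; then
      echo "mode file not found: $mode" >&2
      status=1
    fi
  done
  return "$status"
}
```

Call it with the same mode file paths you pass to apertium-pack-j. Because it returns 0 only when everything is in place, you can chain it with the script using &&.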

List of ready-to-use packages

Out of the 31 released pairs, the following 24 have fully working, ready-to-use packages that are maintained under the builds/ directory in SVN. You can launch them directly in a Java-enabled browser by clicking the JWS links. You can also download the JARs by clicking the JAR links, and then run them as standard Java applications or use them from a client application.

  • Afrikaans ⇆ Dutch (JWS, JAR)
  • Basque → English (JWS, JAR)
  • Basque → Spanish (JWS, JAR)
  • Catalan ⇆ Italian (JWS, JAR)
  • English ⇆ Catalan (JWS, JAR)
  • English ⇆ Galician (JWS, JAR)
  • English ⇆ Spanish (JWS, JAR)
  • Esperanto ← Catalan (JWS, JAR)
  • Esperanto ⇆ English (JWS, JAR)
  • Esperanto ← French (JWS, JAR)
  • Esperanto ← Spanish (JWS, JAR)
  • French ⇆ Catalan (JWS, JAR)
  • French ⇆ Spanish (JWS, JAR)
  • Haitian → English (JWS, JAR)
  • Occitan ⇆ Catalan (JWS, JAR)
  • Occitan ⇆ Spanish (JWS, JAR)
  • Portuguese ⇆ Catalan (JWS, JAR)
  • Portuguese ⇆ Galician (JWS, JAR)
  • Spanish ⇆ Aragonese (JWS, JAR)
  • Spanish ⇆ Catalan (JWS, JAR)
  • Spanish ⇆ Galician (JWS, JAR)
  • Spanish ⇆ Portuguese (JWS, JAR)
  • Spanish ← Romanian (JWS, JAR)
  • Swedish → Danish (JWS, JAR)
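A downloaded JAR can be run as a standalone application with java -jar. This assumes, as the structure described earlier suggests, that the package's manifest points to the bundled lttoolbox-java engine as its entry point. The sketch below first checks the Java 6 or later requirement. The function names are hypothetical. The version parsing assumes the usual java -version output format, for example java version "1.6.0_45" or openjdk version "17.0.2".

```bash
#!/usr/bin/env bash
# Sketch: run a downloaded package after checking for Java 6 or later.
# Function names are hypothetical.

# Print the major Java version from a version string, handling both the old
# "1.x" scheme ("1.6.0_45" -> 6) and the newer one ("17.0.2" -> 17).
java_major_version() {
  local v=$1 major
  major=${v%%.*}                 # text before the first dot
  if [ "$major" = "1" ]; then    # old scheme: the major version comes second
    v=${v#1.}
    major=${v%%.*}
  fi
  echo "${major%%[!0-9]*}"       # drop suffixes such as "-ea" in "9-ea"
}

# Usage: run_package JAR_FILE
run_package() {
  local jar=$1 version major
  # `java -version` writes to stderr, e.g.: java version "1.6.0_45"
  version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major_version "$version")
  if [ -z "$major" ] || [ "$major" -lt 6 ]; then
    echo "Java 6 or later is required" >&2
    return 1
  fi
  java -jar "$jar"
}
```

For instance, run_package apertium-en-es.jar would launch a package saved under that name. The file name is only an example; use the name of the file you actually downloaded.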

Problematic language pairs

It is not possible to create a package for the latest released version of Spanish ⇆ Asturian (apertium-es-ast), even though the pair should, in principle, be compatible. The issue seems to be caused by unreasonably long rules in the transfer files, which run into the method size limit of Java bytecode. Although the Java bytecode classes are now generated successfully, dx (the Android SDK tool that converts them to Dalvik bytecode) still hangs while processing them. This means that, although the pair can be used from desktop applications, it would crash on Android, so a fully functional package for it cannot be created yet.

Language pairs with external dependencies

The following 6 released pairs depend on CG (Constraint Grammar):

  • Breton → French (apertium-br-fr)
  • Icelandic → English (apertium-is-en)
  • Macedonian ⇆ Bulgarian (apertium-mk-bg)
  • Macedonian → English (apertium-mk-en)
  • Norwegian Nynorsk ⇆ Bokmål (apertium-nn-nb)
  • Welsh → English (apertium-cy-en)

Language pair packages support invoking external programs, so it is still possible to create packages for these pairs. However, you will need to install CG on your machine for them to work. Because of this limitation, precompiled packages are not offered for these pairs, but you can still create them yourself by following the instructions in the "Creating language pair packages" section.
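Before using a package for one of these pairs, you can check whether the required CG program is reachable on your PATH. This is a minimal sketch. The function name require_tools is hypothetical. It also assumes that these pairs call CG through the cg-proc program shipped with CG-3, which is what Apertium mode files usually invoke.

```bash
#!/usr/bin/env bash
# Sketch: check that external programs needed by a package are on the PATH.
# The function name is hypothetical.

# Usage: require_tools PROGRAM...
# Reports each program that cannot be found on stderr; returns 1 if any is
# missing and 0 otherwise.
require_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, require_tools cg-proc && java -jar apertium-nn-nb.jar would only start the package if cg-proc is found. The JAR name is whatever name you gave the package you built.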