Difference between revisions of "Crossdics"

From Apertium

Revision as of 08:51, 6 October 2014

En français

Main article: Building dictionaries

Crossdics (part of apertium-dixtools) is a program for "crossing" language pairs: given the language pairs aa-bb and bb-cc, it creates a new language pair aa-cc.

Installing

See apertium-dixtools.

Using apertium-crossdics

$ apertium-dixtools cross

Crossing dictionaries

Using a Linguistic Resources Document

You can define a Linguistic Resources Document (LRD) and use it to indicate which dictionaries will be used for crossing:

$ apertium-dixtools cross -f my-linguistic-resources.xml sl-tl

Only two parameters are needed:

  • my-linguistic-resources.xml: a document specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
  • sl-tl: source language (sl) and target language (tl).


Without a Linguistic Resources Document

First, copy the linguistic data into a folder named "dics":

  • Bilingual dictionary A-B: apertium-bb-aa.bb-aa.dix
  • Bilingual dictionary B-C: apertium-bb-cc.bb-cc.dix
  • Morphological dictionary A: apertium-bb-aa.aa.dix
  • Morphological dictionary C: apertium-bb-cc.cc.dix


Please note that:

  • all dictionaries must be in the form:
    • apertium-xx-yy.xx-yy.dix (bilingual dictionaries)
    • apertium-xx-yy.xx.dix (morphological dictionaries)
  • the common language (B) must be on the left side, that is, the dictionaries must be in the form B-A and B-C
  • use "-r" instead of "-n" if the dictionary has to be reversed (apertium-aa-bb.aa-bb.dix to apertium-bb-aa.bb-aa.dix)


Use the apertium-dixtools script to cross the dictionaries:

$ apertium-dixtools cross-param monA.dix -n bilAB.dix -n bilBC.dix monC.dix

For example, to cross es-ca and es-pt to obtain the ca-pt pair:

$ apertium-dixtools cross-param dics/apertium-es-ca.ca.dix -n dics/apertium-es-ca.es-ca.dix -n dics/apertium-es-pt.es-pt.dix dics/apertium-es-pt.pt.dix
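The command line follows mechanically from the three language codes, so it can be generated. This sketch assumes the dictionaries are in "dics" with the common language on the left of both pairs (so both bilingual dictionaries take "-n"). The function name print_cross_cmd is hypothetical; it only prints the command and does not run it.

```shell
#!/bin/sh
# print_cross_cmd PIVOT LANG_A LANG_C
#   Prints the cross-param command for dictionaries stored in ./dics,
#   in the order: monolingual A, bilingual B-A, bilingual B-C, monolingual C.
#   Hypothetical helper; missing files are reported on stderr only.
print_cross_cmd() {
    pivot=$1
    a=$2
    c=$3
    mon_a="dics/apertium-$pivot-$a.$a.dix"
    bil_ab="dics/apertium-$pivot-$a.$pivot-$a.dix"
    bil_bc="dics/apertium-$pivot-$c.$pivot-$c.dix"
    mon_c="dics/apertium-$pivot-$c.$c.dix"

    # Warn early about files that have not been copied into dics/ yet.
    for f in "$mon_a" "$bil_ab" "$bil_bc" "$mon_c"; do
        [ -f "$f" ] || echo "warning: $f not found" >&2
    done

    printf 'apertium-dixtools cross-param %s -n %s -n %s %s\n' \
        "$mon_a" "$bil_ab" "$bil_bc" "$mon_c"
}
```

Calling `print_cross_cmd es ca pt` prints the same command as the es-ca / es-pt example.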

Customising cross actions

By default, the crossdics tool uses a simple cross model defining very simple rules for crossing two sets of dictionaries. However, more specific cross actions might be needed in order to cross certain language pairs correctly. Defining a new cross schema with concrete pattern-action elements solves this problem.

Troubleshooting

NullPointerException in crossing sections
[9] Crossing sections 'main' and 'main'

Exception in thread "main" java.lang.NullPointerException
        at dictools.cross.DicCross.crossSections(DicCross.java:342)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:233)

If you get this error, try removing unused sections from the bilingual dictionaries until only the 'main' sections are left.
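To see which sections a dictionary contains before editing it, something like the following may help. It is a sketch that assumes each opening section tag sits on one line with an id attribute; the function name list_section_ids is hypothetical.

```shell
#!/bin/sh
# list_section_ids DIX_FILE
#   Prints the id of every <section ...> element, one per line.
#   Assumes each opening <section> tag is written on a single line.
list_section_ids() {
    grep -o '<section[^>]*>' "$1" | sed -n 's/.*id="\([^"]*\)".*/\1/p'
}
```

Any id other than "main" in the output is a candidate for removal from a working copy of the bilingual dictionary.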


java.lang.IllegalArgumentException: Comparison method violates its general contract!
Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeHi(TimSort.java:868)
        at java.util.TimSort.mergeAt(TimSort.java:485)
        at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
        at java.util.TimSort.sort(TimSort.java:223)
        at java.util.TimSort.sort(TimSort.java:173)
        at java.util.Arrays.sort(Arrays.java:659)
        at java.util.Collections.sort(Collections.java:217)
        at dictools.cross.DicCross.crossDictionaries(DicCross.java:243)
        at dictools.cross.DicCross.actionCross(DicCross.java:729)
        at dictools.cross.DicCross.doCross(DicCross.java:722)
        at dictools.ProcessDics.process_cross_param(ProcessDics.java:462)
        at dictools.ProcessDics.processArguments(ProcessDics.java:206)
        at dictools.ProcessDics.main(ProcessDics.java:79)
        at ProcessDics.main(ProcessDics.java:30)

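This means your Java version is too new: Java 7 and later sort with TimSort, which throws this exception when it detects an inconsistent comparison method. Use an older Java (1.6 seems to work).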
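If installing an older Java is not practical, the JVM system property java.util.Arrays.useLegacyMergeSort restores the pre-Java 7 sort, and it can be passed to any JVM through the JAVA_TOOL_OPTIONS environment variable. Whether this is enough to make crossdics run has not been verified here, so treat the sketch below as an assumption to test. The helper names java_major and uses_timsort are hypothetical.

```shell
#!/bin/sh
# java_major VERSION_STRING
#   Prints the major Java version: "1.6.0_45" -> 6, "11.0.2" -> 11.
java_major() {
    case $1 in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;   # old scheme: 1.6, 1.7, 1.8
        *)   echo "${1%%[.-]*}" ;;           # new scheme: 9, 11.0.2, 17-ea
    esac
}

# uses_timsort VERSION_STRING
#   Succeeds if this Java version sorts objects with TimSort (Java 7+).
uses_timsort() {
    [ "$(java_major "$1")" -ge 7 ]
}

# Example use (commented out so that nothing runs on sourcing):
#   ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')
#   if uses_timsort "$ver"; then
#       # Untested with crossdics; asks the JVM for the legacy merge sort.
#       export JAVA_TOOL_OPTIONS="-Djava.util.Arrays.useLegacyMergeSort=true"
#   fi
```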

See also