Layouts


Apertium linguistic data packages can follow one of several layouts.

Apertium 1.0

Apertium 2.0

Apertium 3.0

Example using the English-Afrikaans pair:

Filename                      Comment
apertium-en-af.af.dix.xml     Afrikaans monolingual dictionary
apertium-en-af.en-af.dix.xml  English-Afrikaans bilingual dictionary
apertium-en-af.en.dix.xml     English monolingual dictionary
apertium-en-af.symbols.xml    List of grammatical symbols
apertium-en-af.af-en.t1x      Afrikaans→English first stage transfer file (transfer)
apertium-en-af.af-en.t2x      Afrikaans→English second stage transfer file (interchunk)
apertium-en-af.af-en.t3x      Afrikaans→English third stage transfer file (postchunk)
apertium-en-af.en-af.t1x      English→Afrikaans first stage transfer file (transfer)
apertium-en-af.en-af.t2x      English→Afrikaans second stage transfer file (interchunk)
apertium-en-af.en-af.t3x      English→Afrikaans third stage transfer file (postchunk)
modes.xml                     Modes specification file (see modes)
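
The naming pattern in the table can be summarised in a few lines of code. The following is only an illustrative sketch: the function name `pair_layout` is hypothetical and not part of any Apertium tool, and it simply reproduces the file names listed above for a given pair of language codes.

```python
def pair_layout(lang1, lang2):
    """List the file names of an Apertium 3.0-style pair directory,
    following the apertium-en-af pattern (hypothetical helper)."""
    base = f"apertium-{lang1}-{lang2}"
    files = [
        f"{base}.{lang2}.dix.xml",          # monolingual dictionary, second language
        f"{base}.{lang1}-{lang2}.dix.xml",  # bilingual dictionary
        f"{base}.{lang1}.dix.xml",          # monolingual dictionary, first language
        f"{base}.symbols.xml",              # shared grammatical symbols
    ]
    # Three transfer stages for each translation direction.
    for direction in (f"{lang2}-{lang1}", f"{lang1}-{lang2}"):
        for stage in ("t1x", "t2x", "t3x"):
            files.append(f"{base}.{direction}.{stage}")
    files.append("modes.xml")
    return files


en_af = pair_layout("en", "af")
for name in en_af:
    print(name)
```

Calling `pair_layout("en", "af")` yields the eleven names from the table, in the same order.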

Using separate include files

What distinguishes this layout is that shared material is kept in separate include files. Most packages use this only for symbol definitions, but the same mechanism could hold, for example, a large list of proper nouns or domain-specific vocabulary.

In all of the dictionary files:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
<alphabet>ÄÊËÖÜäêëöüßABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>

  <!-- Symbol definitions -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-af.symbols.xml"/>

  <pardefs>
    
    ...
  
  </pardefs>
  <section id="main" type="inconditional">

    ...

  </section>
</dictionary>

In the apertium-en-af.symbols.xml file:

<?xml version="1.0" encoding="UTF-8"?> <!-- -*- nxml -*- -->

  <sdefs>
    <sdef n="n"       c="Noun"/>
    <sdef n="m"       c="Masculine"/>
    <sdef n="f"       c="Feminine"/>

    ...
  </sdefs>

Then, in Makefile.am, add the following to TARGETS_COMMON:

TARGETS_COMMON = $(BASENAME).$(LANG1).dix $(BASENAME).$(LANG2).dix $(BASENAME).$(LANG1)-$(LANG2).dix \

And add the following targets underneath:

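# Each .dix is rebuilt when its .dix.xml source or the symbols file changes.
# Recipe lines must begin with a tab character in the actual Makefile.am.
$(BASENAME).$(LANG1).dix: $(BASENAME).$(LANG1).dix.xml $(BASENAME).symbols.xml
        xmllint --xinclude $(BASENAME).$(LANG1).dix.xml > $(BASENAME).$(LANG1).dix
$(BASENAME).$(LANG2).dix: $(BASENAME).$(LANG2).dix.xml $(BASENAME).symbols.xml
        xmllint --xinclude $(BASENAME).$(LANG2).dix.xml > $(BASENAME).$(LANG2).dix
$(BASENAME).$(LANG1)-$(LANG2).dix: $(BASENAME).$(LANG1)-$(LANG2).dix.xml $(BASENAME).symbols.xml
        xmllint --xinclude $(BASENAME).$(LANG1)-$(LANG2).dix.xml > $(BASENAME).$(LANG1)-$(LANG2).dix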

You may also want to specify a clean-dicts target:

clean-dicts:
        rm -f $(BASENAME).$(LANG1).dix
        rm -f $(BASENAME).$(LANG2).dix
        rm -f $(BASENAME).$(LANG1)-$(LANG2).dix
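
To see what the `xmllint --xinclude` step produces, the expansion can be reproduced on a small scale. The sketch below uses Python's standard `xml.etree.ElementInclude` module rather than xmllint, and works on cut-down, in-memory stand-ins for the dictionary and symbols files (the `FILES` mapping, `DIX_XML` string and function names are all hypothetical). It is meant only to show the effect of the include, not to replace the build rule.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical, cut-down stand-in for the symbols file described above.
FILES = {
    "apertium-en-af.symbols.xml": """<sdefs>
  <sdef n="n" c="Noun"/>
  <sdef n="m" c="Masculine"/>
  <sdef n="f" c="Feminine"/>
</sdefs>""",
}

# Hypothetical, cut-down dictionary source containing one xi:include.
DIX_XML = """<dictionary>
  <alphabet>abc</alphabet>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-af.symbols.xml"/>
  <pardefs/>
  <section id="main" type="inconditional"/>
</dictionary>"""


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that reads from FILES instead of disk."""
    if parse != "xml":
        raise ValueError("only XML includes are handled in this sketch")
    return ET.fromstring(FILES[href])


def expand_includes(xml_text):
    """Return the root element with each xi:include replaced by its target."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_from_memory)
    return root


expanded = expand_includes(DIX_XML)
child_tags = [child.tag for child in expanded]
symbol_names = [sdef.get("n") for sdef in expanded.iter("sdef")]
print(child_tags)
print(symbol_names)
```

After expansion the `xi:include` element is gone and the `sdefs` block sits in its place, which is the form the compiled `.dix` file is expected to have.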

Dependent on monolingual packages (Apertium 3+)

For example, the pair apertium-kaz-tat depends on the monolingual packages apertium-kaz and apertium-tat.

These files are stored in the apertium-kaz repository:

Filename               Comment
apertium-kaz.kaz.lexc  Kazakh monolingual dictionary lexicon (in HFST format)
apertium-kaz.kaz.twol  Kazakh monolingual dictionary two-level rules (in HFST format)
apertium-kaz.kaz.rlx   Kazakh Constraint Grammar disambiguation rules

apertium-kaz-tat uses the Kazakh monolingual files during compilation, so they do not need to be duplicated in both repositories. The same applies to apertium-tat.

These files are in the apertium-kaz-tat repository:

Filename                      Comment
apertium-kaz-tat.kaz-tat.dix  Kazakh-Tatar bilingual dictionary
apertium-kaz-tat.kaz-tat.lrx  Kazakh→Tatar lexical selection rules
apertium-kaz-tat.tat-kaz.lrx  Tatar→Kazakh lexical selection rules
apertium-kaz-tat.kaz-tat.t1x  Kazakh→Tatar first stage transfer file
apertium-kaz-tat.kaz-tat.t2x  Kazakh→Tatar second stage transfer file
apertium-kaz-tat.tat-kaz.t1x  Tatar→Kazakh first stage transfer file
apertium-kaz-tat.tat-kaz.t2x  Tatar→Kazakh second stage transfer file
modes.xml                     Modes specification file (see modes)

See Languages#Requiring_a_monolingual_package_as_a_dependency_of_a_pair for the makefile details (or see http://sourceforge.net/p/apertium/svn/HEAD/tree/trunk/apertium-kaz-tat/ for the actual example).
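
The relationship between a pair and its monolingual packages follows directly from the package names. As a rough sketch (both helper names are hypothetical, and it is an assumption that every HFST-based monolingual package uses the same three suffixes shown for apertium-kaz), the dependencies and their expected files could be derived like this:

```python
def monolingual_packages(pair_name):
    """Split a pair name such as 'apertium-kaz-tat' into the names of
    its two monolingual packages (hypothetical helper)."""
    prefix, lang1, lang2 = pair_name.split("-")
    return [f"{prefix}-{lang1}", f"{prefix}-{lang2}"]


def monolingual_files(lang):
    """File names expected in an HFST-based monolingual package.
    Assumption: the same suffixes as in the apertium-kaz table."""
    base = f"apertium-{lang}"
    return [f"{base}.{lang}.{suffix}" for suffix in ("lexc", "twol", "rlx")]


deps = monolingual_packages("apertium-kaz-tat")
kaz_files = monolingual_files("kaz")
print(deps)
print(kaz_files)
```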