Difference between revisions of "Layouts"

From Apertium
Latest revision as of 11:17, 18 November 2013

There are several possible layouts that apertium linguistic data packages may have.

Apertium 1.0

Apertium 2.0

Apertium 3.0

Example using English—Afrikaans:

Filename                       Comment
apertium-en-af.af.dix.xml      Afrikaans monolingual dictionary
apertium-en-af.en-af.dix.xml   English-Afrikaans bilingual dictionary
apertium-en-af.en.dix.xml      English monolingual dictionary
apertium-en-af.symbols.xml     List of grammatical symbols
apertium-en-af.af-en.t1x       Afrikaans→English first stage transfer file (transfer)
apertium-en-af.af-en.t2x       Afrikaans→English second stage transfer file (interchunk)
apertium-en-af.af-en.t3x       Afrikaans→English third stage transfer file (postchunk)
apertium-en-af.en-af.t1x       English→Afrikaans first stage transfer file (transfer)
apertium-en-af.en-af.t2x       English→Afrikaans second stage transfer file (interchunk)
apertium-en-af.en-af.t3x       English→Afrikaans third stage transfer file (postchunk)
modes.xml                      Modes specification file (see modes)

Using separate include files

What distinguishes this layout is the use of separate include files for parts of the dictionaries. Most packages use them only to define symbols, but they could equally hold a large list of proper nouns, domain-specific vocabulary, or almost anything else.

In all of the dictionary files:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>ÄÊËÖÜäêëöüßABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>

  <!-- Symbol definitions -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-af.symbols.xml"/>

  <pardefs>
    
    ...
  
  </pardefs>
  <section id="main" type="inconditional">

    ...

  </section>
</dictionary>

In the apertium-en-af.symbols.xml file:

<?xml version="1.0" encoding="UTF-8"?> <!-- -*- nxml -*- -->

  <sdefs>
    <sdef n="n"       c="Noun"/>
    <sdef n="m"       c="Masculine"/>
    <sdef n="f"       c="Feminine"/>

    ...
  </sdefs>
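
To make the effect of the include concrete, here is a minimal sketch in Python of what XInclude expansion does to a dictionary file. It uses the standard library's `xml.etree.ElementInclude` rather than `xmllint`, and the file contents are shortened, in-memory stand-ins invented for illustration (the one-word alphabet and the empty `pardefs` and `section` elements are assumptions, not the real dictionaries).

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical, shortened stand-ins for the two files described above.
FILES = {
    "apertium-en-af.symbols.xml": (
        "<sdefs>"
        '<sdef n="n" c="Noun"/>'
        '<sdef n="m" c="Masculine"/>'
        '<sdef n="f" c="Feminine"/>'
        "</sdefs>"
    ),
    "apertium-en-af.en.dix.xml": (
        "<dictionary>"
        "<alphabet>abc</alphabet>"
        '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
        'href="apertium-en-af.symbols.xml"/>'
        "<pardefs/>"
        '<section id="main" type="inconditional"/>'
        "</dictionary>"
    ),
}


def load(href, parse, encoding=None):
    """Loader for ElementInclude: resolve href against the in-memory files."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def expand(name):
    """Parse the named file and replace each xi:include with the included tree."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load)
    return root


expanded = expand("apertium-en-af.en.dix.xml")
# The xi:include element is gone; <sdefs> now sits directly under <dictionary>.
print([child.tag for child in expanded])
print([sdef.get("n") for sdef in expanded.iter("sdef")])
```

After expansion the dictionary contains an ordinary `sdefs` element in the position where the include stood, so every dictionary that includes the same symbols file ends up with the same symbol definitions.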

Then, in Makefile.am, add the following entries to TARGETS_COMMON, ahead of the existing ones:

TARGETS_COMMON = $(BASENAME).$(LANG1).dix $(BASENAME).$(LANG2).dix $(BASENAME).$(LANG1)-$(LANG2).dix \

And add the following targets underneath:

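# Each .dix is generated from its .dix.xml source by expanding the XInclude references.
$(BASENAME).$(LANG1).dix: $(BASENAME).$(LANG1).dix.xml
	xmllint --xinclude $(BASENAME).$(LANG1).dix.xml > $(BASENAME).$(LANG1).dix
$(BASENAME).$(LANG2).dix: $(BASENAME).$(LANG2).dix.xml
	xmllint --xinclude $(BASENAME).$(LANG2).dix.xml > $(BASENAME).$(LANG2).dix
$(BASENAME).$(LANG1)-$(LANG2).dix: $(BASENAME).$(LANG1)-$(LANG2).dix.xml
	xmllint --xinclude $(BASENAME).$(LANG1)-$(LANG2).dix.xml > $(BASENAME).$(LANG1)-$(LANG2).dix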
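
The three rules follow one pattern, which the following Python sketch spells out outside of make. The helper names `dictionary_targets` and `build` are hypothetical, and `build` assumes that `xmllint` is on the PATH and that the `.dix.xml` sources are in the current directory; it is an illustration of the pattern, not a replacement for the Makefile rules.

```python
import subprocess


def dictionary_targets(basename, lang1, lang2):
    """Return (source, target) pairs for the two monolingual dictionaries
    and the bilingual one, following the naming used in the rules above."""
    stems = [lang1, lang2, f"{lang1}-{lang2}"]
    return [(f"{basename}.{stem}.dix.xml", f"{basename}.{stem}.dix") for stem in stems]


def build(basename, lang1, lang2):
    """Expand the XInclude references in each source, writing the result to its target."""
    for source, target in dictionary_targets(basename, lang1, lang2):
        with open(target, "wb") as out:
            # Same command as in the make recipes: xmllint --xinclude SOURCE > TARGET
            subprocess.run(["xmllint", "--xinclude", source], stdout=out, check=True)


pairs = dictionary_targets("apertium-en-af", "en", "af")
for source, target in pairs:
    print(f"{source} -> {target}")
```

Calling `build("apertium-en-af", "en", "af")` would then regenerate all three `.dix` files in one go.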

You may also want to specify a clean-dicts target:

clean-dicts:
	rm -f $(BASENAME).$(LANG1).dix
	rm -f $(BASENAME).$(LANG2).dix
	rm -f $(BASENAME).$(LANG1)-$(LANG2).dix

Dependent on monolingual packages (Apertium 3+)

apertium-kaz-tat depends on the packages apertium-kaz and apertium-tat.

These files are stored in the apertium-kaz repository:

Filename                Comment
apertium-kaz.kaz.lexc   Kazakh monolingual dictionary lexicon (in HFST format)
apertium-kaz.kaz.twol   Kazakh monolingual dictionary two-level rules (in HFST format)
apertium-kaz.kaz.rlx    Kazakh Constraint Grammar disambiguation rules

apertium-kaz-tat uses the Kazakh monolingual files during compilation, so they do not have to be stored in both repositories. The same applies to the Tatar files in apertium-tat.

These files are in the apertium-kaz-tat repository:

Filename                       Comment
apertium-kaz-tat.kaz-tat.dix   Kazakh-Tatar bilingual dictionary
apertium-kaz-tat.kaz-tat.lrx   Kazakh→Tatar Lexical selection rules
apertium-kaz-tat.tat-kaz.lrx   Tatar→Kazakh Lexical selection rules
apertium-kaz-tat.kaz-tat.t1x   Kazakh→Tatar first stage transfer file
apertium-kaz-tat.kaz-tat.t2x   Kazakh→Tatar second stage transfer file
apertium-kaz-tat.tat-kaz.t1x   Tatar→Kazakh first stage transfer file
apertium-kaz-tat.tat-kaz.t2x   Tatar→Kazakh second stage transfer file
modes.xml                      Modes specification file (see modes)
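
The file names in the two tables follow a regular pattern, which can be sketched in Python as below. The functions `monolingual_files` and `pair_files` are hypothetical helpers, and the pattern is only read off the kaz-tat example; other pairs may use different formats (for instance `.dix` monolingual dictionaries) or a different number of transfer stages.

```python
def monolingual_files(lang):
    """Files kept in the monolingual package apertium-<lang>, as in the kaz example."""
    base = f"apertium-{lang}.{lang}"
    return [f"{base}.lexc", f"{base}.twol", f"{base}.rlx"]


def pair_files(lang1, lang2):
    """Files kept in the pair package apertium-<lang1>-<lang2>, as in the kaz-tat example."""
    base = f"apertium-{lang1}-{lang2}"
    directions = [f"{lang1}-{lang2}", f"{lang2}-{lang1}"]
    files = [f"{base}.{lang1}-{lang2}.dix"]  # one bilingual dictionary
    files += [f"{base}.{d}.lrx" for d in directions]  # lexical selection, per direction
    for d in directions:
        # Two transfer stages per direction in this example.
        files += [f"{base}.{d}.t1x", f"{base}.{d}.t2x"]
    files.append("modes.xml")
    return files


for name in monolingual_files("kaz") + pair_files("kaz", "tat"):
    print(name)
```

Only the pair-specific data (bilingual dictionary, lexical selection and transfer rules, modes) lives in the pair package; everything describing a single language stays in that language's own package.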

See Languages#Requiring_a_monolingual_package_as_a_dependency_of_a_pair for the makefile details (or see http://sourceforge.net/p/apertium/svn/HEAD/tree/trunk/apertium-kaz-tat/ for the actual example).