Layouts
Apertium linguistic data packages can follow one of several layouts.
Apertium 1.0
Apertium 2.0
Apertium 3.0
Example using English—Afrikaans:
Filename | Comment |
---|---|
apertium-en-af.af.dix.xml | Afrikaans monolingual dictionary |
apertium-en-af.en-af.dix.xml | English—Afrikaans bilingual dictionary |
apertium-en-af.en.dix.xml | English monolingual dictionary |
apertium-en-af.symbols.xml | List of grammatical symbols |
apertium-en-af.af-en.t1x | Afrikaans—English first stage transfer file (transfer) |
apertium-en-af.af-en.t2x | Afrikaans—English second stage transfer file (interchunk) |
apertium-en-af.af-en.t3x | Afrikaans—English third stage transfer file (postchunk) |
apertium-en-af.en-af.t1x | English—Afrikaans first stage transfer file (transfer) |
apertium-en-af.en-af.t2x | English—Afrikaans second stage transfer file (interchunk) |
apertium-en-af.en-af.t3x | English—Afrikaans third stage transfer file (postchunk) |
modes.xml | Modes specification file (see modes) |
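
The naming pattern in the table above can be sketched in a few lines of Python. This is only an illustration of the convention, not part of any Apertium tool: the function name `apertium3_layout` and its parameters are hypothetical, and the file list is taken directly from the English—Afrikaans example.

```python
from typing import List


def apertium3_layout(lang1: str, lang2: str) -> List[str]:
    """Return the file names the table above lists for a lang1-lang2 pair.

    Hypothetical helper: it only reproduces the naming pattern of the
    English-Afrikaans example and does not touch the file system.
    """
    base = f"apertium-{lang1}-{lang2}"
    files = [
        f"{base}.{lang2}.dix.xml",          # monolingual dictionary, second language
        f"{base}.{lang1}-{lang2}.dix.xml",  # bilingual dictionary
        f"{base}.{lang1}.dix.xml",          # monolingual dictionary, first language
        f"{base}.symbols.xml",              # shared list of grammatical symbols
    ]
    # Three transfer stages (transfer, interchunk, postchunk) per direction.
    for src, dst in ((lang2, lang1), (lang1, lang2)):
        for stage in (1, 2, 3):
            files.append(f"{base}.{src}-{dst}.t{stage}x")
    files.append("modes.xml")
    return files


if __name__ == "__main__":
    for name in apertium3_layout("en", "af"):
        print(name)
```

Calling `apertium3_layout("en", "af")` yields the eleven file names shown in the table, in the same order.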
Using separate include files
What distinguishes this layout is its use of separate include files. Most packages use an include file only to define symbols, but the same mechanism could hold, for example, a large list of proper nouns, vocabulary for different domains, or almost anything else.
In all of the dictionary files:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>ÄÊËÖÜäêëöüßABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
  <!-- Symbol definitions -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-af.symbols.xml"/>
  <pardefs>
    ...
  </pardefs>
  <section id="main" type="inconditional">
    ...
  </section>
</dictionary>
In the apertium-en-af.symbols.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!-- -*- nxml -*- -->
<sdefs>
  <sdef n="n" c="Noun"/>
  <sdef n="m" c="Masculine"/>
  <sdef n="f" c="Feminine"/>
  ...
</sdefs>
Then, in Makefile.am, append the following to TARGETS_COMMON:
TARGETS_COMMON = $(BASENAME).$(LANG1).dix $(BASENAME).$(LANG2).dix $(BASENAME).$(LANG1)-$(LANG2).dix \
And add the following targets underneath:
$(BASENAME).$(LANG1).dix:
	xmllint --xinclude $(BASENAME).$(LANG1).dix.xml > $(BASENAME).$(LANG1).dix

$(BASENAME).$(LANG2).dix:
	xmllint --xinclude $(BASENAME).$(LANG2).dix.xml > $(BASENAME).$(LANG2).dix

$(BASENAME).$(LANG1)-$(LANG2).dix:
	xmllint --xinclude $(BASENAME).$(LANG1)-$(LANG2).dix.xml > $(BASENAME).$(LANG1)-$(LANG2).dix
You may also want to specify a clean-dicts target:
clean-dicts:
	rm $(BASENAME).$(LANG1).dix
	rm $(BASENAME).$(LANG2).dix
	rm $(BASENAME).$(PREFIX1).dix
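
To see what the XInclude expansion step does to a dictionary, here is a minimal Python sketch that performs the same kind of substitution with the standard library's `xml.etree.ElementInclude` module. It is an illustration only, not a replacement for `xmllint --xinclude` in the build: the helper names `expand_xincludes` and `demo` are hypothetical, and the dictionary content is a cut-down version of the examples above written to a temporary directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude

# Cut-down stand-ins for the .dix.xml and symbols files shown above.
DIX = """<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>abc</alphabet>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-af.symbols.xml"/>
  <pardefs/>
  <section id="main" type="inconditional"/>
</dictionary>
"""

SYMBOLS = """<?xml version="1.0" encoding="UTF-8"?>
<sdefs>
  <sdef n="n" c="Noun"/>
  <sdef n="m" c="Masculine"/>
  <sdef n="f" c="Feminine"/>
</sdefs>
"""


def expand_xincludes(path: Path) -> ET.Element:
    """Parse `path` and replace each xi:include with the file it points to."""
    base_dir = path.parent

    def loader(href, parse, encoding=None):
        # Resolve the href relative to the including file's directory.
        target = base_dir / href
        if parse == "xml":
            return ET.parse(target).getroot()
        return target.read_text(encoding=encoding or "utf-8")

    root = ET.parse(path).getroot()
    ElementInclude.include(root, loader=loader)  # expands in place
    return root


def demo():
    """Write the sample files, expand them, and report the resulting structure."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "apertium-en-af.symbols.xml").write_text(SYMBOLS, encoding="utf-8")
        dix = tmp_path / "apertium-en-af.en.dix.xml"
        dix.write_text(DIX, encoding="utf-8")
        root = expand_xincludes(dix)
        tags = [child.tag for child in root]
        symbols = [sdef.get("n") for sdef in root.iter("sdef")]
        return tags, symbols


if __name__ == "__main__":
    print(demo())
```

After expansion, the `xi:include` element is gone and the `sdefs` element from the symbols file sits in its place, which is the form the dictionary compiler expects in the generated .dix file.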
Dependent on monolingual packages (Apertium 3+)
apertium-kaz-tat depends on the packages apertium-kaz and apertium-tat.
These files are stored in the apertium-kaz repository:
Filename | Comment |
---|---|
apertium-kaz.kaz.lexc | Kazakh monolingual dictionary lexicon (in HFST format) |
apertium-kaz.kaz.twol | Kazakh monolingual dictionary two-level rules (in HFST format) |
apertium-kaz.kaz.rlx | Kazakh Constraint Grammar disambiguation rules |
The Kazakh monolingual files are used by apertium-kaz-tat during compilation, instead of being stored in both repositories. Similarly for apertium-tat.
These files are in the apertium-kaz-tat repository:
Filename | Comment |
---|---|
apertium-kaz-tat.kaz-tat.dix | Kazakh-Tatar bilingual dictionary |
apertium-kaz-tat.kaz-tat.lrx | Kazakh→Tatar Lexical selection rules |
apertium-kaz-tat.tat-kaz.lrx | Tatar→Kazakh Lexical selection rules |
apertium-kaz-tat.kaz-tat.t1x | Kazakh→Tatar first stage transfer file |
apertium-kaz-tat.kaz-tat.t2x | Kazakh→Tatar second stage transfer file |
apertium-kaz-tat.tat-kaz.t1x | Tatar→Kazakh first stage transfer file |
apertium-kaz-tat.tat-kaz.t2x | Tatar→Kazakh second stage transfer file |
modes.xml | Modes specification file (see modes) |
See Languages#Requiring_a_monolingual_package_as_a_dependency_of_a_pair for the makefile details (or see http://sourceforge.net/p/apertium/svn/HEAD/tree/trunk/apertium-kaz-tat/ for the actual example).