Modes introduction
Modes Introduction
The Apertium core ('lttoolbox', 'apertium-lex-tools') is a collection of tools that pipe data from one to the next. You can use these tools individually. Many instructions on the wiki do this, and running the tools one at a time is useful for fine-grained debugging.
However, to make the tools easier to use, Apertium language builds pre-configure chains of tools into scripts. These pre-configured chains are called 'modes'. Go into a dictionary directory and run,
ls modes
and you will receive a list like,
xxx-yyy-chunker.mode xxx-yyy-pgen.mode yyy-morph.mode ...
and many, many more.
This is a list of files. If you look inside one of these files (this one is a deliberately demanding choice of mode-file, found only in a bilingual dictionary),
cat modes/en-es.mode
You will see something like,
lt-proc '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.prob' | apertium-pretransfer | apertium-transfer -n '/home/shane/Code/apertium/en-es/apertium-en-es/apertium-en-es.en-es.genitive.t1x' '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.genitive.bin' | lt-proc -b '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.autobil.bin' | lrx-proc -m '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.autolex.bin' | apertium-transfer -b '/home/shane/Code/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t1x' '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.t1x.bin' | apertium-interchunk '/home/shane/Code/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t2x' '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.t2x.bin' | apertium-postchunk '/home/shane/Code/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t3x' '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.t3x.bin' | lt-proc $1 '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.autogen.bin' | lt-proc -p '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.autopgen.bin'
Phew! The mode-file is a list of Apertium tools ('lt-proc', 'apertium-interchunk', etc.) pre-configured for the dictionary/language pair, with the file paths to the binaries and the command-line switches already written in.
The commands have been written as a simple text file with pipe symbols between the commands. So, when we run the usual translation command (that you can find all over the wiki),
echo 'This is a test sentence.' | apertium -d . xxx-yyy
the program apertium loads a mode-file called 'xxx-yyy.mode' and then runs the commands in that file. The mode-file xxx-yyy.mode, as you may have tried or guessed, is a series of commands that generates a full translation. So, let's try a real mode-file,
echo 'This is a test sentence.' | apertium -d . en-es
returns,
Esto es una frase de prueba.
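The idea that a mode-file is just a stored pipeline can be tried without any Apertium tools installed. The sketch below is a toy: the mode name 'toy-upper', its contents (built from 'tr' and 'sed'), and the helper 'run_mode' are all hypothetical stand-ins, and the real 'apertium' program may do more than this when it loads a mode.

```shell
# Toy illustration only: the "mode" below is made of standard Unix tools,
# not Apertium ones, and run_mode is a hypothetical stand-in for 'apertium -d'.
workdir=$(mktemp -d)
mkdir -p "$workdir/modes"

# A mode-file is plain text: commands joined by pipe symbols.
printf '%s\n' "tr 'a-z' 'A-Z' | sed 's/TEST/DEMO/'" > "$workdir/modes/toy-upper.mode"

run_mode() {
    # $1 = directory containing modes/, $2 = mode name without '.mode'
    # The commands in the file inherit standard input from the caller.
    sh "$1/modes/$2.mode"
}

result=$(echo 'This is a test sentence.' | run_mode "$workdir" toy-upper)
echo "$result"

rm -rf "$workdir"
```

The text piped in reaches the first command in the file, and whatever the last command prints comes back out, which is the same shape as the translation command above.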
The different modes
The list returned many mode-files. If xxx-yyy.mode offers a full translation, what do the others do?
cat modes/en-es-pretransfer.mode
returns,
lt-proc '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 '/home/shane/Code/apertium/en-es/apertium-en-es/en-es.prob' | apertium-pretransfer
Oh! A much shorter tool-chain. You may notice it is much the same as the xxx-yyy.mode file, except that it stops applying tools at 'apertium-pretransfer'. So this mode-file constructs a chain that processes input only until it reaches the 'transfer' stage. Then it stops and returns the result. You can test this,
echo 'This is a test sentence.' | apertium -d . en-es-pretransfer
(Note that the program 'apertium' is given the mode name without the '.mode' suffix.)
The result might be,
^This<prn><tn><mf><sg>$ ^be<vbser><pri><p3><sg>$ ^a<det><ind><sg>$ ^test<n><sg>$ ^sentence<n><sg>$^.<sent>$^.<sent>$
Much messier, and not translated at all. The text shows how far Apertium had got by the time it reached the 'transfer' stage (which comes before the bilingual dictionary is applied). Apertium worked from a monolingual dictionary: it recognised the words/lexical units in the input stream, decided on their place in the grammar, and added other details, e.g. whether each lexical unit is singular or plural.
For further interpretation of the output, see Apertium stream format.
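To make the pretransfer output shown above easier to read, the lexical units can be split out with standard tools. This is a rough sketch that assumes every unit is delimited by '^' and '$' and that neither character appears escaped inside a unit; the variable names are illustrative only.

```shell
# Sample stream copied from the pretransfer output shown above.
stream='^This<prn><tn><mf><sg>$ ^be<vbser><pri><p3><sg>$ ^a<det><ind><sg>$ ^test<n><sg>$ ^sentence<n><sg>$^.<sent>$^.<sent>$'

# Put each ^...$ unit on its own line and drop the two delimiters.
units=$(printf '%s\n' "$stream" | grep -o '\^[^$]*\$' | sed -e 's/^\^//' -e 's/\$$//')

# Print "lemma <TAB> tags" for every unit: the lemma is everything
# before the first '<', the tags are the rest of the unit.
printf '%s\n' "$units" | awk -F'<' '{ printf "%s\t%s\n", $1, substr($0, length($1) + 1) }'

unit_count=$(printf '%s\n' "$units" | grep -c '')
first_lemma=$(printf '%s\n' "$units" | head -n 1 | cut -d'<' -f1)
echo "units: $unit_count, first lemma: $first_lemma"
```

Each printed row pairs a lemma such as 'be' with its tag string, which makes it easier to see what the analyser and tagger decided for each word.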
The difference in modes between a monolingual and a bilingual dictionary
Once you understand what a mode is (a pre-configured run through Apertium tools), it is much easier to understand why monolingual dictionaries offer different modes from a bilingual dictionary.
A monolingual dictionary cannot use the full Apertium toolchain. That's why the monolingual modes are, currently,
xxx-morph.mode xxx-genr.mode xxx-tagger.mode xxx-disam.mode
They can only run tools up to the point where they have worked out what they know about ('analysed') the text.
A bilingual dictionary can not only analyse the text but also apply tools to translate it. So it offers many more modes,
en-es-anmor.mode en-es-pretransfer.mode en-es-biltrans.mode en-es-tagger.mode en-es-chunker.mode en_GB-en_US.mode en-es-generador.mode en_US-en_GB.mode en-es-genitive.mode en-es-interchunk.mode en-es.mode en-es-postchunk.mode
and a bilingual dictionary can translate in both directions, so it will offer all the modes for translation in the opposite direction,
es-en.mode es-en-postchunk.mode es-en-pretransfer.mode es-en-tagger.mode es-en-anmor.mode es-en_US-generador.mode es-en-chunker.mode es-en_US.mode es-en-generador.mode es-en-interchunk.mode
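With this many files, it can help to list only the modes for one direction. The sketch below works on a throwaway directory holding empty files named after a few of the modes listed above, so it runs without a real dictionary; in a real language pair you would point the loop at the actual 'modes' directory instead.

```shell
# Toy directory: empty files named after a few of the modes listed above.
workdir=$(mktemp -d)
mkdir -p "$workdir/modes"
for name in en-es en-es-pretransfer es-en es-en-pretransfer es-en_US; do
    : > "$workdir/modes/$name.mode"
done

# Mode names for the es -> en direction: file names minus the '.mode' suffix.
es_en_modes=$(for f in "$workdir"/modes/es-en*.mode; do basename "$f" .mode; done)
printf '%s\n' "$es_en_modes"

es_en_count=$(printf '%s\n' "$es_en_modes" | grep -c '')
rm -rf "$workdir"
```

The names printed are in the form passed to 'apertium -d .', that is, without the '.mode' suffix.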
What use are the modes?
Throughout the wiki you will find instructions and examples which offer fine-grained use of the apertium tools directly.
However, a beginner does not stand a chance of remembering all of them, or their configuration options. For example,
lt-proc -p
is the way to run the 'postgeneration' stage/phase.
A beginner needs a script to handle a full translation, and the mode-files provide one ready-made. But the modes are a little more than that: the variations make a very good debugging tool. In many lexers and parsers it can be difficult to know what is being done to, and with, the text at any point. Running the different Apertium mode-files exposes the entire pipeline in close detail.
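One way to use the modes for debugging is to push the same sentence through several of them in turn and compare the outputs. The loop below is a sketch, not an official workflow: the stage order is an assumption inferred from the mode names and from the en-es.mode pipeline, it covers only a subset of the modes, and it skips any stage when 'apertium' or the mode-file is not available in the current directory.

```shell
# Stage order is an assumption inferred from the mode names and the
# en-es.mode pipeline; adjust the list to the modes your pair provides.
stages='en-es-anmor en-es-tagger en-es-pretransfer en-es-biltrans en-es-chunker en-es-interchunk en-es-postchunk en-es'
text='This is a test sentence.'

report=$(
    for mode in $stages; do
        echo "== $mode =="
        if command -v apertium >/dev/null 2>&1 && [ -f "modes/$mode.mode" ]; then
            # Run the sentence through the pipeline up to this stage.
            echo "$text" | apertium -d . "$mode"
        else
            echo "(skipped: apertium or modes/$mode.mode not found here)"
        fi
    done
)
printf '%s\n' "$report"

header_count=$(printf '%s\n' "$report" | grep -c '^== ')
```

Run from inside a language-pair directory, each header is followed by the text as it stands after that stage, so the first stage whose output looks wrong points to the tool or data file to inspect.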
See also Modes and Monodix basics