Modes introduction

From Apertium
Revision as of 17:25, 22 September 2016 by Rcrowther (talk | contribs) (a little tighter)


Apertium core ('lttoolbox', 'apertium-lex-tools') is a collection of tools that pipe data from one to the next. You can use these tools individually; many instructions on the wiki do so, and this is useful for fine-grained debugging.

However, to make the tools easier to use, Apertium language builds pre-configure chains of tools into scripts. These pre-configured chains are called 'modes'. Go into a dictionary directory and run,

ls modes

you will receive a list like,

xxx-yyy-chunker.mode
xxx-yyy-pgen.mode
yyy-morph.mode
...

and many, many more.
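
The mode names used later on this page are these file names without the '.mode' suffix. As a minimal sketch (the function name list_modes is made up for this page and is not an Apertium tool), the names could be listed like this:

```shell
# list_modes DIR: print the mode names found in DIR/modes, one per line.
# Hypothetical helper for illustration; it is not part of Apertium.
list_modes() {
    for f in "$1"/modes/*.mode; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Strip the directory part and the '.mode' suffix.
        basename "$f" .mode
    done
}

# Usage, from inside a dictionary directory:
#   list_modes .
```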

This is a list of files. If you look inside one of these files (this is a deliberately daunting choice of mode-file, found only in a bilingual dictionary),


cat modes/en-es.mode

You will see something like,

lt-proc '/home/shane/apertium/en-es/apertium-en-es/en-es.automorf.bin' 
| apertium-tagger -g $2 '/home/shane/apertium/en-es/apertium-en-es/en-es.prob' 
| apertium-pretransfer
| apertium-transfer -n '/home/shane/apertium/en-es/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  '/home/shane/apertium/en-es/apertium-en-es/en-es.genitive.bin' 
| lt-proc -b '/home/shane/apertium/en-es/apertium-en-es/en-es.autobil.bin' 
| lrx-proc -m '/home/shane/apertium/en-es/apertium-en-es/en-es.autolex.bin' 
| apertium-transfer -b '/home/shane/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t1x'  '/home/shane/apertium/en-es/apertium-en-es/en-es.t1x.bin' 
| apertium-interchunk '/home/shane/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t2x'  '/home/shane/apertium/en-es/apertium-en-es/en-es.t2x.bin' 
| apertium-postchunk '/home/shane/apertium/en-es/apertium-en-es/apertium-en-es.en-es.t3x'  '/home/shane/apertium/en-es/apertium-en-es/en-es.t3x.bin' 
| lt-proc $1 '/home/shane/apertium/en-es/apertium-en-es/en-es.autogen.bin' 
| lt-proc -p '/home/shane/apertium/en-es/apertium-en-es/en-es.autopgen.bin' 

Phew! The mode-file is a list of Apertium tools ('lt-proc', 'apertium-interchunk', etc.), pre-configured for the dictionary/language pair: the paths to the binary files and the command-line switches are already written in.
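
To see only which tools a mode-file chains together, without the long paths, something like the following sketch may help. The helper mode_tools is hypothetical, and it assumes that no '|' character appears inside a quoted argument in the file.

```shell
# mode_tools FILE: print the name of each tool in a mode-file, in pipeline order.
# Hypothetical helper; assumes '|' only ever separates commands.
mode_tools() {
    # Put each pipeline stage on its own line, then print its first word.
    tr '|' '\n' < "$1" | awk 'NF { print $1 }'
}

# Usage:
#   mode_tools modes/en-es.mode
```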

The commands are written in a simple text file, with pipe symbols between them. So, when we run the usual translation command (which you can find all over the wiki),

echo 'This is a test sentence.' | apertium -d . xxx-yyy

the program 'apertium' loads a mode-file called 'xxx-yyy.mode' and then runs the commands in that file. The mode-file xxx-yyy.mode, as you may have tried or guessed, is a series of commands that generates a full translation. So, let's try it on a real mode-file,

echo 'This is a test sentence.' | apertium -d . en-es

returns,

Esto es una frase de prueba.
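
To make the idea concrete, here is a much simplified sketch of what running a mode amounts to: find modes/NAME.mode under the given directory and run its contents as a shell pipeline. The function run_mode is hypothetical and is not how the 'apertium' program is actually written; the real program does more, none of which is modelled here. Extra arguments simply fill placeholders such as the $1 and $2 seen in the mode-file; which values the real program passes is not covered by this sketch.

```shell
# run_mode DIR NAME [ARGS...]: feed standard input through DIR/modes/NAME.mode.
# Hypothetical, simplified stand-in for 'apertium -d DIR NAME'.
run_mode() {
    dir=$1
    name=$2
    shift 2
    file="$dir/modes/$name.mode"
    if [ ! -f "$file" ]; then
        echo "run_mode: no such mode: $name" >&2
        return 1
    fi
    # The mode-file is plain shell text, so a shell can run it directly.
    # Any remaining arguments become $1, $2, ... inside the file.
    sh "$file" "$@"
}

# Usage:
#   echo 'some text' | run_mode . NAME ARG1 ARG2
```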

The different modes

The list returned many mode-files. If xxx-yyy.mode offers a full translation, what do the others do?

cat modes/en-es-pretransfer.mode

returns,

lt-proc '/home/shane/apertium/en-es/apertium-en-es/en-es.automorf.bin' 
| apertium-tagger -g $2 '/home/shane/apertium/en-es/apertium-en-es/en-es.prob' 
| apertium-pretransfer 

Oh! A much shorter tool-chain. You may notice it is much the same as the xxx-yyy.mode file (here, en-es.mode), except that it stops applying tools at 'apertium-pretransfer'. So this mode-file builds a chain that processes the input up to the 'transfer' stage, then stops and returns the result. You can test this,

echo 'This is a test sentence.' | apertium -d . en-es-pretransfer

(note that the program 'apertium' does not need to be told this is a '.mode' file)

The result might be,

^This<prn><tn><mf><sg>$ ^be<vbser><pri><p3><sg>$ ^a<det><ind><sg>$ ^test<n><sg>$ ^sentence<n><sg>$^.<sent>$^.<sent>$

Much messier: this is not a full translation, in fact it is not translated at all. The text shows how far Apertium's processing had got by the time it reached the 'transfer' stage (which comes before the bilingual dictionary is applied). Apertium worked from a monolingual dictionary, recognised several words/lexical units in the input stream, decided on their grammatical category, and added other details, e.g. whether each lexical unit is singular or plural.

For further interpretation of the output, see Apertium stream format.
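
As a rough aid to reading such output, the sketch below prints one lexical unit per line. It assumes that units are delimited by '^' and '$' with no escaped '$' inside them, and that your grep supports the widely available -o option; split_units is a made-up name.

```shell
# split_units: read stream-format text on standard input and print
# each ^...$ unit on its own line.
# Hypothetical helper; does not handle escaped '$' characters.
split_units() {
    grep -o '\^[^$]*\$'
}

# Usage:
#   echo 'This is a test sentence.' | apertium -d . en-es-pretransfer | split_units
```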


The difference in modes between a monolingual and a bilingual dictionary

Once you understand what a mode is - a pre-configured run through Apertium tools - it is much easier to understand why monolingual dictionaries offer different modes to a bilingual dictionary.

A monolingual dictionary cannot use the full Apertium toolchain. That is why the monolingual modes are, currently,

  xxx-morph.mode
  xxx-gener.mode
  xxx-tagger.mode
  xxx-disam.mode

They can only run tools up to the point where the text has been 'analysed', that is, where they have worked out what they know about it.

A bilingual dictionary can not only analyse the text but also apply tools to translate it. So it offers many more modes,

  en-es-anmor.mode       
  en-es-pretransfer.mode 
  en-es-biltrans.mode    
  en-es-tagger.mode      
  en-es-chunker.mode    
  en_GB-en_US.mode  
  en-es-generador.mode  
  en_US-en_GB.mode     
  en-es-genitive.mode       
  en-es-interchunk.mode     
  en-es.mode            
  en-es-postchunk.mode

A bilingual dictionary can also translate in both directions, so it offers modes for translation in the opposite direction too,

  es-en.mode
  es-en-postchunk.mode
  es-en-pretransfer.mode
  es-en-tagger.mode
  es-en-anmor.mode       
  es-en_US-generador.mode
  es-en-chunker.mode     
  es-en_US.mode
  es-en-generador.mode
  es-en-interchunk.mode

What use are the modes?

Throughout the wiki you will find instructions and examples which offer fine-grained use of the apertium tools directly.

However, a beginner does not stand a chance of remembering all of them, or their configuration options. For example,

lt-proc -p

is the way to run the 'postgeneration' stage/phase.

A beginner needs a script to handle a full translation, and the mode-files provide one ready-made. The modes are more than that, though: their variations make a very good debugging tool. In many lexers and parsers it can be difficult to know what is being done to, and with, the text at any point. Running the different Apertium mode-files exposes the entire pipeline, stage by stage, in close detail.
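
As a sketch of that debugging use, the loop below sends the same sentence through several modes in turn and prints each result under a heading, so successive stages can be compared. The function trace_modes and the APERTIUM_CMD variable are hypothetical names introduced here; only the 'apertium -d DIR MODE' call itself comes from the examples on this page.

```shell
# trace_modes DIR TEXT MODE...: run TEXT through each named mode in turn
# and print the output under a heading, to compare pipeline stages.
# APERTIUM_CMD is a hypothetical override (handy for testing); default is 'apertium'.
trace_modes() {
    dir=$1
    text=$2
    shift 2
    for mode in "$@"; do
        printf '== %s\n' "$mode"
        printf '%s\n' "$text" | "${APERTIUM_CMD:-apertium}" -d "$dir" "$mode"
    done
}

# Usage, with mode names taken from the en-es listing above:
#   trace_modes . 'This is a test sentence.' en-es-anmor en-es-tagger en-es-pretransfer en-es
```

Listing the modes in pipeline order, as in the usage line, makes it easier to spot the stage at which an unexpected change first appears.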

See also Modes and Monodix basics