Difference between revisions of "Modes"
Revision as of 12:18, 13 June 2014
There are a few ways you can use pipelines in Apertium. One of them is modes files. Modes files (typically called modes.xml) are XML files (see modes.dtd) that specify which programs should be run and in what order. Normally each linguistic package has one of these files, which specifies the various ways in which you can use the data to perform translations.
See the modes file from es-ca for an example. Modes that do not say install="yes" are only usable with the -d switch to apertium; they are typically used during development (e.g. ca-es-anmor, which only performs morphological analysis on Catalan and nothing else).
See Writing_Makefiles#Modes on how to ensure modes that say install="yes" are installed.
Naming conventions
The main translation mode is always named "from-to", e.g. "sme-nob". The debug modes each have a suffix, e.g. "sme-nob-morph".
Common debug mode names:
- -anmor or -morph runs the morphological analysers
  - these are used equivalently
- -tagger or -disam runs up until disambiguation
  - what's the difference?
  - what about when you have both morphological and syntactic disambiguation?
  - what about when you have both CG and probabilistic disambiguation?
- -biltrans runs up until the bidix
- -lex runs up until lexical selection
- -transfer runs up until (1-stage) transfer
- -chunker runs up until the first stage of 3-or-more-stage transfer
- -interchunk runs up until the second stage of 3-stage transfer
  - -interchunk1 and -interchunk2 are used when the pair has 4-stage transfer
- -postchunk runs up until the last stage of transfer
- -dgen runs up until generation (using lt-proc -d to include debug symbols)
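As a rough illustration of the naming convention, the sketch below prints the debug mode names one would conventionally expect for a given pair. The function name debug_mode_names and the particular selection of suffixes are assumptions made for this example; which modes actually exist depends on the pair's own modes.xml.

```shell
# Hypothetical helper (not part of Apertium): print conventional debug
# mode names for a language pair by appending common suffixes.
debug_mode_names() {
    pair=$1
    # A subset of the suffixes listed above; a real pair may define
    # fewer, more, or differently named modes.
    for suffix in morph disam biltrans lex transfer dgen; do
        printf '%s-%s\n' "$pair" "$suffix"
    done
}

debug_mode_names sme-nob
```

The first name printed for sme-nob is sme-nob-morph, matching the example given above.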
Modes hacks
Statistics mode
In order to get some statistical information about translations made using Apertium, we've hacked the main translation mode: the pipeline is paused just after morphological analysis (that is, right before the tagger performs disambiguation) and the output so far is saved into a temp file. After that, the pipeline is resumed with the temp file as stdin.
As an example, here is the split ("broken") pipeline for ca-es, installed as ca-es-estadistiques.mode:
/usr/local/bin/lt-proc /usr/local/share/apertium/apertium-es-ca/ca-es.automorf.bin > $LOGSDIR$SEC.tmp; /usr/local/bin/apertium-tagger -g /usr/local/share/apertium/apertium-es-ca/ca-es.prob < $LOGSDIR$SEC.tmp \
|/usr/local/bin/apertium-pretransfer|/usr/local/bin/apertium-transfer /usr/local/share/apertium/apertium-es-ca/apertium-es-ca.trules-ca-es.xml \
/usr/local/share/apertium/apertium-es-ca/trules-ca-es.bin /usr/local/share/apertium/apertium-es-ca/ca-es.autobil.bin \
|/usr/local/bin/lt-proc $1 /usr/local/share/apertium/apertium-es-ca/ca-es.autogen.bin \
|/usr/local/bin/lt-proc -p /usr/local/share/apertium/apertium-es-ca/ca-es.autopgen.bin
An example of calling apertium with this mode would be the following:
LOGSDIR=~/logs/apertium/; SEC=`date +%s`; echo "Ara Apertium permet extraure estadístiques" | apertium ca-es-estadistiques
In that example, $LOGSDIR is the folder where the logs will be saved, and $SEC is a unique ID for that log (here, the current time in seconds since the epoch).
When translation is done, we can process the log created in order to get statistics.
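One simple statistic is the share of unknown words. The sketch below assumes the log holds lt-proc output in the Apertium stream format, where each lexical unit looks like ^surface/analysis$ and an unknown word has an analysis starting with an asterisk. The function name log_stats is made up for this example, the sample line is invented data, and escaped characters inside lexical units are ignored for simplicity.

```shell
# Hypothetical helper: count lexical units and unknown words read from
# stdin, assuming Apertium stream format (^surface/analysis$).
log_stats() {
    input=$(cat)
    # Every ^...$ span is one lexical unit.
    total=$(printf '%s\n' "$input" | grep -o '\^[^$]*\$' | wc -l)
    # Unknown words: the first analysis starts with '*'.
    unknown=$(printf '%s\n' "$input" | grep -o '\^[^$/]*/\*[^$]*\$' | wc -l)
    # $((...)) strips any padding that wc may add to its output.
    echo "total=$((total)) unknown=$((unknown))"
}

# Invented sample data, for illustration only.
printf '%s\n' '^Ara/ara<adv>$ ^Apertium/*Apertium$ ^permet/permetre<vblex>$' | log_stats
```

With a real log, the same function could be fed the saved file, e.g. log_stats < "$LOGSDIR$SEC.tmp", using the variables from the invocation above.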
Mixed modes
See Mixed modes