Apertium stream format
Revision as of 19:04, 17 September 2009
This page describes the stream format used in the Apertium machine translation platform.
Special characters
- Asterisk, '*' -- Unanalysed word.
- At sign, '@' -- Untranslated lemma.
- Hash sign, '#'
  - In morphological generation -- Unable to generate surface form from lexical unit.
  - In morphological analysis -- Start of invariable part of multiword marker.
- Plus symbol, '+' -- Joined lexical units.
- Tilde, '~' -- Word needs treating by post-generator.
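As a rough illustration of how a program reading translated output might act on these marks, here is a minimal Python sketch. It assumes each mark is prefixed to the word it applies to, and it covers only the marks that flag a whole word ('+' and the analysis-time '#' occur inside lexical units instead). The names MARKS and classify are illustrative and not part of Apertium.

```python
# Marks that flag a whole word; the wording follows the list above.
MARKS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "unable to generate surface form",
    "~": "needs treating by post-generator",
}

def classify(token):
    """Return (meaning of the leading mark or None, token without the mark)."""
    if token and token[0] in MARKS:
        return MARKS[token[0]], token[1:]
    return None, token
```

For example, classify("*foo") reports an unanalysed word and returns "foo" as the bare form, while an unmarked word is returned unchanged.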
Formatted input
- See also: Superblanks
F = formatted text, T = text to be analysed.
Formatted text is ignored by all stages.
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
|____|       |_______| |____|     |_______|
  |              |       |            |
  F              F       F            F
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
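The F/T split shown above can be sketched in Python as follows. This is only an illustration, not the deformatter's actual code: it assumes square brackets delimit formatted text and that a backslash escapes the character after it, as the '\/' in the example suggests. The function name split_blanks is hypothetical.

```python
def split_blanks(stream):
    """Split a stream into ('F', text) and ('T', text) segments.

    'F' segments are bracketed formatted text, 'T' segments are text
    to be analysed. Escaped characters are copied through untouched.
    """
    segments = []
    kind, buf = "T", []
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            buf.append(stream[i:i + 2])  # keep the escape sequence intact
            i += 2
            continue
        if ch == "[" and kind == "T":
            if buf:
                segments.append(("T", "".join(buf)))
            kind, buf = "F", ["["]
        elif ch == "]" and kind == "F":
            buf.append("]")
            segments.append(("F", "".join(buf)))
            kind, buf = "T", []
        else:
            buf.append(ch)
        i += 1
    if buf:  # trailing text, or an unterminated bracket
        segments.append((kind, "".join(buf)))
    return segments
```

Run on the example line, the 'T' segments are "this is", "a" and "test."; everything else comes back as 'F' segments, and joining all segments reproduces the input.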
Analyses
S = surface form, L = lemma.
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
  |    |  |________|
  S    L     TAGS
      |____________|
         ANALYSIS
|_____________________________________________|
AMBIGUOUS LEXICAL UNIT
^vino<n><m><sg>$
|______________|
DISAMBIGUATED
LEXICAL UNIT
^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$
                                 |____________________________________________|
                                                JOINED MORPHEMES
^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
              |__|                                              |____|
               |                                                  |
           LEMMA HEAD                                        LEMMA QUEUE
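The layout of a lexical unit shown above can be read back with a short Python sketch. It is a simplification under stated assumptions: escaped characters are not handled, '/' separates the surface form from its analyses, '+' separates joined lexical units, and '#' introduces the lemma queue. The names parse_reading and parse_lexical_unit are illustrative, not Apertium functions.

```python
import re

TAG = re.compile(r"<([^<>]+)>")

def parse_reading(reading):
    """Split one analysis into ([(lemma, tags), ...], lemma_queue)."""
    head, _, queue = reading.partition("#")  # queue is "" when there is no '#'
    parts = []
    for piece in head.split("+"):            # joined lexical units
        lemma = TAG.sub("", piece)
        tags = TAG.findall(piece)
        parts.append((lemma, tags))
    return parts, queue

def parse_lexical_unit(unit):
    """Parse '^surface/analysis/...$' into (surface, [analysis, ...])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    surface, *readings = unit[1:-1].split("/")
    return surface, [parse_reading(r) for r in readings]
```

Applied to the first example, the surface form is "vino" and the two analyses carry the lemmas "vino" and "venir" with their tag lists. For "take it away", each analysis has two joined parts and the queue " away".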
Chunks
- See also: Chunks
^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
   |   |______________________||__________________________|                                                          |
 CHUNK        CHUNK TAGS             LEXICAL UNITS IN                                                             LINKED
  NAME                                  THE CHUNK                                                                   TAG
|__________________________________________________________|
                             |
                           CHUNK
^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
        |_________|
             |
  POINTERS TO CHUNK TAGS
        <1> <2> <3>
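To make the pointer mechanism concrete, the following Python sketch replaces each numeric tag inside a chunk with the chunk tag at that position, counting from 1. It is an illustration under assumptions, not Apertium's own implementation: escaping is ignored, the chunk is assumed to be a single well-formed '^name<tags>{...}$' string, and the function name resolve_chunk is hypothetical.

```python
import re

CHUNK = re.compile(r"\^([^<{]+)((?:<[^<>]+>)*)\{(.*)\}\$")
TAG = re.compile(r"<([^<>]+)>")

def resolve_chunk(chunk):
    """Return (chunk name, chunk tags, lexical units with pointers resolved)."""
    m = CHUNK.fullmatch(chunk)
    if m is None:
        raise ValueError("not a chunk: %r" % chunk)
    name, tag_str, body = m.groups()
    chunk_tags = TAG.findall(tag_str)

    def link(match):
        ref = match.group(1)
        # <1>, <2>, ... point at the first, second, ... chunk tag
        if ref.isdigit() and 1 <= int(ref) <= len(chunk_tags):
            return "<%s>" % chunk_tags[int(ref) - 1]
        return match.group(0)  # ordinary tag, leave unchanged

    return name, chunk_tags, TAG.sub(link, body)
```

For the det_nom chunk above, the chunk tags are SN, f and sg, so both occurrences of <3> resolve to <sg>.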