Difference between revisions of "Apertium stream format"
Revision as of 23:27, 11 February 2012
This page describes the stream format used in the Apertium machine translation platform.
Characters
Reserved
Reserved characters should only appear escaped in the input stream unless they are part of a lexical unit, chunk or superblank.
- The characters ^ and $ are reserved for delimiting lexical units
- The character / is reserved for delimiting analyses in ambiguous lexical units
- The characters < and > are reserved for encapsulating tags
- The characters { and } are reserved for delimiting chunks
- The character \ is the escape character
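The escaping rule above can be sketched in a few lines of Python. This is only an illustration: the helper names `escape` and `unescape` are hypothetical, and the set of characters is taken directly from the list above, on the assumption that nothing else needs escaping.

```python
# Reserved characters as listed in this section (assumption: no others).
RESERVED = set("^$/<>{}\\")


def escape(text: str) -> str:
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Remove one level of backslash escaping."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep the character that follows the backslash, whatever it is.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `escape("a<b>")` gives `a\<b\>`, and `unescape` reverses it.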
Special
- Asterisk, '*' -- Unanalysed word.
- At sign, '@' -- Untranslated lemma.
- Hash sign, '#'
  - In morphological generation -- Unable to generate surface form from lexical unit.
  - In morphological analysis -- Start of invariable part of multiword marker.
- Plus symbol, '+' -- Joined lexical units.
- Tilde, '~' -- Word needs treating by post-generator.
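As a rough illustration, the marks that prefix a word can be looked up in a small table. The names `SPECIAL_MARKS` and `leading_mark` are hypothetical, and the sketch assumes the mark is the first character of the token; the plus symbol is left out because it joins lexical units rather than prefixing a word.

```python
# Descriptions follow the list in this section.
SPECIAL_MARKS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "unable to generate surface form (in analysis: start of invariable part)",
    "~": "word needs treating by post-generator",
}


def leading_mark(token: str):
    """Return (mark, description, rest) if the token starts with a special mark."""
    if token and token[0] in SPECIAL_MARKS:
        return token[0], SPECIAL_MARKS[token[0]], token[1:]
    return None
```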
Formatted input
- See also: Format handling
F = formatted text, T = text to be analysed.
Formatted text is treated as a single whitespace by all stages.
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
|____|       |_______| |____|     |_______|
   |             |        |           |
   F             F        F           F

[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
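The F/T split shown above can be approximated with a small scanner. This is a minimal sketch: `split_stream` is a hypothetical helper, and it assumes formatted text is always wrapped in square brackets and that a backslash escapes the character after it.

```python
def split_stream(stream: str):
    """Split a stream into ('F', text) and ('T', text) pieces, in order."""
    parts = []
    buf = []
    in_blank = False
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            # An escaped character never opens or closes a bracketed block.
            buf.append(stream[i:i + 2])
            i += 2
            continue
        if ch == "[" and not in_blank:
            if buf:
                parts.append(("T", "".join(buf)))
            buf = [ch]
            in_blank = True
        elif ch == "]" and in_blank:
            buf.append(ch)
            parts.append(("F", "".join(buf)))
            buf = []
            in_blank = False
        else:
            buf.append(ch)
        i += 1
    if buf:
        # Trailing text, or an unterminated bracketed block.
        parts.append(("F" if in_blank else "T", "".join(buf)))
    return parts
```

Run on the example stream, the T pieces are `this is`, `a` and `test.`. Adjacent bracketed blocks such as `[][<\/b>]` come back as two separate F pieces, whereas the diagram marks them as one.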
Analyses
S = surface form, L = lemma.
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
  |    |  |________|
  S    L     TAGS
      |____________|
         ANALYSIS
|_____________________________________________|
            AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$
|______________|
DISAMBIGUATED LEXICAL UNIT

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$
                                 |____________________________________________|
                                                JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
              |__|                                              |____|
               |                                                  |
           LEMMA HEAD                                        LEMMA QUEUE
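To make the structure above concrete, here is a minimal Python sketch that splits an ambiguous lexical unit into its surface form and analyses. The helper names are hypothetical, and the sketch assumes the unit has the form `^surface/analysis/...$`; a disambiguated unit without a surface form would need separate handling.

```python
import re


def split_unescaped(text: str, sep: str):
    """Split on sep, ignoring occurrences preceded by a backslash."""
    parts, buf, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])
            i += 2
        elif text[i] == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_lexical_unit(lu: str):
    """Return (surface, analyses); each analysis is a list of joined morphemes."""
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError("not a lexical unit: " + lu)
    surface, *readings = split_unescaped(lu[1:-1], "/")
    analyses = []
    for reading in readings:
        morphemes = []
        for part in split_unescaped(reading, "+"):
            lemma = part.split("<", 1)[0]
            tags = re.findall(r"<([^<>]+)>", part)
            # Anything after the last tag is kept as the lemma queue.
            queue = part.rsplit(">", 1)[1] if ">" in part else ""
            morphemes.append({"lemma": lemma, "tags": tags, "queue": queue})
        analyses.append(morphemes)
    return surface, analyses
```

For `^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$` this returns the surface form `vino` and two analyses, with lemmas `vino` and `venir`.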
Chunks
- See also: Chunking
^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
   |   |______________________||__________________________|                                                          |
 CHUNK        CHUNK TAGS             LEXICAL UNITS IN                                                             LINKED
 NAME                                   THE CHUNK                                                                   TAG
|__________________________________________________________|
                             |
                           CHUNK

^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
        |_________|               |_|           |_|
        <1> <2> <3>                |             |
                               POINTERS TO CHUNK TAGS
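The relationship between chunk tags and linked tags can be sketched as follows. The function names are hypothetical. The sketch assumes that a numeric tag such as `<3>` is a 1-based index into the chunk's own tag list, as the diagram suggests, and it ignores escaped characters for brevity.

```python
import re

# ^name<tag>...{lexical units}$
CHUNK_RE = re.compile(r"\^([^<{]+)((?:<[^<>]+>)*)\{(.*?)\}\$")


def parse_chunk(chunk: str):
    """Return (name, chunk_tags, lexical_units) for one chunk."""
    m = CHUNK_RE.fullmatch(chunk)
    if m is None:
        raise ValueError("not a chunk: " + chunk)
    name, tag_str, body = m.groups()
    tags = re.findall(r"<([^<>]+)>", tag_str)
    units = re.findall(r"\^(.*?)\$", body)
    return name, tags, units


def resolve_pointers(tags, unit: str) -> str:
    """Replace each <N> in a lexical unit with the N-th chunk tag."""
    def repl(m):
        # Raises IndexError if the pointer exceeds the number of chunk tags.
        return "<" + tags[int(m.group(1)) - 1] + ">"
    return re.sub(r"<(\d+)>", repl, unit)
```

For `^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$` the chunk tags are `SN`, `f` and `sg`, so `beach<n><3>` resolves to `beach<n><sg>`.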