Apertium stream format

From Apertium
{{TOCD}}
 
This page describes the stream format used in the Apertium machine translation platform.
 
 
==Characters==
 
 
===Reserved===
 
 
Reserved characters should be escaped with the escape character, <code>\</code>, wherever they appear in the input stream outside a lexical unit, chunk or superblank.

* The characters <code>^</code> and <code>$</code> are reserved for delimiting lexical units.
* The character <code>/</code> is reserved for separating the surface form and the analyses within a lexical unit.
* The characters <code>&lt;</code> and <code>&gt;</code> are reserved for encapsulating tags.
* The characters <code>{</code> and <code>}</code> are reserved for delimiting chunks.
* The character <code>\</code> is the escape character.
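A minimal Python sketch of escaping plain text before it is placed in the stream, and of undoing that escaping. It assumes that only the characters listed above are reserved; real Apertium tools may escape further characters (for instance the square brackets that delimit superblanks). The names <code>escape</code> and <code>unescape</code> are hypothetical and not part of any Apertium API.

```python
# Reserved characters as listed on this page (an assumption: real
# deformatters may escape more). The escape character itself is included
# so that a literal backslash in the text is not mistaken for an escape.
RESERVED = set("^$/<>{}\\")


def escape(text: str) -> str:
    """Prefix every reserved character in plain text with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Remove one level of escaping; a lone trailing backslash is dropped."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character, whatever it is.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)
```

For example, <code>escape("2 < 3")</code> yields <code>2 \< 3</code>, and <code>unescape</code> turns it back into the original text.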
 
 
===Special===
 
 
* Asterisk, '<code><nowiki>*</nowiki></code>' -- Unanalysed word.
* At sign, '<code><nowiki>@</nowiki></code>' -- Untranslated [[lemma]].
* Hash sign, '<code><nowiki>#</nowiki></code>'
** In morphological generation -- The [[surface form]] could not be generated from the [[lexical unit]].
** In morphological analysis -- Marks the start of the invariable part of a multiword.
* Plus symbol, '<code><nowiki>+</nowiki></code>' -- Joined lexical units.
* Tilde, '<code><nowiki>~</nowiki></code>' -- Word needs to be processed by the post-generator.
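As an illustration, the following Python sketch flags marked words in the output of morphological generation, where a leading <code>#</code> signals a generation failure rather than a multiword. It assumes that each marker is the first character of a whitespace-separated word and that the markers survive to that stage; the function names are hypothetical. The <code>+</code> marker is left out because it joins lexical units inside an analysis rather than flagging a word.

```python
from typing import List, Optional, Tuple

# Meanings of leading markers in generator output (assumed to be the first
# character of a word). "#" means a failed generation at this stage; in
# analysis it would instead mark the invariable part of a multiword.
GENERATION_MARKERS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "surface form could not be generated",
    "~": "needs treating by post-generator",
}


def classify(word: str) -> Optional[str]:
    """Return the meaning of the word's leading marker, or None if unmarked."""
    if word and word[0] in GENERATION_MARKERS:
        return GENERATION_MARKERS[word[0]]
    return None


def flag_marked_words(text: str) -> List[Tuple[str, str]]:
    """List (word, meaning) pairs for every marked word in the text."""
    flagged = []
    for word in text.split():
        meaning = classify(word)
        if meaning is not None:
            flagged.append((word, meaning))
    return flagged
```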
 
 
==Formatted input==
 
{{see-also|Superblanks}}
 
 
F = formatted text, T = text to be analysed.
 
 
Formatted text is treated as a single whitespace by all stages.
 
 
<pre>
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
|____|       |_______| |____|     |_______|
   |             |        |           |
   F             F        F           F

[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
</pre>
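The split shown in the diagram can be sketched in Python as follows. The sketch assumes that superblanks are delimited by unescaped square brackets and do not nest, and that a backslash escapes the next character; escape sequences are kept verbatim so the pieces join back into the original stream. Adjacent superblanks are merged into one F piece, as in the diagram. The function name <code>split_superblanks</code> is hypothetical.

```python
from typing import List, Tuple


def _push(pieces: List[Tuple[str, str]], kind: str, text: str) -> None:
    # Merge consecutive superblanks into one "F" piece, as in the diagram.
    if kind == "F" and pieces and pieces[-1][0] == "F":
        pieces[-1] = ("F", pieces[-1][1] + text)
    else:
        pieces.append((kind, text))


def split_superblanks(stream: str) -> List[Tuple[str, str]]:
    """Split a stream into ("F", superblank) and ("T", text) pieces."""
    pieces: List[Tuple[str, str]] = []
    buf: List[str] = []
    in_blank = False
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            # Keep the escape sequence verbatim; it never opens or closes a blank.
            buf.append(stream[i : i + 2])
            i += 2
            continue
        if ch == "[" and not in_blank:
            if buf:
                _push(pieces, "T", "".join(buf))
            buf, in_blank = ["["], True
        elif ch == "]" and in_blank:
            buf.append("]")
            _push(pieces, "F", "".join(buf))
            buf, in_blank = [], False
        else:
            buf.append(ch)
        i += 1
    if buf:
        # An unterminated superblank at the end is still reported as "F".
        _push(pieces, "F" if in_blank else "T", "".join(buf))
    return pieces
```

Applied to the example above, <code>split_superblanks</code> returns four F pieces and the three T pieces <code>this is</code>, <code>a</code> and <code>test.</code>.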
 
 
==Analyses==
 
 
S = surface form, L = lemma.
 
 
<pre>
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
 |    |   |________|
 S    L      TAGS
      |____________|
         ANALYSIS
|_____________________________________________|
            AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$
|______________|
 DISAMBIGUATED
  LEXICAL UNIT

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$
        |_____________________________________________________________________|
                                   JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
              |__|                                              |____|
               |                                                  |
           LEMMA HEAD                                        LEMMA QUEUE
</pre>
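A minimal Python sketch of reading a lexical unit into the parts labelled above. It assumes a single lexical unit outside any chunk, that a unit containing an unescaped <code>/</code> starts with its surface form, and that the lemma queue, introduced by <code>#</code>, comes at the end of a reading as in the example. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A morpheme is a lemma (backslash escapes allowed) followed by zero or more tags.
MORPHEME = re.compile(r"((?:\\.|[^<\\])*)((?:<[^<>]+>)*)")
TAG = re.compile(r"<([^<>]+)>")


@dataclass
class Morpheme:
    lemma: str
    tags: List[str]


@dataclass
class Reading:
    morphemes: List[Morpheme]  # more than one when joined with "+"
    queue: str = ""            # lemma queue of a multiword, e.g. "# away"


@dataclass
class LexicalUnit:
    surface: Optional[str]     # None for a disambiguated unit such as ^vino<n><m><sg>$
    readings: List[Reading]


def split_unescaped(text: str, sep: str) -> List[str]:
    """Split on sep, ignoring any occurrence preceded by a backslash."""
    parts, buf, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            buf.append(text[i : i + 2])
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(text[i])
        i += 1
    parts.append("".join(buf))
    return parts


def parse_reading(text: str) -> Reading:
    # Everything after the first unescaped "#" is the lemma queue.
    head, *rest = split_unescaped(text, "#")
    queue = "#" + "#".join(rest) if rest else ""
    morphemes = []
    for piece in split_unescaped(head, "+"):
        match = MORPHEME.fullmatch(piece)
        if match is None:
            raise ValueError(f"malformed morpheme: {piece!r}")
        morphemes.append(Morpheme(match.group(1), TAG.findall(match.group(2))))
    return Reading(morphemes, queue)


def parse_lexical_unit(token: str) -> LexicalUnit:
    if len(token) < 2 or token[0] != "^" or token[-1] != "$":
        raise ValueError("a lexical unit is delimited by ^ and $")
    fields = split_unescaped(token[1:-1], "/")
    if len(fields) == 1:
        return LexicalUnit(None, [parse_reading(fields[0])])
    return LexicalUnit(fields[0], [parse_reading(f) for f in fields[1:]])
```

The queue is stored on the reading rather than on a morpheme because, as the example shows, it belongs to the lemma head <code>take</code> even though it is written after the last joined morpheme.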
 
 
==Chunks==
 
{{see-also|Chunks}}
 
<pre>
^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
   |   |______________________||__________________________|                                                          |
   |              |                         |                                                                        |
 CHUNK       CHUNK TAGS            LEXICAL UNITS IN                                                              LINKED
 NAME                                  THE CHUNK                                                                   TAG

^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
        <1> <2><3>                |_______________|
                                          |
                               POINTERS TO CHUNK TAGS
</pre>
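In the example, the tag <code>&lt;3&gt;</code> inside the lexical units of <code>det_nom</code> is a pointer to the chunk's third tag, <code>&lt;sg&gt;</code>. A minimal Python sketch of extracting chunks and resolving such pointers follows; it assumes chunks are not nested, chunk names contain no whitespace, and chunk contents contain no escaped braces. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# ^name<tag>...{contents}$ ; non-greedy so that each chunk stops at its own "}$".
CHUNK = re.compile(r"\^([^\^$<>{}\s]+)((?:<[^<>]+>)*)\{(.*?)\}\$")
TAG = re.compile(r"<([^<>]+)>")
POINTER = re.compile(r"<(\d+)>")  # a tag made only of digits, e.g. <3>


def parse_chunks(stream: str) -> List[Tuple[str, List[str], str]]:
    """Return (name, chunk tags, contents) for each chunk in the stream."""
    return [
        (m.group(1), TAG.findall(m.group(2)), m.group(3))
        for m in CHUNK.finditer(stream)
    ]


def resolve_pointers(chunk_tags: List[str], contents: str) -> str:
    """Replace each <N> in the contents with the chunk's N-th tag (counting from 1)."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(chunk_tags):
            raise ValueError(f"<{index}> points past the chunk's {len(chunk_tags)} tags")
        return f"<{chunk_tags[index - 1]}>"

    return POINTER.sub(substitute, contents)
```

Because a pointer is only resolved when the chunk's contents are read, a change to the chunk tag (say <code>&lt;sg&gt;</code> to <code>&lt;pl&gt;</code>) would carry through to every linked word.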
 
 
==See also==
 
 
* [[List of symbols]]
 
 
 
 
[[Category:Documentation]]
 
[[Category:Formats]]
 

Revision as of 10:17, 22 June 2010
