Difference between revisions of "Apertium stream format"

{{TOCD}}
This page describes the stream format used in the Apertium machine translation platform.

==Characters==

===Reserved===

Reserved characters must be escaped with a backslash (<code>\</code>) wherever they appear in the input stream as ordinary text; they appear unescaped only when they form part of a lexical unit, chunk or superblank.

* The characters <code>^</code> and <code>$</code> are reserved for delimiting lexical units
* The character <code>/</code> is reserved for delimiting analyses in ambiguous lexical units
* The characters <code>&lt;</code> and <code>&gt;</code> are reserved for encapsulating tags
* The characters <code>{</code> and <code>}</code> are reserved for delimiting chunks
* The characters <code>[</code> and <code>]</code> are reserved for delimiting superblanks
* The character <code>\</code> is the escape character
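
As a rough illustration of the escaping rule, the following Python sketch backslash-escapes reserved characters in raw text and undoes that escaping. The names <code>RESERVED</code>, <code>escape_reserved</code> and <code>unescape</code> are hypothetical and are not part of any Apertium tool; the character set is an assumption based on the list above, plus the square brackets used for superblanks.

```python
# Characters treated as reserved in this sketch (an assumption based on
# the list above, plus the square brackets used for superblanks).
RESERVED = set("^$/<>{}\\[]")


def escape_reserved(text: str) -> str:
    """Prefix every reserved character in raw text with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Remove one level of backslash escaping."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep the character that follows the backslash literally.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)
```

For example, <code>escape_reserved("a/b")</code> gives <code>a\/b</code>, and <code>unescape</code> restores the original text.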

===Special===

* Asterisk, '<code><nowiki>*</nowiki></code>' -- Unanalysed word.
* At sign, '<code><nowiki>@</nowiki></code>' -- Untranslated [[lemma]].
* Hash sign, '<code><nowiki>#</nowiki></code>'
** In morphological generation -- Unable to generate [[surface form]] from [[lexical unit]].
** In morphological analysis -- Start of invariable part of multiword marker.
* Plus symbol, '<code><nowiki>+</nowiki></code>' -- Joined lexical units
* Tilde, '<code><nowiki>~</nowiki></code>' -- Word needs to be processed by the post-generator.
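
A minimal sketch of how a consumer of the stream might recognise these markers, assuming the marker is the first character of a word. <code>MARKS</code> and <code>leading_mark</code> are hypothetical names. The plus symbol is left out because it joins lexical units rather than prefixing a word, and only the generation meaning of the hash sign is covered.

```python
# Meanings copied from the list above; keys are the leading marker characters.
MARKS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "unable to generate surface form",
    "~": "needs treating by post-generator",
}


def leading_mark(token: str):
    """Return (description or None, token without its leading marker)."""
    if token and token[0] in MARKS:
        return MARKS[token[0]], token[1:]
    return None, token
```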

==Formatted input==
{{see-also|Superblanks}}

F = formatted text, T = text to be analysed.

Formatted text (a superblank, delimited by square brackets) is treated as a single whitespace character by every stage of the pipeline.

<pre>

[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
|____|       |_______| |____|     |_______|
   |             |        |           |
   F             F        F           F

[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
</pre>
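
The F/T split shown above can be sketched in Python as follows. This is only an illustration, not the deformatter's actual implementation: <code>TOKEN</code> and <code>split_formatted</code> are hypothetical names, and the sketch assumes that every superblank is closed and that a backslash always escapes the character after it.

```python
import re

# Group 1: a superblank, i.e. "[", then escaped characters or anything
# other than "]" and "\", then "]".
# Group 2: a run of text, i.e. escaped characters or anything other than
# "[" and "\".
TOKEN = re.compile(r"(\[(?:\\.|[^\]\\])*\])|((?:\\.|[^\[\\])+)", re.S)


def split_formatted(stream: str):
    """Return ("F", text) and ("T", text) pairs in stream order."""
    parts = []
    for m in TOKEN.finditer(stream):
        if m.group(1) is not None:
            parts.append(("F", m.group(1)))
        else:
            parts.append(("T", m.group(2)))
    return parts
```

Applied to the example stream, this yields the F segments <code>[<em>]</code>, <code>[<\/em> ]</code>, <code>[ <b>]</code>, <code>[]</code> and <code>[<\/b>]</code>, and the T segments <code>this is</code>, <code>a</code> and <code>test.</code>.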

==Analyses==

S = surface form, L = lemma.

<pre>

^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
  |    |  |________|
  S    L     TAGS
      |____________|
         ANALYSIS
|_____________________________________________|
            AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$
|______________|
 DISAMBIGUATED
  LEXICAL UNIT

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$
                                 |____________________________________________|
                                                JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
              |__|                                              |____|
               |                                                  |
           LEMMA HEAD                                        LEMMA QUEUE

</pre>
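
To make the structure above concrete, here is a hedged Python sketch that splits an ambiguous lexical unit into its surface form and analyses. The function names are hypothetical. The sketch assumes the surface form comes first (so it does not suit disambiguated units, which carry no surface form), honours backslash escapes when splitting, and ignores any invariable part after the tags, such as <code># away</code>.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def split_unescaped(text: str, sep: str):
    """Split text on sep, skipping separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair intact
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_reading(reading: str):
    """Split one analysis into joined (lemma, tags) pairs on '+'."""
    result = []
    for piece in split_unescaped(reading, "+"):
        first_tag = piece.find("<")
        lemma = piece if first_tag == -1 else piece[:first_tag]
        result.append((lemma, TAG.findall(piece)))
    return result


def parse_lexical_unit(unit: str):
    """Return (surface form, analyses) for a string such as ^a/b<n>$."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    surface, *readings = split_unescaped(unit[1:-1], "/")
    return surface, [parse_reading(r) for r in readings]
```

For the first example above, the surface form is <code>vino</code> and the two analyses have the lemmas <code>vino</code> (tags <code>n</code>, <code>m</code>, <code>sg</code>) and <code>venir</code> (tags <code>vblex</code>, <code>ifi</code>, <code>p3</code>, <code>sg</code>).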

==Chunks==
{{see-also|Chunks}}
<pre>

^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
   |   |______________________||__________________________|                                                          |
 CHUNK        CHUNK TAGS             LEXICAL UNITS IN                                                             LINKED
  NAME                                  THE CHUNK                                                                   TAG
|__________________________________________________________|
                             |
                           CHUNK

^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
        |_________|
             |
  POINTERS TO CHUNK TAGS
        <1> <2> <3>
</pre>
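
The chunk layout and the linked tags can be illustrated with the Python sketch below. It is a simplified reading of the examples above, not Apertium's own parser: <code>parse_chunks</code> and <code>resolve_links</code> are hypothetical names, escaped characters are not handled, and a numeric tag <code><n></code> is assumed to point to the n-th chunk tag, counting from 1.

```python
import re

# Chunk name, chunk tags, then the lexical units between "{" and "}".
CHUNK = re.compile(r"\^([^<{]+)((?:<[^<>]+>)*)\{(.*?)\}\$")
TAG = re.compile(r"<([^<>]+)>")
UNIT = re.compile(r"\^(.*?)\$")


def parse_chunks(stream: str):
    """Return one dict per chunk with its name, tags and lexical units."""
    chunks = []
    for name, tags, body in CHUNK.findall(stream):
        chunks.append({
            "name": name,
            "tags": TAG.findall(tags),
            "units": UNIT.findall(body),
        })
    return chunks


def resolve_links(chunk):
    """Replace numeric tags such as <3> with the chunk tag they point to."""
    def lookup(m):
        tag = m.group(1)
        if tag.isdigit() and 1 <= int(tag) <= len(chunk["tags"]):
            return "<" + chunk["tags"][int(tag) - 1] + ">"
        return m.group(0)  # not a valid pointer, leave unchanged
    return [TAG.sub(lookup, unit) for unit in chunk["units"]]
```

For the <code>det_nom</code> chunk above, the chunk tags are <code>SN</code>, <code>f</code> and <code>sg</code>, so <code><3></code> resolves to <code><sg></code> in both lexical units.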

==See also==

* [[List of symbols]]



[[Category:Documentation]]
[[Category:Formats]]

Revision as of 10:17, 22 June 2010
