Difference between revisions of "Apertium stream format"
{{TOCD}}

This page describes the stream format used in the Apertium machine translation platform.

==Characters==

===Reserved===

Reserved characters should appear in the input stream only in escaped form, unless they form part of the markup of a lexical unit, chunk or superblank.

* The characters <code>^</code> and <code>$</code> are reserved for delimiting lexical units
* The character <code>/</code> is reserved for delimiting analyses in ambiguous lexical units
* The characters <code>&lt;</code> and <code>&gt;</code> are reserved for encapsulating tags
* The characters <code>{</code> and <code>}</code> are reserved for delimiting chunks
* The character <code>\</code> is the escape character

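As a rough illustration of the escaping rule, the following Python sketch escapes and unescapes plain text. The names <code>RESERVED</code>, <code>escape</code> and <code>unescape</code> are hypothetical and not part of Apertium. The set holds only the characters listed above; real tools may treat further characters (for example the square brackets of superblanks) as reserved.

```python
# Reserved characters exactly as listed above. This set is an assumption of the
# sketch and can be extended, for example with the superblank brackets.
RESERVED = set("^$/<>{}\\")


def escape(text: str) -> str:
    """Prefix every reserved character in plain text with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Remove one level of backslash escaping, keeping the escaped character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep whatever follows the backslash; a lone trailing backslash is dropped.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


print(escape("1/2 costs $5"))  # 1\/2 costs \$5
```

With this choice, <code>unescape(escape(s))</code> returns <code>s</code> unchanged for any string, because the backslash itself is in the reserved set.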
===Special===

* Asterisk, '<code><nowiki>*</nowiki></code>' -- Unanalysed word.
* At sign, '<code><nowiki>@</nowiki></code>' -- Untranslated [[lemma]].
* Hash sign, '<code><nowiki>#</nowiki></code>'
** In morphological generation -- Unable to generate [[surface form]] from [[lexical unit]].
** In morphological analysis -- Marks the start of the invariable part of a multiword.
* Plus symbol, '<code><nowiki>+</nowiki></code>' -- Joined lexical units.
* Tilde, '<code><nowiki>~</nowiki></code>' -- Word needs to be processed by the post-generator.

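The symbols above can be recognised with a small lookup. The sketch below assumes that the symbol is the first character of the form being inspected, which is not stated on this page. The names <code>LEADING_MARKERS</code> and <code>leading_marker</code> are hypothetical. The plus symbol is left out because it joins lexical units rather than prefixing one, and the hash sign is described only in its generation sense.

```python
# Meaning of each symbol when it is the first character of a form.
# Assumption of this sketch: the symbol is a prefix of the form.
LEADING_MARKERS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "unable to generate surface form",
    "~": "needs post-generation",
}


def leading_marker(form: str):
    """Return (meaning, rest) if form starts with a special symbol, else (None, form)."""
    if form[:1] in LEADING_MARKERS:
        return LEADING_MARKERS[form[0]], form[1:]
    return None, form


print(leading_marker("*vino"))  # ('unanalysed word', 'vino')
print(leading_marker("vino"))   # (None, 'vino')
```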
==Formatted input==

{{see-also|Superblanks}}

F = formatted text, T = text to be analysed.

Formatted text is treated as a single whitespace character by all stages.

<pre>
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
|____|       |_______| |____|     |_______|
  |              |       |            |
  F              F       F            F

[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
</pre>

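The split into formatted text and text to be analysed can be sketched in Python as follows. The function name <code>split_blanks</code> is hypothetical, and the sketch assumes that formatted text is exactly what sits between an unescaped <code>[</code> and the next unescaped <code>]</code>, as in the diagram above.

```python
def split_blanks(stream: str):
    """Split a stream into ("F", text) and ("T", text) segments.

    Characters preceded by a backslash never open or close a segment.
    """
    segments = []
    buf = []
    kind = "T"
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            buf.append(stream[i:i + 2])  # keep the escape pair together
            i += 2
            continue
        if ch == "[" and kind == "T":
            if buf:
                segments.append(("T", "".join(buf)))
            buf = ["["]
            kind = "F"
        elif ch == "]" and kind == "F":
            buf.append("]")
            segments.append(("F", "".join(buf)))
            buf = []
            kind = "T"
        else:
            buf.append(ch)
        i += 1
    if buf:
        segments.append((kind, "".join(buf)))
    return segments


segments = split_blanks(r"[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]")
# Replacing each formatted segment by one space leaves the text to be analysed.
print(" ".join(text for kind, text in segments if kind == "T"))  # this is a test.
```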
==Analyses==

S = surface form, L = lemma.

<pre>
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
  |    |  |________|
  S    L     TAGS
      |____________|
         ANALYSIS
|_____________________________________________|
            AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$
|______________|
 DISAMBIGUATED
  LEXICAL UNIT

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$
                                 |____________________________________________|
                                                JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
              |__|                                              |____|
               |                                                  |
           LEMMA HEAD                                        LEMMA QUEUE
</pre>

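The structure shown above can be read with a short parser. This is a minimal Python sketch, not Apertium's own implementation. The names <code>split_unescaped</code>, <code>parse_reading</code> and <code>parse_lexical_unit</code> are hypothetical, and the sketch assumes that a unit without any <code>/</code> is a disambiguated unit with no surface form.

```python
import re

# Lemma (escapes allowed), then a run of <tags>, then trailing text such as "# away".
READING = re.compile(r"((?:\\.|[^\\<])*)((?:<[^<>]+>)*)(.*)", re.S)


def split_unescaped(text, sep):
    """Split text on sep, skipping separators that follow a backslash."""
    parts, buf, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])  # keep the escape pair together
            i += 2
        elif text[i] == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_reading(reading):
    """Turn 'lemma<t1><t2># queue' into its lemma, tag list and lemma queue."""
    lemma, tags, queue = READING.fullmatch(reading).groups()
    return {"lemma": lemma, "tags": re.findall(r"<([^<>]+)>", tags), "queue": queue}


def parse_lexical_unit(unit):
    """Parse '^surface/analysis/...$'; each analysis is a list of joined readings."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("a lexical unit must be delimited by ^ and $")
    parts = split_unescaped(unit[1:-1], "/")
    if len(parts) == 1:
        surface, analyses = None, parts  # disambiguated unit: one analysis only
    else:
        surface, analyses = parts[0], parts[1:]
    return {
        "surface": surface,
        "analyses": [[parse_reading(r) for r in split_unescaped(a, "+")]
                     for a in analyses],
    }


unit = parse_lexical_unit("^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$")
print(unit["surface"], [a[0]["lemma"] for a in unit["analyses"]])  # vino ['vino', 'venir']
```

Each analysis is returned as a list so that joined lexical units (split on <code>+</code>) and single readings are handled the same way.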
==Chunks==

{{see-also|Chunks}}

<pre>
^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
   |   |______________________||__________________________|                                                          |
 CHUNK        CHUNK TAGS             LEXICAL UNITS IN                                                             LINKED
  NAME                                  THE CHUNK                                                                   TAG
|__________________________________________________________|
                             |
                           CHUNK

^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$
        |_________|
             |
  POINTERS TO CHUNK TAGS
        <1> <2> <3>
</pre>

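The chunk layout and the linked tags can be illustrated with a short Python sketch. The names <code>parse_chunks</code> and <code>resolve_links</code> are hypothetical. The sketch assumes that chunks are not nested, that no reserved characters appear escaped inside them, and that a numeric tag <code><n></code> points to the n-th chunk tag counting from 1, as the diagram above suggests.

```python
import re

# ^name<tag>...{ ...lexical units... }$  (non-nested, no escaped characters)
CHUNK = re.compile(r"\^([^<{]+)((?:<[^<>]+>)*)\{(.*?)\}\$")


def parse_chunks(stream):
    """Return one dict per chunk with its name, chunk tags and inner lexical units."""
    chunks = []
    for name, tags, body in CHUNK.findall(stream):
        chunks.append({
            "name": name,
            "tags": re.findall(r"<([^<>]+)>", tags),
            "units": re.findall(r"\^(.*?)\$", body),
        })
    return chunks


def resolve_links(chunk):
    """Replace numeric tags such as <3> by the chunk tag at that 1-based position."""
    tags = chunk["tags"]

    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave the tag untouched when it points outside the chunk tags.
        return "<" + tags[index] + ">" if 0 <= index < len(tags) else match.group(0)

    return [re.sub(r"<(\d+)>", substitute, unit) for unit in chunk["units"]]


stream = ("^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ "
          "^pr<PREP>{^to<pr>$}$ "
          "^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$")
chunks = parse_chunks(stream)
print([c["name"] for c in chunks])  # ['Verbcj', 'pr', 'det_nom']
print(resolve_links(chunks[2]))     # ['the<det><def><sg>', 'beach<n><sg>']
```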
==See also==

* [[List of symbols]]

[[Category:Documentation]]
[[Category:Formats]]
Revision as of 10:17, 22 June 2010