Difference between revisions of "Apertium stream format"


Revision as of 22:16, 23 May 2008

This page describes the stream format used in the Apertium machine translation platform.

Special characters

  • Asterisk, '*' -- Unanalysed word.
  • At sign, '@' -- Untranslated lemma.
  • Hash sign, '#'
    • In morphological generation -- Unable to generate surface form from lexical unit.
    • In morphological analysis -- Start of invariable part of multiword marker.
  • Plus symbol, '+' -- Joins the morphemes of a lexical unit (see "JOINED MORPHEMES" under Analyses).
  • Tilde, '~' -- Word needs treating by post-generator.
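
As a rough illustration of how a consumer of the stream might interpret these markers, the Python sketch below maps a leading marker character to its meaning from the list above. It assumes the marker is the first character of the word, so it does not cover '+' or the analysis-time use of '#'. The names MARKERS and leading_marker are hypothetical and not part of Apertium.

```python
# Meanings copied from the list above. Assumption: the marker is the
# first character of the word. MARKERS and leading_marker are
# hypothetical names used only for this illustration.
MARKERS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "unable to generate surface form",
    "~": "needs treating by post-generator",
}


def leading_marker(word):
    """Return (meaning, bare_word); meaning is None if no marker is present."""
    if word and word[0] in MARKERS:
        return MARKERS[word[0]], word[1:]
    return None, word


if __name__ == "__main__":
    for w in ["*foo", "@bar", "#baz", "~a", "plain"]:
        print(w, "->", leading_marker(w))
```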

Formatted input

See also: Superblanks

F = formatted text, T = text to be analysed.

Formatted text, the material enclosed in square brackets, is ignored by all stages; only the text between the brackets is analysed.


[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]

|____|       |_______| |____|     |_______|
   |            |        |            |
   F            F        F            F
    
[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]
      |_____|         |      |___|
         |            |        |
         T            T        T
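
The F/T split shown above can be sketched in Python. This is a minimal illustration, not Apertium's own deformatter logic. It assumes that a backslash escapes the following character and that a formatted block runs from an unescaped '[' to the next unescaped ']'. The names TOKEN and split_formatted are hypothetical.

```python
import re

# Matches either an escaped character (backslash + any character) or a
# whole bracketed block. Assumption: backslash escapes the next character.
TOKEN = re.compile(r"\\.|\[(?:\\.|[^\]\\])*\]", re.DOTALL)


def split_formatted(stream):
    """Split a stream into ('F', text) and ('T', text) pairs, in order."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(stream):
        if not m.group().startswith("["):
            continue  # escaped character outside brackets: stays in the text
        if m.start() > pos:
            parts.append(("T", stream[pos:m.start()]))
        parts.append(("F", m.group()))
        pos = m.end()
    if pos < len(stream):
        parts.append(("T", stream[pos:]))
    return parts


if __name__ == "__main__":
    example = r"[<em>]this is[<\/em> ]a[ <b>]test.[][<\/b>]"
    for kind, text in split_formatted(example):
        print(kind, repr(text))
```

Unlike the diagram, the sketch reports the adjacent blocks `[]` and `[<\/b>]` as two separate F parts; merging neighbouring F parts would be a small extension.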

Analyses

S = surface form, L = lemma.


^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$

   |    | |________|
   S    L    TAGS
      |____________|
         ANALYSIS

|_____________________________________________|
          AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$

|______________|
 DISAMBIGUATED
  LEXICAL UNIT

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$

                                 |____________________________________________|
                                                JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
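
To make the structure of an ambiguous lexical unit concrete, here is a hedged Python sketch that splits one unit into its surface form and analyses, and each analysis into '+'-joined morphemes with lemma and tags. It assumes analyser output of the form `^surface/analysis/...$`, that '/' may be escaped with a backslash, and that an invariable multiword part, if present, trails the analysis after '#' as in the "take it away" example. Disambiguated units without a surface form are not handled. All function names are hypothetical.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def parse_morpheme(text):
    """Split 'lemma<tag1><tag2>' into a lemma and a list of tag names."""
    lemma = text.split("<", 1)[0]
    return {"lemma": lemma, "tags": TAG.findall(text)}


def parse_analysis(text):
    # Assumption: the invariable multiword part follows '#' at the end
    # of the analysis, as in the example above.
    body, sep, invariable = text.partition("#")
    return {
        "morphemes": [parse_morpheme(m) for m in body.split("+")],
        "invariable": invariable if sep else None,
    }


def parse_lexical_unit(unit):
    """Parse '^surface/analysis1/analysis2$' into a dictionary."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit delimited by ^ and $")
    fields = re.split(r"(?<!\\)/", unit[1:-1])  # split on unescaped '/'
    return {
        "surface": fields[0],
        "analyses": [parse_analysis(a) for a in fields[1:]],
    }


if __name__ == "__main__":
    lu = parse_lexical_unit("^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$")
    print(lu["surface"])
    for analysis in lu["analyses"]:
        print(analysis)
```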

Chunks


^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$

   |   |______________________||__________________________|                                                          |
 CHUNK      CHUNK TAGS              LEXICAL UNITS IN                                                               LINKED
  NAME                                  THE CHUNK                                                                   TAG

   |________________________________________|
                       |
                     CHUNK
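
The chunk layout in the diagram can be read back with a short Python sketch. It is an illustration under stated assumptions, not the parser Apertium uses: chunks are taken to have the shape `^name<tags>{...}$`, nested chunks and escaped characters are not handled, and linked tags such as `<3>` are kept verbatim inside the lexical units. The names CHUNK, UNIT and parse_chunks are hypothetical.

```python
import re

# Assumed shape of one chunk: ^name<tag>...{lexical units}$
CHUNK = re.compile(r"\^([^<{]*)((?:<[^<>]+>)*)\{(.*?)\}\$")
TAG = re.compile(r"<([^<>]+)>")
UNIT = re.compile(r"\^(.*?)\$")


def parse_chunks(stream):
    """Return one dictionary per chunk: name, chunk tags, lexical units."""
    chunks = []
    for name, tags, body in CHUNK.findall(stream):
        chunks.append({
            "name": name,
            "tags": TAG.findall(tags),
            "units": UNIT.findall(body),  # linked tags like <3> are left as-is
        })
    return chunks


if __name__ == "__main__":
    stream = (
        "^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ "
        "^pr<PREP>{^to<pr>$}$ "
        "^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$"
    )
    for chunk in parse_chunks(stream):
        print(chunk)
```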

See also