Apertium stream format

Revision as of 14:10, 16 April 2008 by Francis Tyers (talk | contribs)

This page describes the stream format used in the Apertium machine translation platform.

Special characters

  • Asterisk, '*' -- Unanalysed word.
  • At sign, '@' -- Untranslated lemma.
  • Hash sign, '#'
    • In morphological generation -- Unable to generate surface form from lexical unit.
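    • In morphological analysis -- Start of the invariable part of a multiword.
  • Plus symbol, '+' -- Joins the morphemes of a single surface form (for example, a verb and its enclitic pronouns).
  • Tilde, '~' -- Word needs to be processed by the post-generator.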
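
As a rough illustration of how these markers might be checked in translator output, here is a minimal sketch. It assumes the marker is the first character of the word, and it reads '#' in its generation sense only. The names `MARKERS` and `classify` are hypothetical and not part of Apertium.

```python
# Hypothetical helper: the names below are illustrative, not Apertium APIs.
MARKERS = {
    "*": "unanalysed word",
    "@": "untranslated lemma",
    "#": "surface form could not be generated",
    "~": "needs the post-generator",
}


def classify(token: str) -> tuple[str, str]:
    """Return (description, bare word) for one output token.

    Assumes the marker, if any, is the first character of the token.
    """
    if token and token[0] in MARKERS:
        return MARKERS[token[0]], token[1:]
    return "no marker", token


if __name__ == "__main__":
    for word in ["*foo", "@casa", "#venir", "~de", "vino"]:
        print(word, "->", classify(word))
```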

Analyses

S = surface form, L = lemma.


^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$

   |    | |________|
   S    L    TAGS
      |____________|
         ANALYSIS

|_____________________________________________|
          AMBIGUOUS LEXICAL UNIT

^vino<n><m><sg>$

|______________|
 DISAMBIGUATED
  LEXICAL UNIT
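
The layout above can be read mechanically. The following is a minimal sketch, assuming the unit contains no escaped characters and that a unit without '/' is already disambiguated. `Analysis` and `parse_lexical_unit` are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Analysis(NamedTuple):
    lemma: str
    tags: List[str]


def parse_analysis(text: str) -> Analysis:
    """Split 'vino<n><m><sg>' into the lemma and its list of tags."""
    lemma = text.split("<", 1)[0]
    tags = re.findall(r"<([^<>]+)>", text)
    return Analysis(lemma, tags)


def parse_lexical_unit(unit: str) -> Tuple[Optional[str], List[Analysis]]:
    """Parse '^S/L<tags>/L<tags>$' (ambiguous) or '^L<tags>$' (disambiguated).

    Assumption: no escaped '^', '$' or '/' occurs inside the unit.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("a lexical unit must start with '^' and end with '$'")
    body = unit[1:-1]
    if "/" in body:
        surface, *rest = body.split("/")
    else:
        # Disambiguated unit: no surface form, a single analysis.
        surface, rest = None, [body]
    return surface, [parse_analysis(a) for a in rest]


if __name__ == "__main__":
    print(parse_lexical_unit("^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$"))
    print(parse_lexical_unit("^vino<n><m><sg>$"))
```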

^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$

                                 |____________________________________________|
                                                JOINED MORPHEMES

^take it away/take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away/take<vblex><sep><pres>+prpers<prn><obj><p3><nt><sg># away$
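
In the two examples above, '+' separates the joined morphemes within one analysis, and '#' introduces a trailing part of the multiword (" away") that is carried along unchanged. A minimal sketch of splitting one analysis along those characters follows. It assumes no escaped characters, and `split_analysis` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

Morpheme = Tuple[str, List[str]]


def split_analysis(analysis: str) -> Tuple[List[Morpheme], Optional[str]]:
    """Split one analysis into its joined morphemes and the part after '#'.

    Assumption: '+' and '#' are not escaped anywhere in the analysis.
    """
    head, sep, tail = analysis.partition("#")
    morphemes = []
    for piece in head.split("+"):
        lemma = piece.split("<", 1)[0]
        tags = re.findall(r"<([^<>]+)>", piece)
        morphemes.append((lemma, tags))
    # The text after '#' is kept verbatim, including its leading space.
    return morphemes, (tail if sep else None)


if __name__ == "__main__":
    print(split_analysis("decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>"))
    print(split_analysis("take<vblex><sep><inf>+prpers<prn><obj><p3><nt><sg># away"))
```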

Chunks


^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ ^pr<PREP>{^to<pr>$}$ ^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$

   |   |______________________||__________________________|                                                          |
 CHUNK      CHUNK TAGS              LEXICAL UNITS IN                                                               LINKED
  NAME                                  THE CHUNK                                                                   TAG

|__________________________________________________________|
                             |
                           CHUNK
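
The chunk layout shown in the diagram (name, chunk tags, then lexical units inside braces) can be pulled apart with a regular expression. This is a minimal sketch, assuming chunks are not nested and no characters are escaped. `CHUNK_RE` and `parse_chunks` are hypothetical names. Numeric tags such as <3> are returned as-is and are not resolved.

```python
import re
from typing import List, Tuple

# ^NAME<tag>...{ lexical units }$
CHUNK_RE = re.compile(r"\^([^<{]+)((?:<[^<>]+>)*)\{(.*?)\}\$")


def parse_chunks(stream: str) -> List[Tuple[str, List[str], List[str]]]:
    """Return (chunk name, chunk tags, inner lexical units) for each chunk.

    Assumptions: chunks are not nested and nothing is escaped.
    """
    chunks = []
    for name, tags, body in CHUNK_RE.findall(stream):
        chunk_tags = re.findall(r"<([^<>]+)>", tags)
        units = re.findall(r"\^(.*?)\$", body)
        chunks.append((name, chunk_tags, units))
    return chunks


if __name__ == "__main__":
    stream = (
        "^Verbcj<SV><vblex><ifi><p3><sg>{^come<vblex><ifi><p3><sg>$}$ "
        "^pr<PREP>{^to<pr>$}$ "
        "^det_nom<SN><f><sg>{^the<det><def><3>$ ^beach<n><3>$}$"
    )
    for chunk in parse_chunks(stream):
        print(chunk)
```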

See also