Difference between revisions of "Apertium system architecture"

From Apertium
(link to workflow reference)
 
(26 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  +
The Apertium RBMT system uses a "pipeline" in which each stage performs a specific task and passes its output along to the next stage. This page provides an [[#The pipeline|overview of the pipeline]], information about [[#The stages|how the data in each stage is stored]], and an [[#Example translation at each stage|example of data being passed through the pipeline]] to create an output translation.
  +
  +
Details of what many of these modules do can be found [[Workflow_reference|here]].
  +
 
== The pipeline ==
 
== The pipeline ==
[[File:Apertium_system_architecture.png|1000px]]
+
[[File:Apertium_system_architecture.png|1200px]]
   
 
== The stages ==
 
== The stages ==
Line 6: Line 10:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! stage
+
!colspan="2"| stage
 
! introduced
 
! introduced
 
! filenames
 
! filenames
Line 12: Line 16:
 
! documentation
 
! documentation
 
|-
 
|-
! morphological tagger
+
!colspan="2"| morphological tagger
 
| 2004
 
| 2004
|
+
|
 
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
 
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
  +
| —
|
 
 
|-
 
|-
! morphological analysis
+
!colspan="2"| morphological analysis
 
| 2004
 
| 2004
 
| <code>apertium-xxx.xxx.lexc</code> and<br /><code>apertium-xxx.xxx.twol</code> and<br /> <code>apertium-xxx.xxx.twoc</code>,<br />OR <code>apertium-xxx.xxx.dix</code>
 
| <code>apertium-xxx.xxx.lexc</code> and<br /><code>apertium-xxx.xxx.twol</code> and<br /> <code>apertium-xxx.xxx.twoc</code>,<br />OR <code>apertium-xxx.xxx.dix</code>
Line 24: Line 28:
 
|
 
|
 
|-
 
|-
! morphological disambiguation
+
!colspan="2"| morphological disambiguation
 
| 2004, 2008
 
| 2004, 2008
 
| <code>apertium-xxx.xxx.rlx</code>
 
| <code>apertium-xxx.xxx.rlx</code>
 
| <code>xxx-yyy-disam</code>, <code>xxx-disam</code>
 
| <code>xxx-yyy-disam</code>, <code>xxx-disam</code>
  +
| [[Constraint Grammar]]
|
 
 
|-
 
|-
! discontiguous multiword processing
+
!colspan="2"| discontiguous multiword assembly (optional)
| 2017, in progress
+
| 2017
  +
| <code>apertium-xxx-yyy.xxx-yyy.lsx</code>
|
 
  +
| <code>xxx-yyy-autoseq</code>
|
 
| [[Lsx module]]
+
| [[Apertium separable]]
 
|-
 
|-
! lexical transfer
+
!colspan="2"| lexical transfer
 
| 2004
 
| 2004
 
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
  +
| <code>xxx-yyy-biltrans</code>
|
 
  +
| [[Bilingual dictionary]]
|
 
 
|-
 
|-
! lexical selection
+
!colspan="2"| lexical selection
 
| 2012
 
| 2012
 
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
  +
| <code>xxx-yyy-lex</code>
|
 
  +
| [[Lexical selection]]
|
 
 
|-
 
|-
  +
!colspan="2"| anaphora resolution (optional)
! structural transfer
 
  +
| 2019, in progress
|
 
  +
| <code>apertium-xxx-yyy.xxx-yyy.arx</code>
|
 
  +
| <code>xxx-yyy-anaphora</code>
|
 
  +
| [[Anaphora Resolution Module]]
  +
|-
  +
!colspan="2"| pre-transfer
 
|
 
|
  +
| —
  +
| <code>xxx-yyy-pretransfer</code>
  +
| —
 
|-
 
|-
  +
!rowspan="3"| shallow structural transfer
 
! chunker
 
! chunker
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
  +
| <code>xxx-yyy-chunker</code>
|
 
  +
|rowspan="3" | [[Contributing to an existing pair#Adding structural transfer (grammar) rules]]
|
 
 
|-
 
|-
 
! interchunk
 
! interchunk
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
  +
| <code>xxx-yyy-interchunk</code>
|
 
|
 
 
|-
 
|-
 
! postchunk
 
! postchunk
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
  +
| <code>xxx-yyy-postchunk</code>
|
 
|
 
 
|-
 
|-
  +
!colspan="2"| recursive structural transfer
! morphological generation
 
  +
| 2019, in progress
  +
| <code>apertium-xxx-yyy.xxx-yyy.rtx</code>
  +
| <code>xxx-yyy-rectransfer</code>
  +
| [[Apertium-recursive]]
  +
|-
  +
!colspan="2"| discontiguous multiword disassembly (optional)
  +
| 2017
  +
| <code>apertium-xxx-yyy.yyy-xxx.lsx</code>
  +
| <code>xxx-yyy-revautoseq</code>
  +
| [[Apertium separable]]
  +
|-
  +
!colspan="2"| morphological generation
 
| 2004
 
| 2004
 
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
 
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
  +
| <code>xxx-yyy-dgen</code> or <code>xxx-yyy-gener</code> or <code>xxx-yyy-generador</code>
|
 
 
|
 
|
 
|-
 
|-
! post-generation
+
!colspan="2"| post-generation
 
| 2004
 
| 2004
  +
| <code>apertium-yyy.post-yyy.dix</code>
|
 
  +
| <code>xxx-yyy-pgen</code>
|
 
  +
| [[Post-generator]]
|
 
 
|}
 
|}
   
Line 88: Line 109:
 
deformatter, reformatter
 
deformatter, reformatter
   
=== Example translation at each stage ===
+
== Example translation at each stage ==
  +
  +
Note that this example shows the output of every module, even though the English-Spanish pair does not currently use all of them.
  +
  +
=== Input text ===
  +
John said he took the big plant out to the yard.
  +
  +
=== Morphological analyzer ===
  +
  +
^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
  +
  +
=== Morphological disambiguator ===
  +
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
  +
  +
=== Discontiguous multiword processing ===
  +
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
  +
  +
=== Lexical transfer ===
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
  +
  +
=== Lexical selection ===
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
  +
  +
=== Anaphora resolution ===
  +
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$
  +
  +
=== Structural transfer ===
  +
^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$
  +
  +
=== Morphological generator ===
  +
John dijo que sacó la planta grande ~a el patio.
  +
  +
=== Post-generator ===
  +
  +
John dijo que sacó la planta grande al patio.
   
 
== See also ==
 
== See also ==

Latest revision as of 14:35, 28 July 2020

The Apertium RBMT system uses a "pipeline" in which each stage performs a specific task and passes its output along to the next stage. This page provides an overview of the pipeline, information about how the data in each stage is stored, and an example of data being passed through the pipeline to create an output translation.

Details of what many of these modules do can be found in the Workflow reference.

The pipeline

Apertium system architecture.png

The stages

Linguistic data

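{| class="wikitable"
|-
!colspan="2"| stage
! introduced
! filenames
! mode
! documentation
|-
!colspan="2"| morphological tagger
| 2004
|
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
|
|-
!colspan="2"| morphological analysis
| 2004
| <code>apertium-xxx.xxx.lexc</code> and<br /><code>apertium-xxx.xxx.twol</code> and<br /><code>apertium-xxx.xxx.twoc</code>,<br />OR <code>apertium-xxx.xxx.dix</code>
| <code>xxx-yyy-morph</code>, <code>xxx-morph</code>
|
|-
!colspan="2"| morphological disambiguation
| 2004, 2008
| <code>apertium-xxx.xxx.rlx</code>
| <code>xxx-yyy-disam</code>, <code>xxx-disam</code>
| [[Constraint Grammar]]
|-
!colspan="2"| discontiguous multiword assembly (optional)
| 2017
| <code>apertium-xxx-yyy.xxx-yyy.lsx</code>
| <code>xxx-yyy-autoseq</code>
| [[Apertium separable]]
|-
!colspan="2"| lexical transfer
| 2004
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
| <code>xxx-yyy-biltrans</code>
| [[Bilingual dictionary]]
|-
!colspan="2"| lexical selection
| 2012
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
| <code>xxx-yyy-lex</code>
| [[Lexical selection]]
|-
!colspan="2"| anaphora resolution (optional)
| 2019, in progress
| <code>apertium-xxx-yyy.xxx-yyy.arx</code>
| <code>xxx-yyy-anaphora</code>
| [[Anaphora Resolution Module]]
|-
!colspan="2"| pre-transfer
|
|
| <code>xxx-yyy-pretransfer</code>
|
|-
!rowspan="3"| shallow structural transfer
! chunker
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
| <code>xxx-yyy-chunker</code>
|rowspan="3" | [[Contributing to an existing pair#Adding structural transfer (grammar) rules]]
|-
! interchunk
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
| <code>xxx-yyy-interchunk</code>
|-
! postchunk
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
| <code>xxx-yyy-postchunk</code>
|-
!colspan="2"| recursive structural transfer
| 2019, in progress
| <code>apertium-xxx-yyy.xxx-yyy.rtx</code>
| <code>xxx-yyy-rectransfer</code>
| [[Apertium-recursive]]
|-
!colspan="2"| discontiguous multiword disassembly (optional)
| 2017
| <code>apertium-xxx-yyy.yyy-xxx.lsx</code>
| <code>xxx-yyy-revautoseq</code>
| [[Apertium separable]]
|-
!colspan="2"| morphological generation
| 2004
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
| <code>xxx-yyy-dgen</code> or <code>xxx-yyy-gener</code> or <code>xxx-yyy-generador</code>
|
|-
!colspan="2"| post-generation
| 2004
| <code>apertium-yyy.post-yyy.dix</code>
| <code>xxx-yyy-pgen</code>
| [[Post-generator]]
|}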
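
The naming patterns in the table can be made concrete by filling in the placeholders. The sketch below reads <code>xxx</code> as the source language code and <code>yyy</code> as the target language code, which is how the analysis and generation rows use them. The function name <code>stage_names</code> and the codes <code>eng</code> and <code>spa</code> are illustrative assumptions, and only a few rows of the table are covered.

```python
def stage_names(src: str, trg: str) -> dict[str, tuple[str, str]]:
    """Return (data file, mode name) for a few stages, following the table's patterns.

    src and trg stand in for the xxx and yyy placeholders; this helper is
    hypothetical and is not part of Apertium itself.
    """
    pair = f"{src}-{trg}"
    return {
        "morphological disambiguation": (f"apertium-{src}.{src}.rlx", f"{pair}-disam"),
        "lexical transfer": (f"apertium-{pair}.{pair}.dix", f"{pair}-biltrans"),
        "lexical selection": (f"apertium-{pair}.{pair}.lrx", f"{pair}-lex"),
        "chunker": (f"apertium-{pair}.{pair}.t1x", f"{pair}-chunker"),
        "post-generation": (f"apertium-{trg}.post-{trg}.dix", f"{pair}-pgen"),
    }


# Print the expected names for an assumed English-to-Spanish pair.
for stage, (data_file, mode) in stage_names("eng", "spa").items():
    print(f"{stage}: {data_file} ({mode})")
```

Monolingual data (analysis, disambiguation, generation, post-generation) lives in files named after a single language, while the transfer stages use files named after the pair.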

Apertium-internal

deformatter, reformatter

Example translation at each stage

Note that this example shows the output of every module, even though the English-Spanish pair does not currently use all of them.

Input text

John said he took the big plant out to the yard.

Morphological analyzer

^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
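
In the analyzer output above, each unit sits between <code>^</code> and <code>$</code>, with the surface form first and the candidate analyses after it, separated by <code>/</code>. The following is a minimal sketch of reading that layout. The function name <code>parse_analyses</code> is an assumption made for illustration, and the sketch does not handle escaped characters or other details that real Apertium output may contain.

```python
import re

# One lexical unit: everything between "^" and the next "$".
LU_PATTERN = re.compile(r"\^(.*?)\$")


def parse_analyses(stream: str) -> list[tuple[str, list[str]]]:
    """Split analyzer output into (surface form, list of analyses) pairs."""
    units = []
    for body in LU_PATTERN.findall(stream):
        # The first field is the surface form; the rest are candidate analyses.
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


sample = (
    "^said/say<vblex><past>/say<vblex><pp>$ "
    "^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$"
)
for surface, analyses in parse_analyses(sample):
    print(surface, len(analyses))
# prints:
# said 2
# plant 4
```

Units with more than one analysis, such as "said" and "plant" here, are the ones the disambiguator has to resolve.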

Morphological disambiguator

^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$

Discontiguous multiword processing

^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$

Lexical transfer

^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$

Lexical selection

^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
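
The difference between the lexical transfer and lexical selection outputs is easiest to see on a single unit: "plant" has three candidate translations after transfer and one after selection. The sketch below extracts the target lemmas from such a unit. The helper names <code>translations</code> and <code>lemma</code> are illustrative assumptions, and the parsing is simplified (no escape handling).

```python
import re


def lemma(reading: str) -> str:
    """Return the part of a reading before its first tag, e.g. 'planta<n><f><sg>' -> 'planta'."""
    return reading.split("<", 1)[0]


def translations(stream: str) -> dict[str, list[str]]:
    """Map each source-language reading to the lemmas of its target-language candidates."""
    result = {}
    for body in re.findall(r"\^(.*?)\$", stream):
        # In bilingual output the first field is the source reading,
        # and the remaining fields are candidate translations.
        source, *targets = body.split("/")
        result[source] = [lemma(t) for t in targets]
    return result


after_transfer = "^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$"
after_selection = "^plant<n><sg>/planta<n><f><sg>$"

print(translations(after_transfer))   # {'plant<n><sg>': ['planta', 'fábrica', 'maquinaria']}
print(translations(after_selection))  # {'plant<n><sg>': ['planta']}
```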

Anaphora resolution

^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$

Structural transfer

^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$

Morphological generator

John dijo que sacó la planta grande ~a el patio.

Post-generator

John dijo que sacó la planta grande al patio.
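
The last step in this example turns "~a el" from the generator output into "al". As a toy illustration only, that change can be imitated with plain string replacement. The real post-generator works from the compiled <code>apertium-yyy.post-yyy.dix</code> dictionary, not from a table like the one below. The <code>RULES</code> table and the function <code>postgenerate</code> are hypothetical, and the "~de el" entry is an assumed second rule that this page does not show.

```python
# Hypothetical rule table: marked sequence -> contracted form.
RULES = {
    "~a el": "al",    # the contraction shown in the example on this page
    "~de el": "del",  # assumed additional rule, not taken from this page
}


def postgenerate(text: str) -> str:
    """Apply the toy contraction rules, then drop any leftover '~' marks."""
    for pattern, replacement in RULES.items():
        text = text.replace(pattern, replacement)
    return text.replace("~", "")


print(postgenerate("John dijo que sacó la planta grande ~a el patio."))
# prints: John dijo que sacó la planta grande al patio.
```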

See also