Difference between revisions of "Apertium system architecture"

From Apertium
(oops)
 
(19 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  +
The Apertium RBMT system uses a "pipeline" in which each stage performs a specific task and passes its output to the next stage. This page provides an [[#The pipeline|overview of the pipeline]], information about [[#The stages|how the data in each stage is stored]], and an [[#Example translation at each stage|example of data being passed through the pipeline]] to create an output translation.
  +
 
== The pipeline ==
 
== The pipeline ==
[[File:Apertium_system_architecture.png|1000px]]
+
[[File:Apertium_system_architecture.png|1200px]]
   
 
== The stages ==
 
== The stages ==
Line 14: Line 16:
 
!colspan="2"| morphological tagger
 
!colspan="2"| morphological tagger
 
| 2004
 
| 2004
|
+
|
 
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
 
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
  +
| —
|
 
 
|-
 
|-
 
!colspan="2"| morphological analysis
 
!colspan="2"| morphological analysis
Line 30: Line 32:
 
| [[Constraint Grammar]]
 
| [[Constraint Grammar]]
 
|-
 
|-
!colspan="2"| discontiguous multiword processing
+
!colspan="2"| discontiguous multiword assembly (optional)
| 2017, in progress
+
| 2017
 
| <code>apertium-xxx-yyy.xxx-yyy.lsx</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.lsx</code>
 
| <code>xxx-yyy-autoseq</code>
 
| <code>xxx-yyy-autoseq</code>
Line 39: Line 41:
 
| 2004
 
| 2004
 
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
  +
| <code>xxx-yyy-biltrans</code>
|
 
  +
| [[Bilingual dictionary]]
|
 
 
|-
 
|-
 
!colspan="2"| lexical selection
 
!colspan="2"| lexical selection
 
| 2012
 
| 2012
 
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
  +
| <code>xxx-yyy-lex</code>
|
 
  +
| [[Lexical selection]]
|
 
 
|-
 
|-
  +
!colspan="2"| anaphora resolution (optional)
!rowspan="3"| structural transfer
 
  +
| 2019, in progress
  +
| <code>apertium-xxx-yyy.xxx-yyy.arx</code>
  +
| <code>xxx-yyy-anaphora</code>
  +
| [[Anaphora Resolution Module]]
 
|-
  +
!colspan="2"| pre-transfer
 
|
  +
| —
  +
| <code>xxx-yyy-pretransfer</code>
  +
| —
 
|-
 
!rowspan="3"| shallow structural transfer
 
! chunker
 
! chunker
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
  +
| <code>xxx-yyy-chunker</code>
|
 
  +
|rowspan="3" | [[Contributing to an existing pair#Adding structural transfer (grammar) rules]]
|
 
 
|-
 
|-
 
! interchunk
 
! interchunk
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
  +
| <code>xxx-yyy-interchunk</code>
|
 
|
 
 
|-
 
|-
 
! postchunk
 
! postchunk
  +
| 2006
|
 
 
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
 
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
  +
| <code>xxx-yyy-postchunk</code>
|
 
|
 
 
|-
 
|-
!colspan="2"| reverse discontiguous multiword processing
+
!colspan="2"| recursive structural transfer
| 2017, in progress
+
| 2019, in progress
  +
| <code>apertium-xxx-yyy.xxx-yyy.rtx</code>
  +
| <code>xxx-yyy-rectransfer</code>
  +
| [[Apertium-recursive]]
 
|-
  +
!colspan="2"| discontiguous multiword disassembly (optional)
  +
| 2017
 
| <code>apertium-xxx-yyy.yyy-xxx.lsx</code>
 
| <code>apertium-xxx-yyy.yyy-xxx.lsx</code>
 
| <code>xxx-yyy-revautoseq</code>
 
| <code>xxx-yyy-revautoseq</code>
Line 76: Line 94:
 
| 2004
 
| 2004
 
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
 
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
  +
| <code>xxx-yyy-dgen</code> or <code>xxx-yyy-gener</code> or <code>xxx-yyy-generador</code>
|
 
 
|
 
|
 
|-
 
|-
Line 82: Line 100:
 
| 2004
 
| 2004
 
| <code>apertium-yyy.post-yyy.dix</code>
 
| <code>apertium-yyy.post-yyy.dix</code>
  +
| <code>xxx-yyy-pgen</code>
|
 
  +
| [[Post-generator]]
|
 
 
|}
 
|}
   
Line 89: Line 107:
 
deformatter, reformatter
 
deformatter, reformatter
   
=== Example translation at each stage ===
+
== Example translation at each stage ==
  +
  +
This example shows what every module would do, even though the English-Spanish pair does not currently use all of them.
  +
  +
=== Input text ===
  +
John said he took the big plant out to the yard.
  +
  +
=== Morphological analyzer ===
  +
  +
^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
  +
  +
=== Morphological disambiguator ===
  +
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
  +
  +
=== Discontiguous multiword processing ===
  +
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
  +
  +
=== Lexical transfer ===
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
  +
  +
=== Lexical selection ===
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
  +
  +
=== Anaphora resolution ===
  +
  +
^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$
  +
  +
=== Structural transfer ===
  +
^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$
  +
  +
=== Morphological generator ===
  +
John dijo que sacó la planta grande ~a el patio.
  +
  +
=== Post-generator ===
  +
  +
John dijo que sacó la planta grande al patio.
   
 
== See also ==
 
== See also ==

Latest revision as of 17:54, 11 June 2020

The Apertium RBMT system uses a "pipeline" in which each stage performs a specific task and passes its output to the next stage. This page provides an overview of the pipeline, information about how the data in each stage is stored, and an example of data being passed through the pipeline to create an output translation.
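
The pipeline idea can be sketched in a few lines of Python. This is only an illustration of chaining stages, not Apertium's implementation: `run_pipeline`, `trace_pipeline`, and the two toy stages are hypothetical names invented for this example.

```python
from functools import reduce
from typing import Callable, List, Sequence

# A stage is modelled as a function from text to text (an assumption of this sketch).
Stage = Callable[[str], str]


def run_pipeline(text: str, stages: Sequence[Stage]) -> str:
    """Feed text through every stage in order; each output is the next input."""
    return reduce(lambda data, stage: stage(data), stages, text)


def trace_pipeline(text: str, stages: Sequence[Stage]) -> List[str]:
    """Like run_pipeline, but keep the intermediate output of every stage."""
    outputs = []
    for stage in stages:
        text = stage(text)
        outputs.append(text)
    return outputs


# Toy stand-in stages; they are NOT real Apertium modules.
def lowercase(text: str) -> str:
    return text.lower()


def mark_words(text: str) -> str:
    # Wrap each word in ^...$ delimiters, imitating the look of the stream format.
    return " ".join(f"^{word}$" for word in text.split())


print(run_pipeline("Big Plant", [lowercase, mark_words]))
```

Keeping the intermediate outputs, as `trace_pipeline` does, mirrors the way this page shows the data after each stage.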

The pipeline

[[File:Apertium_system_architecture.png|1200px]]

The stages

Linguistic data

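{| class="wikitable"
!colspan="2"| stage
! introduced
! filenames
! mode
! documentation
|-
!colspan="2"| morphological tagger
| 2004
|
| <code>xxx-yyy-tagger</code>, <code>xxx-tagger</code>
|
|-
!colspan="2"| morphological analysis
| 2004
| <code>apertium-xxx.xxx.lexc</code> and<br /><code>apertium-xxx.xxx.twol</code> and<br /><code>apertium-xxx.xxx.twoc</code>,<br />OR <code>apertium-xxx.xxx.dix</code>
| <code>xxx-yyy-morph</code>, <code>xxx-morph</code>
|
|-
!colspan="2"| morphological disambiguation
| 2004, 2008
| <code>apertium-xxx.xxx.rlx</code>
| <code>xxx-yyy-disam</code>, <code>xxx-disam</code>
| [[Constraint Grammar]]
|-
!colspan="2"| discontiguous multiword assembly (optional)
| 2017
| <code>apertium-xxx-yyy.xxx-yyy.lsx</code>
| <code>xxx-yyy-autoseq</code>
| [[Apertium separable]]
|-
!colspan="2"| lexical transfer
| 2004
| <code>apertium-xxx-yyy.xxx-yyy.dix</code>
| <code>xxx-yyy-biltrans</code>
| [[Bilingual dictionary]]
|-
!colspan="2"| lexical selection
| 2012
| <code>apertium-xxx-yyy.xxx-yyy.lrx</code>
| <code>xxx-yyy-lex</code>
| [[Lexical selection]]
|-
!colspan="2"| anaphora resolution (optional)
| 2019, in progress
| <code>apertium-xxx-yyy.xxx-yyy.arx</code>
| <code>xxx-yyy-anaphora</code>
| [[Anaphora Resolution Module]]
|-
!colspan="2"| pre-transfer
|
|
| <code>xxx-yyy-pretransfer</code>
|
|-
!rowspan="3"| shallow structural transfer
! chunker
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t1x</code>
| <code>xxx-yyy-chunker</code>
|rowspan="3" | [[Contributing to an existing pair#Adding structural transfer (grammar) rules]]
|-
! interchunk
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t2x</code>
| <code>xxx-yyy-interchunk</code>
|-
! postchunk
| 2006
| <code>apertium-xxx-yyy.xxx-yyy.t3x</code>
| <code>xxx-yyy-postchunk</code>
|-
!colspan="2"| recursive structural transfer
| 2019, in progress
| <code>apertium-xxx-yyy.xxx-yyy.rtx</code>
| <code>xxx-yyy-rectransfer</code>
| [[Apertium-recursive]]
|-
!colspan="2"| discontiguous multiword disassembly (optional)
| 2017
| <code>apertium-xxx-yyy.yyy-xxx.lsx</code>
| <code>xxx-yyy-revautoseq</code>
| [[Apertium separable]]
|-
!colspan="2"| morphological generation
| 2004
| <code>apertium-yyy.yyy.lexc</code> and<br /><code>apertium-yyy.yyy.twol</code> and<br /><code>apertium-yyy.yyy.twoc</code>,<br />OR <code>apertium-yyy.yyy.dix</code>
| <code>xxx-yyy-dgen</code> or <code>xxx-yyy-gener</code> or <code>xxx-yyy-generador</code>
|
|-
!colspan="2"| post-generation
| 2004
| <code>apertium-yyy.post-yyy.dix</code>
| <code>xxx-yyy-pgen</code>
| [[Post-generator]]
|}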
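
The naming patterns listed above are regular enough to express as string templates. The following Python sketch only restates those patterns. `bilingual_file` and `mode_name` are hypothetical helpers, not part of any Apertium tool, and the language codes are the same `xxx`/`yyy` placeholders used in the table.

```python
def bilingual_file(pair, direction, ext):
    """Build a bilingual data filename.

    pair      -- (first, second) codes as they appear in the package name
    direction -- (source, target) codes for the translation direction
    ext       -- file extension such as "dix", "lrx", "t1x" or "lsx"
    """
    return f"apertium-{pair[0]}-{pair[1]}.{direction[0]}-{direction[1]}.{ext}"


def monolingual_file(lang, ext):
    """Build a monolingual data filename such as apertium-xxx.xxx.dix."""
    return f"apertium-{lang}.{lang}.{ext}"


def mode_name(direction, stage):
    """Build a mode name such as xxx-yyy-biltrans."""
    return f"{direction[0]}-{direction[1]}-{stage}"


print(bilingual_file(("xxx", "yyy"), ("xxx", "yyy"), "dix"))
print(mode_name(("xxx", "yyy"), "biltrans"))
```

The package name and the translation direction are separate arguments because the disassembly row pairs the file `apertium-xxx-yyy.yyy-xxx.lsx` with the mode `xxx-yyy-revautoseq`.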

Apertium-internal

deformatter, reformatter

Example translation at each stage

This example shows what every module would do, even though the English-Spanish pair does not currently use all of them.

Input text

John said he took the big plant out to the yard.

Morphological analyzer

^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
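
To make the notation concrete, the analyzer output above can be split into lexical units, each holding a surface form and one or more readings (a lemma plus tags). The Python sketch below is a simplified reader written for this page's example. It assumes no escaped `^`, `$`, or `/` characters occur in the text, and the names `Reading`, `LexicalUnit`, and `parse_analysis` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Reading(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]


class LexicalUnit(NamedTuple):
    surface: str
    readings: Tuple[Reading, ...]


UNIT = re.compile(r"\^(.*?)\$")   # one lexical unit: ^ ... $
TAG = re.compile(r"<([^<>]+)>")   # one tag: <...>


def parse_reading(text: str) -> Reading:
    # Everything before the first "<" is the lemma; the rest is tags.
    lemma = text.split("<", 1)[0]
    return Reading(lemma, tuple(TAG.findall(text)))


def parse_analysis(stream: str) -> List[LexicalUnit]:
    units = []
    for body in UNIT.findall(stream):
        # The first "/"-separated field is the surface form, the rest are readings.
        surface, *analyses = body.split("/")
        units.append(LexicalUnit(surface, tuple(parse_reading(a) for a in analyses)))
    return units


for unit in parse_analysis("^said/say<vblex><past>/say<vblex><pp>$ ^yard/yard<n><sg>$"):
    print(unit.surface, [(r.lemma, r.tags) for r in unit.readings])
```

An ambiguous word such as "said" yields a unit with two readings, which is the ambiguity the next stage has to resolve.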

Morphological disambiguator

^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
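
The change in shape between the analyzer output and the disambiguator output (surface form dropped, a single reading kept) can be imitated with a placeholder policy. The sketch below simply keeps the first reading of each unit. That is an assumption made for illustration only: the real stage chooses readings with a tagger or Constraint Grammar rules, and `keep_first_reading` is a hypothetical name.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")  # one lexical unit: ^ ... $


def keep_first_reading(stream: str) -> str:
    """Drop each surface form and keep only the first listed reading.

    Placeholder policy: a real disambiguator picks the reading from context.
    Assumes the stream contains no escaped "^", "$" or "/" characters.
    """
    def pick(match):
        surface, *readings = match.group(1).split("/")
        if not readings:
            # Nothing to choose from; leave the unit untouched.
            return match.group(0)
        return f"^{readings[0]}$"

    return UNIT.sub(pick, stream)


print(keep_first_reading("^said/say<vblex><past>/say<vblex><pp>$ ^out/out<adv>/out<pr>$"))
```

Text between units, such as the spaces, is left untouched because only the `^...$` spans are rewritten.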

Discontiguous multiword processing

^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$

Lexical transfer

^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$

Lexical selection

^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
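
In the example, lexical selection reduces the three candidate translations of "plant" to one. The Python sketch below imitates that narrowing with a lookup table of preferred target lemmas, falling back to the first candidate. It is a toy policy, not the rule format of `.lrx` files, and `select` and `preferred` are hypothetical names.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")  # one lexical unit: ^source/target1/target2...$


def lemma_of(reading: str) -> str:
    # Everything before the first "<" is the lemma.
    return reading.split("<", 1)[0]


def select(stream: str, preferred: dict) -> str:
    """Keep one target per unit: the preferred lemma if listed, else the first.

    `preferred` maps a source lemma to the target lemma to keep (toy policy).
    Assumes the stream contains no escaped "^", "$" or "/" characters.
    """
    def pick(match):
        source, *targets = match.group(1).split("/")
        if len(targets) <= 1:
            return match.group(0)  # nothing to choose between
        want = preferred.get(lemma_of(source))
        chosen = next((t for t in targets if lemma_of(t) == want), targets[0])
        return f"^{source}/{chosen}$"

    return UNIT.sub(pick, stream)


unit = "^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$"
print(select(unit, {}))
print(select(unit, {"plant": "fábrica"}))
```

Units that already have a single translation pass through unchanged, which matches every word in the example except "plant".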

Anaphora resolution

^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$

Structural transfer

^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$

Morphological generator

John dijo que sacó la planta grande ~a el patio.

Post-generator

John dijo que sacó la planta grande al patio.
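
The only change between the generator output and the post-generator output is that `~a el` becomes `al`. A minimal Python sketch of that step follows. It assumes that `~` marks a word that may merge with the word after it, and it uses a small hand-written contraction table. Neither the table nor the function `postgenerate` comes from an actual `post-yyy.dix` file.

```python
import re

# Toy contraction table (Spanish): marked word + following word -> merged form.
CONTRACTIONS = {
    ("a", "el"): "al",
    ("de", "el"): "del",
}

MARKED_PAIR = re.compile(r"~(\w+) (\w+)")


def postgenerate(text: str) -> str:
    """Merge "~first second" when the pair is in the table, else drop the "~"."""
    def contract(match):
        first, second = match.group(1), match.group(2)
        merged = CONTRACTIONS.get((first, second))
        return merged if merged is not None else f"{first} {second}"

    text = MARKED_PAIR.sub(contract, text)
    # Remove any marker that was not followed by a second word.
    return text.replace("~", "")


print(postgenerate("John dijo que sacó la planta grande ~a el patio."))
```

When the marked word is followed by something not in the table, as in `~a la casa`, the marker is removed and the words are left as they are.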

See also