Difference between revisions of "Apertium system architecture"
Earlier revision by Firespeaker; later revision by Popcorndude (edit summary: link to workflow reference). 23 intermediate revisions by 3 users not shown.
Latest revision as of 14:35, 28 July 2020
The Apertium rule-based machine translation (RBMT) system uses a "pipeline": each stage performs a specific task and passes its output along to the next stage. This page provides an overview of the pipeline, information about how the data in each stage is stored, and an example of data being passed through the pipeline to create an output translation.
Details of what many of these modules do can be found in the Workflow reference.
The pipeline
[Figure: Apertium_system_architecture.png, a diagram of the pipeline]
The stages
Linguistic data
| stage | introduced | filenames | mode | documentation |
|---|---|---|---|---|
| morphological tagger | 2004 | — | `xxx-yyy-tagger`, `xxx-tagger` | — |
| morphological analysis | 2004 | `apertium-xxx.xxx.lexc` and `apertium-xxx.xxx.twol` and `apertium-xxx.xxx.twoc`, OR `apertium-xxx.xxx.dix` | `xxx-yyy-morph`, `xxx-morph` | |
| morphological disambiguation | 2004, 2008 | `apertium-xxx.xxx.rlx` | `xxx-yyy-disam`, `xxx-disam` | Constraint Grammar |
| discontiguous multiword assembly (optional) | 2017 | `apertium-xxx-yyy.xxx-yyy.lsx` | `xxx-yyy-autoseq` | Apertium separable |
| lexical transfer | 2004 | `apertium-xxx-yyy.xxx-yyy.dix` | `xxx-yyy-biltrans` | Bilingual dictionary |
| lexical selection | 2012 | `apertium-xxx-yyy.xxx-yyy.lrx` | `xxx-yyy-lex` | Lexical selection |
| anaphora resolution (optional) | 2019, in progress | `apertium-xxx-yyy.xxx-yyy.arx` | `xxx-yyy-anaphora` | Anaphora Resolution Module |
| pre-transfer | | — | `xxx-yyy-pretransfer` | — |
| shallow structural transfer: chunker | 2006 | `apertium-xxx-yyy.xxx-yyy.t1x` | `xxx-yyy-chunker` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| shallow structural transfer: interchunk | 2006 | `apertium-xxx-yyy.xxx-yyy.t2x` | `xxx-yyy-interchunk` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| shallow structural transfer: postchunk | 2006 | `apertium-xxx-yyy.xxx-yyy.t3x` | `xxx-yyy-postchunk` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| recursive structural transfer | 2019, in progress | `apertium-xxx-yyy.xxx-yyy.rtx` | `xxx-yyy-rectransfer` | Apertium-recursive |
| discontiguous multiword disassembly (optional) | 2017 | `apertium-xxx-yyy.yyy-xxx.lsx` | `xxx-yyy-revautoseq` | Apertium separable |
| morphological generation | 2004 | `apertium-yyy.yyy.lexc` and `apertium-yyy.yyy.twol` and `apertium-yyy.yyy.twoc`, OR `apertium-yyy.yyy.dix` | `xxx-yyy-dgen` or `xxx-yyy-gener` or `xxx-yyy-generador` | |
| post-generation | 2004 | `apertium-yyy.post-yyy.dix` | `xxx-yyy-pgen` | Post-generator |
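The filename and mode columns use `xxx` for the source-language code and `yyy` for the target-language code. As a minimal sketch of how those patterns expand for a concrete pair, the following Python snippet substitutes the codes mechanically. The function name `instantiate` and the dictionary `STAGE_PATTERNS` are hypothetical helpers introduced here for illustration, and only a few rows of the table are copied; an actual language pair may name its files differently.

```python
import re

# A few (filename pattern, mode pattern) pairs copied from the table above.
# "xxx" stands for the source-language code, "yyy" for the target-language code.
STAGE_PATTERNS = {
    "lexical transfer": ("apertium-xxx-yyy.xxx-yyy.dix", "xxx-yyy-biltrans"),
    "lexical selection": ("apertium-xxx-yyy.xxx-yyy.lrx", "xxx-yyy-lex"),
    "chunker": ("apertium-xxx-yyy.xxx-yyy.t1x", "xxx-yyy-chunker"),
    "post-generation": ("apertium-yyy.post-yyy.dix", "xxx-yyy-pgen"),
}


def instantiate(pattern: str, source: str, target: str) -> str:
    """Replace the xxx/yyy placeholders with concrete language codes.

    Both placeholders are replaced in a single pass, so a language code
    can never be mistaken for a placeholder.
    """
    codes = {"xxx": source, "yyy": target}
    return re.sub(r"xxx|yyy", lambda m: codes[m.group(0)], pattern)


if __name__ == "__main__":
    for stage, (filename, mode) in STAGE_PATTERNS.items():
        print(stage, instantiate(filename, "eng", "spa"), instantiate(mode, "eng", "spa"))
```

This is purely string substitution; it does not check that the resulting files or modes exist in any installed pair.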
Apertium-internal
deformatter, reformatter
Example translation at each stage
This example shows what every module would do, even though the English-Spanish pair does not currently use all of them.
Input text
John said he took the big plant out to the yard.
Morphological analyzer
^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
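In the analyzer output above, each unit is delimited by `^` and `$`, and the surface form is followed by one or more `/`-separated analyses, each a lemma plus tags in angle brackets. A minimal Python sketch of reading that layout is shown below. The names `LexicalUnit`, `parse_analyzer_output` and `split_tags` are hypothetical, and the sketch assumes the text contains no escaped `^`, `$` or `/` characters, which a full parser would have to handle.

```python
import re
from typing import List, NamedTuple, Tuple


class LexicalUnit(NamedTuple):
    surface: str          # the word as it appeared in the input text
    analyses: List[str]   # every candidate analysis, e.g. "say<vblex><past>"


# One unit is everything between "^" and the next "$".
UNIT = re.compile(r"\^(.*?)\$")


def parse_analyzer_output(stream: str) -> List[LexicalUnit]:
    """Split analyzer output into units of surface form plus analyses."""
    units = []
    for body in UNIT.findall(stream):
        surface, *analyses = body.split("/")
        units.append(LexicalUnit(surface, analyses))
    return units


def split_tags(analysis: str) -> Tuple[str, List[str]]:
    """Separate an analysis into its lemma and its list of tags."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    return lemma, tags


if __name__ == "__main__":
    sample = (
        "^said/say<vblex><past>/say<vblex><pp>$ "
        "^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$"
    )
    for unit in parse_analyzer_output(sample):
        print(unit.surface, [split_tags(a) for a in unit.analyses])
```

Units with more than one analysis, such as `said` and `plant` here, are the ambiguous ones that the disambiguator has to resolve in the next stage.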
Morphological disambiguator
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
Discontiguous multiword processing
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
Lexical transfer
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
Lexical selection
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
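Comparing the two stages, lexical transfer offers three target candidates for `plant` (`planta`, `fábrica`, `maquinaria`), and lexical selection keeps only `planta`. The Python sketch below illustrates that narrowing step on a single unit. It is a toy stand-in, not how the lexical selection module works: real selection is driven by rules in the `.lrx` file, whereas this sketch assumes a hypothetical `preferred` dictionary mapping a source lemma to the wanted target lemma and falls back to the first candidate.

```python
from typing import Dict


def select_translation(unit: str, preferred: Dict[str, str]) -> str:
    """Keep a single target candidate in a '^source/target1/target2$' unit.

    `preferred` is a hypothetical lookup from source lemma to target lemma.
    If no preference matches, the first candidate is kept.
    """
    body = unit[1:-1]                      # drop the leading "^" and trailing "$"
    source, *targets = body.split("/")
    if not targets:                        # nothing to choose from
        return unit
    source_lemma = source.split("<", 1)[0]
    wanted = preferred.get(source_lemma)
    chosen = next(
        (t for t in targets if t.split("<", 1)[0] == wanted),
        targets[0],
    )
    return f"^{source}/{chosen}$"


if __name__ == "__main__":
    unit = "^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$"
    print(select_translation(unit, {"plant": "planta"}))
```

Units that already have a single translation pass through unchanged, which matches the other units in the example.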
Anaphora resolution
^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$
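In the output above, every unit gains a third `/`-separated field. It is empty for most units and holds the antecedent (`John<np><ant><m><sg>`) for the pronoun. As a rough sketch, assuming exactly this three-field layout and no escaped delimiters, the resolved references can be collected as follows. The function name `antecedents` is hypothetical.

```python
import re
from typing import Dict


def antecedents(stream: str) -> Dict[str, str]:
    """Map each source unit with a non-empty third field to its antecedent."""
    resolved = {}
    for body in re.findall(r"\^(.*?)\$", stream):
        parts = body.split("/")
        # Expected layout: source / target / antecedent (often empty).
        if len(parts) == 3 and parts[2]:
            resolved[parts[0]] = parts[2]
    return resolved


if __name__ == "__main__":
    sample = (
        "^say<vblex><past>/decir<vblex><past>/$ "
        "^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$"
    )
    print(antecedents(sample))
```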
Structural transfer
^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$
Morphological generator
John dijo que sacó la planta grande ~a el patio.
Post-generator
John dijo que sacó la planta grande al patio.
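The only difference between the last two stages is that `~a el` becomes `al`: the generator marks `a` with `~`, and the post-generator contracts it with the following word. The Python sketch below imitates that one step. It is an illustration under stated assumptions, not the post-generator's implementation: the real module reads its rules from `apertium-yyy.post-yyy.dix`, while the `CONTRACTIONS` table here is a hypothetical two-entry stand-in.

```python
import re

# Hypothetical stand-in for post-generation rules: (marked word, next word) -> contraction.
CONTRACTIONS = {
    ("a", "el"): "al",
    ("de", "el"): "del",
}


def post_generate(text: str) -> str:
    """Contract '~word next' pairs listed in CONTRACTIONS and drop leftover '~' marks."""

    def contract(match: "re.Match[str]") -> str:
        first, second = match.group(1), match.group(2)
        # Fall back to the two words unchanged when no contraction applies.
        return CONTRACTIONS.get((first, second), f"{first} {second}")

    text = re.sub(r"~(\w+) (\w+)", contract, text)
    return text.replace("~", "")


if __name__ == "__main__":
    print(post_generate("John dijo que sacó la planta grande ~a el patio."))
```

When the marked word is followed by something that does not contract, such as `~a la`, the sketch simply removes the mark and leaves both words in place.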