Difference between revisions of "Apertium system architecture"
Revision authors: Firespeaker; Popcorndude (edit summary: "Example translation at each stage: add anaphora")
Revision as of 23:00, 18 May 2020
The pipeline
The stages
Linguistic data
| stage | introduced | filenames | mode | documentation |
|---|---|---|---|---|
| morphological tagger | 2004 | — | `xxx-yyy-tagger`, `xxx-tagger` | — |
| morphological analysis | 2004 | `apertium-xxx.xxx.lexc` and `apertium-xxx.xxx.twol` and `apertium-xxx.xxx.twoc`, OR `apertium-xxx.xxx.dix` | `xxx-yyy-morph`, `xxx-morph` | |
| morphological disambiguation | 2004, 2008 | `apertium-xxx.xxx.rlx` | `xxx-yyy-disam`, `xxx-disam` | Constraint Grammar |
| discontiguous multiword assembly (optional) | 2017 | `apertium-xxx-yyy.xxx-yyy.lsx` | `xxx-yyy-autoseq` | Apertium separable |
| lexical transfer | 2004 | `apertium-xxx-yyy.xxx-yyy.dix` | `xxx-yyy-biltrans` | Bilingual dictionary |
| lexical selection | 2012 | `apertium-xxx-yyy.xxx-yyy.lrx` | `xxx-yyy-lex` | Lexical selection |
| anaphora resolution (optional) | 2019, in progress | `apertium-xxx-yyy.xxx-yyy.arx` | `xxx-yyy-anaphora` | Anaphora Resolution Module |
| pre-transfer | | — | `xxx-yyy-pretransfer` | — |
| shallow structural transfer: chunker | 2006 | `apertium-xxx-yyy.xxx-yyy.t1x` | `xxx-yyy-chunker` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| shallow structural transfer: interchunk | 2006 | `apertium-xxx-yyy.xxx-yyy.t2x` | `xxx-yyy-interchunk` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| shallow structural transfer: postchunk | 2006 | `apertium-xxx-yyy.xxx-yyy.t3x` | `xxx-yyy-postchunk` | Contributing to an existing pair#Adding structural transfer (grammar) rules |
| recursive structural transfer | 2019, in progress | `apertium-xxx-yyy.xxx-yyy.rtx` | `xxx-yyy-rectransfer` | Apertium-recursive |
| discontiguous multiword disassembly (optional) | 2017 | `apertium-xxx-yyy.yyy-xxx.lsx` | `xxx-yyy-revautoseq` | Apertium separable |
| morphological generation | 2004 | `apertium-yyy.yyy.lexc` and `apertium-yyy.yyy.twol` and `apertium-yyy.yyy.twoc`, OR `apertium-yyy.yyy.dix` | `xxx-yyy-dgen` or `xxx-yyy-gener` or `xxx-yyy-generador` | |
| post-generation | 2004 | `apertium-yyy.post-yyy.dix` | `xxx-yyy-pgen` | Post-generator |
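The xxx and yyy placeholders in the mode column stand for the source and target language codes. A minimal sketch in Python of expanding them for one pair, assuming the hypothetical codes `eng` and `spa`. `MODE_TEMPLATES` and `mode_names` are illustrative names, not part of Apertium, and only the first mode name listed for each stage is included. A real pair defines only the modes it actually uses.

```python
# Mode-name templates taken from the "mode" column of the stage table.
# Where the table lists alternatives, only the first one is kept here.
MODE_TEMPLATES = [
    ("morphological tagger", "xxx-yyy-tagger"),
    ("morphological analysis", "xxx-yyy-morph"),
    ("morphological disambiguation", "xxx-yyy-disam"),
    ("discontiguous multiword assembly", "xxx-yyy-autoseq"),
    ("lexical transfer", "xxx-yyy-biltrans"),
    ("lexical selection", "xxx-yyy-lex"),
    ("anaphora resolution", "xxx-yyy-anaphora"),
    ("pre-transfer", "xxx-yyy-pretransfer"),
    ("chunker", "xxx-yyy-chunker"),
    ("interchunk", "xxx-yyy-interchunk"),
    ("postchunk", "xxx-yyy-postchunk"),
    ("recursive structural transfer", "xxx-yyy-rectransfer"),
    ("discontiguous multiword disassembly", "xxx-yyy-revautoseq"),
    ("morphological generation", "xxx-yyy-dgen"),
    ("post-generation", "xxx-yyy-pgen"),
]


def mode_names(src, trg):
    """Return (stage, mode name) pairs with xxx/yyy replaced by src/trg."""
    return [
        (stage, template.replace("xxx", src).replace("yyy", trg))
        for stage, template in MODE_TEMPLATES
    ]


if __name__ == "__main__":
    # "eng" and "spa" are assumed example codes.
    for stage, mode in mode_names("eng", "spa"):
        print(f"{stage}: {mode}")
```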
Apertium-internal
deformatter, reformatter
Example translation at each stage
This example shows the output of every module in the pipeline, even though the English-Spanish pair does not currently use all of them.
Input text
John said he took the big plant out to the yard.
Morphological analyzer
^John/John<np><ant><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$ ^took/take<vblex><past>$ ^the/the<det><def><sp>$ ^big/big<adj><sint>$ ^plant/plant<n><sg>/plant<vblex><inf>/plant<vblex><pres>/plant<vblex><imp>$ ^out/out<adv>/out<pr>$ ^to/to<pr>$ ^the/the<det><def><sp>$ ^yard/yard<n><sg>$^./.<sent>$
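In the analyzer output, each lexical unit is delimited by `^` and `$`, and holds the surface form followed by one or more `/`-separated analyses. A minimal sketch in Python of splitting such a stream, assuming no escaped `^`, `$` or `/` characters and no formatting blocks in the input (a full parser would have to handle both). `parse_analyses` is an illustrative name, not an Apertium function.

```python
import re

# One lexical unit: everything between "^" and the next "$".
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_analyses(stream):
    """Return a list of (surface form, [analyses]) tuples."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


if __name__ == "__main__":
    sample = "^said/say<vblex><past>/say<vblex><pp>$ ^he/prpers<prn><subj><p3><m><sg>$"
    for surface, analyses in parse_analyses(sample):
        # Words with more than one analysis are the ones the
        # disambiguator has to resolve.
        print(surface, len(analyses), analyses)
```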
Morphological disambiguator
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^out<adv>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
Discontiguous multiword processing
^John<np><ant><m><sg>$ ^say<vblex><past>$ ^prpers<prn><subj><p3><m><sg>$ ^take# out<vblex><past>$ ^the<det><def><sp>$ ^big<adj><sint>$ ^plant<n><sg>$ ^to<pr>$ ^the<det><def><sp>$ ^yard<n><sg>$^.<sent>$
Lexical transfer
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
Lexical selection
^John<np><ant><m><sg>/John<np><ant><m><sg>$ ^say<vblex><past>/decir<vblex><past>$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^big<adj><sint>/grande<adj><mf>$ ^plant<n><sg>/planta<n><f><sg>$ ^to<pr>/a<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^yard<n><sg>/patio<n><m><sg>$^.<sent>/.<sent>$
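Comparing the two outputs, lexical selection reduces `plant` from three candidate translations to one. The sketch below, in Python, shows only the simplest possible policy: keep the first listed translation of every unit. This is a stand-in for illustration, not how the rule-based module decides, since real `.lrx` rules choose a translation from context. `select_first` is a hypothetical name, and the code assumes no escaped delimiter characters.

```python
import re

# One lexical unit: everything between "^" and the next "$".
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def select_first(stream):
    """Keep the source side and only the first target translation."""
    def keep_first(match):
        source, *targets = match.group(1).split("/")
        return "^" + "/".join([source] + targets[:1]) + "$"

    return LEXICAL_UNIT.sub(keep_first, stream)


if __name__ == "__main__":
    ambiguous = (
        "^plant<n><sg>/planta<n><f><sg>/fábrica<n><f><sg>"
        "/maquinaria<n><f><sg>$ ^to<pr>/a<pr>$"
    )
    print(select_first(ambiguous))
```

Units that already have a single translation pass through unchanged, so the function can be applied to the whole lexical transfer output.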
Anaphora resolution
^John<np><ant><m><sg>/John<np><ant><m><sg>/$ ^say<vblex><past>/decir<vblex><past>/$ ^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$ ^take# out<vblex><past>/sacar<vblex><past>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^big<adj><sint>/grande<adj><mf>/$ ^plant<n><sg>/planta<n><f><sg>/$ ^to<pr>/a<pr>/$ ^the<det><def><sp>/el<det><def><GD><ND>/$ ^yard<n><sg>/patio<n><m><sg>/$^.<sent>/.<sent>/$
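In this output every unit gains a trailing `/` field, which is empty except for the pronoun, where it carries the antecedent `John<np><ant><m><sg>`. A small Python sketch of reading that field back out, assuming (from this one example only) that each unit has exactly three fields: source, a single target, and the antecedent. `find_antecedents` is an illustrative name, not part of the Anaphora Resolution Module.

```python
import re

# One lexical unit: everything between "^" and the next "$".
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def find_antecedents(stream):
    """Return (source unit, antecedent) pairs for units with a non-empty third field."""
    pairs = []
    for body in LEXICAL_UNIT.findall(stream):
        fields = body.split("/")
        # Assumed layout: source / target / antecedent (possibly empty).
        if len(fields) == 3 and fields[2]:
            pairs.append((fields[0], fields[2]))
    return pairs


if __name__ == "__main__":
    sample = (
        "^say<vblex><past>/decir<vblex><past>/$ "
        "^prpers<prn><subj><p3><m><sg>/prpers<prn><tn><p3><m><sg>/John<np><ant><m><sg>$"
    )
    print(find_antecedents(sample))
```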
Structural transfer
^John<np><ant><m><sg>$ ^decir<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^sacar<vblex><ifi><p3><sg>$ ^el<det><def><f><sg>$ ^planta<n><f><sg>$ ^grande<adj><mf><sg>$ ^a<pr>$ ^el<det><def><m><sg>$ ^patio<n><m><sg>$^.<sent>$
Morphological generator
John dijo que sacó la planta grande ~a el patio.
Post-generator
John dijo que sacó la planta grande al patio.
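The only difference between the last two outputs is that the marked sequence `~a el` has been contracted to `al`. A toy Python sketch of that one rewrite, assuming a hand-written rule table in place of the pair's `apertium-yyy.post-yyy.dix` file. `CONTRACTIONS` and `postgenerate` are hypothetical names, and the real post-generator is dictionary-driven rather than regex-based.

```python
import re

# Single rule taken from the example; a real post-generation
# dictionary would contain many more.
CONTRACTIONS = {"~a el": "al"}


def postgenerate(text, rules=CONTRACTIONS):
    """Replace each marked sequence with its contracted form."""
    for marked, contracted in rules.items():
        # (?!\w) stops "~a el" from matching inside a longer word
        # such as "~a elefante".
        pattern = re.escape(marked) + r"(?!\w)"
        text = re.sub(pattern, contracted, text)
    return text


if __name__ == "__main__":
    print(postgenerate("John dijo que sacó la planta grande ~a el patio."))
```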