Difference between revisions of "Surface forms in the pipe"


Revision as of 14:14, 22 June 2020

Currently the surface form is thrown away after the tagger. It might be handy to keep it until transfer, so that things unknown to the bidix can be substituted.

Another use could be to allow surface-form embeddings, like those produced by word2vec, to be used in the tagger and lexical selection modules. Lexical selection could potentially also use the surface forms directly.

Input:

Machiavelli took it for granted that would-be leaders naturally aim at glory or honor.

Morph:

^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><subj><p3><nt><sg>/prpers<prn><obj><p3><nt><sg>$ ^for/for<cnjadv>/for<pr>$ ^granted/grant<vblex><pp>/grant<vblex><past>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><dem><mf><sg>/that<prn><rel><an><mf><sp>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim<vblex><inf># at/aim<vblex><pres># at/aim<vblex><imp># at$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<vblex><inf>/honour<vblex><pres>/honour<vblex><imp>/honour<n><sg>$^./.<sent>$
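To make the stream layout above concrete, here is a minimal Python sketch that splits such a line into lexical units, keeping the surface form next to its analyses. The names `LexicalUnit` and `parse_stream` are invented for illustration and are not part of any Apertium tool. The sketch also assumes that no `^`, `$` or `/` appears escaped inside a unit, which real stream-format input does not guarantee.

```python
import re
from typing import List, NamedTuple


class LexicalUnit(NamedTuple):
    """One ^surface/analysis1/analysis2$ unit (hypothetical helper type)."""
    surface: str
    analyses: List[str]


# Non-greedy match of everything between '^' and the next '$'.
# Assumption: no backslash-escaped delimiters occur in the input.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def parse_stream(stream: str) -> List[LexicalUnit]:
    """Split a stream line into units, keeping each surface form."""
    units = []
    for match in LU_PATTERN.finditer(stream):
        # The first field is the surface form, the rest are analyses.
        surface, *analyses = match.group(1).split("/")
        units.append(LexicalUnit(surface, analyses))
    return units


units = parse_stream(
    "^took/take<vblex><past>$ ^for/for<cnjadv>/for<pr>$^./.<sent>$"
)
for unit in units:
    print(unit.surface, unit.analyses)
```

Keeping the surface form as a separate field is what would let later modules read it without re-parsing the analyses.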

Tagger:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><obj><p3><nt><sg>$ ^for/for<pr>$ ^granted/grant<vblex><pp>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Separable:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took it for granted/take<vblex><past># for granted+prpers<prn><obj><p3><nt><sg>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Pretransfer:

Biltrans:
^Machiavelli/Machiavelli<np><cog><sg>/Machiavelli<np><cog>$ ^took it for granted/take# for granted<vblex><past>+prpers<prn><obj><p3><nt><sg>/dar# por hecho<vblex><past>+lo<prn><tn><p3><nt><sg>$ ^that/that<cnjsub>/que<cnjsub>$ ^would-be/would-be<adj>/@would-be<adj>$ ^leaders/leader<n><pl>/@leader<n><pl>$ ^naturally/naturally<adv>/naturalmente<adv>$ ^aim/aim<vblex><inf>/apuntar<vblex><inf>$ ^at/at<pr>/en<pr>$ ^glory/glory<n><sg>/gloria<n><f><sg>$ ^or/or<cnjcoo>/o<cnjcoo>$ ^honor/honour<n><sg>/honor<n><m><sg>$^./.<sent>/.<sent>$

Transfer:

^Machiavelli<np><cog>$ ^lo<prn><tn><p3><nt><sg>$ ^dar# por hecho<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^*would-be$ ^*leaders$ ^naturalmente<adv>$ ^apuntar<vblex><inf>$ ^en<pr>$ ^gloria<n><f><sg>$ ^o<cnjcoo>$ ^honor<n><m><sg>$^.<sent>/.<sent>$ 
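One reading of the example above is that a unit whose bidix translation is marked unknown (`@`) gets replaced by its starred surface form, so `leaders` survives rather than the lemma `leader`. The Python sketch below illustrates only that substitution. The function name `substitute_unknown` is hypothetical, and the sketch assumes a unit with exactly the three fields surface, source analysis and target analysis. Real transfer rules do far more than this.

```python
def substitute_unknown(unit: str) -> str:
    """Emit the target analysis, or the starred surface form if the
    bidix marked the translation as unknown with '@'.

    Assumes the layout ^surface/source/target$ with no escaped
    delimiters; this is a simplification for illustration only.
    """
    body = unit.strip()[1:-1]  # drop the enclosing '^' and '$'
    surface, _source, target = body.split("/")[:3]
    if target.startswith("@"):
        return f"^*{surface}$"
    return f"^{target}$"


print(substitute_unknown("^leaders/leader<n><pl>/@leader<n><pl>$"))
print(substitute_unknown("^naturally/naturally<adv>/naturalmente<adv>$"))
```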


Generation (?):

Machiavelli lo dio por hecho que *would-be *leaders apuntar en gloria o honor.

Potentially, generation could output something like ^dar# por hecho<vblex><ifi><p3><sg>/dio por hecho$, but then how would postgeneration work? For example, for:

^de<pr>$ ^el<det><def><m><sg>$

Could it be:

Generation:
^de<pr>/de$ ^el<det><def><m><sg>/el$

Postgeneration:
^de<pr>+el<det><def><m><sg>/del$
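The proposed postgeneration step can be sketched in Python as follows: adjacent units whose surface forms match a contraction rule are merged, their analyses joined with `+` and the contracted form kept as the new surface form. The `CONTRACTIONS` table and the function names are assumptions made for this illustration; they do not reflect how the existing postgenerator is implemented.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (analysis, surface form)

# Hypothetical contraction table keyed on pairs of surface forms.
CONTRACTIONS: Dict[Tuple[str, str], str] = {
    ("de", "el"): "del",
    ("a", "el"): "al",
}


def postgenerate(units: List[Unit]) -> List[Unit]:
    """Merge adjacent units that form a known contraction."""
    merged: List[Unit] = []
    i = 0
    while i < len(units):
        if i + 1 < len(units):
            key = (units[i][1].lower(), units[i + 1][1].lower())
            if key in CONTRACTIONS:
                # Join the analyses with '+' and keep the contracted surface.
                analysis = units[i][0] + "+" + units[i + 1][0]
                merged.append((analysis, CONTRACTIONS[key]))
                i += 2
                continue
        merged.append(units[i])
        i += 1
    return merged


def render(units: List[Unit]) -> str:
    """Serialise units as ^analysis/surface$, separated by spaces."""
    return " ".join(f"^{analysis}/{surface}$" for analysis, surface in units)


result = render(postgenerate([("de<pr>", "de"), ("el<det><def><m><sg>", "el")]))
print(result)
```

Matching on surface forms rather than analyses keeps the rule table small, but it does mean the analyses are carried along without being checked.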

Questions

  • This will complicate the code for bidix lookup and for lexical selection: we'll need to be able to support '+' and output the product of the translations of each part.
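As a rough illustration of what "the product of each part" could mean, the Python sketch below splits a joined analysis on `+`, looks each part up separately, and returns every combination of the translations. The `BIDIX` dictionary is a toy stand-in with partly invented entries (the second pronoun candidate is made up to show the product), and `lookup_joined` is a hypothetical name. Marking unknown parts with `@` follows the Biltrans example earlier on this page.

```python
from itertools import product
from typing import Dict, List

# Toy bilingual dictionary; the entries are illustrative assumptions only.
BIDIX: Dict[str, List[str]] = {
    "take# for granted<vblex><past>": ["dar# por hecho<vblex><past>"],
    "prpers<prn><obj><p3><nt><sg>": [
        "lo<prn><tn><p3><nt><sg>",
        "ello<prn><tn><p3><nt><sg>",  # invented second candidate
    ],
}


def lookup_joined(analysis: str, bidix: Dict[str, List[str]]) -> List[str]:
    """Translate each '+'-joined part and return all combinations."""
    parts = analysis.split("+")
    # Unknown parts are passed through with an '@' prefix.
    options = [bidix.get(part, ["@" + part]) for part in parts]
    return ["+".join(combo) for combo in product(*options)]


translations = lookup_joined(
    "take# for granted<vblex><past>+prpers<prn><obj><p3><nt><sg>", BIDIX
)
for translation in translations:
    print(translation)
```

The number of outputs is the product of the candidate counts of the parts, which is the extra work lexical selection would have to deal with.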