Surface forms in the pipe

From Apertium
Revision as of 14:12, 22 June 2020 by Francis Tyers (talk | contribs)

Currently the surface form is discarded after the tagger. It could be useful to keep it until transfer, so that words unknown to the bidix can be replaced with their original surface form.

Another use would be to let surface-form embeddings, such as those produced by word2vec, be used in the tagger and lexical selection modules. Lexical selection could also use the surface forms themselves.

Input:

Machiavelli took it for granted that would-be leaders naturally aim at glory or honor.

Morph:

^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><subj><p3><nt><sg>/prpers<prn><obj><p3><nt><sg>$ ^for/for<cnjadv>/for<pr>$ ^granted/grant<vblex><pp>/grant<vblex><past>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><dem><mf><sg>/that<prn><rel><an><mf><sp>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim<vblex><inf># at/aim<vblex><pres># at/aim<vblex><imp># at$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<vblex><inf>/honour<vblex><pres>/honour<vblex><imp>/honour<n><sg>$^./.<sent>$

Tagger:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><obj><p3><nt><sg>$ ^for/for<pr>$ ^granted/grant<vblex><pp>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Separable:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took it for granted/take<vblex><past># for granted+prpers<prn><obj><p3><nt><sg>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Pretransfer:

Biltrans:
^Machiavelli/Machiavelli<np><cog><sg>/Machiavelli<np><cog>$ ^took it for granted/take# for granted<vblex><past>+prpers<prn><obj><p3><nt><sg>/dar# por hecho<vblex><past>+lo<prn><tn><p3><nt><sg>$ ^that/that<cnjsub>/que<cnjsub>$ ^would-be/would-be<adj>/@would-be<adj>$ ^leaders/leader<n><pl>/@leader<n><pl>$ ^naturally/naturally<adv>/naturalmente<adv>$ ^aim/aim<vblex><inf>/apuntar<vblex><inf>$ ^at/at<pr>/en<pr>$ ^glory/glory<n><sg>/gloria<n><f><sg>$ ^or/or<cnjcoo>/o<cnjcoo>$ ^honour/honour<n><sg>/honor<n><m><sg>$^./.<sent>/.<sent>$ 

Transfer:

^Machiavelli<np><cog>$ ^lo<prn><tn><p3><nt><sg>$ ^dar# por hecho<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^*would-be$ ^*leaders$ ^naturalmente<adv>$ ^apuntar<vblex><inf>$ ^en<pr>$ ^gloria<n><f><sg>$ ^o<cnjcoo>$ ^honor<n><m><sg>$^.<sent>/.<sent>$ 
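The substitution of unknown words can be sketched roughly as follows. This is only an illustration in Python, not part of any Apertium module: it assumes each biltrans unit has the shape ^surface/source/target$ with a single target and no escaped characters, and the function name substitute_unknown is hypothetical. It ignores everything else transfer does (reordering, tag changes) and only shows units whose target starts with @ falling back to the surface form.

```python
import re
from typing import List

# One lexical unit of the assumed shape ^surface/source/target$.
# Assumption: no escaped '/', '^' or '$' inside a unit.
UNIT = re.compile(r"\^([^/^$]*)/([^/^$]*)/([^/^$]*)\$")


def substitute_unknown(biltrans: str) -> List[str]:
    """Hypothetical helper: keep the target analysis when the bidix
    had an entry, otherwise emit the surface form marked with '*'."""
    out = []
    for surface, source, target in UNIT.findall(biltrans):
        if target.startswith("@"):
            # Unknown to the bidix: fall back to the kept surface form.
            out.append("^*" + surface + "$")
        else:
            out.append("^" + target + "$")
    return out


stream = (
    "^that/that<cnjsub>/que<cnjsub>$ "
    "^would-be/would-be<adj>/@would-be<adj>$ "
    "^leaders/leader<n><pl>/@leader<n><pl>$"
)
result = substitute_unknown(stream)
print(" ".join(result))
```

Without the surface form in the stream, the fallback for ^leaders$ would have to be built from the lemma (leader) rather than from the word as it was actually written.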


Generation (?):

Machiavelli lo dio por hecho que *would-be *leaders apuntar en gloria o honor.

Potentially generation could output something like ^dar# por hecho<vblex><ifi><p3><sg>/dio por hecho$, but then how would postgeneration work? For example, given

^de<pr>$ ^el<det><def><m><sg>$

Could it be:

Generation:
^de<pr>/de$ ^el<det><def><m><sg>/el$

Postgeneration:
^de<pr>+el<det><def><m><sg>/del$
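One way to picture this proposed postgeneration step is the sketch below. It is not how the existing postgenerator works; it assumes generation emits units of the shape ^analysis/surface$ as suggested above, and both the CONTRACTIONS table and the postgenerate function are hypothetical names introduced for illustration. Adjacent units whose surface forms appear in the table are joined with + on the analysis side and given the contracted surface form.

```python
import re

# One unit of the assumed shape ^analysis/surface$.
# Assumption: no escaped '/', '^' or '$' inside a unit.
UNIT = re.compile(r"\^([^/^$]*)/([^/^$]*)\$")

# Hypothetical contraction table keyed on adjacent surface forms.
CONTRACTIONS = {("de", "el"): "del", ("a", "el"): "al"}


def postgenerate(stream: str) -> str:
    """Merge adjacent units whose surface forms contract."""
    units = UNIT.findall(stream)  # list of (analysis, surface)
    out = []
    i = 0
    while i < len(units):
        analysis, surface = units[i]
        if i + 1 < len(units):
            next_analysis, next_surface = units[i + 1]
            contracted = CONTRACTIONS.get((surface, next_surface))
            if contracted is not None:
                # Join the analyses with '+' and keep one surface form.
                out.append("^" + analysis + "+" + next_analysis
                           + "/" + contracted + "$")
                i += 2
                continue
        out.append("^" + analysis + "/" + surface + "$")
        i += 1
    # Note: original inter-unit blanks are replaced by single spaces.
    return " ".join(out)


merged = postgenerate("^de<pr>/de$ ^el<det><def><m><sg>/el$")
print(merged)
```

A real implementation would also need to preserve blanks and formatting between units, which this sketch discards.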