Difference between revisions of "Surface forms in the pipe"


Revision as of 14:12, 22 June 2020

Currently the surface form is discarded after the tagger. It might be handy to keep it until transfer, so that it can be substituted for things unknown to the bidix.

Another use could be allowing surface-form embeddings, like those produced by word2vec, to be used in the tagger and lexical selection modules. Lexical selection could potentially also use the surface forms directly.

Input:

Machiavelli took it for granted that would-be leaders naturally aim at glory or honor.

Morph:

^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><subj><p3><nt><sg>/prpers<prn><obj><p3><nt><sg>$ ^for/for<cnjadv>/for<pr>$ ^granted/grant<vblex><pp>/grant<vblex><past>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><dem><mf><sg>/that<prn><rel><an><mf><sp>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim<vblex><inf># at/aim<vblex><pres># at/aim<vblex><imp># at$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<vblex><inf>/honour<vblex><pres>/honour<vblex><imp>/honour<n><sg>$^./.<sent>$
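To make the stream format concrete, here is a minimal Python sketch that splits a stream like the one above into lexical units, keeping the surface form next to its analyses. The names `LexicalUnit` and `parse_stream` are hypothetical and not part of any Apertium tool, and the sketch assumes no escaped `^`, `/` or `$` characters appear in the input.

```python
import re
from typing import List, NamedTuple


class LexicalUnit(NamedTuple):
    """One ^surface/analysis1/analysis2$ unit (hypothetical helper type)."""
    surface: str
    analyses: List[str]


# Assumption: no escaped ^, / or $ inside a unit, so a lazy match is enough.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def parse_stream(stream: str) -> List[LexicalUnit]:
    """Return the lexical units of a stream, surface form first."""
    units = []
    for match in LU_PATTERN.finditer(stream):
        # The first field is the surface form, the rest are analyses.
        surface, *analyses = match.group(1).split("/")
        units.append(LexicalUnit(surface, analyses))
    return units


if __name__ == "__main__":
    for unit in parse_stream("^for/for<cnjadv>/for<pr>$ ^leaders/leader<n><pl>$"):
        print(unit.surface, unit.analyses)
```

A module further down the pipe could use the `surface` field as the lookup key for an embedding table, which is the use case suggested above.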

Tagger:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took/take<vblex><past>$ ^it/prpers<prn><obj><p3><nt><sg>$ ^for/for<pr>$ ^granted/grant<vblex><pp>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Separable:
^Machiavelli/Machiavelli<np><cog><sg>$ ^took it for granted/take<vblex><past># for granted+prpers<prn><obj><p3><nt><sg>$ ^that/that<cnjsub>$ ^would-be/would-be<adj>$ ^leaders/leader<n><pl>$ ^naturally/naturally<adv>$ ^aim at/aim# at<vblex><pres>$ ^glory/glory<n><sg>$ ^or/or<cnjcoo>$ ^honor/honour<n><sg>$^./.<sent>$

Pretransfer:

Biltrans:
^Machiavelli/Machiavelli<np><cog><sg>/Machiavelli<np><cog>$ ^took it for granted/take# for granted<vblex><past>+prpers<prn><obj><p3><nt><sg>/dar# por hecho<vblex><past>+lo<prn><tn><p3><nt><sg>$ ^that/that<cnjsub>/que<cnjsub>$ ^would-be/would-be<adj>/@would-be<adj>$ ^leaders/leader<n><pl>/@leader<n><pl>$ ^naturally/naturally<adv>/naturalmente<adv>$ ^aim/aim<vblex><inf>/apuntar<vblex><inf>$ ^at/at<pr>/en<pr>$ ^glory/glory<n><sg>/gloria<n><f><sg>$ ^or/or<cnjcoo>/o<cnjcoo>$ ^honour/honour<n><sg>/honor<n><m><sg>$^./.<sent>/.<sent>$ 

Transfer:

^Machiavelli<np><cog>$ ^lo<prn><tn><p3><nt><sg>$ ^dar# por hecho<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^*would-be$ ^*leaders$ ^naturalmente<adv>$ ^apuntar<vblex><inf>$ ^en<pr>$ ^gloria<n><f><sg>$ ^o<cnjcoo>$ ^honor<n><m><sg>$^.<sent>/.<sent>$ 
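The fallback for unknown words in the transfer output above (`^*leaders$` rather than `^*leader$`) could be sketched as follows. This is only an illustration under stated assumptions: the function name `fallback_unknown` is hypothetical, the input is assumed to have the three-field `^surface/source/target$` shape shown under Biltrans, a target starting with `@` is taken to mean the bidix lookup failed, and escaped characters are not handled. Units with a known translation are passed through untouched, since real transfer rules would process them.

```python
import re

# Assumption: no escaped ^, / or $ inside a unit.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def fallback_unknown(biltrans: str) -> str:
    """Replace units whose bidix lookup failed (@target) with ^*surface$."""

    def replace(match: "re.Match[str]") -> str:
        parts = match.group(1).split("/")
        # parts[0] is the surface form, parts[1] the source analysis,
        # parts[2] the first target analysis, if there is one.
        if len(parts) >= 3 and parts[2].startswith("@"):
            return "^*" + parts[0] + "$"
        return match.group(0)  # known word: left for the real transfer rules

    return LU_PATTERN.sub(replace, biltrans)


if __name__ == "__main__":
    print(fallback_unknown(
        "^that/that<cnjsub>/que<cnjsub>$ ^leaders/leader<n><pl>/@leader<n><pl>$"
    ))
```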


Generation (?):

Machiavelli lo dio por hecho que *would-be *leaders naturalmente apuntar en gloria o honor.

Potentially, generation could output something like ^dar# por hecho<vblex><ifi><p3><sg>/dio por hecho$, but then how would postgeneration work? For example, for

^de<pr>$ ^el<det><def><m><sg>$

Could it be:

Generation:
^de<pr>/de$ ^el<det><def><m><sg>/el$

Postgeneration:
^de<pr>+el<det><def><m><sg>/del$
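One way to picture the proposed postgeneration step is the Python sketch below. It is not how the existing postgenerator works: the `^analysis/surface$` shape is the hypothetical generation output proposed above, and `CONTRACTIONS` and `postgenerate` are made-up names for illustration. The sketch merges two adjacent units when their surface forms match a contraction, joining the analyses with `+`. It assumes no escaped characters, is case-sensitive, and re-joins units with single spaces rather than preserving the original blanks.

```python
import re
from typing import List, Tuple

# Matches the hypothetical ^analysis/surface$ output of generation.
LU_PATTERN = re.compile(r"\^([^/$]*)/([^$]*)\$")

# Hypothetical contraction table keyed by adjacent surface forms.
CONTRACTIONS = {("de", "el"): "del", ("a", "el"): "al"}


def postgenerate(stream: str) -> str:
    """Merge adjacent units whose surface forms contract (e.g. de + el)."""
    units: List[Tuple[str, str]] = LU_PATTERN.findall(stream)
    merged: List[Tuple[str, str]] = []
    i = 0
    while i < len(units):
        if i + 1 < len(units):
            key = (units[i][1], units[i + 1][1])
            if key in CONTRACTIONS:
                # Join the two analyses with "+" and use the contracted form.
                analysis = units[i][0] + "+" + units[i + 1][0]
                merged.append((analysis, CONTRACTIONS[key]))
                i += 2
                continue
        merged.append(units[i])
        i += 1
    # Blanks between units are normalised to a single space in this sketch.
    return " ".join(f"^{analysis}/{surface}$" for analysis, surface in merged)


if __name__ == "__main__":
    print(postgenerate("^de<pr>/de$ ^el<det><def><m><sg>/el$"))
```

Keeping the analyses joined with `+` means later consumers can still see which lexical units the contracted surface form came from.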