User:Junzay/Blank handling
Revision as of 10:10, 15 August 2016
GSoC 2016 project
Code at https://github.com/junaidiiith/Apertium / https://github.com/junaidiiith/Apertium_Code
What works currently
The deformatter and the reformatter work, but they still need more testing. The FST processor (lt-proc) distributes the inline tags to the words correctly. Pretransfer works, and its testing is complete. Transfer, interchunk and postchunk are implemented but still need more testing. The chain currently works as follows:
Deformatter
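The deformatter replaces each inline formatting tag with a numbered reference such as [{1}], placed before the formatted text, and marks the end of its scope with an empty blank []. Other markup (here <p> and </p>) becomes ordinary superblanks such as [5] and [6].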
Before deformatter:
<p><i>Hello brother</i> How are you <u>doing</u> Do you see <b>the point</b> I <u>couldn't</u> do it</p>
After deformatter:
[5][{1}]Hello brother[] How are you [{2}]doing[] Do you see [{3}]the point[] I [{4}]couldn't[] do it[6]
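The mapping above can be illustrated with a minimal Python sketch built on the standard library's html.parser. It is not the project's deformatter. The set of inline elements, the assumption that inline tags are not nested, and the numbering rule are all assumptions chosen for illustration. Under that rule, inline tags are numbered first and the remaining markup after them, which reproduces the indices in the example. The function also returns a hypothetical table from each index to its original markup.

```python
from html.parser import HTMLParser

# Assumption: which HTML elements count as inline formatting.
INLINE_TAGS = {"i", "b", "u", "em", "strong"}


class DeformatterSketch(HTMLParser):
    """Collects markup and text events in document order.

    Simplifying assumption: inline tags are not nested, so a single
    empty blank "[]" is enough to mark where an inline tag ends.
    """

    def __init__(self):
        super().__init__()
        self.events = []  # (kind, payload) pairs

    def handle_starttag(self, tag, attrs):
        kind = "inline" if tag in INLINE_TAGS else "block"
        self.events.append((kind, self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> become one ordinary superblank.
        self.events.append(("block", self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self.events.append(("close", None))
        else:
            self.events.append(("block", f"</{tag}>"))

    def handle_data(self, data):
        self.events.append(("text", data))


def deformat(html):
    """Return (stream, table) for an HTML fragment."""
    parser = DeformatterSketch()
    parser.feed(html)
    parser.close()
    # Hypothetical numbering: inline tags first, then the other markup.
    n_inline = sum(1 for kind, _ in parser.events if kind == "inline")
    next_inline, next_block = 1, n_inline + 1
    out, table = [], {}
    for kind, payload in parser.events:
        if kind == "inline":
            table[next_inline] = payload
            out.append(f"[{{{next_inline}}}]")  # e.g. "[{1}]"
            next_inline += 1
        elif kind == "block":
            table[next_block] = payload
            out.append(f"[{next_block}]")  # e.g. "[5]"
            next_block += 1
        elif kind == "close":
            out.append("[]")
        else:
            out.append(payload)  # plain text passes through unchanged
    return "".join(out), table
```

Calling deformat on the input above returns exactly the deformatted stream shown, together with the table {1: "<i>", 2: "<u>", 3: "<b>", 4: "<u>", 5: "<p>", 6: "</p>"}.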
lt-proc
lt-proc copies each inline tag onto every word it covers. It also handles inline tags that cover only part of a multiword expression (MWE), such as the bold "the point" inside the MWE "see the point".
After lt-proc:
[5][{1}]^Hello<ij>$[{1}]^brother<n><sg>$[] ^How<adv><itg>$ ^be<vbser><pres>$ ^prpers<prn><obj><p2><mf><sp>$ [{2}]^do<vblex><ger>$[] ^Do<vbdo><pres>$ ^prpers<prn><subj><p2><mf><sp>$ [{3}]^see<vblex><pres># the point$[] ^prpers<prn><subj><p1><mf><sg>$ [{4}]^can<vaux><past>+not<adv>$[] ^do<vbdo><pres>$ ^prpers<prn><subj><p3><nt><sg>$[6]
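The effect of tag distribution on a stream can be shown with a small Python post-processing sketch. lt-proc does this during analysis, so the sketch only illustrates the result. The function name is hypothetical, the tokenizer ignores escaped characters such as \^ and \$, and ordinary blanks between words are kept as they are.

```python
import re

# Simplified tokens of an Apertium stream, tried in this order:
# inline-tag reference "[{n}]", empty closing blank "[]", lexical unit
# "^...$", any other superblank "[...]", and plain text or blanks.
# Escaped characters (\^, \$, \[) are not handled (assumption).
TOKEN = re.compile(r"\[\{\d+\}\]|\[\]|\^[^$]*\$|\[[^\]]*\]|[^\[^]+")


def distribute_inline_tags(stream):
    """Put the active inline-tag reference in front of every LU it covers."""
    out = []
    active = None           # inline tag in scope, e.g. "[{1}]"
    already_tagged = False  # True until the first LU after the tag is seen
    for token in TOKEN.findall(stream):
        if token.startswith("[{"):
            active, already_tagged = token, True
        elif token == "[]":
            active = None   # the inline tag's scope ends here
        elif token.startswith("^"):
            if active and not already_tagged:
                out.append(active)  # copy the tag onto this word
            already_tagged = False
        out.append(token)
    return "".join(out)
```

For example, "[5][{1}]^Hello<ij>$ ^brother<n><sg>$[] ^How<adv><itg>$[6]" becomes "[5][{1}]^Hello<ij>$ [{1}]^brother<n><sg>$[] ^How<adv><itg>$[6]". Words outside an inline span are left alone.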
Pretransfer
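Pretransfer splits lexical units joined with '+' and moves a '#' multiword queue to just after the lemma. The inline tag in front of such a lexical unit is copied to every word that results. For example, [{4}]^can<vaux><past>+not<adv>$ becomes [{4}]^can<vaux><past>$ [{4}]^not<adv>$.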
After pretransfer:
[5][{1}]^Hello<ij>$[{1}]^brother<n><sg>$[] ^How<adv><itg>$ ^be<vbser><pres>$ ^prpers<prn><obj><p2><mf><sp>$ [{2}]^do<vblex><ger>$[] ^Do<vbdo><pres>$ ^prpers<prn><subj><p2><mf><sp>$ [{3}]^see# the point<vblex><pres>$[] ^prpers<prn><subj><p1><mf><sg>$ [{4}]^can<vaux><past>$ [{4}]^not<adv>$[] ^do<vbdo><pres>$ ^prpers<prn><subj><p3><nt><sg>$[6]
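A minimal Python sketch of this rule for a single lexical unit reproduces the two cases in the example. It assumes there are no escaped '+', '#' or '$' characters, and it attaches a '#' queue to the first word after a split. The name pretransfer_lu is hypothetical, and this is not the code of apertium-pretransfer.

```python
def pretransfer_lu(inline_tag, lu):
    """Split one lexical unit and copy its inline tag to every part.

    inline_tag: the reference in front of the LU, e.g. "[{4}]", or "".
    lu: the lexical unit including its ^ and $ delimiters.
    """
    body = lu[1:-1]  # drop the ^ and $ delimiters
    queue = ""
    if "#" in body:
        # "see<vblex><pres># the point" -> "see<vblex><pres>", "# the point"
        body, rest = body.split("#", 1)
        queue = "#" + rest
    words = []
    for i, part in enumerate(body.split("+")):
        if i == 0 and queue:
            # Move the multiword queue to just after the first lemma.
            lemma_end = part.find("<")
            if lemma_end == -1:
                lemma_end = len(part)
            part = part[:lemma_end] + queue + part[lemma_end:]
        # Every word produced by the split keeps the inline tag.
        words.append(f"{inline_tag}^{part}$")
    return " ".join(words)
```

pretransfer_lu("[{4}]", "^can<vaux><past>+not<adv>$") gives "[{4}]^can<vaux><past>$ [{4}]^not<adv>$". pretransfer_lu("[{3}]", "^see<vblex><pres># the point$") gives "[{3}]^see# the point<vblex><pres>$". Both match the stream above.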
Transfer
Each inline tag stays attached to its word inside the chunk, together with its closing []. Superblanks such as [5] and [6] stay outside the chunks.
After transfer:
[5]^default<default>{[{1}]^Hola<ij>$[]}$^Nom<SN><UNDET><m><sg>{[{1}]^hermano<n><3><4>$[]}$ ^adv<adv><itg>{^Cómo<adv><itg>$}$ ^verbcj<SV><vbser><pri><p2><sg>{^ser<vbser><3><4><5>$ ^prpers<prn><subj><p2><mf><sg>$}$ ^ger<SV><vblex><ger><PD><ND>{[{2}]^hacer<vblex><3>$[]}$ ^prnsubj<SN><tn><p2><mf><sg>{^prpers<prn><2><p2><4><sg>$}$ ^verbcj<SV><vblex><pri><PD><ND>{[{3}]^coger<vblex><3><4><5># la gracia$[]}$ ^prnsubj<SN><tn><p1><mf><sg>{^prpers<prn><2><p1><4><sg>$}$ ^mod<SV><vbmod><cni><PD><ND>{[{4}]^poder<vbmod><3><4><5>$[]}$ ^adv<adv><NEG>{[{4}]^no<adv>$[]}$ ^prnsubj<SN><tn><p3><m><sg>{^prpers<prn><2><p3><4><sg>$}$ [6]
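How a chunk carries its words' inline tags can be sketched in Python. make_chunk is a hypothetical helper, not part of apertium-transfer. It only reproduces the output format seen above: each formatted word is wrapped as [{n}]^...$[] inside the chunk's braces, and unformatted words are left bare.

```python
def make_chunk(lemma_and_tags, words):
    """Build a chunk whose words each keep their own inline tag.

    lemma_and_tags: chunk lemma plus tags, e.g. "Nom<SN><UNDET><m><sg>".
    words: (inline_tag, lu) pairs; inline_tag is "" for unformatted words.
    """
    parts = []
    for inline_tag, lu in words:
        if inline_tag:
            # The tag opens before the word and its scope ends right after it.
            parts.append(f"{inline_tag}{lu}[]")
        else:
            parts.append(lu)
    return "^" + lemma_and_tags + "{" + " ".join(parts) + "}$"
```

make_chunk("Nom<SN><UNDET><m><sg>", [("[{1}]", "^hermano<n><3><4>$")]) returns the chunk ^Nom<SN><UNDET><m><sg>{[{1}]^hermano<n><3><4>$[]}$ from the stream above. Keeping the tag per word rather than per chunk lets a chunk mix formatted and unformatted words.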
Interchunk
In interchunk, the superblanks belonging to each chunk are output before the chunks are reordered, so that reordering cannot move a superblank to the wrong place in the text.
After interchunk:
[5]^default<default>{[{1}]^Hola<ij>$[]}$ ^Nom<SN><PDET><m><sg>{[{1}]^hermano<n><3><4>$[]}$ ^adv<adv><itg>{^Cómo<adv><itg>$}$ ^verbcj<SV><vbser><pri><p2><sg>{^ser<vbser><3><4><5>$ ^prpers<prn><subj><p2><mf><sg>$}$ ^ger<SV><vblex><ger><PD><ND>{[{2}]^hacer<vblex><3>$[]}$ ^verbcj<SV><vblex><pri><p2><sg>{[{3}]^coger<vblex><3><4><5># la gracia$[]}$ ^mod<SV><vbmod><cni><p1><sg>{[{4}]^poder<vbmod><3><4><5>$[]}$ ^adv<adv><NEG>{[{4}]^no<adv>$[]}$ ^prnsubj<SN><tn><p3><m><sg>{^prpers<prn><2><p3><4><sg>$}$ [6]
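The idea can be sketched in Python for a rule that only reorders the chunks it matched. The function name and its arguments are hypothetical, and blanks are assumed to hold only whitespace and superblanks. A real interchunk rule may also change chunk tags, as with <UNDET> becoming <PDET> above.

```python
def apply_reordering_rule(chunks, blanks, order):
    """Output a rule's matched chunks in a new order without moving superblanks.

    chunks: the chunks matched by the rule, in input order.
    blanks: the len(chunks) - 1 blanks that separated them in the input.
    order: indices into chunks giving the output order, e.g. [1, 0].
    """
    # Flush the superblanks first, in their original order, so that they
    # are not carried along when the chunks are reordered.
    superblanks = "".join(blank.strip() for blank in blanks if blank.strip())
    return superblanks + " ".join(chunks[i] for i in order)
```

Swapping ^A<SN>{^a<n>$}$ and ^B<SV>{^b<vblex>$}$ across the blank " [7] " gives "[7]^B<SV>{^b<vblex>$}$ ^A<SN>{^a<n>$}$". A naive rule that emitted the blank in its old slot would instead place [7] between B and A, so the markup would follow different text than in the input.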
Postchunk
After postchunk:
[5][{1}]^Hola<ij>$[] ^El<det><def><m><sg>$ [{1}]^hermano<n><m><sg>$ ^Cómo<adv><itg>$ ^ser<vbser><pri><p2><sg>$ [{2}]^hacer<vblex><ger>$ [{3}]^coger<vblex><pri><p2><sg># la gracia$ [{4}]^poder<vbmod><cni><p1><sg>$[] [{4}]^no<adv>$[] ^prpers<prn><tn><p3><m><sg>$ [6]
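Postchunk unwraps each chunk. Inside it, numbered tags such as <3> refer by position to the chunk's own tags, which is how ^hermano<n><3><4>$ in ^Nom<SN><PDET><m><sg> becomes ^hermano<n><m><sg>$. The Python sketch below covers only this unwrapping step. It does not run postchunk rules (such as the one that inserts El), it keeps inline references and closing [] where they were in the chunk, and unchunk is a hypothetical name.

```python
import re

# A chunk: ^lemma<tag>...{contents}$ (nested chunks are not expected).
CHUNK = re.compile(r"\^([^<{]*)((?:<[^>]*>)*)\{(.*)\}\$")
NUMBERED_TAG = re.compile(r"<(\d+)>")


def unchunk(chunk):
    """Return a chunk's contents with numbered tags filled from the chunk."""
    _, tag_string, contents = CHUNK.fullmatch(chunk).groups()
    chunk_tags = re.findall(r"<([^>]*)>", tag_string)

    def fill(match):
        # <3> refers to the chunk's third tag (1-based).
        return "<" + chunk_tags[int(match.group(1)) - 1] + ">"

    # Inline-tag references like [{1}] and the closing [] are left untouched.
    return NUMBERED_TAG.sub(fill, contents)
```

For example, unchunk("^prnsubj<SN><tn><p3><m><sg>{^prpers<prn><2><p3><4><sg>$}$") returns "^prpers<prn><tn><p3><m><sg>$", the same lexical unit as in the postchunk output above.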
Generator
After generator:
[5][{1}]Hola[] El [{1}]hermano Cómo eres [{2}]haciendo [{3}]coges la gracia [{4}]podría[] [{4}]no[] él [6]
Reformatter
The reformatter turns the superblanks and inline tags back into markup, and the libtidy module then tidies the resulting HTML to give the final output:
<html> <head> <title></title> </head> <body> <p> <i>Hola</i> El <i>hermano Cómo eres</i> <u>haciendo</u> <b>coges la gracia</b> <u>podría</u> <u>no</u> él</p> </body> </html>
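The step before tidying can be sketched in Python. The sketch assumes the deformatter kept a table from each index to its original markup; that table is a hypothetical structure here. reformat_stream is an illustrative name, and it does not call libtidy. It also assumes inline tags are not nested, so opening a new inline tag closes the one still open.

```python
import re

# "[{n}]" = inline tag n, "[n]" = superblank n, "[]" = end of an inline tag.
BLANK = re.compile(r"\[\{(\d+)\}\]|\[(\d+)\]|\[\]")
TAG_NAME = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")


def reformat_stream(stream, table):
    """Turn a generated stream back into markup using an index -> markup table."""
    out = []
    open_name = None  # name of the inline element currently open, e.g. "i"

    def close_inline():
        nonlocal open_name
        if open_name:
            out.append(f"</{open_name}>")
            open_name = None

    pos = 0
    for m in BLANK.finditer(stream):
        out.append(stream[pos:m.start()])  # text before this blank
        pos = m.end()
        inline_ref, super_ref = m.group(1), m.group(2)
        if inline_ref:
            # Assumes no nesting: a new inline tag closes the previous one.
            close_inline()
            markup = table[int(inline_ref)]
            out.append(markup)
            open_name = TAG_NAME.match(markup).group(1)
        elif super_ref:
            out.append(table[int(super_ref)])  # restore the markup verbatim
        else:
            close_inline()  # "[]" ends the current inline tag
    out.append(stream[pos:])
    close_inline()  # close anything still open at the end of the stream
    return "".join(out)
```

Applied to the generator output above with the table {1: "<i>", 2: "<u>", 3: "<b>", 4: "<u>", 5: "<p>", 6: "</p>"}, this returns "<p><i>Hola</i> El <i>hermano Cómo eres </i><u>haciendo </u><b>coges la gracia </b><u>podría</u> <u>no</u> él </p>". That has the same elements as the tidied HTML, with some spaces inside closing tags rather than outside them.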
Repositories
Apertium: https://github.com/junaidiiith/apertium/tree/blank-handling
lttoolbox: https://github.com/junaidiiith/lttoolbox