Difference between revisions of "User:Junzay/Blank handling"

Revision as of 10:41, 13 August 2016

GSoC 2016 project

Code at https://github.com/junaidiiith/Apertium / https://github.com/junaidiiith/Apertium_Code

What works currently

The deformatter and the reformatter work, although more testing is still needed. The FST processor (lt-proc) distributes the tags correctly to the words. Pretransfer works and its testing phase is complete. Transfer, interchunk and postchunk are complete, but they still need more testing. The chain currently works as follows:

Deformatter

The deformatter links each word to its inline tag by placing a reference to the tag immediately before the word.

Before deformatter:

<p><i>Hello brother</i> How are you <u>doing</u> Do you see <b>the point</b> I <u>couldn't</u> do it</p>

After deformatter:

[5][{1}]Hello brother[] How are you [{2}]doing[] Do you see [{3}]the point[] I [{4}]couldn't[] do it[6]
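The inline part of this step can be sketched as follows. This is only an illustration inferred from the example above, not the project's deformatter: the function name `deformat_inline`, the set of inline tags and the sequential numbering from 1 are assumptions, and block-level tags such as `<p>` (shown as [5] and [6] above) are not handled.

```python
import re

# Assumed set of inline tags; the real deformatter may recognise more.
INLINE = ("i", "u", "b")
PATTERN = re.compile(r"<(/?)(%s)>" % "|".join(INLINE))


def deformat_inline(html):
    """Replace inline tags with [{n}] references and [] closers.

    Returns the rewritten text and a table mapping each id to its tag name.
    """
    table = {}

    def repl(match):
        closing, name = match.group(1), match.group(2)
        if closing:
            return "[]"  # an empty blank ends the tagged run
        ident = len(table) + 1  # hypothetical numbering: 1, 2, 3, ...
        table[ident] = name
        return "[{%d}]" % ident

    return PATTERN.sub(repl, html), table


text, table = deformat_inline("<i>Hello brother</i> How are you <u>doing</u>")
print(text)
print(table)
```

The table of ids is kept so that a later stage can map each reference back to its original tag.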


Lt-proc

lt-proc distributes each inline tag to all the words it covers and also handles inline tags that span multiword expressions (MWEs).

After lt-proc:

[5][{1}]^Hello<ij>$[{1}]^brother<n><sg>$[] ^How<adv><itg>$ ^be<vbser><pres>$ ^prpers<prn><obj><p2><mf><sp>$ [{2}]^do<vblex><ger>$[] ^Do<vbdo><pres>$ ^prpers<prn><subj><p2><mf><sp>$ [{3}]^see<vblex><pres># the point$[] ^prpers<prn><subj><p1><mf><sg>$ [{4}]^can<vaux><past>+not<adv>$[] ^do<vbdo><pres>$ ^prpers<prn><subj><p3><nt><sg>$[6]
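To check how the tags were distributed, it can help to list each lexical unit together with the inline tag reference in front of it. The helper below is a minimal sketch for that purpose; `tagged_units` is a hypothetical name, and the regular expression assumes the stream contains no escaped `^` or `$` characters.

```python
import re

# An optional [{n}] reference directly followed by ^...$ (one lexical unit).
LU = re.compile(r"(?:\[\{(\d+)\}\])?\^([^$]*)\$")


def tagged_units(stream):
    """Return (tag id or None, lexical unit) pairs in stream order."""
    return [
        (int(m.group(1)) if m.group(1) else None, m.group(2))
        for m in LU.finditer(stream)
    ]


sample = "[5][{1}]^Hello<ij>$[{1}]^brother<n><sg>$[] ^How<adv><itg>$"
for ident, unit in tagged_units(sample):
    print(ident, unit)
```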

Pretransfer

The tag in front of a lexical unit (LU) that contains '#' or '+' is carried over to every unit that pretransfer produces from it. For example, [{4}]^can<vaux><past>+not<adv>$ becomes [{4}]^can<vaux><past>$ [{4}]^not<adv>$

After pretransfer:

[5][{1}]^Hello<ij>$[{1}]^brother<n><sg>$[] ^How<adv><itg>$ ^be<vbser><pres>$ ^prpers<prn><obj><p2><mf><sp>$ [{2}]^do<vblex><ger>$[] ^Do<vbdo><pres>$ ^prpers<prn><subj><p2><mf><sp>$ [{3}]^see# the point<vblex><pres>$[] ^prpers<prn><subj><p1><mf><sg>$ [{4}]^can<vaux><past>$ [{4}]^not<adv>$[] ^do<vbdo><pres>$ ^prpers<prn><subj><p3><nt><sg>$[6]
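The pretransfer behaviour described above can be sketched roughly as follows. This is an illustration based only on the two cases shown in the example, not the actual pretransfer code: the names `pretransfer` and `move_queue` are hypothetical, escaped characters are ignored, and a unit that combines '+' and '#' is handled in a simplified way.

```python
import re

# An optional [{n}] reference directly followed by ^...$ (one lexical unit).
LU = re.compile(r"(\[\{\d+\}\])?\^([^$]*)\$")


def move_queue(part):
    """Turn lemma<tags># queue into lemma# queue<tags>."""
    if "#" not in part:
        return part
    head, queue = part.split("#", 1)
    first_tag = head.find("<")
    if first_tag == -1:
        return part
    return head[:first_tag] + "#" + queue + head[first_tag:]


def pretransfer(stream):
    """Split units on '+' and copy the inline tag to every resulting unit."""

    def repl(match):
        tag = match.group(1) or ""
        parts = match.group(2).split("+")
        return " ".join("%s^%s$" % (tag, move_queue(p)) for p in parts)

    return LU.sub(repl, stream)


print(pretransfer("[{4}]^can<vaux><past>+not<adv>$[]"))
print(pretransfer("[{3}]^see<vblex><pres># the point$[]"))
```

Copying the reference to each unit means that no word loses its formatting when a joined unit is split.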


Transfer

Inside each chunk, the inline tags stay linked to their words.

After transfer:

[5]^default<default>{[{1}]^Hola<ij>$[]}$^Nom<SN><UNDET><m><sg>{[{1}]^hermano<n><3><4>$[]}$ ^adv<adv><itg>{^Cómo<adv><itg>$}$ ^verbcj<SV><vbser><pri><p2><sg>{^ser<vbser><3><4><5>$ ^prpers<prn><subj><p2><mf><sg>$}$ ^ger<SV><vblex><ger><PD><ND>{[{2}]^hacer<vblex><3>$[]}$  ^prnsubj<SN><tn><p2><mf><sg>{^prpers<prn><2><p2><4><sg>$}$ ^verbcj<SV><vblex><pri><PD><ND>{[{3}]^coger<vblex><3><4><5># la gracia$[]}$ ^prnsubj<SN><tn><p1><mf><sg>{^prpers<prn><2><p1><4><sg>$}$ ^mod<SV><vbmod><cni><PD><ND>{[{4}]^poder<vbmod><3><4><5>$[]}$  ^adv<adv><NEG>{[{4}]^no<adv>$[]}$   ^prnsubj<SN><tn><p3><m><sg>{^prpers<prn><2><p3><4><sg>$}$ [6]

Interchunk

In interchunk, all the superblanks that correspond to each chunk are output before the chunks are reordered, which avoids superblank reordering.

After interchunk:

[5]^default<default>{[{1}]^Hola<ij>$[]}$  ^Nom<SN><PDET><m><sg>{[{1}]^hermano<n><3><4>$[]}$ ^adv<adv><itg>{^Cómo<adv><itg>$}$ ^verbcj<SV><vbser><pri><p2><sg>{^ser<vbser><3><4><5>$ ^prpers<prn><subj><p2><mf><sg>$}$ ^ger<SV><vblex><ger><PD><ND>{[{2}]^hacer<vblex><3>$[]}$    ^verbcj<SV><vblex><pri><p2><sg>{[{3}]^coger<vblex><3><4><5># la gracia$[]}$  ^mod<SV><vbmod><cni><p1><sg>{[{4}]^poder<vbmod><3><4><5>$[]}$  ^adv<adv><NEG>{[{4}]^no<adv>$[]}$  ^prnsubj<SN><tn><p3><m><sg>{^prpers<prn><2><p3><4><sg>$}$ [6]


Postchunk

After postchunk:

[5][{1}]^Hola<ij>$[]   ^El<det><def><m><sg>$ [{1}]^hermano<n><m><sg>$ ^Cómo<adv><itg>$ ^ser<vbser><pri><p2><sg>$ [{2}]^hacer<vblex><ger>$  [{3}]^coger<vblex><pri><p2><sg># la gracia$     [{4}]^poder<vbmod><cni><p1><sg>$[]  [{4}]^no<adv>$[]  ^prpers<prn><tn><p3><m><sg>$ [6]

Generator

After generator:

[5][{1}]Hola[] El [{1}]hermano Cómo eres [{2}]haciendo  [{3}]coges la gracia  [{4}]podría[] [{4}]no[] él [6]

Reformatter

The libtidy module tidies the markup and reformats it to give the final output:

<html>
<head>
<title></title>
</head>
<body>
<p>
<i>Hola</i> El 
<i>hermano Cómo eres</i> 
<u>haciendo</u> 
<b>coges la gracia</b> 
<u>podría</u> 
<u>no</u> él</p>
</body>
</html>
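How the inline references turn back into markup can be sketched as below. This is an assumption-laden illustration inferred from the generator output and the result above, not the project's reformatter: `reformat_inline` is a hypothetical name, the id-to-tag table is assumed to come from the deformatter, a tagged run is assumed to end at the next [] or the next [{n}] reference, and block superblanks such as [5] and [6] are left untouched. The libtidy step is not reproduced.

```python
import re

# Either an opening reference [{n}] or the empty blank [] that ends a run.
TOKEN = re.compile(r"\[\{(\d+)\}\]|\[\]")


def reformat_inline(text, table):
    """Wrap each tagged run of text in the tag its reference points to."""
    out = []
    current = None  # tag name of the run being read, or None
    pos = 0

    def emit(chunk):
        if current is None or not chunk.strip():
            out.append(chunk)
        else:
            body = chunk.rstrip()
            # Keep trailing whitespace outside the closing tag.
            out.append("<%s>%s</%s>%s" % (current, body, current, chunk[len(body):]))

    for m in TOKEN.finditer(text):
        emit(text[pos:m.start()])
        current = table[int(m.group(1))] if m.group(1) else None
        pos = m.end()
    emit(text[pos:])
    return "".join(out)


table = {1: "i", 2: "u", 3: "b", 4: "u"}  # assumed table from the deformatter
print(reformat_inline("[{1}]Hola[] El [{1}]hermano Cómo eres [{2}]haciendo", table))
```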

See also