'''GSOC 2019 progress : Extend weighted transfer rules'''

'''The code is uploaded regularly in this repo [https://github.com/aboelhamd/machine-translation].<br />'''

My working days will be every day except Thursday, 5 hours per day, at least for the first phase.<br />
== Phase 1 (April 19 : May 16) ==

From April 19 to May 16 and from June 21 to June 28.<br />
It's shifted because of my exams.<br />

=== Week 1 (April 19 : April 25) ===

''' Day 1 (Friday April 19) '''<br />
The latest evaluation scores were, unfortunately, far lower than those of traditional apertium's LRLM resolution.<br />
Debugged the code to find the cause of such a low evaluation score.<br />
Figured out that there was a bug in normalizing the LM scores of the target ambiguous sentences. The LM score is the log (base 10) of the sentence probability, so as its magnitude gets larger, the normalized probability of the sentence should get smaller, and I was doing the inverse of that.<br />
The easiest solution was to modify the score-sentences script so that, instead of returning the score itself, it returns its reciprocal.<br />
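
To make the fix concrete, here is a minimal sketch of that reciprocal normalization, assuming each candidate translation of an ambiguous sentence already has a negative log10 LM score. The numbers and function names are mine; the real score-sentences script in the repo may be organized differently.

<syntaxhighlight lang="python">
# Minimal sketch of the reciprocal fix; not the actual score-sentences script.
# Assumption: each candidate translation has an LM score that is a negative
# log10 probability (e.g. -18.1), where a larger magnitude means less probable.

def sentence_weight(log10_score):
    """Reciprocal of the score's magnitude: more probable -> closer to 0 -> larger weight."""
    return 1.0 / abs(log10_score)

def normalize(weights):
    """Normalize the weights of one sentence's candidate translations to sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Two made-up candidates: the first is less probable, so it gets the lower weight.
log10_scores = [-18.1, -12.4]
print(normalize([sentence_weight(s) for s in log10_scores]))
</syntaxhighlight>

Note that a reciprocal of this kind still rewards strings whose log-probability has a small magnitude, which is usually the shorter ones; that is visible in the Day 2 scores below, where the incomplete sentence gets the largest reciprocal.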

''' Day 2 (Saturday April 20) '''<br />
The evaluation results were better, but still not better than traditional apertium's LRLM resolution.<br />
Debugged the code to see why the score was still worse than apertium's.<br />
Found a bug in generating the ambiguous combinations; still working on solving it.<br />
The bug resulted in some incomplete sentences like:<br />
'''Sentence : Reciprocal of negative LM score'''<br />
Resumption of the period of sessions : 0.0552026445652 <br />
Resumption of session period : 0.0739337147853 <br />
Resumption of the session period : 0.0753641871191 <br />
Resumption of the period of : 0.0757469108192 <br />
Resumption of period of : 0.0684245152522 <br />
Resumption of the period : '''0.0809504851312''' <br />

And as shown, the best score went to an incomplete sentence, which is one of the reasons why we got a bad evaluation score.<br />

''' Day 3 (Sunday April 21) '''<br />
The incomplete-sentences bug was a pointer problem. Originally I relied on pointers and the program worked well, but a memory-leak problem emerged after I made some changes, so instead of fixing the leaks I switched from the heap to the stack, and with not enough testing after that change, we got this bug.<br />
Now I will drop that stack solution, go back to pointers, and try to fix the leaks, so the program works as well as the previous version did.<br />
The bug was solved, and I started training the spa-eng pair again.<br />

''' Day 4 (Monday April 22) '''<br />
Continued spa-eng training.<br />
Read the paper Neural Machine Translation with Extended Context. [https://arxiv.org/abs/1708.05943]<br />

''' Day 5 (Tuesday April 23) '''<br />
Continued spa-eng training.<br />
Debugged the kir-tur pair for a possible bug, as the chunker output has only default and unknown chunks. It still needs more debugging.<br />

''' Day 6 (Wednesday April 24) '''<br />
Training is finished.<br />
Modified some of the evaluation code to bring it up to date with the last bug fix.<br />
Evaluation is finished, and the resulting scores are still lower than apertium's.<br />
{| class="wikitable" |
|||
|+ Using 100% of training data, 6-gram LM, max entropy models, sampling ambiguous combinations |
|||
! |
|||
! Apertium LRLM |
|||
! Apertium ambiguous |
|||
|- |
|||
! WER (Word Error Rate) |
|||
| 78.41 |
|||
| 76.93 |
|||
|- |
|||
! PER (Position-independent word Error Rate) |
|||
| 61.86 |
|||
| 57.62 |
|||
|- |
|||
! BLEU (Bi-Lingual Evaluation Understudy) |
|||
| 14.13 |
|||
| 13.72 |
|||
|} |
|||
<br /> |
|||

''' Day 7 (Thursday April 25) '''<br />
Working on training with 10%, 25%, 50%, and 75% of the data, to evaluate the results with respect to the data size.<br />
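
A rough sketch of how such fixed-percentage training subsets could be drawn from a line-aligned parallel corpus follows. The file names and the line-aligned assumption are mine, not the project's actual tooling.

<syntaxhighlight lang="python">
# Illustrative sketch: carve out 10/25/50/75 % training subsets from a
# line-aligned parallel corpus. Assumes hypothetical files corpus.spa and
# corpus.eng with one sentence per line, aligned by line number.
import random

FRACTIONS = [0.10, 0.25, 0.50, 0.75]

def write_subset(src_lines, trg_lines, fraction, seed=0):
    """Sample the same line indices from both sides so the pair stays aligned."""
    rng = random.Random(seed)
    n = len(src_lines)
    idx = sorted(rng.sample(range(n), int(n * fraction)))
    tag = f"{int(fraction * 100)}"
    with open(f"train.{tag}.spa", "w", encoding="utf-8") as fs, \
         open(f"train.{tag}.eng", "w", encoding="utf-8") as ft:
        for i in idx:
            fs.write(src_lines[i])
            ft.write(trg_lines[i])

with open("corpus.spa", encoding="utf-8") as f:
    spa = f.readlines()
with open("corpus.eng", encoding="utf-8") as f:
    eng = f.readlines()

for frac in FRACTIONS:
    write_subset(spa, eng, frac)
</syntaxhighlight>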

=== Week 2 (April 26 : May 2) ===

Evaluating with 1000 sentences. The source and all target files are uploaded here [https://drive.google.com/open?id=1Jz1-LzfP_CbpjBk4Q7E5DAwgnqWk2Tk3].<br />

Best score of our system:<br />
Number of words in reference: 24701<br />
Number of words in test: 25406<br />
Number of unknown words (marked with a star) in test: 1200<br />
Percentage of unknown words: 4.72 %<br />
Results when removing unknown-word marks (stars)<br />
Edit distance: 18675<br />
Word error rate (WER): 75.60 %<br />
Number of position-independent correct words: 11058<br />
Position-independent word error rate (PER): 58.09 %<br />
----
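
To make the relation between these counts and the reported percentages explicit, here is a minimal sketch of how WER and PER can be computed from a reference and a test translation. The WER line reproduces the numbers above (18675 / 24701 ≈ 75.60 %), and the PER line uses one common bag-of-words definition that is consistent with the counts listed here; the reported results come from the project's evaluation scripts, so this is only an illustration of how the quantities relate.

<syntaxhighlight lang="python">
# Illustrative WER / PER computation over whitespace tokens.
# WER = word-level edit distance / number of reference words.
# PER = (ref_len - position-independent correct + max(0, test_len - ref_len)) / ref_len,
#       i.e. a bag-of-words error rate that ignores word order.
from collections import Counter

def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution / match
        prev = cur
    return prev[-1]

def wer_per(ref_text, hyp_text):
    ref, hyp = ref_text.split(), hyp_text.split()
    dist = edit_distance(ref, hyp)
    correct = sum((Counter(ref) & Counter(hyp)).values())
    wer = dist / len(ref)
    per = (len(ref) - correct + max(0, len(hyp) - len(ref))) / len(ref)
    return wer, per

print(wer_per("resumption of the session period", "resumption of the period"))
</syntaxhighlight>

BLEU, the third metric in the table above, additionally needs n-gram precision statistics and a brevity penalty, so it is not reproduced in this sketch.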

Average score of our system:<br />
Number of words in reference: 579172<br />
Number of words in test: 622563<br />
Number of unknown words (marked with a star) in test: 26178<br />
Percentage of unknown words: 4.20 %<br />
Results when removing unknown-word marks (stars)<br />
Edit distance: 453716<br />
Word error rate (WER): 78.34 %<br />
Number of position-independent correct words: 269929<br />
Position-independent word error rate (PER): 60.89 %<br />
----

Statistics about input files<br />
Number of words in reference: 24701<br />
Number of words in test: 26486<br />
Number of unknown words (marked with a star) in test: 1200<br />
Percentage of unknown words: 4.53 %<br />
Results when removing unknown-word marks (stars)<br />
Edit distance: 19369<br />
Word error rate (WER): 78.41 %<br />
Number of position-independent correct words: 11206<br />
Position-independent word error rate (PER): 61.86 %<br />
----

''' Day 1 (Friday April 26) '''

''' Day 2 (Saturday April 27) '''

''' Day 3 (Sunday April 28) '''

''' Day 4 (Monday April 29) '''

''' Day 5 (Tuesday April 30) '''

''' Day 6 (Wednesday May 1) '''

''' Day 7 (Thursday May 2) '''

=== Week 3 (May 3 : May 9) ===

Trying to modify and create some scripts to automate the evaluation process, and to tweak it to produce many evaluation scenarios.<br />
Also looking at the documentation of yasmet, trying to find a way to add the tags of a lemma as features.<br />
Adding a file to order sentences from most ambiguous to least ambiguous, so we can choose the most ambiguous ones as test data and re-evaluate (see the sketch below). We will also test with 10000 sentences instead of 1000.<br />
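
A minimal sketch of that ordering step, under the assumption (mine) that for each source sentence we already know how many ambiguous translation combinations it produces, stored in a hypothetical tab-separated file sentences-with-counts.tsv:

<syntaxhighlight lang="python">
# Illustrative sketch: rank sentences by how ambiguous they are and take the
# most ambiguous ones as test data. Assumes a hypothetical input file
# sentences-with-counts.tsv with lines of the form:
#   <number of ambiguous combinations>\t<source sentence>
TEST_SIZE = 10000  # the notes above mention testing with 10000 sentences

def load(path):
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            count, sentence = line.rstrip("\n").split("\t", 1)
            rows.append((int(count), sentence))
    return rows

rows = load("sentences-with-counts.tsv")
rows.sort(key=lambda r: r[0], reverse=True)  # most ambiguous first

with open("test.most-ambiguous.txt", "w", encoding="utf-8") as out:
    for count, sentence in rows[:TEST_SIZE]:
        out.write(sentence + "\n")
</syntaxhighlight>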

==== Day 1 (Friday May 3) ====
==== Day 2 (Saturday May 4) ====
==== Day 3 (Sunday May 5) ====
==== Day 4 (Monday May 6) ====
==== Day 5 (Tuesday May 7) ====
==== Day 6 (Wednesday May 8) ====
==== Day 7 (Thursday May 9) ====

=== Week 4 (May 10 : May 16) ===

==== Day 1 (Friday May 10) ====
==== Day 2 (Saturday May 11) ====
==== Day 3 (Sunday May 12) ====
==== Day 4 (Monday May 13) ====
==== Day 5 (Tuesday May 14) ====
==== Day 6 (Wednesday May 15) ====
==== Day 7 (Thursday May 16) ====

=== Week 5 (June 21 : June 27) ===

After my exams.<br />

==== Day 1 (Friday June 21) ====
==== Day 2 (Saturday June 22) ====
==== Day 3 (Sunday June 23) ====
==== Day 4 (Monday June 24) ====
==== Day 5 (Tuesday June 25) ====
==== Day 6 (Wednesday June 26) ====
==== Day 7 (Thursday June 27) ====

== Phase 2 (June 28 : July 25) ==

=== Week 1 (June 28 : July 4) ===

==== Day 1 (Friday June 28) ====
==== Day 2 (Saturday June 29) ====
==== Day 3 (Sunday June 30) ====
==== Day 4 (Monday July 1) ====
==== Day 5 (Tuesday July 2) ====
==== Day 6 (Wednesday July 3) ====
==== Day 7 (Thursday July 4) ====

=== Week 2 (July 5 : July 11) ===

==== Day 1 (Friday July 5) ====
==== Day 2 (Saturday July 6) ====
==== Day 3 (Sunday July 7) ====
==== Day 4 (Monday July 8) ====
==== Day 5 (Tuesday July 9) ====
==== Day 6 (Wednesday July 10) ====
==== Day 7 (Thursday July 11) ====

=== Week 3 (July 12 : July 18) ===

==== Day 1 (Friday July 12) ====
==== Day 2 (Saturday July 13) ====
==== Day 3 (Sunday July 14) ====
==== Day 4 (Monday July 15) ====
==== Day 5 (Tuesday July 16) ====
==== Day 6 (Wednesday July 17) ====
==== Day 7 (Thursday July 18) ====

=== Week 4 (July 19 : July 25) ===

==== Day 1 (Friday July 19) ====
==== Day 2 (Saturday July 20) ====
==== Day 3 (Sunday July 21) ====
==== Day 4 (Monday July 22) ====
==== Day 5 (Tuesday July 23) ====
==== Day 6 (Wednesday July 24) ====
==== Day 7 (Thursday July 25) ====