Difference between revisions of "Linearisation in Matxin"

From Apertium

Latest revision as of 19:03, 29 November 2016

Paper reviews

  • “Transition-Based Syntactic Linearisation with Lookahead Features” - Ratish Puduppully, Yue Zhang, Manish Shrivastava - http://www.aclweb.org/anthology/N/N16/N16-1058.pdf

The newest algorithm reviewed here, with the best results reported so far. It is 30x faster than the next-best one, and the authors have published the code on GitHub.
  
“Our system improves upon the previous best scores by 8.7 BLEU points for the task of unlabeled syntactic linearisation. 
For the task of labeled syntactic linearisation, we achieve the score of 91.8 BLEU points, the highest results reported so far.”

Unfortunately, it was tested only on English.

  • “Partial-Tree Linearisation: Generalized Word Ordering for Text Synthesis” - Yue Zhang - http://people.sutd.edu.sg/~yue_zhang/pub/ijcai13.pdf

The best algorithm of its time, although later algorithms by the same author outperformed it.
It gives good results within a reasonable time limit (5 s), but training needs a lot of memory and a large data set
(tens of thousands of examples), and the training time is also long.

Accuracy: “Our system gave a BLEU score of 89.3, comparable to the best-performing system of the shared task, 
although our results are not directly comparable because we used the PTB POS set and the Penn2Malt dependency labels, 
which are less fine-grained than the shared task data.” That was the best result at the time, and everything ran within the 5 s time limit.

Tested only on English.

  • “Transition-Based Syntactic Linearisation” - Yijia Liu, Yue Zhang, Wanxiang Che, Bing Qin - http://people.sutd.edu.sg/~yue_zhang/pub/naacl15.yijia.pdf

Faster and more accurate than the previous one. The authors also share their code, which should help with the implementation.
Again, the only problem is the need for a large training set.

It scored 81 BLEU, yet it performed better than the system above when the two were compared.

Tested only on English (Wall Street Journal).

  • “Experiments with Generative Models for Dependency Tree Linearisation” - Richard Futrell and Edward Gibson - https://www.aclweb.org/anthology/D15-1231

A good algorithm that does not need much memory, but the paper does not mention running time (it was probably poor).
In terms of accuracy, it was outperformed by the previous one.

It scored 58 BLEU, which is quite bad given that some random techniques in other papers scored 38.

However, it was evaluated on many languages: Basque, Czech, English, Finnish, French, German, Hebrew, Indonesian, Persian, Swedish, Spanish.

  • “Sentence Realisation with Unlexicalized Tree Linearisation Grammars” - Rui Wang, Yi Zhang - https://www.aclweb.org/anthology/C12-2127

Another algorithm which, like those above, requires a lot of memory for training.

However, it was evaluated on many languages (BLEU scores): Catalan (76), Chinese (81), Czech (67), English (85), German (74), Spanish (73).

  • “Generating Non-Projective Word Order in Statistical Linearisation” - Bernd Bohnet, Anders Björkelund, Jonas Kuhn, Wolfgang Seeker, Sina Zarrieß - https://aclweb.org/anthology/D/D12/D12-1085.pdf

Nice results, everything is neatly explained, and the scores are surprisingly good.

Evaluated on (BLEU scores): Dutch (82), Danish (86), Hungarian (77), Czech (72), English (93) and German (80).
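
All of the reviews above compare systems by BLEU score, so a rough idea of what the number measures may help. The following is only a minimal sketch of sentence-level BLEU with a single reference and no smoothing; the function names `ngrams` and `bleu` are made up for this illustration. The papers presumably report corpus-level BLEU scaled to 0-100, so their figures will not match what this toy version returns for a single sentence.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU (0..1), single reference, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((cand & ref).values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty; always 1.0 when the output is a reordering of the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "a b c d e f g h".split()
print(bleu(reference, reference))                  # identical word order
print(bleu("a b c d e f h g".split(), reference))  # last two words swapped
```

Because a linearised sentence contains exactly the words of the reference, the unigram precision is always 1 and the score depends only on how many longer n-grams survive the reordering.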

Thoughts

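<!-- Noun phrase: determiner (1), adjectival modifier (2), singular noun head (3) -->
<NODE ord="3" mi="n|sg">
   <NODE ord="2" si="amod"/>
   <NODE ord="1" si="det"/>
</NODE>

<!-- Clause: subject (1), root (2), direct object (3) -->
<NODE ord="2" si="root">
   <NODE ord="1" si="nsubj"/>
   <NODE ord="3" si="dobj"/>
</NODE>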
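
To make the role of the `ord` attribute concrete, here is a minimal Python sketch that reads such a fragment and lists its nodes in `ord` order. It assumes that `ord` holds the target position of each node and that `si` and `mi` can serve as display labels; the helper name `linearise` is made up for this illustration and is not part of Matxin. Note that it only reads out an order that is already there, whereas the real linearisation task is to decide those `ord` values.

```python
import xml.etree.ElementTree as ET

def linearise(xml_text):
    """Return (ord, label) pairs for every NODE, sorted by ord.

    The label is the node's si attribute, falling back to mi, then "?".
    """
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for node in root.iter("NODE"):  # iter() also yields the root element itself
        label = node.get("si") or node.get("mi") or "?"
        pairs.append((int(node.get("ord")), label))
    return sorted(pairs)

NOUN_PHRASE = """
<NODE ord="3" mi="n|sg">
   <NODE ord="2" si="amod"/>
   <NODE ord="1" si="det"/>
</NODE>
"""

CLAUSE = """
<NODE ord="2" si="root">
   <NODE ord="1" si="nsubj"/>
   <NODE ord="3" si="dobj"/>
</NODE>
"""

print(linearise(NOUN_PHRASE))  # [(1, 'det'), (2, 'amod'), (3, 'n|sg')]
print(linearise(CLAUSE))       # [(1, 'nsubj'), (2, 'root'), (3, 'dobj')]
```

In both fragments the head does not come first in the output: the noun ends up last in the noun phrase and the root sits between its two dependents in the clause.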

External resources

Relevant papers