Aligning a corpus with fast align

From Apertium

Revision as of 09:02, 9 December 2015


What you need

  • A sentence-aligned parallel corpus
  • fast_align (get it from https://github.com/clab/fast_align)
  • Two Apertium language packages, one for each language of the corpus

Process

First, analyse each side of the corpus with the corresponding language package.

  cat 
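As a rough sketch of this step: the `apertium` command can run a mode from a language package directory given with `-d`, reading text on standard input. The package directory, the mode name (a tagger mode such as `eng-tagger` is assumed here), and the file names are placeholders for illustration only; check the `modes` directory of your own package to see which modes it provides.

```shell
# Sketch only: directory, mode and file names are placeholders, not from the original page.
APERTIUM=${APERTIUM:-apertium}

# analyse_corpus PACKAGE_DIR MODE INPUT OUTPUT
# Runs one side of the corpus through a mode of a language package.
analyse_corpus() {
    "$APERTIUM" -d "$1" "$2" < "$3" > "$4"
}

# Example call (hypothetical paths and mode name):
# analyse_corpus ~/source/apertium-eng eng-tagger corpus.eng corpus.eng.tagged
```

Run it once per language, so that both sides of the corpus end up analysed, and check that both output files keep the same number of lines as the input.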

Then remove the tags that are superfluous for your purpose. For lexical alignment, for example, case tags are not really useful.

  cat | sed 's/<\(nom\|acc\|gen\)>//g' > 
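To see what the substitution does, here is a small sketch that wraps it in a function and applies it to one analysis in Apertium stream format. The function name and the sample lexical unit are made up for illustration. The tag list is the one from the command above. The `-E` form is used because `\|` in a basic regular expression is a GNU sed extension.

```shell
# strip_case_tags: remove <nom>, <acc> and <gen> tags from text on stdin.
# Extended regex (-E) is used so the alternation also works outside GNU sed.
strip_case_tags() {
    sed -E 's/<(nom|acc|gen)>//g'
}

# Example on a single, made-up analysis:
printf '%s\n' '^Haus<n><nt><sg><nom>$' | strip_case_tags
```

The example prints `^Haus<n><nt><sg>$`: the case tag is gone and every other tag is left in place. Extend the alternation with any further tags you do not want the aligner to see.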

Create the input file for fast_align:
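fast_align reads a single file with one sentence pair per line, where the source and target sentences are separated by ` ||| `. A minimal sketch of building that file, assuming two line-aligned files (the function name and the file names are hypothetical) and assuming that no sentence contains a tab character:

```shell
# make_fast_align_input SOURCE_FILE TARGET_FILE
# Joins line N of both files into "source ||| target" on stdout.
# Assumes both files have the same number of lines and contain no tabs.
make_fast_align_input() {
    paste "$1" "$2" | awk -F '\t' '{ print $1 " ||| " $2 }'
}

# Example call (hypothetical file names):
# make_fast_align_input corpus.src.tagged corpus.trg.tagged > corpus.src-trg
```

Because the pairing is purely by line number, it is worth comparing `wc -l` for both files before running this.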



Run fast_align:
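The fast_align README recommends the options `-d -o -v`, and adds `-r` to align in the reverse direction. Both directions are needed for symmetrisation. A sketch, assuming `fast_align` is on your `PATH` (or that `FAST_ALIGN` points at the binary); the function name and the file names are placeholders:

```shell
# Path to the fast_align binary; override FAST_ALIGN if it is not on PATH.
FAST_ALIGN=${FAST_ALIGN:-fast_align}

# align_both_directions INPUT FORWARD_OUT REVERSE_OUT
# Writes source-to-target alignments, then target-to-source (-r) alignments.
align_both_directions() {
    "$FAST_ALIGN" -i "$1" -d -o -v > "$2"
    "$FAST_ALIGN" -i "$1" -d -o -v -r > "$3"
}

# Example call (hypothetical file names):
# align_both_directions corpus.src-trg forward.align reverse.align
```

Each output line corresponds to one input sentence pair and lists the aligned token positions as `i-j` pairs.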




Symmetrise the alignments:
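fast_align ships with a tool called `atools` that combines a forward and a reverse alignment file. A sketch using the `grow-diag-final-and` heuristic from the fast_align README, assuming `atools` is on your `PATH` (or that `ATOOLS` points at the binary); the function name and the file names are placeholders:

```shell
# Path to the atools binary built with fast_align; override ATOOLS if needed.
ATOOLS=${ATOOLS:-atools}

# symmetrise FORWARD_ALIGN REVERSE_ALIGN OUTPUT
# Merges the two directional alignments with the grow-diag-final-and heuristic.
symmetrise() {
    "$ATOOLS" -i "$1" -j "$2" -c grow-diag-final-and > "$3"
}

# Example call (hypothetical file names):
# symmetrise forward.align reverse.align symmetric.align
```

The result has the same one-line-per-sentence-pair layout as the directional files, so it can be read side by side with the fast_align input.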