Aligning a corpus with fast_align

From Apertium
Revision as of 09:02, 9 December 2015 by Francis Tyers


What you need

  • A sentence-aligned parallel corpus
  • fast_align (available from https://github.com/clab/fast_align)
  • Two Apertium language packages, one for each language of the corpus

Process

First, analyse each side of the corpus with the corresponding language package.

  cat 
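The exact command depends on the language package. As a rough sketch only, the analysis step might be wrapped as below. The `analyse` function, the `APERTIUM` variable, the package directory and the mode name are all placeholders introduced for illustration; check the modes your package actually provides.

```shell
# APERTIUM is a convenience variable for this sketch (assumption), so the
# binary can be overridden if it is not on PATH.
APERTIUM="${APERTIUM:-apertium}"

# Usage: analyse PACKAGE_DIR MODE < raw_text > tagged_text
# PACKAGE_DIR is the directory of the compiled language package and MODE is
# one of its modes (for example a tagger mode); both are hypothetical here.
analyse() {
    "$APERTIUM" -d "$1" "$2"
}
```

It would be called once per language, for example `analyse path/to/package some-mode < corpus.src > corpus.tagged.src`, where every name is a placeholder.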

Then remove tags that are superfluous for the task. For lexical alignment, for example, case tags are not really interesting.

  cat | sed 's/<\(nom\|acc\|gen\)>//g' > 
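To see what the substitution does, here is a small sketch run on a single made-up line in Apertium stream format. The sample line is invented for illustration, and `sed -E` is used instead of the `\|` form so that the same pattern also works with non-GNU sed.

```shell
# Hypothetical tagged line (not taken from a real corpus).
tagged='^kitap<n><acc>$ ^ev<n><gen>$ ^gel<v><past>$'

# Remove the case tags nom, acc and gen; all other tags are left untouched.
stripped=$(printf '%s\n' "$tagged" | sed -E 's/<(nom|acc|gen)>//g')

printf '%s\n' "$stripped"
# prints: ^kitap<n>$ ^ev<n>$ ^gel<v><past>$
```

Further tags can be dropped by adding them to the alternation.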

Create the input file for fast_align:
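fast_align expects one sentence pair per line, with the source and target sentences separated by ` ||| `. A minimal sketch of building such a file from two line-aligned files follows; the function name and the file names in the usage note are hypothetical, and it assumes the sentences themselves contain no tab characters.

```shell
# Usage: make_fast_align_input SOURCE_FILE TARGET_FILE > INPUT_FILE
# Both files must have the same number of lines, one sentence per line.
make_fast_align_input() {
    # paste joins line N of both files with a tab;
    # awk then replaces that tab with the " ||| " separator.
    paste "$1" "$2" | awk -F'\t' '{ print $1 " ||| " $2 }'
}
```

For example, `make_fast_align_input corpus.tagged.src corpus.tagged.trg > corpus.fa` would write the combined file.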



Run fast_align:
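The fast_align README suggests running the aligner once in each direction with the `-d -o -v` options, adding `-r` for the reverse run. The sketch below wraps those two calls; the function name, the `FAST_ALIGN` variable and the file names are assumptions made for this example.

```shell
# FAST_ALIGN is a convenience variable for this sketch (assumption), not
# something fast_align itself reads.
FAST_ALIGN="${FAST_ALIGN:-fast_align}"

# Usage: align_both_directions INPUT_FILE FORWARD_OUT REVERSE_OUT
align_both_directions() {
    # -d favours alignment points close to the diagonal, -o optimises how
    # close they should be, -v puts a Dirichlet prior on the lexical
    # translation distributions.
    "$FAST_ALIGN" -i "$1" -d -o -v > "$2" || return 1
    # -r aligns in the reverse direction.
    "$FAST_ALIGN" -i "$1" -d -o -v -r > "$3"
}
```

A call such as `align_both_directions corpus.fa forward.align reverse.align` would leave one alignment file per direction, each line holding the `i-j` index pairs for the corresponding sentence pair.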




Symmetrise the alignments:
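fast_align ships with a helper called `atools` that combines a forward and a reverse alignment. A sketch using the `grow-diag-final-and` heuristic from the fast_align README is given below; the function name, the `ATOOLS` variable and the file names are hypothetical.

```shell
# ATOOLS is a convenience variable for this sketch (assumption), so the
# binary can be overridden if it is not on PATH.
ATOOLS="${ATOOLS:-atools}"

# Usage: symmetrise FORWARD_FILE REVERSE_FILE > SYMMETRISED_FILE
symmetrise() {
    # -i and -j take the two directional alignments, -c names the heuristic.
    "$ATOOLS" -i "$1" -j "$2" -c grow-diag-final-and
}
```

For example, `symmetrise forward.align reverse.align > symmetric.align`. Other heuristics such as `intersect` or `union` can be passed to `-c` instead if a stricter or looser alignment is wanted.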