Aligning a corpus with fast align
Revision as of 09:02, 9 December 2015 by Francis Tyers
What you need
- A sentence-aligned parallel corpus
- fast_align (get it from https://github.com/clab/fast_align)
- Two apertium language packages
Process
First, analyse each side of the corpus with its language package.
cat
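A rough sketch of this step for one side of the corpus, assuming the `apertium` wrapper script is installed and on the PATH. The function name `analyse_side` and all four arguments are hypothetical placeholders, not values from this page.

```shell
# Hypothetical helper: run one side of the corpus through a language package.
# $1 = language package directory, $2 = mode name,
# $3 = input file (one sentence per line), $4 = output file.
analyse_side() {
  apertium -d "$1" "$2" < "$3" > "$4"
}
```

It could be called as `analyse_side ~/source/apertium-xxx xxx-tagger corpus.xxx corpus.xxx.tagged`, where every argument is a placeholder. The modes a package actually provides are listed in its `modes.xml` file.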
Then remove superfluous tags. For lexical alignment, for example, case tags are not really informative.
cat | sed 's/<\(nom\|acc\|gen\)>//g' >
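To see what this substitution does, the sketch below applies an equivalent expression to one made-up analysed token. It is written with `sed -E`, so the alternation needs no backslashes. The helper name `strip_case_tags` and the sample token are assumptions for illustration, not output from a real language package.

```shell
# Hypothetical helper: delete the case tags <nom>, <acc> and <gen>.
strip_case_tags() {
  sed -E 's/<(nom|acc|gen)>//g'
}

# Made-up analysed token in Apertium stream format (illustrative only).
echo '^house<n><sg><nom>$' | strip_case_tags
# prints: ^house<n><sg>$
```

Further tags can be added to the alternation in the same way if they are not useful for alignment.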
Create the input file for fast_align:
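fast_align reads one sentence pair per line, with the two sides separated by ` ||| `. A minimal sketch of building such a file from two line-aligned files is given here. The helper name `make_fast_align_input` and the file names are hypothetical, and the two-sentence corpus is made up for the demonstration.

```shell
# Hypothetical helper: join two line-aligned files into fast_align's
# "source ||| target" format. $1 = source side, $2 = target side.
make_fast_align_input() {
  paste "$1" "$2" | awk -F '\t' '{ print $1 " ||| " $2 }'
}

# Tiny made-up demonstration corpus (illustrative only).
tmp=$(mktemp -d)
printf 'the house\na dog\n' > "$tmp/corpus.src"
printf 'la casa\nun perro\n' > "$tmp/corpus.trg"
make_fast_align_input "$tmp/corpus.src" "$tmp/corpus.trg"
# prints:
# the house ||| la casa
# a dog ||| un perro
```

Both files need the same number of lines, and the sketch assumes that neither side contains tab characters.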
Run fast_align:
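A sketch of this step, assuming the `fast_align` binary is on the PATH. It follows the invocation recommended in the fast_align README and runs the aligner once per direction, so that the two outputs can be symmetrised afterwards. The wrapper name `run_fast_align` and the output file names are hypothetical.

```shell
# Hypothetical wrapper around fast_align.
# $1 = input file in "source ||| target" format, $2 = prefix for output files.
run_fast_align() {
  # -d favours alignment points close to the diagonal,
  # -o optimises how strongly the diagonal is favoured,
  # -v uses a Dirichlet prior on the lexical translation distributions,
  # -r runs the alignment in the reverse direction.
  fast_align -i "$1" -d -o -v    > "$2.forward.align"
  fast_align -i "$1" -d -o -v -r > "$2.reverse.align"
}
```

For instance, `run_fast_align corpus.src-trg corpus` (placeholder names) would write `corpus.forward.align` and `corpus.reverse.align`. Each output line holds the alignment for one sentence pair as `i-j` pairs of zero-based word indices.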
Symmetrise the alignments:
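A sketch of this step using the `atools` program that is built together with fast_align, assuming it is on the PATH. The wrapper name `symmetrise` and the file names are hypothetical. The default heuristic shown, `grow-diag-final-and`, is the one used in the fast_align README and is not necessarily the one this page intended.

```shell
# Hypothetical wrapper around atools.
# $1 = forward alignment file, $2 = reverse alignment file,
# $3 = symmetrisation heuristic (optional, defaults to grow-diag-final-and).
symmetrise() {
  atools -i "$1" -j "$2" -c "${3:-grow-diag-final-and}"
}
```

For instance, `symmetrise corpus.forward.align corpus.reverse.align > corpus.sym.align`, again with placeholder file names. Running `atools` without arguments lists the other available heuristics.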