Difference between revisions of "Charlifter"

From Apertium

Revision as of 16:52, 18 August 2011


Training

You need a corpus:

  • XX-corpus.txt: A clean (orthographically correct) corpus in the language

You need two wordlists:

  • XX-clean.txt: A file with a list of known clean words
  • XX-prettyclean.txt: A file with a list of probably clean words (can be really small)

Process

First run 'make'; you may need to edit the makefile beforehand to adjust the paths.

Then, put the source files (corpus, wordlists) where sf.pl can find them, and run:

$ cat cv-training.crp | perl sf.pl -t -l cv
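
Before running the command above, it can help to confirm that the three input files from the Training section are in place. The following is only a sketch: check_training_files is a hypothetical helper, not part of Charlifter, and it assumes that XX in the file names stands for the language code (cv here) and that the files sit in a single directory. Where sf.pl actually looks for them depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of Charlifter): report which training
# inputs are missing for a given language code.
# Assumption: "XX" in XX-corpus.txt etc. is the language code, and all
# three files live in the directory passed as the second argument.
check_training_files() {
    lang="$1"
    dir="${2:-.}"      # default to the current directory
    missing=0
    for suffix in corpus clean prettyclean; do
        file="$dir/$lang-$suffix.txt"
        if [ ! -f "$file" ]; then
            echo "missing: $file"
            missing=1
        fi
    done
    # Exit status 0 if all three files exist, 1 otherwise.
    return "$missing"
}
```

For example, check_training_files cv . && echo "inputs found" prints the confirmation only when all three files exist in the current directory.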