Charlifter
Revision as of 16:52, 18 August 2011 by Francis Tyers
Training
You need a corpus:
XX-corpus.txt: A clean (orthographically correct) corpus in the language
You need two wordlists:
XX-clean.txt: A file with a list of known clean words
XX-prettyclean.txt: A file with a list of probably clean words (can be really small)
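Before training, it can help to confirm that all three input files are in place. The following is a minimal sketch, not part of Charlifter itself: check_inputs is a hypothetical helper, and it assumes the files sit together in one directory (the current one by default) and are named as above, with XX replaced by the language code.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Charlifter): verify that the corpus
# and both wordlists exist and are non-empty for a given language code.
# Assumption: all three files live in the same directory, passed as $2
# (defaults to the current directory).
check_inputs() {
    lang="$1"
    dir="${2:-.}"
    missing=0
    for suffix in corpus clean prettyclean; do
        f="$dir/$lang-$suffix.txt"
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_inputs cv && echo "inputs look complete"` prints the message only when cv-corpus.txt, cv-clean.txt and cv-prettyclean.txt are all present and non-empty.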
- Process
First, run 'make'. You might want to edit the makefile
to change your paths.
Then, put the source files (corpus, wordlists) where sf.pl
can find them, and run:
$ cat cv-training.crp | perl sf.pl -t -l cv