Difference between revisions of "Charlifter"
Revision as of 16:52, 18 August 2011
Training
You need a corpus:
XX-corpus.txt: A clean (orthographically correct) corpus in the language
You need two wordlists:
XX-clean.txt: A file with a list of known clean words
XX-prettyclean.txt: A file with a list of probably clean words (can be really small)
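Before training, it can help to confirm that all three input files are present and non-empty. The following is only a sketch: it assumes the files sit in a single directory and follow the XX-*.txt naming listed above, with XX replaced by the language code. The function name check_training_files is hypothetical and is not part of Charlifter.

```shell
#!/bin/sh
# Hypothetical helper, not part of Charlifter.
# Checks that <lang>-corpus.txt, <lang>-clean.txt and <lang>-prettyclean.txt
# exist and are non-empty in the given directory (default: current directory).
check_training_files() {
    lang="$1"
    dir="${2:-.}"
    missing=0
    for suffix in corpus clean prettyclean; do
        f="$dir/$lang-$suffix.txt"
        if [ ! -s "$f" ]; then
            # Report each problem file on stderr, but keep checking the rest.
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything is in place, 1 otherwise.
    return "$missing"
}
```

For example, check_training_files cv . checks the current directory for the cv language code and returns a non-zero status if any of the three files is absent.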
- Process
First run 'make'. You may need to edit the makefile beforehand
to change the paths for your system.
Then, put the source files (corpus, wordlists) where sf.pl
can find them, and run:
$ cat cv-training.crp | perl sf.pl -t -l cv