Difference between revisions of "Charlifter"

Revision as of 16:56, 18 August 2011

Training

You need a corpus:

XX-corpus.txt: A clean (orthographically correct) corpus in the language

You need two wordlists:

XX-clean.txt: A file with a list of known clean words
XX-prettyclean.txt: A file with a list of probably clean words (can be really small)
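
If no "pretty clean" list is at hand, one way to get a candidate is to take the frequent words of the clean corpus. The sketch below is only an illustration: the function name make_wordlist is hypothetical, and it assumes that the wordlists are plain text with one word per line, which this page does not state.

```shell
# Hypothetical helper: build a one-word-per-line list from a corpus.
# Assumption: sf.pl expects wordlists as plain text, one word per line.
make_wordlist() {
    corpus="$1"      # e.g. XX-corpus.txt
    min_count="$2"   # keep words seen at least this many times
    # Split on whitespace, drop empty lines, count each word,
    # then keep only the words that reach the frequency threshold.
    # Punctuation attached to a word is NOT stripped here.
    tr -s '[:space:]' '\n' < "$corpus" \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | awk -v min="$min_count" '$1 >= min { print $2 }'
}
```

For example, make_wordlist XX-corpus.txt 5 > XX-prettyclean.txt would keep words that occur at least five times. The result should still be checked by hand.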

Process

First run 'make'. You may need to edit the makefile beforehand to adjust the paths.

Then, put the source files (corpus, wordlists) where sf.pl can find them, and run:

$ cat cv-training.crp | perl sf.pl -t -l cv
Reading the clean dictionary...
Clean dictionary processed...
Reading the "pretty clean" dictionary...
"Pretty clean" dictionary processed...
Reading the training text...
Training texts processed (1143 words)...
Computing final probabilities...
Dumping plain text hashes to disk...

Then:

$ perl sf.pl -m -l cv
Reading in plain text hashes...
Saving storable hashes to disk...
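
The two training steps above can be chained in a small wrapper. This is a minimal sketch, not part of Charlifter: the function name train_charlifter is hypothetical, it assumes sf.pl is in the current directory, and it uses only the -t, -m and -l flags shown on this page.

```shell
# Hypothetical wrapper around the two training steps shown above.
# Assumption: sf.pl and the wordlists are in the current directory.
train_charlifter() {
    lang="$1"     # language code, e.g. cv
    corpus="$2"   # training corpus, e.g. cv-training.crp
    # Step 1: train on the corpus and dump plain text hashes.
    # Stop here if training fails.
    perl sf.pl -t -l "$lang" < "$corpus" || return 1
    # Step 2: convert the plain text hashes to storable hashes.
    perl sf.pl -m -l "$lang"
}
```

It would be called as train_charlifter cv cv-training.crp. Feeding the corpus with a redirect instead of cat is equivalent here, since sf.pl reads the training text from standard input in the command above.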

Now you should be able to use the program.

Usage

$ cat testing/cv-source.txt | perl sf.pl -r -d . -l cv
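
To keep the result rather than print it to the terminal, the same command can be wrapped so that it writes to a file. A minimal sketch: the function name restore_file and the output file name are hypothetical, and the reading of "-d ." as "look for the data files in the current directory" is an assumption based on the command above.

```shell
# Hypothetical helper: run sf.pl on one file and save the output.
# Assumption: "-d ." points sf.pl at data files in the current directory.
restore_file() {
    lang="$1"     # language code, e.g. cv
    input="$2"    # text to process, e.g. testing/cv-source.txt
    output="$3"   # where to write the processed text
    # Same flags as the usage command on this page, with redirects
    # in place of cat and the terminal.
    perl sf.pl -r -d . -l "$lang" < "$input" > "$output"
}
```

For example: restore_file cv testing/cv-source.txt cv-output.txt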
