N-grams
Revision as of 13:53, 10 February 2015 by Unhammer
Say you have a corpus and an analyser. How do you make a trigram frequency list in three shell commands?
Grab apertium-cleanstream, then do:
bzcat corpus.bz2 | apertium-deshtml | lt-proc foo.bin | apertium-cleanstream -n >corpus.ana
paste corpus.ana <(tail -n+2 corpus.ana) <(tail -n+3 corpus.ana) >corpus.trigrams
sort corpus.trigrams | uniq -c | sort -nr >corpus.trigrams.hitparade
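To see what the paste step does without a real corpus, here is a minimal sketch on a toy token file. The file names (toy.ana and friends) and the six tokens are made up for illustration; they stand in for corpus.ana with one token per line. tail -n +K starts output at line K, so +2 and +3 give copies shifted by one and two tokens. The sketch writes the shifted copies to temporary files instead of using <(...), which is a bash feature, so it should also run in plain sh. It also drops the last two pasted rows, which have empty columns because the shifted copies run out of lines first.

```shell
#!/bin/sh
# Toy stand-in for corpus.ana: one token per line (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' the cat saw the cat saw > "$workdir/toy.ana"

# Shifted copies: starting at line 2 and at line 3 of the token file.
tail -n +2 "$workdir/toy.ana" > "$workdir/toy.shift1"
tail -n +3 "$workdir/toy.ana" > "$workdir/toy.shift2"

# Line i of the result is token i, token i+1, token i+2 (tab-separated).
# The awk filter removes the two trailing rows with an empty column.
paste "$workdir/toy.ana" "$workdir/toy.shift1" "$workdir/toy.shift2" |
  awk -F '\t' '$2 != "" && $3 != ""' > "$workdir/toy.trigrams"

# Count identical trigrams and list the most frequent first.
sort "$workdir/toy.trigrams" | uniq -c | sort -nr > "$workdir/toy.trigrams.hitparade"

cat "$workdir/toy.trigrams.hitparade"
```

The top line of the output is the trigram "the cat saw" with a count of 2; the other two trigrams occur once each. The awk filter would also drop trigrams containing a genuinely empty token line, which may or may not be what you want on real data.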