From Apertium
Revision as of 00:49, 2 January 2013 by Dtvrij74
The script is located in /trunk/apertium-tools/scraper/. It was written by Daniel Huang.
If you want to use NLTK's Punkt sentence tokenizer, you can call the script from your Python 3 code like this:

py2output = subprocess.check_output(['python', '', tosplit, lang])
  • tosplit is the text that will be tokenized into sentences
  • lang is the 2-letter or 3-letter language code. English, Russian, and Armenian are currently supported.
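
The call above can be wrapped in a small helper, sketched below under a few assumptions. The function name split_sentences, the script_path parameter (a placeholder for the script's location, which is not given here), and the interpreter parameter are all hypothetical. The sketch also assumes that the script writes UTF-8 text with one sentence per line, which this page does not confirm.

```python
import subprocess


def split_sentences(script_path, tosplit, lang, interpreter="python"):
    """Run the tokenizer script and return the lines it prints.

    script_path is a placeholder for the script's location.
    Assumption: the script prints UTF-8 text, one sentence per line.
    """
    # check_output captures everything the child process writes to stdout
    # and raises CalledProcessError if the script exits with a non-zero status.
    raw = subprocess.check_output([interpreter, script_path, tosplit, lang])
    # In Python 3 the captured output is bytes, so decode it before use.
    text = raw.decode("utf-8")
    # Drop empty lines so each list item holds one line of output.
    return [line for line in text.splitlines() if line.strip()]
```

The interpreter defaults to 'python', as in the call above; pass a different executable name if the interpreter the script needs is installed under another name on your system.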

The sentences printed by the script are captured in the variable py2output. (in the same directory) uses