Difference between revisions of "Getnltk.py"

Revision as of 00:49, 2 January 2013

getnltk.py is located at /trunk/apertium-tools/scraper/getnltk.py. It was written by Daniel Huang.
To use NLTK's Punkt sentence tokenizer from Python 3 code, call getnltk.py as a subprocess:

import subprocess
py2output = subprocess.check_output(['python', 'getnltk.py', tosplit, lang])

tosplit is the text to be split into sentences.
lang is a 2-letter or 3-letter language code. English, Russian, and Armenian are currently supported.

getnltk.py prints the sentences to standard output, which subprocess.check_output captures and returns, so py2output holds them as a bytes object. xml2txt.py (in the same directory) uses getnltk.py.
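
The call described above can be wrapped in a small helper. This is only a minimal sketch: the function name run_getnltk and its interpreter and script parameters are hypothetical and do not come from getnltk.py or xml2txt.py. It also assumes that the script's output is UTF-8 encoded, which this page does not state.

```python
import subprocess


def run_getnltk(tosplit, lang, script='getnltk.py', interpreter='python'):
    """Run getnltk.py in a separate interpreter and return its output as str.

    Hypothetical helper: `script` and `interpreter` are illustrative
    parameters, not part of getnltk.py itself.
    """
    # check_output returns whatever the child process wrote to stdout, as bytes.
    # It raises subprocess.CalledProcessError if the script exits with an error.
    raw = subprocess.check_output([interpreter, script, tosplit, lang])
    # Assumption: the script writes UTF-8; adjust if it uses another encoding.
    return raw.decode('utf-8')
```

A call such as run_getnltk(text, 'en') would then return the script's output as a string rather than bytes. The exact language codes accepted and the layout of the output (for example, whether each sentence is on its own line) depend on getnltk.py and are not assumed here.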
