Getnltk.py

getnltk.py is located at /trunk/apertium-tools/scraper/getnltk.py. It was written by Daniel Huang.
To use NLTK's Punkt sentence tokenizer from your Python 3 code, call getnltk.py as a subprocess:

py2output = subprocess.check_output(['python', 'getnltk.py', tosplit, lang])

tosplit is the text to be tokenized into sentences.
lang is the two- or three-letter language code. getnltk.py currently supports English, Russian, and Armenian.

The sentences that getnltk.py prints are captured and returned in the variable py2output (a bytes object in Python 3). xml2txt.py (in the same directory) uses getnltk.py.
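
The call above can be wrapped in a small helper. The following is only a sketch, not code from the repository: the function names build_command, decode_output and split_sentences are hypothetical, and it assumes that getnltk.py prints UTF-8 text with one sentence per line, which this page does not confirm.

```python
import subprocess


def build_command(tosplit, lang, interpreter="python", script="getnltk.py"):
    """Return the argument list for subprocess.check_output.

    The interpreter and script defaults mirror the call shown on this page.
    """
    return [interpreter, script, tosplit, lang]


def decode_output(raw):
    """Turn the bytes captured from getnltk.py's stdout into a list of str.

    Assumption: the output is UTF-8 with one sentence per line.
    """
    text = raw.decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_sentences(tosplit, lang):
    """Run getnltk.py on tosplit and return the decoded sentences."""
    py2output = subprocess.check_output(build_command(tosplit, lang))
    return decode_output(py2output)


# Example call (requires getnltk.py in the working directory):
# sentences = split_sentences("First sentence. Second sentence.", "en")
```

If getnltk.py separates sentences differently, only decode_output needs to change.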
