Getnltk.py
Latest revision as of 02:32, 10 March 2018
Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.
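getnltk.py is located in /trunk/apertium-tools/scraper/getnltk.py. It was written by Daniel Huang. Its purpose is to make NLTK's Punkt sentence tokenizer available to Python 3 code: the script runs under a Python 2 interpreter, and Python 3 programs call it as a subprocess.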
You can call getnltk.py from your Python 3 code like this:
import subprocess
py2output = subprocess.check_output(['python', 'getnltk.py', tosplit, lang])
tosplit is the text that will be tokenized into sentences.
lang is the 2-letter or 3-letter language code. Currently, English, Russian, and Armenian are supported.
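This page does not list the exact code strings getnltk.py accepts. As a hedged illustration, the sketch below builds the argument list for the call above and rejects unsupported codes before spawning a subprocess. It assumes the script takes the ISO 639-1 codes (en, ru, hy) or the ISO 639-3 codes (eng, rus, hye) for the three supported languages; the name build_getnltk_command and the SUPPORTED_CODES set are hypothetical and not part of getnltk.py.

```python
# Hypothetical pre-flight check before calling getnltk.py.
# ASSUMPTION: the script accepts ISO 639-1 (2-letter) or ISO 639-3 (3-letter)
# codes; the exact strings it recognises are not documented on this page.
SUPPORTED_CODES = {
    "en", "eng",  # English
    "ru", "rus",  # Russian
    "hy", "hye",  # Armenian
}


def build_getnltk_command(tosplit, lang, python="python", script="getnltk.py"):
    """Return the argument list for subprocess, rejecting unsupported codes."""
    if lang not in SUPPORTED_CODES:
        raise ValueError("unsupported language code: %r" % lang)
    # Same argument order as the documented call: text first, then language.
    return [python, script, tosplit, lang]


print(build_getnltk_command("Hello there. How are you?", "eng"))
```

Failing early in the caller gives a clearer error than whatever the Python 2 script would report for an unknown code.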
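getnltk.py prints the sentences to standard output, and check_output captures that output in the variable py2output (as bytes, since no text mode is requested). xml2txt.py (in the same directory) uses getnltk.py.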
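Under Python 3, subprocess.check_output returns bytes, so py2output usually needs decoding before use. A minimal wrapper might look like the sketch below. It assumes (this page does not say) that getnltk.py writes UTF-8 with one sentence per line and that `python` names a Python 2 interpreter with NLTK installed; the function name split_sentences is hypothetical.

```python
import subprocess


def split_sentences(tosplit, lang, python="python", script="getnltk.py"):
    """Call getnltk.py in a separate interpreter and return a list of sentences.

    ASSUMPTIONS (not stated on this page): the script writes UTF-8 to stdout
    with one sentence per line, and `python` names a Python 2 interpreter
    with NLTK installed.
    """
    # check_output returns bytes in Python 3 and raises CalledProcessError
    # if the script exits with a non-zero status.
    py2output = subprocess.check_output([python, script, tosplit, lang])
    text = py2output.decode("utf-8")
    # Drop blank lines so a trailing newline does not yield an empty sentence.
    return [line for line in text.splitlines() if line.strip()]
```

Passing the interpreter and script path as parameters keeps the wrapper usable when getnltk.py is not in the current directory or the Python 2 binary has a different name (for example `python2`).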