Difference between revisions of "Language identification"

Revision as of 11:22, 9 March 2015

Language identification or language recognition is the task of determining which language a piece of text (a document, paragraph, sentence, word, …) is written in.

Apertium-apy uses the CLD2 library for language identification. Optionally, it can use the coverage of its analysers (the proportion of input words each language's analyser recognises), but this is very slow.
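
A minimal sketch of the coverage idea, assuming "coverage" means the fraction of input tokens that a language's analyser recognises: every candidate language is scored and the best-covered one wins. This is not Apertium-apy's code. The `TOY_LEXICONS` word sets are hypothetical stand-ins for real analysers, and all function names are illustrative.

```python
import re

# Hypothetical stand-ins for real analysers. A real setup would run each
# language's Apertium analyser (e.g. through lt-proc) and count the tokens
# that receive an analysis, instead of looking them up in a word set.
TOY_LEXICONS = {
    "eng": {"the", "cat", "sat", "on", "mat", "a", "dog"},
    "nob": {"katten", "satt", "på", "matta", "en", "hund"},
    "spa": {"el", "gato", "se", "sentó", "en", "la", "alfombra"},
}


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens made of letters only (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(tokens: list[str], lexicon: set[str]) -> float:
    """Fraction of tokens the (toy) analyser recognises."""
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def identify_by_coverage(text: str, lexicons: dict[str, set[str]]) -> str:
    """Return the language whose analyser covers the largest share of tokens.

    Every candidate analyser has to see the whole input, so the cost grows
    with the number of candidate languages.
    """
    tokens = tokenize(text)
    return max(lexicons, key=lambda lang: coverage(tokens, lexicons[lang]))


print(identify_by_coverage("The cat sat on the mat", TOY_LEXICONS))  # → eng
```

Because every installed analyser must process the full input, the work grows with the number of candidate languages, which fits with this method being much slower than a single CLD2 call.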

- Apertium-apy/Language identification: some accuracy experiments with CLD2
- The original TextCat library (Perl; there is also a C port): http://odur.let.rug.nl/~vannoord/TextCat/
  - A Python 2 reimplementation of TextCat: https://github.com/unhammer/gt-CorpusTools/blob/master/corpustools/text_cat.py
- Identifying parallel text in another language within the same document (Szymanski, PLC 37 abstract): http://www.ling.upenn.edu/Events/PLC/plc37/abstracts/Szymanski_PLC37_CT.pdf
- langid.py, a Naïve Bayes language identifier implemented in Python that comes with many pre-trained models: https://github.com/saffsd/langid.py
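
TextCat implements the rank-order ("out-of-place") character n-gram method of Cavnar and Trenkle. A minimal sketch of that idea: build a ranked n-gram profile for each language and one for the input, then pick the language whose profile is closest. The `_` word padding, `max_n=5`, `top_k=300` and the one-sentence training texts are assumptions for illustration, not the settings of TextCat or text_cat.py.

```python
from collections import Counter


def ngram_profile(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Top_k most frequent character n-grams (n = 1..max_n), most frequent first.

    Each word is padded with '_' so that n-grams can mark word boundaries.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending frequency; ties are broken alphabetically so profiles are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: list[str], lang_profile: list[str]) -> int:
    """Sum of rank differences; n-grams missing from lang_profile get the maximum penalty."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def textcat_classify(text: str, profiles: dict[str, list[str]]) -> str:
    """Pick the language whose profile is closest to the input's profile."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Toy one-sentence "training corpora"; real profiles are built from far more text.
profiles = {
    "eng": ngram_profile("the cat sat on the mat and the dog ate the bone"),
    "nob": ngram_profile("katten satt på matta og hunden spiste beinet"),
}
print(textcat_classify("the dog sat on the mat", profiles))  # → eng
```

An n-gram that is missing from a language profile costs the full profile length, so a language lacking many of the input's n-grams ends up far away even if the n-grams it does share are ranked similarly.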
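
A minimal sketch of the Naïve Bayes idea behind langid.py: a multinomial model over character trigrams with add-one smoothing and a uniform prior over languages. This is a simplified stand-in, not langid.py's code, feature set or pre-trained models; the class name `NaiveBayesLangId` and the toy training sentences are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams, with a space added at each end of the text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Multinomial Naive Bayes over character n-grams, add-one smoothing, uniform prior."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}  # language -> n-gram counts
        self.totals: dict[str, int] = {}      # language -> total n-gram tokens
        self.vocab: set[str] = set()          # n-grams seen in any language

    def train(self, samples: dict[str, str]) -> None:
        for lang, text in samples.items():
            grams = Counter(char_ngrams(text, self.n))
            self.counts[lang] = grams
            self.totals[lang] = sum(grams.values())
            self.vocab.update(grams)

    def log_likelihood(self, text: str, lang: str) -> float:
        """log P(text | lang); unseen n-grams get count 0 + 1."""
        v = len(self.vocab) + 1  # one extra slot shared by all unseen n-grams
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[g] + 1) / (total + v))
            for g in char_ngrams(text, self.n)
        )

    def classify(self, text: str) -> str:
        # With a uniform prior, the most likely language maximises log P(text | lang).
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


model = NaiveBayesLangId()
model.train({"eng": "the cat sat on the mat", "nob": "katten satt på matta"})
print(model.classify("the mat"))      # → eng
print(model.classify("katten satt"))  # → nob
```

To use langid.py's own pre-trained models instead, the package is typically called as `langid.classify(text)`, which returns a (language code, score) pair.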
