Difference between revisions of "Language identification"
Revision as of 09:00, 13 February 2015
Language identification or language recognition is the process of identifying what language a text (document/paragraph/sentence/word/…) is in.
Apertium-apy uses the CLD2 library for language identification. Optionally, it can instead use the coverage of analysers, but this is much slower.
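To illustrate the coverage-based idea, here is a minimal sketch and not Apertium-apy's actual code. It assumes each language has an "analyser" that can say whether it recognises a token; the tiny word sets and the names TOY_LEXICONS, coverage and identify are hypothetical stand-ins invented for this example. The language whose analyser recognises the largest share of tokens wins.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicons standing in for real analysers.
# A real analyser would be a full morphological transducer, which is
# why running one per candidate language is slow.
TOY_LEXICONS: Dict[str, set] = {
    "eng": {"the", "cat", "is", "on", "a", "house"},
    "nob": {"katten", "er", "på", "et", "hus", "i"},
}

# Each "analyser" is just a function: token -> is the token known?
ANALYSERS: Dict[str, Callable[[str], bool]] = {
    lang: lexicon.__contains__ for lang, lexicon in TOY_LEXICONS.items()
}


def coverage(tokens: List[str], known: Callable[[str], bool]) -> float:
    """Fraction of tokens the analyser recognises (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if known(token)) / len(tokens)


def identify(
    text: str, analysers: Dict[str, Callable[[str], bool]]
) -> Tuple[str, Dict[str, float]]:
    """Return the language with the highest coverage, plus all scores."""
    tokens = text.lower().split()
    scores = {lang: coverage(tokens, known) for lang, known in analysers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `identify("katten er i et hus", ANALYSERS)` scores every candidate language, so the cost grows with the number of analysers; a statistical identifier such as CLD2 avoids that per-language pass.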
See also
- Apertium-apy/Language identification: some accuracy experiments with CLD2
- http://odur.let.rug.nl/~vannoord/TextCat/ the original TextCat library (Perl; there is also a C port)
  - https://github.com/unhammer/gt-CorpusTools/blob/master/corpustools/text_cat.py a Python 2 reimplementation of TextCat
- http://www.ling.upenn.edu/Events/PLC/plc37/abstracts/Szymanski_PLC37_CT.pdf on identifying parallel text in another language within the same document