Difference between revisions of "Language identification"

From Apertium
Line 11:
 
* http://www.ling.upenn.edu/Events/PLC/plc37/abstracts/Szymanski_PLC37_CT.pdf for identifying parallel text in another language within the same document.
 
* https://github.com/saffsd/langid.py Naïve Bayes method implemented in Python, comes with lots of pre-trained models
+ * https://www.mediawiki.org/wiki/User:TJones_%28WMF%29/Notes/Balanced_Language_Identification_Evaluation_Set_for_Queries

Revision as of 16:54, 1 March 2016

Language identification, or language recognition, is the task of determining which language a text (a document, paragraph, sentence, word, …) is written in.

Apertium-apy uses the CLD2 library for language identification. Optionally, it can instead score each language by the coverage of its analyser, but this is much slower.
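
The idea behind coverage-based identification can be illustrated with a minimal sketch. It is not the Apertium-apy implementation: the "analysers" here are hypothetical toy word lists (`TOY_ANALYSERS`), and the function names are made up for illustration. A real setup would run each language's morphological analyser over the text and count how many tokens receive an analysis, which is why the approach is slow.

```python
import re

# Hypothetical stand-ins for morphological analysers: each "analyser" is
# just a small set of word forms it recognises. Real analysers are
# finite-state transducers covering a whole lexicon.
TOY_ANALYSERS = {
    "eng": {"the", "cat", "is", "on", "a", "table", "and", "house"},
    "spa": {"el", "gato", "está", "en", "la", "mesa", "y", "casa"},
    "fra": {"le", "chat", "est", "sur", "la", "table", "et", "maison"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def coverage(tokens, known_forms):
    """Fraction of tokens that the given 'analyser' recognises."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in known_forms) / len(tokens)


def identify_by_coverage(text, analysers=TOY_ANALYSERS):
    """Return (best_language, scores), picking the highest coverage.

    Ties (including the all-zero case for empty input) are resolved in
    favour of whichever language comes first in the dictionary.
    """
    tokens = tokenize(text)
    scores = {lang: coverage(tokens, forms) for lang, forms in analysers.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    language, scores = identify_by_coverage("El gato está en la mesa")
    print(language, scores)
```

Every candidate language has to process the whole input, so the cost grows with the number of installed analysers, whereas a statistical identifier such as CLD2 classifies the text in a single pass.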


See also