Difference between revisions of "Arabic"

From Apertium
m (Correction of the Quranic Arabic Corpus URL)
Line 13:
 
===Corpora===
 
* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]
 
* [http://quran.uk.net/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.
+
* [http://corpus.quran.com/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.
   
   

Revision as of 15:50, 27 January 2010

Resources

  • Sarf - Arabic Morphology System (all in Java...)
  • ElixirFM (online interface here) is an implementation of Functional Arabic Morphology written in Haskell and Perl; the lexicon is a "re-processed" version of the Buckwalter analyser.
  • The Beesley/Karttunen finite-state transducer book, which documents the Xerox compiler, has good documentation on how to build a morphological analyser for Arabic (and for Semitic languages in general); Ken Beesley also made an Arabic FST. There is now also an open-source compiler that reads the Xerox format, the HFST compiler.

Corpora