Difference between revisions of "Arabic"

From Apertium
(17 intermediate revisions by 6 users not shown)
{{Language
|native_name=
|english_name=Arabic
|family=[[Semitic languages]]
|iso639_1=ar
|iso639_2=
|iso639_3=ara
|pairs=
}}


Arabic is a Semitic language (http://en.wikipedia.org/wiki/Hamito-Semitic).

Language pairs:
* [[apertium-mlt-ara]] in trunk: https://svn.code.sf.net/p/apertium/svn/trunk/apertium-mlt-ara
* [[apertium-ara-heb]] in incubator: https://svn.code.sf.net/p/apertium/svn/incubator/apertium-ara-heb

Developing pairs between Arabic and other Semitic or related Afro-Asiatic languages would be a good idea (e.g. [[Tamazight]], a Berber language).

==Resources==

* [http://sourceforge.net/projects/sarf Sarf] - Arabic Morphology System (all in Java...)
* [https://sourceforge.net/projects/aramorph/ AraMorph - Perl] - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter, see http://www.qamus.org/morphology.htm)
** Direct download: http://heanet.dl.sourceforge.net/sourceforge/aramorph/aramorph-1.2.1.tar.gz
* [http://www.nongnu.org/aramorph/ AraMorph - Java] - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for [http://lucene.apache.org/ Lucene]
* [http://www.ling.ohio-state.edu/~jonsafari/arabiclg/arabiclg.20060829.tar.bz2 Arabic dictionaries], by [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari], for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. These require the AraMorph stemming package, above.

* [https://sourceforge.net/apps/trac/elixir-fm/wiki ElixirFM] ([http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi online interface here]) is a Functional Arabic Morphology written in Haskell and Perl; the lexicon is a "re-processed" version of the Buckwalter analyser.
* There is good documentation on how to build a morphological analyser for Arabic (and Semitic languages in general) in the Beesley/Karttunen [http://fsmbook.com finite state transducer book], which documents the Xerox compiler (Ken Beesley also made an Arabic FST). There is now also an open source compiler that reads the Xerox format, the [[HFST]] compiler.
* There is also an open source finite state morphological analyser for Arabic, [http://sourceforge.net/projects/aracomlex/ AraComLex] ([http://www.cngl.ie/aracomlex/morph.php online interface here]). Other resources related to AraComLex include [http://sourceforge.net/projects/arabicpatterns/ a list of Arabic morphological patterns] and [http://sourceforge.net/projects/arabicwordcount/ a frequency word list] from a 1-billion-word corpus.

* [http://arabicreference.com/ Arabic Reference] by Hans Wehr with form I vowelling, masadir (infinitives), broken plurals
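
As a rough illustration of the root-and-pattern morphology that the analysers above have to handle, here is a minimal Python sketch. It is not taken from any of the listed tools: the function name <code>interdigitate</code>, the use of <code>C</code> as a consonant slot, and the Latin transliterations are all assumptions made for this example. Finite-state toolkits express the same idea with transducers rather than a string loop.

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot of a pattern with the next root consonant.

    Hypothetical helper for illustration only: 'root' is a string of
    consonants (e.g. "ktb") and 'pattern' is a template in which 'C'
    marks a consonant slot (e.g. "CaCaCa").
    """
    slots = pattern.count("C")
    if slots != len(root):
        raise ValueError(
            f"pattern has {slots} consonant slots but root has {len(root)} consonants"
        )
    consonants = iter(root)
    # Copy vowels and affixes unchanged; replace each slot in order.
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


if __name__ == "__main__":
    for root, pattern in [("ktb", "CaCaCa"), ("ktb", "CaaCiC"), ("drs", "maCCaCa")]:
        print(root, pattern, interdigitate(root, pattern))
```

A real analyser also has to deal with weak roots, gemination, and orthographic rules, which this sketch ignores.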

===Wordnet and dbpedia===
* [http://compling.hss.ntu.edu.sg/omw/ CC-BY-SA wordnet]
* [http://permalink.gmane.org/gmane.science.linguistics.corpora/22281 Arabic names from dbpedia]


===Corpora===
* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]
* [http://corpus.quran.com/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.






[[Category:Languages]]
[[Category:Arabic|*]]

Latest revision as of 10:23, 21 November 2021

