Apertium Indic

Revision as of 16:01, 13 August 2017 by Vin-ivar (talk | contribs) (init)

Apertium Indic is a subfamily of Apertium-based systems for Indic (Indo-Aryan) languages, with the potential to include Dravidian languages in the future. This page defines a set of standards and definitions that ought to generalise fairly well across the entire family, and that should be followed as much as possible. A standard reference is The Indo-Aryan Languages [1].

Currently, Marathi has the most comprehensive morphological analyser in Apertium and should be used as a reference for building other analysers.

Morphology

Nominal

  • Follow Masica's three-level morphology wherever possible: unaltered nouns are nominative (<nom>); altered roots are oblique (<obl>); altered roots that carry a case ending are tagged with that case directly; and postpositions attach to the oblique.
  • The main difference between a case and a postposition is whether an intervening element can appear between the oblique and the marker: it can before a postposition. Other differences are subjective, but a language should not have more than 10 cases.
  • Genitives are separated from the root, even though the genitive is traditionally treated as a case.
  • Pronouns should not be treated as fusional and should analyse the same way a noun would, whether more complex forms have been lexicalised or not.
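
The nominal conventions above can be sketched as a small tagging function. This is only an illustration under stated assumptions: the `NounForm` class and `analyse` function are hypothetical helpers, not part of Apertium; the lemmas are transliterated placeholders; and the tags `<n>`, `<sg>`, `<dat>` and `<post>` are assumed here for illustration (only `<nom>` and `<obl>` come from this page). The `+` separator follows the way Apertium analyses usually join elements, and the sketch assumes a form carries either a case or a postposition, never both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounForm:
    """Hypothetical description of one noun form (not an Apertium type)."""
    lemma: str
    number: str = "sg"
    oblique: bool = False               # the root has been altered
    case: Optional[str] = None          # case ending fused onto the altered root
    postposition: Optional[str] = None  # separate element attached to the oblique


def analyse(form: NounForm) -> str:
    """Build an analysis string following the three-level scheme."""
    # Simplifying assumption: a case and a postposition do not co-occur.
    if form.case and form.postposition:
        raise ValueError("expected a case or a postposition, not both")
    base = f"{form.lemma}<n><{form.number}>"
    if form.case:
        # An altered root with a case ending is tagged with the case directly.
        return f"{base}<{form.case}>"
    if form.postposition:
        # A postposition attaches to the oblique as a separate element.
        return f"{base}<obl>+{form.postposition}<post>"
    if form.oblique:
        return f"{base}<obl>"
    # An unaltered noun is nominative.
    return f"{base}<nom>"


print(analyse(NounForm("ghar")))
print(analyse(NounForm("ghar", case="dat")))
print(analyse(NounForm("ghar", postposition="madhye")))
```

Genitives and pronouns are left out of the sketch; under the rules above a pronoun would go through the same function as a noun, with a different part-of-speech tag.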

Verbal

  • All verb forms must be covered - this goes without saying.
  • Verbs typically inflect for perfective/imperfective aspect, not past/present tense. Tense is imparted by copulas.
  • Gerunds exist and typically overlap with infinitives, but can take cases/postpositions.
  • Light verbs (N + V) should be treated as separate elements. If N cannot exist as an independent word, gloss it as <adv>.
  • Compound verbs (V + V) receive no special treatment.
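
The light-verb rule above can be sketched as follows. This is a hedged illustration, not an Apertium API: `light_verb_units` is a hypothetical helper, the `^...$` wrapping mimics Apertium's lexical-unit notation, the tags `<n>` and `<vblex>` are assumed for illustration (only `<adv>` is prescribed by this page), and the lemmas are transliterated placeholders.

```python
from typing import List


def light_verb_units(nominal: str, verb: str, nominal_is_free_word: bool) -> List[str]:
    """Return the two separate units for an N + V light-verb construction."""
    # If N cannot stand alone as a word, it is glossed as <adv> instead of <n>.
    nominal_tag = "n" if nominal_is_free_word else "adv"
    # N and V stay separate elements; they are never merged into one unit.
    return [f"^{nominal}<{nominal_tag}>$", f"^{verb}<vblex>$"]


# "kaam" is assumed to be an independent noun; "boundform" is a made-up
# placeholder for a nominal that never occurs on its own.
print(light_verb_units("kaam", "kar", nominal_is_free_word=True))
print(light_verb_units("boundform", "kar", nominal_is_free_word=False))
```

Compound verbs (V + V) need no counterpart to this helper, since each verb is simply analysed on its own.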

Verb form table

Massive TODO

Active contributors

If you're contributing to any Indic language, add yourself here.

Name | IRC nick | Indic projects involved in
Vinit Ravishankar | vin-ivar | Marathi

Bibliography

[1] Masica, C.P. (1993) The Indo-Aryan Languages. Cambridge Language Surveys. Cambridge University Press. https://books.google.com.mt/books?id=Itp2twGR6tsC