Difference between revisions of "Indic"

@@ Line 1: / Line 1: @@
-{{TOCD}}
-=== THIS PAGE IS UNFINISHED ===
-The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and several other languages. They form the dominant language family of the Indian subcontinent, with upwards of 900,000,000 speakers.
-The plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for every pair. The current status of these goals is listed below.
-==Status==
-The ultimate goal is to have multi-purpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
-=== Transducers ===
-Once a transducer reaches roughly 80% coverage on a range of medium-to-large corpora, we call it "working". Above 90%, it can be considered "production".
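The thresholds above can be sketched in a few lines of Python. This is only an illustration, not the project's own tooling: it assumes analyser output in the Apertium stream format (<code>^surface/analysis$</code>, with unknown words echoed back behind a leading <code>*</code>), it ignores escaped characters, and the function names, the sample tags and the "in development" label are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received a known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown word is conventionally returned as *surface.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov: float) -> str:
    """Map a coverage fraction to the informal labels used on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "in development"  # hypothetical label; the page does not name this state


# Illustrative input only; the tags are made up for the example.
sample = "^घर/घर<n><m><sg><nom>$ ^foo/*foo$ ^में/में<post>$ ^है/होना<vblex><pres>$ ^bar/*bar$"

if __name__ == "__main__":
    cov = coverage(sample)
    print(f"{cov:.1%} coverage, state: {state(cov)}")
```

In practice the figure should be averaged over several corpora, as the paragraph above says, rather than computed from a single short sample.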
-{| class="wikitable sortable"
-|-
-!rowspan=2| name
-!rowspan=2| Language
-!colspan=2 class="unsortable"| ISO 639
-!rowspan=2| formalism
-!rowspan=2| state
-!rowspan=2| stems
-!rowspan=2| coverage
-!rowspan=2| location
-!rowspan=2 class="unsortable"| primary authors
-|-class="sortbottom"
-! -2
-! -3
-|-
-|| <code>[[apertium-hin]]</code>
-|| [[Hindi]]
-|| <code>hi</code>
-|| <code>hin</code>
-|| HFST (lexc+twol)
-|| production
-|align="right"| {{#lst:apertium-hin/stats|stems}}
-|align="center"| -
-|| [[apertium-hin]]&nbsp;([[languages]])
-|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]
-|-
-|| <code>[[apertium-urd]]</code>
-|| [[Urdu]]
-|| <code>ur</code>
-|| <code>urd</code>
-|| HFST (lexc+twol)
-|| production
-|align="right"| {{#lst:apertium-urd/stats|stems}}
-|align="center"| -
-|| [[apertium-urd]]&nbsp;([[languages]])
-|| -
-|-
-|| <code>[[apertium-ben]]</code>
-|| [[Bengali]]
-|| <code>bn</code>
-|| <code>ben</code>
-|| HFST (lexc+twol)
-|| production
-|align="right"| {{#lst:apertium-ben/stats|stems}}
-|align="center"| -
-|| [[apertium-ben]]&nbsp;([[languages]])
-|| [[User:darthxaher|Abu Zaher Md. Faridee]]
-|-
-|| <code>[[apertium-san]]</code>
-|| [[Sanskrit]]
-|| <code>sa</code>
-|| <code>san</code>
-|| HFST (lexc+twol)
-|| production
-|align="right"| {{#lst:Apertium-san/stats|stems}}
-|align="center"| -
-|| [[apertium-san]]&nbsp;([[languages]])
-|| Amba Kulkarni
-|-
-|}
-=== Indic Language Classification ===
-* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]
-* Northern Zone:
-**Central Pahari
-***[[Garhwali]], [[Kumauni]]
-**Eastern Pahari
-***[[Nepali]]
-* North-Western Zone:
-**Dogri-Kangri
-***[[Dogri]], [[Kangri]], [[Mandeali]], etc.
-** [[Punjabi]]
-** [[Lahnda]]
-** [[Sindhi]]
-* Western Zone:
-** Rajasthani
-*** [[Marwari]], [[Rajasthani]]
-**[[Gujarati]]
-**[[Bhil]]
-**[[Khandeshi]]
-**[[Domari-Romani]]
-* [[Hindi]]
-* Southern Zone:
-** [[Marathi]]
-** [[Konkani]]
-** Insular Indic
-*** [[Sinhalese]], [[Maldivian]]
-* Eastern Zone:
-** Bihari
-*** [[Bhojpuri]], [[Maithili]], etc.
-** [[Bengali]]
-** [[Oriya]]
-** [[Tharu]]
-* [[Sanskrit]]
-==== Indic-Indic pairs ====
-{| style="text-align: center;" class="wikitable"
-|- style="background: #ececec"
-!           !!    hin    !! ben !! urd !! san
-|-
-| '''hin''' ||     -     ||     ||     ||     |
-|-
-| '''ben''' || [[bn-hi]] ||  -  ||     ||     |
-|-
-| '''urd''' || [[ur-hi]] ||     ||  -  ||     |
-|-
-| '''san''' ||           ||     ||     ||  -  |
-|-
-|}
-==== Pairs with other languages ====
-{| style="text-align: center;" class="wikitable"
-|- style="background: #ececec"
-!           !!     eng     !!    as     !!    mr     !!    pa     !!    fa
-|-
-| '''hin''' || [[eng-hin]] || [[as-hi]] || [[mr-hi]] || [[pa-hi]] ||
-|-
-| '''ben''' ||  [[bn-en]]  ||           ||           ||           ||
-|-
-| '''urd''' ||             ||           ||           || [[ur-pa]] || [[ur-fa]]
-|-
-| '''san''' ||             ||           ||           ||           ||
-|}
-==Tagset==
-A rough guide to the tagsets used in the various Indic language transducers, with the aim of tagging equivalent phenomena the same way across languages. In the following table, <sup>A</sup> stands for Apertium and <sup>T</sup> stands for [[TRmorph]] (see also [[List_of_symbols|the general tagset list]]).
-{|class="wikitable"
-! Phenomenon      !! Morphology !!  Description !! Tag(s) !! Language(s) !! Notes
-|-
-|colspan=6 align="center"|'''Part of speech'''
-|-
-| Noun || ||  || {{tag|n}} ||  ||

Latest revision as of 02:44, 22 November 2013
