Difference between revisions of "Kashmiri"

From Apertium
(21 intermediate revisions by the same user not shown)
Line 10: Line 10:


Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and in surrounding regions that were historically part of various kingdoms based in Kashmir. It shares some vocabulary with other Indo-Aryan languages of India and Pakistan, such as Hindi and Punjabi. Probably owing to the region's isolating topography and history, it has nonetheless developed distinctive features of its own: a word order (syntax) that differs from the usual SOV (Subject-Object-Verb) order of Indo-Aryan languages, a sound system with contrastive palatalisation of nearly all consonants, and an extensive system of vowels.<ref>https://en.wikibooks.org/wiki/Kashmiri</ref>
= Letters and Encoding =
Today Kashmiri is mostly written in a variant of the Arabic script with a number of adaptations. [http://unicode.org/L2/L2009/09215-kashmiri.pdf This link] details two Unicode characters specific to Kashmiri (half-ye U+0620 ؠ and wavy hamza below U+065F ٟ ), and a [http://www.loc.gov/catdir/cpso/romanization/kashmiri.pdf reference] used in that document is helpful for understanding the orthography. [https://web.archive.org/web/20140723212005/http://parc.cdac.in/PASCII_V10.pdf This document] is also useful for nomenclature and lookup, but unfortunately copying characters out of it does not work as it should.


== Resources ==
== Input ==
The script is not phonemic, and it is difficult to input and read even for someone comfortable with the standard Arabic script. It is prudent to use some sort of transcription script for reading, and especially for writing, in a shell environment.
=== Kashmiri Language Websites ===

Android tends to render the script well, and Gboard has a Kashmiri Arabic keyboard. In a pinch, this can also serve as an input method.

== Encoding ==
Kashmiri texts are full of characters that share a shape but have different encodings. There is an urgent need to determine the correct encoding for each such character and to have the variants interpreted as that character.

One should make liberal use of hexdumps and lookups: looking up the Arabic letter "noon" alone returns [https://www.fileformat.info/info/unicode/char/search.htm?q=noon&preview=entity 78 different characters].

= Resources =
It might be worth looking into using Amazon's Mechanical Turk, as in [http://www.aclweb.org/anthology/W10-0717 this paper.]

* [http://indradhanush.unigoa.ac.in/kashmiriwordnet/public/webcontent/webcontent.php?langid=19&id=1 Kashmiri Wordnet] - Surprisingly good site layout, has a popup keyboard for input as well
* [https://www.lexilogos.com/keyboard/kashmiri.htm Web App Keyboard]
== Related Work ==
* [http://dspaces.uok.edu.in:8080/jspui/bitstream/1/1445/1/Sajad%20Hussain%20Wani.pdf Theoretical Analysis of Kashmiri-English Translation] - GSoC projects are definitely going to be more practical than this
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/139773/15/15_chapter%208.pdf Data-driven parsing]
* [http://web2py.iiit.ac.in/research_centres/publications/download/inproceedings.pdf.afa21868fd51d5dc.6c74632d39322d526979617a41686d6164426861742e706466.pdf Shallow Parsing]
* [https://pdfs.semanticscholar.org/1b7e/b574581e8027c738d3ba27c54dda8ad35667.pdf Paper on transliteration of English to Kashmiri letters] - This might be somewhat useful for input during development work
* [http://www.aclweb.org/anthology/W12-5605 Kashmiri Dependency Treebank]
* [http://www.indjst.org/index.php/indjst/article/viewFile/81194/62606 Hindi-Dogri RBMT]
* [http://www.ijircce.com/upload/2015/october/138_English.pdf "Example" (Rule?)-based system attempt] (<s>Seems to use some sort of constituency parsing</s> Actually appears to be a phrase dictionary)

== Kashmiri Language Websites ==
* [https://muneeburrahman.wordpress.com/ Muneeb Urrahman's Literary Blog]
* [http://gospelgo.com/a/kashmiril.htm Bible in Latin letters]
Line 18: Line 43:
* [https://neabmagazine.com/ Neab Magazine]
* [http://www.kashmirilanguage.com/ kashmirilanguage.com] (Has a fair amount of text in Nastaliq)
=== Other Corpora ===
== Other Corpora ==
* [http://dspaces.uok.edu.in:8080/jspui/handle/1/313 University of Kashmir Digital Library]
* [http://catalog.elra.info/en-us/repository/browse/the-emilleciil-corpus/65d01734a9dc11e7a093ac9e1701ca02bd7f4d8cc15c4aafb2b3c04960650646/ EMILLE/CIIL Corpus] (Huge corpus of many Indian languages, needs special approval to access, cannot redistribute)
=== Grammar ===
== Grammar ==
* [http://koshur.org/courses.html Resources compiled by the Kashmiri Pandit Network] (Includes structured course versions of some grammars, with recordings)
* Various grammars by Omkar Koul:
Line 27: Line 52:
** Spoken Kashmiri: A Language Course
** Kashmiri: A Cognitive-Descriptive Grammar
=== Dictionaries ===
== Dictionaries ==
* [http://dsal.uchicago.edu/dictionaries/grierson/ Grierson Dictionary] (1932, in Devanagari, can query online)
* [http://dsal.uchicago.edu/dictionaries/grierson/ Grierson Dictionary] (1932, in Devanagari and Latin, can query online)
** Some editions of the dictionary itself include Perso-Arabic script; downloading them from archive.org could be useful
==Developers==
* [http://dsal.uchicago.edu/dictionaries/hassan/ Hassan Dictionary] (2010, Latin only, can query online, has recordings)
[[/Nominal_Morphology|Nominal Morphology]]
* Kaesher Lugaat - Shafi Shauq (Modern dictionary produced by the Academy, nastaliq, tricky to find)
* Kashir Dictionary - Tousikhani (7 volumes)
* Kashmiri-English Dictionary - Omkar N. Koul
* Persian/Tajik-Kashmiri-English dictionary - Jān, Jī. Ār

===Related===
* A Dictionary of Kashmiri Proverbs - Omkar N. Koul

=Developers=
* [[/Nominal_Morphology|Nominal Morphology]]
* [[/FST_Ideas|FST Ideas]]


=== Potential Students for GSoC 2019 ===

Latest revision as of 10:42, 2 July 2018

Kaeshir
(Kashmiri)
Family: Indo-Aryan
ISO Codes: ks / kas / kas
Incubator: apertium-kas
Language pairs: {{{pairs}}}

Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and in surrounding regions that were historically part of various kingdoms based in Kashmir. It shares some vocabulary with other Indo-Aryan languages of India and Pakistan, such as Hindi and Punjabi. Probably owing to the region's isolating topography and history, it has nonetheless developed distinctive features of its own: a word order (syntax) that differs from the usual SOV (Subject-Object-Verb) order of Indo-Aryan languages, a sound system with contrastive palatalisation of nearly all consonants, and an extensive system of vowels.[1]

Letters and Encoding

Today Kashmiri is mostly written in a variant of the Arabic script with a number of adaptations. This link details two Unicode characters specific to Kashmiri (half-ye U+0620 ؠ and wavy hamza below U+065F ٟ ), and a reference used in that document is helpful for understanding the orthography. This document is also useful for nomenclature and lookup, but unfortunately copying characters out of it does not work as it should.

Input

The script is not phonemic, and it is difficult to input and read even for someone comfortable with the standard Arabic script. It is prudent to use some sort of transcription script for reading, and especially for writing, in a shell environment.

Android tends to render the script well, and Gboard has a Kashmiri Arabic keyboard. In a pinch, this can also serve as an input method.

Encoding

Kashmiri texts are full of characters that share a shape but have different encodings. There is an urgent need to determine the correct encoding for each such character and to have the variants interpreted as that character.
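
As a minimal sketch of that idea, the Python snippet below folds look-alike code points onto one chosen form. The names `CANONICAL` and `normalize` are hypothetical, and the two mappings shown (Arabic kaf to keheh, Arabic yeh to Farsi yeh) are assumptions for illustration only; which code point is actually correct for Kashmiri still has to be settled character by character. NFC normalisation is applied first so that base-plus-combining sequences compare equal to their precomposed forms.

```python
import unicodedata

# Hypothetical table: look-alike code point -> the form chosen as canonical.
# These two entries are illustrative assumptions, not a vetted Kashmiri list.
CANONICAL = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}
_TABLE = str.maketrans(CANONICAL)


def normalize(text):
    """Apply NFC, then replace every listed look-alike with its canonical form."""
    return unicodedata.normalize("NFC", text).translate(_TABLE)


if __name__ == "__main__":
    sample = "\u0643\u064A"  # kaf + yeh, as they might arrive from an Arabic keyboard
    print([f"U+{ord(ch):04X}" for ch in normalize(sample)])
```

Keeping the mapping in a single table means every decision about a "correct" encoding is recorded in one place and can be revised as the orthography is better understood.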

One should make liberal use of hexdumps and lookups: looking up the Arabic letter "noon" alone returns 78 different characters.
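
For quick lookups without leaving the shell, something like the following Python sketch can stand in for a hexdump plus a web search. It lists the code point and Unicode name of every character in a string using the standard `unicodedata` module; `describe` is a hypothetical helper name, and the sample string is only an example.

```python
import unicodedata


def describe(text):
    """Return (character, 'U+XXXX', Unicode name) for each code point in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Sample input: noon, noon ghunna, and the Kashmiri half-ye (U+0620).
    for ch, code_point, name in describe("\u0646\u06BA\u0620"):
        print(code_point, name)
```

Running this over a suspicious word makes it easy to see which of several identical-looking letters a text actually contains.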

Resources

It might be worth looking into using Amazon's Mechanical Turk, as in this paper.

Related Work

Kashmiri Language Websites

Other Corpora

Grammar

  • Resources compiled by the Kashmiri Pandit Network (Includes structured course versions of some grammars, with recordings)
  • Various grammars by Omkar Koul:
    • Modern Kashmiri Grammar
    • Spoken Kashmiri: A Language Course
    • Kashmiri: A Cognitive-Descriptive Grammar

Dictionaries

  • Grierson Dictionary (1932, in Devanagari and Latin, can query online)
    • Some editions of the dictionary itself include Perso-Arabic script; downloading them from archive.org could be useful
  • Hassan Dictionary (2010, Latin only, can query online, has recordings)
  • Kaesher Lugaat - Shafi Shauq (Modern dictionary produced by the Academy, nastaliq, tricky to find)
  • Kashir Dictionary - Tousikhani (7 volumes)
  • Kashmiri-English Dictionary - Omkar N. Koul
  • Persian/Tajik-Kashmiri-English dictionary - Jān, Jī. Ār

Related

  • A Dictionary of Kashmiri Proverbs - Omkar N. Koul

Developers

Potential Students for GSoC 2019

  • Rurik - Kashmiri-Hindi?
  •  ?

References