Malayalam and English/documentation
Malayalam is both an agglutinative and an inflectional language, and it belongs to the Dravidian language family. In Apertium we are implementing the English-Malayalam pair using HFST, as described in Starting_a_new_language_with_HFST.
Morphotactics using lexc
Take a noun as an example. A Malayalam noun can inflect for eight cases: nominative, accusative, dative, sociative, genitive, instrumental, locative and vocative. Nouns are also marked for number, singular or plural. First we declare the essential symbols:
Multichar_Symbols
%<n%> ! Noun ! നാമം
%<nom%> ! Nominative !
%<acc%> ! Accusative !
%<dat%> ! Dative !
%<soc%> ! Sociative !
%<gen%> ! Genitive !
%<ins%> ! Instrumental !
%<loc%> ! Locative !
%<voc%> ! Vocative !
%<sg%> ! Singular !
%<pl%> ! Plural !
Now that all the essential noun symbols are declared, we can define the root lexicon and then add an example paradigm:
LEXICON Root
Miscellaneous ;
Conjunctions ;
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;
Nouns
Case | Tree (N1), singular | Tree (N1), plural | Elephant (N2), singular | Elephant (N2), plural | Human (N8), singular | Human (N8), plural | Dog (N1), singular | Dog (N1), plural |
---|---|---|---|---|---|---|---|---|
Nominative | maram | maraṅṅaḷ | āna | ānakaḷ | manuṣyan | manuṣyar | paṭṭi | paṭṭikaḷ |
Vocative | maramē | maraṅṅaḷē | ānē | ānakaḷē | manuṣyā | manuṣyarē | paṭṭī | paṭṭikaḷē |
Accusative | maratte | maraṅṅaḷe | ānaye | ānakaḷe | manuṣyane | manuṣyare | paṭṭiye | paṭṭikaḷe |
Genitive | marattinte | maraṅṅaḷuṭe | ānayuṭe | ānakaḷuṭe | manuṣyante | manuṣyaruṭe | paṭṭiyuṭe | paṭṭikaḷuṭe |
Dative | marattinu | maraṅṅaḷkku | ānaykku | ānakaḷkku | manuṣyanu | manuṣyarkku | paṭṭiykku | paṭṭikaḷkku |
Instrumental | marattāl | maraṅṅaḷāl | ānayāl | ānakaḷāl | manuṣyanāl | manuṣyarāl | paṭṭiyāl | paṭṭikaḷāl |
Locative | marattil | maraṅṅaḷil | ānayil | ānakaḷil | manuṣyanil | manuṣyaril | paṭṭiyil | paṭṭikaḷil |
Sociative | marattōṭu | maraṅṅaḷōṭu | ānayōṭu | ānakaḷōṭu | manuṣyanōṭu | manuṣyarōṭu | paṭṭiyōṭu | paṭṭikaḷōṭu |
LEXICON N1
%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
!plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ CLIT-N-NOM ; ! ṅṅaḷ
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅale
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe
And an example word that uses this paradigm:
LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
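To see what the N1 paradigm yields for the stem above, the following Python sketch mimics the lexc concatenation outside HFST. It is an illustration only: the suffix list is copied from LEXICON N1, the clitic continuation lexicons (CLIT-N-NOM and so on) are ignored, the morpheme boundary %> is dropped, and the names N1_SUFFIXES and expand_n1 are hypothetical.

```python
# Analysis-tag / surface-suffix pairs copied from LEXICON N1 above.
# Assumption: the %> morpheme boundary and the CLIT-* continuations are left out.
N1_SUFFIXES = [
    ("<n><sg><nom>", "ṁ"),
    ("<n><sg><loc>", "ttil"),
    ("<n><sg><acc>", "tte"),
    ("<n><sg><gen>", "ttinṟe"),
    ("<n><sg><dat>", "ttin"),
    ("<n><sg><dat>", "ttinu"),
    ("<n><pl><nom>", "ṅṅaḷ"),
    ("<n><pl><acc>", "ṅṅaḷe"),
    ("<n><pl><gen>", "ṅṅaḷuṭe"),
]


def expand_n1(lemma, stem):
    """Return (analysis, surface) pairs for one NominalStems entry lemma:stem."""
    return [(lemma + tags, stem + suffix) for tags, suffix in N1_SUFFIXES]


# The entry "mēghaṁ:mēgha N1 ;" has the lemma on the left and the stem on the right.
forms = expand_n1("mēghaṁ", "mēgha")
for analysis, surface in forms:
    print(f"{analysis}:{surface}")
```

Each printed line pairs an analysis string (lemma plus tags) with the surface form that the stem and suffix produce, which is the same upper:lower relation that lexc encodes.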
Currently there are 10 noun paradigms: N1, N2, ..., N10.
General trends in paradigms
LEXICON N1 :- words ending with anusvara ( ം )
LEXICON N2 :- words ending with the vowel a or i
LEXICON N3 :- words ending with virama or a vowel
LEXICON N4 :- words ending with the vowel a or i
LEXICON N5 :- words ending with virama (e.g. വീട് )
LEXICON N6 :- for the word പേര് (pēr)
LEXICON N7 :- words ending with the vowel u
LEXICON N8 :-
LEXICON N9 :-
LEXICON N10 :-
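The trends above can be turned into a rough lookup that proposes candidate paradigms from the last character of a word in Malayalam script. This is a sketch under several assumptions: the helper candidate_paradigms is hypothetical, a bare final consonant is taken to carry the inherent vowel a, N6 is skipped because it covers a single word, and N8 to N10 are skipped because no trend is given for them. Several paradigms share the same ending, so the result is a list of candidates rather than a decision.

```python
# Final characters in Malayalam script, written as explicit code points.
ANUSVARA = "\u0d02"      # ം
VIRAMA = "\u0d4d"        # ്
VOWEL_SIGN_I = "\u0d3f"  # ി
VOWEL_SIGN_U = "\u0d41"  # ു


def is_consonant(ch):
    # Malayalam consonant letters occupy U+0D15..U+0D39; a bare final
    # consonant is read with the inherent vowel a.
    return "\u0d15" <= ch <= "\u0d39"


def candidate_paradigms(word):
    """Return the noun paradigms whose stated ending matches the word."""
    if not word:
        return []
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    if last == VOWEL_SIGN_U:
        return ["N3", "N7"]
    if last == VOWEL_SIGN_I or is_consonant(last):
        return ["N2", "N3", "N4"]
    return []


for word in ["മരം", "വീട്", "ആന"]:
    print(word, candidate_paradigms(word))
```

A human still has to pick among the candidates, since the ending alone does not separate N2 from N4 or N3 from N5.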
Proper Nouns
- LEXICON NP* is for proper nouns, which behave almost like common nouns
- LEXICON NP*-COG is for second names
- LEXICON NP-TOP-* is for place names
The place-name paradigms are:
- NP-TOP-KERALA :- place names ending with anusvara
- NP-TOP-INDIA :- place names ending with the vowel a or i
- NP-TOP-CALICUT :- place names ending with virama
- NP-TOP-KANNUR :- place names ending with chillu
- NP-TOP-MALABAR :- place names ending with chillu r (ര്)
- NP-TOP-JAPAN :- place names ending with chillu n (ന്)
- NP-TOP-BRAZIL :- place names ending with chillu l (ല്)
Pronouns
- PRON-PERS-* is for personal pronouns
The personal pronoun paradigms are:
- PRON-PERS-NNAAN
- PRON-PERS-NII
- PRON-PERS-AVAN
- PRON-PERS-AVAL
- PRON-PERS-NNANNAL
- PRON-PERS-NAAM
- PRON-PERS-NINNAL
- PRON-PERS-AVAR
- PRON-PERS-ADDEHA
- PRON-DEM is for demonstrative pronouns
The demonstrative pronoun paradigms are:
- PRON-DEM-AT
- PRON-DEM-IT
- PRON-IND is for indefinite pronouns
Numerals
NUM is the lexicon for numerals.
Verbs
Form | Description | Tag | Example | Gloss | Translation |
---|---|---|---|---|---|
Present | stemk-unnu | <pres> | kuttikal kalikkunnu | children play | The children are playing. |
Future | stem-um | <fut> | naale mala peyyum | tomorrow rain will.fall | It will rain tomorrow. |
Present progressive | presk-unt | | aval nannaayi pathikkunt | she well studying.is | She is studying well. |
Present progressive (II) | inf ān | | siita avite irikkuka ān | Sita there sit is. | Sita is sitting there. |
Iterative present | stem-kontu-iri-kk-unnu | | avan paatikkontirikkunnu | He singing.is | He is singing. |
Iterative future | stem-kontu-iri-kk-um | | avan paatikkontirikkum | He singing.will.be | He will be singing. |
Iterative past | stem-kontu-iri-unnu | | avan paatikkontirunnu | He singing.was | He was singing. |
Continuous iterative | stem-konte-iri-kunnu | | kuttikal paatikkonteeyirunnu | children sang.without.stopping | The children sang without stopping. |
Perfect | | | innale mala peytirunnu | yesterday rain fell | It rained yesterday. |
Contemporaneous perfect | | | yuddham pottippurappettirikkunnu | war broken#out.has | War has broken out! |
Remote perfect | | | ñaan paattŭ pathiccittuntŭ | I music studied.had | I had studied music. |
Habitual present | | | juun maasattil mala peyyaaruntŭ | June month.in rain falls.usually | It usually rains in June. |
Habitual past | | | ñaan delhiyil pookaaruntaayirunnu | I Delhi.to go.used#to | I used to go to Delhi. |
Imperative | | | putiya vidyaarthikal hedmaasrrare kaaneentataanŭ | new students headmaster meet.should | New students should meet the headmaster. |
Promissive | past-ām | | ñaan naale varaam | I tomorrow come.will | I will come tomorrow. |
Emphatic promissive | past-ēk-ām | | ñaan naale vanneekkaam | I tomorrow come.will | I will come tomorrow. |
Permissive | past-ō (kolluu) | | vannoo | you.may.come | You may come. |
Permissive (II) | past-ootte | | avan avite irunnootte | He there sit.let | Let him sit there. |
Permissive (III) | | | avar avite taamasikkatte | He there sit.let | Let him sit there. |
Permissive (Formal) | | | paas ullavarkkŭ itilee pookaavunnatŭ aanŭ | pass having this.way go.may is | Those who have a pass may go this way. |
Optative | | | mala peyyatte | rain fall.let | Let it rain. |
Precative | stem-anē (= stem-uka-vēnam-ē) | | mala peyyanee | rain fall.may | May it rain. |
%<quot%>
%<enum%> ! Enumerative !
%<subst%> ! Substantive !
%<attr%> ! Attributive !
%<iv%> ! Intransitive !
%<tv%> ! Transitive !
%<neg%> ! Negative !
%<pres%> ! Present tense ! വര്ത്ത്മാന കാലം
%<past%> ! Past tense ! ഭൂത കാലം
%<fut%> ! Future tense !
%<perf%> ! Simple Perfect !
%<rem_perf%> ! Remote Perfect !
%<contpres%> ! Contemporaneous perfect !
%<perm%> ! Permissive mood !
%<imp%> ! Imperative mood !
%<hab%> ! Habitual aspect !
%<prec%> ! Precative mood !
%<opt%> ! Optative mood !
%<irre%> ! Irrealis mood !
%<satis%> ! satisfactive mood !
%<monit%> ! monitory mood !
%<frml%> ! Formal !
%<infml%> ! InFormal !
%<inf_k%> ! Infinitive !
%<inf_n%> ! Purposive infinitive !
%<oblig%> ! Obligative !
%<simul%> ! Simultaneous !
%<iter%> ! Iterative !
%<cond%> ! Conditional Mood !
%<gpr_pres%> !
%<gpr_past%> !
Currently more than 25 verb paradigms have been added to the lexc file. Unlike nouns, it is difficult to predict a verb's paradigm from the shape of the word.
Verb paradigms are of the form:
LEXICON V-TV-ARIYUKA
%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""
and
LEXICON V-IV-KALI
%<v%>%<iv%>: V-COMMON-KALI ; ! ""
Here IV stands for intransitive verb and TV for transitive verb.
Both kinds of paradigm continue to a shared continuation lexicon V-COMMON-*, for example V-COMMON-ATIKKUKA:
LEXICON V-COMMON-ATIKKUKA
%<inf_k%>:%>kku k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>kkan CLIT-CC ; ! "̔" !
%<perf%>:%>chchiru nnu CLIT-ITG ; ! "̔" !
%<rem_perf%>:%>chchiṭṭu ṇṭ # ; ! "' !""
%<pres%>:%>kku nnu NEG-WHEN ; ! "̔̔"
%<past%>:%>chchu NEG-WHEN ; ! ""
%<fut%>:%>kku ' NEG-WHEN ; ! ""
%<pass%>:%>kkppe PASS-CONT ;
%<iter%>:%>chchu koṇṭi ITER-TENS; ! ""
%<iter%>%<cont%>:%>chchu koṇṭeyi ITER-TENS; ! ""
%<gpr_pres%>:%>kku nn GPR-PRES ; ! ""
%<gpr_past%>:%>chch GPR-PAST ; ! ""
%<hab%>:%>kkar CLIT-COP-UNTU ; ! "ār"
%<imp%>:%>kk CONT_IMP; !""
%<pcpl%>:%>chch # ; ! ""
%<contpres%>:%>chchirikku nnu # ;
%<prec%>:%>kk PREC-CONT ; ! "" ! ""
%<opt%>:%>kkṭṭe # ;
%<irre%>%<past%>:%>chchene # ;
%<cond%>:%>chchal NEG-WHEN ; ! "ccāl"
%<monit%>%<fut%>:%>kku me # ;
%<satis%>%<fut%>:%>kku mllo #;
%<satis%>%<past%>:%>chchllo #;
%<satis%>%<pres%>:%>kku nnllo #;
%<oblig%>:%>kkṇ' # ;
It uses continuation lexicons such as CLIT-CC, CLIT-ITG and NEG-WHEN.
- Passive verbs are added through the continuation lexicon PASS-CONT (%<pass%>:%>kkppe PASS-CONT ;)
- The passive verb lexicon is defined as:
LEXICON PASS-CONT
%<inf_k%>:%>ṭu k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>ṭan CLIT-CC ; ! "̔" !
%<perf%>:%>ṭṭiru nnu CLIT-ITG ; ! "̔" !
%<pres%>:%>ṭu nnu NEG-WHEN ; ! "̔̔"
%<past%>:%>ṭṭu NEG-WHEN ; ! ""
%<fut%>:%>ṭu ' NEG-WHEN ; ! ""
%<iter%>%<pres%>:%>ṭṭu koṇṭirikku nnu NEG-WHEN ; ! ""
%<iter%>%<past%>:%>ṭṭu koṇṭiru nnu NEG-WHEN ; ! ""
%<iter%>%<fut%>:%>ṭṭu koṇṭirikku ' NEG-WHEN ; ! ""
%<gpr_pres%>:%>ṭu nn GPR-PRES ; ! ""
%<gpr_past%>:%>ṭṭ GPR-PAST ; ! ""
%<imp%>:%>ṭ # ; !""
%<imp%>%<frml%>:%>ṭṇ' # ; !""
%<imp%>%<infml%>:%>ṭu # ; !""
%<hab%>:%>ṭar CLIT-COP-UNTU ; ! "ār"
%<pcpl%>:%>ṭṭ # ; ! ""
%<contpres%>:%>ṭṭikku nnu # ;
%<prec%>:%>ṭṇe # ;
%<opt%>:%>ṭṭṭe # ;
%<irre%>%<past%>:%>ṭṭene # ;
%<cond%>:%>ṭṭal NEG-WHEN ; ! "
%<monit%>%<fut%>:%>ṭu me # ;
%<satis%>%<fut%>:%>ṭu mllo #;
%<satis%>%<pres%>:%>ṭu nnllo #;
%<satis%>%<past%>:%>chchllo #;
%<oblig%>:%>ṭṇ' # ;
%<itg%>:%>ṭu mo # ;
- The imperative mood is added through the continuation lexicon CONT_IMP ( %<imp%>:%>ക്ക CONT_IMP; !"" ), which is defined as:
LEXICON CONT_IMP
# ; !""
%<frml%>:%>ṇ' # ; !""
%<infml%>:%>u # ; !""
- Verbal adjectives are added through the continuation lexicon GPR-PRES:
LEXICON GPR-PRES
%<subst%>:%>ത N3-COMMON;
# ;
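The way a verb paradigm chains through V-COMMON-* into a continuation lexicon such as CONT_IMP can be sketched with a small Python model. This is a toy under stated assumptions, not the HFST implementation: only the tag (analysis) side is modelled, the entry name V-TV-ATIKKUKA is hypothetical (it follows the V-TV-ARIYUKA pattern), only three of the V-COMMON-ATIKKUKA entries are included, and "#" marks the end of the word as in lexc.

```python
# Toy model of lexc continuation classes: every lexicon maps to a list of
# (tags, next_lexicon) pairs. Surface strings are deliberately left out.
LEXICONS = {
    # Hypothetical entry point, modelled on LEXICON V-TV-ARIYUKA.
    "V-TV-ATIKKUKA": [("<v><tv>", "V-COMMON-ATIKKUKA")],
    # Three entries taken from LEXICON V-COMMON-ATIKKUKA.
    "V-COMMON-ATIKKUKA": [
        ("<imp>", "CONT_IMP"),
        ("<opt>", "#"),
        ("<oblig>", "#"),
    ],
    # LEXICON CONT_IMP: plain, formal and informal imperative.
    "CONT_IMP": [("", "#"), ("<frml>", "#"), ("<infml>", "#")],
}


def analyses(lexicon, prefix=""):
    """Yield every tag sequence reachable from the given lexicon."""
    if lexicon == "#":
        yield prefix
        return
    for tags, next_lexicon in LEXICONS[lexicon]:
        yield from analyses(next_lexicon, prefix + tags)


result = list(analyses("V-TV-ATIKKUKA"))
for tag_sequence in result:
    print(tag_sequence)
```

The single `<imp>` entry in the common lexicon fans out into three analyses because CONT_IMP has three entries, which is why shared continuation lexicons keep the verb paradigms short.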
Adjectives
Four adjective paradigms are included in Apertium:
- LEXICON A1
- LEXICON A2
- LEXICON A3
- LEXICON A4
Adverbs
Five adverb paradigms have been added:
- ADV
- ADV1
- ADV2
- ADV3
- ADV4
Postpositions
Malayalam Sandhi Rules Implementation
Refer to Malayalam_and_English/sandh_in_malayalam.
Bilingual Dictionary
The bilingual dictionary maps each source-language word to its target-language equivalent. See Bilingual_dictionary for details.
For example:
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
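To check what such entries encode, the sketch below reads them with Python's standard xml.etree.ElementTree module and lists each pair with its tags. It assumes the entries are wrapped in a single section element so that the fragment is well-formed on its own, and the function name read_pairs is hypothetical. Lemmas that contain extra markup, such as multiword entries, would need more handling than this.

```python
import xml.etree.ElementTree as ET

# The three example entries, wrapped in a <section> element so that the
# fragment parses as a standalone XML document.
BIDIX_FRAGMENT = """<section id="main" type="standard">
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
</section>"""


def read_pairs(xml_text):
    """Return (left lemma, left tags, right lemma, right tags) per entry."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("e"):
        left = entry.find("p/l")
        right = entry.find("p/r")
        if left is None or right is None:
            continue  # skip entries that are not simple <p> pairs
        left_tags = [s.get("n") for s in left.findall("s")]
        right_tags = [s.get("n") for s in right.findall("s")]
        pairs.append((left.text or "", left_tags, right.text or "", right_tags))
    return pairs


pairs = read_pairs(BIDIX_FRAGMENT)
for left_lemma, left_tags, right_lemma, right_tags in pairs:
    print(left_lemma, left_tags, "->", right_lemma, right_tags)
```

Each entry maps a Malayalam noun on the left side to an English noun on the right side, and both sides carry the same n tag.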
Transfer Rules
Transfer rules are handled by the structural transfer module; see A_long_introduction_to_transfer_rules.
References
This article uses material from the Wikipedia article Malayalam Grammar, which is released under the Creative Commons Attribution-Share-Alike License 3.0.