Malayalam and English/documentation


Malayalam is both an agglutinative and an inflectional language. It belongs to the Dravidian language family. In Apertium we are implementing the English-Malayalam pair using HFST, following the approach described in Starting_a_new_language_with_HFST.

Morphotactics using lexc

Let's take a noun as an example. A Malayalam noun can take eight case inflections: nominative, vocative, accusative, genitive, dative, instrumental, locative and sociative. Nouns are also classified by number, singular or plural. Let's declare the essential symbols:

Multichar_Symbols

%<n%>           ! Noun                        ! നാമം

%<nom%>         ! Nominative                  !

%<acc%>         ! Accusative                  !

%<dat%>         ! Dative                      !

%<soc%>         ! Sociative                   !

%<gen%>         ! Genitive                    !

%<ins%>         ! Instrumental                !

%<loc%>         ! Locative                    !

%<voc%>         ! Vocative                    !

%<sg%>          ! Singular                    !

%<pl%>          ! Plural                      !

Now that we have all the essential symbols for a noun, let's add the Root lexicon and an example paradigm:

LEXICON Root

Miscellaneous ;
Conjunctions ; 
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;

Nouns

| Case | Tree (sg) | Tree (pl) | Elephant (sg) | Elephant (pl) | Human (sg) | Human (pl) | Dog (sg) | Dog (pl) |
|---|---|---|---|---|---|---|---|---|
| Nominative | maram | maraṅṅaḷ | āna | ānakaḷ | manuṣyan | manuṣyar | paṭṭi | paṭṭikaḷ |
| Vocative | maramē | maraṅṅaḷē | ānē | ānakaḷē | manuṣyā | manuṣyarē | paṭṭī | paṭṭikaḷē |
| Accusative | maratte | maraṅṅaḷe | ānaye | ānakaḷe | manuṣyane | manuṣyare | paṭṭiye | paṭṭikaḷe |
| Genitive | marattinte | maraṅṅaḷuṭe | ānayuṭe | ānakaḷuṭe | manuṣyante | manuṣyaruṭe | paṭṭiyuṭe | paṭṭikaḷuṭe |
| Dative | marattinu | maraṅṅaḷkku | ānaykku | ānakaḷkku | manuṣyanu | manuṣyarkku | paṭṭiykku | paṭṭikaḷkku |
| Instrumental | marattāl | maraṅṅaḷāl | ānayāl | ānakaḷāl | manuṣyanāl | manuṣyarāl | paṭṭiyāl | paṭṭikaḷāl |
| Locative | marattil | maraṅṅaḷil | ānayil | ānakaḷil | manuṣyanil | manuṣyaril | paṭṭiyil | paṭṭikaḷil |
| Sociative | marattōṭu | maraṅṅaḷōṭu | ānayōṭu | ānakaḷōṭu | manuṣyanōṭu | manuṣyarōṭu | paṭṭiyōṭu | paṭṭikaḷōṭu |

LEXICON N1 

%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil‍ CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
!plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ‍ CLIT-N-NOM ; ! ṅṅaḷ‍
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅaḷe
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe

An example word entry:

LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
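
As a rough illustration of what the N1 paradigm does with this entry, the following Python sketch pairs each analysis string with a surface string. It is not part of Apertium or HFST: N1_SUFFIXES is a hand-copied subset of the LEXICON N1 entries above, the function name expand_n1 is hypothetical, and the CLIT-N-* continuation lexicons are assumed to accept the empty string.

```python
# Hand-copied subset of LEXICON N1: (tags, surface suffix).
# ">" is the morpheme boundary written as %> in lexc.
N1_SUFFIXES = [
    ("<n><sg><nom>", "ṁ"),
    ("<n><sg><loc>", ">ttil"),
    ("<n><sg><acc>", ">tte"),
    ("<n><sg><dat>", ">ttinu"),
    ("<n><pl><nom>", ">ṅṅaḷ"),
    ("<n><pl><acc>", ">ṅṅaḷe"),
    ("<n><pl><gen>", ">ṅṅaḷuṭe"),
]


def expand_n1(lemma, stem):
    """Return {analysis: surface} for one entry of the form 'lemma:stem N1 ;'."""
    forms = {}
    for tags, suffix in N1_SUFFIXES:
        # Upper side: lemma plus tags. Lower side: stem plus suffix.
        # The morpheme boundary is simply dropped here (an assumption of
        # this sketch; it stands in for any later rule-based processing).
        forms[lemma + tags] = (stem + suffix).replace(">", "")
    return forms
```

For example, expand_n1("mēghaṁ", "mēgha") maps mēghaṁ<n><sg><loc> to mēghattil and mēghaṁ<n><pl><nom> to mēghaṅṅaḷ.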

Currently there are 10 noun paradigms: N1, N2, ..., N10.

General trends in paradigms

LEXICON N1 :- words ending with anusvara (ം)

LEXICON N2 :- words ending with the vowel a or i

LEXICON N3 :- words ending with virama or a vowel

LEXICON N4 :- words ending with the vowel a or i

LEXICON N5 :- words ending with virama (e.g. വീട്)

LEXICON N6 :- for the word പേര്‍ (pēr‍)

LEXICON N7 :- words ending with the vowel u

LEXICON N8 :-

LEXICON N9 :-

LEXICON N10 :-
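
The trends above can be turned into a rough guessing heuristic. The Python sketch below is only an illustration: the function name candidate_paradigms is hypothetical, and "ends with the vowel a" is assumed to mean that the word ends in a bare consonant letter carrying its inherent a. Because several descriptions overlap, it returns a list of candidates rather than a single paradigm. N6 (a single word) and N8 to N10 (no description given) are left out.

```python
ANUSVARA = "\u0D02"  # Malayalam sign anusvara
VIRAMA = "\u0D4D"    # Malayalam sign virama
SIGN_I = "\u0D3F"    # Malayalam vowel sign i
SIGN_U = "\u0D41"    # Malayalam vowel sign u


def is_consonant(ch):
    """True for a Malayalam consonant letter (ka .. ha), which carries an inherent a."""
    return "\u0D15" <= ch <= "\u0D39"


def is_vowel_sign(ch):
    """True for a dependent vowel sign (aa .. au)."""
    return "\u0D3E" <= ch <= "\u0D4C"


def candidate_paradigms(word):
    """Guess candidate noun paradigms from the last character of a word."""
    if not word:
        return []
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    if last == SIGN_U:
        return ["N3", "N7"]
    if last == SIGN_I or is_consonant(last):
        return ["N2", "N3", "N4"]
    if is_vowel_sign(last):
        return ["N3"]
    return []
```

A word that ends in anusvara, such as the Malayalam-script form of mēghaṁ, yields ["N1"], while വീട് yields ["N3", "N5"], so a human still has to pick the right paradigm among the candidates.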

Proper Nouns

  • LEXICON NP* is for proper nouns, which behave almost the same way as common nouns
  • LEXICON NP*-COG is for second names (surnames)
  • LEXICON NP-TOP-* is for place names

The place-name paradigms are:

  1. NP-TOP-KERALA :- place names ending with anusvara
  2. NP-TOP-INDIA :- place names ending with the vowel a or i
  3. NP-TOP-CALICUT :- place names ending with virama
  4. NP-TOP-KANNUR :- place names ending with a chillu
  5. NP-TOP-MALABAR :- place names ending with chillu r (ര്‍)
  6. NP-TOP-JAPAN :- place names ending with chillu n (ന്‍)
  7. NP-TOP-BRAZIL :- place names ending with chillu l (ല്‍)

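Pronouns

  • PRON-PERS-* is for personal pronouns:

  1. PRON-PERS-NNAAN :- first person singular (I)
  2. PRON-PERS-NII :- second person singular (you)
  3. PRON-PERS-AVAN :- third person singular masculine (he)
  4. PRON-PERS-AVAL :- third person singular feminine (she)
  5. PRON-PERS-NNANNAL :- first person plural, exclusive (we)
  6. PRON-PERS-NAAM :- first person plural, inclusive (we)
  7. PRON-PERS-NINNAL :- second person plural (you)
  8. PRON-PERS-AVAR :- third person plural (they)
  9. PRON-PERS-ADDEHA :- third person honorific (he)

  • PRON-DEM-* is for demonstrative pronouns:

  1. PRON-DEM-AT :- distal (that)
  2. PRON-DEM-IT :- proximal (this)

  • PRON-IND is for indefinite pronouns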

Numerals

NUM is used for numerals.

Verbs

| Form | Description | Tag | Example | Gloss | Translation |
|---|---|---|---|---|---|
| Present | stemk-unnu | <pres> | kuttikal kalikkunnu | children play | The children are playing. |
| Future | stem-um | <fut> | naale mala peyyum | tomorrow rain will.fall | It will rain tomorrow. |
| Present progressive | presk-unt | | aval nannaayi pathikkunt | she well studying.is | She is studying well. |
| Present progressive (II) | inf ān | | siita avite irikkuka ān | Sita there sit is | Sita is sitting there. |
| Iterative present | stem-kontu-iri-kk-unnu | | avan paatikkontirikkunnu | he singing.is | He is singing. |
| Iterative future | stem-kontu-iri-kk-um | | avan paatikkontirikkum | he singing.will.be | He will be singing. |
| Iterative past | stem-kontu-iri-unnu | | avan paatikkontirunnu | he singing.was | He was singing. |
| Continuous iterative | stem-konte-iri-kunnu | | kuttikal paatikkonteeyirunnu | children sang.without.stopping | The children sang without stopping. |
| Perfect | | | innale mala peytirunnu | yesterday rain fell | It rained yesterday. |
| Contemporaneous perfect | | | yuddham pottippurappettirikkunnu | war broken#out.has | War has broken out! |
| Remote perfect | | | ñaan paattŭ pathiccittuntŭ | I music studied.had | I had studied music. |
| Habitual present | | | juun maasattil mala peyyaaruntŭ | June month.in rain falls.usually | It usually rains in June. |
| Habitual past | | | ñaan delhiyil pookaaruntaayirunnu | I Delhi.to go.used#to | I used to go to Delhi. |
| Imperative | | | putiya vidyaarthikal hedmaasrrare kaaneentataanŭ | new students headmaster meet.should | New students should meet the headmaster. |
| Promissive | past-ām | | ñaan naale varaam | I tomorrow come.will | I will come tomorrow. |
| Emphatic promissive | past-ēk-ām | | ñaan naale vanneekkaam | I tomorrow come.will | I will come tomorrow. |
| Permissive | past-ō (kolluu) | | vannoo | you.may.come | You may come. |
| Permissive (II) | past-ootte | | avan avite irunnootte | he there sit.let | Let him sit there. |
| Permissive (III) | | | avar avite taamasikkatte | they there stay.let | Let them stay there. |
| Permissive (Formal) | | | paas ullavarkkŭ itilee pookaavunnatŭ aanŭ | pass having this.way go.may is | Those who have a pass may go this way. |
| Optative | | | mala peyyatte | rain fall.let | Let it rain. |
| Precative | stem-anē (= stem-uka-vēnam-ē) | | mala peyyanee | rain fall.may | May it rain. |

%<quot%>
%<enum%>        ! Enumerative                 !

%<subst%>       ! Substantive                 !
%<attr%>        ! Attributive                 !

%<iv%>          ! Intransitive                ! 
%<tv%>          ! Transitive                  ! 

%<neg%>         ! Negative                    !

%<pres%>        ! Present tense               ! വര്‍ത്ത്മാന കാലം 
%<past%>        ! Past tense                  ! ഭൂത കാലം 
%<fut%>         ! Future tense                ! 
%<perf%>        ! Simple Perfect              !
%<rem_perf%>    ! Remote Perfect              !
%<contpres%>    ! Contemporaneous perfect     !
%<perm%>        ! Permissive mood             !
%<imp%>         ! Imperative mood             !
%<hab%>         ! Habitual aspect             ! 

%<prec%>         ! Precative mood             ! 
%<opt%>          ! Optative mood              !
%<irre%>         ! Irrealis mood              !
%<satis%>        ! satisfactive mood              !  
%<monit%>        ! monitory  mood              !    

%<frml%>        ! Formal                      !
%<infml%>       ! InFormal                    !

%<inf_k%>       ! Infinitive                  !
%<inf_n%>       ! Purposive infinitive        !
%<oblig%>       ! Obligative                  ! 
%<simul%>       ! Simultaneous                !

%<iter%>        ! Iterative                   !
%<cond%>        ! Conditional Mood                 !

%<gpr_pres%>    ! Verbal adjective (present)  !
%<gpr_past%>    ! Verbal adjective (past)     !

Currently more than 25 verb paradigms have been added to the lexc file. Unlike nouns, it is difficult to predict a verb's paradigm from its word pattern.

Verb paradigms are of the form

LEXICON V-TV-ARIYUKA

%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""

and

LEXICON V-IV-KALI

%<v%>%<iv%>: V-COMMON-KALI ; ! ""

Here IV stands for intransitive verb and TV for transitive verb.

Both kinds of paradigm continue into a shared V-COMMON-* continuation lexicon, e.g. V-COMMON-ATIKKUKA:

LEXICON V-COMMON-ATIKKUKA
%<inf_k%>:%>kku k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>kkan‍ CLIT-CC ; ! "̔" !
%<perf%>:%>chchiru nnu  CLIT-ITG ; ! "̔" !
%<rem_perf%>:%>chchiṭṭu ṇṭ # ; ! "' !""
%<pres%>:%>kku nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>chchu  NEG-WHEN ; ! ""
%<fut%>:%>kku ' NEG-WHEN ; ! ""
%<pass%>:%>kkppe PASS-CONT ;
%<iter%>:%>chchu koṇṭi ITER-TENS; ! ""
%<iter%>%<cont%>:%>chchu koṇṭeyi ITER-TENS; ! ""
%<gpr_pres%>:%>kku nn GPR-PRES ; ! ""
%<gpr_past%>:%>chch GPR-PAST ; ! ""
%<hab%>:%>kkar‍ CLIT-COP-UNTU ; ! "ār"
%<imp%>:%>kk CONT_IMP; !""
%<pcpl%>:%>chch # ; ! ""
%<contpres%>:%>chchirikku nnu  # ;
%<prec%>:%>kk PREC-CONT ; ! "" ! ""
%<opt%>:%>kkṭṭe # ;
%<irre%>%<past%>:%>chchene # ;
%<cond%>:%>chchal‍  NEG-WHEN ; ! "ccāl‍"
%<monit%>%<fut%>:%>kku me # ;
%<satis%>%<fut%>:%>kku mllo #;
%<satis%>%<past%>:%>chchllo #;
%<satis%>%<pres%>:%>kku nnllo #;
%<oblig%>:%>kkṇ' # ;

It uses continuation lexicons such as CLIT-CC, CLIT-ITG and NEG-WHEN.

  • Passive verbs are added using the continuation lexicon PASS-CONT (%<pass%>:%>kkppe PASS-CONT ;)
  • The passive verb lexicon is defined as
LEXICON PASS-CONT
%<inf_k%>:%>ṭu k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>ṭan‍ CLIT-CC ; ! "̔" ! 
%<perf%>:%>ṭṭiru nnu  CLIT-ITG ; ! "̔" ! 
%<pres%>:%>ṭu nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>ṭṭu  NEG-WHEN ; ! ""
%<fut%>:%>ṭu ' NEG-WHEN ; ! ""
%<iter%>%<pres%>:%>ṭṭu koṇṭirikku nnu  NEG-WHEN ; ! ""
%<iter%>%<past%>:%>ṭṭu koṇṭiru nnu  NEG-WHEN ; ! ""
%<iter%>%<fut%>:%>ṭṭu koṇṭirikku ' NEG-WHEN ; ! ""
%<gpr_pres%>:%>ṭu nn GPR-PRES ; ! ""
%<gpr_past%>:%>ṭṭ GPR-PAST ; ! ""
%<imp%>:%>ṭ # ; !""
%<imp%>%<frml%>:%>ṭṇ' # ; !""
%<imp%>%<infml%>:%>ṭu  # ; !""
%<hab%>:%>ṭar‍ CLIT-COP-UNTU ; ! "ār"
%<pcpl%>:%>ṭṭ # ; ! ""
%<contpres%>:%>ṭṭikku nnu  # ; 
%<prec%>:%>ṭṇe # ; 
%<opt%>:%>ṭṭṭe # ; 
%<irre%>%<past%>:%>ṭṭene # ; 
%<cond%>:%>ṭṭal‍  NEG-WHEN ; ! "
%<monit%>%<fut%>:%>ṭu me # ;
%<satis%>%<fut%>:%>ṭu mllo #;
%<satis%>%<pres%>:%>ṭu nnllo #;
%<satis%>%<past%>:%>chchllo #;
%<oblig%>:%>ṭṇ' # ; 
%<itg%>:%>ṭu mo # ; 
  • The imperative mood is added using the continuation lexicon CONT_IMP (%<imp%>:%>ക്ക CONT_IMP ; ! "")
LEXICON CONT_IMP
 # ; !""
%<frml%>:%>ṇ' # ; !""
%<infml%>:%>u  # ; !""
  • Verbal adjectives are added using the continuation lexicon GPR-PRES
LEXICON GPR-PRES

%<subst%>:%>ത  N3-COMMON; 
# ;
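
The way a verb form passes through a chain of continuation lexicons can be sketched in Python. This is an illustration only, not how HFST compiles lexc: the LEXICONS table holds a small hand-copied subset of V-COMMON-ATIKKUKA and CONT_IMP, the top-level name V-TV-EXAMPLE is hypothetical (modelled on V-TV-ARIYUKA), and the function name expand is an assumption of this sketch.

```python
# Toy lexicons: name -> list of (tags, surface, continuation).
# "#" marks the end of a word, as in lexc; ">" is the morpheme boundary (%>).
LEXICONS = {
    "V-TV-EXAMPLE": [("<v><tv>", "", "V-COMMON-ATIKKUKA")],
    "V-COMMON-ATIKKUKA": [
        ("<pcpl>", ">chch", "#"),
        ("<imp>", ">kk", "CONT_IMP"),
    ],
    "CONT_IMP": [
        ("", "", "#"),
        ("<frml>", ">ṇ'", "#"),
        ("<infml>", ">u", "#"),
    ],
}


def expand(lexicon, analysis, surface):
    """Yield (analysis, surface) for every path from `lexicon` to '#'."""
    for tags, suffix, continuation in LEXICONS[lexicon]:
        next_analysis = analysis + tags
        next_surface = surface + suffix
        if continuation == "#":
            # End of word: this path is a complete form.
            yield next_analysis, next_surface
        else:
            # Follow the continuation lexicon and keep accumulating.
            yield from expand(continuation, next_analysis, next_surface)
```

Calling list(expand("V-TV-EXAMPLE", lemma, stem)) yields four pairs: the participle, the plain imperative, and the formal and informal imperatives that CONT_IMP adds on top of the <imp> entry.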

Adjectives

Four adjective paradigms are included in Apertium:

  1. LEXICON A1
  2. LEXICON A2
  3. LEXICON A3
  4. LEXICON A4

Adverbs

Five adverb paradigms have been added:

  1. ADV
  2. ADV1
  3. ADV2
  4. ADV3
  5. ADV4

Postpositions

Malayalam Sandhi Rules Implementation

Refer to Malayalam_and_English/sandh_in_malayalam.

Bilingual Dictionary

The bilingual dictionary maps each source-language word to its target-language word. See Bilingual_dictionary for details.

Example:


<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
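
To show how such entries can be read programmatically, here is a minimal Python sketch that loads the three entries above into a lookup table with the standard xml.etree.ElementTree module. The function name load_bidix is hypothetical, and the enclosing <section> element is added here only so that the sample is a well-formed XML document on its own.

```python
import xml.etree.ElementTree as ET

# The three example entries, wrapped in a <section> element (an assumption
# of this sketch) so that the string parses as a single XML document.
SAMPLE = """<section>
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
</section>"""


def load_bidix(xml_text):
    """Map (source lemma, source tags) to (target lemma, target tags)."""
    table = {}
    for pair in ET.fromstring(xml_text).iter("p"):
        left, right = pair.find("l"), pair.find("r")
        # Each <s n="..."/> child carries one tag, e.g. n for noun.
        left_tags = tuple(s.get("n") for s in left.findall("s"))
        right_tags = tuple(s.get("n") for s in right.findall("s"))
        table[(left.text, left_tags)] = (right.text, right_tags)
    return table


BIDIX = load_bidix(SAMPLE)
```

Each key is a Malayalam lemma together with its tags, and each value is the English lemma with its tags, so the entry for ഗോത്രം with the tag n maps to caste with the tag n.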

Transfer Rules

The structural transfer module is described in A_long_introduction_to_transfer_rules.