Malayalam and English/documentation

Malayalam is both an agglutinative and an inflecting language. It belongs to the Dravidian language family. In Apertium we are implementing the English-Malayalam pair using HFST, as described in Starting_a_new_language_with_HFST.

Morphotactics using lexc

Let's take a noun as an example. A Malayalam noun can have 8 case inflections: nominative, accusative, dative, sociative, genitive, instrumental, locative and vocative. Nouns are also classified by number, singular or plural. Let's declare the essential symbols:

Multichar_Symbols

%<n%>           ! Noun                        ! നാമം

%<nom%>         ! Nominative                  !

%<acc%>         ! Accusative                  !

%<dat%>         ! Dative                      !

%<soc%>         ! Sociative                   !

%<gen%>         ! Genitive                    !

%<ins%>         ! Instrumental                !

%<loc%>         ! Locative                    !

%<voc%>         ! Vocative                    !

%<sg%>          ! Singular                    !

%<pl%>          ! Plural                      !
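
To illustrate why these tags are declared as multichar symbols, here is a minimal Python sketch (not part of the Apertium or HFST tooling) of how an analysis string could be split into symbols. In lexc, `%<n%>` is the escaped spelling of the single symbol `<n>`. The function name `tokenize` and the longest-match strategy are assumptions made for illustration only.

```python
# Tags declared under Multichar_Symbols above, written without the lexc % escapes.
MULTICHAR_SYMBOLS = [
    "<n>", "<nom>", "<acc>", "<dat>", "<soc>", "<gen>",
    "<ins>", "<loc>", "<voc>", "<sg>", "<pl>",
]


def tokenize(text, symbols=MULTICHAR_SYMBOLS):
    """Split text into symbols, preferring the longest declared multichar symbol."""
    ordered = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for sym in ordered:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No declared symbol starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("megha<n><sg><nom>"))
```

A tag that is not declared, such as a hypothetical `<xx>`, falls apart into the separate characters `<`, `x`, `x`, `>`, which is why every tag used in a paradigm has to appear in the symbol list.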

Now that we have all the essential symbols for a noun, let's add an example paradigm:

LEXICON Root

Miscellaneous ;
Conjunctions ; 
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;

Nouns

LEXICON N1 

%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil‍ CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
!plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ‍ CLIT-N-NOM ; ! ṅṅaḷ‍
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅaḷe
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe

and an example word:

LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
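
The way the stem entry and the N1 paradigm combine can be sketched in Python. This is a simplified illustration, not how HFST compiles the lexicon: it assumes the `>` morpheme boundary is simply deleted, it ignores the CLIT-N-* continuation lexicons and any sandhi rules, and it omits the zero-width joiners on some suffixes. The names `N1` and `expand` are hypothetical.

```python
# Suffix table copied from LEXICON N1 above: (analysis tags, surface suffix).
# ">" marks the morpheme boundary, as in the lexc source.
N1 = [
    ("<n><sg><nom>", "ṁ"),
    ("<n><sg><loc>", ">ttil"),
    ("<n><sg><acc>", ">tte"),
    ("<n><sg><gen>", ">ttinṟe"),
    ("<n><sg><dat>", ">ttin"),
    ("<n><sg><dat>", ">ttinu"),
    ("<n><pl><nom>", ">ṅṅaḷ"),
    ("<n><pl><acc>", ">ṅṅaḷe"),
    ("<n><pl><gen>", ">ṅṅaḷuṭe"),
]


def expand(lemma, stem, paradigm, boundary=">"):
    """Return (analysis, surface) pairs for one stem entry and one paradigm."""
    pairs = []
    for tags, suffix in paradigm:
        analysis = lemma + tags
        # Assumption: the boundary is dropped with no further sandhi changes.
        surface = (stem + suffix).replace(boundary, "")
        pairs.append((analysis, surface))
    return pairs


if __name__ == "__main__":
    # Mirrors the entry "mēghaṁ:mēgha N1 ;"
    for analysis, surface in expand("mēghaṁ", "mēgha", N1):
        print(analysis, surface, sep="\t")
```

Each pair corresponds to one path through the transducer: the upper side is the lemma plus tags, the lower side is the stem plus suffix.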

Currently there are 10 noun paradigms: N1, N2, ..., N10.

General trends in paradigms

LEXICON N1 :- words ending with anusvara ( ം )

LEXICON N2 :- words ending with the vowel a or i

LEXICON N3 :- words ending with virama or a vowel

LEXICON N4 :- words ending with the vowel a or i

LEXICON N5 :- words ending with virama (e.g. വീട് )

LEXICON N6 :- for the word പേര്‍ (pēr‍)

LEXICON N7 :- words ending with the vowel u

LEXICON N8 :-

LEXICON N9 :-

LEXICON N10 :-
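
Because the noun paradigms follow word endings, a first guess at the paradigm can be made from the last character of a word in Malayalam script. The sketch below is a rough heuristic based only on the descriptions listed above; it returns every paradigm whose description matches, since the descriptions overlap. The function `candidate_paradigms` is hypothetical and is not part of the Apertium tools.

```python
ANUSVARA = "\u0d02"      # ം
VIRAMA = "\u0d4d"        # ്
VOWEL_SIGN_I = "\u0d3f"  # ി
VOWEL_SIGN_U = "\u0d41"  # ു


def is_consonant(ch):
    """True for Malayalam consonant letters (ക to ഹ)."""
    return "\u0d15" <= ch <= "\u0d39"


def is_vowel_sign(ch):
    """True for dependent vowel signs (ാ to ൌ)."""
    return "\u0d3e" <= ch <= "\u0d4c"


def candidate_paradigms(word):
    """Return the noun paradigms whose listed description matches the word ending."""
    if not word:
        return []
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    if last == VOWEL_SIGN_U:
        return ["N3", "N7"]
    # A bare final consonant carries the inherent vowel a.
    if last == VOWEL_SIGN_I or is_consonant(last):
        return ["N2", "N3", "N4"]
    if is_vowel_sign(last):
        return ["N3"]
    # Chillu endings and anything else are not covered by the descriptions above.
    return []


if __name__ == "__main__":
    for w in ["മേഘം", "വീട്"]:
        print(w, candidate_paradigms(w))
```

The result is only a list of candidates; choosing between overlapping paradigms such as N2 and N4 still requires checking the inflected forms of the word.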

Proper Nouns

  • LEXICON NP* is for proper nouns; they behave almost like common nouns
  • LEXICON NP*-COG is for second names
  • LEXICON NP-TOP-* is for place names

The place-name paradigms are:

  1. NP-TOP-KERALA :- place names ending with anusvara
  2. NP-TOP-INDIA :- place names ending with the vowel a or i
  3. NP-TOP-CALICUT :- place names ending with virama
  4. NP-TOP-KANNUR :- place names ending with chillu
  5. NP-TOP-MALABAR :- place names ending with chillu R (ര്‍)
  6. NP-TOP-JAPAN :- place names ending with chillu ന്‍
  7. NP-TOP-BRAZIL :- place names ending with chillu ല്‍

Pronouns

  • PRON-PERS-* represents personal pronouns

They are:

  1. PRON-PERS-NNAAN :-
  2. PRON-PERS-NII :-
  3. PRON-PERS-AVAN :-
  4. PRON-PERS-AVAL :-
  5. PRON-PERS-NNANNAL
  6. PRON-PERS-NAAM
  7. PRON-PERS-NINNAL
  8. PRON-PERS-AVAR
  9. PRON-PERS-ADDEHA
  • PRON-DEM is for demonstrative pronouns

They are:

  1. PRON-DEM-AT
  2. PRON-DEM-IT
  • PRON-IND is for indefinite pronouns

Numerals

NUM is for numerals.

Verbs

Form | Description | Tag | Example | Gloss | Translation
Present | stemk-unnu | <pres> | kuttikal kalikkunnu | children play | The children are playing.
Future | stem-um | <fut> | naale mala peyyum | tomorrow rain will.fall | It will rain tomorrow.
Present progressive | presk-unt | | aval nannaayi pathikkunt | she well studying.is | She is studying well.
Present progressive (II) | inf ān | | siita avite irikkuka ān | Sita there sit is | Sita is sitting there.
Iterative present | stem-kontu-iri-kk-unnu | | avan paatikkontirikkunnu | he singing.is | He is singing.
Iterative future | stem-kontu-iri-kk-um | | avan paatikkontirikkum | he singing.will.be | He will be singing.
Iterative past | stem-kontu-iri-unnu | | avan paatikkontirunnu | he singing.was | He was singing.
Continuous iterative | stem-konte-iri-kunnu | | kuttikal paatikkonteeyirunnu | children sang.without.stopping | The children sang without stopping.
Perfect | | | innale mala peytirunnu | yesterday rain fell | It rained yesterday.
Contemporaneous perfect | | | yuddham pottippurappettirikkunnu | war broken#out.has | War has broken out!
Remote perfect | | | ñaan paattŭ pathiccittuntŭ | I music studied.had | I had studied music.
Habitual present | | | juun maasattil mala peyyaaruntŭ | June month.in rain falls.usually | It usually rains in June.
Habitual past | | | ñaan delhiyil pookaaruntaayirunnu | I Delhi.to go.used#to | I used to go to Delhi.
Imperative | | | putiya vidyaarthikal hedmaasrrare kaaneentataanŭ | new students headmaster meet.should | New students should meet the headmaster.
Promissive | past-ām | | ñaan naale varaam | I tomorrow come.will | I will come tomorrow.
Emphatic promissive | past-ēk-ām | | ñaan naale vanneekkaam | I tomorrow come.will | I will come tomorrow.
Permissive | past-ō | | (kolluu) vannoo | you.may.come | You may come.
Permissive (II) | past-ootte | | avan avite irunnootte | he there sit.let | Let him sit there.
Permissive (III) | | | avar avite taamasikkatte | they there stay.let | Let them stay there.
Permissive (Formal) | | | paas ullavarkkŭ itilee pookaavunnatŭ aanŭ | pass having this.way go.may is | Those who have a pass may go this way.
Optative | | | mala peyyatte | rain fall.let | Let it rain.
Precative | stem-anē (= stem-uka-vēnam-ē) | | mala peyyanee | rain fall.may | May it rain.

The verb tags are declared as multichar symbols:

%<quot%>
%<enum%>        ! Enumerative                 !

%<subst%>       ! Substantive                 !
%<attr%>        ! Attributive                 !

%<iv%>          ! Intransitive                ! 
%<tv%>          ! Transitive                  ! 

%<neg%>         ! Negative                    !

%<pres%>        ! Present tense               ! വര്‍ത്ത്മാന കാലം 
%<past%>        ! Past tense                  ! ഭൂത കാലം 
%<fut%>         ! Future tense                ! 
%<perf%>        ! Simple Perfect              !
%<rem_perf%>    ! Remote Perfect              !
%<contpres%>    ! Contemporaneous perfect     !
%<perm%>        ! Permissive mood             !
%<imp%>         ! Imperative mood             !
%<hab%>         ! Habitual aspect             ! 

%<prec%>         ! Precative mood             ! 
%<opt%>          ! Optative mood              !
%<irre%>         ! Irrealis mood              !
%<satis%>        ! Satisfactive mood          !
%<monit%>        ! Monitory mood              !

%<frml%>        ! Formal                      !
%<infml%>       ! Informal                    !

%<inf_k%>       ! Infinitive                  !
%<inf_n%>       ! Purposive infinitive        !
%<oblig%>       ! Obligative                  ! 
%<simul%>       ! Simultaneous                !

%<iter%>        ! Iterative                   !
%<cond%>        ! Conditional Mood                 !

%<gpr_pres%>    ! 
%<gpr_past%>    !

Currently more than 25 verb paradigms have been added to the lexc file. Unlike nouns, it is difficult to predict a verb's paradigm from its word pattern.

Verb paradigms are of the form

LEXICON V-TV-ARIYUKA

%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""

and

LEXICON V-IV-KALI

%<v%>%<iv%>: V-COMMON-KALI ; ! ""

Here IV stands for intransitive verb and TV for transitive verb.

The continuation paradigm V-COMMON-* is added to both, e.g. V-COMMON-ATIKKUKA:

LEXICON V-COMMON-ATIKKUKA
%<inf_k%>:%>kku k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>kkan‍ CLIT-CC ; ! "̔" !
%<perf%>:%>chchiru nnu  CLIT-ITG ; ! "̔" !
%<rem_perf%>:%>chchiṭṭu ṇṭ # ; ! "' !""
%<pres%>:%>kku nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>chchu  NEG-WHEN ; ! ""
%<fut%>:%>kku ' NEG-WHEN ; ! ""
%<pass%>:%>kkppe PASS-CONT ;
%<iter%>:%>chchu koṇṭi ITER-TENS; ! ""
%<iter%>%<cont%>:%>chchu koṇṭeyi ITER-TENS; ! ""
%<gpr_pres%>:%>kku nn GPR-PRES ; ! ""
%<gpr_past%>:%>chch GPR-PAST ; ! ""
%<hab%>:%>kkar‍ CLIT-COP-UNTU ; ! "ār"
%<imp%>:%>kk CONT_IMP; !""
%<pcpl%>:%>chch # ; ! ""
%<contpres%>:%>chchirikku nnu  # ;
%<prec%>:%>kk PREC-CONT ; ! "" ! ""
%<opt%>:%>kkṭṭe # ;
%<irre%>%<past%>:%>chchene # ;
%<cond%>:%>chchal‍  NEG-WHEN ; ! "ccāl‍"
%<monit%>%<fut%>:%>kku me # ;
%<satis%>%<fut%>:%>kku mllo #;
%<satis%>%<past%>:%>chchllo #;
%<satis%>%<pres%>:%>kku nnllo #;
%<oblig%>:%>kkṇ' # ;

It contains continuation lexicons such as CLIT-CC, CLIT-ITG and NEG-WHEN.

  • Passive verbs are added using the continuation lexicon PASS-CONT (%<pass%>:%>kkppe PASS-CONT ;)
  • The passive verb lexicon is defined as
LEXICON PASS-CONT
%<inf_k%>:%>ṭu k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>ṭan‍ CLIT-CC ; ! "̔" ! 
%<perf%>:%>ṭṭiru nnu  CLIT-ITG ; ! "̔" ! 
%<pres%>:%>ṭu nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>ṭṭu  NEG-WHEN ; ! ""
%<fut%>:%>ṭu ' NEG-WHEN ; ! ""
%<iter%>%<pres%>:%>ṭṭu koṇṭirikku nnu  NEG-WHEN ; ! ""
%<iter%>%<past%>:%>ṭṭu koṇṭiru nnu  NEG-WHEN ; ! ""
%<iter%>%<fut%>:%>ṭṭu koṇṭirikku ' NEG-WHEN ; ! ""
%<gpr_pres%>:%>ṭu nn GPR-PRES ; ! ""
%<gpr_past%>:%>ṭṭ GPR-PAST ; ! ""
%<imp%>:%>ṭ # ; !""
%<imp%>%<frml%>:%>ṭṇ' # ; !""
%<imp%>%<infml%>:%>ṭu  # ; !""
%<hab%>:%>ṭar‍ CLIT-COP-UNTU ; ! "ār"
%<pcpl%>:%>ṭṭ # ; ! ""
%<contpres%>:%>ṭṭikku nnu  # ; 
%<prec%>:%>ṭṇe # ; 
%<opt%>:%>ṭṭṭe # ; 
%<irre%>%<past%>:%>ṭṭene # ; 
%<cond%>:%>ṭṭal‍  NEG-WHEN ; ! "
%<monit%>%<fut%>:%>ṭu me # ;
%<satis%>%<fut%>:%>ṭu mllo #;
%<satis%>%<pres%>:%>ṭu nnllo #;
%<satis%>%<past%>:%>chchllo #;
%<oblig%>:%>ṭṇ' # ; 
%<itg%>:%>ṭu mo # ; 
  • Imperative mood is added using the continuation lexicon CONT_IMP ( %<imp%>:%>ക്ക CONT_IMP; !"" )
LEXICON CONT_IMP
 # ; !""
%<frml%>:%>ṇ' # ; !""
%<infml%>:%>u  # ; !""
  • Verbal adjectives are added using the continuation lexicon GPR-PRES
LEXICON GPR-PRES

%<subst%>:%>ത  N3-COMMON; 
# ;
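
How continuation lexicons chain together can be sketched in Python using the imperative entry of V-COMMON-ATIKKUKA and the CONT_IMP lexicon shown above. This is only an illustration of the idea, not the HFST implementation: the dictionary `LEXICONS` and the function `walk` are hypothetical, only one entry of V-COMMON-ATIKKUKA is included, and the `>` boundary is left in the lower side.

```python
# Each lexicon maps to a list of (upper tags, lower form, continuation).
# "#" is the end-of-word continuation, as in lexc.
LEXICONS = {
    "V-COMMON-ATIKKUKA": [
        ("<imp>", ">kk", "CONT_IMP"),
    ],
    "CONT_IMP": [
        ("", "", "#"),            # plain imperative, nothing added
        ("<frml>", ">ṇ'", "#"),   # formal imperative
        ("<infml>", ">u", "#"),   # informal imperative
    ],
}


def walk(lexicon, upper="", lower=""):
    """Yield every (upper, lower) pair reachable from the given lexicon."""
    if lexicon == "#":
        yield upper, lower
        return
    for tags, form, continuation in LEXICONS[lexicon]:
        yield from walk(continuation, upper + tags, lower + form)


if __name__ == "__main__":
    for upper, lower in walk("V-COMMON-ATIKKUKA"):
        print(upper, lower, sep="\t")
```

A single `%<imp%>` entry in the verb paradigm therefore yields three analyses, because CONT_IMP contributes the plain, formal and informal endings.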

Adjectives

Four adjective paradigms are included in Apertium:

  1. LEXICON A1
  2. LEXICON A2
  3. LEXICON A3
  4. LEXICON A4

Adverbs

Five adverb paradigms have been added:

  1. ADV
  2. ADV1
  3. ADV2
  4. ADV3
  5. ADV4

Postpositions

Malayalam Sandhi Rules Implementation

Refer to Malayalam_and_English/sandh_in_malayalam

BiLingual Dictionary

The bilingual dictionary maps source-language words to target-language words. See Bilingual_dictionary.

Example:


<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
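
To show how such entries are structured, the following Python sketch reads the three entries above with the standard library XML parser. It is only an illustration and not part of the Apertium toolchain: the `<section>` wrapper is added here so that the fragment is well-formed, the function names are hypothetical, and multiword lemmas are not handled.

```python
import xml.etree.ElementTree as ET

# The three example entries, wrapped in a section element so they parse as one document.
BIDIX_FRAGMENT = """<section id="main" type="standard">
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
</section>"""


def read_side(node):
    """Return (lemma, [tags]) for an <l> or <r> element."""
    lemma = node.text or ""
    tags = [s.get("n") for s in node.findall("s")]
    return lemma, tags


def read_entries(xml_text):
    """Return a list of ((left lemma, tags), (right lemma, tags)) pairs."""
    root = ET.fromstring(xml_text)
    entries = []
    for e in root.iter("e"):
        pair = e.find("p")
        if pair is None:
            continue  # skip entries without a <p> pair
        entries.append((read_side(pair.find("l")), read_side(pair.find("r"))))
    return entries


if __name__ == "__main__":
    for (left, ltags), (right, rtags) in read_entries(BIDIX_FRAGMENT):
        print(left, ltags, "->", right, rtags)
```

Each entry pairs a left-side (Malayalam) lemma with a right-side (English) lemma, and the `<s n="n"/>` element on both sides marks the word as a noun.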

Transfer Rules

Transfer rules are handled by the structural transfer module; see A_long_introduction_to_transfer_rules.