Malayalam and English/documentation

Malayalam is both an agglutinative and an inflectional language, and it belongs to the Dravidian language family. In Apertium, the English-Malayalam pair is being implemented using HFST, as described in Starting_a_new_language_with_HFST.

Morphotactics using lexc

Take a noun as an example. A Malayalam noun can take eight case inflections: nominative, accusative, dative, sociative, genitive, instrumental, locative and vocative. Nouns are also classified by number, singular or plural. First, declare the essential symbols:

Multichar_Symbols

%<n%>           ! Noun                        ! നാമം

%<nom%>         ! Nominative                  !

%<acc%>         ! Accusative                  !

%<dat%>         ! Dative                      !

%<soc%>         ! Sociative                   !

%<gen%>         ! Genitive                    !

%<ins%>         ! Instrumental                !

%<loc%>         ! Locative                    !

%<voc%>         ! Vocative                    !

%<sg%>          ! Singular                    !

%<pl%>          ! Plural                      !

Now that all the essential symbols for a noun are declared, add an example paradigm:

LEXICON Root

Miscellaneous ;
Conjunctions ; 
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;

Nouns

LEXICON N1 

%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil‍ CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
! plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ‍ CLIT-N-NOM ; ! ṅṅaḷ‍
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅaḷe
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe

An example word that uses this paradigm:

LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
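
To see what the N1 paradigm produces for this stem, the expansion can be sketched in a few lines of Python. This is only an illustration, not part of the Apertium toolchain: the suffix table is copied from the LEXICON N1 entries above, the function name `expand` is hypothetical, and the sketch ignores the %> morpheme boundary and the CLIT-N-* continuation lexicons.

```python
# Suffixes copied from the LEXICON N1 entries above. The %> boundary marker
# and the CLIT-N-* continuation lexicons are deliberately left out.
N1_SUFFIXES = {
    ("sg", "nom"): ["ṁ"],
    ("sg", "loc"): ["ttil"],
    ("sg", "acc"): ["tte"],
    ("sg", "gen"): ["ttinṟe"],
    ("sg", "dat"): ["ttin", "ttinu"],
    ("pl", "nom"): ["ṅṅaḷ"],
    ("pl", "acc"): ["ṅṅaḷe"],
    ("pl", "gen"): ["ṅṅaḷuṭe"],
}


def expand(lemma, stem, suffixes):
    """Return (analysis, surface) pairs for one stem and one paradigm."""
    pairs = []
    for (number, case), endings in suffixes.items():
        analysis = f"{lemma}<n><{number}><{case}>"
        for ending in endings:
            pairs.append((analysis, stem + ending))
    return pairs


# The lemma and stem come from the NominalStems entry above.
forms = expand("mēghaṁ", "mēgha", N1_SUFFIXES)
for analysis, surface in forms:
    print(f"{analysis}\t{surface}")
```

Each printed line pairs an analysis string (lemma plus tags) with the surface form built from the stem and one suffix, which is roughly the mapping the compiled transducer encodes before any phonological rules apply.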

Currently there are 10 noun paradigms: N1, N2, ..., N10.

General trends in paradigms

LEXICON N1 :- words ending with anusvara ( ം )

LEXICON N2 :- words ending with the vowel a or i

LEXICON N3 :- words ending with virama or a vowel

LEXICON N4 :- words ending with the vowel a or i

LEXICON N5 :- words ending with virama (e.g. വീട് )

LEXICON N6 :- for the word പേര്‍ (pēr‍)

LEXICON N7 :- words ending with the vowel u

LEXICON N8 :-

LEXICON N9 :-

LEXICON N10 :-
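
Because these trends depend only on the last character of the word, a first guess at the paradigm can be automated. The sketch below is a hypothetical helper, not something that exists in Apertium. It assumes words are given in Malayalam script, it reads "a vowel" for N3 as any final vowel, and it returns several candidates wherever the list above overlaps. N6 and N8 to N10 are not covered, so a human still has to make the final choice.

```python
ANUSVARA = "\u0d02"      # ം
VIRAMA = "\u0d4d"        # ്
VOWEL_SIGN_I = "\u0d3f"  # ി
VOWEL_SIGN_U = "\u0d41"  # ു


def ends_in_inherent_a(word):
    """A bare final consonant letter (U+0D15..U+0D39) carries the inherent vowel a."""
    return "\u0d15" <= word[-1] <= "\u0d39"


def candidate_paradigms(word):
    """Guess noun paradigms from the final character (assumption-laden heuristic)."""
    if not word:
        return []
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    if last == VOWEL_SIGN_U:
        return ["N3", "N7"]
    if last == VOWEL_SIGN_I or ends_in_inherent_a(word):
        return ["N2", "N3", "N4"]
    if "\u0d3e" <= last <= "\u0d4c":
        # Any other dependent vowel sign: only the broad N3 description applies.
        return ["N3"]
    return []


for noun in ["മേഘം", "വീട്"]:
    print(noun, candidate_paradigms(noun))
```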

Proper Nouns

  • LEXICON NP* is for proper nouns; proper nouns behave almost the same as common nouns
  • LEXICON NP*-COG is for second names (surnames)
  • LEXICON NP-TOP-* is for place names

The place-name paradigms are:

  1. NP-TOP-KERALA :- place names ending with anusvara
  2. NP-TOP-INDIA :- place names ending with the vowel a or i
  3. NP-TOP-CALICUT :- place names ending with virama
  4. NP-TOP-KANNUR :- place names ending with chillu
  5. NP-TOP-MALABAR :- place names ending with chillu ര്‍
  6. NP-TOP-JAPAN :- place names ending with chillu ന്‍
  7. NP-TOP-BRAZIL :- place names ending with chillu ല്‍

Pronouns

  • PRON-PERS-* is for personal pronouns

The personal pronoun paradigms are:

  1. PRON-PERS-NNAAN
  2. PRON-PERS-NII
  3. PRON-PERS-AVAN
  4. PRON-PERS-AVAL
  5. PRON-PERS-NNANNAL
  6. PRON-PERS-NAAM
  7. PRON-PERS-NINNAL
  8. PRON-PERS-AVAR
  9. PRON-PERS-ADDEHA
  • PRON-DEM is for demonstrative pronouns

The demonstrative pronoun paradigms are:

  1. PRON-DEM-AT
  2. PRON-DEM-IT
  • PRON-IND is for indefinite pronouns

Numerals

NUM is the paradigm for numerals.

Verbs

The following multichar symbols are declared for verb forms:

%<quot%>
%<enum%>        ! Enumerative                 !

%<subst%>       ! Substantive                 !
%<attr%>        ! Attributive                 !

%<iv%>          ! Intransitive                ! 
%<tv%>          ! Transitive                  ! 

%<neg%>         ! Negative                    !

%<pres%>        ! Present tense               ! വര്‍ത്ത്മാന കാലം 
%<past%>        ! Past tense                  ! ഭൂത കാലം 
%<fut%>         ! Future tense                ! 
%<perf%>        ! Simple Perfect              !
%<rem_perf%>    ! Remote Perfect              !
%<contpres%>    ! Contemporaneous perfect     !
%<perm%>        ! Permissive mood             !
%<imp%>         ! Imperative mood             !
%<hab%>         ! Habitual aspect             ! 

%<prec%>        ! Precative mood              !
%<opt%>         ! Optative mood               !
%<irre%>        ! Irrealis mood               !
%<satis%>       ! Satisfactive mood           !
%<monit%>       ! Monitory mood               !

%<frml%>        ! Formal                      !
%<infml%>       ! Informal                    !

%<inf_k%>       ! Infinitive                  !
%<inf_n%>       ! Purposive infinitive        !
%<oblig%>       ! Obligative                  ! 
%<simul%>       ! Simultaneous                !

%<iter%>        ! Iterative                   !
%<cond%>        ! Conditional mood            !

%<gpr_pres%>    ! 
%<gpr_past%>    !

Currently more than 25 verb paradigms have been added to the lexc file. Unlike nouns, it is difficult to predict a verb's paradigm from the shape of the word.

Verb paradigms are of the form

LEXICON V-TV-ARIYUKA

%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""

and

LEXICON V-IV-KALI

%<v%>%<iv%>: V-COMMON-KALI ; ! ""

Here IV stands for intransitive verb and TV for transitive verb.

Both kinds of paradigm continue to a shared V-COMMON-* lexicon, for example V-COMMON-ATIKKUKA:

LEXICON V-COMMON-ATIKKUKA
%<inf_k%>:%>kku k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>kkan‍ CLIT-CC ; ! "̔" !
%<perf%>:%>chchiru nnu  CLIT-ITG ; ! "̔" !
%<rem_perf%>:%>chchiṭṭu ṇṭ # ; ! "' !""
%<pres%>:%>kku nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>chchu  NEG-WHEN ; ! ""
%<fut%>:%>kku ' NEG-WHEN ; ! ""
%<pass%>:%>kkppe PASS-CONT ;
%<iter%>:%>chchu koṇṭi ITER-TENS; ! ""
%<iter%>%<cont%>:%>chchu koṇṭeyi ITER-TENS; ! ""
%<gpr_pres%>:%>kku nn GPR-PRES ; ! ""
%<gpr_past%>:%>chch GPR-PAST ; ! ""
%<hab%>:%>kkar‍ CLIT-COP-UNTU ; ! "ār"
%<imp%>:%>kk CONT_IMP; !""
%<pcpl%>:%>chch # ; ! ""
%<contpres%>:%>chchirikku nnu  # ;
%<prec%>:%>kk PREC-CONT ; ! "" ! ""
%<opt%>:%>kkṭṭe # ;
%<irre%>%<past%>:%>chchene # ;
%<cond%>:%>chchal‍  NEG-WHEN ; ! "ccāl‍"
%<monit%>%<fut%>:%>kku me # ;
%<satis%>%<fut%>:%>kku mllo #;
%<satis%>%<past%>:%>chchllo #;
%<satis%>%<pres%>:%>kku nnllo #;
%<oblig%>:%>kkṇ' # ;

Its entries point to continuation lexicons such as CLIT-CC, CLIT-ITG and NEG-WHEN.

  • Passive verbs are added using the continuation lexicon PASS-CONT (%<pass%>:%>kkppe PASS-CONT ;)
  • The passive verb lexicon is defined as
LEXICON PASS-CONT
%<inf_k%>:%>ṭu k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>ṭan‍ CLIT-CC ; ! "̔" ! 
%<perf%>:%>ṭṭiru nnu  CLIT-ITG ; ! "̔" ! 
%<pres%>:%>ṭu nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>ṭṭu  NEG-WHEN ; ! ""
%<fut%>:%>ṭu ' NEG-WHEN ; ! ""
%<iter%>%<pres%>:%>ṭṭu koṇṭirikku nnu  NEG-WHEN ; ! ""
%<iter%>%<past%>:%>ṭṭu koṇṭiru nnu  NEG-WHEN ; ! ""
%<iter%>%<fut%>:%>ṭṭu koṇṭirikku ' NEG-WHEN ; ! ""
%<gpr_pres%>:%>ṭu nn GPR-PRES ; ! ""
%<gpr_past%>:%>ṭṭ GPR-PAST ; ! ""
%<imp%>:%>ṭ # ; !""
%<imp%>%<frml%>:%>ṭṇ' # ; !""
%<imp%>%<infml%>:%>ṭu  # ; !""
%<hab%>:%>ṭar‍ CLIT-COP-UNTU ; ! "ār"
%<pcpl%>:%>ṭṭ # ; ! ""
%<contpres%>:%>ṭṭikku nnu  # ; 
%<prec%>:%>ṭṇe # ; 
%<opt%>:%>ṭṭṭe # ; 
%<irre%>%<past%>:%>ṭṭene # ; 
%<cond%>:%>ṭṭal‍  NEG-WHEN ; ! "
%<monit%>%<fut%>:%>ṭu me # ;
%<satis%>%<fut%>:%>ṭu mllo #;
%<satis%>%<pres%>:%>ṭu nnllo #;
%<satis%>%<past%>:%>chchllo #;
%<oblig%>:%>ṭṇ' # ; 
%<itg%>:%>ṭu mo # ; 
  • The imperative mood is added using the continuation lexicon CONT_IMP (%<imp%>:%>ക്ക CONT_IMP ; ! "")
LEXICON CONT_IMP
 # ; !""
%<frml%>:%>ṇ' # ; !""
%<infml%>:%>u  # ; !""
  • Verbal adjectives are added using the continuation lexicon GPR-PRES
LEXICON GPR-PRES

%<subst%>:%>ത  N3-COMMON; 
# ;
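
The way verb entries chain through continuation lexicons until they reach # can be illustrated with a small Python model. This is a sketch under stated assumptions and not how HFST compiles lexc: the lexicon name V-TV-ATIKKUKA, the lemma and the stem are hypothetical, only a subset of the V-COMMON-ATIKKUKA and CONT_IMP entries shown above is included, and ">" stands in for the %> morpheme boundary.

```python
# Each entry is (upper-side tags, lower-side suffix, continuation lexicon).
LEXICONS = {
    # Hypothetical entry lexicon, modelled on V-TV-ARIYUKA above.
    "V-TV-ATIKKUKA": [("<v><tv>", "", "V-COMMON-ATIKKUKA")],
    # Subset of the V-COMMON-ATIKKUKA entries shown above.
    "V-COMMON-ATIKKUKA": [
        ("<imp>", ">kk", "CONT_IMP"),
        ("<pcpl>", ">chch", "#"),
    ],
    # Subset of CONT_IMP: the bare imperative and the informal form.
    "CONT_IMP": [
        ("", "", "#"),
        ("<infml>", ">u", "#"),
    ],
}


def walk(lexicon, upper="", lower=""):
    """Yield every (analysis, surface) pair reachable from `lexicon`."""
    if lexicon == "#":
        # "#" marks the end of a word, so the accumulated pair is complete.
        yield upper, lower
        return
    for tags, suffix, continuation in LEXICONS[lexicon]:
        yield from walk(continuation, upper + tags, lower + suffix)


# Hypothetical lemma and stem, used only to make the output readable.
pairs = list(walk("V-TV-ATIKKUKA", "aṭikkuka", "aṭi"))
for analysis, surface in pairs:
    print(f"{analysis}\t{surface}")
```

Every path from the entry lexicon to # contributes one analysis and one surface string, which is why adding a single entry to a shared continuation lexicon such as CONT_IMP extends every verb paradigm that points to it.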

Adjectives

Four adjective paradigms are included in Apertium:

  1. LEXICON A1
  2. LEXICON A2
  3. LEXICON A3
  4. LEXICON A4

Adverbs

Five adverb paradigms have been added:

  1. ADV
  2. ADV1
  3. ADV2
  4. ADV3
  5. ADV4

Postpositions

Malayalam Sandhi Rules Implementation

Bilingual Dictionary

The bilingual dictionary maps each source-language word to its target-language equivalent. See Bilingual_dictionary for details.

Example:

<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
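
To check entries like these programmatically, they can be read with Python's standard xml.etree.ElementTree module. The sketch below is only illustrative: the <section> wrapper is added here so the three entries form one well-formed document, the helper names `side` and `read_pairs` are hypothetical, and multiword lemmas or entries without a <p> pair are not handled.

```python
import xml.etree.ElementTree as ET

# The three entries from above, wrapped in a single root element.
SAMPLE = """<section>
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
</section>"""


def side(element):
    """Return (lemma, [tags]) for an <l> or <r> element."""
    lemma = element.text or ""
    tags = [s.get("n") for s in element.findall("s")]
    return lemma, tags


def read_pairs(xml_text):
    """Return a list of ((left lemma, tags), (right lemma, tags)) tuples."""
    root = ET.fromstring(xml_text)
    return [(side(p.find("l")), side(p.find("r"))) for p in root.iter("p")]


pairs = read_pairs(SAMPLE)
for (src, src_tags), (tgt, tgt_tags) in pairs:
    print(src, src_tags, "->", tgt, tgt_tags)
```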

Transfer Rules