Difference between revisions of "Malayalam and English/documentation"

From Apertium
Revision as of 15:10, 16 August 2014

Malayalam is both an agglutinative and an inflectional language. It belongs to the Dravidian language family. In Apertium we are implementing the English-Malayalam pair using HFST, as described in Starting_a_new_language_with_HFST.

Morphotactics using lexc

Take a noun as an example. A Malayalam noun can inflect for 8 cases: nominative, accusative, dative, sociative, genitive, instrumental, locative and vocative. Nouns are also classified by number, singular or plural. First, declare the essential symbols:

Multichar_Symbols

%<n%>           ! Noun                        ! നാമം

%<nom%>         ! Nominative                  !

%<acc%>         ! Accusative                  !

%<dat%>         ! Dative                      !

%<soc%>         ! Sociative                   !

%<gen%>         ! Genitive                    !

%<ins%>         ! Instrumental                !

%<loc%>         ! Locative                    !

%<voc%>         ! Vocative                    !

%<sg%>          ! Singular                    !

%<pl%>          ! Plural                      !

Now that all the essential symbols for a noun are declared, add an example paradigm:

LEXICON Root

Miscellaneous ;
Conjunctions ; 
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;

Nouns

LEXICON N1 

%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil‍ CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
!plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ‍ CLIT-N-NOM ; ! ṅṅaḷ‍
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅaḷe
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe

And an example word:

LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
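
The way a stem entry combines with its paradigm can be sketched in Python. This is only an illustration, not the HFST implementation: it assumes the `%>` morpheme boundary is simply deleted, that no sandhi rules apply, and that the CLIT-N-* continuation lexicons add nothing. The names `N1` and `expand` are hypothetical helpers introduced for this sketch.

```python
# Each pair mirrors one line of LEXICON N1 above: (analysis tags, surface suffix).
# ">" stands for the lexc morpheme boundary written as %> in the source.
N1 = [
    ("<n><sg><nom>", "ṁ"),
    ("<n><sg><loc>", ">ttil"),
    ("<n><sg><acc>", ">tte"),
    ("<n><sg><gen>", ">ttinṟe"),
    ("<n><sg><dat>", ">ttin"),
    ("<n><sg><dat>", ">ttinu"),
    ("<n><pl><nom>", ">ṅṅaḷ"),
    ("<n><pl><acc>", ">ṅṅaḷe"),
    ("<n><pl><gen>", ">ṅṅaḷuṭe"),
]


def expand(lemma, stem, paradigm):
    """Return (analysis, surface) pairs for one stem entry.

    Assumption: the boundary marker is deleted and nothing else changes;
    the real pipeline may apply further phonological rules and clitics.
    """
    pairs = []
    for tags, suffix in paradigm:
        analysis = lemma + tags
        surface = (stem + suffix).replace(">", "")
        pairs.append((analysis, surface))
    return pairs


if __name__ == "__main__":
    # Corresponds to the entry "mēghaṁ:mēgha N1 ;"
    for analysis, surface in expand("mēghaṁ", "mēgha", N1):
        print(f"{analysis}\t{surface}")
```

The lemma (left of the colon in the stem entry) carries the analysis side, while the shortened stem (right of the colon) is what the suffixes attach to.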

Currently there are 10 noun paradigms: N1, N2, ..., N10.

General trends in paradigms

LEXICON N1 :- words ending with anusvara ( ം )

LEXICON N2 :- words ending with the vowel a or i

LEXICON N3 :- words ending with virama or a vowel

LEXICON N4 :- words ending with the vowel a or i

LEXICON N5 :- words ending with virama (e.g. വീട് )

LEXICON N6 :- for the word പേര്‍ (pēr‍)

LEXICON N7 :- words ending with the vowel u

LEXICON N8 :-

LEXICON N9 :-

LEXICON N10 :-
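
The trends above suggest a rough way to shortlist noun paradigms from the last character of a word. The following Python sketch is an assumption-laden illustration, not part of the Apertium tooling: the function name `candidate_paradigms` is hypothetical, it only inspects the final code point, it does not detect a word-final inherent vowel a, and it leaves N3 out of the vowel cases because the list does not say which vowels N3 covers.

```python
# Final-character tests on Malayalam script (Unicode code points).
ANUSVARA = "\u0d02"  # ം
VIRAMA = "\u0d4d"    # ്
SIGN_I = "\u0d3f"    # ി (vowel sign i)
SIGN_U = "\u0d41"    # ു (vowel sign u)


def candidate_paradigms(word):
    """Return the noun paradigms whose listed trend matches the word ending.

    Several paradigms share a description, so the result is a list of
    candidates rather than a single answer. An empty list means the
    ending is not covered by this sketch.
    """
    if not word:
        return []
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    if last == SIGN_I:
        return ["N2", "N4"]
    if last == SIGN_U:
        return ["N7"]
    return []


if __name__ == "__main__":
    for w in ["\u0d2e\u0d47\u0d18\u0d02", "\u0d35\u0d40\u0d1f\u0d4d"]:
        print(w, candidate_paradigms(w))
```

A human still has to choose among the candidates, since the trends overlap (N2 and N4, N3 and N5).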

Proper Nouns

  • LEXICON NP* is for proper nouns, which behave almost like common nouns
  • LEXICON NP*-COG is for second names
  • LEXICON NP-TOP-* represents place names

The place-name paradigms are:

  1. NP-TOP-KERALA :- place names ending with anusvara
  2. NP-TOP-INDIA :- place names ending with the vowel a or i
  3. NP-TOP-CALICUT :- place names ending with virama
  4. NP-TOP-KANNUR :- place names ending with chillu
  5. NP-TOP-MALABAR :- place names ending with chillu R ( ര്‍ )
  6. NP-TOP-JAPAN :- place names ending with chillu ന്‍
  7. NP-TOP-BRAZIL :- place names ending with chillu ല്‍

Pronouns

  • PRON-PERS-* represents personal pronouns

They are:

  1. PRON-PERS-NNAAN
  2. PRON-PERS-NII
  3. PRON-PERS-AVAN
  4. PRON-PERS-AVAL
  5. PRON-PERS-NNANNAL
  6. PRON-PERS-NAAM
  7. PRON-PERS-NINNAL
  8. PRON-PERS-AVAR
  9. PRON-PERS-ADDEHA
  • PRON-DEM is for demonstrative pronouns

They are:

  1. PRON-DEM-AT
  2. PRON-DEM-IT
  • PRON-IND is for indefinite pronouns

Numerals

NUM is used for numerals.

Verbs

%<quot%>
%<enum%>        ! Enumerative                 !

%<subst%>       ! Substantive                 !
%<attr%>        ! Attributive                 !

%<iv%>          ! Intransitive                ! 
%<tv%>          ! Transitive                  ! 

%<neg%>         ! Negative                    !

%<pres%>        ! Present tense               ! വര്‍ത്ത്മാന കാലം 
%<past%>        ! Past tense                  ! ഭൂത കാലം 
%<fut%>         ! Future tense                ! 
%<perf%>        ! Simple Perfect              !
%<rem_perf%>    ! Remote Perfect              !
%<contpres%>    ! Contemporaneous perfect     !
%<perm%>        ! Permissive mood             !
%<imp%>         ! Imperative mood             !
%<hab%>         ! Habitual aspect             ! 

%<prec%>        ! Precative mood              !
%<opt%>         ! Optative mood               !
%<irre%>        ! Irrealis mood               !
%<satis%>       ! Satisfactive mood           !
%<monit%>       ! Monitory mood               !

%<frml%>        ! Formal                      !
%<infml%>       ! Informal                    !

%<inf_k%>       ! Infinitive                  !
%<inf_n%>       ! Purposive infinitive        !
%<oblig%>       ! Obligative                  ! 
%<simul%>       ! Simultaneous                !

%<iter%>        ! Iterative                   !
%<cond%>        ! Conditional mood            !

%<gpr_pres%>    ! 
%<gpr_past%>    !

Currently more than 25 verb paradigms have been added to lexc. Unlike nouns, it is difficult to predict a verb's paradigm from its word pattern.

Verb paradigms are of the form

LEXICON V-TV-ARIYUKA

%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""

and

LEXICON V-IV-KALI

%<v%>%<iv%>: V-COMMON-KALI ; ! ""

Here IV stands for intransitive verb and TV for transitive verb.

A continuation paradigm V-COMMON-* is attached to both, e.g. V-COMMON-ATIKKUKA:

LEXICON V-COMMON-ATIKKUKA
%<inf_k%>:%>kku k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>kkan‍ CLIT-CC ; ! "̔" !
%<perf%>:%>chchiru nnu  CLIT-ITG ; ! "̔" !
%<rem_perf%>:%>chchiṭṭu ṇṭ # ; ! "' !""
%<pres%>:%>kku nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>chchu  NEG-WHEN ; ! ""
%<fut%>:%>kku ' NEG-WHEN ; ! ""
%<pass%>:%>kkppe PASS-CONT ;
%<iter%>:%>chchu koṇṭi ITER-TENS; ! ""
%<iter%>%<cont%>:%>chchu koṇṭeyi ITER-TENS; ! ""
%<gpr_pres%>:%>kku nn GPR-PRES ; ! ""
%<gpr_past%>:%>chch GPR-PAST ; ! ""
%<hab%>:%>kkar‍ CLIT-COP-UNTU ; ! "ār"
%<imp%>:%>kk CONT_IMP; !""
%<pcpl%>:%>chch # ; ! ""
%<contpres%>:%>chchirikku nnu  # ;
%<prec%>:%>kk PREC-CONT ; ! "" ! ""
%<opt%>:%>kkṭṭe # ;
%<irre%>%<past%>:%>chchene # ;
%<cond%>:%>chchal‍  NEG-WHEN ; ! "ccāl‍"
%<monit%>%<fut%>:%>kku me # ;
%<satis%>%<fut%>:%>kku mllo #;
%<satis%>%<past%>:%>chchllo #;
%<satis%>%<pres%>:%>kku nnllo #;
%<oblig%>:%>kkṇ' # ;

It uses continuation lexicons such as CLIT-CC, CLIT-ITG and NEG-WHEN.

  • Passive verbs are added using the continuation lexicon PASS-CONT (%<pass%>:%>kkppe PASS-CONT ;)
  • The passive verb lexicon is defined as:
LEXICON PASS-CONT
%<inf_k%>:%>ṭu k CLIT-CC ; ! "" ! FIXME
%<inf_n%>:%>ṭan‍ CLIT-CC ; ! "̔" ! 
%<perf%>:%>ṭṭiru nnu  CLIT-ITG ; ! "̔" ! 
%<pres%>:%>ṭu nnu  NEG-WHEN ; ! "̔̔"
%<past%>:%>ṭṭu  NEG-WHEN ; ! ""
%<fut%>:%>ṭu ' NEG-WHEN ; ! ""
%<iter%>%<pres%>:%>ṭṭu koṇṭirikku nnu  NEG-WHEN ; ! ""
%<iter%>%<past%>:%>ṭṭu koṇṭiru nnu  NEG-WHEN ; ! ""
%<iter%>%<fut%>:%>ṭṭu koṇṭirikku ' NEG-WHEN ; ! ""
%<gpr_pres%>:%>ṭu nn GPR-PRES ; ! ""
%<gpr_past%>:%>ṭṭ GPR-PAST ; ! ""
%<imp%>:%>ṭ # ; !""
%<imp%>%<frml%>:%>ṭṇ' # ; !""
%<imp%>%<infml%>:%>ṭu  # ; !""
%<hab%>:%>ṭar‍ CLIT-COP-UNTU ; ! "ār"
%<pcpl%>:%>ṭṭ # ; ! ""
%<contpres%>:%>ṭṭikku nnu  # ; 
%<prec%>:%>ṭṇe # ; 
%<opt%>:%>ṭṭṭe # ; 
%<irre%>%<past%>:%>ṭṭene # ; 
%<cond%>:%>ṭṭal‍  NEG-WHEN ; ! "
%<monit%>%<fut%>:%>ṭu me # ;
%<satis%>%<fut%>:%>ṭu mllo #;
%<satis%>%<pres%>:%>ṭu nnllo #;
%<satis%>%<past%>:%>chchllo #;
%<oblig%>:%>ṭṇ' # ; 
%<itg%>:%>ṭu mo # ; 
  • Imperative mood is added using the continuation lexicon CONT_IMP ( %<imp%>:%>ക്ക CONT_IMP; !"" )
LEXICON CONT_IMP
 # ; !""
%<frml%>:%>ṇ' # ; !""
%<infml%>:%>u  # ; !""
  • Verbal adjectives are added using the continuation lexicon GPR-PRES
LEXICON GPR-PRES

%<subst%>:%>ത  N3-COMMON; 
# ;

Adjectives

Four adjective paradigms are included in Apertium:

  1. LEXICON A1
  2. LEXICON A2
  3. LEXICON A3
  4. LEXICON A4

Adverbs

Five adverb paradigms have been added:

  1. ADV
  2. ADV1
  3. ADV2
  4. ADV3
  5. ADV4

Postpositions

Malayalam Sandhi Rules Implementation

Bilingual Dictionary

See Bilingual_dictionary (http://wiki.apertium.org/wiki/Bilingual_dictionary).

Example:


<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
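
To check entries like these, the left and right sides can be read with Python's standard XML parser. This is a minimal sketch under stated assumptions: the three entries are wrapped in a bare `<section>` element so that they form one well-formed document (a real bilingual dictionary has more surrounding structure), and the helper names `side` and `read_pairs` are hypothetical. Elements inside the lemma other than `<s>` are not handled.

```python
import xml.etree.ElementTree as ET

# The three example entries, wrapped in a single root element (assumption).
ENTRIES = """
<section>
<e><p><l>അഭിസംബോധന<s n="n"/></l><r>address<s n="n"/></r></p></e>
<e><p><l>കമ്യൂണിസ്റ്റ്<s n="n"/></l><r>communist<s n="n"/></r></p></e>
<e><p><l>ഗോത്രം<s n="n"/></l><r>caste<s n="n"/></r></p></e>
</section>
"""


def side(elem):
    """Return (lemma, tags) for an <l> or <r> element."""
    lemma = elem.text or ""
    tags = [s.get("n") for s in elem.findall("s")]
    return lemma, tags


def read_pairs(xml_text):
    """Return a list of ((left lemma, tags), (right lemma, tags)) tuples."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for entry in root.iter("e"):
        p = entry.find("p")
        if p is None:
            continue  # skip entries without a <p> pair
        pairs.append((side(p.find("l")), side(p.find("r"))))
    return pairs


if __name__ == "__main__":
    for (left, ltags), (right, rtags) in read_pairs(ENTRIES):
        print(left, ltags, "->", right, rtags)
```

Each pair lists the Malayalam lemma with its tags on the left and the English lemma with its tags on the right, which makes mismatched tags between the two sides easy to spot.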

Transfer Rules