Difference between revisions of "Malayalam and English/documentation"
Revision as of 20:34, 15 August 2014
Malayalam is both an agglutinative and an inflectional language. It belongs to the Dravidian language family. In Apertium we are trying to implement the English-Malayalam pair using HFST. The approach is described in Starting_a_new_language_with_HFST.
Morphotactics using lexc
Let's take a noun as an example. A Malayalam noun can take 8 case inflections: nominative, accusative, dative, sociative, genitive, instrumental, locative and vocative. Nouns are also classified by number, singular or plural. Let's declare the essential symbols:
Multichar_Symbols
%<n%> ! Noun ! നാമം
%<nom%> ! Nominative !
%<acc%> ! Accusative !
%<dat%> ! Dative !
%<soc%> ! Sociative !
%<gen%> ! Genitive !
%<ins%> ! Instrumental !
%<loc%> ! Locative !
%<voc%> ! Vocative !
%<sg%> ! Singular !
%<pl%> ! Plural !
Now that we have all the essential symbols for a noun, let's add an example paradigm. First the root lexicon:
LEXICON Root
Miscellaneous ;
Conjunctions ;
Postpositions ;
Pronouns ;
Determiners ;
Numerals ;
NominalStems ;
Nouns
LEXICON N1
%<n%>%<sg%>%<nom%>:ṁ CLIT-N-NOM ; ! ṁ
%<n%>%<sg%>%<loc%>:%>ttil CLIT-N-LOC ; ! ttil
%<n%>%<sg%>%<acc%>:%>tte CLIT-N-ACC ; ! tte
%<n%>%<sg%>%<gen%>:%>ttinṟe CLIT-N-GEN ; ! ttinṟe
%<n%>%<sg%>%<dat%>:%>ttin CLIT-N ; ! ttin
%<n%>%<sg%>%<dat%>:%>ttinu CLIT-N ; ! ttinu ! debug
!plural
%<n%>%<pl%>%<nom%>:%>ṅṅaḷ CLIT-N-NOM ; ! ṅṅaḷ
%<n%>%<pl%>%<acc%>:%>ṅṅaḷe CLIT-N-ACC ; ! ṅṅale
%<n%>%<pl%>%<gen%>:%>ṅṅaḷuṭe CLIT-N-GEN ; ! ṅṅaḷuṭe
And an example word:
LEXICON NominalStems
mēghaṁ:mēgha N1 ; ! mēghaṁ ! cloud
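To make the effect of the stem entry and the N1 paradigm concrete, here is a minimal Python sketch of the pairing they describe. It is only an illustration, not how HFST compiles lexc: the function name `expand` and the list `N1` are made up for this example, the CLIT-N-* continuations are ignored, and the `>` morpheme boundary is left in the surface string exactly as the paradigm writes it.

```python
# Illustrative sketch only: mimics how a stem entry plus the N1 paradigm
# pair a tagged (lexical) string with a surface string.
# The CLIT-N-* continuation lexicons are deliberately ignored here.

# (tags, suffix) pairs copied from the LEXICON N1 example.
# The lexc escape %> is written as a plain ">" morpheme boundary.
N1 = [
    ("<n><sg><nom>", "ṁ"),
    ("<n><sg><loc>", ">ttil"),
    ("<n><sg><acc>", ">tte"),
    ("<n><sg><gen>", ">ttinṟe"),
    ("<n><sg><dat>", ">ttin"),
    ("<n><sg><dat>", ">ttinu"),
    ("<n><pl><nom>", ">ṅṅaḷ"),
    ("<n><pl><acc>", ">ṅṅaḷe"),
    ("<n><pl><gen>", ">ṅṅaḷuṭe"),
]


def expand(lemma, stem, paradigm):
    """Return (lexical, surface) pairs for one 'lemma:stem PARADIGM ;' entry."""
    return [(lemma + tags, stem + suffix) for tags, suffix in paradigm]


# Corresponds to the entry "mēghaṁ:mēgha N1 ;"
forms = expand("mēghaṁ", "mēgha", N1)
for lexical, surface in forms:
    print(lexical, "->", surface)
```

Each printed line shows one analysis on the left and the string the paradigm builds for it on the right; any later rule that resolves the `>` boundary is outside this sketch.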
Currently there are 10 noun paradigms: N1, N2, ..., N10.
General trends in paradigms
LEXICON N1 :- words ending with anusvara ( ം )
LEXICON N2 :- words ending with the vowel a or i
LEXICON N3 :- words ending with virama or a vowel
LEXICON N4 :- words ending with the vowel a or i
LEXICON N5 :- words ending with virama (e.g. വീട് )
LEXICON N6 :- for the word പേര് (pēr)
LEXICON N7 :- words ending with the vowel u
LEXICON N8 :-
LEXICON N9 :-
LEXICON N10 :-
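The trends above can be turned into a rough paradigm suggester. The sketch below is hypothetical and not part of the language pair: the function `candidate_paradigms` and its constants are invented for illustration, it works on Malayalam script rather than the transliteration used in the lexc file, it assumes "a vowel" under N3 means any final vowel, and it treats a bare final consonant as ending in the inherent vowel a. Because several paradigms share a description, it returns a list of candidates rather than a single answer, and it says nothing about N8 to N10.

```python
# Hypothetical helper: suggest candidate noun paradigms from the last
# character of a word written in Malayalam script.
ANUSVARA = "\u0d02"      # anusvara sign
VIRAMA = "\u0d4d"        # virama (chandrakkala) sign
VOWEL_SIGN_I = "\u0d3f"  # dependent vowel sign i
VOWEL_SIGN_U = "\u0d41"  # dependent vowel sign u
N6_WORD = "പേര്"          # the single word N6 is described for


def is_consonant(ch):
    """True for Malayalam consonant letters (U+0D15 ka .. U+0D39 ha)."""
    return "\u0d15" <= ch <= "\u0d39"


def candidate_paradigms(word):
    """Return the noun paradigms whose listed trend matches the word ending."""
    if not word:
        return []
    if word == N6_WORD:
        return ["N6"]
    last = word[-1]
    if last == ANUSVARA:
        return ["N1"]
    if last == VIRAMA:
        return ["N3", "N5"]
    # A final bare consonant carries the inherent vowel "a" (assumption).
    if last == VOWEL_SIGN_I or is_consonant(last):
        return ["N2", "N3", "N4"]
    if last == VOWEL_SIGN_U:
        return ["N3", "N7"]
    return []


print(candidate_paradigms("\u0d2e\u0d47\u0d18\u0d02"))  # the word for "cloud"
print(candidate_paradigms("\u0d35\u0d40\u0d1f\u0d4d"))  # the N5 example word
```

A human still has to pick among the candidates, since the trends overlap (N2 and N4 carry the same description).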
Proper Nouns
- LEXICON NP* is for proper nouns; proper nouns behave almost the same as common nouns
- LEXICON NP*-COG is for second names (surnames)
- LEXICON NP-TOP-* represents place names
They are:
- NP-TOP-KERALA :- place names ending with anusvara
- NP-TOP-INDIA :- place names ending with the vowel a or i
- NP-TOP-CALICUT :- place names ending with virama
- NP-TOP-KANNUR :- place names ending with chillu
- NP-TOP-MALABAR :- place names ending with chillu R ( ര് )
- NP-TOP-JAPAN :- place names ending with chillu ന്
- NP-TOP-BRAZIL :- place names ending with chillu ല്
Verbs
Pronouns
- PRON-PERS-* represents personal pronouns
They are:
- PRON-PERS-NNAAN :-
- PRON-PERS-NII :-
- PRON-PERS-AVAN :-
- PRON-PERS-AVAL :-
- PRON-PERS-NNANNAL
- PRON-PERS-NAAM
- PRON-PERS-NINNAL
- PRON-PERS-AVAR
- PRON-PERS-ADDEHA
- PRON-DEM is for demonstrative pronouns
They are:
- PRON-DEM-AT
- PRON-DEM-IT
- PRON-IND is for indefinite pronouns
Numbers
NUM is for numerals.
Verbs
Currently 25+ verb paradigms have been added to the lexc file. Unlike with nouns, it is difficult to predict a verb's paradigm from the shape of the word.
Verb paradigms are of the form
LEXICON V-TV-ARIYUKA
%<v%>%<tv%>: V-COMMON-ARIYUKA ; ! ""
and
LEXICON V-IV-KALI
%<v%>%<iv%>: V-COMMON-KALI ; ! ""
Here IV stands for intransitive verb and TV for transitive verb.
In both cases the entry continues to a shared V-COMMON-* paradigm.
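Since every verb lexicon follows the same two-line pattern, the stanza can be generated from the paradigm name and the transitivity tag. The Python sketch below is only a convenience illustration under that assumption; `verb_lexicon` is a hypothetical helper and not part of Apertium or HFST.

```python
def verb_lexicon(paradigm, transitivity):
    """Build the two-line lexc stanza for one verb paradigm.

    paradigm:     paradigm name as used in the lexc file, e.g. "ARIYUKA"
    transitivity: "tv" (transitive) or "iv" (intransitive)
    """
    if transitivity not in ("tv", "iv"):
        raise ValueError("transitivity must be 'tv' or 'iv'")
    name = f"V-{transitivity.upper()}-{paradigm}"
    # The entry adds the verb and transitivity tags, contributes no surface
    # material, and continues to the shared V-COMMON-* paradigm.
    entry = f'%<v%>%<{transitivity}%>: V-COMMON-{paradigm} ; ! ""'
    return f"LEXICON {name}\n{entry}"


print(verb_lexicon("ARIYUKA", "tv"))
print(verb_lexicon("KALI", "iv"))
```

The two calls reproduce the V-TV-ARIYUKA and V-IV-KALI stanzas shown above.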