Thaana romanisation

From Apertium
Revision as of 10:13, 8 March 2009 by Francis Tyers (talk | contribs)

We currently use a romanised form of the Thaana letters instead of the actual Unicode Thaana characters. This makes things a lot easier for us. The romanised output of the English to Dhivehi translation can be converted to Unicode by a simple one-to-one mapping of characters. This mapping is as follows:

h	<char-0x0780> "letter haa
S	<char-0x0781> "shaviani
n	<char-0x0782> "noonu
r	<char-0x0783> "raa
b	<char-0x0784> "baa
L	<char-0x0785> "lhaviani
k	<char-0x0786> "kaafu
w	<char-0x0787> "alifu  
v	<char-0x0788> "vaavu
m	<char-0x0789> "meemu
f	<char-0x078A> "faafu
d	<char-0x078B> "dhaalu
t	<char-0x078C> "thaa
l	<char-0x078D> "laamu
g	<char-0x078E> "gaafu
N	<char-0x078F> "gnaviani
s	<char-0x0790> "seenu
D	<char-0x0791> "daviani
z	<char-0x0792> "zaviani
T	<char-0x0793> "taviani
y	<char-0x0794> "yaa
p	<char-0x0795> "paviani
j	<char-0x0796> "javiani
c	<char-0x0797> "chaviani

"THAANA DOTTED LETTERS (used in arabic words)
X	<char-0x0798> "TTAA   (thaa mathee thinthiki)
H	<char-0x0799> "HHAA   (haa thiree ehthiki)
K	<char-0x079A> "KHAA   (haa mathee ehthiki)
J	<char-0x079B> "THAALU (dhaa mathee ehthiki)
R	<char-0x079C> "ZAA    (raa mathee ehthiki)
C	<char-0x079D> "SHEENU (seenu mathee thinthiki)
M	<char-0x079E> "SAADHU (seenu thiree ehthiki)
B	<char-0x079F> "DHAADHU(seenu mathee ehthiki)
Y	<char-0x07A0> "TO     (thaa thiree ehthiki)
Z	<char-0x07A1> "ZO     (thaa mathee ehthiki)
W 	<char-0x07A2> "AINU   (alifu thiree ehthiki)
G	<char-0x07A3> "GHAINU (alifu mathee ehthiki)
Q	<char-0x07A4> "QAAFU  (gaafu mathee dhethiki)
V	<char-0x07A5> "VAAVU  (vaavu mathee ehthiki)

"THAANA FILI (combining characters)
a	<char-0x07A6> "abafili
A	<char-0x07A7> "aabaafili
i	<char-0x07A8> "ibifili
I	<char-0x07A9> "eebeefili
u	<char-0x07AA> "ubufili
U	<char-0x07AB> "ooboofili
e	<char-0x07AC> "ebefili
E	<char-0x07AD> "ebeyfili
o	<char-0x07AE> "obofili
O	<char-0x07AF> "oaboafili
q	<char-0x07B0> "sukun
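
Because the code points in the table run contiguously from 0x0780 to 0x07B0, the mapping can be sketched in a few lines of Python. This is only an illustration, not the converter actually used in the project; the names `ROMAN`, `TO_THAANA` and `to_thaana` are made up for this example. Note that it converts every Latin letter that appears in the table, so any untranslated Latin-script words in the output would be converted as well.

```python
# Romanised letters listed in code-point order, starting at U+0780.
ROMAN = (
    "hSnrbLkwvmfdtlgNsDzTypjc"  # basic letters, U+0780..U+0797
    "XHKJRCMBYZWGQV"            # dotted letters, U+0798..U+07A5
    "aAiIuUeEoOq"               # fili and sukun, U+07A6..U+07B0
)

# str.translate expects a dict keyed by the code point of the source character.
TO_THAANA = {ord(r): chr(0x0780 + i) for i, r in enumerate(ROMAN)}


def to_thaana(text: str) -> str:
    """Replace each romanised letter with its Thaana character.

    Characters that are not in the table (spaces, punctuation, digits)
    pass through unchanged.
    """
    return text.translate(TO_THAANA)


if __name__ == "__main__":
    print(to_thaana("waharenqnakI masqveriwewq"))
```

The characters are emitted in the same order as the romanised letters; no reversal of the string is needed, since displaying the text from right to left is left to the renderer.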

Thaana is written from right to left. The romanised form, however, is written from left to right. So:

"I am a fisherman"

outputs:

"waharenqnakI masqveriwewq" (read from left to right)

which is

"އަހަރެންނަކީ މަސްވެރިއެއް" (read from _right_ to left)
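
To make the direction point concrete, the following Python sketch lists the characters of a romanised word in the order they are stored. It assumes the same table as above, rebuilt here so the snippet runs on its own; `describe` is a hypothetical helper written for this page. The Thaana letters carry the Unicode bidirectional class `AL` and the fili carry `NSM`, which is what tells a renderer to lay the text out from right to left even though the stored order matches the left-to-right romanised order.

```python
import unicodedata

# Same table as in the mapping section: romanised letters in code-point
# order, starting at U+0780.
ROMAN = "hSnrbLkwvmfdtlgNsDzTypjc" "XHKJRCMBYZWGQV" "aAiIuUeEoOq"
TO_THAANA = {ord(r): chr(0x0780 + i) for i, r in enumerate(ROMAN)}


def describe(romanised: str):
    """Return one (letter, code point, bidi class, name) tuple per character,
    in storage order."""
    rows = []
    for r in romanised:
        ch = r.translate(TO_THAANA)
        rows.append((
            r,
            f"U+{ord(ch):04X}",
            unicodedata.bidirectional(ch),
            unicodedata.name(ch),
        ))
    return rows


if __name__ == "__main__":
    for row in describe("masq"):
        print(*row)
```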