Twol rules in lttoolbox
Edited by Techievena
Revision as of 05:49, 3 June 2018
Current Status: In Progress
Project: Extend lttoolbox to have the power of HFST
Guidelines
- Every rule in the dictionary file must be compatible with the HFST twolc engine and must not introduce ambiguities.
- The XML tags for archiphonemes and rules must be well defined and distinct from the existing tags in lttoolbox.
- Every rule entry should have a comment that briefly explains the morphophonological transformation performed by the twol compiler.
Design
- The design is still under development and may need significant changes once it is applied to the existing language pairs.
- The design must be robust enough to support all types of rules, namely:
  - Phonologically conditioned deletion
  - Morphologically conditioned deletion
  - Phonologically conditioned symbol change
  - Morphologically conditioned symbol change
  - Phonologically conditioned insertion
  - Morphologically conditioned insertion
Alphabets
<alphabet>аӑеёӗиоуӳыэюябвгджзклмнпрсҫтфхцчшщйьъАӐЕЁӖИОУӲЫЭЮЯБВГДЖЗКЛМНПРСҪТФХЦЧШЩЙЬЪ<ar n="A">ae</ar></alphabet>
The characters inside the ar tags list every surface form the archiphoneme can take.
Tag/Symbol | Meaning |
---|---|
ar | archiphoneme |
n | archiphoneme name |
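To make the role of the ar tag concrete, here is a minimal Python sketch, assuming the alphabet element is read with an ordinary XML parser. It separates the plain letters from the archiphoneme realisations and emits them as an HFST twolc Alphabet declaration. The helper names parse_alphabet and twolc_alphabet are hypothetical, and writing archiphoneme pairs as %{A%}:a follows the usual Apertium/HFST twolc convention; it is an assumption, not something this design specifies.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

ALPHABET_XML = (
    "<alphabet>аӑеёӗиоуӳыэюябвгджзклмнпрсҫтфхцчшщйьъ"
    "АӐЕЁӖИОУӲЫЭЮЯБВГДЖЗКЛМНПРСҪТФХЦЧШЩЙЬЪ"
    '<ar n="A">ae</ar></alphabet>'
)


def parse_alphabet(xml_text: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split <alphabet> into plain letters and archiphoneme realisations."""
    root = ET.fromstring(xml_text)
    letters = list((root.text or "").strip())
    archiphonemes: Dict[str, List[str]] = {}
    for ar in root.findall("ar"):
        # Each character inside <ar> is one possible surface form.
        archiphonemes[ar.get("n")] = list((ar.text or "").strip())
        # Letters written after </ar> still belong to the plain alphabet.
        letters.extend((ar.tail or "").strip())
    return letters, archiphonemes


def twolc_alphabet(letters: List[str], archiphonemes: Dict[str, List[str]]) -> str:
    """Render a twolc Alphabet section (assumed notation: NAME -> %{NAME%})."""
    symbols = list(letters)
    for name, realisations in archiphonemes.items():
        symbols += [f"%{{{name}%}}:{s}" for s in realisations]
    return "Alphabet\n" + " ".join(symbols) + " ;"


letters, archiphonemes = parse_alphabet(ALPHABET_XML)
print(archiphonemes)
print(twolc_alphabet(letters, archiphonemes))
```

The sketch treats every code point as one symbol, which holds for the precomposed Chuvash letters used here; multi-character symbols would need an explicit separator.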
Sets
<sets> <set n="Vowels">aeiou</set> <set n="BackVow">bcdfg</set> </sets>
Tag/Symbol | Meaning |
---|---|
set | named set (group) of symbols |
n | set name |
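As a sketch of how the sets block might correspond to a twolc Sets section, the Python below parses the example and prints one `Name = symbol symbol ... ;` line per set. The helpers parse_sets and twolc_sets are hypothetical, and the code assumes every character inside a set element is a separate single-character member.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SETS_XML = ('<sets><set n="Vowels">aeiou</set>'
            '<set n="BackVow">bcdfg</set></sets>')


def parse_sets(xml_text: str) -> Dict[str, List[str]]:
    """Map each set name (attribute n) to its member symbols."""
    root = ET.fromstring(xml_text)
    return {s.get("n"): list((s.text or "").strip()) for s in root.findall("set")}


def twolc_sets(sets: Dict[str, List[str]]) -> str:
    """Render the sets as a twolc Sets section, one 'Name = a b c ;' per set."""
    lines = ["Sets"]
    for name, members in sets.items():
        lines.append(f"{name} = {' '.join(members)} ;")
    return "\n".join(lines)


print(twolc_sets(parse_sets(SETS_XML)))
```

Set names are kept verbatim so that rules can refer to them by name, as the rule contexts do with `<set n="BackVow"/>`.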
Definitions
Diacritics
Rule Definitions
Rules
<rules>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="e"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Only hyphen in vowel boundaries and caps">
    <m><ar n="hyph?"/></m><s>-</s>
    <context dir="f"><l_c><set n="Vowels"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="b"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="ne"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
</rules>
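Since the guidelines require every rule to be compatible with HFST twolc, a minimal Python sketch of one possible translation from these rule entries to twolc rule syntax may help. It assumes references to archiphonemes and sets are written as self-closing tags (`<ar n="A"/>`, `<set n="BackVow"/>`), maps each dir value to the twolc operator given in the key below, and writes archiphonemes as %{NAME%} (an assumed Apertium/HFST convention). The names OPERATORS, escape, render and to_twolc are hypothetical; this is not lttoolbox's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the dir attribute to twolc rule operators.
OPERATORS = {"f": "=>", "b": "<=", "e": "<=>", "ne": "/<="}

RULES_XML = """
<rules>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="e"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Only hyphen in vowel boundaries and caps">
    <m><ar n="hyph?"/></m><s>-</s>
    <context dir="f"><l_c><set n="Vowels"/></l_c><r_c></r_c></context>
  </rule>
</rules>
"""


def escape(symbol: str) -> str:
    """Prefix every non-alphanumeric character with the twolc escape %."""
    return "".join(ch if ch.isalnum() else "%" + ch for ch in symbol)


def render(elem: ET.Element) -> str:
    """Render the content of <m>, <s>, <l_c> or <r_c> as twolc symbols."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(escape(elem.text.strip()))
    for child in elem:
        if child.tag == "ar":
            # Assumed convention: archiphoneme NAME becomes %{NAME%}.
            parts.append("%{" + escape(child.get("n")) + "%}")
        elif child.tag == "set":
            # Set names are emitted as-is; they must be declared under Sets.
            parts.append(child.get("n"))
        if child.tail and child.tail.strip():
            parts.append(escape(child.tail.strip()))
    return " ".join(parts)


def to_twolc(rule: ET.Element) -> str:
    """Translate one <rule> element into a named twolc rule."""
    ctx = rule.find("context")
    operator = OPERATORS[ctx.get("dir")]
    pair = render(rule.find("m")) + ":" + render(rule.find("s"))
    left, right = render(ctx.find("l_c")), render(ctx.find("r_c"))
    context = " ".join(p for p in (left, "_", right) if p)
    return f'"{rule.get("c")}"\n{pair} {operator} {context} ;'


for rule in ET.fromstring(RULES_XML.strip()).findall("rule"):
    print(to_twolc(rule))
```

Run as-is, the first entry becomes `%{A%}:a <=> BackVow _ ;` and the second `%{hyph%?%}:%- => Vowels _ ;`, each preceded by its comment as the twolc rule name. An empty r_c simply leaves the right side of `_` blank.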
Tag/Symbol | Meaning |
---|---|
rule | twol rule |
c | comment |
m | morphophonemic (lexical) side |
s | surface side |
context | context for transformation |
dir | direction constraint (rule operator) |
f | a:b => _ ; If the symbol pair a:b appears, it must be in context _ |
b | a:b <= _ ; If lexical a appears in context _, it must correspond to surface b |
e | a:b <=> _ ; Lexical a always corresponds to surface b in context _, and the pair a:b appears only in context _ |
ne | a:b /<= _ ; Lexical a never corresponds to surface b in context _ |
r_c | right context |
l_c | left context |
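The four dir values can be pinned down with a small checker. This is a simplified Python sketch, not how hfst-twolc compiles rules: a word is a list of aligned (lexical, surface) pairs, each context is a single set of symbols matched against the lexical side of the neighbouring pair, and an empty set means "no constraint". Word boundaries, multiple contexts and rule conflicts are ignored. The names in_context and holds are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (lexical symbol, surface symbol)


def in_context(pairs: List[Pair], i: int, left: Set[str], right: Set[str]) -> bool:
    """Does position i sit between a matching left and right neighbour?

    Simplification: each context is one set of symbols matched against the
    lexical side of the adjacent pair; an empty set means "no constraint".
    """
    if left and not (i > 0 and pairs[i - 1][0] in left):
        return False
    if right and not (i + 1 < len(pairs) and pairs[i + 1][0] in right):
        return False
    return True


def holds(pairs: List[Pair], a: str, b: str, direction: str,
          left: Set[str], right: Set[str]) -> bool:
    """Check the rule a:b <direction> left _ right on one aligned word."""
    if direction not in ("f", "b", "e", "ne"):
        raise ValueError(f"unknown dir value: {direction}")
    for i, (lex, surf) in enumerate(pairs):
        ctx = in_context(pairs, i, left, right)
        if direction in ("f", "e") and (lex, surf) == (a, b) and not ctx:
            return False  # =>  a:b occurs outside the context
        if direction in ("b", "e") and lex == a and ctx and surf != b:
            return False  # <=  lexical a in the context is not realised as b
        if direction == "ne" and lex == a and ctx and surf == b:
            return False  # /<= a:b occurs inside the context
    return True


BACK_VOW = set("bcdfg")  # same letters as the BackVow set in the Sets example
words = {
    "b:b A:a": [("b", "b"), ("A", "a")],
    "b:b A:e": [("b", "b"), ("A", "e")],
    "i:i A:a": [("i", "i"), ("A", "a")],
}
for label, pairs in words.items():
    results = {d: holds(pairs, "A", "a", d, BACK_VOW, set()) for d in ("f", "b", "e", "ne")}
    print(label, results)
```

The three sample words separate the operators: "b:b A:a" satisfies f, b and e but violates ne; "b:b A:e" violates b and e; "i:i A:a" violates f and e because A:a appears outside the context.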
Regular Expression Syntax
Regular expressions in the twolc syntax are handled with the help