Difference between revisions of "Twol rules in lttoolbox"

Revision as of 05:49, 3 June 2018

Current Status: In Progress
Project: Extend lttoolbox to have the power of HFST

Guidelines

  • Every rule in the dictionary file must be compatible with the HFST twolc engine and must not introduce ambiguity.
  • The XML tags for archiphonemes and rules must be well defined and distinct from the existing lttoolbox tags.
  • Every rule entry should carry a comment that briefly explains the morphophonological transformation the twol compiler performs.

Design

  • The design is still under development and may need significant changes once it has been tried on existing language pairs.
  • The design must be robust enough to support all types of rules, namely:
    • Phonologically conditioned deletion
    • Morphologically conditioned deletion
    • Phonologically conditioned symbol change
    • Morphologically conditioned symbol change
    • Phonologically conditioned insertion
    • Morphologically conditioned insertion

Alphabets

<alphabet>аӑеёӗиоуӳыэюябвгджзклмнпрсҫтфхцчшщйьъАӐЕЁӖИОУӲЫЭЮЯБВГДЖЗКЛМНПРСҪТФХЦЧШЩЙЬЪ<ar n="A">ae</ar></alphabet>

The symbols inside the ar tags list every surface form in which the archiphoneme can be realized.

{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''ar''' || archiphoneme
|-
| '''n''' || archiphoneme name
|}
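
As a rough illustration of how a tool might read this element, the following Python sketch separates the plain symbols from the archiphoneme definitions. It is not part of lttoolbox: the function name read_alphabet and the shortened alphabet string are hypothetical, and only the element and attribute names (alphabet, ar, n) come from the example above.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical alphabet; only the tag and attribute names
# are taken from the proposed format.
ALPHABET_XML = '<alphabet>abcde<ar n="A">ae</ar></alphabet>'


def read_alphabet(xml_text):
    """Return (plain symbols, {archiphoneme name: surface realizations}).

    Assumption: plain symbols come before the first <ar> element, as in
    the example; text after an </ar> tag is ignored by this sketch.
    """
    root = ET.fromstring(xml_text)
    symbols = list(root.text or "")
    archiphonemes = {
        ar.get("n"): list(ar.text or "") for ar in root.findall("ar")
    }
    return symbols, archiphonemes


symbols, archiphonemes = read_alphabet(ALPHABET_XML)
print(symbols)
print(archiphonemes)
```

Here the archiphoneme A is mapped to its two possible surface forms, a and e.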

Sets

<sets>
  <set n="Vowels">aeiou</set>
  <set n="BackVow">bcdfg</set>
</sets>
{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''set''' || set/group of symbols
|-
| '''n''' || set name
|}
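
A minimal sketch of how the named sets could be loaded for later membership tests, assuming the format stays as shown. The function name read_sets is hypothetical and this is not the lttoolbox implementation.

```python
import xml.etree.ElementTree as ET

# The sets block from the example above, unchanged.
SETS_XML = """
<sets>
  <set n="Vowels">aeiou</set>
  <set n="BackVow">bcdfg</set>
</sets>
"""


def read_sets(xml_text):
    """Map each set name to the set of single-character symbols it lists."""
    root = ET.fromstring(xml_text.strip())
    return {s.get("n"): set(s.text or "") for s in root.findall("set")}


sets = read_sets(SETS_XML)
print("a" in sets["Vowels"])
print(sorted(sets["BackVow"]))
```

Treating every character as one symbol is a simplification; multi-character symbols would need a different representation.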

Definitions

<sets>
  <set n="Vowels">aeiou</set>
  <set n="BackVow">bcdfg</set>
</sets>
{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''set''' || set/group of symbols
|-
| '''n''' || set name
|}

Diacritics

<sets>
  <set n="Vowels">aeiou</set>
  <set n="BackVow">bcdfg</set>
</sets>
{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''set''' || set/group of symbols
|-
| '''n''' || set name
|}

Rule Definitions

<sets>
  <set n="Vowels">aeiou</set>
  <set n="BackVow">bcdfg</set>
</sets>
{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''set''' || set/group of symbols
|-
| '''n''' || set name
|}

Rules

<rules>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="e"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Only hyphen in vowel boundaries and caps">
    <m><ar n="hyph?"/></m><s>-</s>
    <context dir="f"><l_c><set n="Vowels"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="b"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
  <rule c="Back vowel harmony for archiphoneme A">
    <m><ar n="A"/></m><s>a</s>
    <context dir="ne"><l_c><set n="BackVow"/></l_c><r_c></r_c></context>
  </rule>
</rules>
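{|class=wikitable
! Tag/Symbol !! Meaning
|-
| '''rule''' || twol rule
|-
| '''c''' || comment
|-
| '''m''' || morphotactic (lexical) side
|-
| '''s''' || surface side
|-
| '''context''' || context for the transformation
|-
| '''dir''' || direction constraint (one of f, b, e, ne)
|-
| '''f''' || a:b => _ ; The symbol pair a:b may appear only in context _
|-
| '''b''' || a:b <= _ ; Lexical a in context _ must correspond to surface b
|-
| '''e''' || a:b <=> _ ; Lexical a corresponds to surface b in context _ and only in that context
|-
| '''ne''' || a:b /<= _ ; Lexical a never corresponds to surface b in context _
|-
| '''r_c''' || right context
|-
| '''l_c''' || left context
|}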
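
The four dir values can be made concrete with a small Python sketch that checks one rule against a sequence of lexical:surface pairs. Several things here are simplifying assumptions rather than part of the design: the context is reduced to "the surface symbol immediately to the left belongs to a given set", the function name check_rule is hypothetical, and the back-vowel set aou is chosen only for illustration (it is not the BackVow set from the examples).

```python
# Hypothetical back-vowel set used only for this illustration.
BACK = set("aou")


def check_rule(pairs, lex, surf, left_set, direction):
    """Return True if the (lexical, surface) pairs satisfy the rule.

    Simplified context: the surface symbol directly to the left of the
    current position must be in left_set. Real twolc contexts are
    regular expressions over symbol pairs on both sides.
    """
    for i, (l, s) in enumerate(pairs):
        in_ctx = i > 0 and pairs[i - 1][1] in left_set
        is_pair = l == lex and s == surf
        # f (=>): the pair lex:surf is allowed only inside the context.
        if direction in ("f", "e") and is_pair and not in_ctx:
            return False
        # b (<=): lexical lex inside the context must surface as surf.
        if direction in ("b", "e") and l == lex and in_ctx and s != surf:
            return False
        # ne (/<=): lexical lex must not surface as surf inside the context.
        if direction == "ne" and is_pair and in_ctx:
            return False
    return True


# A:a after a back vowel, A:a after a front vowel, A:e after a back vowel.
in_context = [("k", "k"), ("o", "o"), ("A", "a")]
out_of_context = [("k", "k"), ("i", "i"), ("A", "a")]
wrong_surface = [("k", "k"), ("o", "o"), ("A", "e")]

for d in ("f", "b", "e", "ne"):
    print(d,
          check_rule(in_context, "A", "a", BACK, d),
          check_rule(out_of_context, "A", "a", BACK, d),
          check_rule(wrong_surface, "A", "a", BACK, d))
```

The e direction is simply the conjunction of f and b, which is why the sketch tests both conditions for it.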

Regular Expression Syntax

Regular expressions in the twolc syntax are handled with the help