Difference between revisions of "Archiphonemes"

(10 intermediate revisions by 3 users not shown)
Line 1: Line 1:
==Guidelines==
==Standards for archiphonemes==


* Archiphonemes should be a single character.
* Archiphonemes in [[lexc]] should be encased in <code>{</code> and <code>}</code>.
* Archiphonemes should be declared in the <code>Multichar_Symbols</code> section in the header of the file, with a comment giving their possible forms.
* Archiphonemes should be declared in the <code>Multichar_Symbols</code> section in the header of the file after the grammatical tags, with a comment giving their possible forms.
* If the archiphoneme is subject to deletion, it should be written in lower case, e.g. <code>{s}</code>
* If the archiphoneme is never deleted, it should be written in upper case, e.g. <code>{A}</code>
* If the archiphoneme has a range of default surface forms (even if rarely subject to deletion), it should be written in upper case, e.g. <code>{A}</code>
* If the archiphoneme is always deleted, it ''may'' consist of more than one character, e.g. <code>{dup}</code>. This is, however, advised against.

==Common archiphonemes==


==Frequently asked questions==

; Why use {C} and not ^C ?

<pre>
<spectie> was thinking about {A} over ^A
<Flammie> good
<spectie> and worked out a nice argument for it (aside from pure aesthetics):
<spectie> other programs (e.g. morphological segmenters) parsing the output with {A} don't need to know about multicharacter symbols
<spectie> compare:
<spectie> foo{A}z{A}l
<spectie> foo^Az^Al
<spectie> with the first you know where the symbol ends
<spectie> in the second you do not know
<spectie> it may be ^Az and ^Al or ^A z ^A l
</pre>

[[Category:Terminology]]

[[Category:HFST]]
[[Category:Writing dictionaries]]
[[Category:Documentation in English]]

Latest revision as of 16:22, 26 September 2016

Guidelines

  • Archiphonemes should be a single character.
  • Archiphonemes in lexc should be enclosed in { and }.
  • Archiphonemes should be declared in the Multichar_Symbols section in the header of the file, after the grammatical tags, with a comment listing their possible surface forms.
  • If the archiphoneme is subject to deletion, it should be written in lower case, e.g. {s}.
  • If the archiphoneme has a range of default surface forms (even if it is rarely subject to deletion), it should be written in upper case, e.g. {A}.
  • If the archiphoneme is always deleted, it may consist of more than one character, e.g. {dup}. This is, however, discouraged.
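
As a rough illustration of the conventions above, here is a minimal Python sketch that reads a lexc-style header and classifies each archiphoneme by its name. The sample header, the symbols and comments in it, the `%` escaping of the braces, and the helper names `classify` and `archiphonemes` are all assumptions made for the example; none of this is part of an Apertium or HFST tool.

```python
import re

# Hypothetical header for illustration only; the tags and archiphonemes are made up.
# Assumes braces are escaped with % in the Multichar_Symbols section.
SAMPLE_HEADER = """\
Multichar_Symbols
%<n%>      ! Noun
%<pl%>     ! Plural
%{A%}      ! a, e
%{s%}      ! s, or deleted
%{dup%}    ! always deleted
"""

# Matches an escaped archiphoneme such as %{A%} and captures the name inside.
ARCHIPHONEME = re.compile(r"%\{(.+?)%\}")


def classify(name):
    """Describe an archiphoneme name according to the guidelines above."""
    if len(name) > 1:
        return "multi-character: acceptable only if always deleted, discouraged"
    if name.islower():
        return "lower case: subject to deletion"
    if name.isupper():
        return "upper case: range of default surface forms"
    return "unclassified"


def archiphonemes(header):
    """Map each archiphoneme declared in the header to its classification."""
    found = {}
    for line in header.splitlines():
        declaration = line.split("!", 1)[0]  # drop the trailing comment
        match = ARCHIPHONEME.search(declaration)
        if match:
            found["{" + match.group(1) + "}"] = classify(match.group(1))
    return found


if __name__ == "__main__":
    for symbol, kind in archiphonemes(SAMPLE_HEADER).items():
        print(symbol, "->", kind)
```

Note that the archiphonemes come after the grammatical tags in the sample header, and each one carries a comment with its possible forms, as the guidelines ask.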

Common archiphonemes

Frequently asked questions

Why use {C} and not ^C?
<spectie> was thinking about {A} over ^A
<Flammie> good
<spectie> and worked out a nice argument for it (aside from pure aesthetics): 
<spectie> other programs (e.g. morphological segmenters) parsing the output with {A} don't need to know about multicharacter symbols 
<spectie> compare: 
<spectie> foo{A}z{A}l
<spectie> foo^Az^Al
<spectie> with the first you know where the symbol ends
<spectie> in the second you do not know
<spectie> it may be ^Az and ^Al or ^A z ^A l
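
The argument in the log can be sketched in Python. This is only an illustration under stated assumptions: the function names `tokenize_braced` and `tokenize_caret` are hypothetical, and the longest-match strategy used for the caret notation is one plausible way a downstream program might segment, not something the log specifies.

```python
import re

# A braced symbol is self-delimiting: "{", anything but braces, "}".
# Any other single character is a token of its own.
BRACED_TOKEN = re.compile(r"\{[^{}]+\}|.", re.DOTALL)


def tokenize_braced(text):
    """Split text into symbols without knowing the symbol inventory."""
    return BRACED_TOKEN.findall(text)


def tokenize_caret(text, symbols):
    """Split text using longest match against a known symbol inventory.

    With ^-prefixed symbols the result depends on the inventory,
    because nothing marks where a symbol ends.
    """
    ordered = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for symbol in ordered:
            if text.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            # No multicharacter symbol starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize_braced("foo{A}z{A}l"))
    print(tokenize_caret("foo^Az^Al", {"^A"}))
    print(tokenize_caret("foo^Az^Al", {"^A", "^Az", "^Al"}))
```

The braced form needs no second argument, while the two caret calls segment the same string differently depending on which symbols the caller believes exist.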