Difference between revisions of "Archiphonemes"
(8 intermediate revisions by 3 users not shown)
Latest revision as of 16:22, 26 September 2016
Guidelines
- Archiphonemes should be a single character.
- Archiphonemes in lexc should be encased in { and }.
- Archiphonemes should be declared in the Multichar_Symbols section in the header of the file after the grammatical tags, with a comment giving their possible forms.
- If the archiphoneme is subject to deletion, it should be written in lower case, e.g. {s}
- If the archiphoneme has a range of default surface forms (even if rarely subject to deletion), it should be written in upper case, e.g. {A}
- If the archiphoneme is always deleted, it may consist of more than one character, e.g. {dup}. This is, however, advised against.
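The naming conventions above can be checked mechanically. The following is a minimal sketch in Python, not part of HFST or any other existing tool; the function name classify_archiphoneme and the labels it returns are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical helper: matches a symbol encased in { and }, e.g. {A} or {dup}.
_BRACED = re.compile(r"^\{([^{}]+)\}$")


def classify_archiphoneme(symbol):
    """Classify a symbol according to the guidelines listed above.

    The returned labels are illustrative assumptions, not official terms.
    """
    match = _BRACED.match(symbol)
    if match is None:
        return "not encased in { and }"
    name = match.group(1)
    if len(name) > 1:
        # Permitted only for symbols that are always deleted, and advised against.
        return "multi-character (advised against)"
    if name.islower():
        return "subject to deletion"
    if name.isupper():
        return "range of surface forms"
    return "single character without case"
```

Under these assumptions, {s} is reported as subject to deletion, {A} as having a range of surface forms, and {dup} as a multi-character symbol.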
Common archiphonemes
Frequently asked questions
- Why use {C} and not ^C?
<spectie> was thinking about {A} over ^A
<Flammie> good
<spectie> and worked out a nice argument for it (aside from pure aesthetics):
<spectie> other programs (e.g. morphological segmenters) parsing the output with {A} don't need to know about multicharacter symbols
<spectie> compare:
<spectie> foo{A}z{A}l
<spectie> foo^Az^Al
<spectie> with the first you know where the symbol ends
<spectie> in the second you do not know
<spectie> it may be ^Az and ^Al or ^A z ^A l
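The argument in the chat log can be illustrated with a short Python sketch. It assumes a downstream program that only sees output strings; the function names split_braced and split_caret are hypothetical. The first function needs no symbol list because the closing brace marks where each symbol ends, while the second gives different results depending on which multicharacter symbols it is told about.

```python
import re

# A braced symbol such as {A}, or otherwise any single character.
_TOKEN = re.compile(r"\{[^{}]+\}|.")


def split_braced(form):
    """Split a form into symbols; each {...} group counts as one symbol."""
    return _TOKEN.findall(form)


def split_caret(form, multichar_symbols):
    """Greedy longest-match split for ^-style symbols.

    Unlike split_braced, this needs the set of declared multicharacter
    symbols, because nothing in the string marks where a symbol ends.
    """
    symbols = sorted(multichar_symbols, key=len, reverse=True)
    out, i = [], 0
    while i < len(form):
        for sym in symbols:
            if form.startswith(sym, i):
                out.append(sym)
                i += len(sym)
                break
        else:
            # No declared symbol starts here: emit a single character.
            out.append(form[i])
            i += 1
    return out


braced = split_braced("foo{A}z{A}l")
reading_one = split_caret("foo^Az^Al", {"^A"})
reading_two = split_caret("foo^Az^Al", {"^Az", "^Al"})
```

Here braced has a single possible segmentation, whereas reading_one and reading_two differ even though the input string is the same.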