Archiphonemes
		
		
		
		
		
		
Latest revision as of 16:22, 26 September 2016
Guidelines
- Archiphonemes should be a single character.
- Archiphonemes in lexc should be encased in { and }.
- Archiphonemes should be declared in the Multichar_Symbols section in the header of the file, after the grammatical tags, with a comment giving their possible forms.
- If the archiphoneme is subject to deletion, it should be written in lower case, e.g. {s}.
- If the archiphoneme has a range of default surface forms (even if rarely subject to deletion), it should be written in upper case, e.g. {A}.
- If the archiphoneme is always deleted, it may consist of more than one character, e.g. {dup}. This is, however, advised against.
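The naming conventions above can be checked mechanically. The following is a minimal sketch in Python, not part of any existing tool: the function name `classify` and the labels it returns are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical helper: a symbol counts as an archiphoneme only if it is
# written as {name}, with no nested braces.
ARCHIPHONEME = re.compile(r"^\{([^{}]+)\}$")


def classify(symbol):
    """Classify a symbol according to the naming guidelines listed above."""
    match = ARCHIPHONEME.match(symbol)
    if match is None:
        return "not-braced"      # e.g. ^A, which the guidelines do not use
    name = match.group(1)
    if len(name) > 1:
        return "multichar"       # allowed only if always deleted; advised against
    if name.isupper():
        return "alternating"     # has a range of default surface forms
    if name.islower():
        return "deletable"       # subject to deletion
    return "other"               # e.g. a digit; not covered by the guidelines


for sym in ["{A}", "{s}", "{dup}", "^A"]:
    print(sym, classify(sym))
```

With the examples from the guidelines, `{A}` is reported as alternating, `{s}` as deletable, and `{dup}` as multichar.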
Common archiphonemes
Frequently asked questions
- Why use {C} and not ^C?
<spectie> was thinking about {A} over ^A
<Flammie> good
<spectie> and worked out a nice argument for it (aside from pure aesthetics): 
<spectie> other programs (e.g. morphological segmenters) parsing the output with {A} don't need to know about multicharacter symbols 
<spectie> compare: 
<spectie> foo{A}z{A}l
<spectie> foo^Az^Al
<spectie> with the first you know where the symbol ends
<spectie> in the second you do not know
<spectie> it may be ^Az and ^Al or ^A z ^A l
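The argument in the log can be illustrated with a short Python sketch. It assumes a downstream program, such as a segmenter, that has to split a form into symbols; the function names `split_braced` and `split_caret` are hypothetical. The braced form can be split with no outside knowledge, while the caret form splits differently depending on which multicharacter symbols were declared.

```python
import re


def split_braced(form):
    """Split a form into symbols, treating each {...} group as one symbol."""
    return re.findall(r"\{[^{}]*\}|.", form)


def split_caret(form, multichar):
    """Greedy longest-match split; requires the set of declared symbols."""
    symbols = sorted(multichar, key=len, reverse=True)
    out = []
    i = 0
    while i < len(form):
        for sym in symbols:
            if form.startswith(sym, i):
                out.append(sym)
                i += len(sym)
                break
        else:
            # No declared symbol starts here: emit a single character.
            out.append(form[i])
            i += 1
    return out


print(split_braced("foo{A}z{A}l"))
# ['f', 'o', 'o', '{A}', 'z', '{A}', 'l']
print(split_caret("foo^Az^Al", {"^A"}))
# ['f', 'o', 'o', '^A', 'z', '^A', 'l']
print(split_caret("foo^Az^Al", {"^Az", "^Al"}))
# ['f', 'o', 'o', '^Az', '^Al']
```

The two caret results differ only because of the symbol set passed in, which is exactly the information a program reading `{A}` output does not need.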

