Difference between revisions of "List of symbols"

From Apertium
(add noun classes)
(21 intermediate revisions by 6 users not shown)
Line 11: Line 11:
 
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]].
 
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]].
   
  +
<!-- comments following section headers are intended to make scraping this page easier -->
==Part-of-speech Categories==
 
  +
  +
==Part-of-speech Categories== <!-- POS -->
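The trailing comments on section headers can be picked up with a simple pattern. A minimal Python sketch, assuming the page's wikitext is already available as a string; the function name `section_keys` and the regular expression are illustrative and not part of any Apertium tool:

```python
import re

# Matches a wikitext header followed by a trailing HTML comment, e.g.
#   ===Gender=== <!-- gender -->
HEADER_RE = re.compile(
    r"^(?P<marks>={2,6})[ \t]*(?P<title>.+?)[ \t]*(?P=marks)"
    r"[ \t]*<!--[ \t]*(?P<key>.+?)[ \t]*-->[ \t]*$",
    re.MULTILINE,
)


def section_keys(wikitext):
    """Return (level, title, key) for every header that carries a comment."""
    return [
        (len(m.group("marks")), m.group("title"), m.group("key"))
        for m in HEADER_RE.finditer(wikitext)
    ]


sample = """==Part-of-speech Categories== <!-- POS -->
=== Punctuation === <!-- punct -->
===Gender=== <!-- gender -->
==See also==
"""

for level, title, key in section_keys(sample):
    print(level, title, key)
```

Headers without a comment, such as `==See also==`, are simply skipped.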
   
 
{|class=wikitable
 
{|class=wikitable
Line 65: Line 67:
 
|-
 
|-
 
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- ||
 
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- ||
  +
|-
  +
| <code>ideo</code> || Ideophone || ||
  +
|-
  +
| <code>clt</code> || Clitic || ||
 
|}
 
|}
   
=== Punctuation ===
+
=== Punctuation === <!-- punct -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 74: Line 80:
 
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT
 
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT
 
|-
 
|-
| <code>cm</code> || Comma punctuation || , || PunctType=Cm
+
| <code>cm</code> || Comma punctuation || , || PUNCT PunctType=Comm
 
|-
 
|-
| <code>lquot</code> || Left quote || « || PUNCT
+
| <code>lquot</code> || Left quote || « || PUNCT PunctType=Quot PunctSide=Ini
 
|-
 
|-
| <code>rquot</code> || Right quote || » || PUNCT
+
| <code>rquot</code> || Right quote || » || PUNCT PunctType=Quot PunctSide=Fin
 
|-
 
|-
| <code>lpar</code> || Left parenthesis || ( || PUNCT
+
| <code>lpar</code> || Left parenthesis || ( || PUNCT PunctType=Brck PunctSide=Ini
 
|-
 
|-
| <code>rpar</code> || Right parenthesis || ) || PUNCT
+
| <code>rpar</code> || Right parenthesis || ) || PUNCT PunctType=Brck PunctSide=Fin
 
|-
 
|-
| <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long|| PUNCT
+
| <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long|| PUNCT PunctType=Dash
 
|-
 
|-
| <code>apos</code> || Apostrophe || ' or ' || PunctType=apos
+
| <code>apos</code> || Apostrophe || ' or ' || PUNCT
 
|-
 
|-
| <code>quot</code> || Quotation || " || PunctType=quot
+
| <code>quot</code> || Quotation || " || PUNCT PunctType=Quot
 
|-
 
|-
| <code>percent</code> || Percentage || % || PunctType=percent
+
| <code>percent</code> || Percentage || % || PUNCT
 
|-
 
|-
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT
+
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT PunctSide=Ini
  +
|-
  +
| <code>clb</code> || Clause Boundary || Refers to any of the following symbols: .?;:!·… || PUNCT
  +
|-
  +
| <code>punct</code> || Punctuation || || PUNCT
 
|}
 
|}
   
==Part-of-speech Sub-categories==
+
==Part-of-speech Sub-categories== <!-- subtype -->
   
===Gender===
+
===Gender=== <!-- gender -->
   
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Line 120: Line 130:
 
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut
 
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut
 
|-
 
|-
| <code>mf</code> || Masculine or feminine || This is used where the gender can be either masculine or feminine || Gender=Masc,Fem
+
| <code>mf</code> || Masculine or feminine || Used when masculine and feminine have the same form || Gender=Masc,Fem
 
|-
 
|-
| <code>mfn</code> || Masculine , feminine , neuter || This is used where the gender can be either masculine, feminine or neuter || Gender=Masc,Fem,Neut
+
| <code>mfn</code> || Masculine , feminine , neuter || Used when masculine, feminine, and neuter have the same form || Gender=Masc,Fem,Neut
 
|-
 
|-
 
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com
 
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com
Line 132: Line 142:
 
|}
 
|}
   
===Count/Mass===
+
===Count/Mass=== <!-- countability -->
   
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Line 145: Line 155:
 
|}
 
|}
   
===Animacy===
+
===Animacy=== <!-- animacy -->
   
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
 
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Line 152: Line 162:
 
! Symbol !! Gloss !! Notes !! Universal feature
 
! Symbol !! Gloss !! Notes !! Universal feature
 
|-
 
|-
| <code>aa</code> || Animate ||
+
| <code>aa</code> || Animate || || Animacy=Anim
 
|-
 
|-
| <code>an</code> || Animate or inanimate ||
+
| <code>an</code> || Animate or inanimate || || Animacy=Anim,Inan
 
|-
 
|-
| <code>nn</code> || Inanimate ||
+
| <code>nn</code> || Inanimate || || Animacy=Inan
 
|-
 
|-
  +
| <code>hu</code> || Human || || Animacy=Hum
 
|}
 
|}
   
===Adjectives===
+
===Adjectives=== <!-- adj_type -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 173: Line 184:
 
|}
 
|}
   
===Pronoun types ===
+
===Noun Class === <!-- n_class -->
  +
  +
{| class="wikitable" border="1"
  +
! Symbol !! Gloss !! Notes
  +
|-
  +
| <code>cl1</code> || Noun class 1 ||
  +
|-
  +
| <code>cl2</code> || Noun class 2 ||
  +
|-
  +
| <code>cl3</code> || Noun class 3 ||
  +
|-
  +
| <code>cl4</code> || Noun class 4 ||
  +
|-
  +
| <code>cl5</code> || Noun class 5 ||
  +
|-
  +
| <code>cl6</code> || Noun class 6 ||
  +
|-
  +
| <code>cl7</code> || Noun class 7 ||
  +
|-
  +
| <code>cl8</code> || Noun class 8 ||
  +
|-
  +
| <code>cl9</code> || Noun class 9 ||
  +
|-
  +
| <code>cl10</code> || Noun class 10 ||
  +
|-
  +
| <code>cl11</code> || Noun class 11 ||
  +
|-
  +
| <code>cl12</code> || Noun class 12 ||
  +
|}
  +
  +
===Pronoun types === <!-- prn_type -->
   
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
Line 181: Line 222:
 
|-
 
|-
 
| <code>tn</code> || Tónico ||
 
| <code>tn</code> || Tónico ||
  +
|-
  +
| <code>log</code> || Logophoric ||
 
|-
 
|-
 
| <code>detnt</code> || Neuter determiner || POS? || DET
 
| <code>detnt</code> || Neuter determiner || POS? || DET
Line 217: Line 260:
 
|-
 
|-
 
| <code>prx</code> || Proximate ||
 
| <code>prx</code> || Proximate ||
  +
|-
  +
| <code>med</code> || Medial ||
 
|-
 
|-
 
| <code>dst</code> || Distal ||
 
| <code>dst</code> || Distal ||
Line 223: Line 268:
 
|-
 
|-
 
| <code>rec</code> || Reciprocal Pronoun ||
 
| <code>rec</code> || Reciprocal Pronoun ||
  +
|-
  +
| <code>res</code> || Reciprocal Pronoun ||
 
|}
 
|}
   
=== Transitivity ===
+
=== Transitivity === <!-- transitivity -->
   
 
Used for verbs.
 
Used for verbs.
Line 232: Line 279:
 
! Symbol !! Gloss !! Notes !! Universal feature
 
! Symbol !! Gloss !! Notes !! Universal feature
 
|-
 
|-
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic)
+
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) || Subcat=Tran
 
|-
 
|-
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic)
+
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) || Subcat=Intr
 
|-
 
|-
| <code>TD</code> || Transitivity to be determined || if the sub-category is [currently] unknown
+
| <code>TD</code> || Transitivity to be determined || if the sub-category is [currently] unknown
 
|}
 
|}
   
===Separable verbs===
+
===Separable verbs=== <!-- separable -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 252: Line 299:
 
|}
 
|}
   
  +
===Proper nouns=== <!-- np_type -->
== Inflectional morphology ==
 
   
  +
{|class=wikitable
===Number===
 
  +
! Symbol !! Gloss !! Notes !! Universal features
  +
|-
  +
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names
  +
|-
  +
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc''. Although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia]
  +
|-
  +
| <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia]
  +
|-
  +
| <code>cog</code> || Cognomen || In normal use, surnames
  +
|-
  +
| <code>org</code> || Organisation ||
  +
|-
  +
| <code>al</code> || Altres || Other, misc.
  +
|-
  +
| <code>pat</code> ||Patronymic || A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich.
  +
|}
  +
  +
== Inflectional morphology == <!-- infl -->
  +
  +
===Number=== <!-- number -->
 
Note: number can be a sub-category tag too, e.g. with pronouns.
 
Note: number can be a sub-category tag too, e.g. with pronouns.
   
Line 277: Line 344:
   
   
===Case===
+
===Case=== <!-- case -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 332: Line 399:
 
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del
 
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del
 
|-
 
|-
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... ||
+
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... || Case=Ter
 
|-
 
|-
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Temp
+
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Tem
 
|-
 
|-
 
| <code>obl</code> || Oblique || [https://en.wikipedia.org/wiki/Oblique_case] || Case=Obl
 
| <code>obl</code> || Oblique || [https://en.wikipedia.org/wiki/Oblique_case] || Case=Obl
 
|-
 
|-
 
| <code>erg</code> || Ergative || [https://en.wikipedia.org/wiki/Ergative_case] || Case=Erg
 
| <code>erg</code> || Ergative || [https://en.wikipedia.org/wiki/Ergative_case] || Case=Erg
  +
|-
  +
| <code>CD</code> || Case to be determined ||
 
|}
 
|}
   
===Voice===
+
===Voice=== <!-- voice -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 360: Line 429:
 
|}
 
|}
   
===Tense and mode===
+
===Tense and mode=== <!-- tense -->
   
 
{|class=wikitable
 
{|class=wikitable
 
! Symbol !! Gloss !! Notes !! Universal features
 
! Symbol !! Gloss !! Notes !! Universal features
 
|-
 
|-
| <code>pres</code> || Present || || Tense=Pres
+
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos
 
|-
 
|-
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past
+
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || Tense=Past
 
|-
 
|-
| <code>past</code> || Past || || Tense=Past
+
| <code>cni</code> || Conditional || Lot of pairs will probably use cnd or cond... || Mood=Cnd
 
|-
 
|-
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp
+
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) ||
 
|-
 
|-
| <code>inf</code> || Infinitive || [https://en.wikipedia.org/wiki/Infinitive wikipedia] || VerbForm=Inf
+
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind
 
|-
 
|-
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf
+
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub
 
|-
 
|-
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || Tense=Past
+
| <code>fut</code> || Future || || Tense=Fut
 
|-
 
|-
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part
+
| <code>ger</code> || Gerund || [http://en.wikipedia.org/wiki/Gerund wikipedia] || VerbForm=Ger
 
|-
 
|-
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) ||
+
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def
 
|-
 
|-
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) ||
+
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp
 
|-
 
|-
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part
+
| <code>inf</code> || Infinitive || [https://en.wikipedia.org/wiki/Infinitive wikipedia] || VerbForm=Inf
 
|-
 
|-
| <code>ger</code> || Gerund || [http://en.wikipedia.org/wiki/Gerund wikipedia] || VerbForm=Ger
+
| <code>infps</code> || Personal infinitive || Used in Portuguese ||
 
|-
 
|-
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup
+
| <code>itg</code> || Interrogative || ||
 
|-
 
|-
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind
+
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf
 
|-
 
|-
| <code>pprs</code> || Present participle || ''see also: pprs''. [http://en.wikipedia.org/wiki/Present_participle wikipedia] || Tense=Pres Mood=Part
+
| <code>lp</code> || L-participle ||
 
|-
 
|-
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind
+
| <code>neg</code> || Negative || || Polarity=Neg
 
|-
 
|-
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind
+
| <code>nonpast</code> || Non-past || || Tense=Pres,Fut
 
|-
 
|-
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub
+
| <code>past</code> || Past || || Tense=Past
 
|-
 
|-
| <code>cni</code> || Conditional || Lot of pairs will probably use cnd or cond... || Mood=Cnd
+
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind
  +
|-
  +
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub
 
|-
 
|-
 
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp
 
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp
Line 407: Line 478:
 
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp
 
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp
 
|-
 
|-
  +
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) ||
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub
 
 
|-
 
|-
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub
+
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) ||
 
|-
 
|-
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def
+
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part
 
|-
 
|-
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos
+
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part
 
|-
 
|-
| <code>itg</code> || Interrogative || ||
+
| <code>pprs</code> || Present participle || ''see also: pprs''. [http://en.wikipedia.org/wiki/Present_participle wikipedia] || Tense=Pres Mood=Part
 
|-
 
|-
| <code>neg</code> || Negative || || Polarity=Neg
+
| <code>pres</code> || Present || || Tense=Pres
 
|-
 
|-
| <code>lp</code> || L-participle ||
+
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past
 
|-
 
|-
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) ||
+
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind
|-
+
|-
  +
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub
  +
|-
  +
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup
  +
|}
  +
  +
===Aspect=== <!-- aspect -->
  +
{|class=wikitable
  +
! Symbol !! Gloss !! Notes !! Universal feature
  +
|-
  +
| <code>hab</code> || Habitual || || Aspect=Hab
  +
|-
  +
| <code>imperf</code> || Imperfective || || Aspect=Imp
  +
|-
  +
| <code>impf</code> || Imperfective || || Aspect=Imp
  +
|-
  +
| <code>perf</code> || Perfective || || Aspect=Perf
 
|}
 
|}
   
===Person===
+
===Person=== <!-- person -->
 
Note: person can be a sub-category tag, e.g. with pronouns.
 
Note: person can be a sub-category tag, e.g. with pronouns.
   
Line 439: Line 526:
 
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0
 
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0
 
|-
 
|-
  +
| <code>past3p</code> || past third person ||
 
|}
 
|}
   
===Derivations===
+
===Derivations=== <!-- verb_deriv -->
 
{|class=wikitable
 
{|class=wikitable
 
! Symbol !! Gloss !! Notes
 
! Symbol !! Gloss !! Notes
Line 455: Line 543:
 
|}
 
|}
   
===Possession===
+
===Possession=== <!-- possessor -->
 
{|class=wikitable
 
{|class=wikitable
 
! Symbol !! Gloss !! Notes !! Universal feature
 
! Symbol !! Gloss !! Notes !! Universal feature
Line 475: Line 563:
 
|}
 
|}
   
===Object marking===
+
===Subject marking=== <!-- subject -->
  +
  +
e.g. in verbs with both, otherwise, see [[#Person]] and [[#Number]].
  +
  +
{|class=wikitable
  +
! Symbol !! Gloss !! Notes !! Universal features
  +
|-
  +
| <code>s_sg1</code> || First person singular subject || || Number=Sing Person=1
  +
|-
  +
| <code>s_sg2</code> || Second person singular subject || || Number=Sing Person=2
  +
|-
  +
| <code>s_sg3</code> || Third person singular subject || || Number=Sing Person=3
  +
|-
  +
| <code>s_pl1</code> || First person plural subject || || Number=Plur Person=1
  +
|-
  +
| <code>s_pl2</code> || Second person plural subject || || Number=Plur Person=2
  +
|-
  +
| <code>s_pl3</code> || Third person plural subject || || Number=Plur Person=3
  +
|-
  +
|}
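The subject-marking symbols follow a regular pattern: the prefix <code>s_</code>, then a number (<code>sg</code> or <code>pl</code>), then a person digit. A small Python sketch of that decomposition, using only the feature values listed in the table above; the function name `subject_features` is made up for this example:

```python
import re

# Number values copied from the "Universal features" column of the table.
NUMBER = {"sg": "Sing", "pl": "Plur"}

# s_ + number + person, e.g. s_sg1 or s_pl3
TAG_RE = re.compile(r"^s_(sg|pl)([123])$")


def subject_features(tag):
    """Split a subject-marking tag such as 's_pl2' into its two features."""
    m = TAG_RE.match(tag)
    if m is None:
        return None  # not one of the subject-marking tags in the table
    return {"Number": NUMBER[m.group(1)], "Person": m.group(2)}


print(subject_features("s_sg1"))
print(subject_features("s_pl3"))
print(subject_features("sg"))
```

A plain number tag such as <code>sg</code> does not match and yields `None`, so it can be handled by whatever deals with [[#Number]] tags.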
  +
  +
  +
===Object marking=== <!-- object -->
   
 
e.g. in verbs with both
 
e.g. in verbs with both
Line 496: Line 606:
 
|}
 
|}
   
  +
===Adjectives=== <!-- adj_infl -->
===Proper nouns===
 
   
 
{|class=wikitable
 
{|class=wikitable
 
! Symbol !! Gloss !! Notes !! Universal features
 
! Symbol !! Gloss !! Notes !! Universal features
 
|-
 
|-
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names
+
| <code>pst</code> || Positive || || Degree=Pos
 
|-
 
|-
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc''. Although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia]
+
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp
 
|-
 
|-
| <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia]
+
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup
 
|-
 
|-
| <code>cog</code> || Cognomen || In normal use, surnames
+
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] ||
 
|-
 
|-
| <code>org</code> || Organisation ||
+
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] ||
 
|-
 
|-
| <code>al</code> || Altres || Other, misc.
+
| <code>short</code> || Short adjective || ||
 
|}
 
|}
   
  +
===Formality=== <!-- formality -->
===Adjectives===
 
 
 
{|class=wikitable
 
{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
+
! Symbols !! Gloss !! Notes
 
|-
 
|-
| <code>pst</code> || Positive || || Degree=Pos
+
| <code>crd</code> || Cordial ||
 
|-
 
|-
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp
+
| <code>el</code> || Elite ||
 
|-
 
|-
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup
+
| <code>fam</code> || Familiar ||
 
|-
 
|-
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]
+
| <code>frm</code> || Formal ||
 
|-
 
|-
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]
+
| <code>infml</code> || Informal ||
  +
|-
  +
| <code>pol</code> || Polite ||
  +
|-
  +
| <code>low</code> || Low courtesy ||
  +
|-
  +
| <code>mid</code> || Mid courtesy ||
  +
|-
  +
| <code>hi</code> || High courtesy ||
 
|}
 
|}
   
  +
===Others=== <!-- other -->
 
===Others===
 
 
{|class=wikitable
 
{|class=wikitable
 
! Symbol !! Gloss !! Notes
 
! Symbol !! Gloss !! Notes
Line 539: Line 655:
 
| <code>date</code> || Dates, years... ||
 
| <code>date</code> || Dates, years... ||
 
|-
 
|-
| <code>time</code> || Time ||
+
| <code>email</code> || Electronic Mail || Shortened form of Electronic Mail
  +
|-
  +
| <code>file</code> || Filenames ||
  +
|-
  +
| <code>mon</code> || Money ||
 
|-
 
|-
 
| <code>percent</code> || Percentage || e.g. 25%, 0.9%
 
| <code>percent</code> || Percentage || e.g. 25%, 0.9%
  +
|-
  +
| <code>time</code> || Time ||
  +
|-
  +
| <code>url</code> || Web address ||
 
|-
 
|-
 
| <code>web</code> || Links and Emails ||
 
| <code>web</code> || Links and Emails ||
 
|-
 
|-
| <code>file</code> || Filenames ||
+
| <code>year</code> || Years ||
 
|-
 
|-
| <code>email</code> || Electronic Mail || Shortened form of Electronic Mail
+
| <code>maj</code> || Large script in which every letter is the same height ||
 
|-
 
|-
  +
| <code>min</code> || small script in which every letter is the same height ||
 
|}
 
|}
   
  +
=== Compounds === <!-- compound -->
 
 
 
 
 
 
 
=== Compounds ===
 
   
 
{|class=wikitable
 
{|class=wikitable
! Symbol !! Gloss !! Notes
+
! Symbol !! Gloss !! Notes !! Universal feature
|-
 
| <code>cmp</code> || Compound Noun || https://en.wikipedia.org/wiki/English_compound ||
 
 
|-
 
|-
  +
| <code>cmp</code> || Compound Noun ||
 
|}
 
|}
   
Line 571: Line 688:
 
* [[Tagging guidelines for Portuguese]]
 
* [[Tagging guidelines for Portuguese]]
   
==Chunk tags==
+
==Chunk tags== <!-- chunk -->
   
 
{|class=wikitable
 
{|class=wikitable
Line 584: Line 701:
 
|}
 
|}
   
==XML tags==
+
==XML tags== <!-- xml -->
 
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository.
 
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository.
   
Line 590: Line 707:
 
! XML tag !! Means !! Appears in XML tags / notes / examples
 
! XML tag !! Means !! Appears in XML tags / notes / examples
 
|-
 
|-
| <code><dictionary></code> || Mono- or bilingual dictionary || In files apertium-eo-en.en.dix, apertium-eo-en.eo-en.dix, apertium-eo-en.post-en.dix, apertium-eo-en.post-eo.dix
+
| <code><dictionary></code> || Mono- or bilingual dictionary || Toplevel tag for all dictionaries
 
|-
 
|-
 
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code>
 
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code>
Line 620: Line 737:
 
| <code>&lt;b></code> || Blank space || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>you're&lt;b/>welcome&lt;s ...</code>
 
| <code>&lt;b></code> || Blank space || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>you're&lt;b/>welcome&lt;s ...</code>
 
|-
 
|-
  +
| <code>&lt;g></code> || Group || For [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiwords]]
  +
|-
  +
| <code>&lt;ig></code> || Identity group || Combination of <code>&lt;i></code> and <code>&lt;g></code>
  +
|-
  +
| <code>&lt;j></code> || Join || A <code>+</code> symbol in compounds. In [[Apertium-separable]], <code>&lt;j></code> indicates end-of-word
  +
|-
  +
| <code>&lt;prm></code> || Parameter || Only in [[Metadix]]
  +
|-
  +
| <code>&lt;sa></code> || Symbol Argument ??? || Only in [[Metadix]]
  +
|-
  +
| <code>&lt;t></code> || Tag or Template || In [[Apertium-separable]] <code>&lt;t></code> is any tag, in crossdix it is template (matches a single tag)
  +
|-
  +
| <code>&lt;v></code> || Variable || Only in crossdix - like + in regexes
 
|}
 
|}
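To show how <code>&lt;b/></code> and <code>&lt;s></code> sit inside <code>&lt;l></code> and <code>&lt;r></code>, here is a minimal Python sketch that flattens both sides of one dictionary entry. The entry is a hypothetical one written for this example (it reuses the "you're welcome" string from the table and the <code>ij</code> symbol from the part-of-speech list), and `render_side` is an illustrative helper, not part of lttoolbox:

```python
import xml.etree.ElementTree as ET

# Hypothetical entry written for illustration only.
ENTRY = "<e><p><l>you're<b/>welcome</l><r>you're<b/>welcome<s n='ij'/></r></p></e>"


def render_side(elem):
    """Flatten an <l> or <r> element: <b/> becomes a space, <s n="x"/> becomes <x>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "b":
            parts.append(" ")
        elif child.tag == "s":
            parts.append("<" + child.get("n", "") + ">")
        # Text that follows the child element inside the same parent.
        parts.append(child.tail or "")
    return "".join(parts)


entry = ET.fromstring(ENTRY)
left = render_side(entry.find("./p/l"))
right = render_side(entry.find("./p/r"))
print(left)
print(right)
```

The left side comes out as the surface string with a real space, the right side as the same string followed by the symbol in angle brackets.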
 
TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC)
 
 
Other tags:
 
<pre>
 
<j/> (in stream format #) is to mark multiwords
 
 
<t/> and <v/> are only in crossdix
 
t = template, v = variable
 
t matches any single tag, v is like + in regexes (0 or more)
 
 
<sa/> and <prm/> are only used in metadixes.
 
'sa' lets you add n optional extra tag, prm is an extra string for the paradigm
 
</pre>
 
   
 
=== Transfer ===
 
=== Transfer ===
   
==== <clip> tag ====
+
==== <clip> tag ==== <!-- clip -->
   
 
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information.
 
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information.
Line 657: Line 773:
 
==See also==
 
==See also==
 
* [[Syntax tags]]
 
* [[Syntax tags]]
  +
* [[Secondary tags]]
 
* [[Apertium stream format]]
 
* [[Apertium stream format]]
 
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]]
 
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]]

Revision as of 03:58, 26 December 2020


This page lists the symbols Apertium uses to denote parts of speech and further morphological features, the chunk tags used for more syntactic functions, and the XML tags.

This page also documents alignment between Apertium morphological tags and Universal Dependencies POS tags and features.
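As a rough illustration of that alignment, the sketch below maps an Apertium tag sequence to a universal POS tag and a feature string. It is only a sketch: the dictionaries hold a handful of rows copied from the tables on this page, the function name `to_ud` is made up for the example, and the `|` separator, the `_` placeholder and the `X` fallback follow CoNLL-U conventions rather than anything this page prescribes.

```python
# A few rows copied from the tables on this page; a real converter would
# need the full tables and language-specific exceptions.
POS = {"n": "NOUN", "adj": "ADJ", "det": "DET", "prn": "PRON", "np": "PROPN"}
FEATS = {
    "f": "Gender=Fem", "m": "Gender=Masc", "nt": "Gender=Neut",
    "sg": "Number=Sing", "pl": "Number=Plur",
    "nom": "Case=Nom", "acc": "Case=Acc",
}


def to_ud(tags):
    """Map a list of Apertium tags to (UPOS, feature string).

    The first tag is treated as the part of speech; tags missing from the
    sample tables are skipped rather than guessed.
    """
    upos = POS.get(tags[0], "X")
    feats = sorted(FEATS[t] for t in tags[1:] if t in FEATS)
    return upos, ("|".join(feats) if feats else "_")


print(to_ud(["n", "f", "sg", "nom"]))
print(to_ud(["adj", "xyz"]))
```

Skipping unknown tags keeps the sketch from inventing features for symbols that have no universal equivalent listed here.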


This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most don't have a spatial distinction in articles, for example).

If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.


Part-of-speech Categories

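{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal POS
|-
| <code>n</code> || Noun || see 'np' for proper noun || NOUN
|-
| <code>vblex</code> || Standard ("lexical") verb || see also: vbser, vbhaver, vbmod, vaux, vbdo || VERB
|-
| <code>v</code> || Standard verb || shortened form of vblex, often used in agglutinative languages || VERB
|-
| <code>vbmod</code> || Modal verb || || VERB
|-
| <code>vbser</code> || Verb "to be" || from ser (to be) || VERB (or AUX)
|-
| <code>vbhaver</code> || Verb "to have" || from haver (to have) || VERB (or AUX)
|-
| <code>vbdo</code> || Verb "to do" || "to do" includes all eleven tenses and forms of to do, can also be an auxiliary verb || VERB (or AUX)
|-
| <code>vaux</code> || Auxiliary verb || wikipedia || AUX
|-
| <code>cop</code> || Copula || wikipedia; sometimes verb-like, sometimes not || AUX, ...
|-
| <code>adj</code> || Adjective || || ADJ
|-
| <code>adv</code> || Adverb || || ADV
|-
| <code>preadv</code> || Pre-adverb || || ADV
|-
| <code>postadv</code> || Post-adverb || || ADV
|-
| <code>mod</code> || Modal word || [1] || PART
|-
| <code>det</code> || Determiner || wikipedia || DET
|-
| <code>prn</code> || Pronoun || wikipedia || PRON
|-
| <code>pr</code> || Preposition || wikipedia || ADP
|-
| <code>post</code> || Postposition || || ADP
|-
| <code>num</code> || Numeral || || NUM
|-
| <code>np</code> || Proper noun || From nom propi; wikipedia || PROPN
|-
| <code>ij</code> || Interjection || wikipedia || INTJ
|-
| <code>cnjcoo</code> || Co-ordinating conjunction || wikipedia || CCONJ
|-
| <code>cnjsub</code> || Sub-ordinating conjunction || || SCONJ
|-
| <code>cnjadv</code> || Conjunctive adverb || wikipedia || SCONJ, ADV
|-
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- ||
|-
| <code>ideo</code> || Ideophone || ||
|-
| <code>clt</code> || Clitic || ||
|}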

Punctuation

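{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal POS
|-
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT
|-
| <code>cm</code> || Comma punctuation || , || PUNCT PunctType=Comm
|-
| <code>lquot</code> || Left quote || « || PUNCT PunctType=Quot PunctSide=Ini
|-
| <code>rquot</code> || Right quote || » || PUNCT PunctType=Quot PunctSide=Fin
|-
| <code>lpar</code> || Left parenthesis || ( || PUNCT PunctType=Brck PunctSide=Ini
|-
| <code>rpar</code> || Right parenthesis || ) || PUNCT PunctType=Brck PunctSide=Fin
|-
| <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long || PUNCT PunctType=Dash
|-
| <code>apos</code> || Apostrophe || ' or ' || PUNCT
|-
| <code>quot</code> || Quotation || " || PUNCT PunctType=Quot
|-
| <code>percent</code> || Percentage || % || PUNCT
|-
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT PunctSide=Ini
|-
| <code>clb</code> || Clause Boundary || Refers to any of the following symbols: .?;:!·… || PUNCT
|-
| <code>punct</code> || Punctuation || || PUNCT
|}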
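The last column of the punctuation rows packs a universal POS tag and zero or more features into one string, e.g. <code>PUNCT PunctType=Quot PunctSide=Ini</code>. A short Python sketch of how such a cell could be split; the function name `parse_ud_cell` and the rule "a token with = is a feature, anything else is the POS tag" are assumptions made for this example:

```python
def parse_ud_cell(cell):
    """Split a cell such as 'PUNCT PunctType=Quot PunctSide=Ini'.

    Tokens containing '=' are treated as Name=Value features; any other
    token is taken to be the universal POS tag.
    """
    upos = None
    feats = {}
    for token in cell.split():
        if "=" in token:
            name, value = token.split("=", 1)
            feats[name] = value
        else:
            upos = token
    return upos, feats


print(parse_ud_cell("PUNCT PunctType=Quot PunctSide=Ini"))
print(parse_ud_cell("PUNCT"))
```

Rows that list only <code>PUNCT</code> give an empty feature dictionary.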

Part-of-speech Sub-categories

Gender

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

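{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>f</code> || Feminine || || Gender=Fem
|-
| <code>m</code> || Masculine || || Gender=Masc
|-
| <code>nt</code> || Neuter || || Gender=Neut
|-
| <code>ma</code> || Masculine (animate) || Mostly in Slavic languages || Gender=Masc
|-
| <code>mi</code> || Masculine (inanimate) || Mostly in Slavic languages || Gender=Masc
|-
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc
|-
| <code>mn</code> || Masculine or neuter || || Gender=Masc,Neut
|-
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut
|-
| <code>mf</code> || Masculine or feminine || Used when masculine and feminine have the same form || Gender=Masc,Fem
|-
| <code>mfn</code> || Masculine, feminine, neuter || Used when masculine, feminine, and neuter have the same form || Gender=Masc,Fem,Neut
|-
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com
|-
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut
|-
| <code>GD</code> || Gender to be determined || ||
|}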
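The combined symbols (<code>mf</code>, <code>mn</code>, <code>mfn</code>, ...) stand for several genders at once, which is why their universal feature lists more than one value. One way to read them is as sets, as in the Python sketch below. The values are copied from the table above; the helper `genders_compatible` is purely illustrative and does not describe how Apertium's own rules check agreement.

```python
# Values copied from the Gender table above; comma-separated values in the
# "Universal features" column become sets here.
GENDER = {
    "f": {"Fem"}, "m": {"Masc"}, "nt": {"Neut"},
    "mn": {"Masc", "Neut"}, "fn": {"Fem", "Neut"},
    "mf": {"Masc", "Fem"}, "mfn": {"Masc", "Fem", "Neut"},
    "ut": {"Com"}, "un": {"Com", "Neut"},
}


def genders_compatible(tag_a, tag_b):
    """True if the two gender tags share at least one gender value."""
    return bool(GENDER[tag_a] & GENDER[tag_b])


print(genders_compatible("mf", "f"))
print(genders_compatible("mn", "f"))
```

Under this reading an <code>mf</code> adjective is compatible with an <code>f</code> noun, while an <code>mn</code> one is not.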

Count/Mass

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>cnt</code> || Countable || ||
|-
| <code>unc</code> || Uncountable (mass) || ||
|}

Animacy

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

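{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>aa</code> || Animate || || Animacy=Anim
|-
| <code>an</code> || Animate or inanimate || || Animacy=Anim,Inan
|-
| <code>nn</code> || Inanimate || || Animacy=Inan
|-
| <code>hu</code> || Human || || Animacy=Hum
|}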

Adjectives

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. wikipedia ||
|-
| <code>preadj</code> || Pre-adjective || for languages where most adjectives come after the noun (e.g. French in the eo->fr bidix) ||
|-
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective comes before or after depending on the noun ||
|}

Noun Class

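{| class="wikitable" border="1"
! Symbol !! Gloss !! Notes
|-
| <code>cl1</code> || Noun class 1 ||
|-
| <code>cl2</code> || Noun class 2 ||
|-
| <code>cl3</code> || Noun class 3 ||
|-
| <code>cl4</code> || Noun class 4 ||
|-
| <code>cl5</code> || Noun class 5 ||
|-
| <code>cl6</code> || Noun class 6 ||
|-
| <code>cl7</code> || Noun class 7 ||
|-
| <code>cl8</code> || Noun class 8 ||
|-
| <code>cl9</code> || Noun class 9 ||
|-
| <code>cl10</code> || Noun class 10 ||
|-
| <code>cl11</code> || Noun class 11 ||
|-
| <code>cl12</code> || Noun class 12 ||
|}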

Pronoun types

Symbol Gloss Notes Universal feature
pers Personal  PronType=Prs
tn Tónico
log Logophoric
detnt Neuter determiner POS?  DET
predet Pre determiner POS?  DET
atn Atónico
qnt Quantifier  PronType=Ind
ord Ordinal  NumType=Ord
obj Object
subj Subject
pro Proclitic
enc Enclitic
acr Acronym Not a pronoun?  Abbr=Yes
rel Relative  PronType=Rel
ind Indefinite  PronType=Ind
itg Interrogative  PronType=Int
dem Demonstrative PronType=Dem
def Definite
pos Possessive Poss=Yes
ref Reflexive Reflex=Yes
prx Proximate
med Medial
dst Distal
expl Syntactic expletive wikipedia
rec Reciprocal Pronoun
res Reciprocal Pronoun

Transitivity

Used for verbs.

Symbol Gloss Notes Universal feature
tv Transitive takes direct object in accusative case (used in Turkic) Subcat=Tran
iv Intransitive does not take direct object in accusative case (used in Turkic) Subcat=Intr
TD Transitivity to be determined if the sub-category is [currently] unknown

Separable verbs

Symbol Gloss Notes
sep Separable verb wikipedia, lingolia, PDF
fs Separable verb in subordinate clause
fm Separable verb in main clause

Proper nouns

Symbol Gloss Notes Universal features
ant Anthroponym wikipedia. It is very common to use ant together with f and m for traditionally gender-specific names
top Toponym In some language pairs without the locative case this may be loc, although this should be changed. wikipedia
hyd Hydronym wikipedia
cog Cognomen In normal use, surnames
org Organisation
al Altres Other, misc.
pat Patronymic A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich.

Inflectional morphology

Number

Note: number can be a sub-category tag too, e.g. with pronouns.

Symbol Gloss Notes Universal feature
sg Singular Number=Sing
pl Plural Number=Plur
sp Singular or plural Number=Sing,Plur
du Dual  Number=Dual
ct Count see mk-bg Number=Count
coll Collective Number=Coll
ND Number to be determined


Case

Symbol Gloss Notes Universal feature
nom Nominative  Case=Nom
acc Accusative  Case=Acc
dat Dative Case=Dat
gen Genitive  Case=Gen
dg Dative and Genitive in ro-es, discouraged in new developments Case=Dat,Gen
voc Vocative  Case=Voc
abl Ablative wikipedia Case=Abl
ins Instrumental or Instructive wikipedia Case=Ins
loc Locative wikipedia Case=Loc
prp Prepositional wikipedia
tra Translative Case=Tra
ill Illative  Case=Ill
ine Inessive Case=Ine
ade Adessive Case=Ade
all Allative Case=All
abe Abessive  Case=Abe
ess Essive Case=Ess
par Partitive  Case=Par
dis Distributive Case=Dis
com Comitative  Case=Com
soc Sociative
prl Prolative  Case=Pro
ses Superessive Hungarian Case=Sup
sub Sublative Hungarian Case=Sub
dela Delative Hungarian Case=Del
term Terminative Hungarian, Estonian, ... Case=Ter
temp Temporal Case=Tem
obl Oblique Case=Obl
erg Ergative Case=Erg
CD Case to be determined
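
The "Universal feature" columns of the Number and Case tables can be read as a lookup table. The following is a minimal sketch in Python, using only a handful of rows copied from those two tables. The names `UD_FEATURES` and `to_ud` are hypothetical and are not part of any Apertium tool.

```python
# Hypothetical lookup built from a few rows of the Number and Case tables.
UD_FEATURES = {
    "sg": "Number=Sing",
    "pl": "Number=Plur",
    "du": "Number=Dual",
    "nom": "Case=Nom",
    "acc": "Case=Acc",
    "dat": "Case=Dat",
    "gen": "Case=Gen",
}


def to_ud(tags):
    """Return the universal features for the tags that have a known mapping.

    Tags without an entry in UD_FEATURES (e.g. ND, CD, or any tag that is
    not a number or case tag) are skipped. Features are sorted and joined
    with "|", the separator used in CoNLL-U feature columns.
    """
    feats = sorted(UD_FEATURES[t] for t in tags if t in UD_FEATURES)
    return "|".join(feats)


print(to_ud(["n", "pl", "gen"]))  # Case=Gen|Number=Plur
```

Tags such as ND and CD have no universal feature in the tables, so this sketch simply drops them; a real converter might prefer to report them.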

Voice

Symbol Gloss Notes Universal feature
actv Active voice  Voice=Act
pass Passive voice More commonly used in Turkic. Voice=Pass
pasv Passive voice More commonly used in Germanic. Voice=Pass
midv Middle voice  Voice=Mid
nactv Non-active voice See Albanian.
caus Causative voice see also #Derivations Voice=Cau

Tense and mode

Symbol Gloss Notes Universal features
aff Affirmative wikipedia Polarity=Pos
aor Aorist wikipedia A tense in Turkic languages.  Tense=Past
cni Conditional Many pairs will probably use cnd or cond... Mood=Cnd
deb Debitive mode Exclusive to Latvian (wikipedia)
fti Future indicative  Tense=Fut Mood=Ind
fts Future subjunctive Tense=Fut Mood=Sub
fut Future Tense=Fut
ger Gerund wikipedia VerbForm=Ger
ifi Past definite from Pretérito perfecto o indefinido  Tense=Past Definite=Def
imp Imperative englishlanguageguide Mood=Imp
inf Infinitive wikipedia  VerbForm=Inf
infps Personal infinitive Used in Portuguese
itg Interrogative
ito Infinitive with 'to' German VerbForm=Inf
lp L-participle
neg Negative  Polarity=Neg
nonpast Non-past Tense=Pres,Fut
past Past Tense=Past
pii Imperfect from Pretérito imperfecto de indicativo wikipedia  Tense=Past Mood=Ind
pis Imperfect subjunctive  Tense=Past Mood=Sub
plu Pluperfect In cy-en Tense=Pqp
pmp Pluperfect In es-gl (from Pluscuamperfecto) Tense=Pqp
pp2 Past participle (???) It's at least used in the Esperanto dictionaries for future active participles, ont (seems quite odd)
pp3 Past participle (???) It's at least used in the Esperanto dictionaries for past active participles, int (seems quite odd)
pp Past participle wikipedia VerbForm=Part
pprs Present participle Also appears as ppres (deprecated) VerbForm=Part
ppres Present participle see also: pprs. wikipedia Tense=Pres Mood=Part
pres Present  Tense=Pres
pret Preterite Preterite Tense=Past
pri Present indicative see also: pres. wikipedia Tense=Pres Mood=Ind
prs Present subjunctive wikipedia Tense=Pres Mood=Sub
supn Supine wikipedia VerbForm=Sup

Aspect

Symbol Gloss Notes Universal feature
hab Habitual Aspect=Hab
imperf Imperfective Aspect=Imp
impf Imperfective Aspect=Imp
perf Perfective Aspect=Perf

Person

Note: person can be a sub-category tag, e.g. with pronouns.

Symbol Gloss Notes Universal feature
p1 First person Person=1
p2 Second person  Person=2
p3 Third person  Person=3
impers Impersonal Sometimes called 'autonomous'  Person=0
past3p Past third person

Derivations

Symbol Gloss Notes
caus Causative
ingr Ingressive https://nn.wikipedia.org/w/index.php?title=Ingressiv
subs Verbal Noun or Verbal Substantive Short for substantive. A noun formed from a verb
agnt Agent noun Agent Noun

Possession

Symbol Gloss Notes Universal feature
px1sg First person singular possessive e.g. in Turkic languages  Person[psor]=1 Number[psor]=Sing
px2sg Second person singular possessive e.g. in Turkic languages  Person[psor]=2 Number[psor]=Sing
px3sg Third person singular possessive e.g. in Turkic languages  Person[psor]=3 Number[psor]=Sing
px1pl First person plural possessive e.g. in Turkic languages  Person[psor]=1 Number[psor]=Plur
px2pl Second person plural possessive e.g. in Turkic languages  Person[psor]=2 Number[psor]=Plur
px3pl Third person plural possessive e.g. in Turkic languages  Person[psor]=3 Number[psor]=Plur
px3sp Third person possessive singular or plural e.g. in Turkic languages  Person[psor]=3
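
The possessive tags above follow a regular pattern: `px`, then the possessor's person, then its number. As an illustration only, the sketch below derives the universal features from the tag name instead of listing every row. `PX_TAG`, `NUMBER` and `px_to_ud` are hypothetical names, and the pattern also accepts combinations such as `px1sp` that do not appear in the table.

```python
import re

# Pattern for the possessive tags listed above: px + person + number.
PX_TAG = re.compile(r"px([123])(sg|pl|sp)")

# "sp" (singular or plural) leaves the possessor's number unspecified,
# matching the px3sp row, which lists only Person[psor]=3.
NUMBER = {"sg": "Sing", "pl": "Plur", "sp": None}


def px_to_ud(tag):
    """Map a tag such as 'px1sg' to its list of universal features."""
    m = PX_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a possessive tag: {tag!r}")
    person, number = m.groups()
    feats = [f"Person[psor]={person}"]
    if NUMBER[number] is not None:
        feats.append(f"Number[psor]={NUMBER[number]}")
    return feats


print(px_to_ud("px2pl"))  # ['Person[psor]=2', 'Number[psor]=Plur']
print(px_to_ud("px3sp"))  # ['Person[psor]=3']
```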

Subject marking

Used e.g. in verbs that mark both subject and object; otherwise, see #Person and #Number.

Symbol Gloss Notes Universal features
s_sg1 First person singular subject Number=Sing Person=1
s_sg2 Second person singular subject Number=Sing Person=2
s_sg3 Third person singular subject Number=Sing Person=3
s_pl1 First person plural subject Number=Plur Person=1
s_pl2 Second person plural subject Number=Plur Person=2
s_pl3 Third person plural subject Number=Plur Person=3


Object marking

Used e.g. in verbs that mark both subject and object.

Symbol Gloss Notes Universal features
o_sg1 First person singular object
o_sg2 Second person singular object
o_sg3 Third person singular object
o_pl1 First person plural object
o_pl2 Second person plural object
o_pl3 Third person plural object
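
The subject and object marking tags share one shape: a role prefix (`s_` or `o_`), a number (`sg` or `pl`) and a person digit. The sketch below splits such a tag into its parts. All names in it are hypothetical. Because the object table lists no universal features, the `Number` and `Person` keys are borrowed from the subject table purely for illustration.

```python
import re

# Role prefix, number, person: e.g. s_sg1 or o_pl3.
AGREEMENT_TAG = re.compile(r"([so])_(sg|pl)([123])")
ROLE = {"s": "subject", "o": "object"}
NUMBER = {"sg": "Sing", "pl": "Plur"}


def parse_agreement(tag):
    """Split an s_*/o_* tag into role, number and person.

    Returns None for any tag that is not a subject or object marking tag.
    """
    m = AGREEMENT_TAG.fullmatch(tag)
    if m is None:
        return None
    role, number, person = m.groups()
    return {"role": ROLE[role], "Number": NUMBER[number], "Person": person}


# Example tag sequence for a transitive verb marking both arguments.
for tag in ["tv", "s_sg1", "o_pl3"]:
    print(tag, parse_agreement(tag))
```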

Adjectives

Symbol Gloss Notes Universal features
pst Positive  Degree=Pos
comp Comparative wikipedia Degree=Comp
sup Superlative wikipedia Degree=Sup
attr Attributive wikipedia
pred Predicative wikipedia

Formality

Symbol Gloss Notes
crd Cordial
el Elite
fam Familiar
frm Formal
infml Informal
pol Polite
low Low courtesy
mid Mid courtesy
hi High courtesy

Others

Symbol Gloss Notes
abbr Abbreviation (e.g. etc., Mr.) Acronyms are also included (see acr)
date Dates, years...
email Electronic Mail Short form of Electronic Mail
file Filenames
mon Money
percent Percentage e.g. 25%, 0.9%
time Time
url Web address
web Links and Emails
year Years
maj Large script in which every letter is the same height
min Small script in which every letter is the same height

Compounds

Symbol Gloss Notes Universal feature
cmp Compound Noun

See also

Chunk tags

Tag Description
<SN> Noun phrase / noun group (sintagma nominal)
<SA> Adjective phrase / adjective group
<SV> Verb phrase / verb group (sintagma verbal)

XML tags

Note: All XML tags are explained in depth in the PDF documentation; see also the dix.dtd and dix.rng files in the GitHub repository.

XML tag Means Appears in XML tags / notes / examples
<dictionary> Mono- or bilingual dictionary Toplevel tag for all dictionaries
<alphabet> Set of characters in the language In <dictionary>
<sdefs> Symbol definitions In <dictionary>
<sdef> Symbol definition In <sdefs>. Ex: <sdef n="noun"/>
<pardefs> Paradigm definitions In <dictionary>.
<pardef> Paradigm definition In <pardefs>.
<section> A section of the dictionary In <dictionary>. Ex: <section id="main" type="standard">
<e> A dictionary entry (a word) In <section> and in <pardef>.
<i> Invariant (left and right side) In <e>. Ex.: <i>beer</i>
<p> A pair In <e>.
<l> Left side (surface form) In <p>. Ex.: <l>beer</l>
<r> Right side (lexical unit) In <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r>
<s> A lexical symbol (noun, adj..) In <r>, <l> and <i>. Ex.: <s n="noun"/>
<a> Post-generator wake-up mark In <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English)
<b> Blank space In <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ...
<g> Group For multiwords
<ig> Identity group Combination of <i> and <g>
<j> Join A + symbol in compounds. In Apertium-separable, <j> indicates end-of-word
<prm> Parameter Only in Metadix
<sa> Symbol Argument ??? Only in Metadix
<t> Tag or Template In Apertium-separable <t> is any tag, in crossdix it is template (matches a single tag)
<v> Variable Only in crossdix - like + in regexes
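
To show how the tags in this table nest, here is a small sketch that assembles the table's own examples (`<sdef n="noun"/>`, `<section id="main" type="standard">`, the `beer` entry) into one toy dictionary and reads it with Python's standard `xml.etree.ElementTree`. The dictionary content and the helper `side_to_text` are made up for illustration; this is not how the Apertium tools themselves process a dictionary.

```python
import xml.etree.ElementTree as ET

# Toy dictionary assembled from the examples in the table above.
DIX = """\
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="noun"/>
    <sdef n="singular"/>
  </sdefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l>beer</l>
        <r>beer<s n="noun"/><s n="singular"/></r>
      </p>
    </e>
  </section>
</dictionary>
"""


def side_to_text(elem):
    """Render an <l> or <r> element as text.

    Each <s n="x"/> is written as <x> and each <b/> as a space.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "s":
            parts.append(f"<{child.get('n')}>")
        elif child.tag == "b":
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(DIX)
# Symbols declared in <sdefs>.
declared = {sdef.get("n") for sdef in root.iter("sdef")}

for entry in root.iter("e"):
    pair = entry.find("p")
    if pair is None:  # entries built only from <i> are ignored in this sketch
        continue
    left = side_to_text(pair.find("l"))
    right = side_to_text(pair.find("r"))
    used = {s.get("n") for s in pair.iter("s")}
    print(left, "->", right, "| undeclared:", sorted(used - declared))
# prints: beer -> beer<noun><singular> | undeclared: []
```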

Transfer

<clip> tag

See the documentation (pdf), p.144 for more information.

XML attribute value Means Appears in attribute Notes
whole lemma and grammatical symbols part
lem lemma part
lemh (inflected) head word of multiword part
lemq following queue of multiword part
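
As a rough illustration of the four `part` values, the sketch below cuts a lexical unit written as `lemma<tag>...` into the corresponding pieces. It rests on assumptions that the table does not state: that `#` marks where the queue of a multiword begins, and that the `#` itself belongs to the queue. The function `clip_part` and the example unit are hypothetical; see the documentation for the actual behaviour.

```python
def clip_part(lexical_unit, part):
    """Return the piece of 'lemma<tag1><tag2>...' selected by `part`.

    Assumptions (not taken from the table above): the queue of a multiword
    starts at '#', and the '#' is counted as part of the queue.
    """
    # Everything before the first '<' is the lemma; the rest is the tags.
    lemma, _, _ = lexical_unit.partition("<")
    # Split the lemma into head and queue at the first '#', if any.
    head, marker, queue = lemma.partition("#")
    if part == "whole":
        return lexical_unit
    if part == "lem":
        return lemma
    if part == "lemh":
        return head
    if part == "lemq":
        return marker + queue
    raise ValueError(f"unknown part: {part!r}")


unit = "take# out<vblex><pres>"  # made-up example unit
for part in ("whole", "lem", "lemh", "lemq"):
    print(part, repr(clip_part(unit, part)))
```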

See also