Difference between revisions of "List of symbols"

(aor isn't Past)
 
(189 intermediate revisions by 31 users not shown)
[[Liste de symboles|En français]] · [[Список символов|по-русски]]

This page lists the symbols used in Apertium to denote part of speech and further morphological features, the chunk tags used for more syntactic functions, and XML tags.

This page also documents alignment between Apertium morphological tags and [https://universaldependencies.org/ Universal Dependencies] [https://universaldependencies.org/u/pos/index.html POS tags] and [https://universaldependencies.org/u/feat/index.html features].


{{TOCD}}
This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (for example, most do not have a spatial distinction in articles).


If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]].

<!-- comments following section headers are intended to make scraping this page easier -->

==Part-of-speech Categories== <!-- POS -->


{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal POS
|-
| <code>n</code> || Noun || ''see 'np' for proper noun'' || NOUN
|-
| <code>vblex</code> || Standard ("lexical") verb || ''see also: vbser, vbhaver, vbmod, vaux, vbdo'' || VERB
|-
| <code>v</code> || Standard verb || shortened form of vblex, often used in agglutinative languages || VERB
|-
| <code>vbmod</code> || Modal verb || || VERB
|-
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB or AUX
|-
| <code>vbhaver</code> || Verb "to have" || from ''haver'' (to have) || VERB or AUX
|-
| <code>vbdo</code> || Verb "to do" || includes all eleven tenses and forms of "to do"; can also be an auxiliary verb || VERB or AUX
|-
| <code>vaux</code> || Auxiliary verb || [http://en.wikipedia.org/wiki/Auxilliary_verb wikipedia] || AUX
|-
| <code>cop</code> || Copula || [http://en.wikipedia.org/wiki/Copula_(linguistics) wikipedia]; sometimes verb-like, sometimes not || AUX
|-
| <code>adj</code> || Adjective || || ADJ
|-
| <code>adv</code> || Adverb || || ADV
|-
| <code>preadv</code> || Pre-adverb || || ADV
|-
| <code>postadv</code> || Post-adverb || || ADV
|-
| <code>mod</code> || Modal word || [http://dic.academic.ru/dic.nsf/lingvistic/749] || PART
|-
| <code>det</code> || Determiner || [http://en.wikipedia.org/wiki/Determiner_(class) wikipedia] || DET
|-
| <code>prn</code> || Pronoun || [http://en.wikipedia.org/wiki/Pronoun wikipedia] || PRON
|-
| <code>pr</code> || Preposition || [http://en.wikipedia.org/wiki/Preposition wikipedia] || ADP
|-
| <code>post</code> || Postposition || || ADP
|-
| <code>num</code> || Numeral || || NUM
|-
| <code>np</code> || Proper noun || From ''nom propi'' [http://en.wikipedia.org/wiki/Proper_noun wikipedia] || PROPN
|-
| <code>ij</code> || Interjection || [http://en.wikipedia.org/wiki/Interjection wikipedia] || INTJ
|-
| <code>cnjcoo</code> || Co-ordinating conjunction || [http://en.wikipedia.org/wiki/Co-ordinating_conjunction wikipedia] || CCONJ
|-
| <code>cnjsub</code> || Sub-ordinating conjunction || || SCONJ
|-
| <code>cnjadv</code> || Conjunctive adverb || [http://en.wikipedia.org/wiki/Conjunctive_adverb wikipedia] || SCONJ, ADV
|-
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- ||
|-
| <code>ideo</code> || Ideophone || ||
|-
| <code>clt</code> || Clitic || ||
|}
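
The part-of-speech alignment in the table above can be used as a simple lookup. The following is a minimal sketch, not part of Apertium itself: the names <code>APERTIUM_TO_UPOS</code> and <code>upos_candidates</code> are hypothetical, and the mapping only restates the rows of the table. Tags with two listed values (e.g. "VERB or AUX") keep both candidates, because the table alone does not say which one applies in a given context.

```python
# Apertium POS tag -> Universal POS candidates, restating the table above.
# The dictionary and function names are illustrative, not an Apertium API.
APERTIUM_TO_UPOS = {
    "n": ["NOUN"],
    "vblex": ["VERB"],
    "v": ["VERB"],
    "vbmod": ["VERB"],
    "vbser": ["VERB", "AUX"],
    "vbhaver": ["VERB", "AUX"],
    "vbdo": ["VERB", "AUX"],
    "vaux": ["AUX"],
    "cop": ["AUX"],
    "adj": ["ADJ"],
    "adv": ["ADV"],
    "preadv": ["ADV"],
    "postadv": ["ADV"],
    "mod": ["PART"],
    "det": ["DET"],
    "prn": ["PRON"],
    "pr": ["ADP"],
    "post": ["ADP"],
    "num": ["NUM"],
    "np": ["PROPN"],
    "ij": ["INTJ"],
    "cnjcoo": ["CCONJ"],
    "cnjsub": ["SCONJ"],
    "cnjadv": ["SCONJ", "ADV"],
}


def upos_candidates(tag):
    """Return the UPOS candidates for an Apertium POS tag, or [] if the table lists none."""
    return list(APERTIUM_TO_UPOS.get(tag, []))
```

Tags such as <code>atp</code>, <code>ideo</code> and <code>clt</code> have no Universal POS in the table, so the sketch returns an empty list for them rather than guessing.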


=== Punctuation === <!-- punct -->


{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal POS
|-
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT
|-
| <code>cm</code> || Comma punctuation || , || PUNCT PunctType=Comm
|-
| <code>lquot</code> || Left quote || « || PUNCT PunctType=Quot PunctSide=Ini
|-
| <code>rquot</code> || Right quote || » || PUNCT PunctType=Quot PunctSide=Fin
|-
| <code>lpar</code> || Left parenthesis || ( || PUNCT PunctType=Brck PunctSide=Ini
|-
| <code>rpar</code> || Right parenthesis || ) || PUNCT PunctType=Brck PunctSide=Fin
|-
| <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long|| PUNCT PunctType=Dash
|-
| <code>apos</code> || Apostrophe || ' or ' || PUNCT
|-
| <code>quot</code> || Quotation || " || PUNCT PunctType=Quot
|-
| <code>percent</code> || Percentage || % || PUNCT
|-
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT PunctSide=Ini
|-
| <code>clb</code> || Clause Boundary || Refers to any of the following symbols: .?;:!·… || PUNCT
|-
| <code>punct</code> || Punctuation || || PUNCT
|}


==Part-of-speech Sub-categories== <!-- subtype -->

===Gender=== <!-- gender -->

These tags are usually used with nouns. When they occur with things that agree/concord with nouns (like adjectives and verbs), they in fact constitute inflectional/grammatical tags.


{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>f</code> || Feminine || || Gender=Fem
|-
| <code>m</code> || Masculine || || Gender=Masc <!-- default -->
|-
| <code>nt</code> || Neuter || || Gender=Neut
|-
| <code>ma</code> || Masculine (animate) || Mostly in Slavic languages || Gender=Masc
|-
| <code>mi</code> || Masculine (inanimate) || Mostly in Slavic languages || Gender=Masc
|-
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc
|-
| <code>mn</code> || Masculine or neuter || || Gender=Masc,Neut
|-
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut
|-
| <code>mf</code> || Masculine or feminine || Used when masculine and feminine have the same form || Gender=Masc,Fem
|-
| <code>mfn</code> || Masculine, feminine, or neuter || Used when masculine, feminine, and neuter have the same form || Gender=Masc,Fem,Neut
|-
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com
|-
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut
|-
| <code>GD</code> || Gender to be determined || || <!-- unknown -->
|}


===Count/Mass=== <!-- countability -->

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>cnt</code> || Countable || ||
|-
| <code>unc</code> || Uncountable (mass) || ||
|}

===Animacy=== <!-- animacy -->

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>aa</code> || Animate || || Animacy=Anim
|-
| <code>an</code> || Animate or inanimate || || Animacy=Anim,Inan
|-
| <code>nn</code> || Inanimate || || Animacy=Inan
|-
| <code>hu</code> || Human || || Animacy=Hum
|}


===Adjectives=== <!-- adj_type -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. [http://en.wikipedia.org/wiki/Synthetic_language wikipedia] ||
|-
| <code>preadj</code> || Pre-adjective || for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix) ||
|-
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective precedes or follows the noun, depending on the noun ||
|}

===Noun Class === <!-- n_class -->

{| class="wikitable" border="1"
! Symbol !! Gloss !! Notes
|-
| <code>cl1</code> || Noun class 1 ||
|-
| <code>cl2</code> || Noun class 2 ||
|-
| <code>cl3</code> || Noun class 3 ||
|-
| <code>cl4</code> || Noun class 4 ||
|-
| <code>cl5</code> || Noun class 5 ||
|-
| <code>cl6</code> || Noun class 6 ||
|-
| <code>cl7</code> || Noun class 7 ||
|-
| <code>cl8</code> || Noun class 8 ||
|-
| <code>cl9</code> || Noun class 9 ||
|-
| <code>cl10</code> || Noun class 10 ||
|-
| <code>cl11</code> || Noun class 11 ||
|-
| <code>cl12</code> || Noun class 12 ||
|}


===Pronoun types === <!-- prn_type -->

{| class="wikitable" border="1"
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>pers</code> || Personal || || PronType=Prs
|-
| <code>tn</code> || Tónico || ||
|-
| <code>log</code> || Logophoric || ||
|-
| <code>detnt</code> || Neuter determiner || POS? || DET
|-
| <code>predet</code> || Pre determiner || POS? || DET
|-
| <code>atn</code> || Atónico || ||
|-
| <code>qnt</code> || Quantifier || || PronType=Ind
|-
| <code>ord</code> || Ordinal || || NumType=Ord
|-
| <code>obj</code> || Object || || Case=Acc
|-
| <code>subj</code> || Subject || || Case=Nom
|-
| <code>pro</code> || Proclitic || ||
|-
| <code>enc</code> || Enclitic || ||
|-
| <code>acr</code> || Acronym || Not a pronoun? || Abbr=Yes
|-
| <code>rel</code> || Relative || || PronType=Rel
|-
| <code>ind</code> || Indefinite || || PronType=Ind
|-
| <code>itg</code> || Interrogative || || PronType=Int
|-
| <code>dem</code> || Demonstrative || || PronType=Dem
|-
| <code>def</code> || Definite || || Definite=Def
|-
| <code>pos</code> || Possessive || || Poss=Yes
|-
| <code>ref</code> || Reflexive || || Reflex=Yes
|-
| <code>prx</code> || Proximate || ||
|-
| <code>med</code> || Medial || ||
|-
| <code>dst</code> || Distal || ||
|-
| <code>expl</code> || Syntactic expletive || [https://en.wikipedia.org/wiki/Syntactic_expletive wikipedia] ||
|-
| <code>rec</code> || Reciprocal Pronoun || ||
|-
| <code>res</code> || Reciprocal Pronoun || ||
|}

=== Transitivity === <!-- transitivity -->

Used for verbs.

{| class="wikitable" border="1"
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) || Subcat=Tran
|-
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) || Subcat=Intr
|-
| <code>TD</code> || Transitivity to be determined || if the sub-category is (currently) unknown || <!-- unknown -->
|}

===Separable verbs=== <!-- separable -->

{|class=wikitable
! Symbol !! Gloss !! Notes
|-
| <code>sep</code> || Separable verb || [https://en.wikipedia.org/wiki/Separable_verb wikipedia], [https://deutsch.lingolia.com/en/grammar/verbs/separable-verbs lingolia], [https://aclweb.org/anthology/P98-1078.pdf PDF]
|-
| <code>fs</code> || Separable verb in subordinate clause ||
|-
| <code>fm</code> || Separable verb in main clause ||
|}


===Proper nouns=== <!-- np_type -->

{|class=wikitable
! Symbol !! Gloss !! Notes
|-
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia]; ''ant'' is very commonly used together with f and m for traditionally gender-specific names
|-
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc'', although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia]
|-
| <code>al</code> || Altres || Other, misc.
|-
| <code>pat</code> || Patronymic || A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich.
|}


== Inflectional morphology == <!-- infl -->

===Number=== <!-- number -->
Note: number can be a sub-category tag too, e.g. with pronouns.

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>sg</code> || Singular || || Number=Sing <!-- default -->
|-
| <code>pl</code> || Plural || || Number=Plur
|-
| <code>sp</code> || Singular or plural || || Number=Sing,Plur
|-
| <code>du</code> || Dual || || Number=Dual
|-
| <code>ct</code> || Count || see mk-bg || Number=Count
|-
| <code>coll</code> || Collective || || Number=Coll
|-
| <code>ND</code> || Number to be determined || || <!-- unknown -->
|}


===Case=== <!-- case -->


{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>nom</code> || Nominative || || Case=Nom
|-
| <code>acc</code> || Accusative || || Case=Acc
|-
| <code>dat</code> || Dative || || Case=Dat
|-
| <code>gen</code> || Genitive || || Case=Gen
|-
| <code>dg</code> || Dative and Genitive || in [[ro-es]], discouraged in new developments || Case=Dat,Gen
|-
| <code>voc</code> || Vocative || || Case=Voc
|-
| <code>abl</code> || Ablative || [http://en.wikipedia.org/wiki/Ablative wikipedia] || Case=Abl
|-
| <code>ins</code> || Instrumental or Instructive || [http://en.wikipedia.org/wiki/Instrumental_case wikipedia] || Case=Ins
|-
| <code>loc</code> || Locative || [http://en.wikipedia.org/wiki/Locative wikipedia] || Case=Loc
|-
| <code>prp</code> || Prepositional || [http://en.wikipedia.org/wiki/Prepositional wikipedia] ||
|-
| <code>tra</code> || Translative || || Case=Tra
|-
| <code>ill</code> || Illative || || Case=Ill
|-
| <code>ine</code> || Inessive || || Case=Ine
|-
| <code>ade</code> || Adessive || || Case=Ade
|-
| <code>all</code> || Allative || || Case=All
|-
| <code>abe</code> || Abessive || || Case=Abe
|-
| <code>ess</code> || Essive || || Case=Ess
|-
| <code>par</code> || Partitive || || Case=Par
|-
| <code>dis</code> || Distributive || || Case=Dis
|-
| <code>com</code> || Comitative || || Case=Com
|-
| <code>soc</code> || Sociative || ||
|-
| <code>prl</code> || Prolative || || Case=Pro
|-
| <code>ses</code> || Superessive || [[Hungarian]] || Case=Sup
|-
| <code>sub</code> || Sublative || [[Hungarian]] || Case=Sub
|-
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del
|-
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... || Case=Ter
|-
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Tem
|-
| <code>obl</code> || Oblique || [https://en.wikipedia.org/wiki/Oblique_case] || Case=Obl
|-
| <code>erg</code> || Ergative || [https://en.wikipedia.org/wiki/Ergative_case] || Case=Erg
|-
| <code>CD</code> || Case to be determined || || <!-- unknown -->
|}

===Voice=== <!-- voice -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>actv</code> || Active voice || || Voice=Act
|-
| <code>pass</code> || Passive voice || The form more commonly used in Turkic languages. || Voice=Pass
|-
| <code>pasv</code> || Passive voice || The form more commonly used in Germanic languages. || Voice=Pass
|-
| <code>midv</code> || Middle voice || || Voice=Mid
|-
| <code>nactv</code> || Non-active voice || See Albanian. ||
|-
| <code>caus</code> || Causative voice || see also [[#Derivations]] || Voice=Cau
|-
|}

===Tense and mode=== <!-- tense -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos
|-
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. ||
|-
| <code>cni</code> || Conditional || A lot of pairs will probably use cnd or cond instead || Mood=Cnd
|-
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) ||
|-
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind
|-
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub
|-
| <code>fut</code> || Future || || Tense=Fut
|-
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def
|-
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp
|-
| <code>itg</code> || Interrogative || ||
|-
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf
|-
| <code>lp</code> || L-participle || ||
|-
| <code>neg</code> || Negative || || Polarity=Neg
|-
| <code>nonpast</code> || Non-past || || Tense=Pres,Fut
|-
| <code>past</code> || Past || || Tense=Past
|-
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind Aspect=Imp
|-
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub Aspect=Imp
|-
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp
|-
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp
|-
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) || VerbForm=Part Tense=Past
|-
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) || VerbForm=Part Tense=Past
|-
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part Tense=Past
|-
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part Tense=Pres
|-
| <code>ppres</code> || Present participle || ''see also: pprs''. [http://en.wikipedia.org/wiki/Present_participle wikipedia] || Tense=Pres VerbForm=Part
|-
| <code>pres</code> || Present || || Tense=Pres
|-
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past
|-
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind
|-
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub
|-
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup
|}

=== Non-finite verb forms === <!-- nonfinite -->

These tags are used for non-finite verb forms, which are often elsewhere called "infinitives" or "participles". See https://doi.org/10.3765/ptu.v4i1.4587 for discussion.

==== Noun-like ==== <!-- verbal-nouns -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>ger</code> || Gerund || || VerbForm=Vnoun
|-
| <code>ger_aor</code> || Aorist gerund || || VerbForm=Vnoun
|-
| <code>ger_fut</code> || Future gerund || || VerbForm=Vnoun Tense=Fut
|-
| <code>ger_hab</code> || Habitual gerund || || VerbForm=Vnoun Aspect=Hab
|-
| <code>ger_impf</code> || Imperfect gerund || || VerbForm=Vnoun Aspect=Imp
|-
| <code>ger_past</code> || Past gerund || || VerbForm=Vnoun Tense=Past
|-
| <code>ger_perf</code> || Perfect gerund || || VerbForm=Vnoun Aspect=Perf
|-
| <code>ger_pres</code> || Present gerund || || VerbForm=Vnoun Tense=Pres
|}

==== Adjective-like ==== <!-- verbal-adjectives -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>gpr</code> || Verbal adjective || || VerbForm=Part
|-
| <code>gpr_aor</code> || Aorist verbal adjective || || VerbForm=Part
|-
| <code>gpr_fut</code> || Future verbal adjective || || VerbForm=Part Tense=Fut
|-
| <code>gpr_hab</code> || Habitual verbal adjective || || VerbForm=Part Aspect=Hab
|-
| <code>gpr_impf</code> || Imperfect verbal adjective || || VerbForm=Part Aspect=Imp
|-
| <code>gpr_past</code> || Past verbal adjective || || VerbForm=Part Tense=Past
|-
| <code>gpr_perf</code> || Perfect verbal adjective || || VerbForm=Part Aspect=Perf
|-
| <code>gpr_pres</code> || Present verbal adjective || || VerbForm=Part Tense=Pres
|}

==== Adverb-like ==== <!-- verbal-adverbs -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>gna</code> || Verbal adverb || || VerbForm=Conv
|-
| <code>gna_aor</code> || Aorist verbal adverb || || VerbForm=Conv
|-
| <code>gna_fut</code> || Future verbal adverb || || VerbForm=Conv Tense=Fut
|-
| <code>gna_hab</code> || Habitual verbal adverb || || VerbForm=Conv Aspect=Hab
|-
| <code>gna_impf</code> || Imperfect verbal adverb || || VerbForm=Conv Aspect=Imp
|-
| <code>gna_past</code> || Past verbal adverb || || VerbForm=Conv Tense=Past
|-
| <code>gna_perf</code> || Perfect verbal adverb || || VerbForm=Conv Aspect=Perf
|-
| <code>gna_pres</code> || Present verbal adverb || || VerbForm=Conv Tense=Pres
|}

==== Infinitives ==== <!-- infinitives -->

Generally these must occur with auxiliaries.

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>inf</code> || Infinitive || || VerbForm=Inf
|-
| <code>infps</code> || Personal infinitive || Used in Portuguese, likely should be merged || VerbForm=Inf
|-
| <code>prc_aor</code> || Aorist participle || || VerbForm=Inf
|-
| <code>prc_fut</code> || Future participle || || VerbForm=Inf Tense=Fut
|-
| <code>prc_hab</code> || Habitual participle || || VerbForm=Inf Aspect=Hab
|-
| <code>prc_impf</code> || Imperfect participle || || VerbForm=Inf Aspect=Imp
|-
| <code>prc_past</code> || Past participle || || VerbForm=Inf Tense=Past
|-
| <code>prc_perf</code> || Perfect participle || || VerbForm=Inf Aspect=Perf
|-
| <code>prc_pres</code> || Present participle || || VerbForm=Inf Tense=Pres
|}

===Aspect=== <!-- aspect -->
{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>hab</code> || Habitual || || Aspect=Hab
|-
| <code>imperf</code> || Imperfective || Should be merged with <code>impf</code> || Aspect=Imp
|-
| <code>impf</code> || Imperfective || || Aspect=Imp
|-
| <code>perf</code> || Perfective || || Aspect=Perf
|}

===Person=== <!-- person -->
Note: person can be a sub-category tag, e.g. with pronouns.

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>p1</code> || First person || || Person=1
|-
| <code>p2</code> || Second person || || Person=2
|-
| <code>p3</code> || Third person || || Person=3
|-
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0
|-
| <code>past3p</code> || Past third person || In <code>rus</code> and <code>bel-rus</code>, should be 2 tags || Person=3 Tense=Past
|}
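
The feature alignments in the tables above can be combined to turn the tags of one analysis into a single feature string. The following is a minimal sketch under stated assumptions: <code>TAG_FEATURES</code> and <code>ud_feats</code> are hypothetical names, the dictionary covers only a handful of rows copied from the gender, number, case, tense and person tables, and the input is assumed to have the shape <code>lemma&lt;tag1&gt;&lt;tag2&gt;</code>. Joining features with <code>|</code> in alphabetical order and writing <code>_</code> for "no features" follows the CoNLL-U convention, which this page does not itself specify.

```python
import re

# A small subset of the tag -> Universal feature rows from the tables above.
# Extend it with further rows as needed; the names here are illustrative only.
TAG_FEATURES = {
    "f": {"Gender": "Fem"},
    "m": {"Gender": "Masc"},
    "sg": {"Number": "Sing"},
    "pl": {"Number": "Plur"},
    "nom": {"Case": "Nom"},
    "acc": {"Case": "Acc"},
    "pri": {"Tense": "Pres", "Mood": "Ind"},
    "fti": {"Tense": "Fut", "Mood": "Ind"},
    "p1": {"Person": "1"},
    "p2": {"Person": "2"},
    "p3": {"Person": "3"},
}

# Matches each <tag> in an analysis such as "casa<n><f><pl>".
TAG_RE = re.compile(r"<([^<>]+)>")


def ud_feats(analysis):
    """Collect the Universal features for every known tag in an analysis string."""
    feats = {}
    for tag in TAG_RE.findall(analysis):
        # Tags without a listed feature (e.g. POS tags) contribute nothing.
        feats.update(TAG_FEATURES.get(tag, {}))
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)
```

For example, <code>ud_feats("cantar<vblex><pri><p3><sg>")</code> yields <code>Mood=Ind|Number=Sing|Person=3|Tense=Pres</code>. Tags marked "to be determined" (<code>GD</code>, <code>ND</code>, <code>CD</code>) have no feature in the tables and are simply skipped by this sketch.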

===Derivations=== <!-- verb_deriv -->
{|class=wikitable
! Symbol !! Gloss !! Notes
|-
| <code>caus</code> || Causative ||
|-
| <code>ingr</code> || Ingressive || https://nn.wikipedia.org/w/index.php?title=Ingressiv
|-
| <code>subs</code> || Verbal noun or verbal substantive || Shortened form of ''substantive''; a noun formed from a verb
|-
| <code>agnt</code> || Agent noun || [https://en.wikipedia.org/wiki/Agent_noun Agent Noun]
|}

===Possession=== <!-- possessor -->
{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>px1sg</code> || First person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Sing
|-
| <code>px2sg</code> || Second person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Sing
|-
| <code>px3sg</code> || Third person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Sing
|-
| <code>px1pl</code> || First person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Plur
|-
| <code>px2pl</code> || Second person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Plur
|-
| <code>px3pl</code> || Third person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Plur
|-
| <code>px3sp</code> || Third person possessive singular or plural || e.g. in [[Turkic languages]] || Person[psor]=3
|}
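
The possessive tags above are regular: <code>px</code>, then the possessor's person, then its number. The following is a minimal sketch of decoding them into the features listed in the table. The names <code>LISTED_TAGS</code> and <code>possessor_features</code> are hypothetical, and the function deliberately accepts only the seven tags the table lists instead of assuming that other combinations (such as <code>px1sp</code>) exist.

```python
import re

# The seven possessive tags listed in the table above.
LISTED_TAGS = {"px1sg", "px2sg", "px3sg", "px1pl", "px2pl", "px3pl", "px3sp"}

# "px" + possessor person + possessor number.
PX_RE = re.compile(r"px([123])(sg|pl|sp)")

# "sp" (singular or plural) leaves the possessor's number unspecified,
# matching the px3sp row, which lists only Person[psor]=3.
NUMBER = {"sg": "Sing", "pl": "Plur"}


def possessor_features(tag):
    """Return the possessor features for a listed px tag, or None otherwise."""
    if tag not in LISTED_TAGS:
        return None
    person, number = PX_RE.fullmatch(tag).groups()
    feats = {"Person[psor]": person}
    if number in NUMBER:
        feats["Number[psor]"] = NUMBER[number]
    return feats
```

Returning <code>None</code> for unlisted tags keeps the sketch from inventing mappings the table does not document.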


===Subject marking=== <!-- subject -->

Used e.g. in verbs that mark both subject and object; otherwise, see [[#Person]] and [[#Number]].

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>s_sg1</code> || First person singular subject || || Number[subj]=Sing Person[subj]=1
|-
| <code>s_sg2</code> || Second person singular subject || || Number[subj]=Sing Person[subj]=2
|-
| <code>s_sg3</code> || Third person singular subject || || Number[subj]=Sing Person[subj]=3
|-
| <code>s_pl1</code> || First person plural subject || || Number[subj]=Plur Person[subj]=1
|-
| <code>s_pl2</code> || Second person plural subject || || Number[subj]=Plur Person[subj]=2
|-
| <code>s_pl3</code> || Third person plural subject || || Number[subj]=Plur Person[subj]=3
|}

===Object marking=== <!-- object -->

Used e.g. in verbs that mark both subject and object.

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>o_sg1</code> || First person singular object || || Number[obj]=Sing Person[obj]=1
|-
| <code>o_sg2</code> || Second person singular object || || Number[obj]=Sing Person[obj]=2
|-
| <code>o_sg3</code> || Third person singular object || || Number[obj]=Sing Person[obj]=3
|-
| <code>o_pl1</code> || First person plural object || || Number[obj]=Plur Person[obj]=1
|-
| <code>o_pl2</code> || Second person plural object || || Number[obj]=Plur Person[obj]=2
|-
| <code>o_pl3</code> || Third person plural object || || Number[obj]=Plur Person[obj]=3
|}

===Adjectives=== <!-- adj_infl -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>pst</code> || Positive || || Degree=Pos
|-
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp
|-
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup
|-
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] ||
|-
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] ||
|-
| <code>short</code> || Short adjective || ||
|}

===Formality=== <!-- formality -->
{|class=wikitable
! Symbols !! Gloss !! Notes
|-
| <code>crd</code> || Cordial ||
|-
| <code>el</code> || Elite ||
|-
| <code>fam</code> || Familiar ||
|-
| <code>frm</code> || Formal ||
|-
| <code>infml</code> || Informal ||
|-
| <code>pol</code> || Polite ||
|-
| <code>low</code> || Low courtesy ||
|-
| <code>mid</code> || Mid courtesy ||
|-
| <code>hi</code> || High courtesy ||
|}


===Specificity=== <!-- specificity -->
{|class=wikitable
! Symbols !! Gloss !! Notes !! Universal feature
|-
| <code>spc</code> || Specific || || Definite=Spec
|-
| <code>nspc</code> || Non-specific ||
|}

===Others=== <!-- other -->
{|class=wikitable
! Symbol !! Gloss !! Notes
|-
| <code>abbr</code> || Abbreviation (e.g. ''etc., Mr.'') || Acronyms are also included (see <code>acr</code>)
|-
| <code>date</code> || Dates, years... ||
|-
| <code>email</code> || Electronic Mail || Shortened form of "electronic mail"
|-
| <code>file</code> || Filenames ||
|-
| <code>mon</code> || Money ||
|-
| <code>percent</code> || Percentage || e.g. 25%, 0.9%
|-
| <code>time</code> || Time ||
|-
| <code>url</code> || Web address ||
|-
| <code>web</code> || Links and Emails ||
|-
| <code>year</code> || Years ||
|-
| <code>maj</code> || Large script in which every letter is the same height ||
|-
| <code>min</code> || Small script in which every letter is the same height ||
|}


=== Compounds === <!-- compound -->

{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal feature
|-
| <code>cmp</code> || Compound noun || ||
|}

==Chunk tags== <!-- chunk -->

{|class=wikitable
! Tag !! Description
|-
| {{tag|SN}} || Noun phrase / noun group (''sintagma nominal'')
|-
| {{tag|SA}} || Adjective phrase / adjective group
|-
| {{tag|SV}} || Verb phrase / verb group (''sintagma verbal'')
|}

==XML tags== <!-- xml -->
Note: All XML tags are explained in depth in the PDF [[documentation]]; see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository.

{|class=wikitable
! XML tag !! Means !! Appears in XML tags / notes / examples
|-
| <code>&lt;dictionary></code> || Mono- or bilingual dictionary || Top-level tag for all dictionaries
|-
| <code>&lt;alphabet></code> || Set of characters in the language || In <code>&lt;dictionary></code>
|-
| <code>&lt;sdefs></code> || Symbol definitions || In <code>&lt;dictionary></code>
|-
| <code>&lt;sdef></code> || Symbol definition || In <code>&lt;sdefs></code>. Ex: <code>&lt;sdef n="noun"/></code>
|-
| <code>&lt;pardefs></code> || Paradigm definitions || In <code>&lt;dictionary></code>.
|-
| <code>&lt;pardef></code> || Paradigm definition || In <code>&lt;pardefs></code>.
|-
| <code>&lt;section></code> || A section of the dictionary || In <code>&lt;dictionary></code>. Ex: <code>&lt;section id="main" type="standard"></code>
|-
| <code>&lt;e></code> || A dictionary entry (a word) || In <code>&lt;section></code> and in <code>&lt;pardef></code>.
|-
| <code>&lt;i></code> || Invariant (left and right side) || In <code>&lt;e></code>. Ex.: <code>&lt;i>beer&lt;/i></code>
|-
| <code>&lt;p></code> || A pair || In <code>&lt;e></code>.
|-
| <code>&lt;l></code> || Left side (surface form) || In <code>&lt;p></code>. Ex.: <code>&lt;l>beer&lt;/l></code>
|-
| <code>&lt;r></code> || Right side (lexical unit) || In <code>&lt;p></code>. Ex.: <code>&lt;r>beer&lt;s n="noun"/>&lt;s n="singular"/>&lt;/r></code>
|-
| <code>&lt;s></code> || A lexical symbol (noun, adj, ...) || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;s n="noun"/></code>
|-
| <code>&lt;a></code> || Post-generator wake-up mark || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>&lt;a/>a&lt;s ...</code> (for the a/an rule in English)
|-
| <code>&lt;b></code> || Blank space || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>you're&lt;b/>welcome&lt;s ...</code>
|-
| <code>&lt;g></code> || Group || For [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiwords]]
|-
| <code>&lt;ig></code> || Identity group || Combination of <code>&lt;i></code> and <code>&lt;g></code>
|-
| <code>&lt;j></code> || Join || A <code>+</code> symbol in compounds
|-
| <code>&lt;prm></code> || Parameter || Only in [[Metadix]]
|-
| <code>&lt;sa></code> || Symbol Argument ??? || Only in [[Metadix]]
|-
| <code>&lt;t></code> || Tag or Template || In [[Apertium-separable]] <code>&lt;t></code> is any tag, in crossdix it is template (matches a single tag)
|-
| <code>&lt;d></code> || Delimiter || In [[Apertium-separable]] marks end-of-word
|-
| <code>&lt;v></code> || Variable || Only in crossdix - like + in regexes
|}
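
The nesting described in the "Appears in" column can be sketched with Python's standard <code>xml.etree.ElementTree</code> module. This is only an illustration built from the examples in the table (the entry ''beer'' and the symbols <code>noun</code> and <code>singular</code>); the function name <code>build_minimal_dix</code> and the alphabet string are assumptions, and the result has not been validated against dix.dtd.

```python
import xml.etree.ElementTree as ET


def build_minimal_dix():
    """Build a tiny dictionary tree that follows the nesting in the table."""
    dictionary = ET.Element("dictionary")            # top-level tag

    alphabet = ET.SubElement(dictionary, "alphabet")
    alphabet.text = "abcdefghijklmnopqrstuvwxyz"     # assumed sample alphabet

    sdefs = ET.SubElement(dictionary, "sdefs")       # symbol definitions
    for name in ("noun", "singular"):
        ET.SubElement(sdefs, "sdef", {"n": name})

    section = ET.SubElement(dictionary, "section",
                            {"id": "main", "type": "standard"})
    entry = ET.SubElement(section, "e")              # one dictionary entry
    pair = ET.SubElement(entry, "p")                 # left/right pair

    left = ET.SubElement(pair, "l")                  # surface form
    left.text = "beer"
    right = ET.SubElement(pair, "r")                 # lexical unit
    right.text = "beer"
    for name in ("noun", "singular"):
        ET.SubElement(right, "s", {"n": name})       # lexical symbols

    return dictionary


dix_xml = ET.tostring(build_minimal_dix(), encoding="unicode")
print(dix_xml)
```

Paradigms (<code>&lt;pardefs></code>, <code>&lt;pardef></code>) are left out to keep the sketch short; they would sit between <code>&lt;sdefs></code> and <code>&lt;section></code>.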

TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC)
=== Transfer ===

==== <clip> tag ==== <!-- clip -->

See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information.

{|class=wikitable
! XML attribute value !! Means !! Appears in attribute !! Notes
|-
| <code>whole</code> || lemma and grammatical symbols || part
|-
| <code>lem</code> || lemma || part
|-
| <code>lemh</code> || (inflected) head word of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part
|-
| <code>lemq</code> || following queue of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part
|-
|}

==Scraping this page==

This page should be relatively scrapeable if requested with <code>?action=raw</code>.

Section headers which precede tables all have <code>=</code> as the first character of the line and have a category name without spaces in a comment.

Lines that define tags begin with <code>| &lt;code&gt;</code>. Splitting a line on <code>||</code> gives either 3 or 4 columns. The 4th column can be split on spaces to give UD POS tags and feature values or the word <code>or</code>. These are mixed together but features have <code>=</code> and POS tags don't. A line might be followed by a comment containing either <code>unknown</code> or <code>default</code>, which indicate a placeholder tag or a tag which is commonly used when the correct value cannot be determined, respectively.

A Python scraper script can be found at https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py
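
The rules above can be sketched in a few lines of Python. This is not the linked <code>tags.py</code> script, only a minimal illustration of the description on this page; the names <code>parse_tags</code> and <code>SAMPLE</code> are hypothetical, the sample rows are copied or reconstructed from the tables here, and the sketch assumes that an <code>unknown</code> or <code>default</code> marker is a comment at the end of the same line as the row.

```python
import re

COMMENT_RE = re.compile(r"<!--\s*(.*?)\s*-->")
CODE_RE = re.compile(r"<code>(.*?)</code>")


def parse_tags(wikitext):
    """Return one dict per tag-defining row of the raw wikitext."""
    category = None
    entries = []
    for line in wikitext.splitlines():
        if line.startswith("="):
            # Section header: the category name, if any, sits in a comment.
            match = COMMENT_RE.search(line)
            category = match.group(1) if match else None
            continue
        if not line.startswith("| <code>"):
            continue
        # Assumption: an unknown/default marker is a trailing comment
        # on the row itself; remove it before splitting the columns.
        marker = COMMENT_RE.search(line)
        row = COMMENT_RE.sub("", line)
        columns = [col.strip() for col in row[1:].split("||")]
        symbol = CODE_RE.search(columns[0])
        if symbol is None:
            continue
        # The 4th column mixes UD POS tags, features and the word "or".
        ud_items = columns[3].split() if len(columns) > 3 else []
        entries.append({
            "category": category,
            "symbol": symbol.group(1),
            "gloss": columns[1] if len(columns) > 1 else "",
            "pos": [i for i in ud_items if "=" not in i and i != "or"],
            "features": [i for i in ud_items if "=" in i],
            "marker": marker.group(1) if marker else None,
        })
    return entries


SAMPLE = """\
==Part-of-speech Categories== <!-- POS -->
{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal POS
|-
| <code>n</code> || Noun || ''see 'np' for proper noun'' || NOUN
|-
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB or AUX
|}
===Adjectives=== <!-- adj_infl -->
{|class=wikitable
|-
| <code>pst</code> || Positive || || Degree=Pos
|}
===Formality=== <!-- formality -->
{|class=wikitable
|-
| <code>crd</code> || Cordial ||
|}
"""

entries = parse_tags(SAMPLE)
for entry in entries:
    print(entry["category"], entry["symbol"], entry["pos"], entry["features"])
```

Rows with only three columns, such as those in the Formality table, simply come back with empty <code>pos</code> and <code>features</code> lists.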

==See also==
* [[Turkic lexicon|Guidelines for tag assignment (etc.) in Turkic]]
* [[Tagging guidelines for Portuguese]]
* [[Syntax tags]]
* [[Secondary tags]]
* [[Apertium stream format]]
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]]

[[Category:Documentation in English]]

Latest revision as of 15:36, 9 May 2024

En français · по-русски

This page lists the symbols in Apertium used to denote part-of-speech and further morphological features, as well as chunk tags used for more syntactic functions, as well as XML tags.

This page also documents alignment between Apertium morphological tags and Universal Dependencies POS tags and features.


This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most don't have spatial distinction in articles for example).

If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.


Part-of-speech Categories

Symbol Gloss Notes Universal POS
n Noun see 'np' for proper noun NOUN
vblex Standard ("lexical") verb see also: vbser, vbhaver, vbmod, vaux, vbdo VERB
v Standard verb shortened form of vblex, often used in agglutinative languages VERB
vbmod Modal verb VERB
vbser Verb "to be" from ser (to be) VERB or AUX
vbhaver Verb "to have" from haver (to have) VERB or AUX
vbdo Verb "to do" "to do" includes all eleven tenses and forms of to do, can also be an auxiliary verb VERB or AUX
vaux Auxiliary verb wikipedia AUX
cop Copula wikipedia; sometimes verb-like, sometimes not AUX
adj Adjective ADJ
adv Adverb ADV
preadv Pre-adverb ADV
postadv Post-adverb ADV
mod Modal word [1] PART
det Determiner wikipedia DET
prn Pronoun wikipedia PRON
pr Preposition wikipedia ADP
post Postposition ADP
num Numeral NUM
np Proper noun From nom propi wikipedia PROPN
ij Interjection wikipedia INTJ
cnjcoo Co-ordinating conjunction wikipedia CCONJ
cnjsub Sub-ordinating conjunction SCONJ
cnjadv Conjunctive adverb wikipedia SCONJ, ADV
atp Attachable prefix In German, zusammen-
ideo Ideophone
clt Clitic

Punctuation

Symbol Gloss Notes Universal POS
sent Sentence-ending punctuation e.g. full stop, question mark PUNCT
cm Comma punctuation , PUNCT PunctType=Comm
lquot Left quote « PUNCT PunctType=Quot PunctSide=Ini
rquot Right quote » PUNCT PunctType=Quot PunctSide=Fin
lpar Left parenthesis ( PUNCT PunctType=Brck PunctSide=Ini
rpar Right parenthesis ) PUNCT PunctType=Brck PunctSide=Fin
guio Hyphen - used to connect two words into one e.g. year-long PUNCT PunctType=Dash
apos Apostrophe ' or ' PUNCT
quot Quotation " PUNCT PunctType=Quot
percent Percentage % PUNCT
lquest Left question/exclamation mark ¿¡ (used in Spanish) PUNCT PunctSide=Ini
clb Clause Boundary Refers to any of the following symbols: .?;:!·… PUNCT
punct Punctuation PUNCT

Part-of-speech Sub-categories

Gender

These tags are usually used with nouns. When they occur with things that agree/concord with nouns (like adjectives and verbs), they in fact constitute inflectional/grammatical tags.

Symbol Gloss Notes Universal features
f Feminine Gender=Fem
m Masculine Gender=Masc
nt Neuter Gender=Neut
ma Masculine (animate) Mostly in Slavic languages Gender=Masc
mi Masculine (inanimate) Mostly in Slavic languages Gender=Masc
mp Masculine (personal) in Polish Gender=Masc
mn Masculine or neuter Gender=Masc,Neut
fn Feminine or neuter Gender=Fem,Neut
mf Masculine or feminine Used when masculine and feminine have the same form Gender=Masc,Fem
mfn Masculine, feminine, neuter Used when masculine, feminine, and neuter have the same form Gender=Masc,Fem,Neut
ut Common From utrum, found in Scandinavian languages. Gender=Com
un Common or neuter As above, only common or neuter Gender=Com,Neut
GD Gender to be determined

Count/Mass

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

Symbol Gloss Notes Universal feature
cnt Countable
unc Uncountable (mass)

Animacy

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

Symbol Gloss Notes Universal feature
aa Animate Animacy=Anim
an Animate or inanimate Animacy=Anim,Inan
nn Inanimate Animacy=Inan
hu Human Animacy=Hum

Adjectives

Symbol Gloss Notes Universal feature
sint Synthetic "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. wikipedia
preadj Pre-adjective for languages where most of adjectives are after the noun (ex: French in eo->fr bidix)
preadj_nh Pre-adjective if not human according to the noun, the adjective is before or after

Noun Class

Symbol Gloss Notes
cl1 Noun class 1
cl2 Noun class 2
cl3 Noun class 3
cl4 Noun class 4
cl5 Noun class 5
cl6 Noun class 6
cl7 Noun class 7
cl8 Noun class 8
cl9 Noun class 9
cl10 Noun class 10
cl11 Noun class 11
cl12 Noun class 12

Pronoun types

Symbol Gloss Notes Universal feature
pers Personal PronType=Prs
tn Tónico
log Logophoric
detnt Neuter determiner POS? DET
predet Pre determiner POS? DET
atn Atónico
qnt Quantifier PronType=Ind
ord Ordinal NumType=Ord
obj Object Case=Acc
subj Subject Case=Nom
pro Proclitic
enc Enclitic
acr Acronym Not a pronoun? Abbr=Yes
rel Relative PronType=Rel
ind Indefinite PronType=Ind
itg Interrogative PronType=Int
dem Demonstrative PronType=Dem
def Definite Definite=Def
pos Possessive Poss=Yes
ref Reflexive Reflex=Yes
prx Proximate
med Medial
dst Distal
expl Syntactic expletive wikipedia
rec Reciprocal Pronoun
res Reciprocal Pronoun

Transitivity

Used for verbs.

Symbol Gloss Notes Universal feature
tv Transitive takes direct object in accusative case (used in Turkic) Subcat=Tran
iv Intransitive does not take direct object in accusative case (used in Turkic) Subcat=Intr
TD Transitivity to be determined if the sub-category is (currently) unknown

Separable verbs

Symbol Gloss Notes
sep Separable verb wikipedia, lingolia, PDF
fs Separable verb in subordinate clause
fm Separable verb in main clause

Proper nouns

Symbol Gloss Notes
ant Anthroponym wikipedia, it's very common to use ant together with f and m for traditionally gender-specific names
top Toponym In some language pairs without the locative case this may be loc. Although this should be changed. wikipedia
hyd Hydronym wikipedia
cog Cognomen In normal use, surnames
org Organisation
al Altres Other, misc.
pat Patronymic A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich.

Inflectional morphology

Number

Note: number can be a sub-category tag too, e.g. with pronouns.

Symbol Gloss Notes Universal feature
sg Singular Number=Sing
pl Plural Number=Plur
sp Singular or plural Number=Sing,Plur
du Dual Number=Dual
ct Count see mk-bg Number=Count
coll Collective Number=Coll
ND Number to be determined


Case

Symbol Gloss Notes Universal feature
nom Nominative Case=Nom
acc Accusative Case=Acc
dat Dative Case=Dat
gen Genitive Case=Gen
dg Dative and Genitive in ro-es, discouraged in new developments Case=Dat,Gen
voc Vocative Case=Voc
abl Ablative wikipedia Case=Abl
ins Instrumental or Instructive wikipedia Case=Ins
loc Locative wikipedia Case=Loc
prp Prepositional wikipedia
tra Translative Case=Tra
ill Illative Case=Ill
ine Inessive Case=Ine
ade Adessive Case=Ade
all Allative Case=All
abe Abessive Case=Abe
ess Essive Case=Ess
par Partitive Case=Par
dis Distributive Case=Dis
com Comitative Case=Com
soc Sociative
prl Prolative Case=Pro
ses Superessive Hungarian Case=Sup
sub Sublative Hungarian Case=Sub
dela Delative Hungarian Case=Del
term Terminative Hungarian, Estonian, ... Case=Ter
temp Temporal [2] Case=Tem
obl Oblique [3] Case=Obl
erg Ergative [4] Case=Erg
CD Case to be determined

Voice

Symbol Gloss Notes Universal feature
actv Active voice Voice=Act
pass Passive voice is more used in Turkic. Voice=Pass
pasv Passive voice is more used in Germanic. Voice=Pass
midv Middle voice Voice=Mid
nactv Non-active voice See Albanian.
caus Causative voice see also #Derivations Voice=Cau

Tense and mode

Symbol Gloss Notes Universal features
aff Affirmative wikipedia Polarity=Pos
aor Aorist wikipedia A tense in Turkic languages.
cni Conditional Many pairs will probably use cnd or cond... Mood=Cnd
deb Debitive mode Exclusive to Latvian (wikipedia)
fti Future indicative Tense=Fut Mood=Ind
fts Future subjunctive Tense=Fut Mood=Sub
fut Future Tense=Fut
ifi Past definite from Pretérito perfecto o indefinido Tense=Past Definite=Def
imp Imperative englishlanguageguide Mood=Imp
itg Interrogative
ito Infinitive with 'to' German VerbForm=Inf
lp L-participle
neg Negative Polarity=Neg
nonpast Non-past Tense=Pres,Fut
past Past Tense=Past
pii Imperfect from Pretérito imperfecto de indicativo wikipedia Tense=Past Mood=Ind Aspect=Imp
pis Imperfect subjunctive Tense=Past Mood=Sub Aspect=Imp
plu Pluperfect In cy-en Tense=Pqp
pmp Pluperfect In es-gl (from Pluscuamperfecto) Tense=Pqp
pp2 Past participle (???) It's at least used in the Esperanto dictionaries for future active participles, ont (seems quite odd) VerbForm=Part Tense=Past
pp3 Past participle (???) It's at least used in the Esperanto dictionaries for past active participles, int (seems quite odd) VerbForm=Part Tense=Past
pp Past participle wikipedia VerbForm=Part Tense=Past
pprs Present participle Also appears as ppres (deprecated) VerbForm=Part Tense=Pres
ppres Present participle see also: pprs. wikipedia Tense=Pres VerbForm=Part
pres Present Tense=Pres
pret Preterite Preterite Tense=Past
pri Present indicative see also: pres. wikipedia Tense=Pres Mood=Ind
prs Present subjunctive wikipedia Tense=Pres Mood=Sub
supn Supine wikipedia VerbForm=Sup

Non-finite verb forms

These tags are used for non-finite verb forms, which are often elsewhere called "infinitives" or "participles". See https://doi.org/10.3765/ptu.v4i1.4587 for discussion.

Noun-like

Symbol Gloss Notes Universal features
ger Gerund VerbForm=Vnoun
ger_aor Aorist gerund VerbForm=Vnoun
ger_fut Future gerund VerbForm=Vnoun Tense=Fut
ger_hab Habitual gerund VerbForm=Vnoun Aspect=Hab
ger_impf Imperfect gerund VerbForm=Vnoun Aspect=Imp
ger_past Past gerund VerbForm=Vnoun Tense=Past
ger_perf Perfect gerund VerbForm=Vnoun Aspect=Perf
ger_pres Present gerund VerbForm=Vnoun Tense=Pres

Adjective-like

Symbol Gloss Notes Universal features
gpr Verbal adjective VerbForm=Part
gpr_aor Aorist verbal adjective VerbForm=Part
gpr_fut Future verbal adjective VerbForm=Part Tense=Fut
gpr_hab Habitual verbal adjective VerbForm=Part Aspect=Hab
gpr_impf Imperfect verbal adjective VerbForm=Part Aspect=Imp
gpr_past Past verbal adjective VerbForm=Part Tense=Past
gpr_perf Perfect verbal adjective VerbForm=Part Aspect=Perf
gpr_pres Present verbal adjective VerbForm=Part Tense=Pres

Adverb-like

Symbol Gloss Notes Universal features
gna Verbal adverb VerbForm=Conv
gna_aor Aorist verbal adverb VerbForm=Conv
gna_fut Future verbal adverb VerbForm=Conv Tense=Fut
gna_hab Habitual verbal adverb VerbForm=Conv Aspect=Hab
gna_impf Imperfect verbal adverb VerbForm=Conv Aspect=Imp
gna_past Past verbal adverb VerbForm=Conv Tense=Past
gna_perf Perfect verbal adverb VerbForm=Conv Aspect=Perf
gna_pres Present verbal adverb VerbForm=Conv Tense=Pres

Infinitives

Generally these must occur with auxiliaries.

Symbol Gloss Notes Universal features
inf Infinitive VerbForm=Inf
infps Personal infinitive Used in Portuguese, likely should be merged VerbForm=Inf
prc_aor Aorist participle VerbForm=Inf
prc_fut Future participle VerbForm=Inf Tense=Fut
prc_hab Habitual participle VerbForm=Inf Aspect=Hab
prc_impf Imperfect participle VerbForm=Inf Aspect=Imp
prc_past Past participle VerbForm=Inf Tense=Past
prc_perf Perfect participle VerbForm=Inf Aspect=Perf
prc_pres Present participle VerbForm=Inf Tense=Pres

Aspect

Symbol Gloss Notes Universal feature
hab Habitual Aspect=Hab
imperf Imperfective Should be merged with impf Aspect=Imp
impf Imperfective Aspect=Imp
perf Perfective Aspect=Perf

Person

Note: person can be a sub-category tag, e.g. with pronouns.

Symbol Gloss Notes Universal feature
p1 First person Person=1
p2 Second person Person=2
p3 Third person Person=3
impers Impersonal Sometimes called 'autonomous' Person=0
past3p Past third person In rus and bel-rus, should be 2 tags Person=3 Tense=Past

Derivations

Symbol Gloss Notes
caus Causative
ingr Ingressive https://nn.wikipedia.org/w/index.php?title=Ingressiv
subs Verbal Noun or Verbal Substantive Shortened form of "substantive". Noun formed from a verb
agnt Agent noun Agent Noun

Possession

Symbol Gloss Notes Universal feature
px1sg First person singular possessive e.g. in Turkic languages Person[psor]=1 Number[psor]=Sing
px2sg Second person singular possessive e.g. in Turkic languages Person[psor]=2 Number[psor]=Sing
px3sg Third person singular possessive e.g. in Turkic languages Person[psor]=3 Number[psor]=Sing
px1pl First person plural possessive e.g. in Turkic languages Person[psor]=1 Number[psor]=Plur
px2pl Second person plural possessive e.g. in Turkic languages Person[psor]=2 Number[psor]=Plur
px3pl Third person plural possessive e.g. in Turkic languages Person[psor]=3 Number[psor]=Plur
px3sp Third person possessive singular or plural e.g. in Turkic languages Person[psor]=3

Subject marking

e.g. in verbs with both, otherwise, see #Person and #Number.

Symbol Gloss Notes Universal features
s_sg1 First person singular subject Number[subj]=Sing Person[subj]=1
s_sg2 Second person singular subject Number[subj]=Sing Person[subj]=2
s_sg3 Third person singular subject Number[subj]=Sing Person[subj]=3
s_pl1 First person plural subject Number[subj]=Plur Person[subj]=1
s_pl2 Second person plural subject Number[subj]=Plur Person[subj]=2
s_pl3 Third person plural subject Number[subj]=Plur Person[subj]=3


Object marking

e.g. in verbs with both

Symbol Gloss Notes Universal features
o_sg1 First person singular object Number[obj]=Sing Person[obj]=1
o_sg2 Second person singular object Number[obj]=Sing Person[obj]=2
o_sg3 Third person singular object Number[obj]=Sing Person[obj]=3
o_pl1 First person plural object Number[obj]=Plur Person[obj]=1
o_pl2 Second person plural object Number[obj]=Plur Person[obj]=2
o_pl3 Third person plural object Number[obj]=Plur Person[obj]=3

Adjectives

Symbol Gloss Notes Universal features
pst Positive Degree=Pos
comp Comparative wikipedia Degree=Comp
sup Superlative wikipedia Degree=Sup
attr Attributive wikipedia
pred Predicative wikipedia

Formality

Symbols Gloss Notes
crd Cordial
el Elite
fam Familiar
frm Formal
infml Informal
pol Polite
low Low courtesy
mid Mid courtesy
hi High courtesy

Specificity

Symbols Gloss Notes
spc Specific Definite=Spec
nspc Non-specific

Others

Symbol Gloss Notes
abbr Abbreviation (e.g. etc., Mr.) Acronyms are also included (see acr)
date Dates, years...
email Electronic Mail Shortened form of "electronic mail"
file Filenames
mon Money
percent Percentage e.g. 25%, 0.9%
time Time
url Web address
web Links and Emails
year Years
maj Large script in which every letter is the same height
min Small script in which every letter is the same height

Compounds

Symbol Gloss Notes Universal feature
cmp Compound Noun

Chunk tags

Tag Description
<SN> Noun phrase / noun group (sintagma nominal)
<SA> Adjective phrase / adjective group
<SV> Verb phrase / verb group (sintagma verbal)

XML tags

Note: All XML tags are explained in depth in the PDF documentation; see also the dix.dtd and dix.rng files in the GitHub repository.

XML tag Means Appears in XML tags / notes / examples
<dictionary> Mono- or bilingual dictionary Top-level tag for all dictionaries
<alphabet> Set of characters in the language In <dictionary>
<sdefs> Symbol definitions In <dictionary>
<sdef> Symbol definition In <sdefs>. Ex: <sdef n="noun"/>
<pardefs> Paradigm definitions In <dictionary>.
<pardef> Paradigm definition In <pardefs>.
<section> A section of the dictionary In <dictionary>. Ex: <section id="main" type="standard">
<e> A dictionary entry (a word) In <section> and in <pardef>.
<i> Invariant (left and right side) In <e>. Ex.: <i>beer</i>
<p> A pair In <e>.
<l> Left side (surface form) In <p>. Ex.: <l>beer</l>
<r> Right side (lexical unit) In <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r>
<s> A lexical symbol (noun, adj..) In <r>, <l> and <i>. Ex.: <s n="noun"/>
<a> Post-generator wake-up mark In <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English)
<b> Blank space In <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ...
<g> Group For multiwords
<ig> Identity group Combination of <i> and <g>
<j> Join A + symbol in compounds
<prm> Parameter Only in Metadix
<sa> Symbol Argument ??? Only in Metadix
<t> Tag or Template In Apertium-separable <t> is any tag, in crossdix it is template (matches a single tag)
<d> Delimiter In Apertium-separable marks end-of-word
<v> Variable Only in crossdix - like + in regexes

Transfer

<clip> tag

See the documentation (pdf), p.144 for more information.

XML attribute value Means Appears in attribute Notes
whole lemma and grammatical symbols part
lem lemma part
lemh (inflected) head word of multiword part
lemq following queue of multiword part

Scraping this page

This page should be relatively scrapeable if requested with ?action=raw.

Section headers which precede tables all have = as the first character of the line and have a category name without spaces in a comment.

Lines that define tags begin with | <code>. Splitting a line on || gives either 3 or 4 columns. The 4th column can be split on spaces to give UD POS tags and feature values or the word or. These are mixed together but features have = and POS tags don't. A line might be followed by a comment containing either unknown or default, which indicate a placeholder tag or a tag which is commonly used when the correct value cannot be determined, respectively.

A Python scraper script can be found at https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py

See also