Difference between revisions of "List of symbols"
Firespeaker (talk | contribs) (aor isn't Past) |
|||
(64 intermediate revisions by 10 users not shown) | |||
Line 2: | Line 2: | ||
This page lists the symbols Apertium uses to denote parts of speech and further morphological features, the chunk tags used for more syntactic functions, and XML tags. |
This page lists the symbols Apertium uses to denote parts of speech and further morphological features, the chunk tags used for more syntactic functions, and XML tags. |
||
This page also documents alignment between Apertium morphological tags and [https://universaldependencies.org/ Universal Dependencies] [https://universaldependencies.org/u/pos/index.html POS tags] and [https://universaldependencies.org/u/feat/index.html features]. |
|||
Line 9: | Line 11: | ||
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]]. |
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]]. |
||
<!-- comments following section headers are intended to make scraping this page easier --> |
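The header comments mentioned above could be consumed roughly as follows. This is only a sketch under stated assumptions: the function name <code>scrape_symbols</code>, the two regular expressions and the <code>SAMPLE</code> text are hypothetical and not part of any Apertium tool, and the code assumes each row keeps its <code>&lt;code&gt;</code> symbol in the first cell, as the tables on this page do.

```python
import re

# Hypothetical helper, not an Apertium API.
# Matches a heading followed by a scraping key in an HTML comment,
# e.g. "===Gender=== <!-- gender -->".
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*<!--\s*(.*?)\s*-->\s*$")
# Matches a table row whose first cell is a <code>...</code> symbol.
ROW_RE = re.compile(r"^\|\s*<code>(.*?)</code>\s*\|\|(.*)$")
# Matches inline comments such as "<!-- default -->" inside a cell.
COMMENT_RE = re.compile(r"\s*<!--.*?-->\s*")


def scrape_symbols(wikitext):
    """Group (symbol, remaining cells) rows under the preceding heading key."""
    sections = {}
    current = None
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(3)
            sections.setdefault(current, [])
            continue
        row = ROW_RE.match(line)
        if row and current is not None:
            cells = [cell.strip() for cell in row.group(2).split("||")]
            cells = [COMMENT_RE.sub("", cell) for cell in cells]
            sections[current].append((row.group(1), cells))
    return sections


# Assumed sample input, shaped like the Gender table on this page.
SAMPLE = """===Gender=== <!-- gender -->
{|class=wikitable
! Symbol !! Gloss !! Notes !! Universal features
|-
| <code>f</code> || Feminine || || Gender=Fem
|-
| <code>m</code> || Masculine || || Gender=Masc <!-- default -->
|}"""

symbols = scrape_symbols(SAMPLE)
```

Keying the result on the comment text rather than on the visible heading means a heading can be reworded without breaking a scraper, which seems to be the point of the comments.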
|||
==Part-of-speech Categories== |
|||
==Part-of-speech Categories== <!-- POS --> |
|||
{|class=wikitable |
{|class=wikitable |
||
Line 22: | Line 26: | ||
| <code>vbmod</code> || Modal verb || || VERB |
| <code>vbmod</code> || Modal verb || || VERB |
||
|- |
|- |
||
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB |
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB or AUX |
||
|- |
|- |
||
| <code>vbhaver</code> || Verb "to have" || from ''haver'' (to have) || |
| <code>vbhaver</code> || Verb "to have" || from ''haver'' (to have) || VERB or AUX |
||
|- |
|- |
||
| <code>vbdo</code> || Verb "to do" || Includes all eleven tenses and forms of "to do"; can also be an auxiliary verb || |
| <code>vbdo</code> || Verb "to do" || Includes all eleven tenses and forms of "to do"; can also be an auxiliary verb || VERB or AUX |
||
|- |
|- |
||
| <code>vaux</code> || Auxiliary verb || [http://en.wikipedia.org/wiki/Auxilliary_verb wikipedia] || |
| <code>vaux</code> || Auxiliary verb || [http://en.wikipedia.org/wiki/Auxilliary_verb wikipedia] || AUX |
||
|- |
|- |
||
| <code>cop</code> || Copula || [http://en.wikipedia.org/wiki/Copula_(linguistics) wikipedia]; sometimes verb-like, sometimes not || |
| <code>cop</code> || Copula || [http://en.wikipedia.org/wiki/Copula_(linguistics) wikipedia]; sometimes verb-like, sometimes not || AUX |
||
|- |
|- |
||
| <code>adj</code> || Adjective || || |
| <code>adj</code> || Adjective || || ADJ |
||
|- |
|- |
||
| <code>adv</code> || Adverb || || |
| <code>adv</code> || Adverb || || ADV |
||
|- |
|- |
||
| <code>preadv</code> || Pre-adverb || || |
| <code>preadv</code> || Pre-adverb || || ADV |
||
|- |
|- |
||
| <code>postadv</code> || Post-adverb || || |
| <code>postadv</code> || Post-adverb || || ADV |
||
|- |
|- |
||
| <code>mod</code> || Modal word || [http://dic.academic.ru/dic.nsf/lingvistic/749] || PART |
| <code>mod</code> || Modal word || [http://dic.academic.ru/dic.nsf/lingvistic/749] || PART |
||
|- |
|- |
||
| <code>det</code> || Determiner || [http://en.wikipedia.org/wiki/Determiner_(class) wikipedia] || |
| <code>det</code> || Determiner || [http://en.wikipedia.org/wiki/Determiner_(class) wikipedia] || DET |
||
|- |
|- |
||
| <code>prn</code> || Pronoun || [http://en.wikipedia.org/wiki/Pronoun wikipedia] || |
| <code>prn</code> || Pronoun || [http://en.wikipedia.org/wiki/Pronoun wikipedia] || PRON |
||
|- |
|- |
||
| <code>pr</code> || Preposition || [http://en.wikipedia.org/wiki/Preposition wikipedia] || ADP |
| <code>pr</code> || Preposition || [http://en.wikipedia.org/wiki/Preposition wikipedia] || ADP |
||
Line 52: | Line 56: | ||
| <code>num</code> || Numeral || || NUM |
| <code>num</code> || Numeral || || NUM |
||
|- |
|- |
||
| <code>np</code> || Proper noun || From ''nom propi'' [http://en.wikipedia.org/wiki/Proper_noun wikipedia] || |
| <code>np</code> || Proper noun || From ''nom propi'' [http://en.wikipedia.org/wiki/Proper_noun wikipedia] || PROPN |
||
|- |
|- |
||
| <code>ij</code> || Interjection || [http://en.wikipedia.org/wiki/Interjection wikipedia] || INTJ |
| <code>ij</code> || Interjection || [http://en.wikipedia.org/wiki/Interjection wikipedia] || INTJ |
||
Line 60: | Line 64: | ||
| <code>cnjsub</code> || Subordinating conjunction || || SCONJ |
| <code>cnjsub</code> || Subordinating conjunction || || SCONJ |
||
|- |
|- |
||
| <code>cnjadv</code> || Conjunctive adverb || [http://en.wikipedia.org/wiki/Conjunctive_adverb wikipedia] || |
| <code>cnjadv</code> || Conjunctive adverb || [http://en.wikipedia.org/wiki/Conjunctive_adverb wikipedia] || SCONJ, ADV |
||
|- |
|- |
||
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- || |
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- || |
||
|- |
|||
| <code>ideo</code> || Ideophone || || |
|||
|- |
|||
| <code>clt</code> || Clitic || || |
|||
|} |
|} |
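As an illustration of how the Universal POS column might be used, here is a minimal sketch of a lookup table. It covers only the rows visible in this diff, uses the values of the newer revision, and the names <code>UPOS</code> and <code>upos_candidates</code> are hypothetical. Rows that read "VERB or AUX" are kept as two candidates, because choosing between them needs context that the tag alone does not provide.

```python
# Apertium POS tag -> candidate Universal POS tags, copied from the rows above.
# An empty tuple means the table lists the tag without a Universal POS.
UPOS = {
    "vbmod": ("VERB",),
    "vbser": ("VERB", "AUX"),
    "vbhaver": ("VERB", "AUX"),
    "vbdo": ("VERB", "AUX"),
    "vaux": ("AUX",),
    "cop": ("AUX",),
    "adj": ("ADJ",),
    "adv": ("ADV",),
    "preadv": ("ADV",),
    "postadv": ("ADV",),
    "mod": ("PART",),
    "det": ("DET",),
    "prn": ("PRON",),
    "pr": ("ADP",),
    "num": ("NUM",),
    "np": ("PROPN",),
    "ij": ("INTJ",),
    "cnjsub": ("SCONJ",),
    "cnjadv": ("SCONJ", "ADV"),
    "atp": (),
    "ideo": (),
    "clt": (),
}


def upos_candidates(tag):
    """Return the candidate Universal POS tags, or None for an unlisted tag."""
    return UPOS.get(tag)
```

Returning <code>None</code> for an unlisted tag and an empty tuple for a listed tag without a mapping keeps the two cases apart.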
||
=== Punctuation === <!-- punct --> |
|||
==Part-of-speech Sub-categories== |
|||
{|class=wikitable |
|||
===Gender=== |
|||
! Symbol !! Gloss !! Notes !! Universal POS |
|||
|- |
|||
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT |
|||
|- |
|||
| <code>cm</code> || Comma punctuation || , || PUNCT PunctType=Comm |
|||
|- |
|||
| <code>lquot</code> || Left quote || « || PUNCT PunctType=Quot PunctSide=Ini |
|||
|- |
|||
| <code>rquot</code> || Right quote || » || PUNCT PunctType=Quot PunctSide=Fin |
|||
|- |
|||
| <code>lpar</code> || Left parenthesis || ( || PUNCT PunctType=Brck PunctSide=Ini |
|||
|- |
|||
| <code>rpar</code> || Right parenthesis || ) || PUNCT PunctType=Brck PunctSide=Fin |
|||
|- |
|||
| <code>guio</code> || Hyphen || - (used to connect two words into one, e.g. year-long) || PUNCT PunctType=Dash |
|||
|- |
|||
| <code>apos</code> || Apostrophe || ' or ' || PUNCT |
|||
|- |
|||
| <code>quot</code> || Quotation || " || PUNCT PunctType=Quot |
|||
|- |
|||
| <code>percent</code> || Percentage || % || PUNCT |
|||
|- |
|||
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT PunctSide=Ini |
|||
|- |
|||
| <code>clb</code> || Clause Boundary || Refers to any of the following symbols: .?;:!·… || PUNCT |
|||
|- |
|||
| <code>punct</code> || Punctuation || || PUNCT |
|||
|} |
|||
==Part-of-speech Sub-categories== <!-- subtype --> |
|||
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
|||
===Gender=== <!-- gender --> |
|||
These tags are usually used with nouns. When they occur with things that agree/concord with nouns (like adjectives and verbs), they in fact constitute inflectional/grammatical tags. |
|||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
| <code>f</code> || Feminine || || Gender=Fem |
| <code>f</code> || Feminine || || Gender=Fem |
||
|- |
|- |
||
| <code>m</code> || Masculine || || Gender=Masc |
| <code>m</code> || Masculine || || Gender=Masc <!-- default --> |
||
|- |
|- |
||
| <code>nt</code> || Neuter || || Gender=Neut |
| <code>nt</code> || Neuter || || Gender=Neut |
||
Line 86: | Line 126: | ||
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc |
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc |
||
|- |
|- |
||
| <code>mn</code> || Masculine or neuter || || |
| <code>mn</code> || Masculine or neuter || || Gender=Masc,Neut |
||
|- |
|- |
||
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut |
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut |
||
|- |
|- |
||
| <code>mf</code> || Masculine or feminine || |
| <code>mf</code> || Masculine or feminine || Used when masculine and feminine have the same form || Gender=Masc,Fem |
||
|- |
|- |
||
| <code>mfn</code> || Masculine, feminine, neuter || |
| <code>mfn</code> || Masculine, feminine, neuter || Used when masculine, feminine, and neuter have the same form || Gender=Masc,Fem,Neut |
||
|- |
|- |
||
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com |
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com |
||
Line 98: | Line 138: | ||
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut |
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut |
||
|- |
|- |
||
| <code>GD</code> || Gender to be determined || |
| <code>GD</code> || Gender to be determined || || <!-- unknown --> |
||
|- |
|- |
||
|} |
|} |
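The multi-valued entries above (for example <code>mf</code> as Gender=Masc,Fem) can be read as sets of possible genders. The sketch below makes that reading concrete. It is an assumption for illustration, not a description of how any Apertium module checks agreement, and the names <code>GENDER</code>, <code>gender_feature</code> and <code>may_agree</code> are hypothetical.

```python
# Apertium gender tag -> Universal gender values, in the order the table gives.
# GD ("to be determined") has no Universal feature, hence the empty tuple.
GENDER = {
    "f": ("Fem",),
    "m": ("Masc",),
    "nt": ("Neut",),
    "mp": ("Masc",),
    "mn": ("Masc", "Neut"),
    "fn": ("Fem", "Neut"),
    "mf": ("Masc", "Fem"),
    "mfn": ("Masc", "Fem", "Neut"),
    "ut": ("Com",),
    "un": ("Com", "Neut"),
    "GD": (),
}


def gender_feature(tag):
    """Build the Universal feature string for a gender tag, or None for GD."""
    values = GENDER[tag]
    return "Gender=" + ",".join(values) if values else None


def may_agree(tag_a, tag_b):
    """Assumed rule: two tags can agree if they share at least one value.

    An undetermined gender (GD) cannot rule anything out, so it always passes.
    """
    a, b = GENDER[tag_a], GENDER[tag_b]
    if not a or not b:
        return True
    return bool(set(a) & set(b))
```

Under this reading, <code>may_agree("mf", "f")</code> holds while <code>may_agree("m", "f")</code> does not.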
||
===Count/Mass=== |
===Count/Mass=== <!-- countability --> |
||
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
||
Line 109: | Line 149: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>cnt</code> || Countable || |
| <code>cnt</code> || Countable || || |
||
|- |
|- |
||
| <code>unc</code> || Uncountable (mass) || |
| <code>unc</code> || Uncountable (mass) || || |
||
|- |
|- |
||
|} |
|} |
||
===Animacy=== |
===Animacy=== <!-- animacy --> |
||
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
||
Line 122: | Line 162: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>aa</code> || Animate || |
| <code>aa</code> || Animate || || Animacy=Anim |
||
|- |
|- |
||
| <code>an</code> || Animate or inanimate || |
| <code>an</code> || Animate or inanimate || || Animacy=Anim,Inan |
||
|- |
|- |
||
| <code>nn</code> || Inanimate || |
| <code>nn</code> || Inanimate || || Animacy=Inan |
||
|- |
|- |
||
| <code>hu</code> || Human || || Animacy=Hum |
|||
|} |
|} |
||
===Adjectives=== |
===Adjectives=== <!-- adj_type --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. [http://en.wikipedia.org/wiki/Synthetic_language wikipedia] |
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. [http://en.wikipedia.org/wiki/Synthetic_language wikipedia] || |
||
|- |
|- |
||
| <code>preadj</code> || Pre-adjective || for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix) |
| <code>preadj</code> || Pre-adjective || for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix) || |
||
|- |
|- |
||
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective goes before or after the noun, depending on the noun |
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective goes before or after the noun, depending on the noun || |
||
|- |
|- |
||
|} |
|} |
||
=== |
===Noun Class === <!-- n_class --> |
||
{| class="wikitable" border="1" |
|||
! Symbol !! Gloss !! Notes |
|||
|- |
|||
| <code>cl1</code> || Noun class 1 || |
|||
|- |
|||
| <code>cl2</code> || Noun class 2 || |
|||
|- |
|||
| <code>cl3</code> || Noun class 3 || |
|||
|- |
|||
| <code>cl4</code> || Noun class 4 || |
|||
|- |
|||
| <code>cl5</code> || Noun class 5 || |
|||
|- |
|||
| <code>cl6</code> || Noun class 6 || |
|||
|- |
|||
| <code>cl7</code> || Noun class 7 || |
|||
|- |
|||
| <code>cl8</code> || Noun class 8 || |
|||
|- |
|||
| <code>cl9</code> || Noun class 9 || |
|||
|- |
|||
| <code>cl10</code> || Noun class 10 || |
|||
|- |
|||
| <code>cl11</code> || Noun class 11 || |
|||
|- |
|||
| <code>cl12</code> || Noun class 12 || |
|||
|} |
|||
===Pronoun types === <!-- prn_type --> |
|||
{| class="wikitable" border="1" |
{| class="wikitable" border="1" |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>pers</code> || Personal || || |
| <code>pers</code> || Personal || || PronType=Prs |
||
|- |
|- |
||
| <code>tn</code> || Tónico || |
| <code>tn</code> || Tónico || || |
||
|- |
|- |
||
| <code> |
| <code>log</code> || Logophoric || || |
||
|- |
|- |
||
| <code> |
| <code>detnt</code> || Neuter determiner || POS? || DET |
||
|- |
|- |
||
| <code> |
| <code>predet</code> || Predeterminer || POS? || DET |
||
|- |
|- |
||
| <code> |
| <code>atn</code> || Atónico || || |
||
|- |
|- |
||
| <code> |
| <code>qnt</code> || Quantifier || || PronType=Ind |
||
|- |
|- |
||
| <code> |
| <code>ord</code> || Ordinal || || NumType=Ord |
||
|- |
|- |
||
| <code> |
| <code>obj</code> || Object || || Case=Acc |
||
|- |
|- |
||
| <code> |
| <code>subj</code> || Subject || || Case=Nom |
||
|- |
|- |
||
| <code> |
| <code>pro</code> || Proclitic || || |
||
|- |
|- |
||
| <code> |
| <code>enc</code> || Enclitic || || |
||
|- |
|- |
||
| <code> |
| <code>acr</code> || Acronym || Not a pronoun? || Abbr=Yes |
||
|- |
|- |
||
| <code> |
| <code>rel</code> || Relative || || PronType=Rel |
||
|- |
|- |
||
| <code> |
| <code>ind</code> || Indefinite || || PronType=Ind |
||
|- |
|||
| <code>itg</code> || Interrogative || || PronType=Int |
|||
|- |
|- |
||
| <code>dem</code> || Demonstrative || || PronType=Dem |
| <code>dem</code> || Demonstrative || || PronType=Dem |
||
|- |
|- |
||
| <code>def</code> || Definite || |
| <code>def</code> || Definite || || Definite=Def |
||
|- |
|- |
||
| <code>pos</code> || Possessive || || Poss=Yes |
| <code>pos</code> || Possessive || || Poss=Yes |
||
Line 186: | Line 259: | ||
| <code>ref</code> || Reflexive || || Reflex=Yes |
| <code>ref</code> || Reflexive || || Reflex=Yes |
||
|- |
|- |
||
| <code>prx</code> || Proximate || |
| <code>prx</code> || Proximate || || |
||
|- |
|- |
||
| <code> |
| <code>med</code> || Medial || || |
||
|- |
|||
| <code>dst</code> || Distal || || |
|||
|- |
|||
| <code>expl</code> || Syntactic expletive || [https://en.wikipedia.org/wiki/Syntactic_expletive wikipedia] || |
|||
|- |
|||
| <code>rec</code> || Reciprocal Pronoun || || |
|||
|- |
|||
| <code>res</code> || Reciprocal Pronoun || || |
|||
|} |
|} |
||
=== Transitivity === |
=== Transitivity === <!-- transitivity --> |
||
Used for verbs. |
Used for verbs. |
||
Line 198: | Line 279: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) |
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) || Subcat=Tran |
||
|- |
|- |
||
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) |
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) || Subcat=Intr |
||
|- |
|- |
||
| <code>TD</code> |
| <code>TD</code> || Transitivity to be determined || if the sub-category is (currently) unknown || <!-- unknown --> |
||
|} |
|} |
||
===Separable verbs=== |
===Separable verbs=== <!-- separable --> |
||
{|class=wikitable |
{|class=wikitable |
||
Line 218: | Line 299: | ||
|} |
|} |
||
===Proper nouns=== <!-- np_type --> |
|||
=== Punctuation === |
|||
{| class="wikitable" border="1" |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal feature |
|||
! Symbol !! Gloss !! Notes |
|||
|- |
|- |
||
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names |
|||
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT |
|||
|- |
|- |
||
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc'', although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia] |
|||
| <code>cm</code> || Comma punctuation || , || PUNCT |
|||
|- |
|- |
||
| <code> |
| <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia] |
||
|- |
|- |
||
| <code> |
| <code>cog</code> || Cognomen || In normal use, surnames |
||
|- |
|- |
||
| <code> |
| <code>org</code> || Organisation || |
||
|- |
|- |
||
| <code> |
| <code>al</code> || Altres || Other, misc. |
||
|- |
|||
| <code>guio</code> || Hyphen || - || PUNCT |
|||
|- |
|||
| <code>apos</code> || Apostrophe || ' or ' || PUNCT |
|||
|- |
|||
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT |
|||
|- |
|- |
||
| <code>pat</code> || Patronymic || A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich. |
|||
|} |
|} |
||
== Inflectional morphology == |
== Inflectional morphology == <!-- infl --> |
||
===Number=== |
===Number=== <!-- number --> |
||
Note: number can be a sub-category tag too, e.g. with pronouns. |
Note: number can be a sub-category tag too, e.g. with pronouns. |
||
Line 250: | Line 327: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>sg</code> || Singular || || Number=Sing |
| <code>sg</code> || Singular || || Number=Sing <!-- default --> |
||
|- |
|- |
||
| <code>pl</code> || Plural || || Number=Plur |
| <code>pl</code> || Plural || || Number=Plur |
||
Line 256: | Line 333: | ||
| <code>sp</code> || Singular or plural || || Number=Sing,Plur |
| <code>sp</code> || Singular or plural || || Number=Sing,Plur |
||
|- |
|- |
||
| <code>du</code> || Dual || || |
| <code>du</code> || Dual || || Number=Dual |
||
|- |
|- |
||
| <code>ct</code> || Count || see mk-bg || Number=Count |
| <code>ct</code> || Count || see mk-bg || Number=Count |
||
Line 262: | Line 339: | ||
| <code>coll</code> || Collective || || Number=Coll |
| <code>coll</code> || Collective || || Number=Coll |
||
|- |
|- |
||
| <code>ND</code> || Number to be determined || |
| <code>ND</code> || Number to be determined || || <!-- unknown --> |
||
|- |
|- |
||
|} |
|} |
||
===Case=== |
===Case=== <!-- case --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>nom</code> || Nominative || || |
| <code>nom</code> || Nominative || || Case=Nom |
||
|- |
|- |
||
| <code>acc</code> || Accusative || || |
| <code>acc</code> || Accusative || || Case=Acc |
||
|- |
|- |
||
| <code>dat</code> || Dative || || Case=Dat |
| <code>dat</code> || Dative || || Case=Dat |
||
|- |
|- |
||
| <code>gen</code> || Genitive || || |
| <code>gen</code> || Genitive || || Case=Gen |
||
|- |
|- |
||
| <code>dg</code> || Dative and Genitive || in [[ro-es]], discouraged in new developments || Case=Dat,Gen |
| <code>dg</code> || Dative and Genitive || in [[ro-es]], discouraged in new developments || Case=Dat,Gen |
||
|- |
|- |
||
| <code>voc</code> || Vocative || || |
| <code>voc</code> || Vocative || || Case=Voc |
||
|- |
|- |
||
| <code>abl</code> || Ablative || [http://en.wikipedia.org/wiki/Ablative wikipedia] || Case=Abl |
| <code>abl</code> || Ablative || [http://en.wikipedia.org/wiki/Ablative wikipedia] || Case=Abl |
||
|- |
|- |
||
| <code>ins</code> || Instrumental or Instructive |
| <code>ins</code> || Instrumental or Instructive || [http://en.wikipedia.org/wiki/Instrumental_case wikipedia] || Case=Ins |
||
|- |
|- |
||
| <code>loc</code> || Locative || [http://en.wikipedia.org/wiki/Locative wikipedia] || Case=Loc |
| <code>loc</code> || Locative || [http://en.wikipedia.org/wiki/Locative wikipedia] || Case=Loc |
||
|- |
|- |
||
| <code>prp</code> || Prepositional || [http://en.wikipedia.org/wiki/Prepositional wikipedia] |
| <code>prp</code> || Prepositional || [http://en.wikipedia.org/wiki/Prepositional wikipedia] || |
||
|- |
|- |
||
| <code>tra</code> || Translative || || Case=Tra |
| <code>tra</code> || Translative || || Case=Tra |
||
|- |
|- |
||
| <code>ill</code> || Illative || || |
| <code>ill</code> || Illative || || Case=Ill |
||
|- |
|- |
||
| <code>ine</code> || Inessive || || Case=Ine |
| <code>ine</code> || Inessive || || Case=Ine |
||
Line 302: | Line 379: | ||
| <code>all</code> || Allative || || Case=All |
| <code>all</code> || Allative || || Case=All |
||
|- |
|- |
||
| <code>abe</code> || Abessive || || |
| <code>abe</code> || Abessive || || Case=Abe |
||
|- |
|- |
||
| <code>ess</code> || Essive || || Case=Ess |
| <code>ess</code> || Essive || || Case=Ess |
||
|- |
|- |
||
| <code>par</code> || Partitive || || |
| <code>par</code> || Partitive || || Case=Par |
||
|- |
|- |
||
| <code>dis</code> || Distributive || || Case=Dis |
| <code>dis</code> || Distributive || || Case=Dis |
||
|- |
|- |
||
| <code>com</code> || Comitative || || |
| <code>com</code> || Comitative || || Case=Com |
||
|- |
|- |
||
| <code>soc</code> || Sociative || || |
| <code>soc</code> || Sociative || || |
||
|- |
|- |
||
| <code>prl</code> || Prolative || || |
| <code>prl</code> || Prolative || || Case=Pro |
||
|- |
|- |
||
| <code>ses</code> || Superessive || [[Hungarian]] || Case=Sup |
| <code>ses</code> || Superessive || [[Hungarian]] || Case=Sup |
||
Line 322: | Line 399: | ||
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del |
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del |
||
|- |
|- |
||
| <code>term</code> || Terminative |
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... || Case=Ter |
||
|- |
|||
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Tem |
|||
|- |
|||
| <code>obl</code> || Oblique || [https://en.wikipedia.org/wiki/Oblique_case] || Case=Obl |
|||
|- |
|||
| <code>erg</code> || Ergative || [https://en.wikipedia.org/wiki/Ergative_case] || Case=Erg |
|||
|- |
|||
| <code>CD</code> || Case to be determined || || <!-- unknown --> |
|||
|} |
|} |
||
===Voice=== |
===Voice=== <!-- voice --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>actv</code> || Active voice || || |
| <code>actv</code> || Active voice || || Voice=Act |
||
|- |
|- |
||
| <code>pass</code> || Passive voice || Used more in Turkic. || Voice=Pass |
| <code>pass</code> || Passive voice || Used more in Turkic. || Voice=Pass |
||
|- |
|- |
||
| <code>pasv</code> || Passive voice || Used more in Germanic. || Voice= |
| <code>pasv</code> || Passive voice || Used more in Germanic. || Voice=Pass |
||
|- |
|- |
||
| <code>midv</code> || Middle voice || || |
| <code>midv</code> || Middle voice || || Voice=Mid |
||
|- |
|- |
||
| <code>nactv</code> || Non-active voice || See Albanian. || |
| <code>nactv</code> || Non-active voice || See Albanian. || |
||
Line 344: | Line 429: | ||
|} |
|} |
||
===Tense and mode=== |
===Tense and mode=== <!-- tense --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
| <code> |
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos |
||
|- |
|- |
||
| <code> |
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || |
||
|- |
|- |
||
| <code> |
| <code>cni</code> || Conditional || Many pairs will probably use cnd or cond || Mood=Cnd |
||
|- |
|||
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) || |
|||
|- |
|||
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind |
|||
|- |
|||
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub |
|||
|- |
|||
| <code>fut</code> || Future || || Tense=Fut |
|||
|- |
|||
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def |
|||
|- |
|- |
||
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp |
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp |
||
|- |
|- |
||
| <code> |
| <code>itg</code> || Interrogative || || |
||
|- |
|- |
||
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf |
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf |
||
|- |
|- |
||
| <code> |
| <code>lp</code> || L-participle || || |
||
|- |
|- |
||
| <code> |
| <code>neg</code> || Negative || || Polarity=Neg |
||
|- |
|- |
||
| <code> |
| <code>nonpast</code> || Non-past || || Tense=Pres,Fut |
||
|- |
|- |
||
| <code> |
| <code>past</code> || Past || || Tense=Past |
||
|- |
|- |
||
| <code> |
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind Aspect=Imp |
||
|- |
|- |
||
| <code> |
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub Aspect=Imp |
||
|- |
|- |
||
| <code> |
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp |
||
|- |
|- |
||
| <code> |
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp |
||
|- |
|- |
||
| <code> |
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) || VerbForm=Part Tense=Past |
||
|- |
|- |
||
| <code> |
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) || VerbForm=Part Tense=Past |
||
|- |
|- |
||
| <code> |
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part Tense=Past |
||
|- |
|- |
||
| <code> |
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part Tense=Pres |
||
|- |
|- |
||
| <code> |
| <code>ppres</code> || Present participle || ''see also: pprs''. [http://en.wikipedia.org/wiki/Present_participle wikipedia] || Tense=Pres VerbForm=Part |
||
|- |
|- |
||
| <code> |
| <code>pres</code> || Present || || Tense=Pres |
||
|- |
|||
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past |
|||
|- |
|||
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind |
|||
|- |
|- |
||
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub |
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub |
||
|- |
|- |
||
| <code> |
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup |
||
|} |
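Several tags in the table above stand for more than one Universal feature at once (<code>pii</code> carries tense, mood and aspect). The following sketch shows one way to turn such table cells into a CoNLL-U style FEATS value, where features are joined with <code>|</code>, sorted alphabetically by name, and an empty set is written as <code>_</code>. The dictionary holds only a subset of the rows, and <code>TENSE_MODE</code> and <code>conllu_feats</code> are hypothetical names.

```python
# Subset of the table above: Apertium tag -> "Universal features" cell.
# An empty string means the table gives no Universal feature for the tag.
TENSE_MODE = {
    "aff": "Polarity=Pos",
    "aor": "",
    "cni": "Mood=Cnd",
    "fti": "Tense=Fut Mood=Ind",
    "fts": "Tense=Fut Mood=Sub",
    "fut": "Tense=Fut",
    "imp": "Mood=Imp",
    "neg": "Polarity=Neg",
    "nonpast": "Tense=Pres,Fut",
    "past": "Tense=Past",
    "pii": "Tense=Past Mood=Ind Aspect=Imp",
    "pis": "Tense=Past Mood=Sub Aspect=Imp",
    "pri": "Tense=Pres Mood=Ind",
    "prs": "Tense=Pres Mood=Sub",
}


def conllu_feats(*cells):
    """Merge space-separated Name=Value cells into one FEATS string."""
    feats = {}
    for cell in cells:
        for pair in cell.split():
            name, value = pair.split("=", 1)
            feats[name] = value  # a later cell overrides an earlier one
    if not feats:
        return "_"
    return "|".join(f"{name}={feats[name]}" for name in sorted(feats, key=str.lower))
```

Cells from other tables on this page can be passed alongside, for example <code>conllu_feats(TENSE_MODE["fti"], "Person=3", "Number=Sing")</code>.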
|||
=== Non-finite verb forms === <!-- nonfinite --> |
|||
These tags are used for non-finite verb forms, which are often elsewhere called "infinitives" or "participles". See https://doi.org/10.3765/ptu.v4i1.4587 for discussion. |
|||
==== Noun-like ==== <!-- verbal-nouns --> |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal features |
|||
|- |
|- |
||
| <code> |
| <code>ger</code> || Gerund || || VerbForm=Vnoun |
||
|- |
|- |
||
| <code> |
| <code>ger_aor</code> || Aorist gerund || || VerbForm=Vnoun |
||
|- |
|- |
||
| <code> |
| <code>ger_fut</code> || Future gerund || || VerbForm=Vnoun Tense=Fut |
||
|- |
|- |
||
| <code> |
| <code>ger_hab</code> || Habitual gerund || || VerbForm=Vnoun Aspect=Hab |
||
|- |
|- |
||
| <code> |
| <code>ger_impf</code> || Imperfect gerund || || VerbForm=Vnoun Aspect=Imp |
||
|- |
|- |
||
| <code> |
| <code>ger_past</code> || Past gerund || || VerbForm=Vnoun Tense=Past |
||
|- |
|- |
||
| <code>ger_perf</code> || Perfect gerund || || VerbForm=Vnoun Aspect=Perf |
|||
|- |
|||
| <code>ger_pres</code> || Present gerund || || VerbForm=Vnoun Tense=Pres |
|||
|} |
|||
==== Adjective-like ==== <!-- verbal-adjectives --> |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal features |
|||
|- |
|||
| <code>gpr</code> || Verbal adjective || || VerbForm=Part |
|||
|- |
|||
| <code>gpr_aor</code> || Aorist verbal adjective || || VerbForm=Part |
|||
|- |
|||
| <code>gpr_fut</code> || Future verbal adjective || || VerbForm=Part Tense=Fut |
|||
|- |
|||
| <code>gpr_hab</code> || Habitual verbal adjective || || VerbForm=Part Aspect=Hab |
|||
|- |
|||
| <code>gpr_impf</code> || Imperfect verbal adjective || || VerbForm=Part Aspect=Imp |
|||
|- |
|||
| <code>gpr_past</code> || Past verbal adjective || || VerbForm=Part Tense=Past |
|||
|- |
|||
| <code>gpr_perf</code> || Perfect verbal adjective || || VerbForm=Part Aspect=Perf |
|||
|- |
|||
| <code>gpr_pres</code> || Present verbal adjective || || VerbForm=Part Tense=Pres |
|||
|} |
|||
==== Adverb-like ==== <!-- verbal-adverbs --> |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal features |
|||
|- |
|||
| <code>gna</code> || Verbal adverb || || VerbForm=Conv |
|||
|- |
|||
| <code>gna_aor</code> || Aorist verbal adverb || || VerbForm=Conv |
|||
|- |
|||
| <code>gna_fut</code> || Future verbal adverb || || VerbForm=Conv Tense=Fut |
|||
|- |
|||
| <code>gna_hab</code> || Habitual verbal adverb || || VerbForm=Conv Aspect=Hab |
|||
|- |
|||
| <code>gna_impf</code> || Imperfect verbal adverb || || VerbForm=Conv Aspect=Imp |
|||
|- |
|||
| <code>gna_past</code> || Past verbal adverb || || VerbForm=Conv Tense=Past |
|||
|- |
|||
| <code>gna_perf</code> || Perfect verbal adverb || || VerbForm=Conv Aspect=Perf |
|||
|- |
|||
| <code>gna_pres</code> || Present verbal adverb || || VerbForm=Conv Tense=Pres |
|||
|} |
|||
==== Infinitives ==== <!-- infinitives --> |
|||
Generally these must occur with auxiliaries. |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal features |
|||
|- |
|||
| <code>inf</code> || Infinitive || || VerbForm=Inf |
|||
|- |
|||
| <code>infps</code> || Personal infinitive || Used in Portuguese, likely should be merged || VerbForm=Inf |
|||
|- |
|||
| <code>prc_aor</code> || Aorist participle || || VerbForm=Inf |
|||
|- |
|||
| <code>prc_fut</code> || Future participle || || VerbForm=Inf Tense=Fut |
|||
|- |
|||
| <code>prc_hab</code> || Habitual participle || || VerbForm=Inf Aspect=Hab |
|||
|- |
|||
| <code>prc_impf</code> || Imperfect participle || || VerbForm=Inf Aspect=Imp |
|||
|- |
|||
| <code>prc_past</code> || Past participle || || VerbForm=Inf Tense=Past |
|||
|- |
|||
| <code>prc_perf</code> || Perfect participle || || VerbForm=Inf Aspect=Perf |
|||
|- |
|||
| <code>prc_pres</code> || Present participle || || VerbForm=Inf Tense=Pres |
|||
|} |
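The four tables above follow a regular pattern: the part before the underscore selects the VerbForm value and the part after it adds a tense or aspect. The sketch below reproduces the listed rows from two small dictionaries. It is an illustration of that pattern only; the names <code>VERB_FORM</code>, <code>SUFFIX</code>, <code>PLAIN</code> and <code>nonfinite_features</code> are hypothetical.

```python
# Prefix -> VerbForm value, as given in the four tables above.
VERB_FORM = {"ger": "Vnoun", "gpr": "Part", "gna": "Conv", "prc": "Inf"}

# Suffix -> extra feature. The aorist rows add no feature in the tables.
SUFFIX = {
    "aor": None,
    "fut": "Tense=Fut",
    "hab": "Aspect=Hab",
    "impf": "Aspect=Imp",
    "past": "Tense=Past",
    "perf": "Aspect=Perf",
    "pres": "Tense=Pres",
}

# Rows that do not follow the prefix_suffix pattern.
PLAIN = {"inf": "VerbForm=Inf", "infps": "VerbForm=Inf"}


def nonfinite_features(tag):
    """Return the Universal features cell for a non-finite verb tag."""
    if tag in PLAIN:
        return PLAIN[tag]
    prefix, _, suffix = tag.partition("_")
    if prefix not in VERB_FORM:
        raise KeyError(tag)
    if prefix == "prc" and not suffix:
        # The tables list no bare "prc" row, so it is rejected here.
        raise KeyError(tag)
    feats = ["VerbForm=" + VERB_FORM[prefix]]
    if suffix:
        extra = SUFFIX[suffix]  # raises KeyError for an unlisted suffix
        if extra:
            feats.append(extra)
    return " ".join(feats)
```

Generating the rows this way would make it harder for the four tables to drift apart, at the cost of hiding any row that is meant to be an exception.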
|||
===Aspect=== <!-- aspect --> |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal feature |
|||
|- |
|||
| <code>hab</code> || Habitual || || Aspect=Hab |
|||
|- |
|||
| <code>imperf</code> || Imperfective || Should be merged with <code>impf</code> || Aspect=Imp |
|||
|- |
|||
| <code>impf</code> || Imperfective || || Aspect=Imp |
|||
|- |
|||
| <code>perf</code> || Perfective || || Aspect=Perf |
|||
|} |
|} |
||
===Person=== |
===Person=== <!-- person --> |
||
Note: person can be a sub-category tag, e.g. with pronouns. |
Note: person can be a sub-category tag, e.g. with pronouns. |
||
Line 415: | Line 610: | ||
| <code>p1</code> || First person || || Person=1 |
| <code>p1</code> || First person || || Person=1 |
||
|- |
|- |
||
| <code>p2</code> || Second person || || |
| <code>p2</code> || Second person || || Person=2 |
||
|- |
|- |
||
| <code>p3</code> || Third person || || |
| <code>p3</code> || Third person || || Person=3 |
||
|- |
|- |
||
| |
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0 |
||
|- |
|- |
||
| <code>past3p</code> || Past third person || In <code>rus</code> and <code>bel-rus</code>, should be 2 tags || Person=3 Tense=Past |
|||
|} |
|} |
||
===Derivations=== |
===Derivations=== <!-- verb_deriv --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes |
! Symbol !! Gloss !! Notes |
||
Line 437: | Line 633: | ||
|} |
|} |
||
===Possession=== |
===Possession=== <!-- possessor --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
| <code>px1sg</code> || First person singular possessive || e.g. in [[Turkic languages]] || |
| <code>px1sg</code> || First person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Sing |
||
|- |
|- |
||
| <code>px2sg</code> || Second person singular possessive || e.g. in [[Turkic languages]] || |
| <code>px2sg</code> || Second person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Sing |
||
|- |
|- |
||
| <code>px3sg</code> || Third person singular possessive || e.g. in [[Turkic languages]] || |
| <code>px3sg</code> || Third person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Sing |
||
|- |
|- |
||
| <code>px1pl</code> || First person plural possessive || e.g. in [[Turkic languages]] || |
| <code>px1pl</code> || First person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Plur |
||
|- |
|- |
||
| <code>px2pl</code> || Second person plural possessive || e.g. in [[Turkic languages]] || |
| <code>px2pl</code> || Second person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Plur |
||
|- |
|- |
||
| <code>px3pl</code> || Third person plural possessive || e.g. in [[Turkic languages]] || |
| <code>px3pl</code> || Third person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Plur |
||
|- |
|- |
||
| <code>px3sp</code> || Third person possessive singular or plural || e.g. in [[Turkic languages]] || |
| <code>px3sp</code> || Third person possessive singular or plural || e.g. in [[Turkic languages]] || Person[psor]=3 |
||
|- |
|- |
||
|} |
|} |
||
=== |
===Subject marking=== <!-- subject --> |
||
e.g. in verbs with both |
e.g. in verbs that mark both subject and object; otherwise, see [[#Person]] and [[#Number]]. |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
| <code> |
| <code>s_sg1</code> || First person singular subject || || Number[subj]=Sing Person[subj]=1 |
||
|- |
|- |
||
| <code> |
| <code>s_sg2</code> || Second person singular subject || || Number[subj]=Sing Person[subj]=2 |
||
|- |
|- |
||
| <code> |
| <code>s_sg3</code> || Third person singular subject || || Number[subj]=Sing Person[subj]=3 |
||
|- |
|- |
||
| <code> |
| <code>s_pl1</code> || First person plural subject || || Number[subj]=Plur Person[subj]=1 |
||
|- |
|- |
||
| <code> |
| <code>s_pl2</code> || Second person plural subject || || Number[subj]=Plur Person[subj]=2 |
||
|- |
|- |
||
| <code> |
| <code>s_pl3</code> || Third person plural subject || || Number[subj]=Plur Person[subj]=3 |
||
|- |
|- |
||
|} |
|} |
||
===Proper nouns=== |
|||
===Object marking=== <!-- object --> |
|||
e.g. in verbs that mark both subject and object |
|||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
| <code>o_sg1</code> || First person singular object || || Number[obj]=Sing Person[obj]=1 |
|||
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names |
|||
|- |
|- |
||
| <code>o_sg2</code> || Second person singular object || || Number[obj]=Sing Person[obj]=2 |
|||
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc'', although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia] |
|||
|- |
|- |
||
| <code> |
| <code>o_sg3</code> || Third person singular object || || Number[obj]=Sing Person[obj]=3 |
||
|- |
|- |
||
| <code> |
| <code>o_pl1</code> || First person plural object || || Number[obj]=Plur Person[obj]=1 |
||
|- |
|- |
||
| <code> |
| <code>o_pl2</code> || Second person plural object || || Number[obj]=Plur Person[obj]=2 |
||
|- |
|||
| <code>o_pl3</code> || Third person plural object || || Number[obj]=Plur Person[obj]=3 |
|||
|- |
|- |
||
| <code>al</code> || Altres || Other, misc. |
|||
|} |
|} |
||
===Adjectives=== |
===Adjectives=== <!-- adj_infl --> |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
| <code>pst</code> || Positive || || |
| <code>pst</code> || Positive || || Degree=Pos |
||
|- |
|- |
||
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp |
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp |
||
Line 507: | Line 707: | ||
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup |
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup |
||
|- |
|- |
||
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] |
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || |
||
|- |
|- |
||
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] |
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || |
||
|- |
|||
| <code>short</code> || Short adjective || |
|||
|} |
|} |
||
===Formality=== <!-- formality --> |
|||
{|class=wikitable |
|||
! Symbols !! Gloss !! Notes |
|||
|- |
|||
| <code>crd</code> || Cordial || |
|||
|- |
|||
| <code>el</code> || Elite || |
|||
|- |
|||
| <code>fam</code> || Familiar || |
|||
|- |
|||
| <code>frm</code> || Formal || |
|||
|- |
|||
| <code>infml</code> || Informal || |
|||
|- |
|||
| <code>pol</code> || Polite || |
|||
|- |
|||
| <code>low</code> || Low courtesy || |
|||
|- |
|||
| <code>mid</code> || Mid courtesy || |
|||
|- |
|||
| <code>hi</code> || High courtesy || |
|||
|} |
|||
===Specificity=== <!-- specificity --> |
|||
===Others=== |
|||
{|class=wikitable |
|||
! Symbol !! Gloss !! Notes !! Universal feature |
|||
|- |
|||
| <code>spc</code> || Specific || || Definite=Spec |
|||
|- |
|||
| <code>nspc</code> || Non-specific || || |
|||
|} |
|||
===Others=== <!-- other --> |
|||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes |
! Symbol !! Gloss !! Notes |
||
Line 520: | Line 753: | ||
|- |
|- |
||
| <code>date</code> || Dates, years... || |
| <code>date</code> || Dates, years... || |
||
|- |
|||
| <code>email</code> || Electronic Mail || Shortened form of Electronic Mail |
|||
|- |
|||
| <code>file</code> || Filenames || |
|||
|- |
|||
| <code>mon</code> || Money || |
|||
|- |
|- |
||
| <code>percent</code> || Percentage || e.g. 25%, 0.9% |
| <code>percent</code> || Percentage || e.g. 25%, 0.9% |
||
|- |
|||
| <code>time</code> || Time || |
|||
|- |
|||
| <code>url</code> || Web address || |
|||
|- |
|- |
||
| <code>web</code> || Links and Emails || |
| <code>web</code> || Links and Emails || |
||
|- |
|- |
||
| <code> |
| <code>year</code> || Years || |
||
|- |
|- |
||
| <code>maj</code> || Large script in which every letter is the same height || |
|||
|- |
|||
| <code>min</code> || Small script in which every letter is the same height || |
|||
|} |
|} |
||
=== Compounds === <!-- compound --> |
|||
===See also=== |
|||
* [[Turkic lexicon|Guidelines for tag assignment (etc.) in Turkic]] |
|||
{|class=wikitable |
|||
* [[Tagging guidelines for Portuguese]] |
|||
! Symbol !! Gloss !! Notes !! Universal feature |
|||
|- |
|||
| <code>cmp</code> || Compound Noun || || |
|||
|} |
|||
==Chunk tags== |
==Chunk tags== <!-- chunk --> |
||
{|class=wikitable |
{|class=wikitable |
||
Line 546: | Line 796: | ||
|} |
|} |
||
==XML tags== |
==XML tags== <!-- xml --> |
||
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository. |
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository. |
||
Line 552: | Line 802: | ||
! XML tag !! Means !! Appears in XML tags / notes / examples |
! XML tag !! Means !! Appears in XML tags / notes / examples |
||
|- |
|- |
||
| <code><dictionary></code> || Mono- or bilingual dictionary || |
| <code><dictionary></code> || Mono- or bilingual dictionary || Toplevel tag for all dictionaries |
||
|- |
|- |
||
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code> |
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code> |
||
Line 582: | Line 832: | ||
| <code><b></code> || Blank space || In <code><r></code>, <code><l></code> and <code><i></code>. Ex.: <code><l>you're<b/>welcome<s ...</code> |
| <code><b></code> || Blank space || In <code><r></code>, <code><l></code> and <code><i></code>. Ex.: <code><l>you're<b/>welcome<s ...</code> |
||
|- |
|- |
||
| <code><g></code> || Group || For [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiwords]] |
|||
|- |
|||
| <code><ig></code> || Identity group || Combination of <code><i></code> and <code><g></code> |
|||
|- |
|||
| <code><j></code> || Join || A <code>+</code> symbol in compounds |
|||
|- |
|||
| <code><prm></code> || Parameter || Only in [[Metadix]] |
|||
|- |
|||
| <code><sa></code> || Symbol Argument ??? || Only in [[Metadix]] |
|||
|- |
|||
| <code><t></code> || Tag or Template || In [[Apertium-separable]] <code><t></code> is any tag, in crossdix it is template (matches a single tag) |
|||
|- |
|||
| <code><d></code> || Delimiter || In [[Apertium-separable]] marks end-of-word |
|||
|- |
|||
| <code><v></code> || Variable || Only in crossdix - like + in regexes |
|||
|} |
|} |
||
TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC) |
|||
Other tags: |
|||
<pre> |
|||
<j/> (in stream format #) is to mark multiwords |
|||
<t/> and <v/> are only in crossdix |
|||
t = template, v = variable |
|||
t matches any single tag, v is like + in regexes (0 or more) |
|||
<sa/> and <prm/> are only used in metadixes. |
|||
'sa' lets you add an optional extra tag, prm is an extra string for the paradigm |
|||
</pre> |
|||
=== Transfer === |
=== Transfer === |
||
==== <clip> tag ==== |
==== <clip> tag ==== <!-- clip --> |
||
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information. |
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information. |
||
Line 616: | Line 867: | ||
|- |
|- |
||
|} |
|} |
||
==Scraping this page== |
|||
This page should be relatively scrapeable if requested with <code>?action=raw</code>. |
|||
Section headers which precede tables all have <code>=</code> as the first character of the line and have a category name without spaces in a comment. |
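As a rough illustration of the header convention just described, the following Python sketch extracts the nesting level, title, and category name from one raw wikitext header line. The regular expression and the <code>parse_header</code> helper are assumptions made for this example; they are not taken from the scraper linked on this page.

```python
import re

# Hypothetical pattern: leading '=' run, title, closing '=' run, then a
# comment holding a category name without spaces.
HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*=+\s*<!--\s*(\S+?)\s*-->\s*$")


def parse_header(line):
    """Return (level, title, category) for a commented header, else None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    equals, title, category = match.groups()
    return len(equals), title, category


if __name__ == "__main__":
    print(parse_header("===Person=== <!-- person -->"))
    print(parse_header("==See also=="))  # no comment, so not a table header
```

Headers without a comment (such as the "See also" header) return <code>None</code>, which lets a caller skip sections that do not introduce a tag table.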
|||
Lines that define tags begin with <code>| <code></code>. Splitting a line on <code>||</code> gives either 3 or 4 columns. The 4th column can be split on spaces to give UD POS tags and feature values or the word <code>or</code>. These are mixed together but features have <code>=</code> and POS tags don't. A line might be followed by a comment containing either <code>unknown</code> or <code>default</code>, which indicate a placeholder tag or a tag which is commonly used when the correct value cannot be determined, respectively. |
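The row format described above can be sketched in Python as follows. This is a minimal sketch that assumes the input is a single line of the raw wikitext; the <code>parse_tag_line</code> helper and the dictionary keys it returns are invented for this example and may differ from what the linked scraper does.

```python
import re

COMMENT_RE = re.compile(r"<!--\s*(.*?)\s*-->")
PREFIX = "| <code>"


def parse_tag_line(line):
    """Parse one raw table row into a dict, or return None if it defines no tag."""
    if not line.startswith(PREFIX):
        return None
    # A trailing comment may mark a placeholder ("unknown") or fallback ("default") tag.
    comments = COMMENT_RE.findall(line)
    flag = next((c for c in comments if c in ("unknown", "default")), None)
    columns = [col.strip() for col in COMMENT_RE.sub("", line).split("||")]
    if len(columns) not in (3, 4):
        return None
    symbol = columns[0][len(PREFIX):].split("</code>")[0].strip()
    pos_tags, features = [], {}
    for token in (columns[3].split() if len(columns) == 4 else []):
        token = token.rstrip(",")
        if not token or token == "or":
            continue
        if "=" in token:  # features contain '=', POS tags do not
            name, value = token.split("=", 1)
            features[name] = value
        else:
            pos_tags.append(token)
    return {"symbol": symbol, "gloss": columns[1], "notes": columns[2],
            "pos": pos_tags, "features": features, "flag": flag}


if __name__ == "__main__":
    print(parse_tag_line("| <code>p2</code> || Second person || || Person=2"))
```

Rows with only three columns simply yield empty <code>pos</code> and <code>features</code> values.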
|||
A Python scraper script can be found at https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py |
|||
==See also== |
==See also== |
||
* [[Turkic lexicon|Guidelines for tag assignment (etc.) in Turkic]] |
|||
* [[Tagging guidelines for Portuguese]] |
|||
* [[Syntax tags]] |
* [[Syntax tags]] |
||
* [[Secondary tags]] |
|||
* [[Apertium stream format]] |
* [[Apertium stream format]] |
||
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]] |
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]] |
Latest revision as of 15:36, 9 May 2024
This page lists the symbols Apertium uses to denote part-of-speech and further morphological features, the chunk tags used for more syntactic functions, and XML tags.
This page also documents alignment between Apertium morphological tags and Universal Dependencies POS tags and features.
This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most don't have spatial distinction in articles for example).
If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.
Part-of-speech Categories
Symbol | Gloss | Notes | Universal POS |
---|---|---|---|
n |
Noun | see 'np' for proper noun | NOUN |
vblex |
Standard ("lexical") verb | see also: vbser, vbhaver, vbmod, vaux, vbdo | VERB |
v |
Standard verb | shortened form of vblex, often used in agglutinative languages | VERB |
vbmod |
Modal verb | VERB | |
vbser |
Verb "to be" | from ser (to be) | VERB or AUX |
vbhaver |
Verb "to have" | from haver (to have) | VERB or AUX |
vbdo |
Verb "to do" | "to do" includes all eleven tenses and forms of to do, can also be an auxiliary verb | VERB or AUX |
vaux |
Auxiliary verb | wikipedia | AUX |
cop |
Copula | wikipedia; sometimes verb-like, sometimes not | AUX |
adj |
Adjective | ADJ | |
adv |
Adverb | ADV | |
preadv |
Pre-adverb | ADV | |
postadv |
Post-adverb | ADV | |
mod |
Modal word | [1] | PART |
det |
Determiner | wikipedia | DET |
prn |
Pronoun | wikipedia | PRON |
pr |
Preposition | wikipedia | ADP |
post |
Postposition | ADP | |
num |
Numeral | NUM | |
np |
Proper noun | From nom propi wikipedia | PROPN |
ij |
Interjection | wikipedia | INTJ |
cnjcoo |
Co-ordinating conjunction | wikipedia | CCONJ |
cnjsub |
Sub-ordinating conjunction | SCONJ | |
cnjadv |
Conjunctive adverb | wikipedia | SCONJ, ADV |
atp |
Attachable prefix | In German, zusammen- | |
ideo |
Ideophone | ||
clt |
Clitic |
Punctuation
Symbol | Gloss | Notes | Universal POS |
---|---|---|---|
sent |
Sentence-ending punctuation | e.g. full stop, question mark | PUNCT |
cm |
Comma punctuation | , | PUNCT PunctType=Comm |
lquot |
Left quote | « | PUNCT PunctType=Quot PunctSide=Ini |
rquot |
Right quote | » | PUNCT PunctType=Quot PunctSide=Fin |
lpar |
Left parenthesis | ( | PUNCT PunctType=Brck PunctSide=Ini |
rpar |
Right parenthesis | ) | PUNCT PunctType=Brck PunctSide=Fin |
guio |
Hyphen | - used to connect two words into one e.g. year-long | PUNCT PunctType=Dash |
apos |
Apostrophe | ' or ' | PUNCT |
quot |
Quotation | " | PUNCT PunctType=Quot |
percent |
Percentage | % | PUNCT |
lquest |
Left question/exclamation mark | ¿¡ (used in Spanish) | PUNCT PunctSide=Ini |
clb |
Clause Boundary | Refers to any of the following symbols: .?;:!·… | PUNCT |
punct |
Punctuation | PUNCT |
Part-of-speech Sub-categories
Gender
These tags are usually used with nouns. When they occur with things that agree/concord with nouns (like adjectives and verbs), they in fact constitute inflectional/grammatical tags.
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
f |
Feminine | Gender=Fem | |
m |
Masculine | Gender=Masc | |
nt |
Neuter | Gender=Neut | |
ma |
Masculine (animate) | Mostly in Slavic languages | Gender=Masc |
mi |
Masculine (inanimate) | Mostly in Slavic languages | Gender=Masc |
mp |
Masculine (personal) | in Polish | Gender=Masc |
mn |
Masculine or neuter | Gender=Masc,Neut | |
fn |
Feminine or neuter | Gender=Fem,Neut | |
mf |
Masculine or feminine | Used when masculine and feminine have the same form | Gender=Masc,Fem |
mfn |
Masculine, feminine, or neuter | Used when masculine, feminine, and neuter have the same form | Gender=Masc,Fem,Neut |
ut |
Common | From utrum, found in Scandinavian languages. | Gender=Com |
un |
Common or neuter | As above, only common or neuter | Gender=Com,Neut |
GD |
Gender to be determined |
Count/Mass
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
cnt |
Countable | ||
unc |
Uncountable (mass) |
Animacy
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
aa |
Animate | Animacy=Anim | |
an |
Animate or inanimate | Animacy=Anim,Inan | |
nn |
Inanimate | Animacy=Inan | |
hu |
Human | Animacy=Hum |
Adjectives
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
sint |
Synthetic | "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. wikipedia | |
preadj |
Pre-adjective | for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix) | |
preadj_nh |
Pre-adjective if not human | the adjective precedes or follows the noun depending on the noun |
Noun Class
Symbol | Gloss | Notes |
---|---|---|
cl1 |
Noun class 1 | |
cl2 |
Noun class 2 | |
cl3 |
Noun class 3 | |
cl4 |
Noun class 4 | |
cl5 |
Noun class 5 | |
cl6 |
Noun class 6 | |
cl7 |
Noun class 7 | |
cl8 |
Noun class 8 | |
cl9 |
Noun class 9 | |
cl10 |
Noun class 10 | |
cl11 |
Noun class 11 | |
cl12 |
Noun class 12 |
Pronoun types
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
pers |
Personal | PronType=Prs | |
tn |
Tónico | ||
log |
Logophoric | ||
detnt |
Neuter determiner | POS? | DET |
predet |
Pre determiner | POS? | DET |
atn |
Atónico | ||
qnt |
Quantifier | PronType=Ind | |
ord |
Ordinal | NumType=Ord | |
obj |
Object | Case=Acc | |
subj |
Subject | Case=Nom | |
pro |
Proclitic | ||
enc |
Enclitic | ||
acr |
Acronym | Not a pronoun? | Abbr=Yes |
rel |
Relative | PronType=Rel | |
ind |
Indefinite | PronType=Ind | |
itg |
Interrogative | PronType=Int | |
dem |
Demonstrative | PronType=Dem | |
def |
Definite | Definite=Def | |
pos |
Possessive | Poss=Yes | |
ref |
Reflexive | Reflex=Yes | |
prx |
Proximate | ||
med |
Medial | ||
dst |
Distal | ||
expl |
Syntactic expletive | wikipedia | |
rec |
Reciprocal Pronoun | ||
res |
Reciprocal Pronoun |
Transitivity
Used for verbs.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
tv |
Transitive | takes direct object in accusative case (used in Turkic) | Subcat=Tran |
iv |
Intransitive | does not take direct object in accusative case (used in Turkic) | Subcat=Intr |
TD |
Transitivity to be determined | if the sub-category is (currently) unknown |
Separable verbs
Symbol | Gloss | Notes |
---|---|---|
sep |
Separable verb | wikipedia, lingolia, PDF |
fs |
Separable verb in subordinate clause | |
fm |
Separable verb in main clause |
Proper nouns
Symbol | Gloss | Notes |
---|---|---|
ant |
Anthroponym | wikipedia, it's very common to use ant together with f and m for traditionally gender-specific names |
top |
Toponym | In some language pairs without the locative case this may be loc, although this should be changed. wikipedia |
hyd |
Hydronym | wikipedia |
cog |
Cognomen | In normal use, surnames |
org |
Organisation | |
al |
Altres | Other, misc. |
pat |
Patronymic | A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich. |
Inflectional morphology
Number
Note: number can be a sub-category tag too, e.g. with pronouns.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
sg |
Singular | Number=Sing | |
pl |
Plural | Number=Plur | |
sp |
Singular or plural | Number=Sing,Plur | |
du |
Dual | Number=Dual | |
ct |
Count | see mk-bg | Number=Count |
coll |
Collective | Number=Coll | |
ND |
Number to be determined |
Case
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
nom |
Nominative | Case=Nom | |
acc |
Accusative | Case=Acc | |
dat |
Dative | Case=Dat | |
gen |
Genitive | Case=Gen | |
dg |
Dative and Genitive | in ro-es, discouraged in new developments | Case=Dat,Gen |
voc |
Vocative | Case=Voc | |
abl |
Ablative | wikipedia | Case=Abl |
ins |
Instrumental or Instructive | wikipedia | Case=Ins |
loc |
Locative | wikipedia | Case=Loc |
prp |
Prepositional | wikipedia | |
tra |
Translative | Case=Tra | |
ill |
Illative | Case=Ill | |
ine |
Inessive | Case=Ine | |
ade |
Adessive | Case=Ade | |
all |
Allative | Case=All | |
abe |
Abessive | Case=Abe | |
ess |
Essive | Case=Ess | |
par |
Partitive | Case=Par | |
dis |
Distributive | Case=Dis | |
com |
Comitative | Case=Com | |
soc |
Sociative | ||
prl |
Prolative | Case=Pro | |
ses |
Superessive | Hungarian | Case=Sup |
sub |
Sublative | Hungarian | Case=Sub |
dela |
Delative | Hungarian | Case=Del |
term |
Terminative | Hungarian, Estonian, ... | Case=Ter |
temp |
Temporal | [2] | Case=Tem |
obl |
Oblique | [3] | Case=Obl |
erg |
Ergative | [4] | Case=Erg |
CD |
Case to be determined |
Voice
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
actv |
Active voice | Voice=Act | |
pass |
Passive voice | More common in Turkic. | Voice=Pass |
pasv |
Passive voice | More common in Germanic. | Voice=Pass |
midv |
Middle voice | Voice=Mid | |
nactv |
Non-active voice | See Albanian. | |
caus |
Causative voice | see also #Derivations | Voice=Cau |
Tense and mode
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
aff |
Affirmative | wikipedia | Polarity=Pos |
aor |
Aorist | wikipedia A tense in Turkic languages. | |
cni |
Conditional | Many pairs will probably use cnd or cond... | Mood=Cnd |
deb |
Debitive mode | Exclusive to Latvian (wikipedia) | |
fti |
Future indicative | Tense=Fut Mood=Ind | |
fts |
Future subjunctive | Tense=Fut Mood=Sub | |
fut |
Future | Tense=Fut | |
ifi |
Past definite | from Pretérito perfecto o indefinido | Tense=Past Definite=Def |
imp |
Imperative | englishlanguageguide | Mood=Imp |
itg |
Interrogative | ||
ito |
Infinitive with 'to' | German | VerbForm=Inf |
lp |
L-participle | ||
neg |
Negative | Polarity=Neg | |
nonpast |
Non-past | Tense=Pres,Fut | |
past |
Past | Tense=Past | |
pii |
Imperfect | from Pretérito imperfecto de indicativo wikipedia | Tense=Past Mood=Ind Aspect=Imp |
pis |
Imperfect subjunctive | Tense=Past Mood=Sub Aspect=Imp | |
plu |
Pluperfect | In cy-en | Tense=Pqp |
pmp |
Pluperfect | In es-gl (from Pluscuamperfecto) | Tense=Pqp |
pp2 |
Past participle (???) | It's at least used in the Esperanto dictionaries for future active participles, ont (seems quite odd) | VerbForm=Part Tense=Past |
pp3 |
Past participle (???) | It's at least used in the Esperanto dictionaries for past active participles, int (seems quite odd) | VerbForm=Part Tense=Past |
pp |
Past participle | wikipedia | VerbForm=Part Tense=Past |
pprs |
Present participle | Also appears as ppres (deprecated) | VerbForm=Part Tense=Pres |
ppres |
Present participle | see also: pprs. wikipedia | Tense=Pres VerbForm=Part |
pres |
Present | Tense=Pres | |
pret |
Preterite | Preterite | Tense=Past |
pri |
Present indicative | see also: pres. wikipedia | Tense=Pres Mood=Ind |
prs |
Present subjunctive | wikipedia | Tense=Pres Mood=Sub |
supn |
Supine | wikipedia | VerbForm=Sup |
Non-finite verb forms
These tags are used for non-finite verb forms, which are often elsewhere called "infinitives" or "participles". See https://doi.org/10.3765/ptu.v4i1.4587 for discussion.
Noun-like
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
ger |
Gerund | VerbForm=Vnoun | |
ger_aor |
Aorist gerund | VerbForm=Vnoun | |
ger_fut |
Future gerund | VerbForm=Vnoun Tense=Fut | |
ger_hab |
Habitual gerund | VerbForm=Vnoun Aspect=Hab | |
ger_impf |
Imperfect gerund | VerbForm=Vnoun Aspect=Imp | |
ger_past |
Past gerund | VerbForm=Vnoun Tense=Past | |
ger_perf |
Perfect gerund | VerbForm=Vnoun Aspect=Perf | |
ger_pres |
Present gerund | VerbForm=Vnoun Tense=Pres |
Adjective-like
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
gpr |
Verbal adjective | VerbForm=Part | |
gpr_aor |
Aorist verbal adjective | VerbForm=Part | |
gpr_fut |
Future verbal adjective | VerbForm=Part Tense=Fut | |
gpr_hab |
Habitual verbal adjective | VerbForm=Part Aspect=Hab | |
gpr_impf |
Imperfect verbal adjective | VerbForm=Part Aspect=Imp | |
gpr_past |
Past verbal adjective | VerbForm=Part Tense=Past | |
gpr_perf |
Perfect verbal adjective | VerbForm=Part Aspect=Perf | |
gpr_pres |
Present verbal adjective | VerbForm=Part Tense=Pres |
Adverb-like
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
gna |
Verbal adverb | VerbForm=Conv | |
gna_aor |
Aorist verbal adverb | VerbForm=Conv | |
gna_fut |
Future verbal adverb | VerbForm=Conv Tense=Fut | |
gna_hab |
Habitual verbal adverb | VerbForm=Conv Aspect=Hab | |
gna_impf |
Imperfect verbal adverb | VerbForm=Conv Aspect=Imp | |
gna_past |
Past verbal adverb | VerbForm=Conv Tense=Past | |
gna_perf |
Perfect verbal adverb | VerbForm=Conv Aspect=Perf | |
gna_pres |
Present verbal adverb | VerbForm=Conv Tense=Pres |
Infinitives
Generally these must occur with auxiliaries.
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
inf |
Infinitive | VerbForm=Inf | |
infps |
Personal infinitive | Used in Portuguese, likely should be merged | VerbForm=Inf |
prc_aor |
Aorist participle | VerbForm=Inf | |
prc_fut |
Future participle | VerbForm=Inf Tense=Fut | |
prc_hab |
Habitual participle | VerbForm=Inf Aspect=Hab | |
prc_impf |
Imperfect participle | VerbForm=Inf Aspect=Imp | |
prc_past |
Past participle | VerbForm=Inf Tense=Past | |
prc_perf |
Perfect participle | VerbForm=Inf Aspect=Perf | |
prc_pres |
Present participle | VerbForm=Inf Tense=Pres |
Aspect
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
hab |
Habitual | Aspect=Hab | |
imperf |
Imperfective | Should be merged with impf | Aspect=Imp |
impf |
Imperfective | Aspect=Imp | |
perf |
Perfective | Aspect=Perf |
Person
Note: person can be a sub-category tag, e.g. with pronouns.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
p1 |
First person | Person=1 | |
p2 |
Second person | Person=2 | |
p3 |
Third person | Person=3 | |
impers |
Impersonal | Sometimes called 'autonomous' | Person=0 |
past3p |
Past third person | In rus and bel-rus, should be 2 tags | Person=3 Tense=Past |
Derivations
Symbol | Gloss | Notes |
---|---|---|
caus |
Causative | |
ingr |
Ingressive | https://nn.wikipedia.org/w/index.php?title=Ingressiv |
subs |
Verbal Noun or Verbal Substantive | Shortened form of substantive; a noun formed from a verb |
agnt |
Agent noun | Agent Noun |
Possession
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
px1sg |
First person singular possessive | e.g. in Turkic languages | Person[psor]=1 Number[psor]=Sing |
px2sg |
Second person singular possessive | e.g. in Turkic languages | Person[psor]=2 Number[psor]=Sing |
px3sg |
Third person singular possessive | e.g. in Turkic languages | Person[psor]=3 Number[psor]=Sing |
px1pl |
First person plural possessive | e.g. in Turkic languages | Person[psor]=1 Number[psor]=Plur |
px2pl |
Second person plural possessive | e.g. in Turkic languages | Person[psor]=2 Number[psor]=Plur |
px3pl |
Third person plural possessive | e.g. in Turkic languages | Person[psor]=3 Number[psor]=Plur |
px3sp |
Third person possessive singular or plural | e.g. in Turkic languages | Person[psor]=3 |
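The possessive tags in the table above follow a regular naming pattern, so their Universal Dependencies features can be derived instead of listed by hand. The Python sketch below is only an illustration of that pattern: the <code>possessor_features</code> helper is hypothetical, and it accepts forms such as <code>px1sp</code> that the table does not list.

```python
import re

# "px" + person digit + number code; "sp" means singular or plural.
POSSESSIVE_RE = re.compile(r"^px([123])(sg|pl|sp)$")
NUMBER = {"sg": "Sing", "pl": "Plur"}


def possessor_features(tag):
    """Map a possessive tag to its UD feature string, or None if it does not fit."""
    match = POSSESSIVE_RE.match(tag)
    if match is None:
        return None
    person, number = match.groups()
    features = [f"Person[psor]={person}"]
    if number in NUMBER:  # "sp" leaves the number unspecified, as for px3sp
        features.append(f"Number[psor]={NUMBER[number]}")
    return " ".join(features)


if __name__ == "__main__":
    for tag in ("px1sg", "px2pl", "px3sp"):
        print(tag, "->", possessor_features(tag))
```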
Subject marking
e.g. in verbs that mark both subject and object; otherwise, see #Person and #Number.
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
s_sg1 |
First person singular subject | Number[subj]=Sing Person[subj]=1 | |
s_sg2 |
Second person singular subject | Number[subj]=Sing Person[subj]=2 | |
s_sg3 |
Third person singular subject | Number[subj]=Sing Person[subj]=3 | |
s_pl1 |
First person plural subject | Number[subj]=Plur Person[subj]=1 | |
s_pl2 |
Second person plural subject | Number[subj]=Plur Person[subj]=2 | |
s_pl3 |
Third person plural subject | Number[subj]=Plur Person[subj]=3 |
Object marking
e.g. in verbs that mark both subject and object
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
o_sg1 |
First person singular object | Number[obj]=Sing Person[obj]=1 | |
o_sg2 |
Second person singular object | Number[obj]=Sing Person[obj]=2 | |
o_sg3 |
Third person singular object | Number[obj]=Sing Person[obj]=3 | |
o_pl1 |
First person plural object | Number[obj]=Plur Person[obj]=1 | |
o_pl2 |
Second person plural object | Number[obj]=Plur Person[obj]=2 | |
o_pl3 |
Third person plural object | Number[obj]=Plur Person[obj]=3 |
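The subject and object marking tags share one shape: a role prefix (<code>s_</code> or <code>o_</code>), a number code, and a person digit. As a sketch under that assumption, the Python snippet below rebuilds the feature strings shown in the two tables above; the <code>agreement_features</code> helper is a made-up name for this example.

```python
import re

AGREEMENT_RE = re.compile(r"^([so])_(sg|pl)([123])$")
ROLE = {"s": "subj", "o": "obj"}      # UD layer name used in the brackets
NUMBER = {"sg": "Sing", "pl": "Plur"}


def agreement_features(tag):
    """Map a tag such as s_sg1 or o_pl3 to its UD feature string, else None."""
    match = AGREEMENT_RE.match(tag)
    if match is None:
        return None
    role, number, person = match.groups()
    layer = ROLE[role]
    return f"Number[{layer}]={NUMBER[number]} Person[{layer}]={person}"


if __name__ == "__main__":
    for tag in ("s_sg1", "o_pl3"):
        print(tag, "->", agreement_features(tag))
```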
Adjectives
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
pst |
Positive | Degree=Pos | |
comp |
Comparative | wikipedia | Degree=Comp |
sup |
Superlative | wikipedia | Degree=Sup |
attr |
Attributive | wikipedia | |
pred |
Predicative | wikipedia |
Formality
Symbols | Gloss | Notes |
---|---|---|
crd |
Cordial | |
el |
Elite | |
fam |
Familiar | |
frm |
Formal | |
infml |
Informal | |
pol |
Polite | |
low |
Low courtesy | |
mid |
Mid courtesy | |
hi |
High courtesy |
Specificity
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
spc |
Specific | Definite=Spec | |
nspc |
Non-specific |
Others
Symbol | Gloss | Notes |
---|---|---|
abbr |
Abbreviation (e.g. etc., Mr.) | Acronyms are also included (see acr) |
date |
Dates, years... | |
email |
Electronic Mail | Shortened form of Electronic Mail |
file |
Filenames | |
mon |
Money | |
percent |
Percentage | e.g. 25%, 0.9% |
time |
Time | |
url |
Web address | |
web |
Links and Emails | |
year |
Years | |
maj |
Large script in which every letter is the same height | |
min |
Small script in which every letter is the same height |
Compounds
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
cmp |
Compound Noun |
Chunk tags
Tag | Description |
---|---|
<SN> |
Noun phrase / noun group (sintagma nominal) |
<SA> |
Adjective phrase / adjective group |
<SV> |
Verb phrase / verb group (sintagma verbal) |
XML tags
Note: All XML tags are explained in depth in the PDF documentation, see also the dix.dtd and dix.rng files in the GitHub repository.
XML tag | Means | Appears in XML tags / notes / examples |
---|---|---|
<dictionary> | Mono- or bilingual dictionary | Top-level tag for all dictionaries |
<alphabet> | Set of characters in the language | In <dictionary> |
<sdefs> | Symbol definitions | In <dictionary> |
<sdef> | Symbol definition | In <sdefs>. Ex.: <sdef n="noun"/> |
<pardefs> | Paradigm definitions | In <dictionary> |
<pardef> | Paradigm definition | In <pardefs> |
<section> | A section of the dictionary | In <dictionary>. Ex.: <section id="main" type="standard"> |
<e> | A dictionary entry (a word) | In <section> and in <pardef> |
<i> | Invariant (left and right side) | In <e>. Ex.: <i>beer</i> |
<p> | A pair | In <e> |
<l> | Left side (surface form) | In <p>. Ex.: <l>beer</l> |
<r> | Right side (lexical unit) | In <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r> |
<s> | A lexical symbol (noun, adj..) | In <r>, <l> and <i>. Ex.: <s n="noun"/> |
<a> | Post-generator wake-up mark | In <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English) |
<b> | Blank space | In <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ... |
<g> | Group | For multiwords |
<ig> | Identity group | Combination of <i> and <g> |
<j> | Join | A + symbol in compounds |
<prm> | Parameter | Only in Metadix |
<sa> | Symbol Argument ??? | Only in Metadix |
<t> | Tag or Template | In Apertium-separable, <t> is any tag; in crossdix it is a template (matches a single tag) |
<d> | Delimiter | In Apertium-separable, marks end-of-word |
<v> | Variable | Only in crossdix - like + in regexes |
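To illustrate how the dictionary tags above nest, here is a minimal Python sketch that reads a tiny dictionary and lists its left/right pairs. The sample dictionary is made up from the examples in the table (the symbol names noun and singular are illustrative, not real Apertium symbols), and the helper names side_to_string and list_pairs are hypothetical. It is not part of Apertium and ignores paradigms, <i> entries and most other tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample assembled from the examples in the table above.
SAMPLE_DIX = """<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="noun"/>
    <sdef n="singular"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>beer</l><r>beer<s n="noun"/><s n="singular"/></r></p></e>
  </section>
</dictionary>"""


def side_to_string(elem):
    """Render an <l> or <r> element as text, writing each <s n="x"/> as <x>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "s":
            parts.append("<" + child.get("n") + ">")
        elif child.tag == "b":
            parts.append(" ")  # <b/> stands for a blank space
        parts.append(child.tail or "")
    return "".join(parts)


def list_pairs(dix_xml):
    """Return (left, right) strings for every <p> found inside an <e>."""
    root = ET.fromstring(dix_xml)
    pairs = []
    # Note: iter("e") also visits entries inside <pardef>, if any are present.
    for entry in root.iter("e"):
        for pair in entry.findall("p"):
            left = side_to_string(pair.find("l"))
            right = side_to_string(pair.find("r"))
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    for left, right in list_pairs(SAMPLE_DIX):
        print(left, "->", right)
```

The sketch only shows the containment described in the table: <e> holds <p>, which holds <l> and <r>, and <s> symbols follow the lemma on the right side.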
Transfer
<clip> tag
See the documentation (pdf), p.144 for more information.
XML attribute value | Means | Appears in attribute | Notes |
---|---|---|---|
whole | lemma and grammatical symbols | part | |
lem | lemma | part | |
lemh | (inflected) head word of multiword | part | |
lemq | following queue of multiword | part | |
Scraping this page
This page should be relatively scrapeable if requested with `?action=raw`.
Section headers which precede tables all have `=` as the first character of the line and have a category name without spaces in a comment.
Lines that define tags begin with `| <code>`. Splitting a line on `||` gives either 3 or 4 columns. The 4th column can be split on spaces to give UD POS tags and feature values or the word `or`. These are mixed together, but features have `=` and POS tags don't. A line might be followed by a comment containing either `unknown` or `default`, which indicate a placeholder tag or a tag which is commonly used when the correct value cannot be determined, respectively.
A Python scraper script can be found at https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py
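The rules above can also be sketched in a few lines of Python. This is not the linked script, only a minimal illustration under some assumptions: the raw wikitext has already been fetched into a string, the optional unknown/default comment sits on the same line as the tag definition, and the function name parse_tag_lines and the dictionary keys are hypothetical.

```python
import re

# Header such as: ==Part-of-speech Categories== <!-- POS -->
HEADER_RE = re.compile(r"^=+\s*(.*?)\s*=+\s*<!--\s*(\S+)\s*-->")
SYMBOL_RE = re.compile(r"<code>(.*?)</code>")
COMMENT_RE = re.compile(r"<!--(.*?)-->")


def parse_tag_lines(raw_text):
    """Collect tag definitions from raw wikitext, following the rules above."""
    category = None
    entries = []
    for line in raw_text.splitlines():
        if line.startswith("="):
            match = HEADER_RE.match(line)
            # Headers without a category comment reset the category.
            category = match.group(2) if match else None
            continue
        if not line.startswith("| <code>"):
            continue
        # Assumption: the unknown/default comment is on the same line.
        flag = None
        comment = COMMENT_RE.search(line)
        if comment:
            for word in ("unknown", "default"):
                if word in comment.group(1):
                    flag = word
            line = COMMENT_RE.sub("", line)
        cols = [col.strip() for col in line[1:].split("||")]
        symbol = SYMBOL_RE.search(cols[0]).group(1)
        # The 4th column, if present, mixes UD POS tags, features and "or".
        ud_tokens = cols[3].split() if len(cols) > 3 else []
        entries.append({
            "category": category,
            "symbol": symbol,
            "gloss": cols[1] if len(cols) > 1 else "",
            "upos": [t for t in ud_tokens if t != "or" and "=" not in t],
            "feats": [t for t in ud_tokens if "=" in t],
            "flag": flag,
        })
    return entries
```

Each returned entry keeps POS tags and features apart, so a caller can look up, for instance, every symbol whose feats list is non-empty.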