Difference between revisions of "List of symbols"
(→Case) |
Popcorndude (talk | contribs) |
||
(15 intermediate revisions by 4 users not shown) | |||
Line 11: | Line 11: | ||
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]]. |
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]]. |
||
+ | <!-- comments following section headers are intended to make scraping this page easier --> |
||
− | ==Part-of-speech Categories== |
||
+ | |||
+ | ==Part-of-speech Categories== <!-- POS --> |
||
{|class=wikitable |
{|class=wikitable |
||
Line 67: | Line 69: | ||
|} |
|} |
||
− | === Punctuation === |
+ | === Punctuation === <!-- punct --> |
{|class=wikitable |
{|class=wikitable |
||
Line 74: | Line 76: | ||
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT |
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT |
||
|- |
|- |
||
− | | <code>cm</code> || Comma punctuation || , || PunctType= |
+ | | <code>cm</code> || Comma punctuation || , || PUNCT PunctType=Comm |
|- |
|- |
||
− | | <code>lquot</code> || Left quote || « || PUNCT |
+ | | <code>lquot</code> || Left quote || « || PUNCT PunctType=Quot PunctSide=Ini |
|- |
|- |
||
− | | <code>rquot</code> || Right quote || » || PUNCT |
+ | | <code>rquot</code> || Right quote || » || PUNCT PunctType=Quot PunctSide=Fin |
|- |
|- |
||
− | | <code>lpar</code> || Left parenthesis || ( || PUNCT |
+ | | <code>lpar</code> || Left parenthesis || ( || PUNCT PunctType=Brck PunctSide=Ini |
|- |
|- |
||
− | | <code>rpar</code> || Right parenthesis || ) || PUNCT |
+ | | <code>rpar</code> || Right parenthesis || ) || PUNCT PunctType=Brck PunctSide=Fin |
|- |
|- |
||
− | | <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long|| PUNCT |
+ | | <code>guio</code> || Hyphen || - used to connect two words into one e.g. year-long|| PUNCT PunctType=Dash |
|- |
|- |
||
− | | <code>apos</code> || Apostrophe || ' or ' || |
+ | | <code>apos</code> || Apostrophe || ' or ' || PUNCT |
|- |
|- |
||
− | | <code>quot</code> || Quotation || " || PunctType= |
+ | | <code>quot</code> || Quotation || " || PUNCT PunctType=Quot |
|- |
|- |
||
− | | <code>percent</code> || Percentage |
+ | | <code>percent</code> || Percentage || % || PUNCT |
|- |
|- |
||
− | | <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT |
+ | | <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT PunctSide=Ini |
+ | |- |
||
+ | | <code>clb</code> || Clause Boundary || Refers to any of the following symbols: .?;:!·… || PUNCT |
||
+ | |- |
||
+ | | <code>punct</code> || Punctuation || || PUNCT |
||
|} |
|} |
||
− | ==Part-of-speech Sub-categories== |
+ | ==Part-of-speech Sub-categories== <!-- subtype --> |
− | ===Gender=== |
+ | ===Gender=== <!-- gender --> |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
||
Line 120: | Line 126: | ||
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut |
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut |
||
|- |
|- |
||
− | | <code>mf</code> || Masculine or feminine || |
+ | | <code>mf</code> || Masculine or feminine || Used when masculine and feminine have the same form || Gender=Masc,Fem |
|- |
|- |
||
− | | <code>mfn</code> || Masculine , feminine , neuter || |
+ | | <code>mfn</code> || Masculine , feminine , neuter || Used when masculine, feminine, and neuter have the same form || Gender=Masc,Fem,Neut |
|- |
|- |
||
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com |
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com |
||
Line 132: | Line 138: | ||
|} |
|} |
||
− | ===Count/Mass=== |
+ | ===Count/Mass=== <!-- countability --> |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
||
Line 145: | Line 151: | ||
|} |
|} |
||
− | ===Animacy=== |
+ | ===Animacy=== <!-- animacy --> |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs). |
||
Line 152: | Line 158: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
− | | <code>aa</code> || Animate || |
+ | | <code>aa</code> || Animate || || Animacy=Anim |
|- |
|- |
||
− | | <code>an</code> || Animate or inanimate || |
+ | | <code>an</code> || Animate or inanimate || || Animacy=Anim,Inan |
|- |
|- |
||
− | | <code>nn</code> || Inanimate || |
+ | | <code>nn</code> || Inanimate || || Animacy=Inan |
|- |
|- |
||
+ | | <code>hu</code> || Human || || Animacy=Hum |
||
|} |
|} |
||
− | ===Adjectives=== |
+ | ===Adjectives=== <!-- adj_type --> |
{|class=wikitable |
{|class=wikitable |
||
Line 173: | Line 180: | ||
|} |
|} |
||
− | ===Pronoun types === |
+ | ===Pronoun types === <!-- prn_type --> |
{| class="wikitable" border="1" |
{| class="wikitable" border="1" |
||
Line 223: | Line 230: | ||
|- |
|- |
||
| <code>rec</code> || Reciprocal Pronoun || |
| <code>rec</code> || Reciprocal Pronoun || |
||
+ | |- |
||
+ | | <code>res</code> || Reciprocal Pronoun || |
||
|} |
|} |
||
− | === Transitivity === |
+ | === Transitivity === <!-- transitivity --> |
Used for verbs. |
Used for verbs. |
||
Line 232: | Line 241: | ||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
|- |
|- |
||
− | | <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) |
+ | | <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic) || Subcat=Tran |
|- |
|- |
||
− | | <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) |
+ | | <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic) || Subcat=Intr |
|- |
|- |
||
− | | <code>TD</code> |
+ | | <code>TD</code> || Transitivity to be determined || if the sub-category is [currently] unknown |
|} |
|} |
||
− | ===Separable verbs=== |
+ | ===Separable verbs=== <!-- separable --> |
{|class=wikitable |
{|class=wikitable |
||
Line 252: | Line 261: | ||
|} |
|} |
||
+ | ===Proper nouns=== <!-- np_type --> |
||
− | == Inflectional morphology == |
||
+ | {|class=wikitable |
||
− | ===Number=== |
||
+ | ! Symbol !! Gloss !! Notes !! Universal features |
||
+ | |- |
||
+ | | <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names |
||
+ | |- |
||
+ | | <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc'', although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia] |
||
+ | |- |
||
+ | | <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia] |
||
+ | |- |
||
+ | | <code>cog</code> || Cognomen || In normal use, surnames |
||
+ | |- |
||
+ | | <code>org</code> || Organisation || |
||
+ | |- |
||
+ | | <code>al</code> || Altres || Other, misc. |
||
+ | |- |
||
+ | | <code>pat</code> ||Patronymic || A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich. |
||
+ | |} |
||
+ | |||
+ | == Inflectional morphology == <!-- infl --> |
||
+ | |||
+ | ===Number=== <!-- number --> |
||
Note: number can be a sub-category tag too, e.g. with pronouns. |
Note: number can be a sub-category tag too, e.g. with pronouns. |
||
Line 277: | Line 306: | ||
− | ===Case=== |
+ | ===Case=== <!-- case --> |
{|class=wikitable |
{|class=wikitable |
||
Line 332: | Line 361: | ||
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del |
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del |
||
|- |
|- |
||
− | | <code>term</code> || Terminative || [[Hungarian]], Estonian, ... || |
+ | | <code>term</code> || Terminative || [[Hungarian]], Estonian, ... || Case=Ter |
|- |
|- |
||
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Tem |
| <code>temp</code> || Temporal || [https://en.wikipedia.org/wiki/Temporal_case] || Case=Tem |
||
Line 341: | Line 370: | ||
|} |
|} |
||
− | ===Voice=== |
+ | ===Voice=== <!-- voice --> |
{|class=wikitable |
{|class=wikitable |
||
Line 360: | Line 389: | ||
|} |
|} |
||
− | ===Tense and mode=== |
+ | ===Tense and mode=== <!-- tense --> |
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
− | | <code> |
+ | | <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos |
|- |
|- |
||
− | | <code> |
+ | | <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || Tense=Past |
|- |
|- |
||
− | | <code> |
+ | | <code>cni</code> || Conditional || Many pairs will probably use cnd or cond... || Mood=Cnd |
|- |
|- |
||
− | | <code> |
+ | | <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) || |
|- |
|- |
||
− | | <code> |
+ | | <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind |
|- |
|- |
||
− | | <code> |
+ | | <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub |
|- |
|- |
||
− | | <code> |
+ | | <code>fut</code> || Future || || Tense=Fut |
|- |
|- |
||
− | | <code> |
+ | | <code>ger</code> || Gerund || [http://en.wikipedia.org/wiki/Gerund wikipedia] || VerbForm=Ger |
|- |
|- |
||
− | | <code> |
+ | | <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def |
|- |
|- |
||
− | | <code> |
+ | | <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp |
|- |
|- |
||
− | | <code> |
+ | | <code>inf</code> || Infinitive || [https://en.wikipedia.org/wiki/Infinitive wikipedia] || VerbForm=Inf |
|- |
|- |
||
− | | <code> |
+ | | <code>infps</code> || Personal infinitive || Used in Portuguese || |
|- |
|- |
||
− | | <code> |
+ | | <code>itg</code> || Interrogative || || |
|- |
|- |
||
− | | <code> |
+ | | <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf |
|- |
|- |
||
− | | <code> |
+ | | <code>lp</code> || L-participle || |
|- |
|- |
||
− | | <code> |
+ | | <code>neg</code> || Negative || || Polarity=Neg |
|- |
|- |
||
− | | <code> |
+ | | <code>nonpast</code> || Non-past || || Tense=Pres,Fut |
|- |
|- |
||
− | | <code> |
+ | | <code>past</code> || Past || || Tense=Past |
|- |
|- |
||
− | | <code> |
+ | | <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind |
+ | |- |
||
+ | | <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub |
||
|- |
|- |
||
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp |
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp |
||
Line 407: | Line 438: | ||
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp |
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp |
||
|- |
|- |
||
+ | | <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) || |
||
− | | <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub |
||
|- |
|- |
||
− | | <code> |
+ | | <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) || |
|- |
|- |
||
− | | <code> |
+ | | <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part |
|- |
|- |
||
− | | <code> |
+ | | <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part |
|- |
|- |
||
− | | <code> |
+ | | <code>ppres</code> || Present participle || ''see also: pprs''. [http://en.wikipedia.org/wiki/Present_participle wikipedia] || Tense=Pres Mood=Part |
|- |
|- |
||
− | | <code> |
+ | | <code>pres</code> || Present || || Tense=Pres |
|- |
|- |
||
− | | <code> |
+ | | <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past |
|- |
|- |
||
− | | <code> |
+ | | <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind |
− | |- |
+ | |- |
+ | | <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub |
||
+ | |- |
||
+ | | <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup |
||
+ | |} |
||
+ | |||
+ | ===Aspect=== <!-- aspect --> |
||
+ | {|class=wikitable |
||
+ | ! Symbol !! Gloss !! Notes !! Universal feature |
||
+ | |- |
||
+ | | <code>hab</code> || Habitual || || Aspect=Hab |
||
+ | |- |
||
+ | | <code>imperf</code> || Imperfective || || Aspect=Imp |
||
+ | |- |
||
+ | | <code>impf</code> || Imperfective || || Aspect=Imp |
||
+ | |- |
||
+ | | <code>perf</code> || Perfective || || Aspect=Perf |
||
|} |
|} |
||
− | ===Person=== |
+ | ===Person=== <!-- person --> |
Note: person can be a sub-category tag, e.g. with pronouns. |
Note: person can be a sub-category tag, e.g. with pronouns. |
||
Line 439: | Line 486: | ||
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0 |
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0 |
||
|- |
|- |
||
+ | | <code>past3p</code> || past third person || |
||
|} |
|} |
||
− | ===Derivations=== |
+ | ===Derivations=== <!-- verb_deriv --> |
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes |
! Symbol !! Gloss !! Notes |
||
Line 455: | Line 503: | ||
|} |
|} |
||
− | ===Possession=== |
+ | ===Possession=== <!-- possessor --> |
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal feature |
! Symbol !! Gloss !! Notes !! Universal feature |
||
Line 475: | Line 523: | ||
|} |
|} |
||
− | === |
+ | ===Subject marking=== <!-- subject --> |
− | e.g. in verbs with both |
+ | Used e.g. in verbs that mark both subject and object; otherwise, see [[#Person]] and [[#Number]]. |
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
− | | <code> |
+ | | <code>s_sg1</code> || First person singular subject || || Number=Sing Person=1 |
|- |
|- |
||
− | | <code> |
+ | | <code>s_sg2</code> || Second person singular subject || || Number=Sing Person=2 |
|- |
|- |
||
− | | <code> |
+ | | <code>s_sg3</code> || Third person singular subject || || Number=Sing Person=3 |
|- |
|- |
||
− | | <code> |
+ | | <code>s_pl1</code> || First person plural subject || || Number=Plur Person=1 |
|- |
|- |
||
− | | <code> |
+ | | <code>s_pl2</code> || Second person plural subject || || Number=Plur Person=2 |
|- |
|- |
||
− | | <code> |
+ | | <code>s_pl3</code> || Third person plural subject || || Number=Plur Person=3 |
|- |
|- |
||
|} |
|} |
||
+ | |||
− | ===Proper nouns=== |
||
+ | ===Object marking=== <!-- object --> |
||
+ | |||
+ | e.g. in verbs with both |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes !! Universal features |
! Symbol !! Gloss !! Notes !! Universal features |
||
|- |
|- |
||
+ | | <code>o_sg1</code> || First person singular object || |
||
− | | <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names |
||
|- |
|- |
||
+ | | <code>o_sg2</code> || Second person singular object || |
||
− | | <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc''. Although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia] |
||
|- |
|- |
||
− | | <code> |
+ | | <code>o_sg3</code> || Third person singular object || |
|- |
|- |
||
− | | <code> |
+ | | <code>o_pl1</code> || First person plural object || |
|- |
|- |
||
− | | <code> |
+ | | <code>o_pl2</code> || Second person plural object || |
+ | |- |
||
+ | | <code>o_pl3</code> || Third person plural object || |
||
|- |
|- |
||
− | | <code>al</code> || Altres || Other, misc. |
||
|} |
|} |
||
− | ===Adjectives=== |
+ | ===Adjectives=== <!-- adj_infl --> |
{|class=wikitable |
{|class=wikitable |
||
Line 525: | Line 577: | ||
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup |
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup |
||
|- |
|- |
||
− | | <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] |
+ | | <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || |
|- |
|- |
||
− | | <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] |
+ | | <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || |
+ | |- |
||
+ | | <code>short</code> || Short adjective || || |
||
|} |
|} |
||
+ | ===Others=== <!-- other --> |
||
− | |||
− | ===Others=== |
||
{|class=wikitable |
{|class=wikitable |
||
! Symbol !! Gloss !! Notes |
! Symbol !! Gloss !! Notes |
||
Line 539: | Line 592: | ||
| <code>date</code> || Dates, years... || |
| <code>date</code> || Dates, years... || |
||
|- |
|- |
||
− | | <code> |
+ | | <code>email</code> || Electronic Mail || Shortened form of Electronic Mail |
+ | |- |
||
+ | | <code>file</code> || Filenames || |
||
+ | |- |
||
+ | | <code>mon</code> || Money || |
||
|- |
|- |
||
| <code>percent</code> || Percentage || e.g. 25%, 0.9% |
| <code>percent</code> || Percentage || e.g. 25%, 0.9% |
||
+ | |- |
||
+ | | <code>time</code> || Time || |
||
|- |
|- |
||
| <code>web</code> || Links and Emails || |
| <code>web</code> || Links and Emails || |
||
|- |
|- |
||
− | | |
+ | |<code>maj</code> || Large script in which every letter is the same height || |
− | |- |
||
− | | <code>email</code> || Electronic Mail || Shorten form of Electronic Mail |
||
|- |
|- |
||
+ | |<code>min</code> || small script in which every letter is the same height || |
||
|} |
|} |
||
+ | === Compounds === <!-- compound --> |
||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | === Compounds === |
||
{|class=wikitable |
{|class=wikitable |
||
Line 570: | Line 621: | ||
* [[Tagging guidelines for Portuguese]] |
* [[Tagging guidelines for Portuguese]] |
||
− | ==Chunk tags== |
+ | ==Chunk tags== <!-- chunk --> |
{|class=wikitable |
{|class=wikitable |
||
Line 583: | Line 634: | ||
|} |
|} |
||
− | ==XML tags== |
+ | ==XML tags== <!-- xml --> |
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository. |
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository. |
||
Line 589: | Line 640: | ||
! XML tag !! Means !! Appears in XML tags / notes / examples |
! XML tag !! Means !! Appears in XML tags / notes / examples |
||
|- |
|- |
||
− | | <code><dictionary></code> || Mono- or bilingual dictionary || |
+ | | <code><dictionary></code> || Mono- or bilingual dictionary || Top-level tag for all dictionaries |
|- |
|- |
||
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code> |
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code> |
||
Line 619: | Line 670: | ||
| <code><b></code> || Blank space || In <code><r></code>, <code><l></code> and <code><i></code>. Ex.: <code><l>you're<b/>welcome<s ...</code> |
| <code><b></code> || Blank space || In <code><r></code>, <code><l></code> and <code><i></code>. Ex.: <code><l>you're<b/>welcome<s ...</code> |
||
|- |
|- |
||
+ | | <code><g></code> || Group || For [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiwords]] |
||
+ | |- |
||
+ | | <code><ig></code> || Identity group || Combination of <code><i></code> and <code><g></code> |
||
+ | |- |
||
+ | | <code><j></code> || Join || A <code>+</code> symbol in compounds. In [[Apertium-separable]], <code><j></code> indicates end-of-word |
||
+ | |- |
||
+ | | <code><prm></code> || Parameter || Only in [[Metadix]] |
||
+ | |- |
||
+ | | <code><sa></code> || Symbol Argument ??? || Only in [[Metadix]] |
||
+ | |- |
||
+ | | <code><t></code> || Tag or Template || In [[Apertium-separable]] <code><t></code> is any tag, in crossdix it is template (matches a single tag) |
||
+ | |- |
||
+ | | <code><v></code> || Variable || Only in crossdix - like + in regexes |
||
|} |
|} |
||
− | |||
− | TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC) |
||
− | |||
− | Other tags: |
||
− | <pre> |
||
− | <j/> (in stream format #) is to mark multiwords |
||
− | |||
− | <t/> and <v/> are only in crossdix |
||
− | t = template, v = variable |
||
− | t matches any single tag, v is like + in regexes (0 or more) |
||
− | |||
− | <sa/> and <prm/> are only used in metadixes. |
||
− | 'sa' lets you add n optional extra tag, prm is an extra string for the paradigm |
||
− | </pre> |
||
=== Transfer === |
=== Transfer === |
||
− | ==== <clip> tag ==== |
+ | ==== <clip> tag ==== <!-- clip --> |
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information. |
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information. |
||
Line 656: | Line 706: | ||
==See also== |
==See also== |
||
* [[Syntax tags]] |
* [[Syntax tags]] |
||
+ | * [[Secondary tags]] |
||
* [[Apertium stream format]] |
* [[Apertium stream format]] |
||
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]] |
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]] |
Revision as of 18:08, 19 June 2020
This page lists the symbols used in Apertium to denote parts of speech and further morphological features, the chunk tags used for more syntactic functions, and the XML tags.
This page also documents alignment between Apertium morphological tags and Universal Dependencies POS tags and features.
This is meant to be a glossary of symbol names with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most, for example, do not make a spatial distinction in articles).
If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.
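As a rough illustration of where the symbols on this page show up, the sketch below splits an analysis string into its lemma and its list of tags. It assumes the angle-bracket notation of the stream format, in which each symbol is written as `<symbol>` after the lemma; the function name `split_analysis` and the example word are hypothetical and not part of any Apertium tool.

```python
import re
from typing import List, Tuple

# Assumption: each symbol is wrapped in angle brackets, e.g. "<n><f><pl>".
TAG_RE = re.compile(r"<([^<>]+)>")


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    # Everything before the first '<' is treated as the lemma.
    lemma = analysis.split("<", 1)[0]
    return lemma, TAG_RE.findall(analysis)


lemma, tags = split_analysis("casa<n><f><pl>")
print(lemma, tags)
```

Each extracted tag can then be looked up in the tables on this page.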
Part-of-speech Categories
Symbol | Gloss | Notes | Universal POS |
---|---|---|---|
n | Noun | see 'np' for proper noun | NOUN |
vblex | Standard ("lexical") verb | see also: vbser, vbhaver, vbmod, vaux, vbdo | VERB |
v | Standard verb | shortened form of vblex, often used in agglutinative languages | VERB |
vbmod | Modal verb | | VERB |
vbser | Verb "to be" | from ser (to be) | VERB (or AUX) |
vbhaver | Verb "to have" | from haver (to have) | VERB (or AUX) |
vbdo | Verb "to do" | "to do" includes all eleven tenses and forms of to do, can also be an auxiliary verb | VERB (or AUX) |
vaux | Auxiliary verb | wikipedia | AUX |
cop | Copula | wikipedia; sometimes verb-like, sometimes not | AUX, ... |
adj | Adjective | | ADJ |
adv | Adverb | | ADV |
preadv | Pre-adverb | | ADV |
postadv | Post-adverb | | ADV |
mod | Modal word | [1] | PART |
det | Determiner | wikipedia | DET |
prn | Pronoun | wikipedia | PRON |
pr | Preposition | wikipedia | ADP |
post | Postposition | | ADP |
num | Numeral | | NUM |
np | Proper noun | From nom propi wikipedia | PROPN |
ij | Interjection | wikipedia | INTJ |
cnjcoo | Co-ordinating conjunction | wikipedia | CCONJ |
cnjsub | Sub-ordinating conjunction | | SCONJ |
cnjadv | Conjunctive adverb | wikipedia | SCONJ, ADV |
atp | Attachable prefix | In German, zusammen- | |
Punctuation
Symbol | Gloss | Notes | Universal POS |
---|---|---|---|
sent | Sentence-ending punctuation | e.g. full stop, question mark | PUNCT |
cm | Comma punctuation | , | PUNCT PunctType=Comm |
lquot | Left quote | « | PUNCT PunctType=Quot PunctSide=Ini |
rquot | Right quote | » | PUNCT PunctType=Quot PunctSide=Fin |
lpar | Left parenthesis | ( | PUNCT PunctType=Brck PunctSide=Ini |
rpar | Right parenthesis | ) | PUNCT PunctType=Brck PunctSide=Fin |
guio | Hyphen | - used to connect two words into one e.g. year-long | PUNCT PunctType=Dash |
apos | Apostrophe | ' or ' | PUNCT |
quot | Quotation | " | PUNCT PunctType=Quot |
percent | Percentage | % | PUNCT |
lquest | Left question/exclamation mark | ¿¡ (used in Spanish) | PUNCT PunctSide=Ini |
clb | Clause Boundary | Refers to any of the following symbols: .?;:!·… | PUNCT |
punct | Punctuation | | PUNCT |
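The last column of the punctuation table combines a Universal POS tag with zero or more `Feature=Value` pairs. The following sketch shows one way such a cell could be split into its two parts. The dictionary holds a few cells copied from the table; the names `PUNCT_CELLS` and `parse_cell` are hypothetical, and the space-separated layout of the cell is an assumption based on how the cells are written on this page.

```python
from typing import Dict, Tuple

# A few cells copied from the punctuation table (not the full table).
PUNCT_CELLS = {
    "sent": "PUNCT",
    "cm": "PUNCT PunctType=Comm",
    "lquot": "PUNCT PunctType=Quot PunctSide=Ini",
    "rquot": "PUNCT PunctType=Quot PunctSide=Fin",
}


def parse_cell(cell: str) -> Tuple[str, Dict[str, str]]:
    """Split a cell such as 'PUNCT PunctType=Quot' into (POS, features)."""
    # The first token is the POS tag; the rest are Feature=Value pairs.
    upos, *pairs = cell.split()
    feats = dict(pair.split("=", 1) for pair in pairs)
    return upos, feats


print(parse_cell(PUNCT_CELLS["lquot"]))
```

Keeping the POS tag and the features separate makes it easier to fill the corresponding columns of a Universal Dependencies annotation independently.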
Part-of-speech Sub-categories
Gender
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
f | Feminine | | Gender=Fem |
m | Masculine | | Gender=Masc |
nt | Neuter | | Gender=Neut |
ma | Masculine (animate) | Mostly in Slavic languages | Gender=Masc |
mi | Masculine (inanimate) | Mostly in Slavic languages | Gender=Masc |
mp | Masculine (personal) | in Polish | Gender=Masc |
mn | Masculine or neuter | | Gender=Masc,Neut |
fn | Feminine or neuter | | Gender=Fem,Neut |
mf | Masculine or feminine | Used when masculine and feminine have the same form | Gender=Masc,Fem |
mfn | Masculine, feminine, neuter | Used when masculine, feminine, and neuter have the same form | Gender=Masc,Fem,Neut |
ut | Common | From utrum, found in Scandinavian languages. | Gender=Com |
un | Common or neuter | As above, only common or neuter | Gender=Com,Neut |
GD | Gender to be determined | | |
Count/Mass
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
cnt | Countable | | |
unc | Uncountable (mass) | | |
Animacy
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
aa | Animate | | Animacy=Anim |
an | Animate or inanimate | | Animacy=Anim,Inan |
nn | Inanimate | | Animacy=Inan |
hu | Human | | Animacy=Hum |
Adjectives
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
sint | Synthetic | "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. wikipedia | |
preadj | Pre-adjective | for languages where most adjectives come after the noun (e.g. French in the eo->fr bidix) | |
preadj_nh | Pre-adjective if not human | the adjective comes before or after, depending on the noun | |
Pronoun types
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
pers | Personal | | PronType=Prs |
tn | Tónico | | |
detnt | Neuter determiner | POS? | DET |
predet | Pre-determiner | POS? | DET |
atn | Atónico | | |
qnt | Quantifier | | PronType=Ind |
ord | Ordinal | | NumType=Ord |
obj | Object | | |
subj | Subject | | |
pro | Proclitic | | |
enc | Enclitic | | |
acr | Acronym | Not a pronoun? | Abbr=Yes |
rel | Relative | | PronType=Rel |
ind | Indefinite | | PronType=Ind |
itg | Interrogative | | PronType=Int |
dem | Demonstrative | | PronType=Dem |
def | Definite | | |
pos | Possessive | | Poss=Yes |
ref | Reflexive | | Reflex=Yes |
prx | Proximate | | |
dst | Distal | | |
expl | Syntactic expletive | wikipedia | |
rec | Reciprocal Pronoun | | |
res | Reciprocal Pronoun | | |
Transitivity
Used for verbs.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
tv | Transitive | takes direct object in accusative case (used in Turkic) | Subcat=Tran |
iv | Intransitive | does not take direct object in accusative case (used in Turkic) | Subcat=Intr |
TD | Transitivity to be determined | if the sub-category is [currently] unknown | |
Separable verbs
Symbol | Gloss | Notes |
---|---|---|
sep | Separable verb | wikipedia, lingolia, PDF |
fs | Separable verb in subordinate clause | |
fm | Separable verb in main clause | |
Proper nouns
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
ant | Anthroponym | wikipedia, it's very common to use ant together with f and m for traditionally gender-specific names | |
top | Toponym | In some language pairs without the locative case this may be loc, although this should be changed. wikipedia | |
hyd | Hydronym | wikipedia | |
cog | Cognomen | In normal use, surnames | |
org | Organisation | | |
al | Altres | Other, misc. | |
pat | Patronymic | A name derived from the name of a father or ancestor, e.g. Johnson, O'Brien, Ivanovich. | |
Inflectional morphology
Number
Note: number can be a sub-category tag too, e.g. with pronouns.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
sg | Singular | | Number=Sing |
pl | Plural | | Number=Plur |
sp | Singular or plural | | Number=Sing,Plur |
du | Dual | | Number=Dual |
ct | Count | see mk-bg | Number=Count |
coll | Collective | | Number=Coll |
ND | Number to be determined | | |
Case
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
nom | Nominative | | Case=Nom |
acc | Accusative | | Case=Acc |
dat | Dative | | Case=Dat |
gen | Genitive | | Case=Gen |
dg | Dative and Genitive | in ro-es, discouraged in new developments | Case=Dat,Gen |
voc | Vocative | | Case=Voc |
abl | Ablative | wikipedia | Case=Abl |
ins | Instrumental or Instructive | wikipedia | Case=Ins |
loc | Locative | wikipedia | Case=Loc |
prp | Prepositional | wikipedia | |
tra | Translative | | Case=Tra |
ill | Illative | | Case=Ill |
ine | Inessive | | Case=Ine |
ade | Adessive | | Case=Ade |
all | Allative | | Case=All |
abe | Abessive | | Case=Abe |
ess | Essive | | Case=Ess |
par | Partitive | | Case=Par |
dis | Distributive | | Case=Dis |
com | Comitative | | Case=Com |
soc | Sociative | | |
prl | Prolative | | Case=Pro |
ses | Superessive | Hungarian | Case=Sup |
sub | Sublative | Hungarian | Case=Sub |
dela | Delative | Hungarian | Case=Del |
term | Terminative | Hungarian, Estonian, ... | Case=Ter |
temp | Temporal | [2] | Case=Tem |
obl | Oblique | [3] | Case=Obl |
erg | Ergative | [4] | Case=Erg |
Voice
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
actv | Active voice | | Voice=Act |
pass | Passive voice | used more in Turkic. | Voice=Pass |
pasv | Passive voice | used more in Germanic. | Voice=Pass |
midv | Middle voice | | Voice=Mid |
nactv | Non-active voice | See Albanian. | |
caus | Causative voice | see also #Derivations | Voice=Cau |
Tense and mode
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
aff |
Affirmative | wikipedia | Polarity=Pos |
aor |
Aorist | wikipedia A tense in Turkic languages. | Tense=Past |
cni |
Conditional | Many pairs will probably use cnd or cond | Mood=Cnd |
deb |
Debitive mode | Exclusive to Latvian (wikipedia) | |
fti |
Future indicative | | Tense=Fut Mood=Ind |
fts |
Future subjunctive | | Tense=Fut Mood=Sub |
fut |
Future | | Tense=Fut |
ger |
Gerund | wikipedia | VerbForm=Ger |
ifi |
Past definite | from Pretérito perfecto o indefinido | Tense=Past Definite=Def |
imp |
Imperative | englishlanguageguide | Mood=Imp |
inf |
Infinitive | wikipedia | VerbForm=Inf |
infps |
Personal infinitive | Used in Portuguese | |
itg |
Interrogative | ||
ito |
Infinitive with 'to' | German | VerbForm=Inf |
lp |
L-participle | ||
neg |
Negative | | Polarity=Neg |
nonpast |
Non-past | | Tense=Pres,Fut |
past |
Past | | Tense=Past |
pii |
Imperfect | from Pretérito imperfecto de indicativo wikipedia | Tense=Past Mood=Ind |
pis |
Imperfect subjunctive | | Tense=Past Mood=Sub |
plu |
Pluperfect | In cy-en | Tense=Pqp |
pmp |
Pluperfect | In es-gl (from Pluscuamperfecto) | Tense=Pqp |
pp2 |
Past participle (???) | It's at least used in the Esperanto dictionaries for future active participles, ont (seems quite odd) | |
pp3 |
Past participle (???) | It's at least used in the Esperanto dictionaries for past active participles, int (seems quite odd) | |
pp |
Past participle | wikipedia | VerbForm=Part |
pprs |
Present participle | Also appears as ppres (deprecated) | VerbForm=Part |
ppres |
Present participle | see also: pprs. wikipedia | Tense=Pres Mood=Part |
pres |
Present | | Tense=Pres |
pret |
Preterite | Preterite | Tense=Past |
pri |
Present indicative | see also: pres. wikipedia | Tense=Pres Mood=Ind |
prs |
Present subjunctive | wikipedia | Tense=Pres Mood=Sub |
supn |
Supine | wikipedia | VerbForm=Sup |
Aspect
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
hab |
Habitual | | Aspect=Hab |
imperf |
Imperfective | | Aspect=Imp |
impf |
Imperfective | | Aspect=Imp |
perf |
Perfective | | Aspect=Perf |
Person
Note: person can be a sub-category tag, e.g. with pronouns.
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
p1 |
First person | | Person=1 |
p2 |
Second person | | Person=2 |
p3 |
Third person | | Person=3 |
impers |
Impersonal | Sometimes called 'autonomous' | Person=0 |
past3p |
Past third person |
Derivations
Symbol | Gloss | Notes |
---|---|---|
caus |
Causative | |
ingr |
Ingressive | https://nn.wikipedia.org/w/index.php?title=Ingressiv |
subs |
Verbal Noun or Verbal Substantive | Short for "substantive"; a noun formed from a verb |
agnt |
Agent noun | Agent Noun |
Possession
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
px1sg |
First person singular possessive | e.g. in Turkic languages | Person[psor]=1 Number[psor]=Sing |
px2sg |
Second person singular possessive | e.g. in Turkic languages | Person[psor]=2 Number[psor]=Sing |
px3sg |
Third person singular possessive | e.g. in Turkic languages | Person[psor]=3 Number[psor]=Sing |
px1pl |
First person plural possessive | e.g. in Turkic languages | Person[psor]=1 Number[psor]=Plur |
px2pl |
Second person plural possessive | e.g. in Turkic languages | Person[psor]=2 Number[psor]=Plur |
px3pl |
Third person plural possessive | e.g. in Turkic languages | Person[psor]=3 Number[psor]=Plur |
px3sp |
Third person possessive singular or plural | e.g. in Turkic languages | Person[psor]=3 |
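The "Universal feature" columns in the tables above can be read as a simple lookup from Apertium symbols to feature strings. The following is a minimal Python sketch of that idea, not part of Apertium itself: the `UD_FEATURES` dictionary copies only a handful of rows from the Number, Case, Person and Possession tables, and the helper name `tags_to_ud` is hypothetical. Joining features with `|` in case-insensitive alphabetical order is an assumption about the desired output format.

```python
# Illustrative only: a few rows copied from the tables on this page.
UD_FEATURES = {
    "sg": "Number=Sing",
    "pl": "Number=Plur",
    "nom": "Case=Nom",
    "acc": "Case=Acc",
    "p1": "Person=1",
    "p3": "Person=3",
    "px1sg": "Person[psor]=1 Number[psor]=Sing",
    "px3pl": "Person[psor]=3 Number[psor]=Plur",
}


def tags_to_ud(tags):
    """Return a feature string for a list of Apertium symbols.

    Symbols missing from UD_FEATURES (e.g. part-of-speech tags) are skipped.
    """
    features = []
    for tag in tags:
        # One symbol may expand to several space-separated features.
        features.extend(UD_FEATURES.get(tag, "").split())
    # Deduplicate, then sort case-insensitively (assumed output convention).
    return "|".join(sorted(set(features), key=str.lower))


print(tags_to_ud(["n", "px1sg", "pl", "acc"]))
```

A fuller mapping would need every row of every table, plus a decision on what to do with symbols that have no feature listed (such as `soc` or `prp`).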
Subject marking
Used e.g. in verbs that carry both subject and object marking; otherwise, see #Person and #Number.
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
s_sg1 |
First person singular subject | | Number=Sing Person=1 |
s_sg2 |
Second person singular subject | | Number=Sing Person=2 |
s_sg3 |
Third person singular subject | | Number=Sing Person=3 |
s_pl1 |
First person plural subject | | Number=Plur Person=1 |
s_pl2 |
Second person plural subject | | Number=Plur Person=2 |
s_pl3 |
Third person plural subject | | Number=Plur Person=3 |
Object marking
Used e.g. in verbs that carry both subject and object marking.
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
o_sg1 |
First person singular object | ||
o_sg2 |
Second person singular object | ||
o_sg3 |
Third person singular object | ||
o_pl1 |
First person plural object | ||
o_pl2 |
Second person plural object | ||
o_pl3 |
Third person plural object |
Adjectives
Symbol | Gloss | Notes | Universal features |
---|---|---|---|
pst |
Positive | | Degree=Pos |
comp |
Comparative | wikipedia | Degree=Comp |
sup |
Superlative | wikipedia | Degree=Sup |
attr |
Attributive | wikipedia | |
pred |
Predicative | wikipedia |
Others
Symbol | Gloss | Notes |
---|---|---|
abbr |
Abbreviation (e.g. etc., Mr.) | Acronyms are also included (see acr) |
date |
Dates, years... | |
email |
Electronic Mail | Short for "electronic mail" |
file |
Filenames | |
mon |
Money | |
percent |
Percentage | e.g. 25%, 0.9% |
time |
Time | |
web |
Links and Emails | |
maj |
Large script in which every letter is the same height | |
min |
Small script in which every letter is the same height |
Compounds
Symbol | Gloss | Notes | Universal feature |
---|---|---|---|
cmp |
Compound Noun |
See also
Chunk tags
Tag | Description |
---|---|
<SN> |
Noun phrase / noun group (sintagma nominal) |
<SA> |
Adjective phrase / adjective group |
<SV> |
Verb phrase / verb group (sintagma verbal) |
XML tags
Note: All XML tags are explained in depth in the PDF documentation, see also the dix.dtd and dix.rng files in the GitHub repository.
XML tag | Means | Appears in XML tags / notes / examples |
---|---|---|
<dictionary> |
Mono- or bilingual dictionary | Toplevel tag for all dictionaries |
<alphabet> |
Set of characters in the language | In <dictionary> |
<sdefs> |
Symbol definitions | In <dictionary> |
<sdef> |
Symbol definition | In <sdefs>. Ex.: <sdef n="noun"/> |
<pardefs> |
Paradigm definitions | In <dictionary>. |
<pardef> |
Paradigm definition | In <pardefs>. |
<section> |
A section of the dictionary | In <dictionary>. Ex.: <section id="main" type="standard"> |
<e> |
A dictionary entry (a word) | In <section> and in <pardef>. |
<i> |
Invariant (left and right side) | In <e>. Ex.: <i>beer</i> |
<p> |
A pair | In <e>. |
<l> |
Left side (surface form) | In <p>. Ex.: <l>beer</l> |
<r> |
Right side (lexical unit) | In <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r> |
<s> |
A lexical symbol (noun, adj..) | In <r>, <l> and <i>. Ex.: <s n="noun"/> |
<a> |
Post-generator wake-up mark | In <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English) |
<b> |
Blank space | In <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ... |
<g> |
Group | For multiwords |
<ig> |
Identity group | Combination of <i> and <g> |
<j> |
Join | A + symbol in compounds. In Apertium-separable, <j> indicates end-of-word |
<prm> |
Parameter | Only in Metadix |
<sa> |
Symbol Argument ??? | Only in Metadix |
<t> |
Tag or Template | In Apertium-separable <t> is any tag; in crossdix it is a template (matches a single tag) |
<v> |
Variable | Only in crossdix - like + in regexes |
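To show how the tags in the table nest, here is a minimal Python sketch that reads a tiny dictionary with the standard-library `xml.etree.ElementTree` module. It is illustrative rather than a real Apertium tool: the dictionary is assembled from the examples in the table (`beer`, `noun`, `singular`), the alphabet content is a placeholder, and the helper names `side_to_text` and `list_entries` are hypothetical. Printing each `<s n="x"/>` as `<x>` is a display choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# A tiny monolingual dictionary built from the examples in the table above.
# The alphabet is a placeholder; real dictionaries list the language's letters.
DIX = """
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="noun"/>
    <sdef n="singular"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>beer</l><r>beer<s n="noun"/><s n="singular"/></r></p></e>
  </section>
</dictionary>
"""


def side_to_text(side):
    """Flatten an <l> or <r> element: <s n="x"/> becomes <x>, <b/> a space."""
    parts = [side.text or ""]
    for child in side:
        if child.tag == "s":
            parts.append("<" + child.get("n") + ">")
        elif child.tag == "b":
            parts.append(" ")
        # Text that follows a child element is stored in its tail.
        parts.append(child.tail or "")
    return "".join(parts)


def list_entries(dix_source):
    """Return (surface form, lexical unit) pairs for each <e><p> in a section."""
    root = ET.fromstring(dix_source.strip())
    pairs = []
    for pair in root.findall("./section/e/p"):
        pairs.append((side_to_text(pair.find("l")), side_to_text(pair.find("r"))))
    return pairs


print(list_entries(DIX))
```

The sketch only handles entries written with `<p>`, `<l>` and `<r>`; entries that use `<i>`, `<g>` or paradigm references would need extra handling.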
Transfer
<clip> tag
See the documentation (pdf), p.144 for more information.
XML attribute value | Means | Appears in attribute | Notes |
---|---|---|---|
whole |
lemma and grammatical symbols | part | |
lem |
lemma | part | |
lemh |
(inflected) head word of multiword | part | |
lemq |
following queue of multiword | part |