https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Albertonl&feedformat=atom
Apertium - User contributions [en]
2024-03-28T15:41:30Z
User contributions
MediaWiki 1.34.1
https://wiki.apertium.org/w/index.php?title=List_of_symbols&diff=70851
List of symbols
2019-12-10T14:46:36Z
<p>Albertonl: </p>
<hr />
<div>[[Liste de symboles|En français]] · [[Список символов|по-русски]]<br />
<br />
This page lists the symbols used in Apertium to denote part-of-speech and further morphological features, as well as chunk tags used for more syntactic functions and XML tags.<br />
<br />
<br />
{{TOCD}}<br />
This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most, for example, don't make a spatial distinction in articles).<br />
<br />
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]].<br />
<br />
==Part-of-speech Categories==<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal POS<br />
|-<br />
| <code>n</code> || Noun || ''see 'np' for proper noun'' || NOUN<br />
|-<br />
| <code>vblex</code> || Standard ("lexical") verb || ''see also: vbser, vbhaver, vbmod, vaux, vbdo'' || VERB<br />
|-<br />
| <code>v</code> || Standard verb || shortened form of vblex, often used in agglutinative languages || VERB<br />
|-<br />
| <code>vbmod</code> || Modal verb || || VERB<br />
|-<br />
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB (or AUX)<br />
|-<br />
| <code>vbhaver</code> || Verb "to have" || from ''haver'' (to have) || VERB (or AUX)<br />
|-<br />
| <code>vbdo</code> || Verb "to do" || covers all eleven tenses and forms of "to do"; can also be an auxiliary verb || VERB (or AUX)<br />
|-<br />
| <code>vaux</code> || Auxiliary verb || [http://en.wikipedia.org/wiki/Auxilliary_verb wikipedia] || AUX<br />
|-<br />
| <code>cop</code> || Copula || [http://en.wikipedia.org/wiki/Copula_(linguistics) wikipedia]; sometimes verb-like, sometimes not || AUX, ...<br />
|- <br />
| <code>adj</code> || Adjective || || ADJ<br />
|-<br />
| <code>adv</code> || Adverb || || ADV<br />
|-<br />
| <code>preadv</code> || Pre-adverb || || ADV<br />
|-<br />
| <code>postadv</code> || Post-adverb || || ADV<br />
|-<br />
| <code>mod</code> || Modal word || [http://dic.academic.ru/dic.nsf/lingvistic/749] || PART<br />
|-<br />
| <code>det</code> || Determiner || [http://en.wikipedia.org/wiki/Determiner_(class) wikipedia] || DET<br />
|-<br />
| <code>prn</code> || Pronoun || [http://en.wikipedia.org/wiki/Pronoun wikipedia] || PRON<br />
|-<br />
| <code>pr</code> || Preposition || [http://en.wikipedia.org/wiki/Preposition wikipedia] || ADP<br />
|-<br />
| <code>post</code> || Postposition || || ADP<br />
|-<br />
| <code>num</code> || Numeral || || NUM<br />
|-<br />
| <code>np</code> || Proper noun || From ''nom propi'' [http://en.wikipedia.org/wiki/Proper_noun wikipedia] || PROPN<br />
|-<br />
| <code>ij</code> || Interjection || [http://en.wikipedia.org/wiki/Interjection wikipedia] || INTJ<br />
|-<br />
| <code>cnjcoo</code> || Co-ordinating conjunction || [http://en.wikipedia.org/wiki/Co-ordinating_conjunction wikipedia] || CCONJ<br />
|-<br />
| <code>cnjsub</code> || Sub-ordinating conjunction || || SCONJ<br />
|-<br />
| <code>cnjadv</code> || Conjunctive adverb || [http://en.wikipedia.org/wiki/Conjunctive_adverb wikipedia] || SCONJ, ADV<br />
|-<br />
| <code>agnt</code> || Agent noun || [https://en.wikipedia.org/wiki/Agent_noun Agent Noun]<br />
|-<br />
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''- ||<br />
|}<br />
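<br />
As a rough illustration, the Universal POS column above can be read as a lookup table. The sketch below is not part of any Apertium tool: the names <code>APERTIUM_TO_UPOS</code> and <code>upos_candidates</code> are made up for this example, only some rows are copied, and rows that list more than one Universal POS keep every candidate rather than choosing one.<br />
<br />
```python
# Candidate Universal POS tags for some Apertium part-of-speech symbols,
# copied from the table above. Rows with several options keep all of them.
APERTIUM_TO_UPOS = {
    "n": ("NOUN",),
    "np": ("PROPN",),
    "vblex": ("VERB",),
    "v": ("VERB",),
    "vbmod": ("VERB",),
    "vbser": ("VERB", "AUX"),
    "vbhaver": ("VERB", "AUX"),
    "vbdo": ("VERB", "AUX"),
    "vaux": ("AUX",),
    "adj": ("ADJ",),
    "adv": ("ADV",),
    "det": ("DET",),
    "prn": ("PRON",),
    "pr": ("ADP",),
    "post": ("ADP",),
    "num": ("NUM",),
    "ij": ("INTJ",),
    "cnjcoo": ("CCONJ",),
    "cnjsub": ("SCONJ",),
    "cnjadv": ("SCONJ", "ADV"),
}


def upos_candidates(tags):
    """Return the UPOS candidates for the first tag that is a known POS symbol.

    `tags` is a sequence of Apertium symbols such as ["vblex", "pres", "p3", "sg"].
    An empty tuple means no part-of-speech symbol from the table was found.
    """
    for tag in tags:
        if tag in APERTIUM_TO_UPOS:
            return APERTIUM_TO_UPOS[tag]
    return ()
```
<br />
For example, <code>upos_candidates(["vbser", "pri"])</code> yields both <code>VERB</code> and <code>AUX</code>, leaving the final choice to whoever does the conversion.<br />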
<br />
==Part-of-speech Sub-categories==<br />
<br />
===Gender===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>f</code> || Feminine || || Gender=Fem<br />
|-<br />
| <code>m</code> || Masculine || || Gender=Masc<br />
|-<br />
| <code>nt</code> || Neuter || || Gender=Neut<br />
|-<br />
| <code>ma</code> || Masculine (animate) || Mostly in Slavic languages || Gender=Masc<br />
|-<br />
| <code>mi</code> || Masculine (inanimate) || Mostly in Slavic languages || Gender=Masc<br />
|-<br />
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc<br />
|-<br />
| <code>mn</code> || Masculine or neuter || || Gender=Masc,Neut<br />
|-<br />
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut<br />
|-<br />
| <code>mf</code> || Masculine or feminine || This is used where the gender can be either masculine or feminine || Gender=Masc,Fem<br />
|-<br />
| <code>mfn</code> || Masculine, feminine or neuter || This is used where the gender can be either masculine, feminine or neuter || Gender=Masc,Fem,Neut<br />
|-<br />
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com<br />
|-<br />
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut<br />
|-<br />
| <code>GD</code> || Gender to be determined || <br />
|- <br />
|}<br />
<br />
===Count/Mass===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>cnt</code> || Countable ||<br />
|-<br />
| <code>unc</code> || Uncountable (mass) ||<br />
|- <br />
|}<br />
<br />
===Animacy===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>aa</code> || Animate ||<br />
|-<br />
| <code>an</code> || Animate or inanimate || <br />
|-<br />
| <code>nn</code> || Inanimate || <br />
|-<br />
|}<br />
<br />
===Adjectives===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. [http://en.wikipedia.org/wiki/Synthetic_language wikipedia]<br />
|-<br />
| <code>preadj</code> || Pre-adjective || for languages where most adjectives come after the noun (e.g. French in the eo->fr bidix)<br />
|-<br />
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective goes before or after depending on the noun<br />
|-<br />
|}<br />
<br />
===Pronoun types ===<br />
<br />
{| class="wikitable" border="1"<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>pers</code> || Personal || || PronType=Prs<br />
|-<br />
| <code>tn</code> || Tonic (stressed) || from ''tónico''<br />
|-<br />
| <code>detnt</code> || Neuter determiner || POS? || DET<br />
|-<br />
| <code>predet</code> || Predeterminer || POS? || DET<br />
|-<br />
| <code>atn</code> || Atonic (unstressed) || from ''atónico''<br />
|-<br />
| <code>qnt</code> || Quantifier || || PronType=Ind<br />
|-<br />
| <code>ord</code> || Ordinal || || NumType=Ord<br />
|-<br />
| <code>obj</code> || Object ||<br />
|-<br />
| <code>subj</code> || Subject ||<br />
|-<br />
| <code>pro</code> || Proclitic ||<br />
|-<br />
| <code>enc</code> || Enclitic ||<br />
|-<br />
| <code>acr</code> || Acronym || Not a pronoun? || Abbr=Yes<br />
|-<br />
| <code>rel</code> || Relative || || PronType=Rel<br />
|-<br />
| <code>ind</code> || Indefinite || || PronType=Ind<br />
|-<br />
| <code>itg</code> || Interrogative || || PronType=Int<br />
|-<br />
| <code>dem</code> || Demonstrative || || PronType=Dem<br />
|-<br />
| <code>def</code> || Definite ||<br />
|-<br />
| <code>pos</code> || Possessive || || Poss=Yes<br />
|-<br />
| <code>ref</code> || Reflexive || || Reflex=Yes<br />
|-<br />
| <code>prx</code> || Proximate ||<br />
|-<br />
| <code>dst</code> || Distal ||<br />
|}<br />
<br />
=== Transitivity ===<br />
<br />
Used for verbs.<br />
<br />
{| class="wikitable" border="1"<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic)<br />
|-<br />
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic)<br />
|-<br />
| <code>TD</code> || Transitivity to be determined || if the sub-category is [currently] unknown<br />
|}<br />
<br />
===Separable verbs===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>sep</code> || Separable verb || [https://en.wikipedia.org/wiki/Separable_verb wikipedia], [https://deutsch.lingolia.com/en/grammar/verbs/separable-verbs lingolia], [https://aclweb.org/anthology/P98-1078.pdf PDF]<br />
|-<br />
| <code>fs</code> || Separable verb in subordinate clause ||<br />
|-<br />
| <code>fm</code> || Separable verb in main clause ||<br />
|-<br />
|}<br />
<br />
=== Punctuation ===<br />
{| class="wikitable" border="1"<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT<br />
|-<br />
| <code>cm</code> || Comma punctuation || , || PUNCT<br />
|-<br />
| <code>lquot</code> || Left quote || « || PUNCT<br />
|-<br />
| <code>rquot</code> || Right quote || » || PUNCT<br />
|-<br />
| <code>lpar</code> || Left parenthesis || ( || PUNCT<br />
|-<br />
| <code>rpar</code> || Right parenthesis || ) || PUNCT<br />
|- <br />
| <code>guio</code> || Hyphen || - || PUNCT<br />
|- <br />
| <code>apos</code> || Apostrophe || ' or ' || PUNCT<br />
|- <br />
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT<br />
|-<br />
|}<br />
<br />
== Inflectional morphology ==<br />
<br />
===Number===<br />
Note: number can be a sub-category tag too, e.g. with pronouns.<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>sg</code> || Singular || || Number=Sing<br />
|-<br />
| <code>pl</code> || Plural || || Number=Plur<br />
|-<br />
| <code>sp</code> || Singular or plural || || Number=Sing,Plur<br />
|-<br />
| <code>du</code> || Dual || || Number=Dual<br />
|-<br />
| <code>ct</code> || Count || see mk-bg || Number=Count<br />
|-<br />
| <code>coll</code> || Collective || || Number=Coll<br />
|-<br />
| <code>ND</code> || Number to be determined ||<br />
|-<br />
|}<br />
<br />
<br />
===Case===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>nom</code> || Nominative || || Case=Nom<br />
|-<br />
| <code>acc</code> || Accusative || || Case=Acc<br />
|-<br />
| <code>dat</code> || Dative || || Case=Dat<br />
|-<br />
| <code>gen</code> || Genitive || || Case=Gen<br />
|-<br />
| <code>dg</code> || Dative and Genitive || in [[ro-es]], discouraged in new developments || Case=Dat,Gen<br />
|-<br />
| <code>voc</code> || Vocative || || Case=Voc<br />
|-<br />
| <code>abl</code> || Ablative || [http://en.wikipedia.org/wiki/Ablative wikipedia] || Case=Abl<br />
|-<br />
| <code>ins</code> || Instrumental or Instructive || [http://en.wikipedia.org/wiki/Instrumental_case wikipedia] || Case=Ins<br />
|-<br />
| <code>loc</code> || Locative || [http://en.wikipedia.org/wiki/Locative wikipedia] || Case=Loc<br />
|-<br />
| <code>prp</code> || Prepositional || [http://en.wikipedia.org/wiki/Prepositional wikipedia] <br />
|-<br />
| <code>tra</code> || Translative || || Case=Tra<br />
|-<br />
| <code>ill</code> || Illative || || Case=Ill<br />
|-<br />
| <code>ine</code> || Inessive || || Case=Ine<br />
|-<br />
| <code>ade</code> || Adessive || || Case=Ade<br />
|-<br />
| <code>all</code> || Allative || || Case=All<br />
|-<br />
| <code>abe</code> || Abessive || || Case=Abe<br />
|-<br />
| <code>ess</code> || Essive || || Case=Ess<br />
|-<br />
| <code>par</code> || Partitive || || Case=Par<br />
|-<br />
| <code>dis</code> || Distributive || || Case=Dis<br />
|-<br />
| <code>com</code> || Comitative || || Case=Com<br />
|-<br />
| <code>soc</code> || Sociative || || <br />
|-<br />
| <code>prl</code> || Prolative || || Case=Pro<br />
|-<br />
| <code>ses</code> || Superessive || [[Hungarian]] || Case=Sup<br />
|-<br />
| <code>sub</code> || Sublative || [[Hungarian]] || Case=Sub<br />
|-<br />
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del<br />
|-<br />
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... ||<br />
|}<br />
<br />
===Voice===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>actv</code> || Active voice || || Voice=Act<br />
|-<br />
| <code>pass</code> || Passive voice || More commonly used in Turkic. || Voice=Pass<br />
|-<br />
| <code>pasv</code> || Passive voice || More commonly used in Germanic. || Voice=Pass<br />
|-<br />
| <code>midv</code> || Middle voice || || Voice=Mid<br />
|-<br />
| <code>nactv</code> || Non-active voice || See Albanian. || <br />
|-<br />
| <code>caus</code> || Causative voice || see also [[#Derivations]] || Voice=Cau<br />
|-<br />
|}<br />
<br />
===Tense and mode===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>pres</code> || Present || || Tense=Pres<br />
|-<br />
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past<br />
|-<br />
| <code>past</code> || Past || || Tense=Past<br />
|-<br />
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp<br />
|-<br />
| <code>inf</code> || Infinitive || [https://en.wikipedia.org/wiki/Infinitive wikipedia] || VerbForm=Inf<br />
|-<br />
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf<br />
|-<br />
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || Tense=Past<br />
|-<br />
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part<br />
|-<br />
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) ||<br />
|-<br />
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) ||<br />
|-<br />
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part<br />
|-<br />
| <code>ger</code> || Gerund || [http://en.wikipedia.org/wiki/Gerund wikipedia] || VerbForm=Ger<br />
|-<br />
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup<br />
|-<br />
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind<br />
|-<br />
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind<br />
|-<br />
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind<br />
|-<br />
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub<br />
|-<br />
| <code>cni</code> || Conditional || Many pairs will probably use cnd or cond instead || Mood=Cnd<br />
|-<br />
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp<br />
|-<br />
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp<br />
|-<br />
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub<br />
|-<br />
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub<br />
|-<br />
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def<br />
|-<br />
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos<br />
|-<br />
| <code>itg</code> || Interrogative || ||<br />
|-<br />
| <code>neg</code> || Negative || || Polarity=Neg<br />
|-<br />
| <code>lp</code> || L-participle || <br />
|-<br />
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) ||<br />
|- <br />
|}<br />
<br />
===Person===<br />
Note: person can be a sub-category tag, e.g. with pronouns.<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>p1</code> || First person || || Person=1<br />
|-<br />
| <code>p2</code> || Second person || || Person=2<br />
|-<br />
| <code>p3</code> || Third person || || Person=3<br />
|-<br />
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0<br />
|-<br />
|}<br />
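<br />
The "Universal feature" columns in the tables above can likewise be treated as a lookup from Apertium symbols to feature–value pairs. The following is only a sketch under stated assumptions: <code>UD_FEATURES</code> and <code>ud_feature_string</code> are names invented here, only a handful of rows are copied from the tables, and the output follows the usual Universal Dependencies convention of sorting features by name, joining them with <code>|</code>, and writing <code>_</code> when there are none.<br />
<br />
```python
# A few rows copied from the gender, number, case, tense and person tables.
# Symbols such as "pri" contribute more than one feature.
UD_FEATURES = {
    "f": {"Gender": "Fem"},
    "m": {"Gender": "Masc"},
    "nt": {"Gender": "Neut"},
    "sg": {"Number": "Sing"},
    "pl": {"Number": "Plur"},
    "du": {"Number": "Dual"},
    "nom": {"Case": "Nom"},
    "acc": {"Case": "Acc"},
    "dat": {"Case": "Dat"},
    "gen": {"Case": "Gen"},
    "pri": {"Tense": "Pres", "Mood": "Ind"},
    "pii": {"Tense": "Past", "Mood": "Ind"},
    "fti": {"Tense": "Fut", "Mood": "Ind"},
    "p1": {"Person": "1"},
    "p2": {"Person": "2"},
    "p3": {"Person": "3"},
}


def ud_feature_string(tags):
    """Build a feature string from Apertium symbols.

    Symbols missing from UD_FEATURES (e.g. part-of-speech tags) are skipped.
    """
    feats = {}
    for tag in tags:
        feats.update(UD_FEATURES.get(tag, {}))
    # Sort by feature name, case-insensitively, then join with "|".
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered) or "_"
```
<br />
With these rows, <code>ud_feature_string(["n", "f", "sg", "nom"])</code> gives <code>Case=Nom|Gender=Fem|Number=Sing</code>.<br />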
<br />
===Derivations===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>caus</code> || Causative ||<br />
|-<br />
| <code>ingr</code> || Ingressive || https://nn.wikipedia.org/w/index.php?title=Ingressiv<br />
|-<br />
| <code>subs</code> || Verbal Noun or Verbal Substantive || Shortened form of ''substantive''. Noun formed from a verb<br />
|}<br />
<br />
===Possession===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>px1sg</code> || First person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Sing<br />
|-<br />
| <code>px2sg</code> || Second person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Sing<br />
|-<br />
| <code>px3sg</code> || Third person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Sing<br />
|-<br />
| <code>px1pl</code> || First person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Plur<br />
|-<br />
| <code>px2pl</code> || Second person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Plur<br />
|-<br />
| <code>px3pl</code> || Third person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Plur<br />
|-<br />
| <code>px3sp</code> || Third person possessive singular or plural || e.g. in [[Turkic languages]] || Person[psor]=3<br />
|-<br />
|}<br />
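<br />
The possessive symbols above follow a regular pattern: <code>px</code>, then the possessor's person, then its number. A minimal sketch of decoding that pattern is shown below; the function name <code>possessor_features</code> is hypothetical, and the code accepts only the seven symbols listed in the table.<br />
<br />
```python
import re

# "px" + person (1-3) + number (sg, pl, or sp for "singular or plural").
_PX_PATTERN = re.compile(r"^px([123])(sg|pl|sp)$")
_NUMBER_VALUES = {"sg": "Sing", "pl": "Plur"}


def possessor_features(tag):
    """Decode a possessive symbol into possessor features.

    Returns a dict of features, or None if `tag` is not one of the
    symbols listed in the table (only px3sp uses the "sp" suffix there).
    """
    match = _PX_PATTERN.match(tag)
    if match is None:
        return None
    person, number = match.groups()
    if number == "sp" and person != "3":
        return None
    features = {"Person[psor]": person}
    # "sp" leaves the possessor's number unspecified, as in the table.
    if number in _NUMBER_VALUES:
        features["Number[psor]"] = _NUMBER_VALUES[number]
    return features
```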
<br />
===Object marking===<br />
<br />
e.g. in verbs with both<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>o_sg1</code> || First person singular object || <br />
|-<br />
| <code>o_sg2</code> || Second person singular object || <br />
|-<br />
| <code>o_sg3</code> || Third person singular object || <br />
|-<br />
| <code>o_pl1</code> || First person plural object || <br />
|-<br />
| <code>o_pl2</code> || Second person plural object || <br />
|-<br />
| <code>o_pl3</code> || Third person plural object || <br />
|-<br />
|}<br />
<br />
===Proper nouns===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names<br />
|-<br />
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc'', although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia]<br />
|-<br />
| <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia]<br />
|-<br />
| <code>cog</code> || Cognomen || In normal use, surnames<br />
|-<br />
| <code>org</code> || Organisation || <br />
|-<br />
| <code>al</code> || Altres || Other, misc.<br />
|}<br />
<br />
===Adjectives===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>pst</code> || Positive || || Degree=Pos<br />
|-<br />
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp<br />
|-<br />
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup<br />
|-<br />
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]<br />
|-<br />
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]<br />
|}<br />
<br />
<br />
===Others===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>abbr</code> || Abbreviation (e.g. ''etc., Mr.'') || Acronyms are also included (see <code>acr</code>)<br />
|-<br />
| <code>date</code> || Dates, years... ||<br />
|-<br />
| <code>percent</code> || Percentage || e.g. 25%, 0.9%<br />
|-<br />
| <code>web</code> || Links and Emails ||<br />
|-<br />
| <code>file</code> || Filenames ||<br />
|-<br />
|}<br />
<br />
===See also===<br />
* [[Turkic lexicon|Guidelines for tag assignment (etc.) in Turkic]]<br />
* [[Tagging guidelines for Portuguese]]<br />
<br />
==Chunk tags==<br />
<br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|SN}} || Noun phrase / noun group (''sintagma nominal'')<br />
|- <br />
| {{tag|SA}} || Adjective phrase / adjective group <br />
|-<br />
| {{tag|SV}} || Verb phrase / verb group (''sintagma verbal'')<br />
|-<br />
|}<br />
<br />
==XML tags==<br />
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository.<br />
<br />
{|class=wikitable<br />
! XML tag !! Means !! Appears in XML tags / notes / examples<br />
|-<br />
| <code>&lt;dictionary></code> || Mono- or bilingual dictionary || In files apertium-eo-en.en.dix, apertium-eo-en.eo-en.dix, apertium-eo-en.post-en.dix, apertium-eo-en.post-eo.dix<br />
|-<br />
| <code>&lt;alphabet></code> || Set of characters in the language || In <code>&lt;dictionary></code><br />
|-<br />
| <code>&lt;sdefs></code> || Symbol definitions || In <code>&lt;dictionary></code><br />
|-<br />
| <code>&lt;sdef></code> || Symbol definition || In <code>&lt;sdefs></code>. Ex: <code>&lt;sdef n="noun"/></code><br />
|-<br />
| <code>&lt;pardefs></code> || Paradigm definitions || In <code>&lt;dictionary></code>. <br />
|-<br />
| <code>&lt;pardef></code> || Paradigm definition || In <code>&lt;pardefs></code>. <br />
|-<br />
| <code>&lt;section></code> || A section of the dictionary || In <code>&lt;dictionary></code>. Ex: <code>&lt;section id="main" type="standard"></code><br />
|-<br />
| <code>&lt;e></code> || A dictionary entry (a word) || In <code>&lt;section></code> and in <code>&lt;pardef></code>.<br />
|-<br />
| <code>&lt;i></code> || Invariant (left and right side) || In <code>&lt;e></code>. Ex.: <code>&lt;i>beer&lt;/i></code><br />
|-<br />
| <code>&lt;p></code> || A pair || In <code>&lt;e></code>. <br />
|-<br />
| <code>&lt;l></code> || Left side (surface form) || In <code>&lt;p></code>. Ex.: <code>&lt;l>beer&lt;/l></code><br />
|-<br />
| <code>&lt;r></code> || Right side (lexical unit) || In <code>&lt;p></code>. Ex.: <code>&lt;r>beer&lt;s n="noun"/>&lt;s n="singular"/>&lt;/r></code><br />
|-<br />
| <code>&lt;s></code> || A lexical symbol (noun, adj..) || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;s n="noun"/></code><br />
|-<br />
| <code>&lt;a></code> || Post-generator wake-up mark || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>&lt;a/>a&lt;s ...</code> (for the a/an rule in English)<br />
|-<br />
| <code>&lt;b></code> || Blank space || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>you're&lt;b/>welcome&lt;s ...</code> <br />
|-<br />
|}<br />
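<br />
To show how the tags in the table nest, here is a small sketch that reads a made-up dictionary fragment with Python's standard <code>xml.etree.ElementTree</code> module. The fragment is assembled from the examples in the table and is not taken from a real language package; <code>DIX</code> and <code>read_entries</code> are names invented for this illustration.<br />
<br />
```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary fragment built from the examples in the table.
DIX = """<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="noun"/>
    <sdef n="singular"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>beer</l><r>beer<s n="noun"/><s n="singular"/></r></p></e>
  </section>
</dictionary>"""


def read_entries(dix_text):
    """Return (surface form, lemma, symbols) for every <p> pair in a section."""
    root = ET.fromstring(dix_text)
    entries = []
    for pair in root.iterfind("./section/e/p"):
        left = pair.find("l")
        right = pair.find("r")
        surface = left.text or ""   # text of <l>: the surface form
        lemma = right.text or ""    # text of <r> before the first <s/>
        symbols = [s.get("n") for s in right.findall("s")]
        entries.append((surface, lemma, symbols))
    return entries


print(read_entries(DIX))  # [('beer', 'beer', ['noun', 'singular'])]
```
<br />
This sketch only handles plain <code>&lt;p></code> pairs. Entries that contain <code>&lt;b/></code> or <code>&lt;a/></code> inside <code>&lt;l></code> or <code>&lt;r></code> would need the text of the whole element (for instance via <code>itertext()</code>), and <code>&lt;i></code> entries would need separate handling.<br />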
<br />
TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC)<br />
<br />
Other tags:<br />
<pre><br />
<j/> (in stream format #) is to mark multiwords<br />
<br />
<t/> and <v/> are only in crossdix<br />
t = template, v = variable<br />
t matches any single tag, v is like * in regexes (0 or more)<br />
<br />
<sa/> and <prm/> are only used in metadixes.<br />
'sa' lets you add an optional extra tag, prm is an extra string for the paradigm<br />
</pre><br />
<br />
=== Transfer ===<br />
<br />
==== <clip> tag ====<br />
<br />
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information.<br />
<br />
{|class=wikitable<br />
! XML attribute value !! Means !! Appears in attribute !! Notes<br />
|-<br />
| <code>whole</code> || lemma and grammatical symbols || part <br />
|-<br />
| <code>lem</code> || lemma || part<br />
|-<br />
| <code>lemh</code> || (inflected) head word of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part<br />
|-<br />
| <code>lemq</code> || following queue of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part<br />
|-<br />
|}<br />
<br />
==See also==<br />
* [[Syntax tags]]<br />
* [[Apertium stream format]]<br />
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]]<br />
<br />
[[Category:Documentation in English]]</div>
Albertonl
https://wiki.apertium.org/w/index.php?title=List_of_symbols&diff=70815
List of symbols
2019-12-09T23:07:57Z
<p>Albertonl: </p>
<hr />
<div>[[Liste de symboles|En français]] · [[Список символов|по-русски]]<br />
<br />
This page lists the symbols used in Apertium to denote part-of-speech and further morphological features, as well as chunk tags used for more syntactic functions and XML tags.<br />
<br />
<br />
{{TOCD}}<br />
This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most, for example, don't make a spatial distinction in articles).<br />
<br />
If you were wondering what the symbols #, /, @, +, ~ or * mean, read [[Apertium stream format]].<br />
<br />
==Part-of-speech Categories==<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal POS<br />
|-<br />
| <code>n</code> || Noun || ''see 'np' for proper noun'' || NOUN<br />
|-<br />
| <code>vblex</code> || Standard ("lexical") verb || ''see also: vbser, vbhaver, vbmod, vaux'' || VERB<br />
|-<br />
| <code>v</code> || Standard verb || shortened form of vblex, often used in agglutinative languages || VERB<br />
|-<br />
| <code>vbmod</code> || Modal verb || || VERB<br />
|-<br />
| <code>vbser</code> || Verb "to be" || from ''ser'' (to be) || VERB (or AUX)<br />
|-<br />
| <code>vbhaver</code> || Verb "to have" || from ''haver'' (to have) || VERB<br />
|-<br />
| <code>vaux</code> || Auxiliary verb || [http://en.wikipedia.org/wiki/Auxilliary_verb wikipedia] || AUX<br />
|-<br />
| <code>cop</code> || Copula || [http://en.wikipedia.org/wiki/Copula_(linguistics) wikipedia]; sometimes verb-like, sometimes not || AUX, ...<br />
|- <br />
| <code>adj</code> || Adjective || || ADJ<br />
|-<br />
| <code>adv</code> || Adverb || || ADV<br />
|-<br />
| <code>preadv</code> || Pre-adverb || || ADV<br />
|-<br />
| <code>postadv</code> || Post-adverb || || ADV<br />
|-<br />
| <code>mod</code> || Modal word || [http://dic.academic.ru/dic.nsf/lingvistic/749] || PART<br />
|-<br />
| <code>det</code> || Determiner || [http://en.wikipedia.org/wiki/Determiner_(class) wikipedia] || DET<br />
|-<br />
| <code>prn</code> || Pronoun || [http://en.wikipedia.org/wiki/Pronoun wikipedia] || PRON<br />
|-<br />
| <code>pr</code> || Preposition || [http://en.wikipedia.org/wiki/Preposition wikipedia] || ADP<br />
|-<br />
| <code>post</code> || Postposition || || ADP<br />
|-<br />
| <code>num</code> || Numeral || || NUM<br />
|-<br />
| <code>np</code> || Proper noun || From ''nom propi'' [http://en.wikipedia.org/wiki/Proper_noun wikipedia] || PROPN<br />
|-<br />
| <code>ij</code> || Interjection || [http://en.wikipedia.org/wiki/Interjection wikipedia] || INTJ<br />
|-<br />
| <code>cnjcoo</code> || Co-ordinating conjunction || [http://en.wikipedia.org/wiki/Co-ordinating_conjunction wikipedia] || CCONJ<br />
|-<br />
| <code>cnjsub</code> || Sub-ordinating conjunction || || SCONJ<br />
|-<br />
| <code>cnjadv</code> || Conjunctive adverb || [http://en.wikipedia.org/wiki/Conjunctive_adverb wikipedia] || SCONJ, ADV<br />
|-<br />
| <code>sent</code> || Sentence-ending punctuation || e.g. full stop, question mark || PUNCT<br />
|-<br />
| <code>cm</code> || Comma punctuation || , || PUNCT<br />
|-<br />
| <code>lquot</code> || Left quote || « || PUNCT<br />
|-<br />
| <code>rquot</code> || Right quote || » || PUNCT<br />
|-<br />
| <code>lpar</code> || Left parenthesis || ( || PUNCT<br />
|-<br />
| <code>rpar</code> || Right parenthesis || ) || PUNCT<br />
|- <br />
| <code>lquest</code> || Left question/exclamation mark || ¿¡ (''used in Spanish'') || PUNCT<br />
|-<br />
|}<br />
<br />
===Separable verbs===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>sep</code> || Separable verb || [https://en.wikipedia.org/wiki/Separable_verb wikipedia], [https://deutsch.lingolia.com/en/grammar/verbs/separable-verbs lingolia], [https://aclweb.org/anthology/P98-1078.pdf PDF]<br />
|-<br />
| <code>fs</code> || Separable verb in subordinate clause ||<br />
|-<br />
| <code>fm</code> || Separable verb in main clause ||<br />
|-<br />
|}<br />
<br />
==Part-of-speech Sub-categories==<br />
<br />
===Gender===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>f</code> || Feminine || || Gender=Fem<br />
|-<br />
| <code>m</code> || Masculine || || Gender=Masc<br />
|-<br />
| <code>nt</code> || Neuter || || Gender=Neut<br />
|-<br />
| <code>ma</code> || Masculine (animate) || Mostly in Slavic languages || Gender=Masc<br />
|-<br />
| <code>mi</code> || Masculine (inanimate) || Mostly in Slavic languages || Gender=Masc<br />
|-<br />
| <code>mp</code> || Masculine (personal) || in Polish || Gender=Masc<br />
|-<br />
| <code>mn</code> || Masculine or neuter || || Gender=Masc,Neut<br />
|-<br />
| <code>fn</code> || Feminine or neuter || || Gender=Fem,Neut<br />
|-<br />
| <code>mf</code> || Masculine or feminine || This is used where the gender can be either masculine or feminine || Gender=Masc,Fem<br />
|-<br />
| <code>mfn</code> || Masculine, feminine or neuter || This is used where the gender can be either masculine, feminine or neuter || Gender=Masc,Fem,Neut<br />
|-<br />
| <code>ut</code> || Common || From ''utrum'', found in Scandinavian languages. || Gender=Com<br />
|-<br />
| <code>un</code> || Common or neuter || As above, only common or neuter || Gender=Com,Neut<br />
|-<br />
| <code>GD</code> || Gender to be determined || <br />
|- <br />
|}<br />
<br />
===Count/Mass===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>cnt</code> || Countable ||<br />
|-<br />
| <code>unc</code> || Uncountable (mass) ||<br />
|- <br />
|}<br />
<br />
===Animacy===<br />
<br />
These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>aa</code> || Animate ||<br />
|-<br />
| <code>an</code> || Animate or inanimate || <br />
|-<br />
| <code>nn</code> || Inanimate || <br />
|-<br />
|}<br />
<br />
===Adjectives===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>sint</code> || Synthetic || "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not. [http://en.wikipedia.org/wiki/Synthetic_language wikipedia]<br />
|-<br />
| <code>preadj</code> || Pre-adjective || for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix)<br />
|-<br />
| <code>preadj_nh</code> || Pre-adjective if not human || the adjective goes before or after the noun, depending on the noun<br />
|-<br />
|}<br />
<br />
===Pronoun types ===<br />
<br />
{| class="wikitable" border="1"<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>pers</code> || Personal || || PronType=Prs<br />
|-<br />
| <code>tn</code> || Tónico ||<br />
|-<br />
| <code>detnt</code> || Neuter determiner || POS? || DET<br />
|-<br />
| <code>predet</code> || Pre determiner || POS? || DET<br />
|-<br />
| <code>atn</code> || Atónico ||<br />
|-<br />
| <code>qnt</code> || Quantifier || || PronType=Ind<br />
|-<br />
| <code>ord</code> || Ordinal || || NumType=Ord<br />
|-<br />
| <code>obj</code> || Object ||<br />
|-<br />
| <code>subj</code> || Subject ||<br />
|-<br />
| <code>pro</code> || Proclitic ||<br />
|-<br />
| <code>enc</code> || Enclitic ||<br />
|-<br />
| <code>acr</code> || Acronym || Not a pronoun type? || Abbr=Yes<br />
|-<br />
| <code>rel</code> || Relative || || PronType=Rel<br />
|-<br />
| <code>ind</code> || Indefinite || || PronType=Ind<br />
|-<br />
| <code>itg</code> || Interrogative || || PronType=Int<br />
|-<br />
| <code>dem</code> || Demonstrative || || PronType=Dem<br />
|-<br />
| <code>def</code> || Definite ||<br />
|-<br />
| <code>pos</code> || Possessive || || Poss=Yes<br />
|-<br />
| <code>ref</code> || Reflexive || || Reflex=Yes<br />
|-<br />
| <code>prx</code> || Proximate ||<br />
|-<br />
| <code>dst</code> || Distal ||<br />
|}<br />
<br />
=== Transitivity ===<br />
<br />
Used for verbs.<br />
<br />
{| class="wikitable" border="1"<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>tv</code> || Transitive || takes direct object in accusative case (used in Turkic)<br />
|-<br />
| <code>iv</code> || Intransitive || does not take direct object in accusative case (used in Turkic)<br />
|-<br />
| <code>TD</code> || Transitivity to be determined || if the sub-category is [currently] unknown<br />
|}<br />
<br />
== Inflectional morphology ==<br />
<br />
===Number===<br />
Note: number can be a sub-category tag too, e.g. with pronouns.<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>sg</code> || Singular || || Number=Sing<br />
|-<br />
| <code>pl</code> || Plural || || Number=Plur<br />
|-<br />
| <code>sp</code> || Singular or plural || || Number=Sing,Plur<br />
|-<br />
| <code>du</code> || Dual || || Number=Dual<br />
|-<br />
| <code>ct</code> || Count || see mk-bg || Number=Count<br />
|-<br />
| <code>coll</code> || Collective || || Number=Coll<br />
|-<br />
| <code>ND</code> || Number to be determined ||<br />
|-<br />
|}<br />
<br />
<br />
===Case===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>nom</code> || Nominative || || Case=Nom<br />
|-<br />
| <code>acc</code> || Accusative || || Case=Acc<br />
|-<br />
| <code>dat</code> || Dative || || Case=Dat<br />
|-<br />
| <code>gen</code> || Genitive || || Case=Gen<br />
|-<br />
| <code>dg</code> || Dative and Genitive || in [[ro-es]], discouraged in new developments || Case=Dat,Gen<br />
|-<br />
| <code>voc</code> || Vocative || || Case=Voc<br />
|-<br />
| <code>abl</code> || Ablative || [http://en.wikipedia.org/wiki/Ablative wikipedia] || Case=Abl<br />
|-<br />
| <code>ins</code> || Instrumental or Instructive || [http://en.wikipedia.org/wiki/Instrumental_case wikipedia] || Case=Ins<br />
|-<br />
| <code>loc</code> || Locative || [http://en.wikipedia.org/wiki/Locative wikipedia] || Case=Loc<br />
|-<br />
| <code>prp</code> || Prepositional || [http://en.wikipedia.org/wiki/Prepositional wikipedia] <br />
|-<br />
| <code>tra</code> || Translative || || Case=Tra<br />
|-<br />
| <code>ill</code> || Illative || || Case=Ill<br />
|-<br />
| <code>ine</code> || Inessive || || Case=Ine<br />
|-<br />
| <code>ade</code> || Adessive || || Case=Ade<br />
|-<br />
| <code>all</code> || Allative || || Case=All<br />
|-<br />
| <code>abe</code> || Abessive || || Case=Abe<br />
|-<br />
| <code>ess</code> || Essive || || Case=Ess<br />
|-<br />
| <code>par</code> || Partitive || || Case=Par<br />
|-<br />
| <code>dis</code> || Distributive || || Case=Dis<br />
|-<br />
| <code>com</code> || Comitative || || Case=Com<br />
|-<br />
| <code>soc</code> || Sociative || || <br />
|-<br />
| <code>prl</code> || Prolative || || Case=Pro<br />
|-<br />
| <code>ses</code> || Superessive || [[Hungarian]] || Case=Sup<br />
|-<br />
| <code>sub</code> || Sublative || [[Hungarian]] || Case=Sub<br />
|-<br />
| <code>dela</code> || Delative || [[Hungarian]] || Case=Del<br />
|-<br />
| <code>term</code> || Terminative || [[Hungarian]], Estonian, ... ||<br />
|}<br />
<br />
===Voice===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>actv</code> || Active voice || || Voice=Act<br />
|-<br />
| <code>pass</code> || Passive voice || is more used in Turkic. || Voice=Pass<br />
|-<br />
| <code>pasv</code> || Passive voice || is more used in Germanic. || Voice=Pass<br />
|-<br />
| <code>midv</code> || Middle voice || || Voice=Mid<br />
|-<br />
| <code>nactv</code> || Non-active voice || See Albanian. || <br />
|-<br />
| <code>caus</code> || Causative voice || see also [[#Derivations]] || Voice=Cau<br />
|-<br />
|}<br />
<br />
===Tense and mode===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>pres</code> || Present || || Tense=Pres<br />
|-<br />
| <code>pret</code> || Preterite || [https://en.wikipedia.org/wiki/Preterite Preterite] || Tense=Past<br />
|-<br />
| <code>past</code> || Past || || Tense=Past<br />
|-<br />
| <code>imp</code> || Imperative || [http://www.englishlanguageguide.com/grammar/imperative.asp englishlanguageguide] || Mood=Imp<br />
|-<br />
| <code>inf</code> || Infinitive || [https://en.wikipedia.org/wiki/Infinitive wikipedia] || VerbForm=Inf<br />
|-<br />
| <code>ito</code> || Infinitive with 'to' || [[German]] || VerbForm=Inf<br />
|-<br />
| <code>aor</code> || Aorist || [https://en.wikipedia.org/wiki/Aorist wikipedia] A tense in Turkic languages. || Tense=Past<br />
|-<br />
| <code>pp</code> || Past participle || [http://en.wikipedia.org/wiki/Participle wikipedia] || VerbForm=Part<br />
|-<br />
| <code>pp2</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for future active participles, ''ont'' (seems quite odd) ||<br />
|-<br />
| <code>pp3</code> || Past participle (???) || It's at least used in the Esperanto dictionaries for past active participles, ''int'' (seems quite odd) ||<br />
|-<br />
| <code>pprs</code> || Present participle || Also appears as <code>ppres</code> (deprecated) || VerbForm=Part<br />
|-<br />
| <code>ger</code> || Gerund || [http://en.wikipedia.org/wiki/Gerund wikipedia] || VerbForm=Ger<br />
|-<br />
| <code>supn</code> || Supine || [http://en.wikipedia.org/wiki/Supine wikipedia] || VerbForm=Sup<br />
|-<br />
| <code>pri</code> || Present indicative || ''see also: pres''. [http://en.wikipedia.org/wiki/Present_indicative wikipedia] || Tense=Pres Mood=Ind<br />
|-<br />
| <code>pii</code> || Imperfect || from ''Pretérito imperfecto de indicativo'' [https://en.wikipedia.org/wiki/Imperfect wikipedia] || Tense=Past Mood=Ind<br />
|-<br />
| <code>fti</code> || Future indicative || || Tense=Fut Mood=Ind<br />
|-<br />
| <code>fts</code> || Future subjunctive || || Tense=Fut Mood=Sub<br />
|-<br />
| <code>cni</code> || Conditional || Many pairs will probably use cnd or cond instead || Mood=Cnd<br />
|-<br />
| <code>plu</code> || Pluperfect || In <code>cy-en</code> || Tense=Pqp<br />
|-<br />
| <code>pmp</code> || Pluperfect || In <code>es-gl</code> (from ''Pluscuamperfecto'') || Tense=Pqp<br />
|-<br />
| <code>prs</code> || Present subjunctive || [http://en.wikipedia.org/wiki/Present_subjunctive wikipedia] || Tense=Pres Mood=Sub<br />
|-<br />
| <code>pis</code> || Imperfect subjunctive || || Tense=Past Mood=Sub<br />
|-<br />
| <code>ifi</code> || Past definite || from ''Pretérito perfecto o indefinido'' || Tense=Past Definite=Def<br />
|-<br />
| <code>aff</code> || Affirmative || [https://en.wikipedia.org/wiki/Affirmation_and_negation wikipedia] || Polarity=Pos<br />
|-<br />
| <code>itg</code> || Interrogative || ||<br />
|-<br />
| <code>neg</code> || Negative || || Polarity=Neg<br />
|-<br />
| <code>lp</code> || L-participle || <br />
|-<br />
| <code>deb</code> || Debitive mode || Exclusive to Latvian ([https://en.wikipedia.org/wiki/Debitive wikipedia]) ||<br />
|- <br />
|}<br />
<br />
===Person===<br />
Note: person can be a sub-category tag, e.g. with pronouns.<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>p1</code> || First person || || Person=1<br />
|-<br />
| <code>p2</code> || Second person || || Person=2<br />
|-<br />
| <code>p3</code> || Third person || || Person=3<br />
|-<br />
| <code>impers</code> || Impersonal || Sometimes called 'autonomous' || Person=0<br />
|-<br />
|}<br />
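<br />
The "Universal feature" columns in the tables above can be read as a lookup table. The following is a minimal sketch in Java, not part of any Apertium tool: the class name <code>TagMapper</code>, its method names and the sample analysis <code>casa&lt;n>&lt;f>&lt;sg></code> are hypothetical, and only a handful of the tags listed above are included.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps Apertium tags to the universal features listed in the tables.
class TagMapper {
    // Small excerpt of the tables; a real mapping would cover every tag.
    private static final Map<String, String> FEATURES = Map.of(
        "f", "Gender=Fem",
        "m", "Gender=Masc",
        "nt", "Gender=Neut",
        "sg", "Number=Sing",
        "pl", "Number=Plur",
        "nom", "Case=Nom",
        "acc", "Case=Acc",
        "p1", "Person=1",
        "p2", "Person=2",
        "p3", "Person=3");

    // A tag is whatever sits between angle brackets, e.g. <sg>.
    private static final Pattern TAG = Pattern.compile("<([^<>]+)>");

    // Returns the tag names of an analysis such as casa<n><f><sg>, in order.
    static List<String> tags(String analysis) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(analysis);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Joins the known features with '|', sorted alphabetically; unknown tags are skipped.
    static String universalFeatures(String analysis) {
        TreeSet<String> features = new TreeSet<>();
        for (String tag : tags(analysis)) {
            String feature = FEATURES.get(tag);
            if (feature != null) {
                features.add(feature);
            }
        }
        return String.join("|", features);
    }

    public static void main(String[] args) {
        // prints Gender=Fem|Number=Sing
        System.out.println(universalFeatures("casa<n><f><sg>"));
    }
}
```
<br />
Tags without an entry in the map (here <code>n</code>, which is a part-of-speech category rather than a feature) are skipped, so the sketch also tolerates tags that have no universal feature.<br />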
<br />
===Derivations===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>caus</code> || Causative ||<br />
|-<br />
| <code>ingr</code> || Ingressive || https://nn.wikipedia.org/w/index.php?title=Ingressiv<br />
|}<br />
<br />
===Possession===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal feature<br />
|-<br />
| <code>px1sg</code> || First person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Sing<br />
|-<br />
| <code>px2sg</code> || Second person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Sing<br />
|-<br />
| <code>px3sg</code> || Third person singular possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Sing<br />
|-<br />
| <code>px1pl</code> || First person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=1 Number[psor]=Plur<br />
|-<br />
| <code>px2pl</code> || Second person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=2 Number[psor]=Plur<br />
|-<br />
| <code>px3pl</code> || Third person plural possessive || e.g. in [[Turkic languages]] || Person[psor]=3 Number[psor]=Plur<br />
|-<br />
| <code>px3sp</code> || Third person possessive singular or plural || e.g. in [[Turkic languages]] || Person[psor]=3<br />
|-<br />
|}<br />
<br />
===Object marking===<br />
<br />
e.g. in verbs that mark both subject and object<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>o_sg1</code> || First person singular object || <br />
|-<br />
| <code>o_sg2</code> || Second person singular object || <br />
|-<br />
| <code>o_sg3</code> || Third person singular object || <br />
|-<br />
| <code>o_pl1</code> || First person plural object || <br />
|-<br />
| <code>o_pl2</code> || Second person plural object || <br />
|-<br />
| <code>o_pl3</code> || Third person plural object || <br />
|-<br />
|}<br />
<br />
===Proper nouns===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>ant</code> || Anthroponym || [http://en.wikipedia.org/wiki/Anthroponym wikipedia], it's very common to use ant together with f and m for traditionally gender-specific names<br />
|-<br />
| <code>top</code> || Toponym || In some language pairs without the locative case this may be ''loc''. Although this should be changed. [http://en.wikipedia.org/wiki/Toponym wikipedia]<br />
|-<br />
| <code>hyd</code> || Hydronym || [http://en.wikipedia.org/wiki/Hydronym wikipedia]<br />
|-<br />
| <code>cog</code> || Cognomen || In normal use, surnames<br />
|-<br />
| <code>org</code> || Organisation || <br />
|-<br />
| <code>al</code> || Altres || Other, misc.<br />
|}<br />
<br />
===Adjectives===<br />
<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes !! Universal features<br />
|-<br />
| <code>pst</code> || Positive || || Degree=Pos<br />
|-<br />
| <code>comp</code> || Comparative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Comp<br />
|-<br />
| <code>sup</code> || Superlative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia] || Degree=Sup<br />
|-<br />
| <code>attr</code> || Attributive || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]<br />
|-<br />
| <code>pred</code> || Predicative || [http://en.wikipedia.org/wiki/Adjective#Attributive.2C_predicative.2C_absolute.2C_and_substantive_adjectives wikipedia]<br />
|}<br />
<br />
<br />
===Others===<br />
{|class=wikitable<br />
! Symbol !! Gloss !! Notes<br />
|-<br />
| <code>atp</code> || Attachable prefix || In [[German]], ''zusammen''-<br />
|-<br />
| <code>abbr</code> || Abbreviation (e.g. ''etc., Mr.'') || Acronyms are also included (see <code>acr</code>)<br />
|-<br />
| <code>date</code> || Dates, years... ||<br />
|-<br />
| <code>percent</code> || Percentage || e.g. 25%, 0.9%<br />
|-<br />
| <code>web</code> || Links and Emails ||<br />
|-<br />
| <code>file</code> || Filenames ||<br />
|-<br />
|}<br />
<br />
===See also===<br />
* [[Turkic lexicon|Guidelines for tag assignment (etc.) in Turkic]]<br />
* [[Tagging guidelines for Portuguese]]<br />
<br />
==Chunk tags==<br />
<br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|SN}} || Noun phrase / noun group (''sintagma nominal'')<br />
|- <br />
| {{tag|SA}} || Adjective phrase / adjective group <br />
|-<br />
| {{tag|SV}} || Verb phrase / verb group (''sintagma verbal'')<br />
|-<br />
|}<br />
<br />
==XML tags==<br />
Note: All XML tags are explained in depth in the PDF [[documentation]], see also the [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.dtd dix.dtd] and [https://github.com/apertium/lttoolbox/blob/master/lttoolbox/dix.rng dix.rng] files in the GitHub repository.<br />
<br />
{|class=wikitable<br />
! XML tag !! Means !! Appears in XML tags / notes / examples<br />
|-<br />
| <code><dictionary></code> || Mono- or bilingual dictionary || In files apertium-eo-en.en.dix, apertium-eo-en.eo-en.dix, apertium-eo-en.post-en.dix, apertium-eo-en.post-eo.dix<br />
|-<br />
| <code><alphabet></code> || Set of characters in the language|| In <code><dictionary></code><br />
|-<br />
| <code><sdefs></code> || Symbol definitions || In <code>&lt;dictionary></code><br />
|-<br />
| <code><sdef></code> || Symbol definition || In <code>&lt;sdefs></code>. Ex: <code>&lt;sdef n="noun"/></code><br />
|-<br />
| <code><pardefs></code> || Paradigm definitions || In <code>&lt;dictionary></code>. <br />
|-<br />
| <code><pardef></code> || Paradigm definition || In <code>&lt;pardefs></code>. <br />
|-<br />
| <code>&lt;section></code> || A section of the dictionary || In <code>&lt;dictionary></code>. Ex: <code>&lt;section id="main" type="standard"></code><br />
|-<br />
| <code>&lt;e></code> || A dictionary entry (a word) || In <code>&lt;section></code> and in <code>&lt;pardef></code>.<br />
|-<br />
| <code>&lt;i></code> || Invariant (left and right side) || In <code>&lt;e></code>. Ex.: <code>&lt;i>beer&lt;/i></code><br />
|-<br />
| <code>&lt;p></code> || A pair || In <code><e></code>. <br />
|-<br />
| <code>&lt;l></code> || Left side (surface form) || In <code>&lt;p></code>. Ex.: <code><l>beer</l></code><br />
|-<br />
| <code>&lt;r></code> || Right side (lexical unit) || In <code>&lt;p></code>. Ex.: <code><r>beer&lt;s n="noun"/>&lt;s n="singular"/></r></code><br />
|-<br />
| <code>&lt;s></code> || A lexical symbol (noun, adj..) || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;s n="noun"/></code><br />
|-<br />
| <code>&lt;a></code> || Post-generator wake-up mark || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>&lt;a/>a&lt;s ...</code> (for the a/an rule in English)<br />
|-<br />
| <code>&lt;b></code> || Blank space || In <code>&lt;r></code>, <code>&lt;l></code> and <code>&lt;i></code>. Ex.: <code>&lt;l>you're&lt;b/>welcome&lt;s ...</code> <br />
|-<br />
|}<br />
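<br />
How the elements in the table nest can be illustrated by reading one entry with the XML parser from the Java standard library. This is only a sketch: the entry is assembled from the examples in the table, and the class <code>DixEntryReader</code> and its methods are hypothetical names, not part of lttoolbox.<br />
<br />
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that reads a single <e> entry of a .dix file.
class DixEntryReader {
    // Text of the first <l> element, i.e. the surface form.
    static String surfaceForm(String entryXml) {
        return first(parse(entryXml), "l").getTextContent();
    }

    // Values of the n attribute of every <s> inside the first <r> element.
    static List<String> symbols(String entryXml) {
        NodeList nodes = first(parse(entryXml), "r").getElementsByTagName("s");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("n"));
        }
        return names;
    }

    private static Element first(Document doc, String name) {
        NodeList nodes = doc.getElementsByTagName(name);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("No <" + name + "> element in entry");
        }
        return (Element) nodes.item(0);
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Entry is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String entry = "<e><p><l>beer</l>"
            + "<r>beer<s n=\"noun\"/><s n=\"singular\"/></r></p></e>";
        System.out.println(surfaceForm(entry) + " " + symbols(entry));
    }
}
```
<br />
An entry that uses <code>&lt;i></code> instead of a <code>&lt;p></code> pair has no <code>&lt;l></code> or <code>&lt;r></code>, so the sketch reports it with an exception rather than guessing.<br />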
<br />
TODO: Probably there are more. --[[User:Jacob Nordfalk|Jacob Nordfalk]] 14:47, 25 August 2008 (UTC)<br />
<br />
Other tags:<br />
<pre><br />
<j/> (in stream format #) is to mark multiwords<br />
<br />
<t/> and <v/> are only in crossdix<br />
t = template, v = variable<br />
t matches any single tag, v is like + in regexes (0 or more)<br />
<br />
<sa/> and <prm/> are only used in metadixes.<br />
'sa' lets you add an optional extra tag, prm is an extra string for the paradigm<br />
</pre><br />
<br />
=== Transfer ===<br />
<br />
==== <clip> tag ====<br />
<br />
See the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf documentation (pdf)], p.144 for more information.<br />
<br />
{|class=wikitable<br />
! XML attribute value !! Means !! Appears in attribute !! Notes<br />
|-<br />
| <code>whole</code> || lemma and grammatical symbols || part <br />
|-<br />
| <code>lem</code> || lemma || part<br />
|-<br />
| <code>lemh</code> || (inflected) head word of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part<br />
|-<br />
| <code>lemq</code> || following queue of [[Chunking:_A_full_example#Handling_of_multiwords_with_inner_inflection|multiword]] || part<br />
|-<br />
|}<br />
<br />
==See also==<br />
* [[Syntax tags]]<br />
* [[Apertium stream format]]<br />
* [[User:Adverick#FreeMind_Apertium_PoS|FreeMind Apertium PoS]]<br />
<br />
[[Category:Documentation in English]]</div>
Albertonl
https://wiki.apertium.org/w/index.php?title=Documentation_for_integrating_Tesseract_(OCR)_into_Apertium&diff=68340
Documentation for integrating Tesseract (OCR) into Apertium
2018-12-10T16:23:34Z
<div>== Introduction ==<br />
<br />
This article provides helpful information to integrate Tesseract-OCR<sup>[https://opensource.google.com/projects/tesseract 1]</sup> into Apertium.<br />
<br />
Tesseract could be integrated into the website and also as part of the [https://play.google.com/store/apps/details?id=org.apertium.android Apertium app] for Android.<br />
<br />
<br />
== Tesseract into Apertium website ==<br />
Tesseract can be integrated into the website as an option to upload a picture, recognise the text in it and translate that text. The table below links to guides for several implementation languages:<br />
{| class="wikitable" border="1"<br />
|-<br />
! Language<br />
! Page<br />
|-<br />
| HTML5 or JavaScript<br />
| [https://progur.com/2016/10/how-to-use-tesseract-for-ocr-javascript.html Progur.com]<br />
|-<br />
| PHP<br />
| [https://www.sitepoint.com/ocr-in-php-read-text-from-images-with-tesseract/ Sitepoint.com]<br />
|-<br />
| Python (getting started -> can be used with django)<br />
| [https://medium.freecodecamp.org/getting-started-with-tesseract-part-i-2a6a6b1cf75e FreeCodeCamp.org]<br />
|}<br />
<br />
<br />
== Tesseract into Apertium app ==<br />
The Apertium Offline translator is primarily written in '''Java'''.<br />
<br />
For that reason, we can use the ideas in this video<sup>[https://www.youtube.com/watch?v=58oG5Z8_0r4 4]</sup>. We could also offer a list of the downloadable language packages for Tesseract<sup>[https://github.com/tesseract-ocr/tesseract/wiki/Data-Files 3]</sup>, for example a link to download the 'spa' package listed [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files here<sup>3</sup>] locally, so that the app can recognise text in Spanish.<br />
<br />
Code shown in the video:<br />
<pre><br />
import net.sourceforge.tess4j.Tesseract;<br />
<br />
import java.io.File;<br />
<br />
public class OcrReader {<br />
<br />
    public static void main(String[] args) throws Exception {<br />
        // Path of the image to read, hard-coded in the video<br />
        String inputFilePath = "F:/Tesseract/English.tif";<br />
<br />
        Tesseract tesseract = new Tesseract();<br />
<br />
        // Run OCR on the image and print the recognised text<br />
        String fullText = tesseract.doOCR(new File(inputFilePath));<br />
<br />
        System.out.println(fullText);<br />
    }<br />
}<br />
</pre><br />
<br />
The main limitation of this code is that the image path is written directly into it. The solution is simple: replace the hard-coded path with an input that lets the user choose the image without rewriting code (e.g. a drop-down list, a text input, a menu...).<br />
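<br />
One simple form of such an input is a command-line argument. The sketch below covers only the validation of the user-supplied path and leaves out the OCR call itself; <code>ImagePathResolver</code> and <code>resolve</code> are hypothetical names that do not come from the video.<br />
<br />
```java
import java.io.File;

// Hypothetical helper: takes the image path from the command line instead of the source code.
class ImagePathResolver {
    // Returns the image file named by the first argument, or fails with a clear message.
    static File resolve(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Usage: ImagePathResolver <image-file>");
        }
        File image = new File(args[0]);
        if (!image.isFile()) {
            throw new IllegalArgumentException("Not an existing file: " + image.getPath());
        }
        return image;
    }

    public static void main(String[] args) {
        File image = resolve(args);
        // The resolved file is what would be handed to the OCR call.
        System.out.println("Image to read: " + image.getAbsolutePath());
    }
}
```
<br />
In a graphical app the same check would sit behind a file chooser or text field, so a wrong path is reported before any OCR work starts.<br />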
<br />
Set the language and the default data path (setLanguage(), setDatapath()):<br />
<pre><br />
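import net.sourceforge.tess4j.Tesseract;<br />
<br />
import java.io.File;<br />
<br />
public class OcrReader {<br />
    public static void main(String[] args) throws Exception {<br />
        // Same hard-coded example image as in the first snippet<br />
        String inputFilePath = "F:/Tesseract/English.tif";<br />
<br />
        Tesseract tesseract = new Tesseract();<br />
<br />
        // Data directory and language to recognise (chi_sim = Simplified Chinese)<br />
        tesseract.setDatapath("F:/Tesseract/");<br />
        tesseract.setLanguage("chi_sim");<br />
<br />
        String fullText = tesseract.doOCR(new File(inputFilePath));<br />
<br />
        System.out.println(fullText);<br />
    }<br />
}<br />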
</pre><br />
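<br />
Tesseract language packages are distributed as files named <code>&lt;language>.traineddata</code>, so an app could check that a package is present before calling <code>setLanguage()</code>. The following is a minimal sketch, assuming that the directory passed in is the folder that directly contains the <code>.traineddata</code> files; <code>LanguageDataCheck</code> and its methods are hypothetical names, and the directory layout on a given device may differ.<br />
<br />
```java
import java.io.File;

// Hypothetical helper: checks whether a Tesseract language package has been downloaded.
class LanguageDataCheck {
    // File that should hold the data for a language code such as "spa" or "chi_sim".
    static File trainedDataFile(File dataDir, String language) {
        return new File(dataDir, language + ".traineddata");
    }

    // True if the package for the given language exists in the data directory.
    static boolean hasLanguage(File dataDir, String language) {
        return trainedDataFile(dataDir, language).isFile();
    }

    public static void main(String[] args) {
        // Assumed example directory; replace with the app's real data folder.
        File dataDir = new File("tessdata");
        if (hasLanguage(dataDir, "spa")) {
            System.out.println("Spanish package found");
        } else {
            System.out.println("Spanish package missing, offer a download link");
        }
    }
}
```
<br />
Running the check first lets the app show a download link for the missing package instead of failing inside the OCR call.<br />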
<br />
== References ==<br />
'''1.''' [https://opensource.google.com/projects/tesseract https://opensource.google.com/projects/tesseract]<br />
<br />
'''2.''' [https://github.com/tesseract-ocr/tesseract https://github.com/tesseract-ocr/tesseract]<br />
<br />
'''3.''' [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files https://github.com/tesseract-ocr/tesseract/wiki/Data-Files]<br />
<br />
'''4.''' [https://www.youtube.com/watch?v=58oG5Z8_0r4 https://www.youtube.com/watch?v=58oG5Z8_0r4]<br />
<br />
'''5.''' [https://priyankvex.wordpress.com/2015/09/02/making-an-ocr-app-for-android-using-tesseract/ https://priyankvex.wordpress.com/2015/09/02/making-an-ocr-app-for-android-using-tesseract/]<br />
<br />
'''6.''' [https://www.codepool.biz/making-an-android-ocr-application-with-tesseract.html https://www.codepool.biz/making-an-android-ocr-application-with-tesseract.html]</div>
Albertonl