List of symbols

From Apertium

This page lists the symbols used in Apertium to denote parts of speech and further morphological features, the chunk tags used for syntactic functions, and XML tags.


This is meant to be a glossary of symbol names in alphabetical order, with notes. Some of these names are specific to particular packages or language pairs, because not all languages have the same grammatical features (for example, most languages do not mark a spatial distinction in articles).

If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.

Part-of-speech Categories

Symbol | Gloss | Notes
n | Noun | see np for proper nouns
vblex | Standard ("lexical") verb | see also: vbser, vbhaver, vbmod, vaux
v | Standard verb | shortened form of vblex, often used in agglutinative languages
vbmod | Modal verb
vbser | Verb "to be" | from ser (to be)
vbhaver | Verb "to have" | from haver (to have)
vaux | Auxiliary verb
cop | Copula | sometimes verb-like, sometimes not
adj | Adjective
post | Postposition
adv | Adverb
preadv | Pre-adverb
postadv | Post-adverb
mod | Modal word | [1]
det | Determiner
prn | Pronoun
pr | Preposition
num | Numeral
np | Proper noun | from nom propi
ij | Interjection
cnjcoo | Coordinating conjunction
cnjsub | Subordinating conjunction
cnjadv | Conjunctive adverb
sent | Sentence-ending punctuation | e.g. full stop, question mark
cm | Comma | punctuation: ,
lquot | Left quote | «
rquot | Right quote | »
lpar | Left parenthesis | (
rpar | Right parenthesis | )
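In the Apertium stream format, a disambiguated lexical unit carries these symbols as tags in angle brackets after the lemma, e.g. ^beer<n><pl>$. The Python sketch below splits such a unit and looks up the part-of-speech gloss. The helper names and the small gloss table (a subset of the symbols above) are illustrative assumptions, and the parser deliberately ignores ambiguity, multiwords and escaped characters.

```python
import re

# Hypothetical subset of the part-of-speech symbols from the table above.
POS_GLOSSES = {
    "n": "Noun",
    "np": "Proper noun",
    "vblex": "Standard (lexical) verb",
    "adj": "Adjective",
    "adv": "Adverb",
    "det": "Determiner",
    "prn": "Pronoun",
    "pr": "Preposition",
    "cnjcoo": "Coordinating conjunction",
    "sent": "Sentence-ending punctuation",
}

# Matches one simple, disambiguated lexical unit such as ^beer<n><pl>$
# (no surface form, no alternative analyses, no multiword markers).
LU_RE = re.compile(r"^\^([^<$]+)((?:<[^>]+>)*)\$$")


def parse_lexical_unit(lu: str) -> tuple[str, list[str]]:
    """Split '^lemma<tag1><tag2>$' into the lemma and its list of tags."""
    match = LU_RE.match(lu)
    if match is None:
        raise ValueError(f"not a simple lexical unit: {lu!r}")
    lemma, tag_string = match.groups()
    tags = re.findall(r"<([^>]+)>", tag_string)
    return lemma, tags


def describe(lu: str) -> str:
    """Return a short description: lemma, POS gloss, remaining tags."""
    lemma, tags = parse_lexical_unit(lu)
    # The part-of-speech symbol is usually the first tag.
    pos = POS_GLOSSES.get(tags[0], "unknown") if tags else "untagged"
    return f"{lemma}: {pos} {tags[1:]}"
```

For example, parse_lexical_unit("^beer<n><pl>$") returns ("beer", ["n", "pl"]).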

Part-of-speech Sub-categories

Gender

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

Symbol | Gloss | Notes
f | Feminine
m | Masculine
nt | Neuter
ma | Masculine (animate) | mostly in Slavic languages
mi | Masculine (inanimate) | mostly in Slavic languages
mp | Masculine (personal) | in Polish
mn | Masculine or neuter
fn | Feminine or neuter
mf | Masculine or feminine | used where the gender can be either masculine or feminine
mfn | Masculine, feminine or neuter | used where the gender can be masculine, feminine or neuter
ut | Common | from utrum; found in Scandinavian languages
un | Common or neuter | used where the gender can be either common or neuter
GD | Gender to be determined
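Several of the tags above are underspecified: mn, fn, mf, mfn and un each stand for a set of genders rather than a single one, and GD marks a gender that is still to be determined. The sketch below is one hypothetical way to model this when checking whether two words can agree; the mapping is read off the table, while the function name and the treatment of GD as "compatible with anything" are assumptions, not Apertium behaviour.

```python
# Genders each tag can stand for, read off the table above.
# ma, mi and mp (animate/inanimate/personal masculine) are left out
# for simplicity; looking them up raises KeyError.
GENDER_SETS = {
    "m": {"m"},
    "f": {"f"},
    "nt": {"nt"},
    "ut": {"ut"},
    "mn": {"m", "nt"},
    "fn": {"f", "nt"},
    "mf": {"m", "f"},
    "mfn": {"m", "f", "nt"},
    "un": {"ut", "nt"},
}


def genders_compatible(tag_a: str, tag_b: str) -> bool:
    """True if two gender tags share at least one possible gender.

    GD ("gender to be determined") is assumed compatible with any tag,
    since its value is not known yet.
    """
    if "GD" in (tag_a, tag_b):
        return True
    return bool(GENDER_SETS[tag_a] & GENDER_SETS[tag_b])
```

With this model, mf agrees with f, but mn does not.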

Count/Mass

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

Symbol | Gloss | Notes
cnt | Countable
unc | Uncountable (mass)

Animacy

These tags are usually used with nouns, and things that agree/concord with nouns (like adjectives and verbs).

Symbol | Gloss | Notes
aa | Animate
an | Animate or inanimate
nn | Inanimate

Adjectives

Symbol | Gloss | Notes
sint | Synthetic | "nice, nicer, nicest" is synthetic; "handsome, more handsome, the most handsome" is not
preadj | Pre-adjective | for languages where most adjectives come after the noun (e.g. French in the eo->fr bidix)
preadj_nh | Pre-adjective if not human | the adjective goes before or after the noun, depending on the noun

Pronoun types

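Symbol | Gloss | Notes
pers | Personal
tn | Tonic | from tónico
detnt | Neuter determiner | POS? (possibly a part of speech rather than a pronoun type)
predet | Pre-determiner | POS? (possibly a part of speech rather than a pronoun type)
atn | Atonic | from atónico
qnt | Quantifier
ord | Ordinal
obj | Object
subj | Subject
pro | Proclitic
enc | Enclitic
acr | Acronym | not a pronoun type?
rel | Relative
ind | Indefinite
itg | Interrogative
dem | Demonstrative
def | Definite
pos | Possessive
ref | Reflexive
prx | Proximate
dst | Distal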

Transitivity

Used for verbs.

Symbol | Gloss | Notes
tv | Transitive | takes a direct object in the accusative case (used in Turkic)
iv | Intransitive | does not take a direct object in the accusative case (used in Turkic)
TD | Transitivity to be determined | used when the subcategory is not (yet) known

Inflectional morphology

Number

Note: number can be a sub-category tag too, e.g. with pronouns.

Symbol | Gloss | Notes
sg | Singular
pl | Plural
sp | Singular or plural
du | Dual
ct | Count | see mk-bg
coll | Collective
ND | Number to be determined


Case

Symbol | Gloss | Notes
nom | Nominative
acc | Accusative
dat | Dative
gen | Genitive
dg | Dative and genitive | in ro-es; discouraged in new developments
voc | Vocative
abl | Ablative
ins | Instrumental or instructive
loc | Locative
prp | Prepositional
tra | Translative
ill | Illative
ine | Inessive
ade | Adessive
all | Allative
abe | Abessive
ess | Essive
par | Partitive
dis | Distributive
com | Comitative
soc | Sociative
prl | Prolative

Voice

Symbol | Gloss | Notes
actv | Active voice
pass | Passive voice | used more in Turkic languages
pasv | Passive voice | used more in Germanic languages
midv | Middle voice
nactv | Non-active voice | see Albanian
caus | Causative voice | see also Derivations

Tense and mode

Symbol | Gloss | Notes
pres | Present
pret | Preterite
past | Past
imp | Imperative
inf | Infinitive
aor | Aorist | a tense in Turkic languages
pp | Past participle
pp2 | Past participle (???) | used at least in the Esperanto dictionaries for future active participles (-ont); seems quite odd
pp3 | Past participle (???) | used at least in the Esperanto dictionaries for past active participles (-int); seems quite odd
pprs | Present participle | also appears as ppres (deprecated)
ger | Gerund
supn | Supine
pri | Present indicative | see also: pres
pii | Imperfect | from pretérito imperfecto de indicativo
fti | Future indicative
fts | Future subjunctive
cni | Conditional
plu | Pluperfect | in cy-en
pmp | Pluperfect | in es-gl (from pluscuamperfecto)
prs | Present subjunctive
pis | Imperfect subjunctive
ifi | Past definite | from pretérito perfecto o indefinido
aff | Affirmative
itg | Interrogative
neg | Negative
lp | L-participle

Person

Note: person can be a sub-category tag, e.g. with pronouns.

Symbol | Gloss | Notes
p1 | First person
p2 | Second person
p3 | Third person
impers | Impersonal | sometimes called "autonomous"

Derivations

Symbol | Gloss | Notes
caus | Causative
ingr | Ingressive | https://nn.wikipedia.org/w/index.php?title=Ingressiv

Possession

Symbol | Gloss | Notes
px1sg | First person singular possessive | e.g. in Turkic languages
px2sg | Second person singular possessive | e.g. in Turkic languages
px3sg | Third person singular possessive | e.g. in Turkic languages
px1pl | First person plural possessive | e.g. in Turkic languages
px2pl | Second person plural possessive | e.g. in Turkic languages
px3pl | Third person plural possessive | e.g. in Turkic languages
px3sp | Third person singular or plural possessive | e.g. in Turkic languages

Object marking

Used, e.g., in verbs that mark both subject and object.

Symbol | Gloss | Notes
o_sg1 | First person singular object
o_sg2 | Second person singular object
o_sg3 | Third person singular object
o_pl1 | First person plural object
o_pl2 | Second person plural object
o_pl3 | Third person plural object
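The possessive and object-marking tags each bundle a person and a number into one symbol, and the two families use opposite orders (px1sg versus o_sg1). The hypothetical decoder below maps both onto the separate person tags (p1, p2, p3) and number tags (sg, pl, sp) used elsewhere on this page; it is an illustration only, not something Apertium provides.

```python
import re

# px tags: person then number (px3sp is the only "sp" form listed above).
PX_RE = re.compile(r"^px([123])(sg|pl|sp)$")
# o_ tags: number then person.
OBJ_RE = re.compile(r"^o_(sg|pl)([123])$")


def decode_person_number(tag: str) -> tuple[str, str, str]:
    """Return (kind, person, number), e.g. ('possessive', 'p1', 'sg')."""
    if m := PX_RE.match(tag):
        person, number = m.groups()
        return "possessive", f"p{person}", number
    if m := OBJ_RE.match(tag):
        number, person = m.groups()
        return "object", f"p{person}", number
    raise ValueError(f"not a possessive or object tag: {tag!r}")
```

For example, px3sp decodes to ("possessive", "p3", "sp") and o_pl2 to ("object", "p2", "pl").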

Proper nouns

Symbol | Gloss | Notes
ant | Anthroponym
top | Toponym | in some language pairs without the locative case this may be loc, although this should be changed
hyd | Hydronym
cog | Cognomen | in normal use, surnames
org | Organisation
al | Other | from Catalan altres; miscellaneous


Adjectives

Symbol | Gloss | Notes
pst | Positive
comp | Comparative
sup | Superlative
attr | Attributive
pred | Predicative


Others

Symbol | Gloss | Notes
web | Links and emails


Chunk tags

Tag | Description
<SN> | Noun phrase / noun group (sintagma nominal)
<SA> | Adjective phrase / adjective group
<SV> | Verb phrase / verb group (sintagma verbal)
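In chunk-based structural transfer, a chunk groups several lexical units under one of these tags. Assuming the usual stream-format layout ^name<TAG1><TAG2>{^lu1$ ^lu2$}$, with the chunk category as the first tag, a hypothetical helper for building one could look like this; the chunk name det_nom and the lexical units in the example are made up for illustration.

```python
def make_chunk(name: str, tags: list[str], lexical_units: list[str]) -> str:
    """Wrap already-formatted lexical units in a chunk.

    Assumes the layout ^name<TAG1><TAG2>{^lu1$ ^lu2$}$, where the first
    tag is the chunk category (e.g. SN, SA, SV).
    """
    tag_string = "".join(f"<{t}>" for t in tags)
    body = " ".join(lexical_units)
    # The doubled braces in the f-string produce literal { and }.
    return f"^{name}{tag_string}{{{body}}}$"
```

For example, make_chunk("det_nom", ["SN", "f", "sg"], ["^el<det><def><f><sg>$", "^casa<n><f><sg>$"]) returns "^det_nom<SN><f><sg>{^el<det><def><f><sg>$ ^casa<n><f><sg>$}$".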

XML tags

Note: All XML tags are explained in depth in the PDF documentation; see also the dix.dtd and dix.rng files in the GitHub repository.

XML tag | Meaning | Appears in / notes / examples
<dictionary> | Monolingual or bilingual dictionary | in files such as apertium-eo-en.en.dix, apertium-eo-en.eo-en.dix, apertium-eo-en.post-en.dix, apertium-eo-en.post-eo.dix
<alphabet> | Set of characters in the language | in <dictionary>
<sdefs> | Symbol definitions | in <dictionary>
<sdef> | Symbol definition | in <sdefs>. Ex.: <sdef n="noun"/>
<pardefs> | Paradigm definitions | in <dictionary>
<pardef> | Paradigm definition | in <pardefs>
<section> | A section of the dictionary | in <dictionary>. Ex.: <section id="main" type="standard">
<e> | A dictionary entry (a word) | in <section> and in <pardef>
<i> | Invariant (same on the left and right side) | in <e>. Ex.: <i>beer</i>
<p> | A pair | in <e>
<l> | Left side (surface form) | in <p>. Ex.: <l>beer</l>
<r> | Right side (lexical unit) | in <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r>
<s> | A lexical symbol (noun, adj...) | in <r>, <l> and <i>. Ex.: <s n="noun"/>
<a> | Post-generator wake-up mark | in <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English)
<b> | Blank space | in <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ...

TODO: Probably there are more. --Jacob Nordfalk 14:47, 25 August 2008 (UTC)
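To show how the tags in the table nest, the Python sketch below builds a tiny monolingual dictionary with the standard library's xml.etree.ElementTree. It uses the short symbol names from this page (n, sg, pl) rather than noun/singular, and it also uses <par n="..."/> (linking an entry to a paradigm) and the lm attribute, neither of which is listed in the table above. The paradigm name beer__n is a hypothetical choice, and the output has not been checked against dix.dtd.

```python
import xml.etree.ElementTree as ET


def build_minimal_dix() -> ET.Element:
    """Build a tiny monolingual dictionary from the tags in the table above."""
    dictionary = ET.Element("dictionary")
    ET.SubElement(dictionary, "alphabet").text = "abcdefghijklmnopqrstuvwxyz"

    # <sdefs>: declare every symbol that <s n="..."/> refers to.
    sdefs = ET.SubElement(dictionary, "sdefs")
    for name in ("n", "sg", "pl"):
        ET.SubElement(sdefs, "sdef", n=name)

    # <pardefs>: one paradigm, adding -s for the plural.
    pardefs = ET.SubElement(dictionary, "pardefs")
    pardef = ET.SubElement(pardefs, "pardef", n="beer__n")
    for surface, number in (("", "sg"), ("s", "pl")):
        entry = ET.SubElement(pardef, "e")
        pair = ET.SubElement(entry, "p")
        ET.SubElement(pair, "l").text = surface  # empty <l/> for the singular
        right = ET.SubElement(pair, "r")
        ET.SubElement(right, "s", n="n")
        ET.SubElement(right, "s", n=number)

    # <section>: the entry uses <i> for the invariant stem and <par>
    # to point at the paradigm defined above.
    section = ET.SubElement(dictionary, "section", id="main", type="standard")
    entry = ET.SubElement(section, "e", lm="beer")
    ET.SubElement(entry, "i").text = "beer"
    ET.SubElement(entry, "par", n="beer__n")
    return dictionary
```

Calling print(ET.tostring(build_minimal_dix(), encoding="unicode")) prints the XML on a single line; ET.indent (Python 3.9+) can be applied first for readability.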

Other tags:

<j/> (+ in the stream format) joins the lexical units of a multiword

<t/> and <v/> are only used in crossdix
t = template, v = variable
t matches any single tag; v matches zero or more tags, like * in regexes

<sa/> and <prm/> are only used in metadixes.
sa lets you add an optional extra tag; prm is an extra string for the paradigm.

Transfer

<clip> tag

See the documentation (pdf), p.144 for more information.

XML attribute value | Meaning | Appears in attribute
whole | lemma and grammatical symbols | part
lem | lemma | part
lemh | (inflected) head word of a multiword | part
lemq | following queue of a multiword | part
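For a multiword with inner inflection, the queue follows the tags in the lexical unit, e.g. take<vblex><pres># out. The sketch below approximates what part="whole", "lemh" and "lemq" pick out of such a unit. It is a regex-based illustration under that assumption, not the transfer module's actual code; it leaves out lem and does not handle escaped characters.

```python
import re

# Head (inflected part), then the tags, then an optional queue starting
# with '#'. The tags group is only used to consume the tags.
LU_RE = re.compile(r"^(?P<head>[^<#]+)(?P<tags>(?:<[^>]+>)*)(?P<queue>#.*)?$")


def clip_parts(lu: str) -> dict[str, str]:
    """Approximate the whole, lemh and lemq parts of a lexical unit.

    `lu` is the content between ^ and $, e.g. "take<vblex><pres># out".
    """
    match = LU_RE.match(lu)
    if match is None:
        raise ValueError(f"unexpected lexical unit: {lu!r}")
    return {
        "whole": lu,                         # lemma and grammatical symbols
        "lemh": match.group("head"),         # inflected head word
        "lemq": match.group("queue") or "",  # queue, including the '#'
    }
```

For a unit without a queue, such as beer<n><sg>, lemq comes out empty.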

See also