List of symbols


This page lists the symbols Apertium uses to denote parts of speech and other morphological features, the chunk tags used for syntactic functions, and the XML tags used in dictionaries.

This is meant to be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, since not all languages have the same grammatical features (most do not make a spatial distinction in articles, for example).

If you were wondering what the symbols #, /, @, +, ~ or * mean, read Apertium stream format.

Part-of-speech Categories

Symbol Gloss Notes
n Noun see 'np' for proper noun
vblex Standard verb see also: vbser, vbhaver, vbmod, vaux
vbmod Modal verb
vbser Verb "to be" from ser (to be)
vbhaver Verb "to have" from haver (to have)
vaux Auxiliary verb
adj Adjective
post Postposition
adv Adverb
preadv Pre-adverb
postadv Post-adverb
mod Modal word
det Determiner
prn Pronoun
pr Preposition
num Numeral
np Proper noun from nom propi
ij Interjection
cnjcoo Coordinating conjunction
cnjsub Subordinating conjunction
cnjadv Conjunctive adverb
sent Sentence-ending punctuation e.g. full stop, question mark
cm Comma punctuation ,
lquot Left quote «
rquot Right quote »
lpar Left parenthesis (
rpar Right parenthesis )

Part-of-speech Sub-categories


Symbol Gloss Notes
f Feminine
m Masculine
nt Neuter
ma Masculine (animate) Mostly in Slavic languages
mi Masculine (inanimate) Mostly in Slavic languages
mp Masculine (personal) in Polish
mn Masculine or neuter
fn Feminine or neuter
mf Masculine or feminine Used where the gender can be either masculine or feminine
mfn Masculine, feminine or neuter Used where the gender can be masculine, feminine or neuter
ut Common From utrum, found in Scandinavian languages.
un Common or neuter Used where the gender can be either common or neuter
GD Gender to be determined


Symbol Gloss Notes
sg Singular
pl Plural
sp Singular or plural
du Dual
ct Count see mk-bg
coll Collective
ND Number to be determined


Symbol Gloss Notes
cnt Countable
unc Uncountable (mass)


Symbol Gloss Notes
nom Nominative
acc Accusative
dat Dative
gen Genitive
dg Dative and Genitive in ro-es, discouraged in new developments
voc Vocative
abl Ablative
ins Instrumental
loc Locative
prp Prepositional
tra Translative
ill Illative
ine Inessive
ade Adessive
all Allative
abe Abessive
ess Essive
par Partitive
dis Distributive
com Comitative
soc Sociative
prl Prolative


Symbol Gloss Notes
actv Active voice
pass Passive voice Used mostly in Turkic languages.
pasv Passive voice Used mostly in Germanic languages.
midv Middle voice
nactv Non-active voice See Albanian.

Tense and mode

Symbol Gloss Notes
pres Present
pret Preterite
past Past
imp Imperative
inf Infinitive
aor Aorist A tense in Turkic languages.
pp Past participle
pp2 Past participle (???) Used at least in the Esperanto dictionaries for future active participles (-ont), which seems quite odd
pp3 Past participle (???) Used at least in the Esperanto dictionaries for past active participles (-int), which seems quite odd
pprs Present participle Also appears as ppres (deprecated)
ger Gerund
supn Supine
pri Present indicative see also: pres
pii Imperfect from Pretérito imperfecto de indicativo
fti Future indicative
fts Future subjunctive
cni Conditional
plu Pluperfect In cy-en
pmp Pluperfect In es-gl (from Pluscuamperfecto)
prs Present subjunctive
pis Imperfect subjunctive
ifi Past definite from Pretérito perfecto o indefinido
aff Affirmative
itg Interrogative
neg Negative
lp L-participle


Symbol Gloss Notes
p1 First person
p2 Second person
p3 Third person
impers Impersonal Sometimes called 'autonomous'



Symbol Gloss Notes
caus Causative
ingr Ingressive


Symbol Gloss Notes
px1sg First person singular possessive e.g. in Turkic languages
px2sg Second person singular possessive e.g. in Turkic languages
px3sg Third person singular possessive e.g. in Turkic languages
px1pl First person plural possessive e.g. in Turkic languages
px2pl Second person plural possessive e.g. in Turkic languages
px3pl Third person plural possessive e.g. in Turkic languages
px3sp Third person possessive singular or plural e.g. in Turkic languages

Proper nouns

Symbol Gloss Notes
ant Anthroponym
top Toponym In some language pairs without a locative case this may be tagged loc, although this should be changed.
hyd Hydronym
cog Cognomen In normal use, surnames
org Organisation
al Altres Other, misc.


Symbol Gloss Notes
aa Animate
an Animate or inanimate
nn Inanimate


Symbol Gloss Notes
sint Synthetic "nice, nicer, nicest" is synthetic. "handsome, more handsome, the most handsome" is not.
pst Positive
comp Comparative
sup Superlative
attr Attributive
pred Predicative
preadj Pre-adjective for languages where most adjectives follow the noun (e.g. French in the eo->fr bidix)
preadj_nh Pre-adjective if not human the adjective comes before or after the noun, depending on the noun


Symbol Gloss Notes
tn Tonic (stressed) from tónico
detnt Neuter determiner
predet Pre-determiner
atn Atonic (unstressed) from atónico
qnt Quantifier
ord Ordinal
obj Object
subj Subject
pro Proclitic
enc Enclitic
acr Acronym
rel Relative
ind Indefinite
itg Interrogative
dem Demonstrative
def Definite
pos Possessive
ref Reflexive
prx Proximate
dst Distal


Symbol Gloss Notes
web Links and Emails
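
The symbols in the tables above appear in Apertium analyses as angle-bracketed tags following the lemma, for example house<n><pl>. The following is a minimal Python sketch of how such an analysis could be split and glossed. The GLOSSES dictionary is only a small excerpt of the tables on this page, and the function name gloss_analysis is hypothetical; neither is part of any Apertium tool.

```python
import re

# Small excerpt of the symbol tables on this page (illustrative, not complete).
GLOSSES = {
    "n": "Noun",
    "vblex": "Standard verb",
    "adj": "Adjective",
    "f": "Feminine",
    "m": "Masculine",
    "sg": "Singular",
    "pl": "Plural",
    "pri": "Present indicative",
    "p3": "Third person",
}

# A tag is anything between one '<' and the next '>'.
TAG_RE = re.compile(r"<([^<>]+)>")


def gloss_analysis(analysis):
    """Split an analysis such as 'house<n><pl>' into its lemma and glossed tags.

    Hypothetical helper: symbols missing from GLOSSES are glossed as '?'.
    """
    first_tag = analysis.find("<")
    lemma = analysis if first_tag == -1 else analysis[:first_tag]
    tags = TAG_RE.findall(analysis)
    return lemma, [(tag, GLOSSES.get(tag, "?")) for tag in tags]


lemma, tags = gloss_analysis("house<n><pl>")
print(lemma)
for tag, gloss in tags:
    print(tag, "=", gloss)
```

A symbol that is specific to one language pair and absent from the lookup table is reported as "?" rather than raising an error, which suits a glossary where not every package uses every symbol.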

Chunk tags

Tag Description
<SN> Noun phrase / noun group (sintagma nominal)
<SA> Adjective phrase / adjective group
<SV> Verb phrase / verb group (sintagma verbal)

XML tags

Note: All XML tags are explained in depth in the PDF documentation; see also the dix.dtd/dix.rng files in lttoolbox (svn).

XML tag Means Appears in XML tags / notes / examples
<dictionary> Mono- or bilingual dictionary In files apertium-eo-en.en.dix, apertium-eo-en.eo-en.dix
<alphabet> Set of characters in the language In <dictionary>
<sdefs> Symbol definitions In <dictionary>
<sdef> Symbol definition In <sdefs>. Ex: <sdef n="noun"/>
<pardefs> Paradigm definitions In <dictionary>.
<pardef> Paradigm definition In <pardefs>.
<section> A section of the dictionary In <dictionary>. Ex: <section id="main" type="standard">
<e> A dictionary entry (a word) In <section> and in <pardef>.
<i> Invariant (left and right side) In <e>. Ex.: <i>beer</i>
<p> A pair In <e>.
<l> Left side (surface form) In <p>. Ex.: <l>beer</l>
<r> Right side (lexical unit) In <p>. Ex.: <r>beer<s n="noun"/><s n="singular"/></r>
<s> A lexical symbol (noun, adj, ...) In <r>, <l> and <i>. Ex.: <s n="noun"/>
<a> Post-generator wake-up mark In <r>, <l> and <i>. Ex.: <l><a/>a<s ... (for the a/an rule in English)
<b> Blank space In <r>, <l> and <i>. Ex.: <l>you're<b/>welcome<s ...
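
To show how the tags in the table nest, here is a minimal Python sketch that reads a tiny dictionary with the standard library's xml.etree.ElementTree. The dictionary text is a made-up example assembled from the snippets in the table (it has no paradigms and has not been validated against dix.dtd), and read_entries is a hypothetical helper, not an lttoolbox function.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal dictionary built from the examples in the table above.
DIX = """\
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="noun"/>
    <sdef n="singular"/>
  </sdefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l>beer</l>
        <r>beer<s n="noun"/><s n="singular"/></r>
      </p>
    </e>
  </section>
</dictionary>
"""


def read_entries(dix_text):
    """Return the declared symbols and (left text, right text, symbols) per <p>."""
    root = ET.fromstring(dix_text)
    # <sdef> elements declare the symbols that <s> elements may use.
    symbols = [sdef.get("n") for sdef in root.iter("sdef")]
    entries = []
    for pair in root.iter("p"):
        left = pair.find("l")
        right = pair.find("r")
        # .text is the character data before the first child element.
        left_text = left.text or ""
        right_text = right.text or ""
        tags = [s.get("n") for s in right.findall("s")]
        entries.append((left_text, right_text, tags))
    return symbols, entries


symbols, entries = read_entries(DIX)
print(symbols)
print(entries)
```

The sketch only handles <p> pairs whose symbols sit on the right side; entries written with <i>, or containing <a/> and <b/>, would need extra handling.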

TODO: Probably there are more. --Jacob Nordfalk 14:47, 25 August 2008 (UTC)

Other tags:

<j/> (+ in the stream format) marks the join between the parts of a multiword

<t/> and <v/> are only in crossdix
t = template, v = variable
t matches any single tag, v is like + in regexes (0 or more)

<sa/> and <prm/> are only used in metadixes.
'sa' lets you add an optional extra tag; 'prm' is an extra string for the paradigm


<clip> tag

See the documentation (pdf), p.144 for more information.

XML attribute value Means Appears in attribute Notes
whole lemma and grammatical symbols part
lem lemma part
lemh (inflected) head word of multiword part
lemq following queue of multiword part
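
As a rough illustration of the values in the table, the Python sketch below splits a lexical unit into the corresponding parts. It assumes that the unit is written as a lemma followed by angle-bracketed symbols and that the queue of a multiword begins at the first # in the lemma, as in take# out<vblex><pres>. The function clip_parts is hypothetical and is not how the transfer module is implemented.

```python
def clip_parts(lexical_unit):
    """Split a lexical unit into the parts named in the table above.

    Assumptions (for illustration only): the symbols start at the first '<',
    and the queue of a multiword starts at the first '#' in the lemma.
    """
    first_tag = lexical_unit.find("<")
    lem = lexical_unit if first_tag == -1 else lexical_unit[:first_tag]

    hash_pos = lem.find("#")
    if hash_pos == -1:
        # Not a multiword with a queue: the head is the whole lemma.
        lemh, lemq = lem, ""
    else:
        lemh, lemq = lem[:hash_pos], lem[hash_pos:]

    return {"whole": lexical_unit, "lem": lem, "lemh": lemh, "lemq": lemq}


for part, value in clip_parts("take# out<vblex><pres>").items():
    print(part, "=", repr(value))
```

For a unit without a queue, such as beer<n><sg>, lemh equals lem and lemq is empty under these assumptions.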

See also