User:TommiPirinen/English tagset

From Apertium
Revision as of 04:21, 10 August 2014 by TommiPirinen (talk | contribs) (add interjections and formatting)

RFC for English tags

This is from apertium-fin-eng.eng.dix though I hope to get it to be like langs/apertium-eng some day...

Verbs (google pos: Verb)

Regular English verbs have four forms: accept, accepts, accepted, accepting. Some irregular verbs have five: forget, forgets, forgot, forgotten, forgetting. The verb to be has more forms still: be, am, are, is, was, were, been, being.

The tags we are using to classify English verbs are:

  • vblex: lexical (main) verbs, regular or irregular, like accept or forget
  • vaux: auxiliary verbs that take a verb complement, like can
  • vbser: verb be
  • vbdo: verb do
  • vbhaver: verb have

The endings that follow the stem (or the absence of an ending) are classified with:

  • inf: infinitive (as in: to do, to walk)
  • pri: present indicative (as in: I do, he walks)
  • prs: present subjunctive (as in: Let there be light ; At other times it is important that we be quiet.)
  • past: common past (as in: I did, he walked)
  • pis: imperfect subjunctive (as in: If I were you, ...)
  • pp: past participle (as in: I've done, he has walked)
  • imp: imperative (as in: be quiet!)
  • pprs: present participle
  • ger: gerund
  • subs: substantive

and potentially

  • +not.adv.neg: negation attached to the verb (as in: can't, didn't)

In the future, likely:

  • transitivity
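
The two tag lists above can be turned into a simple sanity check. The following is a minimal sketch, not part of any Apertium tool: the function name `check_verb_tags` is hypothetical, and the assumption that the verb class comes first and the form tag second is inferred from the example analyses on this page.

```python
# Verb class tags and verb form tags, copied from the lists above.
VERB_CLASSES = {"vblex", "vaux", "vbser", "vbdo", "vbhaver"}
VERB_FORMS = {"inf", "pri", "prs", "past", "pis", "pp", "imp", "pprs", "ger", "subs"}


def check_verb_tags(tags):
    """Return a list of problems; an empty list means the tags fit the scheme.

    Assumption: the first tag is the verb class and the second is the form.
    Any later tags (such as p3, sg) are not checked here.
    """
    problems = []
    if not tags or tags[0] not in VERB_CLASSES:
        problems.append("first tag is not a verb class")
    if len(tags) < 2 or tags[1] not in VERB_FORMS:
        problems.append("second tag is not a verb form")
    return problems


print(check_verb_tags(["vblex", "pri", "p3", "sg"]))  # []
print(check_verb_tags(["vblex", "sg"]))  # ['second tag is not a verb form']
```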

Verb examples

The tag sequences are as follows:

Regular verbs:

walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
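
Each line above has the shape surface:lemma<tag><tag>... A rough sketch of reading such a line in Python follows. The name `parse_analysis` is made up for illustration, and the function splits on the first colon, so it would not cope with entries whose surface form is itself a colon.

```python
import re

# One tag is any run of characters between angle brackets.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_analysis(line):
    """Split 'surface:lemma<tag1><tag2>' into (surface, lemma, [tags])."""
    surface, _, analysis = line.strip().partition(":")
    lemma = analysis.split("<", 1)[0]  # everything before the first tag
    tags = TAG_RE.findall(analysis)
    return surface, lemma, tags


print(parse_analysis("walks:walk<vblex><pri><p3><sg>"))
# ('walks', 'walk', ['vblex', 'pri', 'p3', 'sg'])
```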

Irregulars:

forget:forget<vblex><inf>
forget:forget<vblex><pri>
forgets:forget<vblex><pri><p3><sg>
forgot:forget<vblex><past>
forgotten:forget<vblex><pp>
forgetting:forget<vblex><ger>

Auxiliaries:

can:can<vaux><pri>
could:can<vaux><past>
can't:can<vaux><pri>+not<adv>
cannot:can<vaux><pri>+not<adv>
couldn't:can<vaux><past>+not<adv>
may:may<vaux><pri>
may:may<vaux><past>
might:might<vaux><pri>
might:might<vaux><past>
must:must<vaux><pri>
must:must<vaux><past>
ought:ought<vaux><pri>
ought:ought<vaux><past>
shall:shall<vaux><pri>
should:shall<vaux><past>
shan't:shall<vaux><pri>+not<adv>
shouldn't:shall<vaux><past>+not<adv>
will:will<vaux><pri>
would:will<vaux><past>
won't:will<vaux><pri>+not<adv>
wouldn't:will<vaux><past>+not<adv>
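
The negated forms carry two readings joined with +, for example can<vaux><pri>+not<adv>. A small sketch of taking such an analysis apart is below. `split_readings` is a hypothetical helper, and it assumes that + never occurs inside a lemma.

```python
import re


def split_readings(analysis):
    """Split 'can<vaux><pri>+not<adv>' into [(lemma, [tags]), ...]."""
    readings = []
    for part in analysis.split("+"):
        lemma = part.split("<", 1)[0]
        tags = re.findall(r"<([^<>]+)>", part)
        readings.append((lemma, tags))
    return readings


print(split_readings("can<vaux><pri>+not<adv>"))
# [('can', ['vaux', 'pri']), ('not', ['adv'])]
```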

Verb have:

have:have<vbhaver><inf>
have:have<vbhaver><pri>
has:have<vbhaver><pri><p3><sg>
had:have<vbhaver><past>
having:have<vbhaver><ger>

Verb do:

do:do<vbdo><inf>
do:do<vbdo><imp>
do:do<vbdo><pri>
does:do<vbdo><pri><p3><sg>
did:do<vbdo><past>
did:do<vbdo><pis>
doing:do<vbdo><subs>
doing:do<vbdo><pprs>
doing:do<vbdo><ger>
done:do<vbdo><pp>

Nouns (google pos: Noun)

Nouns commonly have two forms, each with a possessive: beer, beers, beer's, beers'. Some don't: ?

The tags used to classify nouns are:

  • n: regular noun, like beer
  • np: proper noun, like Jack
  • m: male
  • f: female
  • mf: both female and male
  • nt: neuter, neither female nor male
  • top: place
  • ant: human

And also:

  • cnt: countable, like chair
  • unc: uncountable, like cheese

the suffixes are:

  • sg: singular as in beer
  • pl: plural as in beers
  • gen: genitive or possessive, as in beer's

Noun examples

Regular nouns go like:

beer:beer<n><sg>
beers:beer<n><pl>
beer's:beer<n><sg><gen>
beers':beer<n><pl><gen>
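
The four-line pattern above can be generated for any regular noun. This is only a sketch: `regular_noun_forms` is a made-up name, and it assumes the plural is formed by adding a bare -s, so it would give wrong forms for nouns like box or city.

```python
def regular_noun_forms(lemma):
    """Return the four surface:analysis lines for a regular noun.

    Assumption: the plural is lemma + 's' with no spelling changes.
    """
    plural = lemma + "s"
    return [
        f"{lemma}:{lemma}<n><sg>",
        f"{plural}:{lemma}<n><pl>",
        f"{lemma}'s:{lemma}<n><sg><gen>",
        f"{plural}':{lemma}<n><pl><gen>",
    ]


for entry in regular_noun_forms("beer"):
    print(entry)
```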

Proper nouns:

Aaron:Aaron<np><ant><m><sg>
Aarons:Aaron<np><ant><m><pl>
Aarons':Aaron<np><ant><m><pl><gen>
Aaron's:Aaron<np><ant><m><sg><gen>
Amsterdam:Amsterdam<np><top><sg>
Amsterdams:Amsterdam<np><top><pl>
Amsterdam's:Amsterdam<np><top><sg><gen>
Amsterdams':Amsterdam<np><top><pl><gen>

Adjectives (google pos: Adj)

Most adjectives don't inflect, like hairy, but some have three forms, like: small, smaller, smallest. The tags used for classifying are:

  • adj: for all adjectives; non-inflecting ones, like hairy, carry only this tag
  • sint: added for those with three forms, like small

The suffixes are marked with:

  • comp: for comparative, as in smaller
  • sup: for superlative, as in smallest

Adjective examples

Like so:

small:small<adj><sint>
smaller:small<adj><sint><comp>
smallest:small<adj><sint><sup>
hairy:hairy<adj>
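
As an illustration of the difference between plain adj and adj with sint, the sketch below generates the lines for one adjective. The function name and the `synthetic` flag are hypothetical, and the suffixes are attached naively, so adjectives that change spelling (big, bigger) are not handled.

```python
def adjective_forms(lemma, synthetic):
    """Return surface:analysis lines for an adjective.

    synthetic=True means the adjective has -er/-est forms (tag sint).
    Assumption: the suffixes attach without any spelling change.
    """
    if not synthetic:
        return [f"{lemma}:{lemma}<adj>"]
    return [
        f"{lemma}:{lemma}<adj><sint>",
        f"{lemma}er:{lemma}<adj><sint><comp>",
        f"{lemma}est:{lemma}<adj><sint><sup>",
    ]


print(adjective_forms("small", synthetic=True))
print(adjective_forms("hairy", synthetic=False))
```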


Adverbs (google pos: Adv)

Adverbs don't inflect. There are a couple of tags to classify them:

  • adv: for adverbs
  • itg: for interrogatives

Adverb examples

aboard:aboard<adv>
drunk:drunk<adv>
no:no<adv><neg>
where:where<adv><itg>
when:when<adv><itg>
why:why<adv><itg>

Some have other tags too.

Adadverbs (google pos: ?)

Ad-adverbs don't inflect. They are tagged:

  • preadv: for preadverbs

Adadverb examples

as:as<preadv>
more:more<preadv>
most:most<preadv>
so:so<preadv>
very:very<preadv>

Pronouns (google pos: Pron)

Pronouns are categorised with:

  • prn: for pronouns
  • pers: for personal (I, you...)
  • dem: for demonstrative (this, that...)
  • ref: for reflexives (...self)

Some pronouns inflect like nouns, some have more cases like:

  • acc: for object form (him)
  • p1, p2, p3: for persons (I, you, he,...)

Pronoun examples

anybody:anybody<prn><sg>
anyone:anyone<prn><sg>
anything:anything<prn><sg>
both:both<prn><pl>
everybody:everybody<prn><sg>
everyone:everyone<prn><sg>
everything:everything<prn><sg>
few:few<prn><pl>
he:he<prn><pers><p3><m><sg>
his:he<prn><pers><p3><m><sg><poss>
his:he<prn><pers><p3><m><sg><gen>
him:he<prn><pers><p3><m><sg><acc>
herself:herself<prn><ref><p3><f><sg>
himself:himself<prn><ref><p3><m><sg>
hisself:himself<prn><ref><p3><m><sg>
I:I<prn><pers><p1><mf><sg>
me:I<prn><pers><p1><mf><sg><acc>
my:I<prn><pers><p1><mf><sg><gen>
mine:I<prn><pers><p1><mf><sg><poss>
it:it<prn><dem><sg>
its:it<prn><dem><sg><poss>
itself:itself<prn><ref><p3><nt><sg>
myself:myself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p3><mf><sg>
one's self:oneself<prn><ref><p1><mf><sg>
one's self:oneself<prn><ref><p3><mf><sg>
ourself:ourselves<prn><ref><p1><mf><pl>
ourselves:ourselves<prn><ref><p1><mf><pl>
several:several<prn><sg>
she:she<prn><pers><p3><f><sg>
hers:she<prn><pers><p3><f><sg><poss>
her:she<prn><pers><p3><f><sg><gen>
her:she<prn><pers><p3><f><sg><acc>
something:something<prn><sg>
that:that<prn><rel>
that:that<prn><sg>
those:that<prn><pl>
theirselves:themselves<prn><ref><p3><mf><pl>
themself:themself<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><pl>
they:they<prn><pers><p3><mf><pl>
their:they<prn><pers><p3><mf><pl><gen>
theirs:they<prn><pers><p3><mf><pl><poss>
them:they<prn><pers><p3><mf><pl><acc>
this:this<prn><sg>
these:this<prn><pl>
thyself:thyself<prn><ref><p2><mf><sg>
we:we<prn><pers><p1><mf><pl>
us:we<prn><pers><p1><mf><pl><acc>
our:we<prn><pers><p1><mf><pl><gen>
ours:we<prn><pers><p1><mf><pl><poss>
which:which<prn><itg>
which:which<prn><rel>
who:who<prn><itg>
whose:who<prn><poss>
whom:who<prn><itg><acc>
you:you<prn><pers><p2><mf><sp>
yours:you<prn><pers><p2><mf><sp><poss>
your:you<prn><pers><p2><mf><sp><gen>
you:you<prn><pers><p2><mf><sp><acc>
yourself:yourself<prn><ref><p2><mf><sg>
yourselves:yourselves<prn><ref><p2><mf><pl>
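
Several surface forms in this list have more than one analysis (his, her, that, which, you). A quick way to find them is to group the lines by surface form. The sketch below uses three lines from the list; `build_index` is a name invented here.

```python
from collections import defaultdict


def build_index(lines):
    """Group analyses by surface form: {surface: [analysis, ...]}."""
    index = defaultdict(list)
    for line in lines:
        surface, _, analysis = line.partition(":")
        index[surface].append(analysis)
    return dict(index)


entries = [
    "his:he<prn><pers><p3><m><sg><poss>",
    "his:he<prn><pers><p3><m><sg><gen>",
    "him:he<prn><pers><p3><m><sg><acc>",
]
index = build_index(entries)

# Keep only the surface forms with more than one analysis.
ambiguous = {s: a for s, a in index.items() if len(a) > 1}
for surface, analyses in ambiguous.items():
    print(surface, analyses)
```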

Determiners (Det)

Determiners don't inflect. They're classified with:

  • det: determiners
  • ind: indefinite
  • def: definite
  • dem: demonstrative
  • itg: interrogative

Determiner examples

a:>:a<det><ind><sg>
an:>:a<det><ind><sg>
~a:<:a<det><ind><sg>
both:both<det><qnt>
many:many<det><qnt>
no:no<det><ind><neg>
several:several<det><dem>
that:th<det><dem><sg>
those:th<det><dem><pl>
the:the<det><def><sp>
this:th<det><dem><sg>
these:th<det><dem><pl>
which:which<det><itg><sp>
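
The entries for a and an use :>: and :<: instead of a plain colon. These look like the direction markers that lt-expand prints for entries restricted to one direction, but this page does not say so, so treat that reading as an assumption. Under that assumption, a sketch of separating the three cases is shown below. `parse_entry` and the labels "LR", "RL" and "both" are illustrative names only.

```python
def parse_entry(line):
    """Return (surface, analysis, direction) for one listing line.

    Assumption: ':>:' and ':<:' mark entries restricted to one direction;
    a plain ':' means the entry works in both directions.
    """
    for marker, direction in ((":>:", "LR"), (":<:", "RL")):
        if marker in line:
            surface, analysis = line.split(marker, 1)
            return surface, analysis, direction
    surface, _, analysis = line.partition(":")
    return surface, analysis, "both"


print(parse_entry("an:>:a<det><ind><sg>"))   # ('an', 'a<det><ind><sg>', 'LR')
print(parse_entry("~a:<:a<det><ind><sg>"))   # ('~a', 'a<det><ind><sg>', 'RL')
print(parse_entry("the:the<det><def><sp>"))  # ('the', 'the<det><def><sp>', 'both')
```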


Prepositions (Adp)

Prepositions don't inflect. They are classified with:

  • pr: preposition

Preposition examples

above:above<pr>
according to:according to<pr>
across:across<pr>
after:after<pr>
against:against<pr>
along:along<pr>
alongside:alongside<pr>
along with:along with<pr>
amid:amid<pr>
among:among<pr>
amongst:amongst<pr>
around:around<pr>
as:as<pr>
as of:as of<pr>
at:at<pr>
atop:atop<pr>
because of:because of<pr>
before:before<pr>
behind:behind<pr>
below:below<pr>
between:between<pr>
but:but<pr>
by:by<pr>
by means of:by means of<pr>
despite:despite<pr>
due to:due to<pr>
during:during<pr>
except for:except for<pr>
except:except<pr>
for:for<pr>
from:from<pr>
in contrast to:in contrast to<pr>
in front of:in front of<pr>
in:in<pr>
in order to:in order to<pr>
inside:inside<pr>
into:into<pr>
near:near<pr>
off:off<pr>
of:of<pr>
on:on<pr>
onto:onto<pr>
out:out<pr>
out of:out of<pr>
outside:outside<pr>
over:over<pr>
per:per<pr>
prior to:prior to<pr>
since:since<pr>
through:through<pr>
throughout:throughout<pr>
to:to<pr>
towards:towards<pr>
under:under<pr>
until:until<pr>
up:up<pr>
upon:upon<pr>
up to:up to<pr>
via:via<pr>
within:within<pr>
with:with<pr>
without:without<pr>

Numerals (Num)

Numerals take genitive (possessive) inflections. They are classified as:

  • num: numerals (one, two)
  • ord: ordinals (first, ...)

Numeral examples

one:one<num><sg>
one's:one<num><sg><gen>
two:two<num><pl>
two's:two<num><pl><gen>
three:three<num><pl>
three's:three<num><pl><gen>
first:first<num><pl>
first's:first<num><pl><gen>
second:second<num><pl>
second's:second<num><pl><gen>
third:third<num><pl>
third's:third<num><pl><gen>

Conjunctions (Conj)

Conjunctions don't inflect. They are classified as:

  • cnjcoo: coordinating (and, or)
  • cnjsub: subordinating (that)
  • cnjadv: adverbial (after)

Conjunction examples

albeit:albeit<cnjadv>
albeit:albeit<cnjsub>
although:although<cnjadv>
and:and<cnjcoo>
an if:an if<cnjadv>
because:because<cnjadv>
because:because<cnjsub>
both:both<cnjcoo>
but:but<cnjcoo>
either:either<cnjadv>
however:however<cnjadv>
if:if<cnjadv>
if:if<cnjsub>
lest:lest<cnjadv>
neither:neither<cnjcoo>
nor:nor<cnjcoo>
or:or<cnjcoo>
since:since<cnjadv>
than:than<cnjadv>
than:than<cnjsub>
that:that<cnjsub>
then:then<cnjadv>
though:though<cnjadv>
til:til<cnjadv>
till:till<cnjadv>
unless:unless<cnjadv>
until:until<cnjadv>
unto:unto<cnjadv>
what:what<cnjsub>
whenas:whenas<cnjadv>
whence:whence<cnjadv>
when:when<cnjadv>
wherealong:wherealong<cnjadv>
whereas:whereas<cnjadv>
whereat:whereat<cnjadv>
wherefore:wherefore<cnjadv>
whereinbefore:whereinbefore<cnjadv>
wherein:wherein<cnjadv>
whereof:whereof<cnjadv>
whereout:whereout<cnjadv>
whereover:whereover<cnjadv>
wheresoever:wheresoever<cnjadv>
whether:whether<cnjadv>
which:which<cnjsub>
while:while<cnjadv>
whilst:whilst<cnjadv>


Interjections (Prt)

Interjections don't inflect. They're classified as:

  • ij: interjections

Interjection examples

argh:argh<ij>
fuck:fuck<ij>
hello:hello<ij>
hey:hey<ij>
aah:aah<ij>
aargh:aargh<ij>
agh:agh<ij>
ah:ah<ij>
aha:aha<ij>
ahem:ahem<ij>
ahh:ahh<ij>
aw:aw<ij>
aww:aww<ij>
aye:aye<ij>
bah:bah<ij>
boo:boo<ij>
brr:brr<ij>
bye:bye<ij>
crap:crap<ij>
crud:crud<ij>
damn:damn<ij>
darn:darn<ij>
d'oh:d'oh<ij>
doh:doh<ij>
eh:eh<ij>
goddamn:goddamn<ij>
grr:grr<ij>
ha:ha<ij>
hah:hah<ij>
haha:haha<ij>
heh:heh<ij>
hehe:hehe<ij>
hi:hi<ij>
hm:hm<ij>
hmm:hmm<ij>
hmph:hmph<ij>
hrm:hrm<ij>
huh:huh<ij>
like:like<ij>
lol:lol<ij>
omg:omg<ij>
ok:ok<ij>
ooh:ooh<ij>
oops:oops<ij>
ouch:ouch<ij>
oww:oww<ij>
phew:phew<ij>
shh:shh<ij>
shit:shit<ij>
sorry:sorry<ij>
thanks:thanks<ij>
ugh:ugh<ij>
uh:uh<ij>
uh-huh:uh-huh<ij>
umm:umm<ij>
welcome:welcome<ij>
well:well<ij>
what:what<ij>
whew:whew<ij>
whoa:whoa<ij>
woohoo:woohoo<ij>
yay:yay<ij>

Punctuations (Google pos: .)

These are more or less the same everywhere, apart from directionality and some orthographic variation.

':'<apos>
,:,<cm>
-:-<guio>
--:–<guio>
–:-<guio>
—:—<guio>
(:(<lpar>
[:[<lpar>
":"<lquot>
“:“<lquot>
«:«<lquot>
»:«<lquot>
):)<rpar>
]:]<rpar>
":"<rquot>
”:”<rquot>
»:»<rquot>
_:_<sent>
:::<sent>
;:;<sent>
!:!<sent>
?:?<sent>
.:.<sent>
#:#<sent>
%:%<sent>