User:TommiPirinen/English tagset


RFC for English tags, eh?

Verbs (google pos: Verb)

Regular English verbs have four forms: _accept_, _accepts_, _accepted_, _accepting_. Some irregular verbs have five: _forget_, _forgets_, _forgot_, _forgotten_, _forgetting_. The verb _be_ has eight: _be_, _am_, _are_, _is_, _was_, _were_, _been_, _being_.

The tags we are using to classify English verbs are:

 * vblex: lexical (main) verbs, regular or irregular
 * vaux: auxiliary verbs, which take a verb complement
 * vbser: verb _be_
 * vbdo: verb _do_
 * vbhaver: verb _have_

The endings that follow (or their absence) are classified with:

 * inf: infinitive (as in: to _do_, to _walk_)
 * imp: imperative (as in: _Walk_!)
 * pri: present indicative (as in: I _do_, he _walks_)
 * prs: present subjunctive (as in: Let there _be_ light ; At other times it is important that we _be_ quiet.)
 * past: common past (as in: I _did_, he _walked_)
 * pis: imperfect subjunctive (as in: If I _were_ you, ...)
 * pp: past participle (as in: I've _done_, he has _walked_)
 * pprs: present participle
 * ger: gerund
 * subs: substantive

and potentially

 * +not.adv.neg: (as in _can't_, _didn't_)

In the future, likely:

 * transitivity

The tag sequences are as follows:

Regular verbs:

walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
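
The entries above all follow the pattern `surface:lemma<tag><tag>...`. As a rough sketch (not part of any Apertium tool; the function names `parse_entry` and `readings_by_surface` are made up for illustration), such lines can be split into their parts and grouped by surface form, which makes the ambiguity of forms like _walked_ visible. It assumes the surface form itself contains no colon.

```python
import re
from collections import defaultdict

# One tag is anything between angle brackets, e.g. <vblex> or <p3>.
TAG = re.compile(r"<([^<>]+)>")


def parse_entry(line):
    """Split 'surface:lemma<tag1><tag2>' into (surface, lemma, [tags]).

    Assumption: the surface form contains no ':' character.
    """
    surface, analysis = line.split(":", 1)
    lemma = analysis.split("<", 1)[0]
    tags = TAG.findall(analysis)
    return surface, lemma, tags


def readings_by_surface(lines):
    """Group all (lemma, tags) readings under their surface form."""
    table = defaultdict(list)
    for line in lines:
        surface, lemma, tags = parse_entry(line)
        table[surface].append((lemma, tags))
    return dict(table)


# A few of the regular-verb entries listed above.
ENTRIES = [
    "walk:walk<vblex><inf>",
    "walks:walk<vblex><pri><p3><sg>",
    "walked:walk<vblex><pis>",
    "walked:walk<vblex><past>",
    "walked:walk<vblex><pp>",
]

for surface, readings in readings_by_surface(ENTRIES).items():
    print(surface, len(readings), readings)
```

With these five entries, _walked_ ends up with three readings (pis, past, pp), while _walk_ and _walks_ have one each.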

Irregulars:

forget:forget<vblex><inf>
forget:forget<vblex><pri>
forgets:forget<vblex><pri><p3><sg>
forgot:forget<vblex><past>
forgotten:forget<vblex><pp>
forgetting:forget<vblex><ger>

Auxiliaries:

can:can<vaux><pri>
could:can<vaux><past>
can't:can<vaux><pri>+not<adv>
cannot:can<vaux><pri>+not<adv>
couldn't:can<vaux><past>+not<adv>
may:may<vaux><pri>
may:may<vaux><past>
might:might<vaux><pri>
might:might<vaux><past>
must:must<vaux><pri>
must:must<vaux><past>
ought:ought<vaux><pri>
ought:ought<vaux><past>
shall:shall<vaux><pri>
should:shall<vaux><past>
shan't:shall<vaux><pri>+not<adv>
shouldn't:shall<vaux><past>+not<adv>
will:will<vaux><pri>
would:will<vaux><past>
won't:will<vaux><pri>+not<adv>
wouldn't:will<vaux><past>+not<adv>
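
The negated forms above join two analyses with `+`. A minimal sketch of taking such an analysis apart again, assuming `+` never occurs inside a lemma or a tag (the helper name `split_analysis` is hypothetical):

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def split_analysis(analysis):
    """Split 'can<vaux><pri>+not<adv>' into [(lemma, [tags]), ...].

    Assumption: '+' only ever separates joined analyses.
    """
    parts = []
    for chunk in analysis.split("+"):
        lemma = chunk.split("<", 1)[0]
        parts.append((lemma, TAG.findall(chunk)))
    return parts


print(split_analysis("can<vaux><pri>+not<adv>"))
print(split_analysis("will<vaux><past>"))
```

An analysis without `+` simply comes back as a list with one element.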

Verb have:

have:have<vbhaver><inf>
have:have<vbhaver><pri>
has:have<vbhaver><pri><p3><sg>
had:have<vbhaver><past>
having:have<vbhaver><ger>

Verb do:

do:do<vbdo><inf>
do:do<vbdo><imp>
do:do<vbdo><pri>
does:do<vbdo><pri><p3><sg>
did:do<vbdo><past>
did:do<vbdo><pis>
doing:do<vbdo><subs>
doing:do<vbdo><pprs>
doing:do<vbdo><ger>
done:do<vbdo><pp>

Nouns (google pos: Noun)

Nouns commonly have two forms, singular and plural, each with a possessive: _beer_, _beers_, _beer's_, _beers'_. Some don't: ?

The tags used to classify nouns are:

 * n: common noun
 * np: proper noun
 * m: male
 * f: female
 * mf: both female and male
 * nt: neuter (neither female nor male)
 * top: toponym (place name)
 * ant: anthroponym (name of a person)
And also:

 * cnt: countable
 * unc: uncountable

The suffixes are:

 * sg: singular
 * pl: plural
 * gen: genitive (possessive)


Regular nouns go like:

beer:beer<n><sg>
beers:beer<n><pl>
beer's:beer<n><sg><gen>
beers':beer<n><pl><gen>

Proper nouns:

Aaron:Aaron<np><ant><m><sg>
Aarons:Aaron<np><ant><m><pl>
Aarons':Aaron<np><ant><m><pl><gen>
Aaron's:Aaron<np><ant><m><sg><gen>
Amsterdam:Amsterdam<np><top><sg>
Amsterdams:Amsterdam<np><top><pl>
Amsterdam's:Amsterdam<np><top><sg><gen>
Amsterdams':Amsterdam<np><top><pl><gen>
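
The four-way noun pattern above is regular enough to generate. The following is only a sketch: `noun_paradigm` is an invented helper, and it assumes the plural is formed by adding a plain _-s_, so it will be wrong for nouns like _box_ or _child_.

```python
def noun_paradigm(lemma, tags=("n",)):
    """Return the four 'surface:lemma<tags>' lines for a plain -s noun.

    tags holds the leading tags, e.g. ("n",) or ("np", "ant", "m").
    Assumption: plural = lemma + "s"; no spelling rules are applied.
    """
    head = lemma + "".join(f"<{t}>" for t in tags)
    plural = lemma + "s"
    return [
        f"{lemma}:{head}<sg>",
        f"{plural}:{head}<pl>",
        f"{lemma}'s:{head}<sg><gen>",
        f"{plural}':{head}<pl><gen>",
    ]


for line in noun_paradigm("beer"):
    print(line)
for line in noun_paradigm("Aaron", ("np", "ant", "m")):
    print(line)
```

For _beer_ and _Aaron_ this produces the same entries as listed above, though in a slightly different order for _Aaron_.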

Adjectives (google pos: Adj)

Most adjectives do not inflect, like _hairy_, but some have three forms, like: _small_, _smaller_, _smallest_. The tags used for classifying are:

 * adj: for all adjectives; used alone for non-inflecting ones
 * sint: added for those with three forms

The suffixes are marked with:

 * comp: comparative
 * sup: superlative

Like so:

small:small<adj><sint>
smaller:small<adj><sint><comp>
smallest:small<adj><sint><sup>
hairy:hairy<adj>
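
Going by the entries above, a degree tag (comp or sup) only appears together with sint. A small consistency check along those lines could look like this; it is a sketch under that assumption, and `check_adjective` is an invented name, not an existing tool.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def check_adjective(line):
    """Return a list of problems found in one 'surface:lemma<tags>' adjective entry."""
    surface, analysis = line.split(":", 1)
    tags = TAG.findall(analysis)
    problems = []
    if not tags or tags[0] != "adj":
        problems.append("first tag is not <adj>")
    degree = [t for t in tags if t in ("comp", "sup")]
    # Assumption taken from the listing: comp/sup require sint.
    if degree and "sint" not in tags:
        problems.append("degree tag without <sint>")
    if len(degree) > 1:
        problems.append("more than one degree tag")
    return problems


print(check_adjective("smaller:small<adj><sint><comp>"))
print(check_adjective("hairier:hairy<adj><comp>"))
```

The first call reports no problems; the second, a made-up entry, is flagged because it has comp without sint.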

Adverbs (google pos: Adv)

Adverbs are adverbs. They use the tag adv:

aboard:aboard<adv>
drunk:drunk<adv>
no:no<adv><neg>
where:where<adv><itg>
when:when<adv><itg>
why:why<adv><itg>

Some carry additional tags, such as neg (negative) and itg (interrogative).

Pronouns (google pos: Pron)

There are a lot of different pronouns.

anybody:anybody<prn><sg>
anyone:anyone<prn><sg>
anything:anything<prn><sg>
both:both<prn><pl>
everybody:everybody<prn><sg>
everyone:everyone<prn><sg>
everything:everything<prn><sg>
few:few<prn><pl>
he:he<prn><pers><p3><m><sg>
his:he<prn><pers><p3><m><sg><poss>
his:he<prn><pers><p3><m><sg><gen>
him:he<prn><pers><p3><m><sg><acc>
herself:herself<prn><ref><p3><f><sg>
himself:himself<prn><ref><p3><m><sg>
hisself:himself<prn><ref><p3><m><sg>
I:I<prn><pers><p1><mf><sg>
me:I<prn><pers><p1><mf><sg><acc>
my:I<prn><pers><p1><mf><sg><gen>
mine:I<prn><pers><p1><mf><sg><poss>
it:it<prn><dem><sg>
its:it<prn><dem><sg><poss>
itself:itself<prn><ref><p3><nt><sg>
myself:myself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p3><mf><sg>
one's self:oneself<prn><ref><p1><mf><sg>
one's self:oneself<prn><ref><p3><mf><sg>
ourself:ourselves<prn><ref><p1><mf><pl>
ourselves:ourselves<prn><ref><p1><mf><pl>
several:several<prn><pl>
she:she<prn><pers><p3><f><sg>
hers:she<prn><pers><p3><f><sg><poss>
her:she<prn><pers><p3><f><sg><gen>
her:she<prn><pers><p3><f><sg><acc>
something:something<prn><sg>
that:that<prn><rel>
that:that<prn><sg>
those:that<prn><pl>
theirselves:themselves<prn><ref><p3><mf><pl>
themself:themself<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><pl>
they:they<prn><pers><p3><mf><pl>
their:they<prn><pers><p3><mf><pl><gen>
theirs:they<prn><pers><p3><mf><pl><poss>
them:they<prn><pers><p3><mf><pl><acc>
this:this<prn><sg>
these:this<prn><pl>
thyself:thyself<prn><ref><p2><mf><sg>
we:we<prn><pers><p1><mf><pl>
us:we<prn><pers><p1><mf><pl><acc>
our:we<prn><pers><p1><mf><pl><gen>
ours:we<prn><pers><p1><mf><pl><poss>
which:which<prn><itg>
which:which<prn><rel>
who:who<prn><itg>
whose:who<prn><poss>
whom:who<prn><itg><acc>
you:you<prn><pers><p2><mf><sp>
yours:you<prn><pers><p2><mf><sp><poss>
your:you<prn><pers><p2><mf><sp><gen>
you:you<prn><pers><p2><mf><sp><acc>
yourself:yourself<prn><ref><p2><mf><sg>
yourselves:yourselves<prn><ref><p2><mf><pl>
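
The pronoun tags above fall into a handful of groups (type, person, gender, number, case). As a sketch, an analysis can be decoded into named features. The grouping and the feature names below are assumptions made for illustration, as is reading `sp` as a number value; `pronoun_features` is not an existing function.

```python
import re

TAG = re.compile(r"<([^<>]+)>")

# Hypothetical grouping of the tags that occur in the pronoun list.
FEATURES = {
    "type": {"pers", "ref", "dem", "rel", "itg"},
    "person": {"p1", "p2", "p3"},
    "gender": {"m", "f", "mf", "nt"},
    "number": {"sg", "pl", "sp"},
    "case": {"acc", "gen", "poss"},
}


def pronoun_features(analysis):
    """Turn 'he<prn><pers><p3><m><sg><acc>' into a feature dictionary."""
    lemma = analysis.split("<", 1)[0]
    tags = TAG.findall(analysis)
    if not tags or tags[0] != "prn":
        raise ValueError(f"not a pronoun analysis: {analysis!r}")
    result = {"lemma": lemma}
    for tag in tags[1:]:
        for feature, values in FEATURES.items():
            if tag in values:
                result[feature] = tag
                break
        else:
            # No group claimed this tag.
            raise ValueError(f"unknown tag <{tag}> in {analysis!r}")
    return result


print(pronoun_features("he<prn><pers><p3><m><sg><acc>"))
print(pronoun_features("anybody<prn><sg>"))
```

Features that an entry does not specify, such as person for _anybody_, are simply absent from the result.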

Determiners (Det)

There are a couple of determiners:

a:>:a<det><ind><sg>
an:>:a<det><ind><sg>
~a:<:a<det><ind><sg>
both:both<det><qnt>
many:many<det><qnt>
no:no<det><ind><neg>
several:several<det><dem>
that:th<det><dem><sg>
those:th<det><dem><pl>
the:the<det><def><sp>
this:th<det><dem><sg>
these:th<det><dem><pl>
which:which<det><itg><sp>


Prepositions (Adp)

above:above<pr>
according to:according to<pr>
across:across<pr>
after:after<pr>
against:against<pr>
along:along<pr>
alongside:alongside<pr>
along with:along with<pr>
amid:amid<pr>
among:among<pr>
amongst:amongst<pr>
around:around<pr>
as:as<pr>
as of:as of<pr>
at:at<pr>
atop:atop<pr>
because of:because of<pr>
before:before<pr>
behind:behind<pr>
below:below<pr>
between:between<pr>
but:but<pr>
by:by<pr>
by means of:by means of<pr>
despite:despite<pr>
due to:due to<pr>
during:during<pr>
except for:except for<pr>
except:except<pr>
for:for<pr>
from:from<pr>
in contrast to:in contrast to<pr>
in front of:in front of<pr>
in:in<pr>
in order to:in order to<pr>
inside:inside<pr>
into:into<pr>
near:near<pr>
off:off<pr>
of:of<pr>
on:on<pr>
onto:onto<pr>
out:out<pr>
out of:out of<pr>
outside:outside<pr>
over:over<pr>
per:per<pr>
prior to:prior to<pr>
since:since<pr>
through:through<pr>
throughout:throughout<pr>
to:to<pr>
towards:towards<pr>
under:under<pr>
until:until<pr>
up:up<pr>
upon:upon<pr>
up to:up to<pr>
via:via<pr>
within:within<pr>
with:with<pr>
without:without<pr>

Numerals (Num)

There are quite a few number words:

one:one<num><sg>
one's:one<num><sg><gen>
two:two<num><pl>
two's:two<num><pl><gen>
three:three<num><pl>
three's:three<num><pl><gen>
first:first<num><pl>
first's:first<num><pl><gen>
second:second<num><pl>
second's:second<num><pl><gen>
third:third<num><pl>
third's:third<num><pl><gen>

Conjunctions (Conj)

Some classes for conjunctions:

albeit:albeit<cnjadv>
albeit:albeit<cnjsub>
although:although<cnjadv>
and:and<cnjcoo>
an if:an if<cnjadv>
because:because<cnjadv>
because:because<cnjsub>
both:both<cnjcoo>
but:but<cnjcoo>
either:either<cnjadv>
however:however<cnjadv>
if:if<cnjadv>
if:if<cnjsub>
lest:lest<cnjadv>
neither:neither<cnjcoo>
nor:nor<cnjcoo>
or:or<cnjcoo>
since:since<cnjadv>
than:than<cnjadv>
than:than<cnjsub>
that:that<cnjsub>
then:then<cnjadv>
though:though<cnjadv>
til:til<cnjadv>
till:till<cnjadv>
unless:unless<cnjadv>
until:until<cnjadv>
unto:unto<cnjadv>
what:what<cnjsub>
whenas:whenas<cnjadv>
whence:whence<cnjadv>
when:when<cnjadv>
wherealong:wherealong<cnjadv>
whereas:whereas<cnjadv>
whereat:whereat<cnjadv>
wherefore:wherefore<cnjadv>
whereinbefore:whereinbefore<cnjadv>
wherein:wherein<cnjadv>
whereof:whereof<cnjadv>
whereout:whereout<cnjadv>
whereover:whereover<cnjadv>
wheresoever:wheresoever<cnjadv>
whether:whether<cnjadv>
which:which<cnjsub>
while:while<cnjadv>
whilst:whilst<cnjadv>


Interjections (Prt)

Punctuations (Google pos: .)

These are more or less the same everywhere, apart from directionality and some orthographic variation.

':'<apos>
,:,<cm>
-:-<guio>
--:–<guio>
–:-<guio>
—:—<guio>
(:(<lpar>
[:[<lpar>
":"<lquot>
“:“<lquot>
«:«<lquot>
»:«<lquot>
):)<rpar>
]:]<rpar>
":"<rquot>
”:”<rquot>
»:»<rquot>
_:_<sent>
:::<sent>
;:;<sent>
!:!<sent>
?:?<sent>
.:.<sent>
#:#<sent>
%:%<sent>
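
One thing to watch with the punctuation entries: the colon is both the separator and, in `:::<sent>`, the symbol itself, so splitting on the first colon gives an empty surface form. A possible workaround, sketched here with an invented helper `parse_punct`, is to peel the tags off the end first and then accept only a colon that leaves something on both sides.

```python
import re

# The run of <tags> at the very end of the line.
TAGS_AT_END = re.compile(r"((?:<[^<>]+>)+)$")
TAG = re.compile(r"<([^<>]+)>")


def parse_punct(line):
    """Parse 'surface:lemma<tags>' where surface or lemma may be ':' itself."""
    match = TAGS_AT_END.search(line)
    if match is None:
        raise ValueError(f"no tags at end of {line!r}")
    body = line[:match.start()]
    tags = TAG.findall(match.group(1))
    # Candidate separators: colons with a non-empty string on each side.
    splits = [
        (body[:i], body[i + 1:])
        for i, ch in enumerate(body)
        if ch == ":" and 0 < i < len(body) - 1
    ]
    if len(splits) != 1:
        raise ValueError(f"cannot find a unique separator in {line!r}")
    surface, lemma = splits[0]
    return surface, lemma, tags


print(parse_punct(":::<sent>"))
print(parse_punct("--:–<guio>"))
```

This handles every entry in the list above, but it would give up on a longer run of colons, where more than one split is possible.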