Difference between revisions of "User:TommiPirinen/English tagset"
Revision as of 16:21, 10 August 2014
Contents
- 1 RFC for English tags
- 1.1 Verbs (google pos: Verb)
- 1.2 Nouns (google pos: Noun)
- 1.3 Adjectives (google pos: Adj)
- 1.4 Adverbs (google pos: Adv)
- 1.5 Adadverbs (google pos: ?)
- 1.6 Pronouns (google pos: Pron)
- 1.7 Rel (Google pos: ?)
- 1.8 Determiners (Det)
- 1.9 Prepositions (Adp)
- 1.10 Numerals (Num)
- 1.11 Conjunctions (Conj)
- 1.12 Interjections (Prt)
- 1.13 Punctuations (Google pos: .)
RFC for English tags
This is from apertium-fin-eng.eng.dix though I hope to get it to be like langs/apertium-eng some day...
Verbs (google pos: Verb)
Regular English verbs have four forms: accept, accepts, accepted, accepting. Some irregular verbs have five: forget, forgets, forgot, forgotten, forgetting. The verb to be has a bunch of forms: be, am, are, is, was, were, been, being.
The tags we are using to classify English verbs are:
- vblex: for ordinary lexical verbs, like accept
- vaux: auxiliary verbs, i.e. those that take a verb complement, like can
- vbser: verb be
- vbdo: verb do
- vbhaver: verb have
The morphs that come after the stem (or the lack of them) are classified with:
- inf: infinitive (as in: to do, to walk)
- pri: present indicative (as in: I do, he walks)
- prs: present subjunctive (as in: Let there be light ; At other times it is important that we be quiet.)
- past: common past (as in: I did, he walked)
- pis: imperfect subjunctive (as in: If I were you, ...)
- pp: past participle (as in: I've done, he has walked)
- imp: imperative (as in: be quiet!)
- pprs: present participle
- ger: gerund
- subs: substantive
and potentially:
- +not.adv.neg: (as in can't, didn't)
Likely in the future:
- transitivity
Verb examples
The tag sequences are as follows:
Regular verbs:
walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
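The entries in these examples all follow the pattern surface:lemma<tag><tag>..., so they can be read mechanically. Below is a minimal Python sketch of that reading; the function name `parse_entry` and the regular expression are illustrative assumptions, not part of any Apertium tool, and the sketch deliberately rejects anything more complex than a single lemma followed by tags.

```python
import re

# Assumed shape: surface ":" lemma, followed by zero or more <tag> groups.
ENTRY_RE = re.compile(
    r"^(?P<surface>[^:<]+):(?P<lemma>[^:<]+)(?P<tags>(?:<[^<>]+>)*)$"
)


def parse_entry(entry):
    """Split 'walks:walk<vblex><pri><p3><sg>' into (surface, lemma, [tags])."""
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError(f"not a simple surface:lemma<tags> entry: {entry!r}")
    # Pull the tag names out of the run of <...> groups.
    tags = re.findall(r"<([^<>]+)>", match.group("tags"))
    return match.group("surface"), match.group("lemma"), tags


for line in ["walks:walk<vblex><pri><p3><sg>", "walked:walk<vblex><pp>"]:
    print(parse_entry(line))
```

The first tag is the verb class (here vblex) and the rest are the morphological tags from the list above, in the order they appear in the entry.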
Irregulars:
forget:forget<vblex><inf>
forget:forget<vblex><pri>
forgets:forget<vblex><pri><p3><sg>
forgot:forget<vblex><past>
forgotten:forget<vblex><pp>
forgetting:forget<vblex><ger>
Auxiliaries:
can:can<vaux><pri>
could:can<vaux><past>
can't:can<vaux><pri>+not<adv>
cannot:can<vaux><pri>+not<adv>
couldn't:can<vaux><past>+not<adv>
may:may<vaux><pri>
may:may<vaux><past>
might:might<vaux><pri>
might:might<vaux><past>
must:must<vaux><pri>
must:must<vaux><past>
ought:ought<vaux><pri>
ought:ought<vaux><past>
shall:shall<vaux><pri>
should:shall<vaux><past>
shan't:shall<vaux><pri>+not<adv>
shouldn't:shall<vaux><past>+not<adv>
will:will<vaux><pri>
would:will<vaux><past>
won't:will<vaux><pri>+not<adv>
wouldn't:will<vaux><past>+not<adv>
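The negated auxiliaries above carry two lexical units on the analysis side, joined with +. A rough Python sketch of how such an entry could be taken apart follows; `split_analysis` and `is_negated` are hypothetical helper names, and the code assumes that + only ever appears as the joiner between units.

```python
def split_analysis(entry):
    """Return (surface, [(lemma, [tags]), ...]) for one example entry."""
    surface, _, analysis = entry.partition(":")
    units = []
    # Assumption: "+" separates lexical units and never occurs inside one.
    for part in analysis.split("+"):
        lemma, _, rest = part.partition("<")
        # rest looks like "vaux><pri>"; drop the final ">" and split on "><".
        tags = rest.rstrip(">").split("><") if rest else []
        units.append((lemma, tags))
    return surface, units


def is_negated(entry):
    """True if any unit on the analysis side is the adverb 'not'."""
    _, units = split_analysis(entry)
    return any(lemma == "not" and "adv" in tags for lemma, tags in units)


print(split_analysis("can't:can<vaux><pri>+not<adv>"))
print(is_negated("could:can<vaux><past>"))
```

Keeping the negation as a separate unit means that can't and cannot share one analysis, so later stages only have to deal with can plus not.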
Verb have:
have:have<vbhaver><inf>
have:have<vbhaver><pri>
has:have<vbhaver><pri><p3><sg>
had:have<vbhaver><past>
having:have<vbhaver><ger>
Verb do:
do:do<vbdo><inf>
do:do<vbdo><imp>
do:do<vbdo><pri>
does:do<vbdo><pri><p3><sg>
did:do<vbdo><past>
did:do<vbdo><pis>
doing:do<vbdo><subs>
doing:do<vbdo><pprs>
doing:do<vbdo><ger>
done:do<vbdo><pp>
Nouns (google pos: Noun)
Nouns commonly have two forms, each with a possessive: beer, beers, beer's, beers'. Some don't: ?
The tags used to classify nouns are:
- n: regular noun, like beer
- np: proper noun, like Jack
- m: male
- f: female
- mf: both female and male
- nt: neuter, neither female nor male
- top: place
- ant: human
And also:
- cnt: countable, like chair
- unc: uncountable, like cheese
The suffixes are:
- sg: singular, as in beer
- pl: plural, as in beers
- gen: genitive (possessive), as in beer's
Noun examples
Regular nouns go like:
beer:beer<n><sg>
beers:beer<n><pl>
beer's:beer<n><sg><gen>
beers':beer<n><pl><gen>
Proper nouns:
Aaron:Aaron<np><ant><m><sg>
Aarons:Aaron<np><ant><m><pl>
Aarons':Aaron<np><ant><m><pl><gen>
Aaron's:Aaron<np><ant><m><sg><gen>
Amsterdam:Amsterdam<np><top><sg>
Amsterdams:Amsterdam<np><top><pl>
Amsterdam's:Amsterdam<np><top><sg><gen>
Amsterdams':Amsterdam<np><top><pl><gen>
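The four-way noun paradigm (singular, plural, and a genitive of each) is regular enough to generate. The Python sketch below is only an illustration: `noun_paradigm` and its `plural` parameter are made up for this example, the default plural is a plain -s, and spelling changes such as box/boxes are not handled.

```python
def noun_paradigm(lemma, tags=("n",), plural=None):
    """Generate the sg, pl, sg+gen and pl+gen entries for one noun."""
    # Assumption: regular plural in -s unless an explicit plural is given.
    plural = plural or lemma + "s"
    analysis = lemma + "".join(f"<{tag}>" for tag in tags)
    # Plurals ending in -s take a bare apostrophe, everything else takes 's.
    plural_gen = plural + "'" if plural.endswith("s") else plural + "'s"
    return [
        f"{lemma}:{analysis}<sg>",
        f"{plural}:{analysis}<pl>",
        f"{lemma}'s:{analysis}<sg><gen>",
        f"{plural_gen}:{analysis}<pl><gen>",
    ]


for entry in noun_paradigm("beer"):
    print(entry)
for entry in noun_paradigm("Aaron", tags=("np", "ant", "m")):
    print(entry)
```

Passing the class and gender tags in as a tuple keeps regular and proper nouns on the same code path; only the tags between the lemma and the number tag differ.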
Adjectives (google pos: Adj)
Adjectives mostly don't inflect, like expensive, but some have three forms, like: small, smaller, smallest. The tags used for classifying are:
- adj: for non-inflecting ones, like expensive
- sint: for those with three forms, like small
- pos: ???
- ord: ???
The suffixes are marked with:
- comp: for comparative, as in smaller
- sup: for superlative, as in smallest
Adjectives that don't normally take a comparative should still allow one, marked with a 'sub' marker, e.g. "expensiver", "expensivest".
Adjective examples
Like so:
small:small<adj><sint>
smaller:small<adj><sint><comp>
smallest:small<adj><sint><sup>
expensive:expensive<adj>
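To make the adj versus sint distinction concrete, here is a small Python sketch that produces the entries for both kinds of adjective. The function `adjective_entries` and its `synthetic` flag are invented for this illustration, and the endings are added by plain concatenation, so big/bigger or happy/happier would need extra spelling rules.

```python
def adjective_entries(lemma, synthetic=False):
    """Return the example-style entries for one adjective."""
    if not synthetic:
        # Non-inflecting adjectives get a single entry tagged <adj>.
        return [f"{lemma}:{lemma}<adj>"]
    base = f"{lemma}<adj><sint>"
    # Assumption: -er and -est attach without any spelling change.
    return [
        f"{lemma}:{base}",
        f"{lemma}er:{base}<comp>",
        f"{lemma}est:{base}<sup>",
    ]


print(adjective_entries("small", synthetic=True))
print(adjective_entries("expensive"))
```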
Adverbs (google pos: Adv)
Adverbs don't inflect. There are a couple of tags to classify them:
- adv: for adverbs
- itg: for interrogatives
Adverb examples
aboard:aboard<adv>
drunk:drunk<adv>
no:no<adv><neg>
where:where<adv><itg>
when:when<adv><itg>
why:why<adv><itg>
Some have other tags too.
Adadverbs (google pos: ?)
Ad-adverbs don't inflect. They are tagged:
- preadv: for preadverbs
Adadverb examples
as:as<preadv>
more:more<preadv>
most:most<preadv>
so:so<preadv>
very:very<preadv>
Pronouns (google pos: Pron)
Pronouns are categorised with:
- prn: for pronouns
- pers: for personal (I, you...)
- dem: for demonstrative (this, that...)
- ref: for reflexives (...self)
Some pronouns inflect like nouns; others take additional tags, such as:
- acc: for object form (him)
- p1, p2, p3: for persons (I, you, he,...)
Pronoun examples
anybody:anybody<prn><sg>
anyone:anyone<prn><sg>
anything:anything<prn><sg>
both:both<prn><pl>
everybody:everybody<prn><sg>
everyone:everyone<prn><sg>
everything:everything<prn><sg>
few:few<prn><pl>
he:he<prn><pers><p3><m><sg>
his:he<prn><pers><p3><m><sg><poss>
his:he<prn><pers><p3><m><sg><gen>
him:he<prn><pers><p3><m><sg><acc>
herself:herself<prn><ref><p3><f><sg>
himself:himself<prn><ref><p3><m><sg>
hisself:himself<prn><ref><p3><m><sg>
I:I<prn><pers><p1><mf><sg>
me:I<prn><pers><p1><mf><sg><acc>
my:I<prn><pers><p1><mf><sg><gen>
mine:I<prn><pers><p1><mf><sg><poss>
it:it<prn><dem><sg>
its:it<prn><dem><sg><poss>
itself:itself<prn><ref><p3><nt><sg>
myself:myself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p3><mf><sg>
one's self:oneself<prn><ref><p1><mf><sg>
one's self:oneself<prn><ref><p3><mf><sg>
ourself:ourselves<prn><ref><p1><mf><pl>
ourselves:ourselves<prn><ref><p1><mf><pl>
several:several<prn><sg>
she:she<prn><pers><p3><f><sg>
hers:she<prn><pers><p3><f><sg><poss>
her:she<prn><pers><p3><f><sg><gen>
her:she<prn><pers><p3><f><sg><acc>
something:something<prn><sg>
that:that<prn><rel>
that:that<prn><sg>
those:that<prn><pl>
theirselves:themselves<prn><ref><p3><mf><pl>
themself:themself<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><pl>
they:they<prn><pers><p3><mf><pl>
their:they<prn><pers><p3><mf><pl><gen>
theirs:they<prn><pers><p3><mf><pl><poss>
them:they<prn><pers><p3><mf><pl><acc>
this:this<prn><sg>
these:this<prn><pl>
thyself:thyself<prn><ref><p2><mf><sg>
we:we<prn><pers><p1><mf><pl>
us:we<prn><pers><p1><mf><pl><acc>
our:we<prn><pers><p1><mf><pl><gen>
ours:we<prn><pers><p1><mf><pl><poss>
which:which<prn><itg>
which:which<prn><rel>
who:who<prn><itg>
whose:who<prn><poss>
whom:who<prn><itg><acc>
you:you<prn><pers><p2><mf><sp>
yours:you<prn><pers><p2><mf><sp><poss>
your:you<prn><pers><p2><mf><sp><gen>
you:you<prn><pers><p2><mf><sp><acc>
yourself:yourself<prn><ref><p2><mf><sg>
yourselves:yourselves<prn><ref><p2><mf><pl>
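Several surface forms in the pronoun list have more than one analysis (his, that, which, you). One way to see this is to index the entries by surface form, as in the Python sketch below. The helper name `index_by_surface` is hypothetical, and only a handful of entries from the list are included as sample data.

```python
from collections import defaultdict

# A few entries copied from the pronoun examples.
ENTRIES = [
    "his:he<prn><pers><p3><m><sg><poss>",
    "his:he<prn><pers><p3><m><sg><gen>",
    "him:he<prn><pers><p3><m><sg><acc>",
    "that:that<prn><rel>",
    "that:that<prn><sg>",
    "one's self:oneself<prn><ref><p3><mf><sg>",
]


def index_by_surface(entries):
    """Map each surface form to the list of analyses it can receive."""
    index = defaultdict(list)
    for entry in entries:
        # Split at the first colon only, so multiword surfaces stay intact.
        surface, _, analysis = entry.partition(":")
        index[surface].append(analysis)
    return dict(index)


index = index_by_surface(ENTRIES)
for surface, analyses in index.items():
    if len(analyses) > 1:
        print(surface, "is ambiguous:", analyses)
```

Surface forms with more than one analysis are the ones a tagger has to disambiguate later, so a listing like this gives a quick idea of how much ambiguity the tagset introduces.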
Rel (Google pos: ?)
Relatives are those words (normally pronouns or adverbs) which can introduce a relative clause.
Relative examples
that:that<rel><an><mf><sp>
that:that<rel><an><mf><sp>
which:which<rel><an><mf><sp>
who:who<rel><an><mf><sp>
where:where<rel><adv>
when:when<rel><adv>
why:why<rel><adv>
Determiners (Det)
Determiners mostly don't inflect ('this' and 'that' inflect for number). They're classified with:
- det: determiners
- ind: indefinite
- def: definite
- dem: demonstrative
- itg: interrogative
- qnt: quantifier
Determiner examples
a:>:a<det><ind><sg>
an:>:a<det><ind><sg>
~a:<:a<det><ind><sg>
both:both<det><qnt>
many:many<det><qnt>
no:no<det><ind><neg>
several:several<det><dem>
that:that<det><dem><sg>
those:that<det><dem><pl>
the:the<det><def><sp>
this:this<det><dem><sg>
these:this<det><dem><pl>
which:which<det><itg><sp>
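The entries for a, an and ~a have an extra :>: or :<: between surface form and analysis. The sketch below assumes these are direction restrictions of the kind lt-expand prints, with :>: meaning analysis only and :<: meaning generation only; that reading is an assumption here, not something the page states. The names `parse_directed` and `DIRECTION_LABELS` are illustrative.

```python
import re

# Assumed meaning of the markers (not stated in the text above).
DIRECTION_LABELS = {">": "analysis only", "<": "generation only", None: "both"}

# surface, then ":>:" or ":<:", then the analysis.
DIRECTED_RE = re.compile(r"^(?P<surface>.+?):(?P<direction>[<>]):(?P<analysis>.+)$")


def parse_directed(entry):
    """Return (surface, direction, analysis); direction is '>', '<' or None."""
    match = DIRECTED_RE.match(entry)
    if match:
        return match.group("surface"), match.group("direction"), match.group("analysis")
    # No marker: treat as a plain surface:analysis entry.
    surface, _, analysis = entry.partition(":")
    return surface, None, analysis


for entry in ["an:>:a<det><ind><sg>", "~a:<:a<det><ind><sg>", "both:both<det><qnt>"]:
    surface, direction, analysis = parse_directed(entry)
    print(surface, "->", analysis, "(" + DIRECTION_LABELS[direction] + ")")
```

Under that assumption, both a and an are analysed as the lemma a, while generation produces ~a and leaves the choice between a and an to a later step.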
Prepositions (Adp)
Prepositions don't inflect. They are classified with:
- pr: preposition
Preposition examples
above:above<pr>
according to:according to<pr>
across:across<pr>
after:after<pr>
against:against<pr>
along:along<pr>
alongside:alongside<pr>
along with:along with<pr>
amid:amid<pr>
among:among<pr>
amongst:amongst<pr>
around:around<pr>
as:as<pr>
as of:as of<pr>
at:at<pr>
atop:atop<pr>
because of:because of<pr>
before:before<pr>
behind:behind<pr>
below:below<pr>
between:between<pr>
but:but<pr>
by:by<pr>
by means of:by means of<pr>
despite:despite<pr>
due to:due to<pr>
during:during<pr>
except for:except for<pr>
except:except<pr>
for:for<pr>
from:from<pr>
in contrast to:in contrast to<pr>
in front of:in front of<pr>
in:in<pr>
in order to:in order to<pr>
inside:inside<pr>
into:into<pr>
near:near<pr>
off:off<pr>
of:of<pr>
on:on<pr>
onto:onto<pr>
out:out<pr>
out of:out of<pr>
outside:outside<pr>
over:over<pr>
per:per<pr>
prior to:prior to<pr>
since:since<pr>
through:through<pr>
throughout:throughout<pr>
to:to<pr>
towards:towards<pr>
under:under<pr>
until:until<pr>
up:up<pr>
upon:upon<pr>
up to:up to<pr>
via:via<pr>
within:within<pr>
with:with<pr>
without:without<pr>
Numerals (Num)
Numerals take genitive (possessive) inflections. They are classified as:
- num: numerals (one, two)
- ord: ordinals (first, ...)
Numeral examples
one:one<num><sg>
one's:one<num><sg><gen>
two:two<num><pl>
two's:two<num><pl><gen>
three:three<num><pl>
three's:three<num><pl><gen>
first:first<num><pl>
first's:first<num><pl><gen>
second:second<num><pl>
second's:second<num><pl><gen>
third:third<num><pl>
third's:third<num><pl><gen>
Conjunctions (Conj)
Conjunctions don't inflect. They are classified as:
- cnjcoo: coordinating (and, or)
- cnjsub: subordinating (that)
- cnjadv: adverbial (after)
Conjunction examples
albeit:albeit<cnjadv>
albeit:albeit<cnjsub>
although:although<cnjadv>
and:and<cnjcoo>
an if:an if<cnjadv>
because:because<cnjadv>
because:because<cnjsub>
both:both<cnjcoo>
but:but<cnjcoo>
either:either<cnjadv>
however:however<cnjadv>
if:if<cnjadv>
if:if<cnjsub>
lest:lest<cnjadv>
neither:neither<cnjcoo>
nor:nor<cnjcoo>
or:or<cnjcoo>
since:since<cnjadv>
than:than<cnjadv>
than:than<cnjsub>
that:that<cnjsub>
then:then<cnjadv>
though:though<cnjadv>
til:til<cnjadv>
till:till<cnjadv>
unless:unless<cnjadv>
until:until<cnjadv>
unto:unto<cnjadv>
what:what<cnjsub>
whenas:whenas<cnjadv>
whence:whence<cnjadv>
when:when<cnjadv>
wherealong:wherealong<cnjadv>
whereas:whereas<cnjadv>
whereat:whereat<cnjadv>
wherefore:wherefore<cnjadv>
whereinbefore:whereinbefore<cnjadv>
wherein:wherein<cnjadv>
whereof:whereof<cnjadv>
whereout:whereout<cnjadv>
whereover:whereover<cnjadv>
wheresoever:wheresoever<cnjadv>
whether:whether<cnjadv>
which:which<cnjsub>
while:while<cnjadv>
whilst:whilst<cnjadv>
Interjections (Prt)
Interjections don't inflect. They're classified as:
- ij: interjections
Interjection examples
argh:argh<ij>
fuck:fuck<ij>
hello:hello<ij>
hey:hey<ij>
aah:aah<ij>
aargh:aargh<ij>
agh:agh<ij>
ah:ah<ij>
aha:aha<ij>
ahem:ahem<ij>
ahh:ahh<ij>
aw:aw<ij>
aww:aww<ij>
aye:aye<ij>
bah:bah<ij>
boo:boo<ij>
brr:brr<ij>
bye:bye<ij>
crap:crap<ij>
crud:crud<ij>
damn:damn<ij>
darn:darn<ij>
d'oh:d'oh<ij>
doh:doh<ij>
eh:eh<ij>
goddamn:goddamn<ij>
grr:grr<ij>
ha:ha<ij>
hah:hah<ij>
haha:haha<ij>
heh:heh<ij>
hehe:hehe<ij>
hi:hi<ij>
hm:hm<ij>
hmm:hmm<ij>
hmph:hmph<ij>
hrm:hrm<ij>
huh:huh<ij>
like:like<ij>
lol:lol<ij>
omg:omg<ij>
ok:ok<ij>
ooh:ooh<ij>
oops:oops<ij>
ouch:ouch<ij>
oww:oww<ij>
phew:phew<ij>
shh:shh<ij>
shit:shit<ij>
sorry:sorry<ij>
thanks:thanks<ij>
ugh:ugh<ij>
uh:uh<ij>
uh-huh:uh-huh<ij>
umm:umm<ij>
welcome:welcome<ij>
well:well<ij>
what:what<ij>
whew:whew<ij>
whoa:whoa<ij>
woohoo:woohoo<ij>
yay:yay<ij>
Punctuations (Google pos: .)
These are more or less the same everywhere, apart from directionality and some orthographic variation.
':'<apos>
,:,<cm>
-:-<guio>
--:–<guio>
–:-<guio>
—:—<guio>
(:(<lpar>
[:[<lpar>
":"<lquot>
“:“<lquot>
«:«<lquot>
»:«<lquot>
):)<rpar>
]:]<rpar>
":"<rquot>
”:”<rquot>
»:»<rquot>
(:(<lpar>
):)<rpar>
_:_<sent>
:::<sent>
;:;<sent>
!:!<sent>
?:?<sent>
.:.<sent>
#:#<sent>
%:%<sent>
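Punctuation entries are awkward to read mechanically because the colon is both the separator and one of the symbols (see :::<sent>). The Python sketch below shows one way around that: treat a colon as the separator only if there is text on both sides of it. The function name `parse_punct` is hypothetical, and the code assumes each punctuation entry carries exactly one tag.

```python
def parse_punct(entry):
    """Split a punctuation entry into (surface, lemma, tag)."""
    # Assumption: exactly one <tag> at the end of the entry.
    head, opener, tag = entry.rpartition("<")
    if not opener or not tag.endswith(">"):
        raise ValueError(f"no trailing <tag> in {entry!r}")
    # A separator colon must have at least one character on each side.
    candidates = [
        i for i, char in enumerate(head)
        if char == ":" and 0 < i < len(head) - 1
    ]
    if len(candidates) != 1:
        raise ValueError(f"cannot find a unique separator in {entry!r}")
    split_at = candidates[0]
    return head[:split_at], head[split_at + 1:], tag[:-1]


for entry in [":::<sent>", "--:–<guio>", ",:,<cm>"]:
    print(parse_punct(entry))
```

For :::<sent> only the middle colon has text on both sides, so the entry comes out as surface ":" and lemma ":" with the tag sent.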