User:TommiPirinen/English tagset
Revision as of 00:16, 11 August 2014
Contents
- 1 RFC for English tags
- 1.1 Verbs (google pos: Verb)
- 1.2 Nouns (google pos: Noun)
- 1.3 Adjectives (google pos: Adj)
- 1.4 Adverbs (google pos: Adv)
- 1.5 Adadverbs (google pos: ?)
- 1.6 Pronouns (google pos: Pron)
- 1.7 Relatives (Google pos: ?)
- 1.8 Determiners (Det)
- 1.9 Prepositions (Adp)
- 1.10 Numerals (Num)
- 1.11 Conjunctions (Conj)
- 1.12 Interjections (Prt)
- 1.13 Punctuations (Google pos: .)
RFC for English tags
These tags come from apertium-fin-eng.eng.dix, though I hope to bring them in line with langs/apertium-eng some day.
Verbs (google pos: Verb)
Regular English verbs inflect in these forms: accept, accepts, accepted, accepting. Some irregular verbs have as many as five: forget, forgets, forgot, forgotten, forgetting. The verb to be has a bunch of forms: be, am, are, is, was, were, been, being.
The tags we are using to classify English verbs are:
- vblex: for regular verbs, like accept
- vaux: auxiliary verbs, which take a verb complement, like can
- vbser: verb be
- vbdo: verb do
- vbhaver: verb have
The inflectional tags that follow the class tag (marking a suffix or the lack of one) are:
- inf: infinitive (as in: to do, to walk)
- pri: present indicative (as in: I do, he walks)
- prs: present subjunctive (as in: Let there be light; At other times it is important that we be quiet.)
- past: common past (as in: I did, he walked)
- pis: imperfect subjunctive (as in: If I were you, ...)
- pp: past participle (as in: I've done, he has walked)
- imp: imperative (as in: be quiet!)
- pprs: present participle
- ger: gerund
- subs: substantive
and potentially
- +not.adv.neg: (as in can't, didn't)
In the future, likely also:
- transitivity
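As a rough illustration of how the verb tags above fit together, the following Python sketch checks that an analysis starts with one of the listed verb classes, then one of the form tags, then optional person/number tags. The helper names (`split_analysis`, `check_verb`) are hypothetical, and the ordering rule is an assumption read off the examples, not a rule stated in the dix; joined analyses such as `+not<adv>` are out of scope.

```python
from __future__ import annotations

import re

# Tag inventories copied from the lists above.
VERB_CLASSES = {"vblex", "vaux", "vbser", "vbdo", "vbhaver"}
VERB_FORMS = {"inf", "pri", "prs", "past", "pis", "pp", "imp", "pprs", "ger", "subs"}
# Person/number tags appear in the examples (walks, am, is) but are not listed above.
AGREEMENT = {"p1", "p2", "p3", "sg", "pl"}

TAG_RE = re.compile(r"<([^<>]+)>")


def split_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split 'walk<vblex><pri><p3><sg>' into ('walk', ['vblex', 'pri', 'p3', 'sg'])."""
    lemma, sep, rest = analysis.partition("<")
    return lemma, TAG_RE.findall(sep + rest)


def check_verb(analysis: str) -> list[str]:
    """Return the problems found in one verb analysis (an empty list if none)."""
    lemma, tags = split_analysis(analysis)
    if not tags or tags[0] not in VERB_CLASSES:
        return [f"{lemma}: first tag should be a verb class"]
    problems = []
    if len(tags) < 2 or tags[1] not in VERB_FORMS:
        problems.append(f"{lemma}: second tag should be a form tag")
    # Anything after the form tag is expected to be person or number.
    for tag in tags[2:]:
        if tag not in AGREEMENT:
            problems.append(f"{lemma}: unexpected tag <{tag}>")
    return problems


print(check_verb("walk<vblex><pri><p3><sg>"))  # []
print(check_verb("walk<vblex><p3>"))  # ['walk: second tag should be a form tag']
```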
Verb examples
The tag sequences are as follows:
Regular verbs:
walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
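The example lines can be read as a small two-way lookup table. A minimal Python sketch, assuming one `surface:analysis` pair per line (the function name `build_tables` is hypothetical), builds both directions and shows that a surface form like walked maps to several analyses:

```python
from collections import defaultdict

# The regular-verb example entries, one "surface:analysis" pair per line.
ENTRIES = """\
walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
"""


def build_tables(text: str):
    """Return (analyser, generator): surface -> analyses and analysis -> surfaces."""
    analyser = defaultdict(list)
    generator = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first colon only; the analysis itself contains none here.
        surface, analysis = line.split(":", 1)
        analyser[surface].append(analysis)
        generator[analysis].append(surface)
    return analyser, generator


analyser, generator = build_tables(ENTRIES)
print(analyser["walked"])  # ['walk<vblex><pis>', 'walk<vblex><past>', 'walk<vblex><pp>']
print(generator["walk<vblex><pri><p3><sg>"])  # ['walks']
```

The ambiguity of walked and walking is exactly what a later disambiguation step has to resolve.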
Irregulars:
forget:forget<vblex><inf>
forget:forget<vblex><pri>
forgets:forget<vblex><pri><p3><sg>
forgot:forget<vblex><past>
forgotten:forget<vblex><pp>
forgetting:forget<vblex><ger>
Auxiliaries (closed class, all examples):
can:can<vaux><pri>
can't:can<vaux><pri>+not<adv>
cannot:can<vaux><pri>+not<adv>
could:could<vaux><pri>
couldn't:could<vaux><pri>+not<adv>
may:may<vaux><inf>
may:may<vaux><pri>
may:may<vaux><past>
might:might<vaux><inf>
might:might<vaux><pri>
might:might<vaux><past>
must:must<vaux><inf>
must:must<vaux><pri>
must:must<vaux><past>
ought:ought<vaux><inf>
ought:ought<vaux><pri>
ought:ought<vaux><past>
shall:shall<vaux><pri>
shan't:shall<vaux><pri>+not<adv>
should:should<vaux><pri>
shouldn't:should<vaux><pri>+not<adv>
will:will<vaux><pri>
would:will<vaux><past>
won't:will<vaux><pri>+not<adv>
wouldn't:will<vaux><past>+not<adv>
would:would<vaux><pri>
wouldn't:would<vaux><pri>+not<adv>
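Contracted auxiliaries join two analyses with `+`. A minimal sketch of splitting such an analysis into lemma/tag pairs, assuming `+` never occurs inside a lemma (`split_joined` is a hypothetical helper):

```python
from __future__ import annotations

import re

TAG_RE = re.compile(r"<([^<>]+)>")


def split_joined(analysis: str) -> list[tuple[str, list[str]]]:
    """Split 'can<vaux><pri>+not<adv>' into [('can', ['vaux', 'pri']), ('not', ['adv'])]."""
    parts = []
    for chunk in analysis.split("+"):
        lemma = chunk.partition("<")[0]  # text before the first tag
        parts.append((lemma, TAG_RE.findall(chunk)))
    return parts


for entry in ["can't:can<vaux><pri>+not<adv>", "wouldn't:will<vaux><past>+not<adv>"]:
    surface, analysis = entry.split(":", 1)
    print(surface, split_joined(analysis))
# can't [('can', ['vaux', 'pri']), ('not', ['adv'])]
# wouldn't [('will', ['vaux', 'past']), ('not', ['adv'])]
```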
Verb be:
be:be<vbser><inf>
are:be<vbser><pri>
am:be<vbser><pri><p1><sg>
is:be<vbser><pri><p3><sg>
was:be<vbser><past><p1><sg>
was:be<vbser><past><p3><sg>
were:be<vbser><past>
been:be<vbser><pp>
being:be<vbser><ger>
Verb have:
have:have<vbhaver><inf>
have:have<vbhaver><pri>
has:have<vbhaver><pri><p3><sg>
had:have<vbhaver><past>
had:have<vbhaver><pp>
having:have<vbhaver><ger>
Verb do:
do:do<vbdo><inf>
do:do<vbdo><imp>
do:do<vbdo><pri>
does:do<vbdo><pri><p3><sg>
did:do<vbdo><past>
did:do<vbdo><pis>
doing:do<vbdo><subs>
doing:do<vbdo><pprs>
doing:do<vbdo><ger>
done:do<vbdo><pp>
Nouns (google pos: Noun)
Nouns commonly have two forms, plus a possessive of each: beer, beers, beer's, beers'. Some don't: ?
The tags used to classify nouns are:
- n: regular noun, like beer
- np: proper noun, like Jack
- m: male
- f: female
- mf: both female and male
- nt: neuter, neither female nor male
- top: place
- ant: human
And also:
- cnt: countable, like chair
- unc: uncountable, like cheese
The suffixes are:
- sg: singular as in beer
- pl: plural as in beers
- gen: genitive (possessive), as in beer's
Noun examples
Regular nouns go like:
beer:beer<n><sg>
beers:beer<n><pl>
beer's:beer<n><sg><gen>
beers':beer<n><pl><gen>
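The four regular forms can be generated mechanically. This is a deliberately naive Python sketch (`regular_noun_forms` is a hypothetical name) that assumes the plural is just lemma + s; -es plurals (box, boxes) and irregulars (child, children) would need their own paradigms:

```python
from __future__ import annotations


def regular_noun_forms(lemma: str) -> dict[str, str]:
    """Map each analysis of a regular noun to its surface form.

    Assumption: plural = lemma + "s", singular genitive adds "'s",
    plural genitive adds a bare apostrophe.
    """
    plural = lemma + "s"
    return {
        f"{lemma}<n><sg>": lemma,
        f"{lemma}<n><pl>": plural,
        f"{lemma}<n><sg><gen>": lemma + "'s",
        f"{lemma}<n><pl><gen>": plural + "'",
    }


for analysis, surface in regular_noun_forms("beer").items():
    print(f"{surface}:{analysis}")
```

Printed in the same surface:analysis format, the output reproduces the four beer entries above.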
Proper nouns:
Aaron:Aaron<np><ant><m><sg>
Aarons:Aaron<np><ant><m><pl>
Aarons':Aaron<np><ant><m><pl><gen>
Aaron's:Aaron<np><ant><m><sg><gen>
Amsterdam:Amsterdam<np><top><sg>
Amsterdams:Amsterdam<np><top><pl>
Amsterdam's:Amsterdam<np><top><sg><gen>
Amsterdams':Amsterdam<np><top><pl><gen>
Adjectives (google pos: Adj)
Most adjectives don't inflect, like expensive, but some have three forms, like: small, smaller, smallest. The tags used for classifying are:
- adj: for non-inflecting ones, like expensive
- sint: for those with three forms, like small
- pst: positive isn't tagged? (not pos, that's for possessives)
- ord: ordinals as adjectives?
The suffixes are marked with:
- comp: for comparative, like in smaller
- sup: for superlative, like in smallest
Adjectives that don't normally take a comparative should still allow comparative and superlative forms, marked with a 'sub' marker, e.g. "expensiver", "expensivest".
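A naive sketch of how such entries might be produced, assuming plain -er/-est suffixation (a final e absorbs the suffix's e) and, following the example entries, the `:>:` marker on the non-standard forms. The function names are hypothetical; consonant doubling (big, bigger) and y to i (happy, happier) are not handled:

```python
from __future__ import annotations


def comparative_forms(lemma: str) -> tuple[str, str]:
    """Naively build (comparative, superlative) by suffixation."""
    if lemma.endswith("e"):
        return lemma + "r", lemma + "st"
    return lemma + "er", lemma + "est"


def adjective_entries(lemma: str, takes_comparison: bool) -> list[str]:
    """Return surface:analysis lines for an adjective."""
    comp, sup = comparative_forms(lemma)
    if takes_comparison:
        return [
            f"{lemma}:{lemma}<adj><sint>",
            f"{comp}:{lemma}<adj><sint><comp>",
            f"{sup}:{lemma}<adj><sint><sup>",
        ]
    # Non-standard forms get the ':>:' marker, as in the example entries.
    return [
        f"{lemma}:{lemma}<adj>",
        f"{comp}:>:{lemma}<adj><sint><comp>",
        f"{sup}:>:{lemma}<adj><sint><sup>",
    ]


print("\n".join(adjective_entries("small", True) + adjective_entries("expensive", False)))
```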
Adjective examples
Like so:
small:small<adj><sint>
smaller:small<adj><sint><comp>
smallest:small<adj><sint><sup>
expensive:expensive<adj>
expensiver:>:expensive<adj><sint><comp>
expensivest:>:expensive<adj><sint><sup>
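Some entries carry a direction marker between the surface form and the analysis. Assuming the convention seen in the output of Apertium's lt-expand, where `:>:` marks an entry used only for analysis and `:<:` one used only for generation, a parser could look like this (`Entry` and `parse_entry` are hypothetical names):

```python
from typing import NamedTuple


class Entry(NamedTuple):
    surface: str
    analysis: str
    direction: str  # "both", "analysis-only" or "generation-only"


# Assumed meaning of the markers (lt-expand style output).
MARKERS = {":>:": "analysis-only", ":<:": "generation-only"}


def parse_entry(line: str) -> Entry:
    """Parse 'surface:analysis', 'surface:>:analysis' or 'surface:<:analysis'."""
    for marker, direction in MARKERS.items():
        if marker in line:
            surface, analysis = line.split(marker, 1)
            return Entry(surface, analysis, direction)
    surface, analysis = line.split(":", 1)
    return Entry(surface, analysis, "both")


lines = ["small:small<adj><sint>", "expensiver:>:expensive<adj><sint><comp>"]
entries = [parse_entry(line) for line in lines]
# Only entries not restricted to analysis are candidates for generation.
print([e.surface for e in entries if e.direction != "analysis-only"])  # ['small']
```

Under this reading, expensiver is recognised in input but never produced as output.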
Adverbs (google pos: Adv)
Adverbs don't inflect. There are a couple of tags to classify them:
- adv: for adverbs
- itg: for interrogatives
Adverb examples
abaxially:abaxially<adv>
ably:ably<adv>
abnormally:abnormally<adv>
abominably:abominably<adv>
abortively:abortively<adv>
abruptly:abruptly<adv>
absently:absently<adv>
absentmindedly:absentmindedly<adv>
absolutely:absolutely<adv>
abstemiously:abstemiously<adv>
...
aboard:aboard<adv>
drunk:drunk<adv>
no:no<adv><neg>
where:where<adv><itg>
when:when<adv><itg>
why:why<adv><itg>
Adadverbs (google pos: ?)
Adadverbs don't inflect. They are tagged with:
- preadv: for preadverbs
Adadverb examples
as:as<preadv>
more:more<preadv>
most:most<preadv>
so:so<preadv>
very:very<preadv>
Pronouns (google pos: Pron)
Pronouns are categorised with:
- prn: for pronouns
- pers: for personal (I, you...)
- dem: for demonstrative (this, that...)
- ref: for reflexives (...self)
Some pronouns inflect like nouns; others also mark case and person, with:
- acc: for object form (him)
- p1, p2, p3: for persons (I, you, he,...)
Pronoun examples
anybody:anybody<prn><sg>
anyone:anyone<prn><sg>
anything:anything<prn><sg>
both:both<prn><pl>
everybody:everybody<prn><sg>
everyone:everyone<prn><sg>
everything:everything<prn><sg>
few:few<prn><pl>
he:he<prn><pers><p3><m><sg>
his:he<prn><pers><p3><m><sg><poss>
his:he<prn><pers><p3><m><sg><gen>
him:he<prn><pers><p3><m><sg><acc>
herself:herself<prn><ref><p3><f><sg>
himself:himself<prn><ref><p3><m><sg>
hisself:himself<prn><ref><p3><m><sg>
I:I<prn><pers><p1><mf><sg>
me:I<prn><pers><p1><mf><sg><acc>
my:I<prn><pers><p1><mf><sg><gen>
mine:I<prn><pers><p1><mf><sg><poss>
it:it<prn><dem><sg>
its:it<prn><dem><sg><poss>
itself:itself<prn><ref><p3><nt><sg>
myself:myself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p3><mf><sg>
one's self:oneself<prn><ref><p1><mf><sg>
one's self:oneself<prn><ref><p3><mf><sg>
ourself:ourselves<prn><ref><p1><mf><pl>
ourselves:ourselves<prn><ref><p1><mf><pl>
several:several<prn><pl>
she:she<prn><pers><p3><f><sg>
hers:she<prn><pers><p3><f><sg><poss>
her:she<prn><pers><p3><f><sg><gen>
her:she<prn><pers><p3><f><sg><acc>
something:something<prn><sg>
that:that<prn><rel>
that:that<prn><sg>
those:that<prn><pl>
theirselves:themselves<prn><ref><p3><mf><pl>
themself:themself<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><pl>
they:they<prn><pers><p3><mf><pl>
their:they<prn><pers><p3><mf><pl><gen>
theirs:they<prn><pers><p3><mf><pl><poss>
them:they<prn><pers><p3><mf><pl><acc>
this:this<prn><sg>
these:this<prn><pl>
thyself:thyself<prn><ref><p2><mf><sg>
we:we<prn><pers><p1><mf><pl>
us:we<prn><pers><p1><mf><pl><acc>
our:we<prn><pers><p1><mf><pl><gen>
ours:we<prn><pers><p1><mf><pl><poss>
which:which<prn><itg>
which:which<prn><rel>
who:who<prn><itg>
whose:who<prn><poss>
whom:who<prn><itg><acc>
you:you<prn><pers><p2><mf><sp>
yours:you<prn><pers><p2><mf><sp><poss>
your:you<prn><pers><p2><mf><sp><gen>
you:you<prn><pers><p2><mf><sp><acc>
yourself:yourself<prn><ref><p2><mf><sg>
yourselves:yourselves<prn><ref><p2><mf><pl>
Relatives (Google pos: ?)
Relatives are words (normally pronouns or adverbs) that can introduce a relative clause, e.g. a beer that I drank, a boy who cried wolf.
Relative examples
that:that<rel><an><mf><sp>
which:which<rel><an><mf><sp>
what:what<rel><nn><mf><sg>
when:when<rel><adv>
where:where<rel><adv>
who:who<rel><an><mf><sp>
whom:whom<rel><an><mf><sp>
whose:whose<rel><aa><mf><sg>
why:why<rel><adv>
Determiners (Det)
Determiners mostly don't inflect ('this' and 'that' inflect for number). They're classified with:
- det: determiners
- ind: indefinite
- def: definite
- dem: demonstrative
- itg: interrogative
- qnt: quantifier
Determiner examples
a:>:a<det><ind><sg>
an:>:a<det><ind><sg>
~a:<:a<det><ind><sg>
all:all<det><ind><sp>
any:any<det><ind><sp>
another:another<det><ind><sp>
both:both<det><qnt>
each:each<det><ind><sp>
her:her<det><pos><sp>
his:his<det><pos><sp>
its:its<det><pos><sp>
many:many<det><qnt>
more:more<det><qnt>
most:most<det><qnt>
my:my<det><pos><sp>
no:no<det><ind><neg>
other:other<det><ind><sp>
our:our<det><pos><sp>
several:several<det><dem>
some:some<det><dem>
that:that<det><dem><sg>
those:that<det><dem><pl>
their:their<det><pos><sp>
the:the<det><def><sp>
this:this<det><dem><sg>
these:this<det><dem><pl>
which:which<det><itg><sp>
your:your<det><pos><sp>
all:all<det>
only:only<det>
first:first<det><ord><sp>
second:second<det><ord><sp>
third:third<det><ord><sp>
fourth:fourth<det><ord><sp>
fifth:fifth<det><ord><sp>
sixth:sixth<det><ord><sp>
seventh:seventh<det><ord><sp>
eighth:eighth<det><ord><sp>
ninth:ninth<det><ord><sp>
tenth:tenth<det><ord><sp>
eleventh:eleventh<det><ord><sp>
twelfth:twelfth<det><ord><sp>
thirteenth:thirteenth<det><ord><sp>
fourteenth:fourteenth<det><ord><sp>
fifteenth:fifteenth<det><ord><sp>
sixteenth:sixteenth<det><ord><sp>
seventeenth:seventeenth<det><ord><sp>
eighteenth:eighteenth<det><ord><sp>
nineteenth:nineteenth<det><ord><sp>
twentieth:twentieth<det><ord><sp>
thirtieth:thirtieth<det><ord><sp>
fortieth:fortieth<det><ord><sp>
fiftieth:fiftieth<det><ord><sp>
sixtieth:sixtieth<det><ord><sp>
seventieth:seventieth<det><ord><sp>
eightieth:eightieth<det><ord><sp>
ninetieth:ninetieth<det><ord><sp>
hundredth:hundredth<det><ord><sp>
thousandth:thousandth<det><ord><sp>
millionth:millionth<det><ord><sp>
milliardth:milliardth<det><ord><sp>
billionth:billionth<det><ord><sp>
billiardth:billiardth<det><ord><sp>
trillionth:trillionth<det><ord><sp>
trilliardth:trilliardth<det><ord><sp>
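The `~a:<:a` entry suggests that the generator emits a placeholder `~a` and that a later step picks a or an. A minimal sketch of such a step, using a naive vowel-letter heuristic (an assumption; it gets an hour and a university wrong, so a real rule set would need exception lists); `resolve_articles` is a hypothetical name:

```python
import re

VOWEL_LETTERS = set("aeiouAEIOU")
# "~a", the whitespace after it, and the first character of the next word.
PATTERN = re.compile(r"~a(\s+)(\w)")


def resolve_articles(text: str) -> str:
    """Replace each '~a' placeholder with 'a' or 'an' based on the next word."""

    def choose(m):
        article = "an" if m.group(2) in VOWEL_LETTERS else "a"
        # Keep the original spacing and the first letter of the next word.
        return article + m.group(1) + m.group(2)

    return PATTERN.sub(choose, text)


print(resolve_articles("~a beer and ~a apple"))  # a beer and an apple
```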
Prepositions (Adp)
Prepositions don't inflect. They are classified with:
- pr: preposition
Multiword prepositions should be checked for possible compositional (non-multiword) readings, and these should be encoded in the lexicon as well.
Preposition examples
above:above<pr>
according to:according to<pr>
across:across<pr>
after:after<pr>
against:against<pr>
along:along<pr>
alongside:alongside<pr>
along with:along with<pr> # (consider "he walked along with his friend" vs. "he walked along with a song in his heart")
amid:amid<pr>
among:among<pr>
amongst:amongst<pr>
around:around<pr>
as:as<pr>
as of:as of<pr>
at:at<pr>
atop:atop<pr>
because of:because of<pr>
before:before<pr>
behind:behind<pr>
below:below<pr>
between:between<pr>
but:but<pr>
by:by<pr>
by means of:by means of<pr>
despite:despite<pr>
due to:due to<pr>
during:during<pr>
except for:except for<pr>
except:except<pr>
for:for<pr>
from:from<pr>
in contrast to:in contrast to<pr>
in front of:in front of<pr>
in:in<pr>
in order to:in order to<pr>
inside:inside<pr>
into:into<pr>
near:near<pr>
off:off<pr>
of:of<pr>
on:on<pr>
onto:onto<pr>
out:out<pr>
out of:out of<pr>
outside:outside<pr>
over:over<pr>
per:per<pr>
prior to:prior to<pr>
since:since<pr>
through:through<pr>
throughout:throughout<pr>
to:to<pr>
towards:towards<pr>
under:under<pr>
until:until<pr>
up:up<pr>
upon:upon<pr>
up to:up to<pr>
via:via<pr>
within:within<pr>
with:with<pr>
without:without<pr>
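Multiword prepositions need a tokenizer that can match more than one word. A minimal greedy longest-match sketch over a subset of the list above (the subset and the name `group_prepositions` are illustrative assumptions):

```python
from __future__ import annotations

# A subset of the multiword prepositions listed above.
MULTIWORD = {
    "according to", "along with", "as of", "because of", "by means of",
    "due to", "except for", "in contrast to", "in front of", "in order to",
    "out of", "prior to", "up to",
}
MAX_LEN = max(len(p.split()) for p in MULTIWORD)


def group_prepositions(tokens: list[str]) -> list[str]:
    """Greedily merge the longest multiword preposition starting at each token."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest window first, down to two words.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if " ".join(tokens[i:i + n]).lower() in MULTIWORD:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword match: keep the single token as-is.
            out.append(tokens[i])
            i += 1
    return out


print(group_prepositions("he stood in front of the door".split()))
# ['he', 'stood', 'in front of', 'the', 'door']
```

Greedy matching always picks the multiword reading, so "he walked along with a song in his heart" would wrongly merge along with. That is why the compositional readings also belong in the lexicon, leaving the choice to a later disambiguation step.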
Numerals (Num)
Numerals have genitive (possessive) inflections. They are classified as:
- num: numerals (one, two)
- ord: ordinals (first, ...)
Numeral examples
one:one<num><sg>
one's:one<num><sg><gen>
two:two<num><pl>
two's:two<num><pl><gen>
three:three<num><pl>
three's:three<num><pl><gen>
first:first<num><pl>
first's:first<num><pl><gen>
second:second<num><pl>
second's:second<num><pl><gen>
third:third<num><pl>
third's:third<num><pl><gen>
Conjunctions (Conj)
Conjunctions don't inflect. They are classified as:
- cnjcoo: coordinating (and, or)
- cnjsub: subordinating (that)
- cnjadv: adverbial (after)
Conjunction examples
albeit:albeit<cnjadv>
albeit:albeit<cnjsub>
although:although<cnjadv>
and:and<cnjcoo>
an if:an if<cnjadv>
because:because<cnjadv>
because:because<cnjsub>
both:both<cnjcoo>
but:but<cnjcoo>
either:either<cnjadv>
however:however<cnjadv>
if:if<cnjadv>
if:if<cnjsub>
lest:lest<cnjadv>
neither:neither<cnjcoo>
nor:nor<cnjcoo>
or:or<cnjcoo>
since:since<cnjadv>
than:than<cnjadv>
than:than<cnjsub>
that:that<cnjsub>
then:then<cnjadv>
though:though<cnjadv>
til:til<cnjadv>
till:till<cnjadv>
unless:unless<cnjadv>
until:until<cnjadv>
unto:unto<cnjadv>
what:what<cnjsub>
whenas:whenas<cnjadv>
whence:whence<cnjadv>
when:when<cnjadv>
wherealong:wherealong<cnjadv>
whereas:whereas<cnjadv>
whereat:whereat<cnjadv>
wherefore:wherefore<cnjadv>
whereinbefore:whereinbefore<cnjadv>
wherein:wherein<cnjadv>
whereof:whereof<cnjadv>
whereout:whereout<cnjadv>
whereover:whereover<cnjadv>
wheresoever:wheresoever<cnjadv>
whether:whether<cnjadv>
which:which<cnjsub>
while:while<cnjadv>
whilst:whilst<cnjadv>
Interjections (Prt)
Interjections don't inflect. They're classified as:
- ij: interjections
Almost any letter combination that appears in text, prose, or chat messages could be an interjection. We're limiting the selection to widely attested ones that can sensibly be translated, especially greetings, curses, and similar minimal responses.
Interjection examples
argh:argh<ij>
fuck:fuck<ij>
hello:hello<ij>
hey:hey<ij>
aah:aah<ij>
aargh:aargh<ij>
agh:agh<ij>
ah:ah<ij>
aha:aha<ij>
ahem:ahem<ij>
ahh:ahh<ij>
aw:aw<ij>
aww:aww<ij>
aye:aye<ij>
bah:bah<ij>
boo:boo<ij>
brr:brr<ij>
bye:bye<ij>
crap:crap<ij>
crud:crud<ij>
damn:damn<ij>
darn:darn<ij>
d'oh:d'oh<ij>
doh:doh<ij>
eh:eh<ij>
goddamn:goddamn<ij>
grr:grr<ij>
ha:ha<ij>
hah:hah<ij>
haha:haha<ij>
heh:heh<ij>
hehe:hehe<ij>
hi:hi<ij>
hm:hm<ij>
hmm:hmm<ij>
hmph:hmph<ij>
hrm:hrm<ij>
huh:huh<ij>
like:like<ij>
lol:lol<ij>
omg:omg<ij>
ok:ok<ij>
ooh:ooh<ij>
oops:oops<ij>
ouch:ouch<ij>
oww:oww<ij>
phew:phew<ij>
shh:shh<ij>
shit:shit<ij>
sorry:sorry<ij>
thanks:thanks<ij>
ugh:ugh<ij>
uh:uh<ij>
uh-huh:uh-huh<ij>
umm:umm<ij>
welcome:welcome<ij>
well:well<ij>
what:what<ij>
whew:whew<ij>
whoa:whoa<ij>
woohoo:woohoo<ij>
yay:yay<ij>
Punctuations (Google pos: .)
These are more or less the same everywhere, apart from directionality and some orthographic variation.
':'<apos>
,:,<cm>
-:-<guio>
--:–<guio>
–:-<guio>
—:—<guio>
(:(<lpar>
[:[<lpar>
":"<lquot>
“:“<lquot>
«:«<lquot>
»:«<lquot>
):)<rpar>
]:]<rpar>
":"<rquot>
”:”<rquot>
»:»<rquot>
_:_<sent>
:::<sent>
;:;<sent>
!:!<sent>
?:?<sent>
.:.<sent>
#:#<sent>
%:%<sent>