Difference between revisions of "User:TommiPirinen/English tagset"
(23 intermediate revisions by 2 users not shown)
Line 1: | Line 1: | ||
− | = RFC for English tags |
+ | = RFC for English tags = |
+ | This is taken from apertium-fin-eng.eng.dix, though I hope to make it match langs/apertium-eng some day. |
||
− | == Verbs (google pos: <span style="font-variant: small-caps">Verb</span>)== |
||
+ | English words are split into the following parts of speech: |
||
− | <nowiki>> be |
||
− | be be<vblex><actv><pres> 0,000000 |
||
− | be be<vblex><inf> 0,000000 |
||
− | be be<vblex><inf> 0,000000 |
||
+ | * verbs (can be recognised by morphology: 3rd sg present, infinitive, past, ...) |
||
− | > am |
||
+ | * nouns (can usually be recognised by morphology: singular, plural, genitive) |
||
− | am be<vblex><actv><pres><p1><sg> 0,000000 |
||
+ | * adjectives (can mostly be recognised by morphology or syntax, via comparatives and superlatives) |
||
+ | * adverbs (all sorts of stuff; most new ones are -ly derivations, others you will know by the trail of blood) |
||
+ | The rest of the classes are more or less closed, and no new words should really be added to them: |
||
− | > is |
||
− | is be<vblex><actv><pres><p3><sg> 0,000000 |
||
+ | * determiners: come before NPs and such |
||
− | > are |
||
+ | * predeterminers: come before determiners |
||
− | are are<n><sg><nom> 0,000000 |
||
+ | * pronouns: are semantically bound to some other words |
||
− | are be<vblex><actv><pres><p1><pl> 0,000000 |
||
+ | * prepositions: come before nouns or after verbs, and so on |
||
− | are be<vblex><actv><pres><p2><pl> 0,000000 |
||
+ | * adadverbs: come before adverbs |
||
− | are be<vblex><actv><pres><p2><sg> 0,000000 |
||
+ | * numerals: are number words from one to infinity |
||
− | are be<vblex><actv><pres><p3><pl> 0,000000 |
||
+ | * conjunctions: are things that join clauses |
||
+ | * relatives: are things for relative clauses |
||
+ | * interjections: are particles used mostly in spoken language |
||
+ | * symbols: are not letters (cf. Unicode classes) |
||
+ | The tags should match [[List of symbols]], but are explained here. The tagging should be interoperable with (and usually reducible to) commonly known "gold" standards, such as the Penn Treebank. |
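A minimal sketch of one such reduction, to the Google universal POS labels given in the section headings of this page. It assumes the main class is always the first tag after the lemma; the table `GOOGLE_POS` and the function `google_pos` are hypothetical names. Classes marked "?" in their heading (adadverbs, relatives) and predeterminers, which have no label, fall back to `X`.

```python
from typing import List

# Hypothetical table: main Apertium tag -> Google universal POS,
# following the "google pos" labels in this page's section headings.
GOOGLE_POS = {
    "vblex": "VERB", "vaux": "VERB", "vbser": "VERB", "vbdo": "VERB", "vbhaver": "VERB",
    "n": "NOUN", "np": "NOUN",
    "adj": "ADJ",
    "adv": "ADV",
    "prn": "PRON",
    "det": "DET",
    "pr": "ADP",
    "num": "NUM",
    "cnjcoo": "CONJ", "cnjsub": "CONJ", "cnjadv": "CONJ",
    "ij": "PRT",  # the Interjections heading says Prt
    # punctuation tags
    "sent": ".", "cm": ".", "guio": ".", "apos": ".",
    "lpar": ".", "rpar": ".", "lquot": ".", "rquot": ".",
}


def google_pos(tags: List[str]) -> str:
    """Map a tag list such as ["vblex", "pri"] to a Google POS label.

    Only the first tag is consulted; unlabelled classes
    (e.g. preadv, rel, predet) fall back to "X".
    """
    if not tags:
        return "X"
    return GOOGLE_POS.get(tags[0], "X")
```

For example, `google_pos(["prn", "pers", "p1"])` gives `"PRON"`.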
||
− | > was |
||
− | was be<vblex><actv><past><p1><sg> 0,000000 |
||
− | was be<vblex><actv><past><p3><sg> 0,000000 |
||
+ | == Verbs (google pos: <span style="font-variant: small-caps">Verb</span>)== |
||
− | > were |
||
− | were be<vblex><actv><past><p1><pl> 0,000000 |
||
− | were be<vblex><actv><past><p2><pl> 0,000000 |
||
− | were be<vblex><actv><past><p2><sg> 0,000000 |
||
− | were be<vblex><actv><past><p3><pl> 0,000000 |
||
+ | Regular English verbs inflect in these forms: ''accept'', ''accepts'', ''accepted'', ''accepting''. Some irregular verbs have five: ''forget'', ''forgets'', ''forgot'', ''forgotten'', ''forgetting''. The verb ''be'' has a bunch of forms: ''be'', ''am'', ''are'', ''is'', ''was'', ''were'', ''been'', ''being''. |
||
− | > being |
||
− | being be<vblex><actv><ger> 0,000000 |
||
− | being be<vblex><actv><ger> 0,000000 |
||
− | being be<vblex><subst><sg><nom> 0,000000 |
||
− | [...] |
||
+ | The tags we are using to classify English verbs are: |
||
− | > been |
||
− | been be<vblex><pp> 0,000000 |
||
+ | * vblex: for regular verbs, like ''accept'' |
||
− | > walk |
||
+ | * vaux: auxiliary verbs, which take a verb complement, like ''can'' |
||
− | [...] |
||
+ | * vbser: verb ''be'' |
||
− | walk walk<vblex><actv><pres> 0,000000 |
||
+ | * vbdo: verb ''do'' |
||
− | walk walk<vblex><inf> 0,000000 |
||
+ | * vbhaver: verb ''have'' |
||
+ | The morphs that follow (or their absence) are classified with: |
||
− | > walks |
||
− | [...] |
||
− | walks walk<vblex><actv><pres><p3><sg> 0,000000 |
||
+ | * inf: infinitive (as in: to ''do'', to ''walk'') |
||
− | > walked |
||
+ | * pri: present indicative (as in: I ''do'', he ''walks'') |
||
− | walked walk<vblex><actv><past> 0,000000 |
||
+ | * prs: present subjunctive (as in: Let there ''be'' light; At other times it is important that we ''be'' quiet.) |
||
+ | * past: common past (as in: I ''did'', he ''walked'') |
||
+ | * pis: imperfect subjunctive (as in: If I ''were'' you, ...) |
||
+ | * pp: past participle (as in: I've ''done'', he has ''walked'') |
||
+ | * imp: imperative (as in: ''be'' quiet!) |
||
+ | * pprs: present participle |
||
+ | * ger: gerund |
||
+ | * subs: substantive |
||
+ | and potentially |
||
− | > walking |
||
− | walking walk<vblex><actv><ger> 0,000000 |
||
− | walking walk<vblex><subst><sg><nom> 0,000000 |
||
− | [...] |
||
+ | * +not.adv.neg: (as in ''can't'', ''didn't'') |
||
− | > must |
||
− | must must<vaux> 0,000000 |
||
− | must must<vaux><actv><pres> 0,000000 |
||
+ | Likely additions in the future: |
||
− | > shall |
||
− | shall shall<vaux><actv><pres> 0,000000 |
||
+ | * transitivity |
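For the Penn Treebank side, a rough sketch of how the verb tags above might reduce to Penn verb tags. The mapping is an assumption, not something the dictionary defines: `vaux` becomes `MD` (all the auxiliaries listed on this page are modals), and `subs` is lumped in with `VBG` even though a fully nominal reading would be `NN` in Penn. `penn_verb_tag` is a hypothetical helper.

```python
from typing import List, Optional

VERB_CLASSES = {"vblex", "vbser", "vbdo", "vbhaver", "vaux"}


def penn_verb_tag(tags: List[str]) -> Optional[str]:
    """Reduce a verb tag list (lemma stripped) to a Penn Treebank tag.

    Assumed mapping; returns None for non-verbs and for tag lists
    this sketch does not cover.
    """
    if not tags or tags[0] not in VERB_CLASSES:
        return None
    if tags[0] == "vaux":
        return "MD"  # assumption: every vaux listed on this page is a modal
    forms = set(tags[1:])
    if "pri" in forms:
        # only 3rd person singular present is VBZ; other present forms are VBP
        return "VBZ" if {"p3", "sg"} <= forms else "VBP"
    if forms & {"inf", "imp", "prs"}:
        return "VB"
    if forms & {"past", "pis"}:
        return "VBD"
    if "pp" in forms:
        return "VBN"
    if forms & {"pprs", "ger", "subs"}:
        return "VBG"  # subs lumped in here; a nominal reading would be NN
    return None
```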
||
− | > should |
||
− | should should<vaux><actv><pres> 0,000000 |
||
+ | === Verb examples === |
||
− | > can |
||
− | can can<vaux><actv><pres> 0,000000 |
||
− | can can<vaux><actv><pres><p3><sg> 0,000000 |
||
− | can can<vblex><actv><pres> 0,000000 |
||
− | can can<vblex><inf> 0,000000 |
||
+ | The tag sequences are as follows: |
||
− | > can't |
||
− | can't can<vaux><actv><pres>+not<adv> 0,000000</nowiki> |
||
+ | Regular verbs: |
||
+ | <nowiki> |
||
− | Alternatively: |
||
+ | walk:walk<vblex><inf> |
||
+ | walk:walk<vblex><pri> |
||
+ | walk:walk<vblex><prs> |
||
+ | walk:walk<vblex><imp> |
||
+ | walks:walk<vblex><pri><p3><sg> |
||
+ | walked:walk<vblex><pis> |
||
+ | walked:walk<vblex><past> |
||
+ | walked:walk<vblex><pp> |
||
+ | walking:walk<vblex><subs> |
||
+ | walking:walk<vblex><pprs> |
||
+ | walking:walk<vblex><ger> |
||
+ | </nowiki> |
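Since the regular paradigm is fully predictable, the entries above could be generated from the lemma alone. A naive sketch with a hypothetical helper `regular_verb_entries`: it uses plain suffixation only, so verbs like ''stop'', ''bake'' or ''fix'' would come out wrong.

```python
from typing import List

# (suffix, tags) pairs, in the order of the regular-verb example list
REGULAR_VERB_PARADIGM = [
    ("", "<inf>"), ("", "<pri>"), ("", "<prs>"), ("", "<imp>"),
    ("s", "<pri><p3><sg>"),
    ("ed", "<pis>"), ("ed", "<past>"), ("ed", "<pp>"),
    ("ing", "<subs>"), ("ing", "<pprs>"), ("ing", "<ger>"),
]


def regular_verb_entries(lemma: str) -> List[str]:
    """Expand a regular verb lemma into 'surface:analysis' lines.

    Plain suffixation only: no consonant doubling, e-deletion or -es.
    """
    return [f"{lemma}{suffix}:{lemma}<vblex>{tags}"
            for suffix, tags in REGULAR_VERB_PARADIGM]
```

`regular_verb_entries("walk")` reproduces the eleven ''walk'' lines above, in the same order.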
||
+ | Irregulars: |
||
− | <pre> |
||
+ | <nowiki> |
||
− | accept:accept<vblex><inf> |
||
− | + | forget:forget<vblex><inf> |
|
− | + | forget:forget<vblex><pri> |
|
− | + | forgets:forget<vblex><pri><p3><sg> |
|
− | + | forgot:forget<vblex><past> |
|
− | + | forgotten:forget<vblex><pp> |
|
− | + | forgetting:forget<vblex><ger> |
|
+ | </nowiki> |
||
− | accepting:accept<vblex><pprs> |
||
− | accepting:accept<vblex><ger> |
||
− | accepted:accept<vblex><pp> |
||
− | accept:accept<vblex><imp> |
||
+ | Auxiliaries (closed class, all examples): |
||
− | < |
+ | <nowiki> |
+ | he'd:he<prn><pers><p3><m><sg>+'d<vaux><pri> |
||
+ | he'll:he<prn><pers><p3><m><sg>+'ll<vaux><pri> |
||
+ | I'd:I<prn><pers><p1><mf><sg>+'d<vaux><pri> |
||
+ | I'll:I<prn><pers><p1><mf><sg>+'ll<vaux><pri> |
||
+ | she'd:she<prn><pers><p3><f><sg>+'d<vaux><pri> |
||
+ | she'll:she<prn><pers><p3><f><sg>+'ll<vaux><pri> |
||
+ | they'd:they<prn><pers><p3><mf><pl>+'d<vaux><pri> |
||
+ | they'll:they<prn><pers><p3><mf><pl>+'ll<vaux><pri> |
||
+ | we'd:we<prn><pers><p1><mf><pl>+'d<vaux><pri> |
||
+ | we'll:we<prn><pers><p1><mf><pl>+'ll<vaux><pri> |
||
+ | you'd:you<prn><pers><p2><mf><sp>+'d<vaux><pri> |
||
+ | you'll:you<prn><pers><p2><mf><sp>+'ll<vaux><pri> |
||
+ | can:can<vaux><pri> |
||
+ | can't:can<vaux><pri>+n't<adv><neg> |
||
+ | cannot:can<vaux><pri>+not<adv><neg> |
||
+ | could:could<vaux><pri> |
||
+ | couldn't:could<vaux><pri>+n't<adv><neg> |
||
+ | may:may<vaux><inf> |
||
+ | may:may<vaux><pri> |
||
+ | may:may<vaux><past> |
||
+ | might:might<vaux><inf> |
||
+ | might:might<vaux><pri> |
||
+ | might:might<vaux><past> |
||
+ | must:must<vaux><inf> |
||
+ | must:must<vaux><pri> |
||
+ | must:must<vaux><past> |
||
+ | ought:ought<vaux><inf> |
||
+ | ought:ought<vaux><pri> |
||
+ | ought:ought<vaux><past> |
||
+ | shall:shall<vaux><pri> |
||
+ | shan't:shall<vaux><pri>+n't<adv><neg> |
||
+ | should:should<vaux><pri> |
||
+ | shouldn't:should<vaux><pri>+n't<adv><neg> |
||
+ | will:will<vaux><pri> |
||
+ | would:will<vaux><past> |
||
+ | won't:will<vaux><pri>+n't<adv><neg> |
||
+ | wouldn't:will<vaux><past>+n't<adv><neg> |
||
+ | would:would<vaux><pri> |
||
+ | wouldn't:would<vaux><pri>+n't<adv><neg> |
||
+ | </nowiki> |
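Every line in these lists has the shape `surface:lemma<tag>...`, with clitics joined by `+`. A small parser sketch, assuming that shape. The helper `parse_entry` is hypothetical; lines whose surface form is itself `:` and lines carrying the `>:`/`<:` markers seen further down are not handled.

```python
import re
from typing import List, Tuple

TAG_RE = re.compile(r"<([^<>]+)>")


def parse_entry(line: str) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Split 'surface:analysis' into the surface form and its parts.

    Each '+'-joined part becomes (lemma, tags), so the clitic in
    "can't:can<vaux><pri>+n't<adv><neg>" becomes ("n't", ["adv", "neg"]).
    """
    surface, analysis = line.split(":", 1)
    parts = []
    for chunk in analysis.split("+"):
        lemma = chunk.split("<", 1)[0]  # text before the first tag
        parts.append((lemma, TAG_RE.findall(chunk)))
    return surface, parts
```

Multiword lemmas survive because only `:` and `+` are treated as separators: `parse_entry("according to:according to<pr>")` gives `("according to", [("according to", ["pr"])])`.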
||
+ | Verb ''be'' (all forms, except for contractions): |
||
− | <pre> |
||
+ | |||
− | be:be<vbser><inf> |
||
+ | <nowiki>be:be<vbser><inf> |
||
+ | are:be<vbser><pri> |
||
am:be<vbser><pri><p1><sg> |
||
− | are:be<vbser><pri> |
||
is:be<vbser><pri><p3><sg> |
||
− | be:be<vbser><prs> |
||
was:be<vbser><past><p1><sg> |
||
+ | was:be<vbser><past><p3><sg> |
||
were:be<vbser><past> |
||
− | were:be<vbser><pis> |
||
− | was:be<vbser><past><p3><sg> |
||
− | being:be<vbser><subs> |
||
− | being:be<vbser><pprs> |
||
− | being:be<vbser><ger> |
||
been:be<vbser><pp> |
||
− | + | being:be<vbser><ger> |
|
− | </ |
+ | </nowiki> |
+ | |||
+ | Verb ''have'' (all forms, except for contractions): |
||
+ | |||
+ | <nowiki>have:have<vbhaver><inf> |
||
+ | have:have<vbhaver><pri> |
||
+ | has:have<vbhaver><pri><p3><sg> |
||
+ | had:have<vbhaver><past> |
||
+ | had:have<vbhaver><pp> |
||
+ | having:have<vbhaver><ger> |
||
+ | </nowiki> |
||
+ | |||
+ | Verb ''do'' (all forms): |
||
+ | |||
+ | <nowiki>do:do<vbdo><inf> |
||
+ | do:do<vbdo><imp> |
||
+ | do:do<vbdo><pri> |
||
+ | does:do<vbdo><pri><p3><sg> |
||
+ | did:do<vbdo><past> |
||
+ | did:do<vbdo><pis> |
||
+ | doing:do<vbdo><subs> |
||
+ | doing:do<vbdo><pprs> |
||
+ | doing:do<vbdo><ger> |
||
+ | done:do<vbdo><pp></nowiki> |
||
== Nouns (google pos: <span style="font-variant: small-caps">Noun</span>) == |
||
+ | Nouns commonly have two forms, plus a possessive of each: ''beer'', ''beers'', ''beer's'', ''beers<nowiki>'</nowiki>''. |
||
− | <nowiki>> beer |
||
+ | Some don't: ? |
||
− | beer beer<n><sg><nom> 0,000000 |
||
+ | |||
+ | The tags used to classify nouns are: |
||
+ | |||
+ | * n: regular noun, like ''beer'' |
||
+ | * np: proper noun, like ''Jack'' |
||
+ | * m: male |
||
+ | * f: female |
||
+ | * mf: both female and male |
||
+ | * nt: neuter, neither female nor male |
||
+ | * top: place |
||
+ | * ant: human |
||
+ | |||
+ | And also: |
||
+ | |||
+ | * cnt: countable, like ''chair'' |
||
+ | * unc: uncountable, like ''cheese'' |
||
+ | |||
+ | The suffixes are: |
||
+ | |||
+ | * sg: singular as in ''beer'' |
||
+ | * pl: plural as in ''beers'' |
||
+ | * +'s.gen: genitive (possessive), as in ''beer's'' |
||
+ | * attr: for noun-noun modification: "the car{{tag|n><attr}} industry is..." |
||
+ | ** Note: maybe this could be {{tag|cmp}} ? |
||
+ | |||
+ | === Noun examples === |
||
+ | |||
+ | Regular nouns go like: |
||
+ | <nowiki> |
||
− | > beers |
||
− | + | beer:beer<n><attr> |
|
+ | beer:beer<n><sg> |
||
+ | beers:beer<n><pl> |
||
+ | beer's:beer<n><sg>+'s<gen> |
||
+ | beers':beer<n><pl>+'s<gen> |
||
+ | </nowiki> |
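The genitive is always analysed as `+'s<gen>`, but its spelling depends on the form: `beer's` but `beers'`. A sketch with hypothetical helpers `genitive_surface` and `noun_entries`, assuming the plural is the lemma plus ''-s'' unless given explicitly.

```python
from typing import List, Optional


def genitive_surface(form: str, number: str) -> str:
    """Spell the possessive: "beer" -> "beer's", plural "beers" -> "beers'"."""
    if number == "pl" and form.endswith("s"):
        return form + "'"
    return form + "'s"  # singulars, and plurals without -s such as "children"


def noun_entries(lemma: str, plural: Optional[str] = None) -> List[str]:
    """Entries for a regular noun; the plural defaults to lemma + "s" (assumption)."""
    plural = plural or lemma + "s"
    return [
        f"{lemma}:{lemma}<n><attr>",
        f"{lemma}:{lemma}<n><sg>",
        f"{plural}:{lemma}<n><pl>",
        f"{genitive_surface(lemma, 'sg')}:{lemma}<n><sg>+'s<gen>",
        f"{genitive_surface(plural, 'pl')}:{lemma}<n><pl>+'s<gen>",
    ]
```

`noun_entries("beer")` reproduces the five ''beer'' lines above; an irregular plural can be passed in, e.g. `noun_entries("child", "children")`.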
||
+ | Proper nouns: |
||
− | > beer's |
||
− | beer's beer<n><sg><gen> 0,000000 |
||
+ | <nowiki>Aaron:Aaron<np><ant><m><sg> |
||
− | > beers' |
||
+ | Aarons:Aaron<np><ant><m><pl> |
||
− | beers' beer<n><pl><gen> 0,000000</nowiki> |
||
+ | Aarons':Aaron<np><ant><m><pl>+'s<gen> |
||
+ | Aaron's:Aaron<np><ant><m><sg>+'s<gen> |
||
+ | Amsterdam:Amsterdam<np><top><sg> |
||
+ | Amsterdams:Amsterdam<np><top><pl> |
||
+ | Amsterdam's:Amsterdam<np><top><sg>+'s<gen> |
||
+ | Amsterdams':Amsterdam<np><top><pl>+'s<gen> |
||
+ | </nowiki> |
||
== Adjectives (google pos: <span style="font-variant: small-caps">Adj</span>) == |
||
+ | Most adjectives don't inflect, like ''expensive'', but some have three forms, like: |
||
− | <nowiki>> small |
||
+ | ''small'', ''smaller'', ''smallest''. The tags used for classifying are: |
||
− | small small<adj><sint> 0,000000 |
||
+ | * adj: for non-inflecting ones, like ''expensive'' |
||
− | > smaller |
||
+ | * sint: for those with three forms, like ''small'' |
||
− | smaller small<adj><sint><comp> 0,000000 |
||
+ | * pst: positive isn't tagged? (not pos, that's for possessives) |
||
+ | * ord: ordinals as adjectives? |
||
+ | The suffixes are marked with: |
||
− | > smallest |
||
− | smallest small<adj><sint><sup> 0,000000 |
||
+ | * comp: for comparative, as in ''smaller'' |
||
− | > hairy |
||
+ | * sup: for superlative, as in ''smallest'' |
||
− | hairy hairy<adj> 0,000000 |
||
+ | Adjectives that don't normally take the comparative should still allow comparative and superlative forms, marked with a 'sub' marker, e.g. "expensiver", "expensivest". |
||
− | > hairier |
||
− | hairier hairier+? inf |
||
+ | === Adjective examples === |
||
− | > hairiest |
||
− | hairiest hairiest+? inf</nowiki> |
||
+ | Like so: |
||
− | == Adverbs (google pos: <span style="font-variant: small-caps">Adv</span> == |
||
− | <nowiki> |
+ | <nowiki>small:small<adj><sint> |
+ | smaller:small<adj><sint><comp> |
||
− | smoothly smoothly<adv> 0,000000 |
||
+ | smallest:small<adj><sint><sup> |
||
+ | expensive:expensive<adj> |
||
+ | expensiver:>:expensive<adj><comp> |
||
+ | expensivest:>:expensive<adj><sup> |
||
+ | </nowiki> |
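A sketch of generating the three ''sint'' forms. How the dictionary actually builds them is not stated here, so this is an assumption; the hypothetical helper `sint_entries` handles only the final ''-e'' and consonant + ''-y'' spelling rules, not consonant doubling (''big'', ''bigger'').

```python
from typing import List

VOWELS = "aeiou"


def sint_entries(lemma: str) -> List[str]:
    """Positive, comparative and superlative entries for a sint adjective."""
    if lemma.endswith("e"):
        # large -> larger, largest
        comp, sup = lemma + "r", lemma + "st"
    elif len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS:
        # happy -> happier, happiest
        comp, sup = lemma[:-1] + "ier", lemma[:-1] + "iest"
    else:
        # small -> smaller, smallest
        comp, sup = lemma + "er", lemma + "est"
    return [
        f"{lemma}:{lemma}<adj><sint>",
        f"{comp}:{lemma}<adj><sint><comp>",
        f"{sup}:{lemma}<adj><sint><sup>",
    ]
```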
||
+ | == Adverbs (google pos: <span style="font-variant: small-caps">Adv</span>) == |
||
− | > aboard |
||
− | aboard aboard<adv> 0,000000 |
||
+ | Adverbs don't inflect. There are a couple of tags to classify them: |
||
− | > drunk |
||
− | drunk drunk<adj> 0,000000 |
||
− | drunk drunk<adv> 0,000000 |
||
− | drunk drunk<n><sg><nom> 0,000000</nowiki> |
||
+ | * adv: for adverbs |
||
− | Like why are these three in anyy imaginable way in same class? |
||
+ | * itg: for interrogatives |
||
+ | |||
+ | === Adverb examples === |
||
+ | |||
+ | <nowiki>abaxially:abaxially<adv> |
||
+ | ably:ably<adv> |
||
+ | abnormally:abnormally<adv> |
||
+ | abominably:abominably<adv> |
||
+ | abortively:abortively<adv> |
||
+ | abruptly:abruptly<adv> |
||
+ | absently:absently<adv> |
||
+ | absentmindedly:absentmindedly<adv> |
||
+ | absolutely:absolutely<adv> |
||
+ | abstemiously:abstemiously<adv> |
||
+ | ... |
||
+ | aboard:aboard<adv> |
||
+ | drunk:drunk<adv> |
||
+ | no:no<adv><neg> |
||
+ | where:where<adv><itg> |
||
+ | when:when<adv><itg> |
||
+ | why:why<adv><itg> |
||
+ | </nowiki> |
||
+ | |||
+ | |||
+ | == Adadverbs (google pos: ?) == |
||
+ | |||
+ | Ad-adverbs don't inflect. They are tagged: |
||
+ | |||
+ | * preadv: for preadverbs |
||
+ | |||
+ | === Adadverb examples === |
||
+ | |||
+ | <nowiki>as:as<preadv> |
||
+ | more:more<preadv> |
||
+ | most:most<preadv> |
||
+ | so:so<preadv> |
||
+ | very:very<preadv> |
||
+ | </nowiki> |
||
== Pronouns (google pos: <span style="font-variant: small-caps">Pron</span>) == |
||
+ | Pronouns are categorised with: |
||
− | <nowiki>> I |
||
− | I I<prn><pers><p1><mf><sg><nom> 0,000000 |
||
+ | * prn: for pronouns |
||
− | > me |
||
+ | * pers: for personal (''I'', ''you''...) |
||
− | me I<prn><pers><p1><mf><sg><acc> 0,000000 |
||
+ | * dem: for demonstrative (''this'', ''that''...) |
||
+ | * ref: for reflexives (''...self'') |
||
+ | Some pronouns inflect like nouns; others have more categories, like: |
||
− | > my |
||
− | my I<prn><pers><p1><mf><sg><gen> 0,000000 |
||
+ | * acc: for object form (''him'') |
||
− | > mine |
||
+ | * p1, p2, p3: for persons (''I'', ''you'', ''he'',...) |
||
− | mine I<prn><pers><p1><mf><sg><acc> 0,000000 |
||
+ | === Pronoun examples === |
||
− | > those |
||
− | those those<prn><dem><pl><acc> 0,000000 |
||
+ | All of them (closed class): |
||
− | > something |
||
− | something something<prn><sg><nom> 0,000000 |
||
+ | <nowiki>all:all<prn><sg> |
||
− | > both |
||
+ | any:any<prn><sg> |
||
− | both both<prn><ind><mf><pl><nom> 0,000000</nowiki> |
||
+ | anybody:anybody<prn><sg> |
||
+ | anyone:anyone<prn><sg> |
||
+ | anything:anything<prn><sg> |
||
+ | both:both<prn><tn><mf><pl> |
||
+ | both:both<prn><ind><mf><pl> |
||
+ | each:each<prn><sg> |
||
+ | everybody:everybody<prn><sg> |
||
+ | everyone:everyone<prn><sg> |
||
+ | everything:everything<prn><sg> |
||
+ | few:few<prn><tn><mf><pl> |
||
+ | few:few<prn><ind><mf><pl> |
||
+ | he:he<prn><pers><p3><m><sg> |
||
+ | his:he<prn><pers><p3><m><sg><pos> |
||
+ | his:he<prn><pers><p3><m><sg><gen> |
||
+ | him:he<prn><pers><p3><m><sg><acc> |
||
+ | he's:he<prn><pers><p3><m><sg>+'s<vbhaver><pri> |
||
+ | he'd:he<prn><pers><p3><m><sg>+'d<vbhaver><past> |
||
+ | he'd:he<prn><pers><p3><m><sg>+'d<vbhaver><pp> |
||
+ | he'd:he<prn><pers><p3><m><sg>+'d<vaux><pri> |
||
+ | he's:he<prn><pers><p3><m><sg>+'s<vbser><pri> |
||
+ | he'll:he<prn><pers><p3><m><sg>+'ll<vaux><pri> |
||
+ | herself:herself<prn><ref><p3><f><sg> |
||
+ | himself:himself<prn><ref><p3><m><sg> |
||
+ | hisself:himself<prn><ref><p3><m><sg> |
||
+ | I:I<prn><pers><p1><mf><sg> |
||
+ | me:I<prn><pers><p1><mf><sg><acc> |
||
+ | my:I<prn><pers><p1><mf><sg><gen> |
||
+ | mine:I<prn><pers><p1><mf><sg><pos> |
||
+ | I've:I<prn><pers><p1><mf><sg>+'ve<vbhaver><pri> |
||
+ | I'd:I<prn><pers><p1><mf><sg>+'d<vbhaver><past> |
||
+ | I'd:I<prn><pers><p1><mf><sg>+'d<vbhaver><pp> |
||
+ | I'd:I<prn><pers><p1><mf><sg>+'d<vaux><pri> |
||
+ | I'm:I<prn><pers><p1><mf><sg>+'m<vbser><pri> |
||
+ | I'll:I<prn><pers><p1><mf><sg>+'ll<vaux><pri> |
||
+ | it:it<prn><dem><mf><sg> |
||
+ | its:it<prn><dem><mf><sg><pos> |
||
+ | itself:itself<prn><ref><p3><nt><sg> |
||
+ | many:many<prn><sg> |
||
+ | myself:myself<prn><ref><p1><mf><sg> |
||
+ | one:one<prn><sg> |
||
+ | oneself:oneself<prn><ref><p1><mf><sg> |
||
+ | oneself:oneself<prn><ref><p3><mf><sg> |
||
+ | one's self:oneself<prn><ref><p1><mf><sg> |
||
+ | one's self:oneself<prn><ref><p3><mf><sg> |
||
+ | ourself:ourselves<prn><ref><p1><mf><pl> |
||
+ | ourselves:ourselves<prn><ref><p1><mf><pl> |
||
+ | several:several<prn><sg> |
||
+ | she:she<prn><pers><p3><f><sg> |
||
+ | hers:she<prn><pers><p3><f><sg><pos> |
||
+ | her:she<prn><pers><p3><f><sg><gen> |
||
+ | her:she<prn><pers><p3><f><sg><acc> |
||
+ | she's:she<prn><pers><p3><f><sg>+'s<vbhaver><pri> |
||
+ | she'd:she<prn><pers><p3><f><sg>+'d<vbhaver><past> |
||
+ | she'd:she<prn><pers><p3><f><sg>+'d<vbhaver><pp> |
||
+ | she'd:she<prn><pers><p3><f><sg>+'d<vaux><pri> |
||
+ | she's:she<prn><pers><p3><f><sg>+'s<vbser><pri> |
||
+ | she'll:she<prn><pers><p3><f><sg>+'ll<vaux><pri> |
||
+ | some:some<prn><sg> |
||
+ | something:something<prn><sg> |
||
+ | that:that<prn><tn><mf><sg> |
||
+ | those:that<prn><tn><mf><pl> |
||
+ | theirselves:themselves<prn><ref><p3><mf><pl> |
||
+ | themself:themself<prn><ref><p3><mf><sg> |
||
+ | themselves:themselves<prn><ref><p3><mf><sg> |
||
+ | themselves:themselves<prn><ref><p3><mf><pl> |
||
+ | they:they<prn><pers><p3><mf><pl> |
||
+ | their:they<prn><pers><p3><mf><pl><gen> |
||
+ | theirs:they<prn><pers><p3><mf><pl><pos> |
||
+ | them:they<prn><pers><p3><mf><pl><acc> |
||
+ | they've:they<prn><pers><p3><mf><pl>+'ve<vbhaver><pri> |
||
+ | they'd:they<prn><pers><p3><mf><pl>+'d<vbhaver><past> |
||
+ | they'd:they<prn><pers><p3><mf><pl>+'d<vbhaver><pp> |
||
+ | they'd:they<prn><pers><p3><mf><pl>+'d<vaux><pri> |
||
+ | they're:they<prn><pers><p3><mf><pl>+'re<vbser><pri> |
||
+ | they'll:they<prn><pers><p3><mf><pl>+'ll<vaux><pri> |
||
+ | they:they<prn><pers><p3><mf><sg> |
||
+ | their:they<prn><pers><p3><mf><sg><gen> |
||
+ | theirs:they<prn><pers><p3><mf><sg><pos> |
||
+ | them:they<prn><pers><p3><mf><sg><acc> |
||
+ | this:this<prn><tn><mf><sg> |
||
+ | this:this<prn><dem><mf><sg> |
||
+ | these:this<prn><tn><mf><pl> |
||
+ | thyself:thyself<prn><ref><p2><mf><sg> |
||
+ | we:we<prn><pers><p1><mf><pl> |
||
+ | us:we<prn><pers><p1><mf><pl><acc> |
||
+ | our:we<prn><pers><p1><mf><pl><gen> |
||
+ | ours:we<prn><pers><p1><mf><pl><pos> |
||
+ | we've:we<prn><pers><p1><mf><pl>+'ve<vbhaver><pri> |
||
+ | we'd:we<prn><pers><p1><mf><pl>+'d<vbhaver><past> |
||
+ | we'd:we<prn><pers><p1><mf><pl>+'d<vbhaver><pp> |
||
+ | we'd:we<prn><pers><p1><mf><pl>+'d<vaux><pri> |
||
+ | we're:we<prn><pers><p1><mf><pl>+'re<vbser><pri> |
||
+ | we'll:we<prn><pers><p1><mf><pl>+'ll<vaux><pri> |
||
+ | which:which<prn><itg><m><sp> |
||
+ | who:who<prn><itg><mf><sp> |
||
+ | whose:who<prn><pos> |
||
+ | whom:who<prn><itg><acc> |
||
+ | you:you<prn><pers><p2><mf><sp> |
||
+ | yours:you<prn><pers><p2><mf><sp><pos> |
||
+ | your:you<prn><pers><p2><mf><sp><gen> |
||
+ | you:you<prn><pers><p2><mf><sp><acc> |
||
+ | you've:you<prn><pers><p2><mf><sp>+'ve<vbhaver><pri> |
||
+ | you'd:you<prn><pers><p2><mf><sp>+'d<vbhaver><past> |
||
+ | you'd:you<prn><pers><p2><mf><sp>+'d<vbhaver><pp> |
||
+ | you'd:you<prn><pers><p2><mf><sp>+'d<vaux><pri> |
||
+ | you're:you<prn><pers><p2><mf><sp>+'re<vbser><pri> |
||
+ | you'll:you<prn><pers><p2><mf><sp>+'ll<vaux><pri> |
||
+ | yourself:yourself<prn><ref><p2><mf><sg> |
||
+ | </nowiki> |
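The contractions above follow a fixed pattern: the pronoun's analysis plus one reading per clitic. A sketch that rebuilds them, with the clitic readings taken from the examples in this list; `CLITIC_READINGS` and `expand_contraction` are hypothetical names.

```python
from typing import List

# Readings per clitic, as they appear in the pronoun examples above.
CLITIC_READINGS = {
    "'s": ["<vbser><pri>", "<vbhaver><pri>"],
    "'d": ["<vbhaver><past>", "<vbhaver><pp>", "<vaux><pri>"],
    "'ll": ["<vaux><pri>"],
    "'ve": ["<vbhaver><pri>"],
    "'re": ["<vbser><pri>"],
    "'m": ["<vbser><pri>"],
}


def expand_contraction(surface: str, pronoun_analysis: str) -> List[str]:
    """List every analysis of a pronoun + clitic form such as "she'd"."""
    for clitic, readings in CLITIC_READINGS.items():
        if surface.endswith(clitic):
            return [f"{surface}:{pronoun_analysis}+{clitic}{r}" for r in readings]
    return []  # no known clitic at the end of the surface form
```

`expand_contraction("she'd", "she<prn><pers><p3><f><sg>")` returns the three ''she'd'' readings listed above, in the same order.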
||
+ | == Relatives (Google pos: ?) == |
||
− | There's lots of stuff -_- |
||
+ | |||
+ | Relatives are those words (normally pronouns or adverbs) which can introduce a relative clause. E.g. <q>a beer ''that'' I drank</q>, <q>a boy ''who'' cried wolf</q> |
||
+ | |||
+ | === Relative examples === |
||
+ | |||
+ | <nowiki>that:that<rel><an><mf><sp> |
||
+ | which:which<rel><an><mf><sp> |
||
+ | what:what<rel><nn><mf><sg> |
||
+ | when:when<rel><adv> |
||
+ | where:where<rel><adv> |
||
+ | who:who<rel><an><mf><sp> |
||
+ | whom:whom<rel><an><mf><sp> |
||
+ | whose:whose<rel><aa><mf><sg> |
||
+ | why:why<rel><adv> |
||
+ | </nowiki> |
||
== Determiners (<span style="font-variant: small-caps">Det</span>) == |
||
+ | Determiners mostly don't inflect ('this' and 'that' inflect for number). They're classified with: |
||
+ | * det: determiners |
||
− | <nowiki>> a |
||
+ | * ind: indefinite |
||
− | a a<det><ind> 0,000000 |
||
+ | * def: definite |
||
+ | * dem: demonstrative |
||
+ | * itg: interrogative |
||
+ | * qnt: quantifier |
||
+ | === Determiner examples === |
||
− | > the |
||
− | the the<det><def> 0,000000</nowiki> |
||
+ | <nowiki>a:>:a<det><ind><sg> |
||
− | Maybe few more? Definition? |
||
+ | an:>:a<det><ind><sg> |
||
+ | ~a:<:a<det><ind><sg> |
||
+ | all:all<det><ind><sp> |
||
+ | any:any<det><ind><sp> |
||
+ | another:another<det><ind><sp> |
||
+ | both:both<det><qnt> |
||
+ | each:each<det><ind><sp> |
||
+ | her:her<det><pos><sp> |
||
+ | his:his<det><pos><sp> |
||
+ | its:its<det><pos><sp> |
||
+ | many:many<det><qnt> |
||
+ | more:more<det><qnt> |
||
+ | most:most<det><qnt> |
||
+ | my:my<det><pos><sp> |
||
+ | no:no<det><ind><neg> |
||
+ | other:other<det><ind><sp> |
||
+ | our:our<det><pos><sp> |
||
+ | several:several<det><dem> |
||
+ | some:some<det><dem> |
||
+ | that:that<det><dem><sg> |
||
+ | those:that<det><dem><pl> |
||
+ | their:their<det><pos><sp> |
||
+ | the:the<det><def><sp> |
||
+ | this:this<det><dem><sg> |
||
+ | these:this<det><dem><pl> |
||
+ | which:which<det><itg><sp> |
||
+ | your:your<det><pos><sp> |
||
+ | all:all<det> |
||
+ | only:only<det> |
||
+ | first:first<det><ord><sp> |
||
+ | second:second<det><ord><sp> |
||
+ | third:third<det><ord><sp> |
||
+ | fourth:fourth<det><ord><sp> |
||
+ | fifth:fifth<det><ord><sp> |
||
+ | sixth:sixth<det><ord><sp> |
||
+ | seventh:seventh<det><ord><sp> |
||
+ | eighth:eighth<det><ord><sp> |
||
+ | ninth:ninth<det><ord><sp> |
||
+ | tenth:tenth<det><ord><sp> |
||
+ | eleventh:eleventh<det><ord><sp> |
||
+ | twelfth:twelfth<det><ord><sp> |
||
+ | thirteenth:thirteenth<det><ord><sp> |
||
+ | fourteenth:fourteenth<det><ord><sp> |
||
+ | fifteenth:fifteenth<det><ord><sp> |
||
+ | sixteenth:sixteenth<det><ord><sp> |
||
+ | seventeenth:seventeenth<det><ord><sp> |
||
+ | eighteenth:eighteenth<det><ord><sp> |
||
+ | nineteenth:nineteenth<det><ord><sp> |
||
+ | twentieth:twentieth<det><ord><sp> |
||
+ | thirtieth:thirtieth<det><ord><sp> |
||
+ | fortieth:fortieth<det><ord><sp> |
||
+ | fiftieth:fiftieth<det><ord><sp> |
||
+ | sixtieth:sixtieth<det><ord><sp> |
||
+ | seventieth:seventieth<det><ord><sp> |
||
+ | eightieth:eightieth<det><ord><sp> |
||
+ | ninetieth:ninetieth<det><ord><sp> |
||
+ | hundredth:hundredth<det><ord><sp> |
||
+ | thousandth:thousandth<det><ord><sp> |
||
+ | millionth:millionth<det><ord><sp> |
||
+ | milliardth:milliardth<det><ord><sp> |
||
+ | billionth:billionth<det><ord><sp> |
||
+ | billiardth:billiardth<det><ord><sp> |
||
+ | trillionth:trillionth<det><ord><sp> |
||
+ | trilliardth:trilliardth<det><ord><sp> |
||
+ | </nowiki> |
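The `~a:<:a` line suggests that generation outputs a placeholder `~a`, which is resolved to ''a'' or ''an'' afterwards; this reading of the markers is an assumption. A crude sketch of such a resolution step, using a hypothetical `resolve_article` that only checks the first letter of the next word (so ''an hour'' and ''a university'' both come out wrong):

```python
from typing import List


def resolve_article(tokens: List[str]) -> List[str]:
    """Replace the placeholder "~a" with "a" or "an" by the next word's first letter."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "~a":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            # spelling-based only, not pronunciation-based
            out.append("an" if nxt and nxt[0].lower() in "aeiou" else "a")
        else:
            out.append(tok)
    return out
```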
||
+ | |||
+ | == Predeterminers == |
||
+ | |||
+ | Predeterminers don't inflect. They are classified with: |
||
+ | |||
+ | * predet: predeterminer |
||
+ | |||
+ | ===Predeterminer examples=== |
||
+ | |||
+ | <pre> |
||
+ | all:all<predet><sp> |
||
+ | # all the king's men |
||
+ | only:only<predet><sp> |
||
+ | # Only the strong survive |
||
+ | </pre> |
||
== Prepositions (<span style="font-variant: small-caps">Adp</span>) == |
||
+ | Prepositions don't inflect. They are classified with: |
||
− | <nowiki>> in |
||
+ | |||
− | in in<pr> 0,000000</nowiki> |
||
+ | * pr: preposition |
||
+ | |||
+ | Multiword prepositions should be checked for compositional (non-multiword) readings, and these should also be encoded in the lexicon. |
||
+ | |||
+ | === Preposition examples === |
||
+ | |||
+ | <nowiki>above:above<pr> |
||
+ | according to:according to<pr> |
||
+ | across:across<pr> |
||
+ | after:after<pr> |
||
+ | against:against<pr> |
||
+ | along:along<pr> |
||
+ | alongside:alongside<pr> |
||
+ | along with:along with<pr> |
||
+ | # (consider "But a European Parliament without a Sámi seat is an incomplete Parliament and I appeal to colleagues to eliminate this deficit along with all the others this report seeks to address." vs. "He walked along with a song in his heart.") |
||
+ | amid:amid<pr> |
||
+ | among:among<pr> |
||
+ | amongst:amongst<pr> |
||
+ | around:around<pr> |
||
+ | as:as<pr> |
||
+ | as of:as of<pr> |
||
+ | at:at<pr> |
||
+ | atop:atop<pr> |
||
+ | because of:because of<pr> |
||
+ | before:before<pr> |
||
+ | behind:behind<pr> |
||
+ | below:below<pr> |
||
+ | between:between<pr> |
||
+ | but:but<pr> |
||
+ | by:by<pr> |
||
+ | by means of:by means of<pr> |
||
+ | despite:despite<pr> |
||
+ | due to:due to<pr> |
||
+ | during:during<pr> |
||
+ | except for:except for<pr> |
||
+ | except:except<pr> |
||
+ | for:for<pr> |
||
+ | from:from<pr> |
||
+ | in contrast to:in contrast to<pr> |
||
+ | in front of:in front of<pr> |
||
+ | in:in<pr> |
||
+ | in order to:in order to<pr> |
||
+ | inside:inside<pr> |
||
+ | into:into<pr> |
||
+ | near:near<pr> |
||
+ | off:off<pr> |
||
+ | of:of<pr> |
||
+ | on:on<pr> |
||
+ | onto:onto<pr> |
||
+ | out:out<pr> |
||
+ | out of:out of<pr> |
||
+ | outside:outside<pr> |
||
+ | over:over<pr> |
||
+ | per:per<pr> |
||
+ | prior to:prior to<pr> |
||
+ | since:since<pr> |
||
+ | through:through<pr> |
||
+ | throughout:throughout<pr> |
||
+ | to:to<pr> |
||
+ | towards:towards<pr> |
||
+ | under:under<pr> |
||
+ | until:until<pr> |
||
+ | up:up<pr> |
||
+ | upon:upon<pr> |
||
+ | up to:up to<pr> |
||
+ | via:via<pr> |
||
+ | within:within<pr> |
||
+ | with:with<pr> |
||
+ | without:without<pr></nowiki> |
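A naive way to use the multiword entries is greedy longest match over the input words. The sketch below (hypothetical `group_multiwords`, covering only the multiword items from the list above) shows why the compositional check matters: it always takes ''along with'' as one preposition, even in "He walked along with a song", where the intended reading is ''walked along'' + ''with''.

```python
from typing import List

# Multiword prepositions from the list above, lower-cased and split into words.
MULTIWORD_PR = {
    ("according", "to"), ("along", "with"), ("as", "of"), ("because", "of"),
    ("by", "means", "of"), ("due", "to"), ("except", "for"),
    ("in", "contrast", "to"), ("in", "front", "of"), ("in", "order", "to"),
    ("out", "of"), ("prior", "to"), ("up", "to"),
}
MAX_LEN = max(len(m) for m in MULTIWORD_PR)


def group_multiwords(words: List[str]) -> List[str]:
    """Greedy longest-match grouping of multiword prepositions."""
    out, i = [], 0
    while i < len(words):
        # try the longest candidate first, down to two words
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            if tuple(w.lower() for w in words[i:i + n]) in MULTIWORD_PR:
                out.append(" ".join(words[i:i + n]))
                i += n
                break
        else:  # no multiword match: keep the single word
            out.append(words[i])
            i += 1
    return out
```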
||
== Numerals (<span style="font-variant: small-caps">Num</span>) == |
||
+ | Numerals have genitive (possessive) inflections. They are classified as: |
||
− | <nowiki>> one |
||
− | one one<num> 0,000000 |
||
+ | * num: numerals (''one'', ''two'') |
||
− | > first |
||
+ | * ord: ordinals (''first'', ...) |
||
− | first first<num> 0,000000 |
||
+ | ===Numeral examples=== |
||
− | > 1 |
||
− | 1 1<num> 0,000000 |
||
+ | <nowiki>one:one<num><sg> |
||
− | > 1. |
||
− | + | one's:one<num><sg><gen> |
|
+ | two:two<num><pl> |
||
+ | two's:two<num><pl><gen> |
||
+ | three:three<num><pl> |
||
+ | three's:three<num><pl><gen> |
||
+ | first:first<num><pl> |
||
+ | first's:first<num><pl><gen> |
||
+ | second:second<num><pl> |
||
+ | second's:second<num><pl><gen> |
||
+ | third:third<num><pl> |
||
+ | third's:third<num><pl><gen> |
||
+ | </nowiki> |
||
− | == Conjunctions (<span style="font-variant: small-caps">Conj</span>== |
+ | == Conjunctions (<span style="font-variant: small-caps">Conj</span>) == |
+ | Conjunctions don't inflect. They are classified as: |
||
− | <nowiki>> and |
||
− | and and<cnjcoo> 0,000000 |
||
+ | * cnjcoo: coordinating (''and'', ''or'') |
||
− | > unless |
||
+ | * cnjsub: subordinating (''that'') |
||
− | unless unless<cnjsub> 0,000000</nowiki> |
||
+ | * cnjadv: adverbial (''after'') |
||
+ | === Conjunction examples === |
||
− | No cnjadv? |
||
+ | <nowiki>albeit:albeit<cnjadv> |
||
− | == Interjections (<span style="font-variant: small-caps">Prt</span>== |
||
+ | albeit:albeit<cnjsub> |
||
+ | although:although<cnjadv> |
||
+ | and:and<cnjcoo> |
||
+ | an if:an if<cnjadv> |
||
+ | because:because<cnjadv> |
||
+ | because:because<cnjsub> |
||
+ | both:both<cnjcoo> |
||
+ | but:but<cnjcoo> |
||
+ | either:either<cnjadv> |
||
+ | however:however<cnjadv> |
||
+ | if:if<cnjadv> |
||
+ | if:if<cnjsub> |
||
+ | lest:lest<cnjadv> |
||
+ | neither:neither<cnjcoo> |
||
+ | nor:nor<cnjcoo> |
||
+ | or:or<cnjcoo> |
||
+ | since:since<cnjadv> |
||
+ | than:than<cnjadv> |
||
+ | than:than<cnjsub> |
||
+ | that:that<cnjsub> |
||
+ | then:then<cnjadv> |
||
+ | though:though<cnjadv> |
||
+ | til:til<cnjadv> |
||
+ | till:till<cnjadv> |
||
+ | unless:unless<cnjadv> |
||
+ | until:until<cnjadv> |
||
+ | unto:unto<cnjadv> |
||
+ | what:what<cnjsub> |
||
+ | whenas:whenas<cnjadv> |
||
+ | whence:whence<cnjadv> |
||
+ | when:when<cnjadv> |
||
+ | wherealong:wherealong<cnjadv> |
||
+ | whereas:whereas<cnjadv> |
||
+ | whereat:whereat<cnjadv> |
||
+ | wherefore:wherefore<cnjadv> |
||
+ | whereinbefore:whereinbefore<cnjadv> |
||
+ | wherein:wherein<cnjadv> |
||
+ | whereof:whereof<cnjadv> |
||
+ | whereout:whereout<cnjadv> |
||
+ | whereover:whereover<cnjadv> |
||
+ | wheresoever:wheresoever<cnjadv> |
||
+ | whether:whether<cnjadv> |
||
+ | which:which<cnjsub> |
||
+ | while:while<cnjadv> |
||
+ | whilst:whilst<cnjadv></nowiki> |
||
− | <nowiki>> crappy |
||
− | crappy crappy<ij> 0,000000 |
||
+ | == Interjections (<span style="font-variant: small-caps">Prt</span>)== |
||
− | > hi |
||
− | hi hi<ij> 0,000000</nowiki> |
||
+ | Interjections don't inflect. They're classified as: |
||
− | == Punctuations (Google pos: .)== |
||
+ | * ij: interjections |
||
− | <nowiki>> . |
||
− | . .<sent> 0,000000 |
||
+ | Almost any letter combination that appears in text, prose, or chat messages could be an interjection. We're limiting the selection to widely attested ones that can sensibly be translated, especially greetings, curses and similar minimal responses. |
||
− | > " |
||
− | " "<lquot> 0,000000 |
||
− | " "<rquot> 0,000000 |
||
− | " "<sent> 0,000000 |
||
+ | === Interjection examples === |
||
− | > ) |
||
− | ) )<rpar> 0,000000 |
||
+ | <nowiki> |
||
− | > ( |
||
+ | argh:argh<ij> |
||
− | ( (<lpar> 0,000000 |
||
+ | fuck:fuck<ij> |
||
+ | hello:hello<ij> |
||
+ | hey:hey<ij> |
||
+ | aah:aah<ij> |
||
+ | aargh:aargh<ij> |
||
+ | agh:agh<ij> |
||
+ | ah:ah<ij> |
||
+ | aha:aha<ij> |
||
+ | ahem:ahem<ij> |
||
+ | ahh:ahh<ij> |
||
+ | aw:aw<ij> |
||
+ | aww:aww<ij> |
||
+ | aye:aye<ij> |
||
+ | bah:bah<ij> |
||
+ | boo:boo<ij> |
||
+ | brr:brr<ij> |
||
+ | bye:bye<ij> |
||
+ | crap:crap<ij> |
||
+ | crud:crud<ij> |
||
+ | damn:damn<ij> |
||
+ | darn:darn<ij> |
||
+ | d'oh:d'oh<ij> |
||
+ | doh:doh<ij> |
||
+ | eh:eh<ij> |
||
+ | goddamn:goddamn<ij> |
||
+ | grr:grr<ij> |
||
+ | ha:ha<ij> |
||
+ | hah:hah<ij> |
||
+ | haha:haha<ij> |
||
+ | heh:heh<ij> |
||
+ | hehe:hehe<ij> |
||
+ | hi:hi<ij> |
||
+ | hm:hm<ij> |
||
+ | hmm:hmm<ij> |
||
+ | hmph:hmph<ij> |
||
+ | hrm:hrm<ij> |
||
+ | huh:huh<ij> |
||
+ | like:like<ij> |
||
+ | lol:lol<ij> |
||
+ | omg:omg<ij> |
||
+ | ok:ok<ij> |
||
+ | ooh:ooh<ij> |
||
+ | oops:oops<ij> |
||
+ | ouch:ouch<ij> |
||
+ | oww:oww<ij> |
||
+ | phew:phew<ij> |
||
+ | shh:shh<ij> |
||
+ | shit:shit<ij> |
||
+ | sorry:sorry<ij> |
||
+ | thanks:thanks<ij> |
||
+ | ugh:ugh<ij> |
||
+ | uh:uh<ij> |
||
+ | uh-huh:uh-huh<ij> |
||
+ | umm:umm<ij> |
||
+ | welcome:welcome<ij> |
||
+ | well:well<ij> |
||
+ | what:what<ij> |
||
+ | whew:whew<ij> |
||
+ | whoa:whoa<ij> |
||
+ | woohoo:woohoo<ij> |
||
+ | yay:yay<ij> |
||
+ | </nowiki> |
||
+ | |||
+ | == Punctuations (Google pos: .)== |
||
+ | These are more or less same everywhere apart from directionality and some orthographic variation. |
||
− | > , |
||
− | , ,<cm> 0,000000 |
||
+ | <nowiki>':'<apos> |
||
− | > - |
||
+ | ,:,<cm> |
||
− | - -<guio> 0,000000</nowiki> |
||
+ | -:-<guio> |
||
+ | --:–<guio> |
||
+ | –:-<guio> |
||
+ | —:—<guio> |
||
+ | (:(<lpar> |
||
+ | [:[<lpar> |
||
+ | ":"<lquot> |
||
+ | “:“<lquot> |
||
+ | «:«<lquot> |
||
+ | »:«<lquot> |
||
+ | ):)<rpar> |
||
+ | ]:]<rpar> |
||
+ | ":"<rquot> |
||
+ | ”:”<rquot> |
||
+ | »:»<rquot> |
||
+ | (:(<lpar> |
||
+ | ):)<rpar> |
||
+ | _:_<sent> |
||
+ | :::<sent> |
||
+ | ;:;<sent> |
||
+ | !:!<sent> |
||
+ | ?:?<sent> |
||
+ | .:.<sent> |
||
+ | #:#<sent> |
||
+ | %:%<sent></nowiki> |
Latest revision as of 10:27, 15 August 2014
Contents
- 1 RFC for English tags
- 1.1 Verbs (google pos: Verb)
- 1.2 Nouns (google pos: Noun)
- 1.3 Adjectives (google pos: Adj)
- 1.4 Adverbs (google pos: Adv)
- 1.5 Adadverbs (google pos: ?)
- 1.6 Pronouns (google pos: Pron)
- 1.7 Relatives (Google pos: ?)
- 1.8 Determiners (Det)
- 1.9 Predeterminers
- 1.10 Prepositions (Adp)
- 1.11 Numerals (Num)
- 1.12 Conjunctions (Conj)
- 1.13 Interjections (Prt)
- 1.14 Punctuations (Google pos: .)
RFC for English tags
This tagset is taken from apertium-fin-eng.eng.dix, though I hope to get it to match langs/apertium-eng some day...
English words are split into the following parts of speech:
- verbs (can be recognised by morphology: 3rd person singular present, infinitive, past, ...)
- nouns (can usually be recognised by morphology: singular, plural, genitive)
- adjectives (can mostly be recognised, morphologically or syntactically, by their comparatives and superlatives)
- adverbs (all sorts of stuff: most new ones are formed with the -ly derivation, the rest you will know by the trail of blood)
The remaining classes are more or less closed, and new words should rarely, if ever, be added to them:
- determiners: come before NPs
- predeterminers: come before determiners
- pronouns: are semantically bound to other words
- prepositions: come before a noun or after a verb
- adadverbs: come before adverbs
- numerals: number words from one to infinity
- conjunctions: join clauses
- relatives: introduce relative clauses
- interjections: particles used mostly in spoken language
- symbols: non-letter characters (cf. Unicode classes)
The tags should match the List of symbols, but are explained here. The tagging should be interoperable with (and usually reducible to) commonly known "gold" standards, such as the Penn Treebank.
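Since the tags are meant to be reducible to coarser gold standards, here is a minimal sketch of reducing an analysis to the Google universal tag given in each section heading below. It assumes the main class is the first tag after the lemma and looks only at the first part of a "+"-joined contraction; sending the classes whose heading says "?" (relatives, adadverbs) and predeterminers to "X" is an assumption, not something this page decides.

```python
import re

# Google universal tag per main Apertium tag, read off this page's headings.
# rel, preadv and predet are not listed, so they fall through to "X".
GOOGLE_TAG = {
    "vblex": "VERB", "vaux": "VERB", "vbser": "VERB",
    "vbdo": "VERB", "vbhaver": "VERB",
    "n": "NOUN", "np": "NOUN",
    "adj": "ADJ",
    "adv": "ADV",
    "prn": "PRON",
    "det": "DET",
    "pr": "ADP",
    "num": "NUM",
    "cnjcoo": "CONJ", "cnjsub": "CONJ", "cnjadv": "CONJ",
    "ij": "PRT",
}
# Every punctuation tag reduces to the single Google tag ".".
PUNCT_TAGS = {"apos", "cm", "guio", "lpar", "rpar", "lquot", "rquot", "sent"}


def reduce_to_google(analysis: str) -> str:
    """Map an analysis such as "walk<vblex><pri><p3><sg>" to "VERB".

    Only the first part of a "+"-joined analysis is inspected, so a
    contraction like he'd reduces to the tag of "he".
    """
    first_part = analysis.split("+", 1)[0]
    match = re.search(r"<([^<>]+)>", first_part)
    if match is None:
        raise ValueError(f"no tags in {analysis!r}")
    main_tag = match.group(1)
    if main_tag in PUNCT_TAGS:
        return "."
    return GOOGLE_TAG.get(main_tag, "X")


print(reduce_to_google("walk<vblex><pri><p3><sg>"))  # VERB
print(reduce_to_google("beer<n><sg>+'s<gen>"))       # NOUN
```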
Verbs (google pos: Verb)
Regular English verbs inflect in these forms: accept, accepts, accepted, accepting. Some irregular verbs have five: forget, forgets, forgot, forgotten, forgetting. The verb to be has a bunch of forms: be, am, are, is, was, were, been, being.
The tags we are using to classify English verbs are:
- vblex: for regular verbs, like accept
- vaux: auxiliary verbs, which take a verb complement, like can
- vbser: the verb be
- vbdo: the verb do
- vbhaver: the verb have
The morphological tags that follow the class tag (or their absence) are:
- inf: infinitive (as in: to do, to walk)
- pri: present indicative (as in: I do, he walks)
- prs: present subjunctive (as in: Let there be light ; At other times it is important that we be quiet.)
- past: common past (as in: I did, he walked)
- pis: imperfect subjunctive (as in: If I were you, ...)
- pp: past participle (as in: I've done, he has walked)
- imp: imperative (as in: be quiet!)
- pprs: present participle
- ger: gerund
- subs: substantive
and potentially:
- +not.adv.neg: negation (as in: can't, didn't)
Likely in the future:
- transitivity
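To show what "reducible to the Penn Treebank" could mean for verbs, here is a minimal sketch mapping one verb analysis's tag list to a Penn Treebank verb tag. The correspondence table is my own assumption, not an agreed mapping: in particular, sending subs to NN and every vaux (including 'd and 'll) to MD are judgment calls.

```python
# Assumed correspondence between the verb form tags above and Penn tags.
FORM_TO_PENN = {
    "inf": "VB", "imp": "VB", "prs": "VB",  # base form
    "past": "VBD", "pis": "VBD",            # past tense, including "were"
    "pp": "VBN",                            # past participle
    "pprs": "VBG", "ger": "VBG",            # -ing verb forms
    "subs": "NN",                           # assumed: noun-like -ing use
}


def penn_verb_tag(tags: list[str]) -> str:
    """tags is one verb analysis's tag list, e.g. ["vblex", "pri", "p3", "sg"]."""
    if "vaux" in tags:
        return "MD"  # modals such as can, must, will
    if "pri" in tags:
        # Penn splits the present by agreement: VBZ for 3rd sg, VBP otherwise.
        return "VBZ" if "p3" in tags and "sg" in tags else "VBP"
    for tag in tags:
        if tag in FORM_TO_PENN:
            return FORM_TO_PENN[tag]
    raise ValueError(f"no verb form tag in {tags!r}")


print(penn_verb_tag(["vblex", "pri", "p3", "sg"]))  # VBZ
print(penn_verb_tag(["vbser", "pri", "p1", "sg"]))  # VBP
```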
Verb examples
The tag sequences are as follows:
Regular verbs:
walk:walk<vblex><inf>
walk:walk<vblex><pri>
walk:walk<vblex><prs>
walk:walk<vblex><imp>
walks:walk<vblex><pri><p3><sg>
walked:walk<vblex><pis>
walked:walk<vblex><past>
walked:walk<vblex><pp>
walking:walk<vblex><subs>
walking:walk<vblex><pprs>
walking:walk<vblex><ger>
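The regular verb entries follow a fixed pattern, so they can be generated from the lemma. A minimal sketch, assuming plain concatenation of -s, -ed and -ing; spelling changes (stop/stopped, bake/baking, try/tries) are deliberately not handled, and the function name is hypothetical.

```python
def regular_verb_entries(lemma: str) -> list[str]:
    """Generate surface:analysis lines for a regular verb, in the order above.

    Assumes naive suffixation: no consonant doubling, e-dropping or y/ie change.
    """
    base = f"{lemma}<vblex>"
    forms = [
        (lemma, "<inf>"), (lemma, "<pri>"), (lemma, "<prs>"), (lemma, "<imp>"),
        (lemma + "s", "<pri><p3><sg>"),
        (lemma + "ed", "<pis>"), (lemma + "ed", "<past>"), (lemma + "ed", "<pp>"),
        (lemma + "ing", "<subs>"), (lemma + "ing", "<pprs>"), (lemma + "ing", "<ger>"),
    ]
    return [f"{surface}:{base}{tags}" for surface, tags in forms]


for line in regular_verb_entries("walk"):
    print(line)
```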
Irregulars:
forget:forget<vblex><inf>
forget:forget<vblex><pri>
forgets:forget<vblex><pri><p3><sg>
forgot:forget<vblex><past>
forgotten:forget<vblex><pp>
forgetting:forget<vblex><ger>
Auxiliaries (closed class, all examples):
he'd:he<prn><pers><p3><m><sg>+'d<vaux><pri>
he'll:he<prn><pers><p3><m><sg>+'ll<vaux><pri>
I'd:I<prn><pers><p1><mf><sg>+'d<vaux><pri>
I'll:I<prn><pers><p1><mf><sg>+'ll<vaux><pri>
she'd:she<prn><pers><p3><f><sg>+'d<vaux><pri>
she'll:she<prn><pers><p3><f><sg>+'ll<vaux><pri>
they'd:they<prn><pers><p3><mf><pl>+'d<vaux><pri>
they'll:they<prn><pers><p3><mf><pl>+'ll<vaux><pri>
we'd:we<prn><pers><p1><mf><pl>+'d<vaux><pri>
we'll:we<prn><pers><p1><mf><pl>+'ll<vaux><pri>
you'd:you<prn><pers><p2><mf><sp>+'d<vaux><pri>
you'll:you<prn><pers><p2><mf><sp>+'ll<vaux><pri>
can:can<vaux><pri>
can't:can<vaux><pri>+n't<adv><neg>
cannot:can<vaux><pri>+not<adv><neg>
could:could<vaux><pri>
couldn't:could<vaux><pri>+n't<adv><neg>
may:may<vaux><inf>
may:may<vaux><pri>
may:may<vaux><past>
might:might<vaux><inf>
might:might<vaux><pri>
might:might<vaux><past>
must:must<vaux><inf>
must:must<vaux><pri>
must:must<vaux><past>
ought:ought<vaux><inf>
ought:ought<vaux><pri>
ought:ought<vaux><past>
shall:shall<vaux><pri>
shan't:shall<vaux><pri>+n't<adv><neg>
should:should<vaux><pri>
shouldn't:should<vaux><pri>+n't<adv><neg>
will:will<vaux><pri>
would:will<vaux><past>
won't:will<vaux><pri>+n't<adv><neg>
wouldn't:will<vaux><past>+n't<adv><neg>
would:would<vaux><pri>
wouldn't:would<vaux><pri>+n't<adv><neg>
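Contractions like can't or I'll are analysed as several lemma+tags parts joined by "+". A minimal sketch of splitting one such entry into its parts, assuming the first ":" ends the surface form (true for the verb and pronoun entries here, but not for punctuation such as ":::<sent>"); the function and regex names are hypothetical.

```python
import re

# One "+"-separated part: a lemma followed by one or more <tag>s.
PART = re.compile(r"^([^<>]+)((?:<[^<>]+>)+)$")


def parse_entry(entry: str) -> tuple[str, list[tuple[str, list[str]]]]:
    """Split "surface:lemma<t1><t2>+lemma2<t3>" into surface and (lemma, tags) parts.

    Assumes the first ":" ends the surface form.
    """
    surface, analysis = entry.split(":", 1)
    parts = []
    for segment in analysis.split("+"):
        match = PART.match(segment)
        if match is None:
            raise ValueError(f"malformed part {segment!r} in {entry!r}")
        lemma, tag_string = match.groups()
        parts.append((lemma, re.findall(r"<([^<>]+)>", tag_string)))
    return surface, parts


print(parse_entry("can't:can<vaux><pri>+n't<adv><neg>"))
# ("can't", [('can', ['vaux', 'pri']), ("n't", ['adv', 'neg'])])
```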
Verb be (all forms, except for contractions):
be:be<vbser><inf>
are:be<vbser><pri>
am:be<vbser><pri><p1><sg>
is:be<vbser><pri><p3><sg>
was:be<vbser><past><p1><sg>
was:be<vbser><past><p3><sg>
were:be<vbser><past>
been:be<vbser><pp>
being:be<vbser><ger>
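Some surface forms get more than one analysis (was is both p1 and p3 singular past). A toy in-memory lookup, keyed by surface form, shows how that ambiguity is kept rather than resolved; it is only an illustration, since a real pipeline would compile the dictionary instead, and build_analyser is a hypothetical name.

```python
from collections import defaultdict

BE_ENTRIES = """
be:be<vbser><inf> are:be<vbser><pri> am:be<vbser><pri><p1><sg>
is:be<vbser><pri><p3><sg> was:be<vbser><past><p1><sg>
was:be<vbser><past><p3><sg> were:be<vbser><past> been:be<vbser><pp>
being:be<vbser><ger>
""".split()


def build_analyser(entries: list[str]) -> dict[str, list[str]]:
    """Index surface:analysis lines by surface form, keeping every analysis."""
    index: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        surface, analysis = entry.split(":", 1)
        index[surface].append(analysis)
    return dict(index)


analyser = build_analyser(BE_ENTRIES)
print(analyser["was"])
# ['be<vbser><past><p1><sg>', 'be<vbser><past><p3><sg>']
```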
Verb have (all forms, except for contractions):
have:have<vbhaver><inf>
have:have<vbhaver><pri>
has:have<vbhaver><pri><p3><sg>
had:have<vbhaver><past>
had:have<vbhaver><pp>
having:have<vbhaver><ger>
Verb do (all forms):
do:do<vbdo><inf>
do:do<vbdo><imp>
do:do<vbdo><pri>
does:do<vbdo><pri><p3><sg>
did:do<vbdo><past>
did:do<vbdo><pis>
doing:do<vbdo><subs>
doing:do<vbdo><pprs>
doing:do<vbdo><ger>
done:do<vbdo><pp>
Nouns (google pos: Noun)
Nouns commonly have two forms, each with a possessive: beer, beers, beer's, beers'. Some don't: ?
The tags used to classify nouns are:
- n: regular noun, like beer
- np: proper noun, like Jack
- m: male
- f: female
- mf: both female and male
- nt: neuter, neither female nor male
- top: place
- ant: human
And also:
- cnt: countable, like chair
- unc: uncountable, like cheese
The suffixes are:
- sg: singular as in beer
- pl: plural as in beers
- +'s.gen: genitive or possessive, as in beer's
- attr: for noun-noun modification: "the car<n><attr> industry is..."
  - Note: maybe this could be <cmp>?
Noun examples
Regular nouns go like:
beer:beer<n><attr>
beer:beer<n><sg>
beers:beer<n><pl>
beer's:beer<n><sg>+'s<gen>
beers':beer<n><pl>+'s<gen>
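The regular noun pattern can likewise be generated from the lemma. A minimal sketch, assuming the plural is lemma + "s" (no -es, -ies or irregular plurals); note that the plural genitive is written with a bare apostrophe but still analysed with +'s<gen>, as in the beer example. The function name is hypothetical.

```python
def regular_noun_entries(lemma: str) -> list[str]:
    """Entries for a regular noun in the order of the beer example.

    Assumes the plural is formed by appending "s".
    """
    plural = lemma + "s"
    return [
        f"{lemma}:{lemma}<n><attr>",
        f"{lemma}:{lemma}<n><sg>",
        f"{plural}:{lemma}<n><pl>",
        f"{lemma}'s:{lemma}<n><sg>+'s<gen>",
        # beers' is spelled with a bare apostrophe but analysed as +'s<gen>
        f"{plural}':{lemma}<n><pl>+'s<gen>",
    ]


for line in regular_noun_entries("beer"):
    print(line)
```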
Proper nouns:
Aaron:Aaron<np><ant><m><sg>
Aarons:Aaron<np><ant><m><pl>
Aarons':Aaron<np><ant><m><pl>+'s<gen>
Aaron's:Aaron<np><ant><m><sg>+'s<gen>
Amsterdam:Amsterdam<np><top><sg>
Amsterdams:Amsterdam<np><top><pl>
Amsterdam's:Amsterdam<np><top><sg>+'s<gen>
Amsterdams':Amsterdam<np><top><pl>+'s<gen>
Adjectives (google pos: Adj)
Most adjectives don't inflect, like expensive, but some have three forms, like small, smaller, smallest. The tags used to classify them are:
- adj: for non-inflecting ones, like expensive
- sint: for those with three forms, like small
- pst: positive isn't tagged? (not pos, that's for possessives)
- ord: ordinals as adjectives?
The suffixes are marked with:
- comp: for comparative, like in smaller
- sup for superlative, like in smallest
Adjectives that don't normally take a comparative should still allow one, marked with a 'sub' marker, e.g. "expensiver", "expensivest".
Adjective examples
Like so:
small:small<adj><sint>
smaller:small<adj><sint><comp>
smallest:small<adj><sint><sup>
expensive:expensive<adj>
expensiver:>:expensive<adj><comp>
expensivest:>:expensive<adj><sup>
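A minimal sketch generating the adjective entries: sint adjectives get -er/-est forms, others only the base form. It assumes plain concatenation (big/bigger, happy/happier are not handled) and does not model the marking of forms like "expensiver"; the function name is hypothetical.

```python
def adjective_entries(lemma: str, inflecting: bool) -> list[str]:
    """Base, comparative and superlative entries for a sint adjective,
    or just the base entry for a non-inflecting one."""
    if not inflecting:
        return [f"{lemma}:{lemma}<adj>"]
    base = f"{lemma}<adj><sint>"
    return [
        f"{lemma}:{base}",
        f"{lemma}er:{base}<comp>",   # naive -er
        f"{lemma}est:{base}<sup>",   # naive -est
    ]


print(adjective_entries("small", inflecting=True))
print(adjective_entries("expensive", inflecting=False))
```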
Adverbs (google pos: Adv)
Adverbs don't inflect. There are a couple of tags to classify them:
- adv: for adverbs
- itg: for interrogatives
Adverb examples
abaxially:abaxially<adv>
ably:ably<adv>
abnormally:abnormally<adv>
abominably:abominably<adv>
abortively:abortively<adv>
abruptly:abruptly<adv>
absently:absently<adv>
absentmindedly:absentmindedly<adv>
absolutely:absolutely<adv>
abstemiously:abstemiously<adv>
...
aboard:aboard<adv>
drunk:drunk<adv>
no:no<adv><neg>
where:where<adv><itg>
when:when<adv><itg>
why:why<adv><itg>
Adadverbs (google pos: ?)
Adadverbs don't inflect. They are tagged with:
- preadv: for preadverbs
Adadverb examples
as:as<preadv>
more:more<preadv>
most:most<preadv>
so:so<preadv>
very:very<preadv>
Pronouns (google pos: Pron)
Pronouns are categorised with:
- prn: for pronouns
- pers: for personal (I, you...)
- dem: for demonstrative (this, that...)
- ref: for reflexives (...self)
Some pronouns inflect like nouns, while others have more forms, marked with:
- acc: for object form (him)
- p1, p2, p3: for persons (I, you, he,...)
Pronoun examples
All of them (closed class):
all:all<prn><sg>
any:any<prn><sg>
anybody:anybody<prn><sg>
anyone:anyone<prn><sg>
anything:anything<prn><sg>
both:both<prn><tn><mf><pl>
both:both<prn><ind><mf><pl>
each:each<prn><sg>
everybody:everybody<prn><sg>
everyone:everyone<prn><sg>
everything:everything<prn><sg>
few:few<prn><tn><mf><pl>
few:few<prn><ind><mf><pl>
he:he<prn><pers><p3><m><sg>
his:he<prn><pers><p3><m><sg><pos>
his:he<prn><pers><p3><m><sg><gen>
him:he<prn><pers><p3><m><sg><acc>
he's:he<prn><pers><p3><m><sg>+'s<vbhaver><pri>
he'd:he<prn><pers><p3><m><sg>+'d<vbhaver><past>
he'd:he<prn><pers><p3><m><sg>+'d<vbhaver><pp>
he'd:he<prn><pers><p3><m><sg>+'d<vaux><pri>
he's:he<prn><pers><p3><m><sg>+'s<vbser><pri>
he'll:he<prn><pers><p3><m><sg>+'ll<vaux><pri>
herself:herself<prn><ref><p3><f><sg>
himself:himself<prn><ref><p3><m><sg>
hisself:himself<prn><ref><p3><m><sg>
I:I<prn><pers><p1><mf><sg>
me:I<prn><pers><p1><mf><sg><acc>
my:I<prn><pers><p1><mf><sg><gen>
mine:I<prn><pers><p1><mf><sg><pos>
I've:I<prn><pers><p1><mf><sg>+'ve<vbhaver><pri>
I'd:I<prn><pers><p1><mf><sg>+'d<vbhaver><past>
I'd:I<prn><pers><p1><mf><sg>+'d<vbhaver><pp>
I'd:I<prn><pers><p1><mf><sg>+'d<vaux><pri>
I'm:I<prn><pers><p1><mf><sg>+'m<vbser><pri>
I'll:I<prn><pers><p1><mf><sg>+'ll<vaux><pri>
it:it<prn><dem><mf><sg>
its:it<prn><dem><mf><sg><pos>
itself:itself<prn><ref><p3><nt><sg>
many:many<prn><sg>
myself:myself<prn><ref><p1><mf><sg>
one:one<prn><sg>
oneself:oneself<prn><ref><p1><mf><sg>
oneself:oneself<prn><ref><p3><mf><sg>
one's self:oneself<prn><ref><p1><mf><sg>
one's self:oneself<prn><ref><p3><mf><sg>
ourself:ourselves<prn><ref><p1><mf><pl>
ourselves:ourselves<prn><ref><p1><mf><pl>
several:several<prn><sg>
she:she<prn><pers><p3><f><sg>
hers:she<prn><pers><p3><f><sg><pos>
her:she<prn><pers><p3><f><sg><gen>
her:she<prn><pers><p3><f><sg><acc>
she's:she<prn><pers><p3><f><sg>+'s<vbhaver><pri>
she'd:she<prn><pers><p3><f><sg>+'d<vbhaver><past>
she'd:she<prn><pers><p3><f><sg>+'d<vbhaver><pp>
she'd:she<prn><pers><p3><f><sg>+'d<vaux><pri>
she's:she<prn><pers><p3><f><sg>+'s<vbser><pri>
she'll:she<prn><pers><p3><f><sg>+'ll<vaux><pri>
some:some<prn><sg>
something:something<prn><sg>
that:that<prn><tn><mf><sg>
those:that<prn><tn><mf><pl>
theirselves:themselves<prn><ref><p3><mf><pl>
themself:themself<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><sg>
themselves:themselves<prn><ref><p3><mf><pl>
they:they<prn><pers><p3><mf><pl>
their:they<prn><pers><p3><mf><pl><gen>
theirs:they<prn><pers><p3><mf><pl><pos>
them:they<prn><pers><p3><mf><pl><acc>
they've:they<prn><pers><p3><mf><pl>+'ve<vbhaver><pri>
they'd:they<prn><pers><p3><mf><pl>+'d<vbhaver><past>
they'd:they<prn><pers><p3><mf><pl>+'d<vbhaver><pp>
they'd:they<prn><pers><p3><mf><pl>+'d<vaux><pri>
they're:they<prn><pers><p3><mf><pl>+'re<vbser><pri>
they'll:they<prn><pers><p3><mf><pl>+'ll<vaux><pri>
they:they<prn><pers><p3><mf><sg>
their:they<prn><pers><p3><mf><sg><gen>
theirs:they<prn><pers><p3><mf><sg><pos>
them:they<prn><pers><p3><mf><sg><acc>
this:this<prn><tn><mf><sg>
this:this<prn><dem><mf><sg>
these:this<prn><tn><mf><pl>
thyself:thyself<prn><ref><p2><mf><sg>
we:we<prn><pers><p1><mf><pl>
us:we<prn><pers><p1><mf><pl><acc>
our:we<prn><pers><p1><mf><pl><gen>
ours:we<prn><pers><p1><mf><pl><pos>
we've:we<prn><pers><p1><mf><pl>+'ve<vbhaver><pri>
we'd:we<prn><pers><p1><mf><pl>+'d<vbhaver><past>
we'd:we<prn><pers><p1><mf><pl>+'d<vbhaver><pp>
we'd:we<prn><pers><p1><mf><pl>+'d<vaux><pri>
we're:we<prn><pers><p1><mf><pl>+'re<vbser><pri>
we'll:we<prn><pers><p1><mf><pl>+'ll<vaux><pri>
which:which<prn><itg><m><sp>
who:who<prn><itg><mf><sp>
whose:who<prn><pos>
whom:who<prn><itg><acc>
you:you<prn><pers><p2><mf><sp>
yours:you<prn><pers><p2><mf><sp><pos>
your:you<prn><pers><p2><mf><sp><gen>
you:you<prn><pers><p2><mf><sp><acc>
you've:you<prn><pers><p2><mf><sp>+'ve<vbhaver><pri>
you'd:you<prn><pers><p2><mf><sp>+'d<vbhaver><past>
you'd:you<prn><pers><p2><mf><sp>+'d<vbhaver><pp>
you'd:you<prn><pers><p2><mf><sp>+'d<vaux><pri>
you're:you<prn><pers><p2><mf><sp>+'re<vbser><pri>
you'll:you<prn><pers><p2><mf><sp>+'ll<vaux><pri>
yourself:yourself<prn><ref><p2><mf><sg>
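With a closed class this size, it helps to query entries by feature, for instance to list every accusative form or to check that all forms of one lemma agree in person and number. A minimal sketch over a small sample of the entries above, looking only at the first part of contractions; surfaces_with is a hypothetical helper, not part of any Apertium tool.

```python
import re


def surfaces_with(entries: list[str], wanted: set[str]) -> list[str]:
    """Surface forms whose first analysis part carries every tag in wanted."""
    found = []
    for entry in entries:
        surface, analysis = entry.split(":", 1)
        first_part = analysis.split("+", 1)[0]
        tags = set(re.findall(r"<([^<>]+)>", first_part))
        if wanted <= tags:
            found.append(surface)
    return found


PRONOUNS = [
    "he:he<prn><pers><p3><m><sg>",
    "him:he<prn><pers><p3><m><sg><acc>",
    "me:I<prn><pers><p1><mf><sg><acc>",
    "us:we<prn><pers><p1><mf><pl><acc>",
    "them:they<prn><pers><p3><mf><pl><acc>",
    "they've:they<prn><pers><p3><mf><pl>+'ve<vbhaver><pri>",
]
print(surfaces_with(PRONOUNS, {"acc"}))       # ['him', 'me', 'us', 'them']
print(surfaces_with(PRONOUNS, {"p3", "pl"}))  # ['them', "they've"]
```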
Relatives (Google pos: ?)
Relatives are words (normally pronouns or adverbs) that can introduce a relative clause, e.g. a beer that I drank, a boy who cried wolf.
Relative examples
that:that<rel><an><mf><sp>
which:which<rel><an><mf><sp>
what:what<rel><nn><mf><sg>
when:when<rel><adv>
where:where<rel><adv>
who:who<rel><an><mf><sp>
whom:whom<rel><an><mf><sp>
whose:whose<rel><aa><mf><sg>
why:why<rel><adv>
Determiners (Det)
Determiners mostly don't inflect ('this' and 'that' inflect for number). They're classified with:
- det: determiners
- ind: indefinite
- def: definite
- dem: demonstrative
- itg: interrogative
- qnt: quantifier
Determiner examples
a:>:a<det><ind><sg>
an:>:a<det><ind><sg>
~a:<:a<det><ind><sg>
all:all<det><ind><sp>
any:any<det><ind><sp>
another:another<det><ind><sp>
both:both<det><qnt>
each:each<det><ind><sp>
her:her<det><pos><sp>
his:his<det><pos><sp>
its:its<det><pos><sp>
many:many<det><qnt>
more:more<det><qnt>
most:most<det><qnt>
my:my<det><pos><sp>
no:no<det><ind><neg>
other:other<det><ind><sp>
our:our<det><pos><sp>
several:several<det><dem>
some:some<det><dem>
that:that<det><dem><sg>
those:that<det><dem><pl>
their:their<det><pos><sp>
the:the<det><def><sp>
this:this<det><dem><sg>
these:this<det><dem><pl>
which:which<det><itg><sp>
your:your<det><pos><sp>
all:all<det>
only:only<det>
first:first<det><ord><sp>
second:second<det><ord><sp>
third:third<det><ord><sp>
fourth:fourth<det><ord><sp>
fifth:fifth<det><ord><sp>
sixth:sixth<det><ord><sp>
seventh:seventh<det><ord><sp>
eighth:eighth<det><ord><sp>
ninth:ninth<det><ord><sp>
tenth:tenth<det><ord><sp>
eleventh:eleventh<det><ord><sp>
twelfth:twelfth<det><ord><sp>
thirteenth:thirteenth<det><ord><sp>
fourteenth:fourteenth<det><ord><sp>
fifteenth:fifteenth<det><ord><sp>
sixteenth:sixteenth<det><ord><sp>
seventeenth:seventeenth<det><ord><sp>
eighteenth:eighteenth<det><ord><sp>
nineteenth:nineteenth<det><ord><sp>
twentieth:twentieth<det><ord><sp>
thirtieth:thirtieth<det><ord><sp>
fortieth:fortieth<det><ord><sp>
fiftieth:fiftieth<det><ord><sp>
sixtieth:sixtieth<det><ord><sp>
seventieth:seventieth<det><ord><sp>
eightieth:eightieth<det><ord><sp>
ninetieth:ninetieth<det><ord><sp>
hundredth:hundredth<det><ord><sp>
thousandth:thousandth<det><ord><sp>
millionth:millionth<det><ord><sp>
milliardth:milliardth<det><ord><sp>
billionth:billionth<det><ord><sp>
billiardth:billiardth<det><ord><sp>
trillionth:trillionth<det><ord><sp>
trilliardth:trilliardth<det><ord><sp>
Predeterminers
Predeterminers don't inflect. They are classified with:
- predet
Predeterminer examples
all:all<predet><sp> # all the king's men
only:only<predet><sp> # Only the strong survive
Prepositions (Adp)
Prepositions don't inflect. They are classified with:
- pr: preposition
Multiword prepositions should also be checked for compositional (non-multiword) readings, and those readings should be encoded in the lexicon too.
Preposition examples
above:above<pr>
according to:according to<pr>
across:across<pr>
after:after<pr>
against:against<pr>
along:along<pr>
alongside:alongside<pr>
along with:along with<pr> # (consider "But a European Parliament without a Sámi seat is an incomplete Parliament and I appeal to colleagues to eliminate this deficit along with all the others this report seeks to address." vs. "He walked along with a song in his heart.")
amid:amid<pr>
among:among<pr>
amongst:amongst<pr>
around:around<pr>
as:as<pr>
as of:as of<pr>
at:at<pr>
atop:atop<pr>
because of:because of<pr>
before:before<pr>
behind:behind<pr>
below:below<pr>
between:between<pr>
but:but<pr>
by:by<pr>
by means of:by means of<pr>
despite:despite<pr>
due to:due to<pr>
during:during<pr>
except for:except for<pr>
except:except<pr>
for:for<pr>
from:from<pr>
in contrast to:in contrast to<pr>
in front of:in front of<pr>
in:in<pr>
in order to:in order to<pr>
inside:inside<pr>
into:into<pr>
near:near<pr>
off:off<pr>
of:of<pr>
on:on<pr>
onto:onto<pr>
out:out<pr>
out of:out of<pr>
outside:outside<pr>
over:over<pr>
per:per<pr>
prior to:prior to<pr>
since:since<pr>
through:through<pr>
throughout:throughout<pr>
to:to<pr>
towards:towards<pr>
under:under<pr>
until:until<pr>
up:up<pr>
upon:upon<pr>
up to:up to<pr>
via:via<pr>
within:within<pr>
with:with<pr>
without:without<pr>
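The "along with" example is the case where both a multiword and a compositional reading must stay available. A minimal sketch that enumerates every split of a token sequence into single words and listed multiword units, so a later stage can choose; it assumes pre-tokenised input and multiwords of at most three tokens, and is exponential in the worst case. The function name and the multiword set are hypothetical.

```python
def segmentations(tokens: list[str], multiwords: set[tuple[str, ...]],
                  max_len: int = 3) -> list[list[str]]:
    """Enumerate every split of tokens into single words and known multiwords."""
    if not tokens:
        return [[]]
    results = []
    for n in range(1, min(max_len, len(tokens)) + 1):
        chunk = tuple(tokens[:n])
        # A single token is always allowed; longer chunks must be listed.
        if n == 1 or chunk in multiwords:
            for rest in segmentations(tokens[n:], multiwords, max_len):
                results.append([" ".join(chunk)] + rest)
    return results


MULTIWORD_PR = {("along", "with"), ("according", "to"), ("in", "front", "of")}
for reading in segmentations(["walked", "along", "with", "a", "song"], MULTIWORD_PR):
    print(reading)
# ['walked', 'along', 'with', 'a', 'song']
# ['walked', 'along with', 'a', 'song']
```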
Numerals (Num)
Numerals have genitive (possessive) forms. They are classified as:
- num: numerals (one, two)
- ord: ordinals (first, ...)
Numeral examples
one:one<num><sg>
one's:one<num><sg><gen>
two:two<num><pl>
two's:two<num><pl><gen>
three:three<num><pl>
three's:three<num><pl><gen>
first:first<num><pl>
first's:first<num><pl><gen>
second:second<num><pl>
second's:second<num><pl><gen>
third:third<num><pl>
third's:third<num><pl><gen>
Conjunctions (Conj)
Conjunctions don't inflect. They are classified as:
- cnjcoo: coordinating (and, or)
- cnjsub: subordinating (that)
- cnjadv: adverbial (after)
Conjunction examples
albeit:albeit<cnjadv>
albeit:albeit<cnjsub>
although:although<cnjadv>
and:and<cnjcoo>
an if:an if<cnjadv>
because:because<cnjadv>
because:because<cnjsub>
both:both<cnjcoo>
but:but<cnjcoo>
either:either<cnjadv>
however:however<cnjadv>
if:if<cnjadv>
if:if<cnjsub>
lest:lest<cnjadv>
neither:neither<cnjcoo>
nor:nor<cnjcoo>
or:or<cnjcoo>
since:since<cnjadv>
than:than<cnjadv>
than:than<cnjsub>
that:that<cnjsub>
then:then<cnjadv>
though:though<cnjadv>
til:til<cnjadv>
till:till<cnjadv>
unless:unless<cnjadv>
until:until<cnjadv>
unto:unto<cnjadv>
what:what<cnjsub>
whenas:whenas<cnjadv>
whence:whence<cnjadv>
when:when<cnjadv>
wherealong:wherealong<cnjadv>
whereas:whereas<cnjadv>
whereat:whereat<cnjadv>
wherefore:wherefore<cnjadv>
whereinbefore:whereinbefore<cnjadv>
wherein:wherein<cnjadv>
whereof:whereof<cnjadv>
whereout:whereout<cnjadv>
whereover:whereover<cnjadv>
wheresoever:wheresoever<cnjadv>
whether:whether<cnjadv>
which:which<cnjsub>
while:while<cnjadv>
whilst:whilst<cnjadv>
Interjections (Prt)
Interjections don't inflect. They're classified as:
- ij: interjections
Almost any letter combination that appears in text, prose, or chat messages could be an interjection. We limit the selection to widely attested ones that can sensibly be translated, especially greetings, curses and similar minimal responses.
Interjection examples
argh:argh<ij>
fuck:fuck<ij>
hello:hello<ij>
hey:hey<ij>
aah:aah<ij>
aargh:aargh<ij>
agh:agh<ij>
ah:ah<ij>
aha:aha<ij>
ahem:ahem<ij>
ahh:ahh<ij>
aw:aw<ij>
aww:aww<ij>
aye:aye<ij>
bah:bah<ij>
boo:boo<ij>
brr:brr<ij>
bye:bye<ij>
crap:crap<ij>
crud:crud<ij>
damn:damn<ij>
darn:darn<ij>
d'oh:d'oh<ij>
doh:doh<ij>
eh:eh<ij>
goddamn:goddamn<ij>
grr:grr<ij>
ha:ha<ij>
hah:hah<ij>
haha:haha<ij>
heh:heh<ij>
hehe:hehe<ij>
hi:hi<ij>
hm:hm<ij>
hmm:hmm<ij>
hmph:hmph<ij>
hrm:hrm<ij>
huh:huh<ij>
like:like<ij>
lol:lol<ij>
omg:omg<ij>
ok:ok<ij>
ooh:ooh<ij>
oops:oops<ij>
ouch:ouch<ij>
oww:oww<ij>
phew:phew<ij>
shh:shh<ij>
shit:shit<ij>
sorry:sorry<ij>
thanks:thanks<ij>
ugh:ugh<ij>
uh:uh<ij>
uh-huh:uh-huh<ij>
umm:umm<ij>
welcome:welcome<ij>
well:well<ij>
what:what<ij>
whew:whew<ij>
whoa:whoa<ij>
woohoo:woohoo<ij>
yay:yay<ij>
Punctuations (Google pos: .)
These are more or less the same everywhere, apart from directionality and some orthographic variation.
':'<apos>
,:,<cm>
-:-<guio>
--:–<guio>
–:-<guio>
—:—<guio>
(:(<lpar>
[:[<lpar>
":"<lquot>
“:“<lquot>
«:«<lquot>
»:«<lquot>
):)<rpar>
]:]<rpar>
":"<rquot>
”:”<rquot>
»:»<rquot>
_:_<sent>
:::<sent>
;:;<sent>
!:!<sent>
?:?<sent>
.:.<sent>
#:#<sent>
%:%<sent>
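The straight quote " gets both an lquot and an rquot entry, so its direction has to come from context. One simple heuristic, sketched here under the assumption that quotes are balanced and never nested (real text often breaks this), is to alternate opening and closing; other tokens pass through untagged. The function name is hypothetical.

```python
def tag_straight_quotes(tokens: list[str]) -> list[str]:
    """Tag each ASCII '"' as <lquot> or <rquot> by simple alternation."""
    tagged = []
    opening = True
    for token in tokens:
        if token == '"':
            tagged.append('"<lquot>' if opening else '"<rquot>')
            opening = not opening  # the next quote closes (or reopens)
        else:
            tagged.append(token)
    return tagged


print(tag_straight_quotes(["He", "said", '"', "hi", '"']))
```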