Difference between revisions of "Constructing a TSX file with a Constraint Grammar"
(Redirecting to Apertium and Constraint Grammar)
(4 intermediate revisions by the same user not shown)

Added at line 1:
#redirect[[Apertium and Constraint Grammar]]

Removed from line 1:
{{TOCD}}

Constraint Grammar (CG) is a method of POS-tagging ambiguous text. The apertium-tagger has a basic form of this in addition to probabilistic tagging. It should be possible to use an existing CG to write a new [[TSX]] (tagger definition file), or to improve an existing one, by converting its rules into sets of forbid and enforce rules.

==Terminology==

* cohort: a [[surface form]] of a word, along with its analyses (readings).
::Apertium equivalent: <code><nowiki>^words/word<n><pl>/word<vblex><pres><p3><sg>$</nowiki></code>
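The correspondence between a cohort and an Apertium lexical unit can be sketched in Python. This is a minimal illustration, not part of Apertium or vislcg3: <code>Reading</code>, <code>Cohort</code> and <code>parse_lexical_unit</code> are hypothetical names, and the parser assumes simple units of the form <code>^surface/lemma<tag>.../...$</code>, ignoring escaped characters, unknown words and multiword or compound analyses.

```python
import re
from dataclasses import dataclass


@dataclass
class Reading:
    lemma: str
    tags: list[str]


@dataclass
class Cohort:
    surface: str               # the surface form, e.g. "words"
    readings: list[Reading]    # one entry per analysis


# An analysis is a lemma followed by one or more <tag> groups.
ANALYSIS_RE = re.compile(r"^(?P<lemma>[^<]*)(?P<tags>(?:<[^<>]+>)+)$")


def parse_lexical_unit(unit: str) -> Cohort:
    """Parse ^surface/analysis1/analysis2$ into a CG-style cohort.

    Escaped characters, unknown words and joined analyses are not handled.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        m = ANALYSIS_RE.match(analysis)
        if m is None:
            raise ValueError(f"cannot parse analysis: {analysis!r}")
        tags = re.findall(r"<([^<>]+)>", m.group("tags"))
        readings.append(Reading(m.group("lemma"), tags))
    return Cohort(surface, readings)


cohort = parse_lexical_unit("^words/word<n><pl>/word<vblex><pres><p3><sg>$")
print(cohort.surface)        # words
for r in cohort.readings:
    print(r.lemma, r.tags)   # word ['n', 'pl'], then word ['vblex', 'pres', 'p3', 'sg']
```

Each analysis after the surface form becomes one reading of the cohort, which is the unit CG rules remove or select from.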

==Labels==

TSX coarse-tag "labels" correspond in Constraint Grammar (CG) to definitions made with either {{sc|list}} or {{sc|set}}. Sometimes, however, these are not complete sets, so they may need to be combined.

For example:

<pre>
LIST A-N-CC = A N CC ;
LIST A-pos = (A Pos) ;
LIST %etter/fram/opp% = ("etter" Pr) ("fram" Pr) ("frem" Pr) ("opp" Pr) ;
</pre>

These are three lists, which can be expressed in TSX format as follows:

<pre>
<def-label name="A-N-CC">
  <tags-item tags="adj.*"/>
  <tags-item tags="n.*"/>
  <tags-item tags="cnjcoo"/>
</def-label>
<def-label name="A-pos">
  <tags-item tags="adj.pos.*"/>
</def-label>
<def-label name="%etter/fram/opp%">
  <tags-item lemma="etter" tags="pr"/>
  <tags-item lemma="fram" tags="pr"/>
  <tags-item lemma="frem" tags="pr"/>
  <tags-item lemma="opp" tags="pr"/>
</def-label>
</pre>

And so on. Note that this may cause some problems, so it may be best to start by converting only the ambiguous tags.
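The conversion of lists into labels is mechanical enough to script. The Python sketch below turns a single-line CG {{sc|list}} definition into a TSX <code>def-label</code>. It works only under stated assumptions: <code>TAG_MAP</code> (CG tag to Apertium tag) and <code>CLOSED</code> (tags that get no trailing <code>.*</code>) are hypothetical tables chosen to reproduce the three examples, unknown tags raise <code>KeyError</code>, and word forms, regular expressions and {{sc|set}} operators are not handled.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical CG-to-Apertium tag table covering only the examples above.
TAG_MAP = {"A": "adj", "N": "n", "CC": "cnjcoo", "Pos": "pos", "Pr": "pr"}
# Tags assumed to have no further subtags, so no ".*" is appended after them.
CLOSED = {"cnjcoo", "pr"}

LIST_RE = re.compile(r"^\s*LIST\s+(\S+)\s*=\s*(.*?)\s*;\s*$")
ITEM_RE = re.compile(r"\([^)]*\)|[^\s()]+")  # "(A Pos)" or a bare tag like "CC"


def tags_item(item: str) -> str:
    """Turn one CG list item, e.g. A, (A Pos) or ("etter" Pr), into a tags-item."""
    lemma = None
    tags = []
    for tok in item.strip("()").split():
        if tok.startswith('"') and tok.endswith('"'):
            lemma = tok.strip('"')        # quoted token = base form (lemma)
        else:
            tags.append(TAG_MAP[tok])     # KeyError for tags not in the table
    pattern = ".".join(tags)
    if tags and tags[-1] not in CLOSED:
        pattern += ".*"                   # allow any further subtags
    attrs = f"tags={quoteattr(pattern)}"
    if lemma is not None:
        attrs = f"lemma={quoteattr(lemma)} " + attrs
    return f"  <tags-item {attrs}/>"


def cg_list_to_tsx(line: str) -> str:
    """Convert a single-line CG LIST definition into a TSX def-label."""
    m = LIST_RE.match(line)
    if m is None:
        raise ValueError(f"not a LIST definition: {line!r}")
    name, body = m.groups()
    lines = [f"<def-label name={quoteattr(name)}>"]
    lines += [tags_item(item) for item in ITEM_RE.findall(body)]
    lines.append("</def-label>")
    return "\n".join(lines)


for cg in [
    "LIST A-N-CC = A N CC ;",
    "LIST A-pos = (A Pos) ;",
    'LIST %etter/fram/opp% = ("etter" Pr) ("fram" Pr) ("frem" Pr) ("opp" Pr) ;',
]:
    print(cg_list_to_tsx(cg))
```

Keeping the tag table explicit makes it easy to restrict the conversion to ambiguous tags first and extend it gradually.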

==Constraints==

Constraint Grammar uses a series of hand-written constraints in order to POS-tag ambiguous words.

===Forbid rules===

The operation analogous to a ''forbid rule'' is {{sc|remove}}.

<pre>
# 3526
"<bare>" REMOVE (CS) IF
    (-1 CS)
;
</pre>

This rule applies to the word form "bare" (in CG, angle brackets inside the quotes mark a word form rather than a lemma), which can be a subordinating conjunction, a verb or an adverb. It removes the subordinating-conjunction reading of "bare" when the preceding word (position -1) has a subordinating-conjunction reading. The TSX rule below covers the case where that preceding word is itself "bare", i.e. it forbids the sequence "bare bare" with both words tagged as subordinating conjunctions:

<pre>
<forbid>
  <label-sequence>
    <label-item label="bare-CS"/>
    <label-item label="bare-CS"/>
  </label-sequence>
</forbid>
</pre>

This presumes a label definition such as:

<pre>
<def-label name="bare-CS">
  <tags-item lemma="bare" tags="cnjsub"/>
</def-label>
</pre>
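To make the CG semantics of rule 3526 concrete, here is a minimal Python sketch of how it treats a short sentence. It assumes a toy representation in which each reading is a set of CG tags with the lemma stored as a quoted tag; <code>Cohort</code>, <code>reading</code>, <code>has_tag</code> and <code>apply_remove</code> are hypothetical helpers, not vislcg3 or Apertium APIs. The context test is non-careful (at least one reading of the neighbouring word must carry the tag), and, as in CG, the last remaining reading of a cohort is never removed.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    wordform: str
    readings: list[frozenset[str]]  # each reading: a set of CG tags


def reading(*tags: str) -> frozenset[str]:
    return frozenset(tags)


def has_tag(cohort: Cohort, tag: str) -> bool:
    """Non-careful context test: at least one reading carries the tag."""
    return any(tag in r for r in cohort.readings)


def apply_remove(sentence, wordform, target, offset, ctx_tag):
    """Simplified single pass of: "<wordform>" REMOVE (target) IF (offset ctx_tag) ;"""
    for i, cohort in enumerate(sentence):
        j = i + offset
        if cohort.wordform != wordform or not 0 <= j < len(sentence):
            continue
        if not has_tag(sentence[j], ctx_tag):
            continue
        kept = [r for r in cohort.readings if target not in r]
        if kept:  # never remove the last remaining reading
            cohort.readings = kept
    return sentence


# "at bare ...": "at" is a subordinating conjunction, "bare" is three-way ambiguous.
sentence = [
    Cohort("at", [reading('"at"', "CS")]),
    Cohort("bare", [reading('"bare"', "CS"), reading('"bare"', "Adv"),
                    reading('"bare"', "V")]),
]
apply_remove(sentence, "bare", "CS", -1, "CS")
print(len(sentence[1].readings))  # 2: the CS reading of "bare" is gone
```

The TSX forbid rule reaches a similar effect differently: rather than deleting a reading, it rules out the forbidden label sequence for the tagger.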

===Enforce rules===

The operation analogous to an ''enforce rule'' is {{sc|select}}, which "selects a reading, if it contains a TARGETed tag. In practice, selection is equivalent to a removal of all other readings."

<pre>
# 2866
SELECT (A Sg Neu Indef) IF
    (0 %rundt%)
    (1 Det-Qnt)
;
</pre>

This means: select the reading <code>adj.sg.nt.indef</code> if the word itself (position 0) belongs to the set %rundt%, i.e. is a form of "rundt", and the next word, to the right (position 1), can be read as a quantifier, <code>det.qnt</code>.
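Selection as "removal of all other readings" can likewise be sketched in Python. As an assumption, <code>%rundt%</code> is modelled here as matching the word form "rundt", and <code>Det-Qnt</code> as any reading carrying both <code>Det</code> and <code>Qnt</code>; all function names are hypothetical, and the example readings are illustrative rather than taken from an actual grammar or dictionary.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    wordform: str
    readings: list[frozenset[str]]  # each reading: a set of CG tags


def reading_has(*tags):
    """Reading test for a composite CG tag such as (A Sg Neu Indef)."""
    wanted = frozenset(tags)
    return lambda r: wanted <= r


def any_reading(test):
    """Non-careful context test: some reading of the cohort passes `test`."""
    return lambda cohort: any(test(r) for r in cohort.readings)


def wordform_is(form):
    return lambda cohort: cohort.wordform == form


def apply_select(sentence, target, conditions):
    """SELECT target IF conditions; conditions are (offset, cohort_test) pairs.

    Selecting keeps only the target readings, i.e. removes all other readings.
    """
    for i, cohort in enumerate(sentence):
        chosen = [r for r in cohort.readings if target(r)]
        context_ok = all(
            0 <= i + off < len(sentence) and test(sentence[i + off])
            for off, test in conditions
        )
        if chosen and context_ok:
            cohort.readings = chosen
    return sentence


# Hypothetical stand-ins for the sets used by rule 2866.
RUNDT = wordform_is("rundt")                      # assumed meaning of %rundt%
DET_QNT = any_reading(reading_has("Det", "Qnt"))  # Det-Qnt
TARGET = reading_has("A", "Sg", "Neu", "Indef")

sentence = [
    Cohort("rundt", [frozenset({'"rundt"', "Adv"}),
                     frozenset({'"rundt"', "Pr"}),
                     frozenset({'"rund"', "A", "Pos", "Sg", "Neu", "Indef"})]),
    Cohort("noen", [frozenset({'"noen"', "Det", "Qnt"})]),
]
apply_select(sentence, TARGET, [(0, RUNDT), (1, DET_QNT)])
print(len(sentence[0].readings))  # 1: only the adjective reading is left
```

If the word to the right is not a quantifier, or no reading matches the target, the cohort keeps all of its readings.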

To convert rule 2866 into Apertium format, one would need to take every coarse tag other than <code>det.qnt</code> and forbid it from following the %rundt% label, writing one label sequence per tag, as below:

<pre>
<forbid>
  <label-sequence>
    <label-item label="%rundt%"/>
    <label-item label="A-pos"/>
  </label-sequence>

  ...

</forbid>
</pre>

===Prefer tags===

==Further reading==

* [http://beta.visl.sdu.dk/cg3.html vislcg3 documentation]
* [http://beta.visl.sdu.dk/cg2_howto.html VISL: Basic how-to for vislcg (vislcg2)]

Note that vislcg3 is the version which is actively developed.

[[Category:Documentation]]
Latest revision as of 10:28, 23 March 2009
Redirect to: [[Apertium and Constraint Grammar]]