TSX format

From Apertium

Revision as of 12:12, 2 March 2008

The 'tagset' section defines the correspondence between the simple or multiple morphological categories that define a lexical form and the coarser categories with which the part-of-speech tagger works.

Each 'def-label' defines one coarse tag in terms of a list of fine tags and has a mandatory, unique name. The optional attribute 'closed="true"' specifies that the fine tags it defines belong to a closed list.

Each 'tags-item' gives a dot-separated subsequence of the morphological tags corresponding to a coarse tag, optionally in association with a given lemma.
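
As an illustration of 'def-label' and 'tags-item', a 'tagset' fragment might look like the sketch below. The coarse tag names (PREP, VLEXINF, VHABERPI), the fine tags and the lemma are invented for this example, and the attribute names 'tags' and 'lemma' on 'tags-item' are assumed to follow the Apertium tagger DTD.

```xml
<tagset>
  <!-- Coarse tag for prepositions, marked as belonging to a closed list -->
  <def-label name="PREP" closed="true">
    <tags-item tags="pr"/>
  </def-label>

  <!-- Coarse tag defined by a dot-separated sequence of fine tags -->
  <def-label name="VLEXINF">
    <tags-item tags="vblex.inf"/>
  </def-label>

  <!-- Coarse tag tied to one lemma (hypothetical lemma and tags) -->
  <def-label name="VHABERPI">
    <tags-item lemma="haber" tags="vbhaver.pri"/>
  </def-label>
</tagset>
```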

Each 'def-mult' defines one coarse tag in terms of a sequence of coarse tags previously defined as 'def-labels' or a sequence of fine tags. Each 'def-mult' requires a name and may also have the optional attribute 'closed="true"' if it belongs to a closed list.

Element 'sequence' encloses a set of tags or labels that defines a unit with more than one label.

The 'label' of each 'label-item' corresponds to a coarse tag previously defined by name in a 'def-label'.
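
A 'def-mult' built from previously defined labels could be sketched as follows. The names VLEXINF, PRNENC and INF_PRNENC are hypothetical and chosen only to show how 'sequence' and 'label-item' nest; they are not taken from any particular language pair.

```xml
<tagset>
  <!-- Coarse tags that the def-mult below refers to by name -->
  <def-label name="VLEXINF">
    <tags-item tags="vblex.inf"/>
  </def-label>
  <def-label name="PRNENC" closed="true">
    <tags-item tags="prn.enc"/>
  </def-label>

  <!-- One coarse tag for a unit made of two labels: infinitive + enclitic pronoun -->
  <def-mult name="INF_PRNENC">
    <sequence>
      <label-item label="VLEXINF"/>
      <label-item label="PRNENC"/>
    </sequence>
  </def-mult>
</tagset>
```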

Element 'forbid' contains sequences of morphological categories that are not allowed in a given language.

Each 'label-sequence' is restricted to two 'label-items'.
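
A forbidden pair of coarse tags might be written as in this sketch. The labels PREP and VLEXFIN are assumed to have been defined earlier in 'tagset'; both the names and the rule itself are only illustrative.

```xml
<forbid>
  <!-- Hypothetical rule: a preposition may not be followed directly by a finite verb -->
  <label-sequence>
    <label-item label="PREP"/>
    <label-item label="VLEXFIN"/>
  </label-sequence>
</forbid>
```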

Element 'enforce-rules' defines sets of coarse tags that must follow specified ones.

Each 'enforce-after' encloses the set of coarse tags ('label-set') that must follow the coarse tag named in its mandatory 'label' attribute.

The set of 'label-items' enforced after a 'label' is enclosed inside element 'label-set'.
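
The nesting of 'enforce-rules', 'enforce-after' and 'label-set' can be sketched like this. The labels PRNPROCL, VLEXFIN and VHABERFIN are hypothetical and are assumed to be defined in 'tagset'.

```xml
<enforce-rules>
  <!-- Hypothetical rule: a proclitic pronoun must be followed by one of the listed coarse tags -->
  <enforce-after label="PRNPROCL">
    <label-set>
      <label-item label="VLEXFIN"/>
      <label-item label="VHABERFIN"/>
    </label-set>
  </enforce-after>
</enforce-rules>
```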

Element 'preferences' makes it possible to choose among two or more fine tag sequences that are grouped under the same coarse tag.

Each 'prefer' element has a mandatory attribute 'tags' made of a sequence of fine tags.
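
A minimal 'preferences' section might look like the sketch below. The fine tag sequence is an invented example: it assumes a coarse tag that groups first- and third-person singular present subjunctive verb forms, and it expresses a preference for the third-person reading.

```xml
<preferences>
  <!-- Hypothetical preference: choose this fine tag sequence when the coarse tag is ambiguous -->
  <prefer tags="vblex.prs.p3.sg"/>
</preferences>
```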