Difference between revisions of "Northern Sámi and Norwegian/tNxIntros"

Revision as of 18:35, 24 April 2014

Here are the comments from the beginning of each tNx file. Read at your own risk; you may have forgotten to update them...

apertium-sme-nob.sme-nob.t1x

This is the first pass. Responsibilities of this file include:

  • Chunking (all rules)
  • Handling part-of-speech changes introduced by bidix.
    • Rules match on source tags, so a sme verb which is specified as a nob noun has to be handled by the verb rule (this leads to some redundancy, hopefully we'll get a bidix module soon)
  • (De-)compounding
    • See rule: NOM.CMP NOM
  • goahti-derivation
    • See rule: VERB Der/goahti
  • Simple noun phrases
    • Heads and their simple modifiers/specifiers: adj nom, adj adj nom, det adj adj nom, num adj nom
    • See rule: DET ADJ_ATTR NOM
    • See macro: out_nom
  • Insert prepositions based on nominal case
    • These get their own chunk; t2x might have to remove them in co-ordination or post-position rules, or might change them if the verb requires something else
    • See macro: set_caseprep
  • Verb auxiliaries
    • Tags from sme verbs are used to output finite verb auxiliaries before the main verb. All verbs get their own chunks, as do lemq's (since they can move around noun phrases in t3x)
    • See macro: out_verb
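
As a rough illustration of "insert prepositions based on nominal case", the following Python sketch models the idea outside of the actual transfer rule format. Everything here is an assumption made for illustration: the `Chunk` type, the `with_caseprep` function and the case table are hypothetical, and only the essive to 'som' pairing is taken from this page. The real logic lives in the set_caseprep macro.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    name: str   # chunk category, e.g. "SN" (noun phrase) or "PR" (preposition)
    lemma: str  # lemma carried by the chunk


# Hypothetical case-to-preposition table. Only Ess -> "som" is mentioned on
# this page; the other entries are placeholders, not the pair's real mapping.
CASE_PREP = {"Ess": "som", "Loc": "i", "Com": "med"}


def with_caseprep(noun_lemma: str, case: str) -> List[Chunk]:
    """Return the noun chunk, preceded by a separate preposition chunk
    when the source case calls for one."""
    chunks = []
    prep = CASE_PREP.get(case)
    if prep is not None:
        # The preposition gets its own chunk so that a later pass can
        # remove or change it without touching the noun phrase.
        chunks.append(Chunk("PR", prep))
    chunks.append(Chunk("SN", noun_lemma))
    return chunks
```

Keeping the preposition in a chunk of its own is what would let a later pass drop it again, for example in co-ordination.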

apertium-sme-nob.sme-nob.t2x

This is the second pass. Responsibilities of this file include:

  • Simple anaphora resolution; we keep track of gender of p3 sg subjects using the variables "ana_m_f" and "ana_gen", applying this to GD-tagged SN's and all FV's.
    • See rule: FV, SN
    • Note: this only applies gender in a forward direction; applying gender from the noun to the pronoun in 'Pron is N' constructions happens in t3x
  • Keeping track of verb temps for e.g. Actio.Ess (which is tagged TD) using the variable "ana_temps"
    • See rule: FV, V.TD
  • Co-ordination
    • Removes superfluous prepositions
    • See rule: SN CNP SN, FV CVP FV
  • Some definiteness changes for genitive clauses etc.
    • See rule: SN_rN SN
  • Moving postpositions
    • See rule: SN ADPOS -> ADPOS SN
  • Removing prepositions when case is governed by adpositions
    • See rule: PR SN_RPOST adpos
  • Removing essive caseprep ('som') after vcop
    • See rule: vcop-FV caseprep-PR.ess
  • Removing Pcle.Qst and putting the Qst tag on the preceding chunk, so that they're treated as if there were no space
    • See rule: SN PCLE.Qst
    • PCLE.Qst variants could probably be added to many of the other rules here too; this is still TODO, but the most likely cases seem to be covered.
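
Two of the reorderings above (SN ADPOS -> ADPOS SN, and folding PCLE.Qst into the preceding chunk) can be sketched in Python as follows. This is only a toy model under stated assumptions: chunks are represented as hypothetical `(category, tags)` pairs and `second_pass` is an invented name; the actual rules are written in the transfer rule format of the t2x file.

```python
from typing import List, Tuple

# Hypothetical chunk representation: (category, list of tags).
Chunk = Tuple[str, List[str]]


def second_pass(chunks: List[Chunk]) -> List[Chunk]:
    """Toy model of two t2x responsibilities: moving postpositions in
    front of their noun phrase and absorbing a question particle."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        name, tags = chunks[i]
        nxt = chunks[i + 1] if i + 1 < len(chunks) else None
        if name == "SN" and nxt is not None and nxt[0] == "ADPOS":
            # SN ADPOS -> ADPOS SN
            out.append(nxt)
            out.append((name, tags))
            i += 2
        elif name == "SN" and nxt is not None and nxt[0] == "PCLE" and "Qst" in nxt[1]:
            # Drop the particle chunk and keep its Qst tag on the SN.
            out.append((name, tags + ["Qst"]))
            i += 2
        else:
            out.append((name, tags))
            i += 1
    return out
```

For instance, `second_pass([("SN", ["sg"]), ("ADPOS", [])])` puts the adposition chunk first, and an `SN` followed by `("PCLE", ["Qst"])` comes out as a single `SN` chunk carrying the extra `Qst` tag.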

TODO:

  • relatives (SN "who" SV SV* -> SN), both 'real' and those that started life as Agent Constructions
  • 'adpos' covers both Pr and Po, tag them as such in t1x so that the rules here don't overgeneralise


apertium-sme-nob.sme-nob.t3x

This is the third pass. Responsibilities of this file include:

  • V2 Movement etc., rules which involve verb, adverb and/or noun chunks
    • See rule: SPEC SN FV
  • Inserting dropped pronouns
    • Overridden by rules matching the maybe-[LR]SUBJ categories; these include unknowns and should just be passed through unchanged.
    • See macro: set_pro
  • Inserting adverbs to indicate modality
    • See macro: set_adv
  • Correct definiteness using the larger context (e.g. verb animacy / number / temps)
    • See macro: set_defnes2
  • Change perfect participle of non-finites to preterite when following the negation verb
    • See rules with FV.Neg
  • Using verb animacy/number to guess GD/ND subject gender/number
    • See macro: modify_GD_ND_subj3
    • Only considers pre-verbal GD/ND subjects; post-verbal ones are handled in t2x
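
The V2 movement item can be pictured with a very small Python sketch. It assumes, without confirmation from the rule file itself, that the SPEC SN FV rule outputs the finite verb chunk in second position, ahead of the noun phrase; the function name `v2_reorder` and the use of bare category names are hypothetical.

```python
from typing import List


def v2_reorder(names: List[str]) -> List[str]:
    """Toy V2 movement: if a sequence starts with SPEC SN FV, emit the
    finite verb chunk second (SPEC FV SN). Other input is passed through."""
    if names[:3] == ["SPEC", "SN", "FV"]:
        return ["SPEC", "FV", "SN"] + names[3:]
    return list(names)
```

Sequences that do not start with that pattern are returned unchanged, in the same spirit as the pass-through behaviour described for the maybe-[LR]SUBJ categories.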

TODO:

  • Using verb animacy/number to guess GD/ND subject gender/number
    • See macro: modify_GD_ND_subj3

apertium-sme-nob.sme-nob.t4x

This is the fourth pass. Responsibilities of this file include:

  • Inserting articles
    • See rule: pre_nom
    • See macro: maybe_out_det2
  • Cleanup
    • Making sure tags are consistent with nob.dix (esp. adjectives, personal pronouns)
    • See macro: clean_adj
    • See macro: clean_det (also used for numerals)
  • Does not output spaces occurring after a 'cmp'
    • See rule: det_cmp_nom
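
The "no space after a 'cmp'" behaviour can be sketched in Python like this. The token representation (a surface form plus a boolean marking a compound part) and the name `join_tokens` are assumptions for illustration; the real handling is in the det_cmp_nom rule.

```python
from typing import List, Tuple


def join_tokens(tokens: List[Tuple[str, bool]]) -> str:
    """Join (surface, is_cmp) pairs into a string, leaving out the space
    that would normally follow a token marked as a compound part."""
    parts: List[str] = []
    for surface, is_cmp in tokens:
        parts.append(surface)
        if not is_cmp:
            parts.append(" ")
    # Remove the trailing space added after the last token.
    return "".join(parts).rstrip()
```

With the compound part `skole` marked as 'cmp', `join_tokens([("en", False), ("skole", True), ("bok", False)])` yields `"en skolebok"`.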

TODO: prpers entries in nob.dix, to avoid the clean_pron mess.