Difference between revisions of "User:Khannatanmai/Progress"

From Apertium
= GSoC Progress =


== Work Done ==
'''Community Bonding Period'''

* Read the documentation in full
* Went through the code of the Apertium core (pretransfer, transfer, tagger, etc.)

== Work Plan ==

'''Community Bonding Period''' (May 6 - May 27)
* Understand the Apertium pipeline fully
* Understand and modify individual files
* Get familiar with the files that I need to modify
* Formalise the problem and limit the scope of anaphora resolution to the anaphora needed for MT
* Draw a flowchart of the proposed system
* Write pseudocode for identifying salience factors
* '''Study the EuroParl corpus''' and see which anaphors the method will be able to resolve on paper
* Explore Constraint Grammar and use it if it proves to be beneficial

'''Week 1''' (May 27)
* Automatically annotate anaphora in the EuroParl corpus for evaluation
* Implement a preliminary scoring system for antecedent indicators (Spanish-English and Catalan-English only, for now)
* Decide on a fixed-size context window
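The preliminary scoring system could start as a plain weighted sum over antecedent indicators. The C++ sketch below is a minimal illustration only and is not the project's implementation. It makes two assumptions. First, each indicator carries an integer weight, positive for boosting indicators and negative for impeding ones. Second, candidates arrive most recent first. The names `Indicator`, `Candidate`, `score` and `best_candidate` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical antecedent indicator: a name and an integer weight.
// Boosting indicators get positive weights, impeding ones negative weights.
struct Indicator {
    std::string name;
    int weight;
};

// A candidate antecedent (an NP) and the indicators that fired for it.
struct Candidate {
    std::string lemma;
    std::vector<Indicator> fired;
};

// Score of a candidate: the sum of the weights of its indicators.
int score(const Candidate& candidate) {
    int total = 0;
    for (const Indicator& indicator : candidate.fired) {
        total += indicator.weight;
    }
    return total;
}

// Index of the highest-scoring candidate. Candidates are assumed to be
// ordered most recent first, so on a tie the most recent one is kept.
std::size_t best_candidate(const std::vector<Candidate>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (score(candidates[i]) > score(candidates[best])) {
            best = i;
        }
    }
    return best;
}
```

Consider a candidate with two boosting indicators of weight 1. It scores 2 and is preferred over a candidate whose only indicator is an impeding one of weight -1. Keeping the weights as data, rather than hard-coding them, would make it straightforward to retune them per language pair later in the plan.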

'''Week 2''' (June 3)
* Implement basic anaphora resolution for possessive pronouns in C++
* Create a transfer-pattern-like file to mark possible NPs as antecedents
* Implement transfer rules for possessive pronouns
* Have a basic prototype ready
* TEST the prototype within the pipeline

'''Week 3''' (June 10)
* Implement basic anaphora resolution for reflexive pronouns (on verbs) [Spanish and Catalan]
* Implement basic anaphora resolution for zero pronouns (on verbs) [Spanish and Catalan]
* Implement transfer rules for the above
* TEST the system extensively
* Document the outline

'''Week 4''' (June 17)
* Implement the system to identify all possible antecedents
* Add the ability to score antecedents
* TEST basic sentences with a single antecedent; test the pipeline
* Test and evaluate possessive, reflexive and zero pronouns in the Spa-Eng pair
* Test and evaluate possessive, reflexive and zero pronouns in the Cat-Eng pair

=== Deliverable #1: Anaphora Resolution for single antecedents, with transfer rules [The full pipeline] ===

'''Evaluation 1: June 24-28'''

'''Week 5''' (June 28)
* Implement antecedent indicators:
** Code to identify boosting indicators
** Code to identify Lappin and Leass salience factors (as many as possible)
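The Lappin and Leass factors could be encoded directly, using the weights usually cited from Lappin and Leass (1994). This is a hedged sketch and covers only two things: the seven factor weights, and the rule that salience is halved at each sentence boundary. It omits the pronoun-specific adjustments of the original algorithm, such as role parallelism. `Factor`, `weight` and `salience` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Salience factors of Lappin and Leass (1994), as usually cited.
enum class Factor {
    SentenceRecency,
    SubjectEmphasis,
    ExistentialEmphasis,
    AccusativeEmphasis,
    IndirectObjectEmphasis,  // indirect object and oblique complement
    HeadNounEmphasis,
    NonAdverbialEmphasis
};

// Initial weight of each factor.
int weight(Factor factor) {
    switch (factor) {
        case Factor::SentenceRecency:        return 100;
        case Factor::SubjectEmphasis:        return 80;
        case Factor::ExistentialEmphasis:    return 70;
        case Factor::AccusativeEmphasis:     return 50;
        case Factor::IndirectObjectEmphasis: return 40;
        case Factor::HeadNounEmphasis:       return 80;
        case Factor::NonAdverbialEmphasis:   return 50;
    }
    return 0;  // not reached for valid enumerators
}

// Salience of a mention seen `sentences_back` sentences before the pronoun:
// the sum of its factor weights, halved once per sentence boundary crossed.
double salience(const std::vector<Factor>& factors, int sentences_back) {
    assert(sentences_back >= 0);
    double total = 0.0;
    for (Factor factor : factors) {
        total += weight(factor);
    }
    for (int i = 0; i < sentences_back; ++i) {
        total /= 2.0;
    }
    return total;
}
```

Take a subject NP in the current sentence that is neither embedded in another NP nor inside an adverbial PP. It collects sentence recency, subject, head noun and non-adverbial emphasis, for a salience of 310. The same mention one sentence back drops to 155.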

'''Week 6''' (July 4)
* Code to identify impeding indicators
* Code to identify the antecedent that an adjective must agree with
* Implement transfer rules for adjective agreement in Cat-Eng and Spa-Eng
* Code to remember antecedents within a fixed window
* Assign scores to the antecedent indicators
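Remembering antecedents for a certain window could be done with a fixed-capacity queue of sentences, where each sentence holds the candidate NPs found in it. The sketch assumes the window is measured in sentences and that candidates are plain strings. `AntecedentWindow` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical memory of candidate antecedents for the last `capacity`
// sentences. Pushing a sentence into a full window forgets the oldest one.
class AntecedentWindow {
public:
    explicit AntecedentWindow(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Store the candidate NPs found in a newly processed sentence.
    void push_sentence(std::vector<std::string> candidates) {
        if (sentences_.size() == capacity_) {
            sentences_.pop_front();
        }
        sentences_.push_back(std::move(candidates));
    }

    // Every remembered candidate, most recent sentence first.
    std::vector<std::string> candidates() const {
        std::vector<std::string> out;
        for (auto it = sentences_.rbegin(); it != sentences_.rend(); ++it) {
            out.insert(out.end(), it->begin(), it->end());
        }
        return out;
    }

private:
    std::size_t capacity_;
    std::deque<std::vector<std::string>> sentences_;
};
```

With a capacity of 2, pushing a third sentence forgets the first one. `candidates()` then returns the remaining NPs with the most recent sentence first, which is the order a recency-based tie-breaker would expect.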

'''Week 7''' (July 10)
* Implement detection of the remaining salience features
* Implement the remaining transfer rules for pronoun anaphora in Cat-Eng and Spa-Eng
* Code salience indicators and implement tie-breaking
* Tune the scoring system based on its performance in the pairs
* TEST the system with French and Russian

'''Week 8''' (July 16)
* Implement a fallback for anaphora resolution (when there are too many candidate antecedents or no candidate passes the certainty threshold)
* TEST the scoring system
* TEST and evaluate the current system and report precision and recall
* TEST and evaluate adjective agreement
* TEST the system with Turkish and any other required language pairs
* Document the antecedent indicators, scoring system and fallback for Cat-Eng and Spa-Eng
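The fallback and the precision and recall figures interact: once the resolver is allowed to abstain, the two numbers can differ. The sketch below makes two assumptions. "Too many antecedents" is read as a candidate count above a fixed limit, and the certainty threshold is applied to the best score. It uses one common definition from anaphora resolution evaluation: precision = correctly resolved / attempted, and recall = correctly resolved / all anaphors. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A candidate antecedent with its final score (hypothetical type).
struct Scored {
    int id;
    double score;
};

// Fallback sketch: abstain (return std::nullopt) when there are no
// candidates, more than `max_candidates` of them, or when the best score
// is below `threshold`. Otherwise return the id of the best candidate.
std::optional<int> resolve(const std::vector<Scored>& candidates,
                           std::size_t max_candidates, double threshold) {
    assert(max_candidates > 0);
    if (candidates.empty() || candidates.size() > max_candidates) {
        return std::nullopt;
    }
    auto best = std::max_element(
        candidates.begin(), candidates.end(),
        [](const Scored& a, const Scored& b) { return a.score < b.score; });
    if (best->score < threshold) {
        return std::nullopt;
    }
    return best->id;
}

// Running counts for evaluation against a gold-standard annotation.
struct Counts {
    std::size_t total = 0;      // anaphors in the gold standard
    std::size_t attempted = 0;  // anaphors the resolver did not abstain on
    std::size_t correct = 0;    // attempted anaphors resolved correctly
};

// Record one anaphor; `predicted` is empty when the fallback fired.
void tally(Counts& counts, const std::optional<int>& predicted, int gold) {
    ++counts.total;
    if (predicted.has_value()) {
        ++counts.attempted;
        if (*predicted == gold) {
            ++counts.correct;
        }
    }
}

double precision(const Counts& c) {
    return c.attempted == 0 ? 0.0 : static_cast<double>(c.correct) / c.attempted;
}

double recall(const Counts& c) {
    return c.total == 0 ? 0.0 : static_cast<double>(c.correct) / c.total;
}
```

Suppose there are three anaphors: one is resolved correctly, one triggers the fallback, and one is resolved wrongly. Precision is then 1/2 while recall is 1/3. This is why reporting both, as planned, says more than a single accuracy figure.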

=== Deliverable #2: Anaphora resolution with salience feature detection, scores and a fallback mechanism ===

'''Evaluation 2: July 22-26'''

'''Week 9''' [OPTIONAL: if the current system does not produce good enough results]
* Implement the Expectation-Maximization algorithm using a monolingual corpus
* Compare it with the current system
* Test the EM algorithm and the implemented system

'''Week 9''' [NOT OPTIONAL] (July 26)
* Implement code to ignore embedded clauses
* Evaluate the resulting increase in detection
* Insert the module into the Apertium pipeline
* Implement code to accept input in chunks and process it

'''Week 10''' (August 1)
* EXTENSIVELY TEST the final system
* Test with French, Russian, Turkish, Galician, etc.
* Evaluate which features are language-agnostic
* Decide on the list of features for language-agnostic and for language-specific anaphora resolution

'''Week 11''' (August 7)
* Finish any remaining coding and improve the system
* TEST on multiple pairs and report evaluation scores
* TEST for, and ensure, backwards compatibility

'''Week 12''' (August 13)
* Wrap up the final module
* Complete the overall documentation with observations and future prospects

'''Final Evaluations: August 19-26'''

=== Project Completed ===
'''NOTE''': Weeks 11 and 12 include extra time to deal with unforeseen issues and ideas.
----

Latest revision as of 06:51, 11 May 2019