Difference between revisions of "User:Naan Dhaan/User friendly lexical training"

From Apertium
m (week 4)
m (week 5)
Line 37:
  | June 22-28
  | bug fixes
+ |
+ |-
+ | June 29-2
+ | some more bug fixes
  |
  |}

Revision as of 18:43, 5 July 2021

The lexical selection module selects the right sentence in context, based on lexical selection rules, from the multiple (ambiguous) sentences output by the transfer module. These rules can be written manually or inferred automatically by training on a corpus. However, the training process is tedious: it involves tools such as irstlm, fast-align, and moses, along with scripts such as extract-sentences, extract-freq-lexicon, and process-tagger-output, all of which require a lot of manual configuration.
The goal of this project is to make this process as simple and automated as possible, with minimal user involvement. In a nutshell, there should be a single config file, and the user runs the entire training through a driver script. The project also aims to design regression tests for the driver script so that it keeps working as the third-party tools are updated, and to train on different corpora and add lexical selection rules for languages that have few or none, thereby improving translation quality.

Work Plan

Time Period | Details | Deliverable
Community Bonding Period, May 17-31 | helper script check_config.py to check that the configuration and tools are fine; automated test script to test check_config.py | driver script can validate that the required tools are set up
Community Bonding Period, June 1-7 | reading Apertium documentation |
June 8-14 | added installation instructions in README; incorporated clean_corpus in the driver script; added code for tagging; added code for aligning | driver script can now clean the corpus, tag it, and generate alignments
June 15-21 | full driver script complete (requires testing) | driver script can now generate rules
June 22-28 | bug fixes |
June 29-July 2 | some more bug fixes |
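
The tool check listed for May 17-31 could look roughly like the sketch below. This is not the actual check_config.py: the function name find_missing_tools and the executable names in REQUIRED_TOOLS are assumptions made for illustration, and the real script may check different tools or also validate the config file.

```python
import shutil

# Hypothetical list of executables; the real check_config.py may look for
# different names or read them from the config file.
REQUIRED_TOOLS = ["apertium", "lrx-comp", "fast_align"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH.

    `which` is injectable so the check can be tested without the tools
    being installed; by default it is shutil.which from the standard library.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

Passing the lookup function as a parameter keeps an automated test for the check independent of what is installed on the test machine, which matches the plan's separate test script for check_config.py.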