User:Khannatanmai/Eliminating Dictionary Trimming

From Apertium
Revision as of 18:04, 2 May 2020 by Khannatanmai

Proposal: User:Khannatanmai/GSoC2020Proposal_Trimming

Solution

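Propagate the surface form. If a word is in the monolingual dictionary (monodix) but not in the bilingual dictionary (bidix), generate its source surface form. This keeps the benefit of trimming (an untranslated word is output in its source form) while keeping the source analysis in the stream, which removes the disadvantages of trimming.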

Modifications Needed

  • Each module will be modified so that it can read secondary information from the stream and add it to the stream, in the form of secondary tags, as explained in the proposal. Given this modification, the following modules will be modified so that dictionary trimming is no longer needed.
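A minimal sketch of how a module might read and write secondary tags. The `<sf:potatoes>` tag comes from this page; the general rule that any tag containing a colon is a secondary `key:value` tag, and the names `LexicalUnit` and `parse_lu`, are assumptions for illustration, not the proposal's final stream format. Apertium's backslash escapes are ignored.

```python
import re
from dataclasses import dataclass, field

# Matches the contents of one <...> tag.
TAG_RE = re.compile(r"<([^<>]+)>")


@dataclass
class LexicalUnit:
    lemma: str
    tags: list[str] = field(default_factory=list)            # primary tags, e.g. n, pl
    secondary: dict[str, str] = field(default_factory=dict)  # e.g. {"sf": "potatoes"}

    def to_stream(self) -> str:
        """Serialise back to ^lemma<tags><key:value>$ form."""
        primary = "".join(f"<{t}>" for t in self.tags)
        extra = "".join(f"<{k}:{v}>" for k, v in self.secondary.items())
        return f"^{self.lemma}{primary}{extra}$"


def parse_lu(text: str) -> LexicalUnit:
    """Parse one disambiguated LU such as ^potato<n><pl><sf:potatoes>$.

    Assumption: a tag containing ':' is secondary (key:value); all other
    tags are primary. Backslash escapes in the stream are not handled.
    """
    body = text.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {text!r}")
    body = body[1:-1]
    lemma = body.split("<", 1)[0]
    lu = LexicalUnit(lemma)
    for tag in TAG_RE.findall(body[len(lemma):]):
        key, sep, value = tag.partition(":")
        if sep:
            lu.secondary[key] = value
        else:
            lu.tags.append(tag)
    return lu


lu = parse_lu("^potato<n><pl><sf:potatoes>$")
print(lu.lemma, lu.tags, lu.secondary)  # potato ['n', 'pl'] {'sf': 'potatoes'}
print(lu.to_stream())                   # ^potato<n><pl><sf:potatoes>$
```

Keeping secondary tags in a separate dictionary lets a module that does not care about them pass them through unchanged while still operating on the primary tags.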

Tagger

  • Modified so that it no longer discards the surface form of the lexical unit (LU) but instead adds it as a secondary tag, e.g. <sf:potatoes>.
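The tagger change can be sketched as follows: given an analysed LU of the form `^surface/analysis1/analysis2$`, output the chosen analysis with the surface form appended as `<sf:...>` instead of dropping it. The `pick` parameter is a hypothetical stand-in for the tagger's disambiguation decision, and the function name is illustrative only.

```python
def tag_with_surface(stream_lu: str, pick: int = 0) -> str:
    """Disambiguate an analysed LU but keep its surface form as <sf:...>.

    `pick` is a hypothetical stand-in for the tagger's choice of analysis;
    a real tagger would choose using its model.
    """
    if not (stream_lu.startswith("^") and stream_lu.endswith("$")):
        raise ValueError(f"not a lexical unit: {stream_lu!r}")
    surface, *analyses = stream_lu[1:-1].split("/")
    if not analyses:
        raise ValueError(f"no analyses in {stream_lu!r}")
    chosen = analyses[pick]
    # Unmodified behaviour would be f"^{chosen}$", dropping `surface`.
    return f"^{chosen}<sf:{surface}>$"


print(tag_with_surface("^potatoes/potato<n><pl>$"))
# ^potato<n><pl><sf:potatoes>$
print(tag_with_surface("^saw/see<vblex><past>/saw<n><sg>$", pick=1))
# ^saw<n><sg><sf:saw>$
```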

Pretransfer

  • The behaviour here depends on what we decide with respect to compounds. Without any modification, a compound XZY/X+Y in which only Y is in the bidix will be translated as XZY Y, which often makes the translation worse.
  • One solution is to effectively keep trimming compounds whenever we don't have a translation for all of their parts.
  • If we can find the surface form of each part of a compound, we can instead allow partial translations.
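The "keep trimming compounds" option can be sketched like this: split a compound only when every part has a bidix entry, and otherwise emit a single unknown word carrying the compound's surface form, roughly what a trimmed analyser would have produced. The assumptions are that the tagger appends `<sf:...>` to the end of the whole compound, that `bidix_lemmas` (a hypothetical set of lemmas) can stand in for a real bidix lookup, and that the unknown word is marked with `*` as in the usual stream.

```python
import re

SF_RE = re.compile(r"<sf:([^<>]*)>")


def pretransfer_compound(lu: str, bidix_lemmas: set[str]) -> list[str]:
    """Split a compound LU like ^X<n>+Y<n><sf:XZY>$ only if every part translates.

    `bidix_lemmas` is a hypothetical stand-in for a bidix lookup. If any part
    is missing, fall back to trimming-like behaviour: one unknown word
    carrying the compound's surface form.
    """
    body = lu[1:-1]
    match = SF_RE.search(body)
    surface = match.group(1) if match else None
    analysis = SF_RE.sub("", body)
    parts = analysis.split("+")
    if len(parts) == 1:
        return [lu]  # not a compound: pass through unchanged
    lemmas = [p.split("<", 1)[0] for p in parts]
    if all(lemma in bidix_lemmas for lemma in lemmas):
        # Full translation available. The parts get no <sf:...> tag because
        # the surface form of each individual part is not known.
        return [f"^{p}$" for p in parts]
    if surface is None:
        return [lu]  # nothing to fall back on
    return [f"^*{surface}$"]  # behave as if trimmed: unknown word


print(pretransfer_compound("^X<n>+Y<n><sf:XZY>$", {"X", "Y"}))  # ['^X<n>$', '^Y<n>$']
print(pretransfer_compound("^X<n>+Y<n><sf:XZY>$", {"Y"}))       # ['^*XZY$']
```

This avoids the "XZY Y" output at the cost of not translating Y at all; recovering per-part surface forms would be needed to do better.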

Transfer

  • We need to ensure that secondary tags stay attached to their counterparts in the target language (TL). This should already be handled by the stream modification described above.
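A toy sketch of why secondary tags stay attached through transfer when they live inside their own LU: the lemma is translated, primary and secondary tags are copied, and a reordering rule moves whole LUs. The dictionary-based `bidix`, the `@` prefix for missing translations, and the adjective-noun swap rule are simplifications for illustration; real bidix entries match on tags too, and real rules would also adjust agreement tags.

```python
import re

TAG_RE = re.compile(r"<([^<>]+)>")


def transfer_lu(lu: str, bidix: dict[str, str]) -> str:
    """Translate the lemma but copy secondary tags (key:value) across untouched."""
    body = lu[1:-1]
    lemma = body.split("<", 1)[0]
    tags = TAG_RE.findall(body)
    primary = [t for t in tags if ":" not in t]
    secondary = [t for t in tags if ":" in t]
    target = bidix.get(lemma)
    out_lemma = target if target is not None else "@" + lemma  # @ = no translation
    return "^" + out_lemma + "".join(f"<{t}>" for t in primary + secondary) + "$"


def adj_noun_rule(chunk: list[str], bidix: dict[str, str]) -> list[str]:
    """Toy rule: translate an adjective-noun pair and swap its order.

    Each <sf:...> tag lives inside its own LU, so it moves with that LU.
    """
    adj, noun = (transfer_lu(lu, bidix) for lu in chunk)
    return [noun, adj]


bidix = {"big": "grande", "potato": "patata"}
print(adj_noun_rule(["^big<adj><sf:big>$", "^potato<n><pl><sf:potatoes>$"], bidix))
# ['^patata<n><pl><sf:potatoes>$', '^grande<adj><sf:big>$']
```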

Generator

  • Modified so that, for a word with no translation, it generates the source surface form instead of the lemma prefixed with @.
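A minimal sketch of the generator change: strip the `<sf:...>` tag before lookup, and when the LU is marked `@` (no translation), output the propagated surface form instead of `@lemma`. The `gen_dict` mapping is a hypothetical stand-in for the target-language generator, and the `#lemma` fallback only imitates how a failed generation is commonly marked in the output.

```python
import re

SF_RE = re.compile(r"<sf:([^<>]*)>")


def generate(lu: str, gen_dict: dict[str, str]) -> str:
    """Generate a surface form, using <sf:...> for untranslated words.

    `gen_dict` is a hypothetical stand-in for the TL generator, mapping
    "lemma<tags>" to a surface form.
    """
    body = lu[1:-1]
    match = SF_RE.search(body)
    analysis = SF_RE.sub("", body)  # the secondary tag is not part of the lookup key
    if analysis.startswith("@"):
        # Untranslated word: previously "@lemma" was output; now use the
        # source surface form when it has been propagated.
        return match.group(1) if match else analysis.split("<", 1)[0]
    form = gen_dict.get(analysis)
    # "#lemma" imitates the usual marker for a form that cannot be generated.
    return form if form is not None else "#" + analysis.split("<", 1)[0]


gen_dict = {"patata<n><pl>": "patatas"}
print(generate("^patata<n><pl><sf:potatoes>$", gen_dict))  # patatas
print(generate("^@spud<n><pl><sf:spuds>$", gen_dict))      # spuds
print(generate("^@spud<n><pl>$", gen_dict))                # @spud
```

The last case shows the fallback: without a propagated surface form, the output is the same as the current behaviour.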