Weights in the pipeline

From Apertium

Latest revision as of 14:29, 22 June 2020

In some cases we want to be able to pass "weights" along in the pipeline.

What are weights? They could be probabilities, scores, lambdas (feature weights), or anything else, but we probably want to define what they are.

morph:

^Emplea/emplear<vblex><pri><p3><sg><0.9424>/emplear<vblex><imp><p2><sg><0.2323>$ ^a/a<pr><0.9934>/a<n><m><sg><0.0123>$ ^un/uno<det><ind><m><sg>$ ^70%/70%<num><0.9999>$ ^del/de<pr>+el<det><def><m><sg>$ ^total/total<adj><mf><sg>/total<n><m><sg>$ ^de/de<pr>$ ^asalariados/asalariado<adj><m><pl>/asalariado<n><m><pl>$^./.<sent>$^./.<sent>$
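To make the notation concrete, here is a minimal sketch of how such a stream could be read. It assumes the convention used in the example above, where a weight is a tag holding a decimal number appended to a reading. The names `WEIGHT_TAG` and `parse_lu` are hypothetical and not part of any Apertium tool, and the sketch does not handle escaped characters.

```python
import re

# Assumption: a weight is a tag holding a decimal number, e.g. <0.9424>.
# Ordinary tags such as <p3> or <sg> do not match this pattern.
WEIGHT_TAG = re.compile(r"<(\d+\.\d+)>")


def parse_lu(lu):
    """Split '^surface/reading1/reading2$' into (surface, readings).

    Each reading is returned as (reading_without_weights, [weights]).
    Escaped slashes are not handled in this sketch.
    """
    surface, *readings = lu.strip("^$").split("/")
    parsed = []
    for reading in readings:
        weights = [float(w) for w in WEIGHT_TAG.findall(reading)]
        parsed.append((WEIGHT_TAG.sub("", reading), weights))
    return surface, parsed


surface, readings = parse_lu("^a/a<pr><0.9934>/a<n><m><sg><0.0123>$")
for reading, weights in readings:
    print(surface, reading, weights)
```

A reading with no weight tag comes back with an empty list, which keeps unweighted LUs such as `^un/uno<det><ind><m><sg>$` distinguishable from weighted ones.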

tagger:

biltrans:

^Emplear<vblex><pri><p3><sg><0.9424>/Emprar<vblex><pri><p3><sg><0.9424><0.7343>/Ocupar<vblex><pri><p3><sg><0.9424><0.3204>$ ^a<pr>/a<pr><0.8930>/<0.0324>/de<pr><0.2342>$ ^uno<det><ind><m><sg>/un<det><ind><m><sg>$ ^70%<num>/70%<num>$ ^de<pr>/de<pr>$ ^el<det><def><m><sg>/el<det><def><m><sg>$ ^total<n><m><sg>/total<n><m><sg>$ ^de<pr>/de<pr>$ ^asalariado<n><m><pl>/assalariat<n><m><pl>$^.<sent>/.<sent>$^.<sent>/.<sent>$

lexsel:


transfer:

Should we maintain the weights of previous modules throughout the pipe?

  • Pros:
    • If we do this then we can output a confidence for the translation. This could be exposed through the web interface (as GF does with colour coding).
  • Cons:
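    • There could be many weights on each reading, since every module adds its own (in the biltrans example above, some readings already carry two).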

Maybe transfer could combine them and output a single one.
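One way to picture that combination, as a sketch only: if the weights on a reading are treated as independent probabilities (an assumption, since this page does not yet define what weights are), they can be multiplied into a single value. The function name `combine_weights` is hypothetical, and treating an unweighted reading as 1.0 is also an assumption.

```python
import math


def combine_weights(weights):
    """Collapse the weights attached to one reading into a single value.

    Assumption: weights are independent probabilities, so they multiply.
    A reading with no weights is treated as certain (1.0).
    """
    return math.prod(weights, start=1.0)


# Weights taken from the 'Emprar' reading in the biltrans example:
# one from morph (0.9424) and one from biltrans (0.7343).
single = combine_weights([0.9424, 0.7343])
print(f"{single:.4f}")
```

If the weights turn out to be scores or lambdas instead, multiplication would not be meaningful and a different combination (for example a weighted sum) would be needed.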

Are weights interpretable across LUs, or only within a single LU?

  • What are the boundaries? The tagger (where we choose the "LU") and lexsel (where we choose the target "LU").
    • Unless we have a reweighting step, but even then the boundaries would exist.

In general I think we don't want to be writing rules that say "if this weight is >= 0.6 then ...". If weights are used, they should probably be combined directly with other weights, or handled in terms of probability mass.
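As one possible reading of "in terms of probability mass", the sketch below normalises the weights within a single LU and keeps the top readings that together cover a requested share of the mass, instead of testing any one weight against a fixed cut-off. The names `normalise` and `readings_covering` are hypothetical, and the sketch assumes the weights are non-negative and comparable within the LU.

```python
def normalise(weights):
    """Rescale non-negative weights so they sum to 1 within one LU."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one positive weight")
    return [w / total for w in weights]


def readings_covering(readings, mass=0.9):
    """Keep the highest-weighted readings until `mass` is covered.

    `readings` is a list of (reading, weight) pairs for one LU.
    Returns (reading, normalised_weight) pairs, best first.
    """
    probs = normalise([weight for _, weight in readings])
    ranked = sorted(zip(readings, probs), key=lambda pair: pair[1], reverse=True)
    kept, covered = [], 0.0
    for (reading, _), prob in ranked:
        kept.append((reading, prob))
        covered += prob
        if covered >= mass:
            break
    return kept


# Weights from the first LU of the morph example; they do not sum to 1,
# so they are normalised before any mass is computed.
emplea = [
    ("emplear<vblex><pri><p3><sg>", 0.9424),
    ("emplear<vblex><imp><p2><sg>", 0.2323),
]
for reading, prob in readings_covering(emplea, mass=0.9):
    print(reading, round(prob, 4))
```

Because the decision depends on the whole distribution inside the LU, the same code behaves sensibly whether the incoming weights are probabilities or unnormalised scores, which sidesteps the question of whether a raw value like 0.6 means the same thing in two different LUs.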