Difference between revisions of "Weights in the pipeline"

From Apertium
Latest revision as of 14:29, 22 June 2020

In some cases we want to be able to pass "weights" along in the pipeline.

What are weights? They could be probabilities, scores, lambdas (feature weights), or anything else, but we probably want to define what they are.

morph:

^Emplea/emplear<vblex><pri><p3><sg><0.9424>/emplear<vblex><imp><p2><sg><0.2323>$ ^a/a<pr><0.9934>/a<n><m><sg><0.0123>$ ^un/uno<det><ind><m><sg>$ ^70%/70%<num><0.9999>$ ^del/de<pr>+el<det><def><m><sg>$ ^total/total<adj><mf><sg>/total<n><m><sg>$ ^de/de<pr>$ ^asalariados/asalariado<adj><m><pl>/asalariado<n><m><pl>$^./.<sent>$^./.<sent>$
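To make the proposed notation concrete, here is a minimal Python sketch of how a downstream module might read it. It assumes that a weight is any tag whose content is purely numeric (such as `<0.9424>`) and that `/` and `$` are never escaped inside a lexical unit. The names `WEIGHT_TAG`, `split_weights` and `parse_lu` are hypothetical and not part of any Apertium tool.

```python
import re

# Assumption: a weight is a tag containing only a decimal number, e.g. <0.9424>.
# Ordinary tags such as <p3> or <sg> do not match this pattern.
WEIGHT_TAG = re.compile(r"<(\d+(?:\.\d+)?)>")


def split_weights(analysis):
    """Return (analysis with weight tags removed, list of weights as floats)."""
    weights = [float(w) for w in WEIGHT_TAG.findall(analysis)]
    return WEIGHT_TAG.sub("", analysis), weights


def parse_lu(lu):
    """Parse '^surface/reading1/reading2$' into (surface, [(reading, weights), ...]).

    Simplification: escaped '/' characters are not handled.
    """
    surface, *readings = lu.strip("^$").split("/")
    return surface, [split_weights(r) for r in readings]


surface, readings = parse_lu("^a/a<pr><0.9934>/a<n><m><sg><0.0123>$")
for reading, weights in readings:
    print(surface, reading, weights)
```

A reading that has passed through several modules may carry more than one weight, which is why `split_weights` returns a list rather than a single number.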

tagger:

biltrans:

^Emplear<vblex><pri><p3><sg><0.9424>/Emprar<vblex><pri><p3><sg><0.9424><0.7343>/Ocupar<vblex><pri><p3><sg><0.9424><0.3204>$ ^a<pr>/a<pr><0.8930>/<0.0324>/de<pr><0.2342>$ ^uno<det><ind><m><sg>/un<det><ind><m><sg>$ ^70%<num>/70%<num>$ ^de<pr>/de<pr>$ ^el<det><def><m><sg>/el<det><def><m><sg>$ ^total<n><m><sg>/total<n><m><sg>$ ^de<pr>/de<pr>$ ^asalariado<n><m><pl>/assalariat<n><m><pl>$^.<sent>/.<sent>$^.<sent>/.<sent>$

lexsel:


transfer:

Should we maintain the weights of previous modules throughout the pipe?

  • Pros:
    • If we do this, we can output a confidence for the translation. This could be exposed through the web interface (as GF does with colour coding).
  • Cons:
    • There could be many weights.

Maybe transfer could combine them and output a single one.
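One way this combination could work, purely as a sketch: if the weights are probabilities and are treated as independent, they can be multiplied together. The function name `combine` is hypothetical, and the independence assumption is a simplification that this page does not settle.

```python
import math


def combine(weights):
    """Multiply weights together, summing logs to avoid underflow on long chains.

    Assumption: each weight is a probability in [0, 1] and the weights are
    independent. An empty list gives 1.0 (no evidence either way).
    """
    if any(w <= 0.0 for w in weights):
        return 0.0
    return math.exp(sum(math.log(w) for w in weights))


# The two weights carried by the "Emprar" reading in the biltrans example:
# the source-side weight and the translation weight.
score = combine([0.9424, 0.7343])
print(round(score, 4))
```

If the weights turn out to be scores or lambdas rather than probabilities, a product is not meaningful and a different combination (for example a weighted sum) would be needed.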

Are weights interpretable across LUs (lexical units), or are they restricted to within a single LU?

  • What are the boundaries? The tagger (where we choose the "LU") and lexsel (where we choose the target "LU").
    • Unless we have a reweighting step, but even then the boundaries would exist.

In general I think we don't want to be writing rules that say "if this weight is >= 0.6 then ...". If weights are used, they should probably be combined directly with other weights, or expressed in terms of probability mass.
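As an illustration of what "in terms of probability mass" could mean, the sketch below rescales the weights of the competing readings inside one LU so that they sum to 1. A rule can then refer to a reading's share of the mass instead of an absolute cut-off. The function name `to_probability_mass` is hypothetical, and the approach assumes that all weights within an LU are non-negative and comparable with each other.

```python
def to_probability_mass(weights):
    """Rescale a {reading: weight} mapping so that the values sum to 1.

    Assumption: weights are non-negative and comparable within one LU.
    """
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return {reading: w / total for reading, w in weights.items()}


# Translation weights of the two weighted targets for "Emplear" in the
# biltrans example. They do not sum to 1 as written.
shares = to_probability_mass({"Emprar": 0.7343, "Ocupar": 0.3204})
for reading, share in shares.items():
    print(reading, round(share, 3))
```

Normalising inside the LU keeps the numbers within the boundaries discussed above: shares are comparable between readings of the same LU, but say nothing about other LUs.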