Weights in the pipeline

From Apertium
Revision as of 23:05, 21 June 2020 by Francis Tyers

In some cases we want to be able to pass "weights" along in the pipeline.

What are weights? They could be probabilities, scores, lambdas (feature weights), or something else, but we probably want to define what they are.

morph:

^Emplea/emplear<vblex><pri><p3><sg><0.9424>/emplear<vblex><imp><p2><sg><0.2323>$ ^a/a<pr><0.9934>/a<n><m><sg><0.0123>$ ^un/uno<det><ind><m><sg>$ ^70%/70%<num><0.9999>$ ^del/de<pr>+el<det><def><m><sg>$ ^total/total<adj><mf><sg>/total<n><m><sg>$ ^de/de<pr>$ ^asalariados/asalariado<adj><m><pl>/asalariado<n><m><pl>$^./.<sent>$^./.<sent>$
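To make the proposed format concrete, here is a minimal Python sketch of how a consumer might read weights out of the morph output. It assumes (this is not settled anywhere on this page) that any tag whose content is a plain decimal number, such as <0.9424>, is a weight, and that readings without such a tag simply carry no weight. The names split_weights and parse_morph are hypothetical, and escaped characters in the stream are not handled.

```python
import re

# Assumed convention, taken from the example above: a tag containing only
# a decimal number (e.g. <0.9424>) is a weight. Ordinary tags such as <p3>
# or <sg> do not match because they are not purely numeric.
WEIGHT_TAG = re.compile(r"<(\d+(?:\.\d+)?)>")
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def split_weights(reading):
    """Return (reading without weight tags, list of weights as floats)."""
    weights = [float(w) for w in WEIGHT_TAG.findall(reading)]
    return WEIGHT_TAG.sub("", reading), weights


def parse_morph(stream):
    """Yield (surface form, [(analysis, weights), ...]) per lexical unit.

    Simplification: escaped '/' or '$' inside a unit is not handled.
    """
    for unit in LEXICAL_UNIT.findall(stream):
        surface, *readings = unit.split("/")
        yield surface, [split_weights(r) for r in readings]


example = "^a/a<pr><0.9934>/a<n><m><sg><0.0123>$ ^un/uno<det><ind><m><sg>$"
for surface, analyses in parse_morph(example):
    print(surface, analyses)
```

Keeping the weights separate from the tag string means later modules can still match on the unweighted analysis while the weights travel alongside it.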

tagger:

biltrans:

^Emplear<vblex><pri><p3><sg><0.9424>/Emprar<vblex><pri><p3><sg><0.9424><0.7343>/Ocupar<vblex><pri><p3><sg><0.9424><0.3204>$ ^a<pr>/a<pr><0.8930>/<0.0324>/de<pr><0.2342>$ ^uno<det><ind><m><sg>/un<det><ind><m><sg>$ ^70%<num>/70%<num>$ ^de<pr>/de<pr>$ ^el<det><def><m><sg>/el<det><def><m><sg>$ ^total<n><m><sg>/total<n><m><sg>$ ^de<pr>/de<pr>$ ^asalariado<n><m><pl>/assalariat<n><m><pl>$^.<sent>/.<sent>$^.<sent>/.<sent>$
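In the biltrans example, each target reading carries the weight inherited from the analyser plus a weight for the translation itself. One possible way to use them, sketched below in Python, is to multiply the weights on each reading and rank the translations by the result. Multiplying assumes the weights behave like independent probabilities, which this page has not decided; combined_weight and rank_translations are hypothetical names.

```python
import math
import re

# Assumed convention: a purely numeric tag such as <0.7343> is a weight.
WEIGHT_TAG = re.compile(r"<(\d+(?:\.\d+)?)>")


def combined_weight(reading):
    """Multiply all weight tags on a reading; a reading with none gets 1.0."""
    return math.prod(float(w) for w in WEIGHT_TAG.findall(reading))


def rank_translations(unit):
    """Sort the target-side readings of one biltrans unit, best first."""
    # The first field is the source reading; the rest are translations.
    _source, *targets = unit.strip("^$").split("/")
    scored = [(WEIGHT_TAG.sub("", t), combined_weight(t)) for t in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


unit = ("^Emplear<vblex><pri><p3><sg><0.9424>"
        "/Emprar<vblex><pri><p3><sg><0.9424><0.7343>"
        "/Ocupar<vblex><pri><p3><sg><0.9424><0.3204>$")
for reading, weight in rank_translations(unit):
    print(f"{weight:.4f}  {reading}")
```

Because both translations here share the same analyser weight, the ranking is decided by the translation weight alone; the inherited weight only matters once readings from different analyses compete.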

lexsel:


transfer:

Should we maintain the weights of previous modules throughout the pipe?

  • Pros:
    • If we do this, we can output a confidence for the translation. This could be exposed through the web interface (as GF does with colour coding).
  • Cons:
    • There could be many weights.

Maybe transfer could combine them and output a single one.
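As a rough illustration of what "combine them and output a single one" could look like, the Python sketch below collapses the weights on each word by summing logs (equivalent to a product, but less prone to underflow) and then takes a geometric mean over the sentence so that longer sentences are not penalised for length alone. Both choices are assumptions made for the example, not a decision recorded on this page, and the weights are assumed to lie in (0, 1].

```python
import math


def combine(weights):
    """Collapse a list of weights in (0, 1] into one by summing logs."""
    if not weights:
        return 1.0  # nothing to combine: treat the word as unweighted
    return math.exp(sum(math.log(w) for w in weights))


def sentence_confidence(per_word_weights):
    """Geometric mean of the per-word combined weights."""
    combined = [combine(ws) for ws in per_word_weights]
    if not combined:
        return 1.0
    log_mean = sum(math.log(c) for c in combined) / len(combined)
    return math.exp(log_mean)


# Weights as they might arrive at transfer for "Emprar a un":
# analyser weight and translation weight per word, none for "un".
weights = [[0.9424, 0.7343], [0.9934, 0.8930], []]
print(sentence_confidence(weights))
```

A single number like this is what a web interface would need for colour coding, at the cost of losing which module was responsible for a low score.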

In general, I think we don't want to be writing rules that say "if this weight is >= 0.6 then ...". If weights are used, they should probably be combined directly with other weights, or in terms of probability mass.
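One way to read "in terms of probability mass" is sketched below in Python: instead of comparing a raw weight against a fixed cut-off, the weights of competing readings are normalised so they sum to 1, and a second source of weights is folded in by multiplying and renormalising. The function names and the second set of weights are hypothetical, and the weights are assumed to be non-negative.

```python
def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def combine_distributions(first, second):
    """Multiply two weightings of the same readings and renormalise."""
    if len(first) != len(second):
        raise ValueError("both weightings must cover the same readings")
    return normalise([a * b for a, b in zip(first, second)])


# Analyser weights for the two readings of "Emplea" in the morph example.
morph = normalise([0.9424, 0.2323])

# Hypothetical weights from another module for the same two readings.
other = [0.6, 0.4]

print(morph)
print(combine_distributions(morph, other))
```

With this approach no rule ever needs to mention a number like 0.6; each module only contributes its share of the mass, and the comparison is always relative to the competing readings.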