Difference between revisions of "User:Firespeaker/Removing bidix trimming"

 
Latest revision as of 13:36, 22 June 2020

For transfer

OOV handled poorly with trimming

$ echo "Айгүл күчүктү издептир." | apertium -d . kir-eng
Aygül looked for puppy.

$ echo "Айгүл күчүктү издебептир." | apertium -d . kir-eng
Aygül did not look for puppy.

$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
#Aygül puppy *байкаптыр.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
#Aygül puppy *байкабаптыр.
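The effect shown above can be sketched as a toy model. This is only an illustration, assuming that trimming removes from the analyser every lemma that has no bilingual-dictionary entry, so that the word comes out as unknown (`*`). The dictionaries `ANALYSES` and `BIDIX_LEMMAS` and the function `analyse` are made up for the example and are not part of Apertium.

```python
# Toy data, invented for this sketch: surface form -> analysis (lemma + tags).
ANALYSES = {
    "издептир": "изде<v><tv><ifi><evid><p3><sg>",
    "байкаптыр": "байка<v><tv><ifi><evid><p3><sg>",
}
# Lemmas assumed to have a bidix entry; "байка" is deliberately missing.
BIDIX_LEMMAS = {"изде"}


def lemma_of(analysis: str) -> str:
    """Return the lemma, i.e. everything before the first tag."""
    return analysis.split("<", 1)[0]


def analyse(surface: str, trimmed: bool) -> str:
    """Analyse one word; with trimming, lemmas absent from the bidix become unknown."""
    analysis = ANALYSES.get(surface)
    if analysis is None or (trimmed and lemma_of(analysis) not in BIDIX_LEMMAS):
        # Unknown words carry a '*' in the stream and lose all morphology.
        return f"^{surface}/*{surface}$"
    return f"^{surface}/{analysis}$"
```

With `trimmed=True`, `байкаптыр` loses its analysis entirely, so transfer rules never see that it is a verb; with `trimmed=False` the tags survive even though no translation exists.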

OOV handled much better without trimming

$ echo "^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ ^байка<v><tv><ifi><evid><p3><sg>$^.<sent>$^.<sent>$" | rest-of-pipeline
Aygül @байка puppy..

$ echo "^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ ^байка<v><tv><neg><ifi><evid><p3><sg>$^.<sent>$^.<sent>$" | rest-of-pipeline
Aygül did not @байка puppy..
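The input strings above are analysed lexical units of the form `^lemma<tag><tag>…$`. A minimal sketch of splitting such a stream into lemmas and tags follows, to make explicit what information (for example `<neg>`) remains available to transfer when the analysis is kept. The helper name `parse_units` is an assumption of this sketch, and the pattern covers only the simple units used in these examples.

```python
import re

# One lexical unit: '^', a lemma, zero or more <tag>s, '$'.
LEXICAL_UNIT = re.compile(r"\^([^<$/]+)((?:<[^>]+>)*)\$")


def parse_units(stream: str) -> list[tuple[str, list[str]]]:
    """Return (lemma, [tags]) for every lexical unit found in the stream."""
    return [
        (match.group(1), re.findall(r"<([^>]+)>", match.group(2)))
        for match in LEXICAL_UNIT.finditer(stream)
    ]


stream = "^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ ^байка<v><tv><neg><ifi><evid><p3><sg>$^.<sent>$^.<sent>$"
units = parse_units(stream)
verb_lemma, verb_tags = units[2]
```

Here `verb_tags` still contains `neg`, which is what lets the pipeline produce "did not" even though `байка` itself has no translation.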

Ideal OOV handling

Everything in this section is hypothetical; the examples above show current behaviour.

Option 1

Use source-language information for:

  • transfer rules
  • generation, with pseudo-lemmas from the source language

Difference from current non-trimming behaviour: target-language (TL) morphology is added to the source-language (SL) lemma at some point in the pipeline.

$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
Aygül @байка-ed puppy.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
Aygül did not @байка puppy.

OR

Option 2

Use source-language information for:

  • transfer rules

Only difference from current non-trimming behaviour: the surface form is output instead of the lemma.

$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
Aygül @байкаптыр puppy.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
Aygül did not @байкабаптыр puppy.
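The two options can be contrasted in a small sketch. It is purely illustrative: `render_unknown_verb` is a hypothetical function, not an existing Apertium component, and it assumes (following the examples above) that `<ifi>` yields an English past and that `<neg>` together with `<ifi>` yields "did not".

```python
def render_unknown_verb(surface: str, lemma: str, tags: list[str], option: int) -> str:
    """Render a verb that has no bidix entry under Option 1 or Option 2."""
    past = "ifi" in tags
    negated_past = past and "neg" in tags
    if option == 1:
        # Option 1: the SL lemma acts as a pseudo-lemma and receives TL morphology.
        if negated_past:
            return f"did not @{lemma}"
        return f"@{lemma}-ed" if past else f"@{lemma}"
    if option == 2:
        # Option 2: the SL surface form is output as-is; only transfer uses the tags.
        marked = f"@{surface}"
        return f"did not {marked}" if negated_past else marked
    raise ValueError("option must be 1 or 2")


PAST = ["v", "tv", "ifi", "evid", "p3", "sg"]
NEG_PAST = ["v", "tv", "neg", "ifi", "evid", "p3", "sg"]
```

Under Option 1 the negated form `байкабаптыр` is reduced to its lemma `байка`, while under Option 2 the full surface form is carried through to the output.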