User:Firespeaker/Removing bidix trimming
Latest revision as of 13:36, 22 June 2020
For transfer
OOV handled poorly with trimming
<pre>
$ echo "Айгүл күчүктү издептир." | apertium -d . kir-eng
Aygül looked for puppy.

$ echo "Айгүл күчүктү издебептир." | apertium -d . kir-eng
Aygül did not look for puppy.

$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
#Aygül puppy *байкаптыр.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
#Aygül puppy *байкабаптыр.
</pre>
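Roughly speaking, trimming (done in lttoolbox by `lt-trim`) removes from the compiled SL analyser every analysis that has no matching entry in the bidix. The Python sketch below is a toy model of that effect, not Apertium's implementation: the dictionaries `ANALYSER` and `BIDIX_LEMMAS` and the helpers `trim` and `analyse` are hypothetical, and the tags for изде are assumed by analogy with байка. Because байка is missing from the toy bidix, the trimmed analyser cannot analyse байкаптыр or байкабаптыр at all, so they surface as unknown words (`*`) and their `<neg>` and `<ifi>` information never reaches transfer.

```python
# Toy model of bidix trimming. All dictionary data is hypothetical;
# the tags for изде are assumed by analogy with байка.
ANALYSER = {
    "издептир": [("изде", ["v", "tv", "ifi", "evid", "p3", "sg"])],
    "издебептир": [("изде", ["v", "tv", "neg", "ifi", "evid", "p3", "sg"])],
    "байкаптыр": [("байка", ["v", "tv", "ifi", "evid", "p3", "sg"])],
    "байкабаптыр": [("байка", ["v", "tv", "neg", "ifi", "evid", "p3", "sg"])],
}

# Lemmas that have a bidix entry; байка is deliberately missing.
BIDIX_LEMMAS = {"изде"}


def trim(analyser, bidix_lemmas):
    """Return a copy of the analyser keeping only analyses whose lemma is in the bidix."""
    trimmed = {}
    for surface, analyses in analyser.items():
        kept = [(lemma, tags) for lemma, tags in analyses if lemma in bidix_lemmas]
        if kept:  # surface forms with no surviving analysis disappear entirely
            trimmed[surface] = kept
    return trimmed


def analyse(analyser, surface):
    """Format like analyser output: ^surface/lemma<tags>$, or ^surface/*surface$ if unknown."""
    analyses = analyser.get(surface)
    if not analyses:
        return f"^{surface}/*{surface}$"
    readings = "/".join(lemma + "".join(f"<{t}>" for t in tags) for lemma, tags in analyses)
    return f"^{surface}/{readings}$"


print(analyse(ANALYSER, "байкабаптыр"))                      # ^байкабаптыр/байка<v><tv><neg><ifi><evid><p3><sg>$
print(analyse(trim(ANALYSER, BIDIX_LEMMAS), "байкабаптыр"))  # ^байкабаптыр/*байкабаптыр$
```

In the toy, изде keeps its full reading after trimming because it has a bidix entry, while байка loses everything, including the negation that transfer would need to produce "did not".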
OOV handled much better without trimming
<pre>
$ echo "^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ ^байка<v><tv><ifi><evid><p3><sg>$^.<sent>$^.<sent>$" | rest-of-pipeline
Aygül @байка puppy..

$ echo "^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ ^байка<v><tv><neg><ifi><evid><p3><sg>$^.<sent>$^.<sent>$" | rest-of-pipeline
Aygül did not @байка puppy..
</pre>
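Without trimming, the analysis survives and only the bidix lookup fails, which is what `@байка` marks. The sketch below is a toy model of why that helps, assuming the disambiguated stream format used in the input above (`^lemma<tag1><tag2>$`). `parse_stream`, `lexical_transfer`, `toy_render` and `TOY_BIDIX` are hypothetical names, and `toy_render` is a hand-written SOV-to-SVO reordering standing in for the real kir-eng transfer rules. It only illustrates that a tag such as `<neg>` is still available to transfer when the lemma has no translation.

```python
import re

# One lexical unit of a disambiguated Apertium stream: ^lemma<tag1><tag2>...$
# Minimal sketch: ignores escaping, blanks and ambiguous readings.
LU_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")

# Hypothetical toy bidix (SL lemma -> TL lemma); байка is deliberately missing.
TOY_BIDIX = {"Айгүл": "Aygül", "күчүк": "puppy", ".": "."}


def parse_stream(text):
    """Return (lemma, [tags]) pairs, one per lexical unit."""
    return [(lemma, re.findall(r"<([^>]+)>", tags)) for lemma, tags in LU_RE.findall(text)]


def lexical_transfer(units, bidix):
    """Translate each lemma; with no bidix entry keep '@' + SL lemma. Tags are kept either way."""
    return [(bidix.get(lemma, "@" + lemma), tags) for lemma, tags in units]


def toy_render(units):
    """Toy SOV -> SVO rendering: subject, [did not] verb, object, then punctuation."""
    subject, verb, obj, punct = [], [], [], []
    for word, tags in units:
        if "np" in tags:
            subject.append(word)
        elif "v" in tags:
            # <neg> survives even though the verb itself was not translated
            verb += (["did", "not"] if "neg" in tags else []) + [word]
        elif "n" in tags:
            obj.append(word)
        elif "sent" in tags:
            punct.append(word)
    return " ".join(subject + verb + obj) + "".join(punct)


neg = ("^Айгүл<np><ant><f><nom>$ ^күчүк<n><acc>$ "
       "^байка<v><tv><neg><ifi><evid><p3><sg>$^.<sent>$^.<sent>$")
print(toy_render(lexical_transfer(parse_stream(neg), TOY_BIDIX)))  # Aygül did not @байка puppy..
```

The input contains two `^.<sent>$` units, which is presumably also why the transcript above ends in a doubled full stop.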
Ideal OOV handling
Everything in this section is hypothetical; everything above it shows current behaviour.
Option 1
Use source-language (SL) information for:
- transfer rules
- generation, with pseudo-lemmas taken from the source language
Difference from current non-trimming behaviour: target-language (TL) morphology is attached to the SL lemma at some point in the pipeline.
<pre>
$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
Aygül @байка-ed puppy.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
Aygül did not @байка puppy.
</pre>
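Option 1 needs some component that attaches TL morphology to an untranslated SL pseudo-lemma. Below is a minimal Python sketch of such a fallback under two labelled assumptions: the function `generate_pseudo` and the set `PAST_TAGS` are hypothetical, and `<ifi>` is treated as English simple past. Negated past verbs use English do-support, so tense and negation go on "did" and the pseudo-lemma stays bare, matching the examples above.

```python
# Hypothetical Option 1 fallback: put English (TL) verb morphology on an
# untranslated SL pseudo-lemma. Assumption: <ifi> is rendered as simple past.
PAST_TAGS = {"ifi"}


def generate_pseudo(lemma, tags):
    """Render an untranslated SL verb lemma as an English-looking verb phrase."""
    pseudo = "@" + lemma
    is_past = bool(PAST_TAGS & set(tags))
    if is_past and "neg" in tags:
        return "did not " + pseudo  # do-support: "did" carries tense, the verb stays bare
    if is_past:
        return pseudo + "-ed"  # regular English past suffix, hyphenated to keep the SL lemma visible
    return pseudo  # other tag combinations are left bare in this sketch


print(generate_pseudo("байка", ["v", "tv", "ifi", "evid", "p3", "sg"]))         # @байка-ed
print(generate_pseudo("байка", ["v", "tv", "neg", "ifi", "evid", "p3", "sg"]))  # did not @байка
```

Keeping the suffix hyphenated leaves the SL pseudo-lemma visually separate from the English morphology, so a post-editor can still see exactly which word was not translated.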
OR
Option 2
Use source-language information for:
- transfer rules
The only difference from current non-trimming behaviour: the SL surface form is output instead of the SL lemma.
<pre>
$ echo "Айгүл күчүктү байкаптыр." | apertium -d . kir-eng
Aygül @байкаптыр puppy.

$ echo "Айгүл күчүктү байкабаптыр." | apertium -d . kir-eng
Aygül did not @байкабаптыр puppy.
</pre>
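Option 2 only requires the SL surface form to travel alongside the analysis, so that it can replace the lemma when the bidix lookup fails. The sketch below assumes each unit keeps its surface form in the `^surface/lemma<tag>$` shape that the analyser emits; `UNIT_RE`, `parse_with_surface`, `translate_option2` and `TOY_BIDIX` are hypothetical, and the SOV-to-SVO ordering is a toy stand-in for the real transfer rules. Tags are still used (here `<neg>` triggers "did not"), while the untranslated word is printed as `@` plus its surface form.

```python
import re

# One analysed unit that keeps its surface form: ^surface/lemma<tag1><tag2>...$
# Minimal sketch: one reading per unit, no escaping.
UNIT_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")

# Hypothetical toy bidix (SL lemma -> TL lemma); байка is deliberately missing.
TOY_BIDIX = {"Айгүл": "Aygül", "күчүк": "puppy", ".": "."}


def parse_with_surface(text):
    """Return (surface, lemma, [tags]) triples, one per unit."""
    return [(surface, lemma, re.findall(r"<([^>]+)>", tags))
            for surface, lemma, tags in UNIT_RE.findall(text)]


def translate_option2(units, bidix):
    """Translate known lemmas; for unknown ones emit '@' + SL surface form.
    Tags still drive the toy SOV -> SVO rendering."""
    subject, verb, obj, punct = [], [], [], []
    for surface, lemma, tags in units:
        word = bidix.get(lemma, "@" + surface)  # surface form instead of lemma
        if "np" in tags:
            subject.append(word)
        elif "v" in tags:
            verb += (["did", "not"] if "neg" in tags else []) + [word]
        elif "n" in tags:
            obj.append(word)
        elif "sent" in tags:
            punct.append(word)
    return " ".join(subject + verb + obj) + "".join(punct)


neg = ("^Айгүл/Айгүл<np><ant><f><nom>$ ^күчүктү/күчүк<n><acc>$ "
       "^байкабаптыр/байка<v><tv><neg><ifi><evid><p3><sg>$^./.<sent>$")
print(translate_option2(parse_with_surface(neg), TOY_BIDIX))  # Aygül did not @байкабаптыр puppy.
```

One trade-off shows up immediately: the surface form байкабаптыр still contains the Kyrgyz negative suffix -ба-, so in the negated example negation is marked twice, once by "did not" and once inside the untranslated word.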