From Apertium

My wishlist for Apertium features (mostly just useful for language pair developers).

Allow the chunk tag wherever we allow other "strings"

<chunk name="foo"><tags><tag><lit-tag v="bar"/></tag></tags><lu><lit v="fie"/></lu></chunk> just outputs ^foo<bar>{^fie$}$, which is a simple string. We can store strings built from tags, literals and other variables in a variable, but not the output of the chunk tag, which leads to this kind of mess:

             <lit v="^pron"/>
             <lit-tag v="@SUBJ→"/>
             <clip pos="1" part="pers"/>
             <lit-tag v="GD"/>
             <clip pos="1" part="nbr"/>
             <lit-tag v="nom"/>
             <lit v="{^"/>
             <lit v="prpers"/>
             <lit-tag v="prn"/>
             <clip pos="1" part="pers"/>
             <lit-tag v="mf"/>
             <clip pos="1" part="nbr"/>
             <lit-tag v="nom"/>
             <lit v="$}$"/>

Wish: allow <let><chunk>...</chunk></let> and <concat><chunk>...</chunk></concat> (chunk "returns" a string, variables hold strings).
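
As a rough sketch of what the wish would buy us, the literal-by-literal mess above could shrink to something like the following. This is hypothetical syntax: putting <chunk> inside <let> is exactly what is being wished for, and the variable name subj_chunk is made up for illustration. The clips are copied unchanged from the example above.

```xml
<!-- Hypothetical: <chunk> as the value of <let> is the wished-for syntax,
     not something that is allowed today. -->
<let>
  <var n="subj_chunk"/> <!-- variable name invented for this example -->
  <chunk name="pron">
    <tags>
      <tag><lit-tag v="@SUBJ→"/></tag>
      <tag><clip pos="1" part="pers"/></tag>
      <tag><lit-tag v="GD"/></tag>
      <tag><clip pos="1" part="nbr"/></tag>
      <tag><lit-tag v="nom"/></tag>
    </tags>
    <lu>
      <lit v="prpers"/>
      <lit-tag v="prn"/>
      <clip pos="1" part="pers"/>
      <lit-tag v="mf"/>
      <clip pos="1" part="nbr"/>
      <lit-tag v="nom"/>
    </lu>
  </chunk>
</let>
```

The variable would end up holding the same string as before, but the ^, { and $ delimiters would be produced by the chunk and lu tags instead of being typed by hand.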

A "grouping" tag for bidix

Most of the time when LR-ing and RL-ing in bidix, we have one entry that works in both directions, plus possibly lots of LR's that all go to the same <r>, or lots of RL's that all go to the same <l>. Making certain that these actually _do_ share the same side, where they should, means looking through lots of entries manually, because in some cases we _don't_ want them to (i.e. we can't just write a program to check this, since there are general rules and there are exceptions).

What I'd like is just some way of keeping LR's and RL's in bidix together. One possibility would be to represent it this way:

 <eg>
   <em>      <p><l>foo</l><r>bar</r></p></em>
   <LR>      <p><l>fie</l>           </p></LR>
   <RL>      <p>           <r>bum</r></p></RL>
 </eg>
 <e r="LR"><p><l>foe</l><r>baz</r></p></e>

This would be equivalent to:

 <e>       <p><l>foo</l><r>bar</r></p></e>
 <e r="LR"><p><l>fie</l><r>bar</r></p></e>
 <e r="RL"><p><l>foo</l><r>bum</r></p></e>
 <e r="LR"><p><l>foe</l><r>baz</r></p></e>

The idea is that within an <eg> group, we know that every LR has the same <r> as the main entry and every RL has the same <l>, so an LR can't have an <r> specified and an RL can't have an <l> specified.