Difference between revisions of "User:Unhammer/wishlist"
Revision as of 13:57, 2 March 2010
My wishlist for Apertium features (mostly just useful for language pair developers).
See also Talk:Northern Sámi and Norwegian#Wishlist_.2F_Difficulties_with_the_architecture_.2F_Ugly_hacks
Fallthrough option in transfer
Sometimes you match an input pattern in a rule, e.g. "n vblex", check whether the target-language n has some feature, and do something special only if it has that feature. It would be great if we could specify in the <otherwise> that we want to fall through, ignoring the fact that this rule matched.
There are two options for how to "ignore" the match. The best (but possibly slowest?) would be to go on trying to match the rest of the rules; the other option is to act as if no rule matched. Either would be an improvement.
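The two fallthrough options can be sketched with a toy rule matcher. This is only an illustration of the wished-for behaviour, not how Apertium's transfer module works: the names `FALLTHROUGH`, `apply_rules` and `retry_other_rules` are made up for this sketch, and rules are simply tried in list order.

```python
# Toy model only; none of these names exist in Apertium.
FALLTHROUGH = object()  # hypothetical marker an action returns to "fall through"


def apply_rules(words, rules, retry_other_rules=True):
    """Apply the first usable rule to the start of `words`.

    `words` is a list of (pos, features) pairs; `rules` is a list of
    (pattern, action) pairs, tried in order. Returns the action's output,
    or None if no rule ends up applying.
    """
    for pattern, action in rules:
        window = words[:len(pattern)]
        if [pos for pos, _ in window] != list(pattern):
            continue  # pattern does not match the input
        result = action(window)
        if result is not FALLTHROUGH:
            return result
        if not retry_other_rules:
            return None  # second option: act as if no rule matched
        # first option: ignore this match and try the remaining rules
    return None


def special_n_vblex(window):
    noun_features = window[0][1]
    # Only do something special if the noun has the feature we care about.
    return "special n vblex" if "def" in noun_features else FALLTHROUGH


def plain_n(window):
    return "plain n"


rules = [(("n", "vblex"), special_n_vblex), (("n",), plain_n)]
with_feature = [("n", {"def"}), ("vblex", set())]
without_feature = [("n", set()), ("vblex", set())]

print(apply_rules(with_feature, rules))
print(apply_rules(without_feature, rules))
print(apply_rules(without_feature, rules, retry_other_rules=False))
```

With `retry_other_rules=True` the noun without the feature is picked up by the later, more general rule; with `retry_other_rules=False` the input is treated as unmatched.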
UTF-8 in sdefs
But since these are XML IDs, maybe this is not possible?
Allow the chunk tag wherever we allow other "strings"
    <chunk name="foo"><tags><tag><lit-tag v="bar"/></tag></tags><lu><lit v="fie"/></lu></chunk>
just outputs ^foo<bar>{fie}$, which is a simple string. We can put strings built from tags, literals and variables inside variables, but not strings built with the chunk tag, leading to this kind of mess:
    <let>
      <concat>
        <lit v="^pron"/>
        <lit-tag v="@SUBJ→"/>
        <clip pos="1" part="pers"/>
        <lit-tag v="GD"/>
        <clip pos="1" part="nbr"/>
        <lit-tag v="nom"/>
        <lit v="{^"/>
        <lit v="prpers"/>
        <lit-tag v="prn"/>
        <clip pos="1" part="pers"/>
        <lit-tag v="mf"/>
        <clip pos="1" part="nbr"/>
        <lit-tag v="nom"/>
        <lit v="$}$"/>
      </concat>
    </let>
Wish: allow <let><chunk>...</chunk></let> and <concat><chunk>...</chunk></concat> (chunk "returns" a string, variables hold strings).
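To make the "chunk returns a string" point concrete, here is a small sketch that models chunks and lexical units as plain string-building functions. The helpers `lu` and `chunk` are hypothetical, and the clip values `<p3>` and `<sg>` are assumed purely for illustration. It shows that the hand-concatenated version and the chunk-style version yield the same string.

```python
def tag_string(tags):
    """Render a list of tag names as <a><b>..."""
    return "".join("<" + t + ">" for t in tags)


def lu(lemma, tags):
    """Hypothetical helper: a lexical unit as a plain string."""
    return "^" + lemma + tag_string(tags) + "$"


def chunk(name, tags, body):
    """Hypothetical helper: a chunk is just name, tags and a {body}."""
    return "^" + name + tag_string(tags) + "{" + body + "}$"


# Assumed clip values; in a real rule these would come from <clip .../>.
pers, nbr = "p3", "sg"

# The "mess": every bracket and delimiter spelled out by hand.
by_hand = "".join([
    "^pron", "<@SUBJ→>", "<" + pers + ">", "<GD>", "<" + nbr + ">", "<nom>",
    "{^", "prpers", "<prn>", "<" + pers + ">", "<mf>", "<" + nbr + ">", "<nom>",
    "$}$",
])

# The wish: build the same string structurally and store it in a variable.
structured = chunk(
    "pron",
    ["@SUBJ→", pers, "GD", nbr, "nom"],
    lu("prpers", ["prn", pers, "mf", nbr, "nom"]),
)

print(by_hand == structured)
print(chunk("foo", ["bar"], "fie"))
```

Since both routes end in an ordinary string, nothing about a variable's contents would have to change to accept a chunk.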
A "grouping" tag for bidix
Most of the time when LR-ing and RL-ing in bidix, we have one pair of entries that works in both directions, with possibly lots of LR's that all go to the same <r>, or lots of RL's that all go to the same <l>. Making certain these actually _do_ go to the same entry, where they should, means looking through lots of entries manually, since in some cases we _don't_ want it to be like that (i.e. we can't just write a program to check this, since there are general rules and there are exceptions).
What I'd like is just some way of keeping LR's and RL's in bidix together. One possibility would be to represent it this way:
    <eg>
      <em> <p><l>foo</l><r>bar</r></p></em>
      <LR> <p><l>fie</l>           </p></LR>
      <RL> <p>           <r>bum</r></p></RL>
    </eg>
    <e r="LR"><p><l>foe</l><r>baz</r></p></e>
This would be equivalent to:
    <e>       <p><l>foo</l><r>bar</r></p></e>
    <e r="LR"><p><l>fie</l><r>bar</r></p></e>
    <e r="RL"><p><l>foo</l><r>bum</r></p></e>
    <e r="LR"><p><l>foe</l><r>baz</r></p></e>
The idea is that within the <eg> entries, we know that all LR's have the same <r>, and all RL's have the same <l>, and so an LR can't have an <r> specified.
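The expansion from grouped entries to ordinary entries could be done by a small preprocessor. The following is a minimal sketch, assuming the proposed `<eg>`, `<em>`, `<LR>` and `<RL>` elements exactly as written in the example (they are not part of any existing dix format) and a wrapping `<section>` element; the function name `expand_groups` is made up here.

```python
import xml.etree.ElementTree as ET


def expand_groups(section_xml):
    """Expand proposed <eg> groups into (restriction, left, right) triples."""
    entries = []
    for node in ET.fromstring(section_xml):
        if node.tag == "e":
            # Ordinary entry: pass through unchanged.
            p = node.find("p")
            entries.append((node.get("r"), p.findtext("l"), p.findtext("r")))
        elif node.tag == "eg":
            main = node.find("em/p")
            main_l, main_r = main.findtext("l"), main.findtext("r")
            entries.append((None, main_l, main_r))
            for sub in node:
                if sub.tag == "LR":
                    # An LR inside a group must not specify its own <r>.
                    if sub.find("p/r") is not None:
                        raise ValueError("LR inside <eg> must not have <r>")
                    entries.append(("LR", sub.findtext("p/l"), main_r))
                elif sub.tag == "RL":
                    # An RL inside a group must not specify its own <l>.
                    if sub.find("p/l") is not None:
                        raise ValueError("RL inside <eg> must not have <l>")
                    entries.append(("RL", main_l, sub.findtext("p/r")))
    return entries


example = """
<section>
  <eg>
    <em><p><l>foo</l><r>bar</r></p></em>
    <LR><p><l>fie</l></p></LR>
    <RL><p><r>bum</r></p></RL>
  </eg>
  <e r="LR"><p><l>foe</l><r>baz</r></p></e>
</section>
"""

for restriction, left, right in expand_groups(example):
    print(restriction, left, right)
```

Rejecting an `<r>` inside a grouped LR (and an `<l>` inside a grouped RL) is what would give the guarantee described above: the shared side can only come from the `<em>` entry.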
Better apertium-gen-modes
apertium-gen-modes is used for two purposes:
- making local modes files, used like
apertium -d . nn-nb
- making installable modes files, used like
apertium nn-nb
Unfortunately, each time you sudo make install, the local ones are overwritten by files which have root ownership. Very annoying.
To avoid this, the Makefile.am in apertium-nn-nb currently has
    modes/$(PREFIX1).mode: modes.xml
            apertium-gen-modes modes.xml
            cp *.mode modes/

    modes/$(PREFIX2).mode: modes.xml
            apertium-gen-modes modes.xml
            cp *.mode modes/

    apertium_nn_nb_DATA= […] modes/$(PREFIX1).mode modes/$(PREFIX2).mode modes.xml

    install-data-local:
            mv modes modes.bak
            apertium-gen-modes modes.xml apertium-$(PREFIX1)
            rm -rf modes
            mv modes.bak modes
            test -d $(apertium_nn_modesdir) || mkdir $(apertium_nn_modesdir)
            $(INSTALL_DATA) $(PREFIX1).mode $(apertium_nn_modesdir)
            $(INSTALL_DATA) $(PREFIX2).mode $(apertium_nn_modesdir)
            rm $(PREFIX1).mode $(PREFIX2).mode
There must be a better way. One could shorten it down to
    modes/$(PREFIX1).mode: modes.xml
            apertium-gen-modes modes.xml

    modes/$(PREFIX2).mode: modes.xml
            apertium-gen-modes modes.xml

    noinst_DATA=modes/$(PREFIX1).mode modes/$(PREFIX2).mode modes.xml

    install-data-local:
            apertium-gen-modes modes.xml apertium-$(PREFIX1)
            test -d $(apertium_nn_modesdir) || mkdir $(apertium_nn_modesdir)
            $(INSTALL_DATA) $(PREFIX1).mode $(apertium_nn_modesdir)
            $(INSTALL_DATA) $(PREFIX2).mode $(apertium_nn_modesdir)
            rm $(PREFIX1).mode $(PREFIX2).mode
by applying
    Index: apertium/apertium-createmodes.awk
    ===================================================================
    --- apertium/apertium-createmodes.awk (revision 20175)
    +++ apertium/apertium-createmodes.awk (working copy)
    @@ -8,13 +8,12 @@
       }
       else if(HEAD != 0)
       {
    -    myfilename = NAME ".mode";
    -    if(ARR[3] == "yes")
    +    if(ARR[3] == "yes" || install == "no")
         {
    -      myfilename = "../" myfilename;
    +      myfilename = NAME ".mode";
    +      # fool code because a bug in mawk
    +      printf $0 "\n" >> myfilename;
    +      close(myfilename);
         }
    -    # fool code because a bug in mawk
    -    printf $0 "\n" >> myfilename;
    -    close(myfilename);
       }
     }
    Index: apertium/Makefile.am
    ===================================================================
    --- apertium/Makefile.am (revision 20175)
    +++ apertium/Makefile.am (working copy)
    @@ -329,7 +329,7 @@
         @cat modes-header.sh >> $@
         @echo "$(XMLLINT) --dtdvalid $(apertiumdir)/modes.dtd --noout \$$FILE1 && \\" >> $@
         @if [ `basename $(XSLTPROC)` == xsltproc ]; \
    -      then echo "$(XSLTPROC) --stringparam prefix $(prefix)/bin --stringparam dataprefix \$$FULLDIRNAME $(apertiumdir)/modes2bash.xsl \$$FILE1 | awk -f $(apertiumdir)/apertium-createmodes.awk PARAM=\$$FULLDIRNAME"; \
    +      then echo "$(XSLTPROC) --stringparam prefix $(prefix)/bin --stringparam dataprefix \$$FULLDIRNAME $(apertiumdir)/modes2bash.xsl \$$FILE1 | awk -f $(apertiumdir)/apertium-createmodes.awk PARAM=\$$FULLDIRNAME install=\$$INSTALL"; \
           else echo "$(XSLTPROC) $(apertiumdir)/modes2bash.xsl \$$FILE1 \\\$$prefix=$(prefix)/bin \\\$$dataprefix=\$$FULLDIRNAME| awk -f $(apertiumdir)/apertium-createmodes.awk PARAM=\$$FULLDIRNAME"; \
           fi >> $@
         @chmod a+x $@
    Index: apertium/modes-header.sh
    ===================================================================
    --- apertium/modes-header.sh (revision 20175)
    +++ apertium/modes-header.sh (working copy)
    @@ -17,15 +17,17 @@
     
     rm -Rf *.mode
     
    -if [ ! -d $FULLDIRNAME/modes ]
    -then mkdir $FULLDIRNAME/modes
    -else rm -Rf $FULLDIRNAME/modes && mkdir $FULLDIRNAME/modes
    -fi
    -
     FILE1=$FULLDIRNAME/$(basename $1)
    -cd $FULLDIRNAME/modes
     
    -if [ $# -eq 2 ]; then
    +if [ $# -eq 1 ]; then
    +  INSTALL="no"
    +  if [ -d $FULLDIRNAME/modes ]; then
    +    rm -Rf $FULLDIRNAME/modes
    +  fi
    +  mkdir $FULLDIRNAME/modes
    +  cd $FULLDIRNAME/modes
    +elif [ $# -eq 2 ]; then
    +  INSTALL="yes"
       PREFIX=$2;
       FULLDIRNAME=$APERTIUMDIR"/"$PREFIX;
     fi
but then a lot of Makefiles would have to be changed...