Apertium Turkic/Use/MT
Latest revision as of 12:35, 16 January 2014
The dictionary languages/apertium-kaz/apertium-kaz.kaz.lexc has a bunch of lines with the comment ! Use/MT eng-kaz. These lines should appear only in machine translation pairs, not when creating a "vanilla" transducer. Grepping them out is easy; fitting that into the current build system is harder, because language pairs depend on the pre-built att.gz of languages/apertium-kaz and trim it.
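As a minimal sketch of the "easy to grep them out" part: the helper name filter_vanilla and the sample entries below are made up for illustration and are not taken from apertium-kaz. The only thing assumed from the page is that MT-only lines carry a Use/MT comment.

```shell
# Hypothetical helper: drop entries marked as MT-only from a lexc file.
# Reads lexc on stdin and writes the "vanilla" lexc to stdout.
filter_vanilla() {
    grep -v 'Use/MT'
}

# Tiny invented lexc fragment, for illustration only.
sample_lexc() {
    cat <<'EOF'
LEXICON Nouns
foo:foo N1 ; ! "foo"
bar:bar N1 ; ! "bar" ! Use/MT eng-kaz
EOF
}

# Prints the LEXICON line and the foo entry; the bar entry is dropped.
sample_lexc | filter_vanilla
```

This only covers the filtering step; where it would be hooked into the build is exactly what the options below discuss.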
Some options:
- Make two binaries in languages/apertium-kaz, one vanilla and one MT.
  - Cons: Takes a lot longer to make apertium-kaz
  - Installed size of languages module doubles
  - Messy
- Make two transducers in /languages/ (vanilla + MT), but run the MT only to att.gz, while the vanilla is run to full binary.
  - Cons: Takes longer to make apertium-kaz
  - Installed size of languages module increases
  - Still a bit messy
- Have Use/MT as a visible tag, with paths (vanilla) or tags (MT) removed by postprocessing scripts (or CG)
  - Cons: Ugggly
  - Ugly
  - Ugly
- Have Use/MT as a compiler-tag: paths with this tag are removed by twol when compiling for vanilla, while the tag itself is removed by twol when compiling for MT
  - Cons: compilation-internal tags, that's almost as bad as flag diacritics.
- Redundant make steps in pairs: the pair references $(AP_SRC1)/apertium-kaz.lexc and does all the same steps up to att.gz that are specified in the /languages/apertium-kaz/Makefile (except that in /languages/ there's a "grep -v Use/MT" line in the lexc-step)
  - Cons: Lots of redundant make code
  - An update in one place leads to a lot of Makefiles having to be updated
  - It takes a lot longer to make each language pair
  - Messy
- Create a binary transducer with only the "grep Use/MT" lines. Use "hfst-subtract MT.bin Use/MT-only.bin" to subtract only the Use/MT lines from the MT transducer to create the vanilla transducer in /languages/
  - Cons: We can't simply "grep Use/MT *lexc", since we also need the LEXICON lines and dog knows what else
  - It would lead to really ugly lexicon-specific rules
  - We still end up with two binary transducers in /languages/
  - Ugly
- Use a ./configure option (e.g. --with-variant=mt, --with-variant=vanilla) to switch a particular checkout of apertium-kaz to do one or the other (basically just ifdef some greps in the Makefile).
  - Cons: Means that if you want both you need two checkouts, or keep switching back-and-forth.
  - Easy to forget which version you have, and not know why your MT pair suddenly doesn't work like it should
    - That can be counteracted a bit by having the make goals give them different names
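The last option (a variant switch plus distinct output names) could look roughly like the following. This is a sketch under assumptions: build_lexc and output_name are hypothetical names, the variant argument stands in for whatever ./configure --with-variant=... would set, and the per-variant file names are invented for illustration.

```shell
# Hypothetical: select the lexc filtering step based on the build variant.
# Reads lexc on stdin and writes the variant-specific lexc to stdout.
build_lexc() {
    variant=$1
    case "$variant" in
        mt)      cat ;;                # MT build: keep every entry
        vanilla) grep -v 'Use/MT' ;;   # vanilla build: drop MT-only entries
        *)       echo "unknown variant: $variant" >&2; return 1 ;;
    esac
}

# Hypothetical naming scheme: one output name per variant, so it is
# harder to forget which version a checkout has produced.
output_name() {
    echo "apertium-kaz.$1.lexc"
}

# Example: filter two invented entries for the vanilla variant.
printf 'a:a N1 ; ! Use/MT eng-kaz\nb:b N1 ;\n' | build_lexc vanilla
```

Putting the variant into the file name makes a mismatch visible as a missing file when a pair looks for the MT output, rather than as an MT pair that quietly behaves differently.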