Difference between revisions of "Apertium Turkic/Use/MT"

From Apertium

Revision as of 11:58, 16 January 2014

The dictionary languages/apertium-kaz/apertium-kaz.kaz.lexc has a bunch of lines with the comment "! Use/MT eng-kaz". These lines should appear only in machine translation pairs, not in a "vanilla" transducer. It's easy to grep them out, but harder to fit this into the current build system, where language pairs depend on the pre-built att.gz of languages/apertium-kaz and trim that.
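
The "easy to grep them out" part can be sketched as follows. This is only an illustration: the file names and the two dictionary entries (including the continuation lexicon name N1) are made up, and the real lexc file is far larger.

```shell
# Hypothetical miniature lexc file; entries and the N1 class are made up.
cat > sample.kaz.lexc <<'EOF'
LEXICON Nouns
алма:алма N1 ; ! "apple"
компьютер:компьютер N1 ; ! "computer" ! Use/MT eng-kaz
EOF

# Vanilla source: drop every line that carries the Use/MT comment.
grep -v 'Use/MT' sample.kaz.lexc > vanilla.kaz.lexc

# MT source: the file as it is, Use/MT lines included.
cp sample.kaz.lexc mt.kaz.lexc
```

The filtering itself is a one-liner; the open question is where in the build it should run.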

Some options:

  1. Make two binaries in languages/apertium-kaz, one vanilla and one MT.
    • Cons: takes a lot longer to make apertium-kaz; installed size of the languages module doubles; messy.
  2. Make two transducers in /languages/ (vanilla + MT), but build the MT one only as far as att.gz, while the vanilla one is built to a full binary.
    • Cons: takes longer to make apertium-kaz; installed size of the languages module increases; still a bit messy.
  3. Have Use/MT as a visible tag, with paths (vanilla) or tags (MT) removed by postprocessing scripts (or CG).
    • Cons: Ugggly. Ugly. Ugly.
  4. Have Use/MT as a compiler tag: paths with this tag are removed by twol when compiling for vanilla, while the tag itself is removed by twol when compiling for MT.
    • Cons: compilation-internal tags; that's almost as bad as flag diacritics.
  5. Redundant make steps in pairs: the pair references $(AP_SRC1)/apertium-kaz.lexc and repeats all the steps up to att.gz that are specified in /languages/apertium-kaz/Makefile (except that in /languages/ there's a "grep -v Use/MT" line in the lexc step).
    • Cons: lots of redundant make code, so an update in one place means a lot of Makefiles have to be updated; takes a lot longer to make each language pair; messy.
  6. Create a binary transducer with only the "grep Use/MT" lines. Use "hfst-subtract MT.bin Use/MT-only.bin" to subtract only the Use/MT lines from the MT transducer to create the vanilla transducer in /languages/.
    • Cons: we can't simply "grep Use/MT *lexc", since we also need the LEXICON lines and dog knows what else, which would lead to really ugly lexicon-specific rules. And we'd still end up with two binary transducers in /languages/. Ugly.
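
The problem with the last option can be seen on a toy file. The sketch below uses a made-up, heavily simplified lexc file (the lexicon names, the entries and the output file names are all assumptions) and shows what a plain grep for Use/MT leaves behind, and why also keeping the LEXICON lines still isn't enough.

```shell
# Hypothetical miniature lexc file; everything in it is made up.
cat > sample.kaz.lexc <<'EOF'
Multichar_Symbols
%<n%>

LEXICON Root
Nouns ;

LEXICON Nouns
алма:алма N1 ; ! "apple"
компьютер:компьютер N1 ; ! "computer" ! Use/MT eng-kaz

LEXICON N1
%<n%>:0 # ;
EOF

# Naive attempt: keep only the Use/MT lines.
grep 'Use/MT' sample.kaz.lexc > mt-only.kaz.lexc

# No LEXICON header survives (grep -c exits non-zero on zero matches).
grep -c '^LEXICON' mt-only.kaz.lexc || true   # prints 0

# Second attempt: keep the LEXICON headers as well.
grep -E '^LEXICON|Use/MT' sample.kaz.lexc > headers-and-mt.kaz.lexc

# Still missing: Multichar_Symbols, the "Nouns ;" entry under Root,
# and the body of N1, which the kept entry continues into.
cat headers-and-mt.kaz.lexc
```

Each missing piece would need its own rule in the filter, which is where the lexicon-specific ugliness comes from.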