Ideas for Google Summer of Code/Regular expressions in lt-tmxproc
Gintrowicz and Jassem describe an idea for getting more reuse from translation memories, by extending them with regular expressions.
For example, the sample rule:
Rule 1:
    <instance>([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})</instance>
    <source>([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})</source>
    <target>([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4})</target>
    <orders>
        <order sourceGroup="1" suffix="/" />
        <order sourceGroup="2" suffix="/" />
        <order sourceGroup="3" suffix="" />
    </orders>
takes a Polish date (27.03.2011) and reformats it as an English date (27/03/2011).
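The effect of this rule can be sketched in a few lines of Python. This is only an illustration of what the rule does, not code from lt-tmxproc or from the paper; the names SOURCE, ORDERS and apply_rule are made up for the example, and Python's re module stands in for whatever regex engine the real implementation would use.

```python
import re

# Source-side pattern copied from the <source> element of the sample rule.
SOURCE = re.compile(r"([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})")

# (sourceGroup, suffix) pairs mirroring the <order> elements of the rule.
ORDERS = [(1, "/"), (2, "/"), (3, "")]


def apply_rule(text):
    """Rewrite every match of SOURCE by emitting the groups listed in ORDERS."""
    def rebuild(match):
        # Concatenate each captured group with its suffix, in rule order.
        return "".join(match.group(group) + suffix for group, suffix in ORDERS)
    return SOURCE.sub(rebuild, text)


print(apply_rule("27.03.2011"))  # prints 27/03/2011
```

Changing the group numbers in ORDERS would reorder the parts of the date, for instance to produce a month-first format.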
lttoolbox has support for simple regexes; lt-tmxproc builds on lttoolbox to build a finite-state transducer from TMX files. At present, it includes similar support for numbers: the special symbol <n> is inserted in place of each number in the transducer, and at runtime, when this symbol is encountered, the numbers are copied straight from input to output.
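The number handling described above can be approximated as follows. This is a rough sketch under stated assumptions, not how lt-tmxproc is actually implemented: a plain dictionary lookup replaces the finite-state transducer, numbers are assumed to be runs of ASCII digits that appear in the same order on both sides, and the names generalise, build_memory and translate are hypothetical.

```python
import re

# Assumption: a "number" is a run of ASCII digits.
NUMBER = re.compile(r"[0-9]+")


def generalise(segment):
    """Replace each number with <n>; return the template and the numbers found."""
    numbers = NUMBER.findall(segment)
    return NUMBER.sub("<n>", segment), numbers


def build_memory(pairs):
    """Store (source, target) pairs with their numbers replaced by <n>."""
    memory = {}
    for source, target in pairs:
        source_template, _ = generalise(source)
        target_template, _ = generalise(target)
        memory[source_template] = target_template
    return memory


def translate(memory, segment):
    """Look up a segment and copy its numbers into the <n> slots of the target."""
    template, numbers = generalise(segment)
    target = memory.get(template)
    if target is None:
        return None  # segment is not covered by the memory
    remaining = iter(numbers)
    # Fill the slots left to right; a slot stays as <n> if the numbers run out.
    return re.sub(r"<n>", lambda m: next(remaining, m.group(0)), target)


memory = build_memory([("Strona 5 z 10", "Page 5 of 10")])
print(translate(memory, "Strona 7 z 12"))  # prints Page 7 of 12
```

A single stored segment pair then covers every input that differs from it only in its numbers, which is the kind of reuse that regular expressions would generalise further.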
The idea of this project is to extend lt-tmxproc so that it can use the regular expression support in lttoolbox.