Ideas for Google Summer of Code/Regular expressions in lt-tmxproc
Revision as of 17:51, 27 March 2011
Gintrowicz and Jassem describe an idea for getting more reuse from translation memories, by extending them with regular expressions.
For example, the sample rule:
Rule 1:
  <instance>([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})</instance>
  <source>([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})</source>
  <target>([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4})</target>
  <orders>
    <order sourceGroup="1" suffix="/" />
    <order sourceGroup="2" suffix="/" />
    <order sourceGroup="3" suffix="" />
  </orders>
takes a Polish date (27.03.2011) and reformats it as an English date (27/03/2011).
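As a rough illustration of what the rule does, the following Python sketch applies the source pattern and rebuilds the match from the order entries. It is not code from lt-tmxproc or from the paper; the names SOURCE_PATTERN, ORDERS and apply_rule are hypothetical, and the rule is assumed to be applied as a plain substitution over the input text.

```python
import re

# Source pattern from Rule 1 (the dots are matched literally).
SOURCE_PATTERN = re.compile(r"([0-9]{1,2})[.]([0-9]{1,2})[.]([0-9]{2,4})")

# Hypothetical in-memory form of the <orders> element:
# (sourceGroup, suffix) pairs, in output order.
ORDERS = [(1, "/"), (2, "/"), (3, "")]


def apply_rule(text):
    """Rewrite every date matched by the source pattern using ORDERS."""
    def rewrite(match):
        # Emit each captured group followed by its suffix.
        return "".join(match.group(group) + suffix for group, suffix in ORDERS)

    return SOURCE_PATTERN.sub(rewrite, text)


print(apply_rule("27.03.2011"))  # prints 27/03/2011
```

Changing the group numbers in ORDERS would reorder the date parts, which is presumably why the rule lists the groups explicitly instead of using a fixed replacement string.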
lttoolbox has support for simple regexes; lt-tmxproc builds on lttoolbox to build a finite-state transducer from TMX files. At present, it includes similar support for numbers: the special symbol <n> is inserted in place of each number in the transducer, and at runtime, when this symbol is encountered, the number is copied straight from input to output.
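The behaviour of the <n> symbol can be sketched in Python as follows. This only mimics the observable effect with a dictionary lookup; lt-tmxproc itself works on a compiled transducer, and the MEMORY contents and the translate function are made up for the illustration. The sketch also assumes that the target segment contains the same number of <n> symbols as the source, in the same order.

```python
import re

NUMBER = re.compile(r"[0-9]+")

# Hypothetical toy translation memory: numbers are stored as <n>.
MEMORY = {
    "strona <n> z <n>": "page <n> of <n>",
}


def translate(segment):
    """Look up a segment, copying its numbers through to the output."""
    numbers = NUMBER.findall(segment)      # numbers seen in the input
    key = NUMBER.sub("<n>", segment)       # generalised lookup key
    target = MEMORY.get(key)
    if target is None:
        return None                        # no match in the memory
    remaining = iter(numbers)
    # Replace each <n> in the target with the next input number.
    return re.sub("<n>", lambda _match: next(remaining), target)


print(translate("strona 3 z 10"))  # prints page 3 of 10
```

A single stored segment therefore covers every input that differs only in its numbers, which is the kind of reuse that regular expressions would generalise.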
The idea of this project is to extend lt-tmxproc so that it makes use of the regular expression support in lttoolbox.