Difference between revisions of "Bitextor"

* [http://bitextor.sourceforge.net Bitextor project web] or [http://bitextor.wiki.sourceforge.net Bitextor's wiki (documentation)]
* [http://tag-aligner.sourceforge.net TagAligner's project web] or [http://tag-aligner.wiki.sourceforge.net TagAligner's wiki (documentation)]
* [http://en.wikipedia.org/wiki/Translation_Memory_eXchange TMX on Wikipedia]
* [http://www.lisa.org/fileadmin/standards/tmx1.4/tmx.htm TMX 1.4b specification]

Revision as of 17:06, 12 May 2009

Bitextor is an application that generates translation memories using multilingual websites as a corpus source. It downloads all the HTML files in a website, preprocesses them into a coherent and suitable format and, finally, applies a set of heuristics (based mainly on HTML tag structure and text block length) to pair files that are candidates to contain the same text in different languages. From these candidates, translation memories are generated in TMX format using the LibTagAligner library, which uses the HTML tags and the length of text chunks to perform the alignment.
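
The pairing heuristic described above can be illustrated with a minimal sketch. This is not Bitextor's actual code or scoring formula: the names <code>fingerprint</code>, <code>pair_score</code> and <code>candidate_pairs</code>, the way the two signals are combined, and the 0.8 threshold are all assumptions made for illustration. It only shows the general idea of comparing two HTML files by their tag structure and by the amount of text they contain, using the Python standard library.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records the sequence of opening tag names and the total text length."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_length = 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Ignore surrounding whitespace so indentation does not count as text.
        self.text_length += len(data.strip())


def fingerprint(html):
    """Return (list of opening tag names, number of text characters)."""
    collector = TagAndTextCollector()
    collector.feed(html)
    collector.close()
    return collector.tags, collector.text_length


def pair_score(html_a, html_b):
    """Hypothetical score in [0, 1]: tag-sequence similarity times length ratio."""
    tags_a, len_a = fingerprint(html_a)
    tags_b, len_b = fingerprint(html_b)
    tag_similarity = SequenceMatcher(None, tags_a, tags_b).ratio()
    longest = max(len_a, len_b)
    length_ratio = min(len_a, len_b) / longest if longest else 1.0
    return tag_similarity * length_ratio


def candidate_pairs(pages, threshold=0.8):
    """pages maps a file name to its HTML; returns (name_a, name_b, score) tuples.

    The threshold is an illustrative assumption, not a value taken from Bitextor.
    """
    names = sorted(pages)
    pairs = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            score = pair_score(pages[name_a], pages[name_b])
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda pair: -pair[2])
```

Under these assumptions, two pages that share the same tag skeleton and hold a similar amount of text score close to 1, while pages with a different layout or very different text length score low and are not proposed as candidates.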

See also

External links

Tools used by Bitextor