Difference between revisions of "Integration and tagset conversion with Giellatekno"
Revision as of 10:55, 11 November 2015
One common language-pair setup nowadays uses transducers from Giellatekno together with pair-specific data in Apertium. This setup is tricky because a lot of machinery is involved in the tagset conversion.
Let's assume you're using giella-xxx, giella-yyy and apertium-xxx-yyy. What are the relevant files?
Giellatekno side
- giella-xxx/tools/mt/apertium
- giella-xxx/tools/mt/apertium/tagsets
- giella-xxx/tools/mt/apertium/tagsets/apertium.postproc.relabel: This file is used for 1:1 tag conversions, for example changing <cc> to <cnjcoo>.
- giella-xxx/tools/mt/apertium/tagsets/modify-tags.regex: This file is used for 1:1, 1:n and n:1 tag conversions, for example changing <sg3> to <p3><sg>.
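The difference between a 1:1 and a 1:n conversion can be illustrated outside the Giellatekno build. The following is only a sketch: sed stands in for the real machinery, it does not reproduce the syntax of apertium.postproc.relabel or modify-tags.regex, and the function names relabel_1to1 and split_1ton as well as the input strings are made up for this illustration.

```shell
#!/bin/sh
# Toy illustration only: sed stands in for the real relabel/regex files.
# Function names and input strings are hypothetical.

# 1:1 conversion: one tag is renamed to exactly one other tag.
relabel_1to1() {
    sed 's/<cc>/<cnjcoo>/g'
}

# 1:n conversion: one tag is split into several tags.
split_1ton() {
    sed 's/<sg3>/<p3><sg>/g'
}

printf '%s\n' 'word<cc>' | relabel_1to1      # prints word<cnjcoo>
printf '%s\n' 'word<v><sg3>' | split_1ton    # prints word<v><p3><sg>
```

A 1:1 change like the first one belongs in the relabel file; anything that changes the number of tags, like the second one, needs the regex file.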
Apertium side
- apertium-xxx-yyy/gt2apertium.cg3r: This file is used for converting the CG file tags to Apertium tags. You may have to convert tags in more than one place.
Testing and troubleshooting
Getting the tags right often takes a lot of time and patience. Here are some tips for working out which file you need to look in.
Apertium side
- Check the trimmed analyser
$ echo , | hfst-lookup xxx-yyy.automorf.hfst
, ,<cm> 0,000000
- Check the untrimmed analyser
$ echo , | hfst-lookup .deps/xxx.automorf.hfst
, ,<cm> 0,000000
Giellatekno side
- Check the relabelled analyser
$ echo , | hfst-lookup tools/mt/apertium/analyser-mt-apertium-desc.yyy.hfstol
, ,<cm> 0,000000
- Check the unrelabelled analyser
$ echo , | hfst-lookup src/analyser-gt-desc.hfstol
, ,+CLB 0,000000
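The checks above can be run in one go, so that the first analyser where the tag looks wrong is easy to spot. This is a minimal sketch, not part of either build system: the function name check_stages and the LOOKUP variable are made up here, and the paths you pass depend on where your giella-xxx and apertium-xxx-yyy checkouts live.

```shell
#!/bin/sh
# Hypothetical helper: send one token through several analysers in turn.
# LOOKUP defaults to hfst-lookup; the variable itself is an assumption of
# this sketch so that another lookup command can be swapped in.
LOOKUP=${LOOKUP:-hfst-lookup}

check_stages() {
    token=$1
    shift
    for fst in "$@"; do
        # Print the analyser name, then its analysis of the token.
        printf '== %s\n' "$fst"
        printf '%s\n' "$token" | "$LOOKUP" "$fst"
    done
}

# Example call (adjust the paths to your own checkouts):
# check_stages , src/analyser-gt-desc.hfstol \
#     tools/mt/apertium/analyser-mt-apertium-desc.yyy.hfstol
```

Listing the analysers from the unrelabelled Giellatekno one through to the trimmed Apertium one makes the output read in the order the tags are converted.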