Apertium separable/report2017
Latest revision as of 18:36, 15 November 2017
Project description
The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwords. We do this by re-ordering word tokens before translation occurs. For example, "take something out" becomes "take out something" so that "take out" can be translated as a single unit.
To do this, a finite-state transducer was used. The transducer accepted certain patterns of words (paradigms), such as adj-noun or det-adj-noun, that could come between the parts of the multiword. If the pattern was accepted, the transducer output the re-ordered words, so that the multiword could be translated as a unit.
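The re-ordering described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the project's actual implementation: the real module compiles an .lsx dictionary into a finite-state transducer, while the sketch simply scans a token list. The names `SEPARABLE`, `GAP_PATTERNS`, and `reorder`, as well as the tags used, are hypothetical and chosen only for illustration.

```python
# Toy illustration only: this is NOT how lsx-proc is implemented.
# Tokens are (lemma, tag) pairs; all names and tags here are hypothetical.

# Hypothetical separable multiwords: (first part, second part).
SEPARABLE = {("take", "out"), ("turn", "off")}

# Hypothetical tag patterns allowed between the two parts,
# ordered from shortest to longest so matching is deterministic.
GAP_PATTERNS = [
    ("n",),
    ("prn",),
    ("adj", "n"),
    ("det", "n"),
    ("det", "adj", "n"),
]


def reorder(tokens):
    """Move the second part of a separable multiword next to the first."""
    out = []
    i = 0
    while i < len(tokens):
        lemma, _tag = tokens[i]
        match = None
        for pattern in GAP_PATTERNS:
            j = i + 1 + len(pattern)  # index where the second part would be
            if j >= len(tokens):
                continue
            gap = tokens[i + 1:j]
            gap_tags = tuple(tag for _, tag in gap)
            if gap_tags == pattern and (lemma, tokens[j][0]) in SEPARABLE:
                match = (gap, j)
                break
        if match is None:
            # No accepted pattern starts here: copy the token unchanged.
            out.append(tokens[i])
            i += 1
        else:
            gap, j = match
            # Emit both parts of the multiword together, then the gap words.
            out.append(tokens[i])
            out.append(tokens[j])
            out.extend(gap)
            i = j + 1
    return out


if __name__ == "__main__":
    sentence = [("take", "vblex"), ("something", "prn"), ("out", "adv")]
    print(" ".join(lemma for lemma, _ in reorder(sentence)))
```

Input that matches no accepted pattern passes through unchanged, which mirrors the idea that the transducer only rewrites sequences it accepts.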
Work done
- established dictionary format
- <j/>, <t/>, <w/> are supported within pair entries and loop as expected
- see https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable/examples/apertium-eng-spa.eng-spa.lsx for an example dictionary
- implemented a compiler for the separable-words dictionary and a processor to process tagged input
- all spacing, punctuation, and superblanks were preserved
- (for language developers: have the language-data writer write it explicitly in the .lsx file)
- For a full list of commits, see https://apertium.projectjj.com/gsoc2017/irene-tang.html
- For further documentation and usage instructions, see Lsx_module