Difference between revisions of "Lsx module report"

Revision as of 03:02, 29 August 2017

Project description

The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwords. We do this by re-ordering word tokens before translation occurs. For example, "take something out" becomes "take out something" so that "take out" can be translated as a single unit.

This is done with a finite-state transducer. The transducer accepts certain patterns of words (paradigms), such as adj-noun or det-adj-noun, that can separate the two parts of a multiword. If a pattern is accepted, the transducer outputs the words in their re-ordered form, so that the multiword can be translated as a unit.
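The re-ordering described above can be illustrated with a minimal Python sketch. This is not how lsx-proc is implemented (it uses a compiled finite-state transducer); the names `GAP_PATTERNS`, `MULTIWORDS`, and `reorder`, along with the tags and entries used, are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration only: lsx-proc uses a compiled transducer, not this.
# Tokens are (surface, tag) pairs. GAP_PATTERNS lists the tag sequences that
# are allowed to sit between the two halves of a separable multiword.
GAP_PATTERNS = {("n",), ("prn",), ("adj", "n"), ("det", "adj", "n")}

# Separable multiwords, keyed by (first half, second half). Made-up entries.
MULTIWORDS = {("take", "out")}


def reorder(tokens):
    """Move the second half of a separable multiword next to the first half."""
    out = []
    i = 0
    gap_lengths = sorted({len(p) for p in GAP_PATTERNS}, reverse=True)
    while i < len(tokens):
        moved = False
        for gap_len in gap_lengths:  # try the longest gap first
            j = i + 1 + gap_len      # index of the candidate second half
            if j >= len(tokens):
                continue
            gap = tokens[i + 1:j]
            gap_tags = tuple(tag for _, tag in gap)
            if (tokens[i][0], tokens[j][0]) in MULTIWORDS and gap_tags in GAP_PATTERNS:
                out.extend([tokens[i], tokens[j]])  # "take out"
                out.extend(gap)                     # then the words in between
                i = j + 1
                moved = True
                break
        if not moved:
            out.append(tokens[i])
            i += 1
    return out


sentence = [("take", "vblex"), ("something", "prn"), ("out", "adv")]
print(" ".join(word for word, _ in reorder(sentence)))  # take out something
```

Sequences whose gap does not match any listed pattern are passed through unchanged, which mirrors the "if the pattern is accepted" condition in the description.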

Work done

established dictionary format
- <j/>, <t/>, <w/> are supported within pair entries and loop as expected

all spacing, punctuation, and superblanks were preserved

support for lexical units whose analyses are joined with "+" (the "plus thing"):

echo "^абай<adj>$ ^бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$" | lsx-proc kaz-kir.autoseq.bin
^абай бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$

(note for language developers: the language-data writer has to write this explicitly in the .lsx file)

For a full list of commits, see https://apertium.projectjj.com/gsoc2017/irene-tang.html
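To make the point about preserved spacing and superblanks concrete, here is a small Python sketch that splits a stream into lexical units (`^...$`) and the blanks between them, then checks that joining the pieces reproduces the input exactly. It is a simplification and not the lsx-proc code: the helper name `split_stream` is hypothetical, and the sketch assumes there are no escaped `^` or `$` characters in the text.

```python
import re

# Simplified split of an Apertium stream into lexical units (^...$) and the
# blanks between them. Assumption: no escaped ^ or $ appears in the text.
UNIT = re.compile(r"\^.*?\$")


def split_stream(stream):
    """Return a list of ("unit" | "blank", text) pieces in original order."""
    pieces = []
    pos = 0
    for match in UNIT.finditer(stream):
        if match.start() > pos:
            # Anything between two units (spaces, superblanks) is kept as-is.
            pieces.append(("blank", stream[pos:match.start()]))
        pieces.append(("unit", match.group()))
        pos = match.end()
    if pos < len(stream):
        pieces.append(("blank", stream[pos:]))
    return pieces


stream = "^абай<adj>$ ^бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$"
pieces = split_stream(stream)
# Re-joining the pieces reproduces the input exactly, so no spacing is lost.
assert "".join(text for _, text in pieces) == stream
units = [text for kind, text in pieces if kind == "unit"]
print(len(units))  # 3
```

Keeping blanks as separate pieces means a re-ordering step can move units around while leaving the material between them untouched.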

Future work

Open questions raised on IRC (10:53, firespeaker to pektii): if we offload multiwords from the transducers to lsx, (1) how do we do N N compounds with lsx? (2) how does translation *to* a multiword work?

Recycling dictionaries and/or paradigms: lsx dictionaries are packaged in language pairs, but most of the eng-spa lsx dictionary could be reused by eng-cat. Could we make use of this similarity?

Support for language pairs: the module has not yet had extensive beta testing. The following language pairs have packaged the lsx module:
- eng-cat
- eng-deu (?)

