Difference between revisions of "Talk:Apertium and Constraint Grammar"
Latest revision as of 11:00, 18 September 2014
- Window = whole of what we're looking at; several sentences at the same time.
- SingleWindow = one sentence (for want of a better term). Usually there are 3 SingleWindows in a Window, but that is defined at runtime: it can be anywhere from 1 to hundreds, set with --num-windows.
- Cohort = one
Testing
Regression test status as of 22:07, 17 April 2008 (BST)
Running tests... T_AnyMinusSome: Fail. T_Barrier: Success. T_BasicAppend: Success. T_BasicContextTest: Success. T_BasicDelimit: Success. T_BasicIff: Success. T_BasicRemove: Success. T_BasicSelect: Success. T_BasicSubstitute: Success. T_CarefulBarrier: Fail. T_CompositeSelect: Success. T_DontMatchEmptySet: Fail. T_EndlessSelect: Fail. T_MapAdd_Different: Fail. T_MatchBaseform: Success. T_MatchWordform: Success. T_MultipleSections: Success. T_NegatedContextTest: Success. T_RegExp_Map: Fail. T_RegExp_Select: Fail. T_RemoveSingleTag: Fail. T_ScanningTests: Success. T_Sections: Fail. T_SetOp_FailFast: Success. T_SetOp_OR: Success. T_SpaceInWord: Success. T_SuperBlanks: Success. T_Unification: Fail. T_UnknownWord: Success.
Regression test status as of 10:46, 3 July 2008 (UTC)
T_AnyMinusSome: Fail. T_Barrier: Success. T_BasicAppend: Fail. T_BasicContextTest: Success. T_BasicDelimit: Success. T_BasicIff: Success. T_BasicRemove: Success. T_BasicSelect: Success. T_BasicSubstitute: Success. T_CarefulBarrier: Fail. T_CompositeSelect: Success. T_DontMatchEmptySet: Fail. T_EndlessSelect: Fail. T_MapAdd_Different: Fail. T_MatchBaseform: Success. T_MatchWordform: Success. T_MultipleSections: Success. T_MultiWords: Success. T_NegatedContextTest: Success. T_RegExp_Map: Fail. T_RegExp_Select: Fail. T_RemoveSingleTag: Fail. T_ScanningTests: Fail. T_Sections: Fail. T_SetOp_FailFast: Success. T_SetOp_OR: Success. T_SpaceInWord: Success. T_SuperBlanks: Success. T_SuperBlanksNewline: Success. T_Unification: Fail. T_UnknownWord: Success.
Regression test status as of 07:30, 17 July 2008 (UTC)
T_AnyMinusSome: Success. T_Barrier: Success. T_BasicAppend: Success. T_BasicContextTest: Success. T_BasicDelimit: Success. T_BasicIff: Success. T_BasicRemove: Success. T_BasicSelect: Success. T_BasicSubstitute: Success. T_CarefulBarrier: Success. T_CompositeSelect: Success. T_DontMatchEmptySet: Success. T_EndlessSelect: Fail. T_Joiner: Success. T_MapAdd_Different: Success. T_MatchBaseform: Success. T_MatchWordform: Success. T_MultipleSections: Success. T_MultiWords: Success. T_NegatedContextTest: Success. T_RegExp_Map: Success. T_RegExp_Select: Success. T_RegExp_Substitute: Success. T_RemoveSingleTag: Fail. T_ScanningTests: Success. T_Sections: Fail. T_SetOp_FailFast: Success. T_SetOp_OR: Success. T_SpaceInWord: Success. T_SuperBlanks: Success. T_SuperBlanksNewline: Success. T_Unification: Fail. T_UnknownWord: Success.
Current bugs
Wishlist
Ability to specify where a MAPPING tag should be added in the tag_list
Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches longest left-to-right strings, we may have to disambiguate between <code>ganga# i<vblex></code> and <code>ganga<vblex>+i<pr></code>.

The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like <code>ganga# i<vblex><@FVMAIN></code>. The second one is worse. The + means that the multiword should be split into two before transfer, <code>^ganga<vblex>$ ^i<pr>$</code>. If the mapping tags go to the end, or even after the first word, we'll get <code>^ganga<vblex><@FVMAIN><@PART>+i<pr>$</code> or <code>^ganga<vblex>+i<pr><@FVMAIN><@PART>$</code>, but we want <code>^ganga<vblex><@FVMAIN>+i<pr><@PART>$</code>.

CG Syntax change: We could say something like

<pre>
MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
</pre>
- This is better done using Subreadings.
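To make the desired tag placement concrete, here is a minimal Python sketch of what "attach each mapping tag to its own part of the multiword" would mean for a single reading. It is not vislcg3 or Apertium code: the function name `attach_mapping_tags` and the part-index dictionary are hypothetical, loosely mirroring the proposed `VPart:0` / `VPart:1` notation, and it assumes that `+` only ever appears as the multiword joiner and that the `^` and `$` delimiters have already been stripped.

```python
def attach_mapping_tags(reading, tags_by_part):
    """Append mapping tags to individual parts of a '+'-joined reading.

    reading      -- one analysis without the ^ and $ delimiters,
                    e.g. "ganga<vblex>+i<pr>"
    tags_by_part -- hypothetical mapping from part index (0, 1, ...)
                    to the list of mapping tags for that part
    """
    # Assumption: '+' is only used as the multiword joiner.
    parts = reading.split("+")
    tagged = []
    for index, part in enumerate(parts):
        # Wrap each mapping tag in angle brackets and append it to its part.
        suffix = "".join("<%s>" % tag for tag in tags_by_part.get(index, []))
        tagged.append(part + suffix)
    return "+".join(tagged)


if __name__ == "__main__":
    # The two-part reading: each tag lands on its own part.
    print(attach_mapping_tags("ganga<vblex>+i<pr>",
                              {0: ["@FVMAIN"], 1: ["@PART"]}))
    # The "ganga# i" reading has a single part, so the tag goes at the end.
    print(attach_mapping_tags("ganga# i<vblex>", {0: ["@FVMAIN"]}))
```

The first call yields `ganga<vblex><@FVMAIN>+i<pr><@PART>`, the form asked for above; the second yields `ganga# i<vblex><@FVMAIN>`, since a reading without `+` has only one part.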