Talk:Apertium and Constraint Grammar
Latest revision as of 11:00, 18 September 2014
- Window = whole of what we're looking at; several sentences at the same time.
- SingleWindow = one sentence (for want of a better term). There are usually 3 SingleWindows in a Window, but the number is set at runtime with --num-windows and can be anywhere from 1 to hundreds.
- Cohort = one
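A rough sketch of how these three terms nest, written in Python purely for illustration (vislcg3 itself is C++). The class layout, the `text` field on `Cohort`, and the idea that the oldest SingleWindow is dropped once the limit is reached are all assumptions made for this example, not a description of the real implementation; `Cohort` is deliberately left opaque.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cohort:
    # Kept opaque on purpose: only raw text is stored (assumption).
    text: str


@dataclass
class SingleWindow:
    # One sentence, for want of a better term.
    cohorts: list = field(default_factory=list)


class Window:
    """Everything currently being looked at: several SingleWindows."""

    def __init__(self, num_windows: int = 3):
        if num_windows < 1:
            raise ValueError("num_windows must be at least 1")
        # Assumption: once the limit is reached, the oldest entry is dropped.
        self.single_windows = deque(maxlen=num_windows)

    def push(self, single_window: SingleWindow) -> None:
        self.single_windows.append(single_window)


window = Window(num_windows=3)
for sentence in ["one", "two", "three", "four", "five"]:
    window.push(SingleWindow([Cohort(sentence)]))
print([sw.cohorts[0].text for sw in window.single_windows])
```

Here `num_windows` stands in for the value passed with --num-windows.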
==Testing==
Regression test status as of 22:07, 17 April 2008 (BST)
Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.
Regression test status as of 10:46, 3 July 2008 (UTC)
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.
Regression test status as of 07:30, 17 July 2008 (UTC)
T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.
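Comparing status dumps like these by eye is error-prone, so here is a small Python sketch for diffing two of them. The helper names `parse_results` and `status_changes` are made up for this example and are not part of the test suite; the only thing assumed about the format is the `T_Name: Success.` / `T_Name: Fail.` pattern seen above.

```python
import re

# Matches entries such as "T_BasicAppend: Success." or "T_Sections: Fail."
RESULT_PATTERN = re.compile(r"(T_\w+): (Success|Fail)\.")


def parse_results(text: str) -> dict:
    """Map each test name found in a status dump to 'Success' or 'Fail'."""
    return dict(RESULT_PATTERN.findall(text))


def status_changes(old: dict, new: dict) -> dict:
    """Return {test: (old_status, new_status)} for tests present in both
    runs whose status differs."""
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


# Abbreviated excerpts of the April and 3 July runs, for illustration only.
april = parse_results(
    "Running tests... T_AnyMinusSome: Fail. T_BasicAppend: Success. "
    "T_ScanningTests: Success."
)
july = parse_results(
    "T_AnyMinusSome: Fail. T_BasicAppend: Fail. T_MultiWords: Success. "
    "T_ScanningTests: Fail."
)
print(status_changes(april, july))
print(sorted(set(july) - set(april)))  # tests that only exist in the later run
```

The regular expression does not care whether results come one per line or run together, so full dumps can be pasted in unchanged.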
==Current bugs==
== Wishlist ==
=== <strike>Ability to specify where a MAPPING tag should be added in the tag_list</strike> ===
Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches longest left-to-right strings, we may have to disambiguate between <code>ganga# i<vblex></code> and <code>ganga<vblex>+i<pr></code>. The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like <code>ganga# i<vblex><@FVMAIN></code>. The second one is worse. The + means that the multiword should be split into two before transfer, <code>^ganga<vblex>$ ^i<pr>$</code>; but if the mapping tags all go after the first word, or all go to the end, we'll get <code>^ganga<vblex><@FVMAIN><@PART>+i<pr>$</code> or <code>^ganga<vblex>+i<pr><@FVMAIN><@PART>$</code> respectively, whereas we want <code>^ganga<vblex><@FVMAIN>+i<pr><@PART>$</code>.
'''CG Syntax change''': We could say something like
MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
- This is better done using Subreadings.
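
To make the desired output concrete, here is a small Python sketch of the string manipulation being asked for. It is only an illustration of the target format, not how vislcg3 or Subreadings work internally; the function names are invented for this example, the part indices mirror the hypothetical VPart:0 / VPart:1 notation above, and splitting on every "+" is a simplification that ignores escaped characters.

```python
def split_parts(lexical_unit: str) -> list:
    """Split '^a<x>+b<y>$' into ['a<x>', 'b<y>'].

    Simplification: every '+' is treated as a joiner.
    """
    unit = lexical_unit.strip()
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    return unit[1:-1].split("+")


def add_mapping_tags(lexical_unit: str, tags_by_part: dict) -> str:
    """Append each mapping tag to the part it belongs to, by part index."""
    tagged = []
    for index, part in enumerate(split_parts(lexical_unit)):
        suffix = "".join(f"<{tag}>" for tag in tags_by_part.get(index, []))
        tagged.append(part + suffix)
    return "^" + "+".join(tagged) + "$"


def split_for_transfer(lexical_unit: str) -> str:
    """Turn a '+'-joined multiword into separate units before transfer."""
    return " ".join(f"^{part}$" for part in split_parts(lexical_unit))


print(add_mapping_tags("^ganga<vblex>+i<pr>$", {0: ["@FVMAIN"], 1: ["@PART"]}))
print(split_for_transfer("^ganga<vblex>+i<pr>$"))
print(add_mapping_tags("^ganga# i<vblex>$", {0: ["@FVMAIN"]}))
```

With the tags attached per part, splitting on the joiner afterwards keeps each mapping tag with the word it was mapped to.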