Difference between revisions of "Talk:Apertium and Constraint Grammar"

From Apertium
(whatever this was it's outdated now)
 
(15 intermediate revisions by 3 users not shown)
Some notes:

<pre>
cCohort = 0;
cWindow = 0;
lCohort = 0;
lWindow = 0;

// u_fgetc() returns U_EOF (0xFFFF) at end of input, so test for it explicitly.
while ((inchar = u_fgetc(input)) != U_EOF) {
    if (inchar == '^') {
        // check if the current limit of Cohorts to SingleWindow has been reached on this SingleWindow
        if(

        // check if there is an existing SingleWindow
        if (!cSWindow) {
            initialiseSingleWindow();
        }

        // check for current Cohort

        // read Cohort
        readCohort(input, cCohort);

        // Up number of cohorts.
    }

    // Superblank: append the characters after '[' (up to and including ']')
    // to the text of the current Cohort, or else of the current Window.
    if (inchar == '[') {
        while ((inchar = u_fgetc(input)) != U_EOF) {
            if (cCohort) {
                ux_append(cCohort->text, inchar);
            } else if (cWindow) {
                ux_append(cWindow->text, inchar);
            }
            if (inchar == ']') {
                break;
            }
        }
    }
}

void readCohort(UFILE *input, Cohort *cCohort)
{
    UChar inchar;
    // Consume characters up to the '$' that closes the lexical unit.
    while ((inchar = u_fgetc(input)) != U_EOF) {
        if (inchar == '$') {
            return;
        }
    }
}

void processReading(UFILE *input, Reading *cReading)
{
}
</pre>
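The notes above describe a reader loop over the Apertium stream format, in which '^' opens a lexical unit, '$' closes it, and '[' ... ']' enclose a superblank that is passed through untouched. The following is a minimal, self-contained C++ sketch of that tokenizing step, not vislcg3's actual reader: it works on a std::string instead of an ICU UFILE, the Token type and tokenize() function are hypothetical, and it assumes that a backslash escapes the character after it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch only.
struct Token {
    bool isLexicalUnit;  // true for ^...$, false for blank/superblank text
    std::string text;    // for lexical units: contents without ^ and $
};

// Split an Apertium-format string into lexical units (^...$) and the blank
// text between them. Superblanks ([...]) are copied verbatim into the blank
// text, so a '^' or '$' inside them is not treated as a delimiter.
// Assumption: a backslash escapes the next character; both are kept as-is.
std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    std::string blank;
    std::size_t i = 0;
    while (i < in.size()) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            blank += in[i++];  // the backslash
            blank += in[i++];  // the escaped character
        } else if (c == '[') {
            // Superblank: copy up to and including the closing ']'.
            while (i < in.size() && in[i] != ']') {
                if (in[i] == '\\' && i + 1 < in.size()) blank += in[i++];
                blank += in[i++];
            }
            if (i < in.size()) blank += in[i++];  // the ']'
        } else if (c == '^') {
            // Flush any pending blank text before the lexical unit.
            if (!blank.empty()) {
                out.push_back({false, blank});
                blank.clear();
            }
            std::string lu;
            ++i;  // skip '^'
            while (i < in.size() && in[i] != '$') {
                if (in[i] == '\\' && i + 1 < in.size()) lu += in[i++];
                lu += in[i++];
            }
            if (i < in.size()) ++i;  // skip '$'
            out.push_back({true, lu});
        } else {
            blank += c;
            ++i;
        }
    }
    if (!blank.empty()) out.push_back({false, blank});
    return out;
}
```

For example, tokenize("^ganga<vblex>+i<pr>$ [<b>]^x$") yields three tokens: the lexical unit ganga<vblex>+i<pr>, the blank " [<b>]", and the lexical unit x.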

Latest revision as of 11:00, 18 September 2014

  • Window = the whole of what we're looking at: several sentences at the same time.
  • SingleWindow = one sentence (for want of a better term). Usually there are 3 SingleWindows in a Window, but the number is defined at runtime: it can be anywhere from 1 to hundreds, set with --num-windows.
  • Cohort = one word form together with all of its readings.
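The three levels nest: a Window holds several SingleWindows, and each SingleWindow holds a sequence of Cohorts. A minimal C++ sketch of that nesting follows; the struct names and fields are hypothetical and only illustrate the terms above. They are not vislcg3's actual classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; vislcg3's real Window,
// SingleWindow and Cohort classes hold much more state than this.
struct Reading {
    std::string baseform;           // e.g. "ganga"
    std::vector<std::string> tags;  // e.g. {"vblex"}
};

// Cohort: one word form together with all of its candidate readings.
struct Cohort {
    std::string wordform;
    std::vector<Reading> readings;
};

// SingleWindow: roughly one sentence, i.e. a sequence of cohorts.
struct SingleWindow {
    std::vector<Cohort> cohorts;
};

// Window: everything currently being looked at, i.e. several
// SingleWindows at once (how many is a runtime setting).
struct Window {
    std::vector<SingleWindow> singleWindows;
};

// Total number of cohorts across all SingleWindows in a Window.
std::size_t countCohorts(const Window& window) {
    std::size_t total = 0;
    for (const SingleWindow& sw : window.singleWindows) {
        total += sw.cohorts.size();
    }
    return total;
}
```

For example, a Window with three SingleWindows holding 2, 0 and 4 cohorts gives countCohorts() == 6.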

Testing

Regression test status as of 22:07, 17 April 2008 (BST)


Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 10:46, 3 July 2008 (UTC)

T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 07:30, 17 July 2008 (UTC)

T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Current bugs

Wishlist

Ability to specify where a MAPPING tag should be added in the tag_list

Tags in vislcg3 are "unordered", but their input order is preserved, and MAPPING tags are appended to the end. However, since Apertium matches the longest string from left to right, we may have to disambiguate between ganga# i<vblex> and ganga<vblex>+i<pr>. The first one is easy: "ganga# i" is treated as the baseform and there is just one tag, vblex, so we might get something like ganga# i<vblex><@FVMAIN>. The second one is harder. The + means that the multiword should be split into two words before transfer, ^ganga<vblex>$ ^i<pr>$; but if both mapping tags go after the first word we get ^ganga<vblex><@FVMAIN><@PART>+i<pr>$, and if they go to the end we get ^ganga<vblex>+i<pr><@FVMAIN><@PART>$, whereas what we want is ^ganga<vblex><@FVMAIN>+i<pr><@PART>$.
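Once it is known which mapping tag belongs to which part, producing the wanted output is a matter of inserting each tag just before the '+' that ends its part, instead of appending all tags at the end. A minimal C++ sketch of that string step; attachTagsPerPart() is a hypothetical helper, it takes the unit without its ^ and $ delimiters, and it assumes '+' only ever appears as the multiword join marker.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append partTags[k] to the k-th '+'-separated part of
// an Apertium lexical unit (given without ^ and $). Parts with no entry in
// partTags get no tag. Assumes '+' occurs only as the join marker.
std::string attachTagsPerPart(const std::string& lu,
                              const std::vector<std::string>& partTags) {
    std::string out;
    std::size_t part = 0;
    std::size_t start = 0;
    while (true) {
        std::size_t plus = lu.find('+', start);
        std::size_t end = (plus == std::string::npos) ? lu.size() : plus;
        out += lu.substr(start, end - start);      // the part itself
        if (part < partTags.size()) out += partTags[part];  // its tag
        if (plus == std::string::npos) break;      // last part done
        out += '+';
        start = plus + 1;
        ++part;
    }
    return out;
}
```

With the tags <@FVMAIN> and <@PART> for parts 0 and 1, ganga<vblex>+i<pr> becomes ganga<vblex><@FVMAIN>+i<pr><@PART>, and ganga# i<vblex> with just <@FVMAIN> becomes ganga# i<vblex><@FVMAIN>. Deciding which part each tag belongs to is still up to CG; this only handles where the tag ends up in the string.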

CG Syntax change: We could say something like

MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
This is better done using Subreadings.