Difference between revisions of "Talk:Apertium and Constraint Grammar"

From Apertium
(New page: * Window = whole of what we're looking at; several sentences at the same time. * SingleWindow = one sentence (for want of a better term). Usually there's 3 SingleWindow in a Window, but t...)
 
(whatever this was it's outdated now)
 
(16 intermediate revisions by 3 users not shown)
Line 1: Line 1:

* Window = whole of what we're looking at; several sentences at the same time.
* SingleWindow = one sentence (for want of a better term). A Window usually holds 3 SingleWindows, but the number is set at runtime with --num-windows and can be anywhere from 1 to hundreds.
* Cohort = one

Text of the earlier revision that the latest revision removes:

Some notes:

<pre>
cCohort = 0;
cWindow = 0;
lCohort = 0;
lWindow = 0;

// read the input stream one character at a time until end of file
while ((inchar = u_fgetc(input)) != U_EOF) {
    if(inchar == '^') {
        // check if the current limit of Cohorts to SingleWindow has been reached on this SingleWindow
        // if(
        // check if there is an existing SingleWindow
        if(!cSWindow) {
            initialiseSingleWindow();
        }
        // check for current Cohort
        // read Cohort
        readCohort(input, cCohort);
        // Up number of cohorts.
    }
    if(inchar == '[') {
        // copy a superblank onto the current Cohort, or onto the Window if there is no Cohort yet
        while(inchar != ']') {
            inchar = u_fgetc(input);
            if(cCohort) {
                ux_append(cCohort->text, inchar);
            } else if(cWindow) {
                ux_append(cWindow->text, inchar);
            }
        }
    }
}

readCohort(UFILE *input, Cohort *cCohort)
{
    // consume characters up to the '$' that ends the Cohort
    while((inchar = u_fgetc(input)) != U_EOF) {
        if(inchar == '$') {
            return;
        }
    }
}

processReading(UFILE *input, Reading *cReading)
{
}
</pre>
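
The reading loop in the notes above can be tried out with a small stand-alone sketch. This is only an illustration in Python, not the vislcg3 reader: the function name <code>read_stream</code> and the flat list it returns are assumptions made for the example, escaped characters are not handled, and superblanks are returned as separate items rather than appended to the current Cohort or Window as the notes do.

```python
from typing import List, Tuple


def read_stream(text: str) -> List[Tuple[str, str]]:
    """Split an Apertium-style stream into ('cohort', body) and ('blank', text) items.

    Hypothetical helper: '^' opens a cohort, '$' closes it, and '[...]' is a
    superblank. Backslash escapes are ignored in this sketch.
    """
    items: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "^":
            # a cohort runs up to the next '$'
            end = text.find("$", i + 1)
            if end == -1:
                raise ValueError("unterminated cohort")
            items.append(("cohort", text[i + 1:end]))
            i = end + 1
        elif ch == "[":
            # a superblank runs up to the next ']'
            end = text.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated superblank")
            items.append(("blank", text[i:end + 1]))
            i = end + 1
        else:
            # plain text between units is kept as a blank
            j = i
            while j < len(text) and text[j] not in "^[":
                j += 1
            items.append(("blank", text[i:j]))
            i = j
    return items


if __name__ == "__main__":
    for kind, body in read_stream("[<b>]^ganga<vblex>$ ^i<pr>$"):
        print(kind, repr(body))
```

Keeping blanks as their own items makes the sketch easy to inspect; a real reader would attach them to a Cohort or Window so that formatting survives the pipeline.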
 

Latest revision as of 11:00, 18 September 2014

  • Window = whole of what we're looking at; several sentences at the same time.
  • SingleWindow = one sentence (for want of a better term). A Window usually holds 3 SingleWindows, but the number is set at runtime with --num-windows and can be anywhere from 1 to hundreds.
  • Cohort = one
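
As a rough picture of how these three terms nest, here is a minimal Python sketch. It is not the vislcg3 data model: the class and field names are made up for illustration, the Cohort definition (one word form with its candidate readings) follows the usual Constraint Grammar sense and is an assumption here because the note above is cut short, and dropping the oldest sentence when the limit is exceeded is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cohort:
    """Assumed meaning: one word form plus its candidate readings."""
    wordform: str
    readings: List[str] = field(default_factory=list)


@dataclass
class SingleWindow:
    """One sentence, held as an ordered list of cohorts."""
    cohorts: List[Cohort] = field(default_factory=list)


@dataclass
class Window:
    """Everything in view at once: at most num_windows sentences."""
    num_windows: int = 3  # mirrors the usual value mentioned above
    single_windows: List[SingleWindow] = field(default_factory=list)

    def push(self, sentence: SingleWindow) -> None:
        """Add a sentence; drop the oldest one if the limit is exceeded (assumed policy)."""
        self.single_windows.append(sentence)
        if len(self.single_windows) > self.num_windows:
            self.single_windows.pop(0)


if __name__ == "__main__":
    window = Window(num_windows=2)
    for word in ("a", "b", "c"):
        window.push(SingleWindow([Cohort(word)]))
    print([sw.cohorts[0].wordform for sw in window.single_windows])
```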

Testing

Regression test status as of 22:07, 17 April 2008 (BST)


Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 10:46, 3 July 2008 (UTC)

T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 07:30, 17 July 2008 (UTC)

T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Current bugs

Wishlist

Ability to specify where a MAPPING tag should be added in the tag_list

Tags in vislcg3 are "unordered", but the input order is preserved and MAPPING tags are added at the end. However, since Apertium matches the longest left-to-right strings, we may have to disambiguate between ganga# i<vblex> and ganga<vblex>+i<pr>. The first is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like ganga# i<vblex><@FVMAIN>. The second is worse. The + means that the multiword should be split in two before transfer, giving ^ganga<vblex>$ ^i<pr>$. If the mapping tags go after the first word or at the very end, we get ^ganga<vblex><@FVMAIN><@PART>+i<pr>$ or ^ganga<vblex>+i<pr><@FVMAIN><@PART>$ respectively, whereas we want ^ganga<vblex><@FVMAIN>+i<pr><@PART>$.
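
To make the wanted placement concrete, the following Python sketch attaches one list of mapping tags to each +-joined part of a lexical unit. It is only an illustration of the desired output, not something vislcg3 or Apertium provides: the function name <code>attach_mapping_tags</code> is hypothetical, and the naive split on + assumes no escaped or tag-internal plus signs.

```python
from typing import List


def attach_mapping_tags(unit: str, tags_per_part: List[List[str]]) -> str:
    """Append mapping tags to each '+'-joined part of a '^...$' lexical unit.

    Hypothetical helper for illustration; tags_per_part[i] holds the tags
    for the i-th part, in order.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    parts = unit[1:-1].split("+")
    if len(parts) != len(tags_per_part):
        raise ValueError("need exactly one tag list per part")
    tagged = [
        part + "".join(f"<{tag}>" for tag in tags)
        for part, tags in zip(parts, tags_per_part)
    ]
    return "^" + "+".join(tagged) + "$"


if __name__ == "__main__":
    print(attach_mapping_tags("^ganga<vblex>+i<pr>$", [["@FVMAIN"], ["@PART"]]))
    print(attach_mapping_tags("^ganga# i<vblex>$", [["@FVMAIN"]]))
```

With per-part tags, splitting the unit at + before transfer leaves each word carrying its own mapping tag.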

CG Syntax change: We could say something like

MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);

This is better done using Subreadings.