Difference between revisions of "Talk:Apertium and Constraint Grammar"

From Apertium
(whatever this was it's outdated now)
 
(11 intermediate revisions by 3 users not shown)
==Current bugs==

; #01 Regexes not working in cg-proc

<pre>
# pobl y de
REMOVE ("<d.*>"r "t.*"r) IF (-1 DetDef);

$ echo "^y/yr<det><def><sp>$ ^de/te<n><m><sp>/de<n><m><sp>$" | cg-proc cy-en.cg.bin
^y/yr<det><def><sp>$ ^de/te<n><m><sp>/de<n><m><sp>$
</pre>
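Bug #01's REMOVE rule is meant to drop the "te" reading of the wordform "de" when the previous word is a definite determiner. The cg-proc output above still contains both readings, which is the bug. The Python sketch below emulates what the rule should do on the example stream, under several assumptions:

* DetDef is taken to be `LIST DetDef = (det def);`, since its definition is not shown.
* The regexes are treated as matching the whole wordform and baseform.
* `parse`, `apply_rule` and `serialize` are hypothetical helpers, not part of cg-proc or vislcg3.

```python
import re

# Hypothetical mini-parser for the Apertium stream used in bug #01:
# "^surface/reading1/reading2$" becomes (surface, [reading1, reading2]).
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse(stream):
    return [(m.group(1), m.group(2).split("/")) for m in UNIT.finditer(stream)]


def tags(reading):
    # "te<n><m><sp>" -> ["n", "m", "sp"]
    return re.findall(r"<([^>]+)>", reading)


def baseform(reading):
    # "te<n><m><sp>" -> "te"
    return reading.split("<", 1)[0]


def apply_rule(cohorts):
    """Emulate REMOVE ("<d.*>"r "t.*"r) IF (-1 DetDef);
    assuming LIST DetDef = (det def); and whole-string regex matching."""
    out = []
    for i, (surface, readings) in enumerate(cohorts):
        # (-1 DetDef): some reading of the previous cohort has both <det> and <def>
        context_ok = i > 0 and any(
            {"det", "def"} <= set(tags(r)) for r in cohorts[i - 1][1]
        )
        if context_ok and re.fullmatch(r"d.*", surface):
            kept = [r for r in readings if not re.fullmatch(r"t.*", baseform(r))]
            # CG never removes the last remaining reading of a cohort
            if kept:
                readings = kept
        out.append((surface, readings))
    return out


def serialize(cohorts):
    return " ".join("^" + s + "/" + "/".join(rs) + "$" for s, rs in cohorts)


STREAM = "^y/yr<det><def><sp>$ ^de/te<n><m><sp>/de<n><m><sp>$"
print(serialize(apply_rule(parse(STREAM))))
# ^y/yr<det><def><sp>$ ^de/de<n><m><sp>$
```

If the regexes worked, the expected cg-proc output would be the printed line above, with only the "de" reading left on the second word.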

Latest revision as of 11:00, 18 September 2014

  • Window = the whole of what we're looking at; several sentences at the same time.
  • SingleWindow = one sentence (for want of a better term). Usually there are 3 SingleWindows in a Window, but that's defined at runtime: it can be anywhere from 1 to hundreds, set with --num-windows.
  • Cohort = one
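The containment hierarchy above can be pictured as nested data structures. This is a minimal Python sketch, not vislcg3's internals. The Cohort entry above is cut off, so the sketch assumes the usual CG sense: one word form together with all its candidate readings. The class names mirror the terms above, and `group_into_windows` and its `num_windows` parameter are hypothetical, only echoing the idea of `--num-windows`. The real buffering in vislcg3 may well differ.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    baseform: str
    tags: list[str] = field(default_factory=list)


@dataclass
class Cohort:
    # Assumption: one word form together with all its candidate readings
    wordform: str
    readings: list[Reading] = field(default_factory=list)


@dataclass
class SingleWindow:
    # Roughly one sentence: a sequence of cohorts
    cohorts: list[Cohort] = field(default_factory=list)


@dataclass
class Window:
    # Several SingleWindows looked at together
    single_windows: list[SingleWindow] = field(default_factory=list)


def group_into_windows(sentences: list[SingleWindow], num_windows: int = 3) -> list[Window]:
    """Chunk sentences into Windows of at most num_windows SingleWindows each.
    Illustrative only: num_windows merely echoes the idea of --num-windows."""
    if num_windows < 1:
        raise ValueError("num_windows must be at least 1")
    return [
        Window(sentences[i:i + num_windows])
        for i in range(0, len(sentences), num_windows)
    ]


sentences = [SingleWindow([Cohort("de", [Reading("de", ["n", "m", "sp"])])]) for _ in range(7)]
windows = group_into_windows(sentences)
print([len(w.single_windows) for w in windows])  # [3, 3, 1]
```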

Testing

Regression test status as of 22:07, 17 April 2008 (BST)


Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 10:46, 3 July 2008 (UTC)

T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 07:30, 17 July 2008 (UTC)

T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Current bugs

Wishlist

Ability to specify where a MAPPING tag should be added in the tag_list

Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches the longest strings left to right, we may have to disambiguate between ganga# i<vblex> and ganga<vblex>+i<pr>. The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like ganga# i<vblex><@FVMAIN>. The second one is harder. The + means that the multiword should be split into two words before transfer, ^ganga<vblex>$ ^i<pr>$. If the mapping tags go at the end, we'll get ^ganga<vblex>+i<pr><@FVMAIN><@PART>$. If they all go after the first word, we'll get ^ganga<vblex><@FVMAIN><@PART>+i<pr>$. What we want is ^ganga<vblex><@FVMAIN>+i<pr><@PART>$.
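A minimal Python sketch of the placement problem. It contrasts appending all mapping tags to the end of the lexical unit with attaching each tag to a specific "+"-joined part, assuming the following:

* The unit contains only lemma-and-tag parts joined by "+" (no surface form).
* No lemma itself contains "+".
* `naive_map`, `map_per_part` and the other helpers are hypothetical and not part of vislcg3 or Apertium.

```python
# Hypothetical helpers: a lexical unit is "^part+part+...$",
# where every part is lemma<tag><tag>...
def split_parts(unit: str) -> list[str]:
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    return unit[1:-1].split("+")


def fmt(tags: list[str]) -> str:
    return "".join(f"<{t}>" for t in tags)


def naive_map(unit: str, tags: list[str]) -> str:
    # Appending MAPPING tags to the end of the whole unit
    return "^" + "+".join(split_parts(unit)) + fmt(tags) + "$"


def map_per_part(unit: str, tags_by_part: dict[int, list[str]]) -> str:
    # Attach each tag list to the part with the given index (0 = first word),
    # in the spirit of the proposed "TARGET VPart:0" / "TARGET VPart:1"
    parts = split_parts(unit)
    for index, tags in tags_by_part.items():
        parts[index] += fmt(tags)
    return "^" + "+".join(parts) + "$"


unit = "^ganga<vblex>+i<pr>$"
print(naive_map(unit, ["@FVMAIN", "@PART"]))
# ^ganga<vblex>+i<pr><@FVMAIN><@PART>$
print(map_per_part(unit, {0: ["@FVMAIN"], 1: ["@PART"]}))
# ^ganga<vblex><@FVMAIN>+i<pr><@PART>$
```

The single-part case ("^ganga# i<vblex>$") behaves the same under both helpers, because splitting on "+" yields only one part.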

CG Syntax change: We could say something like

MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
This is better done using Subreadings.