Difference between revisions of "Talk:Apertium and Constraint Grammar"
Latest revision as of 11:00, 18 September 2014
- Window = whole of what we're looking at; several sentences at the same time.
- SingleWindow = one sentence (for want of a better term). Usually there are 3 SingleWindows in a Window, but that is defined at runtime: it can be anywhere from 1 to hundreds, set with --num-windows.
- Cohort = one
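The containment described in this list can be sketched roughly as follows. This is only an illustration and not vislcg3's actual classes. The field names, the `Reading` type, the reading of "cohort" as one word form with all its readings, and the drop-the-oldest behaviour are all assumptions of the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    # One analysis of a word form: a baseform plus its tags (hypothetical type).
    baseform: str
    tags: List[str] = field(default_factory=list)


@dataclass
class Cohort:
    # Assumed here: one word form together with all of its readings.
    wordform: str
    readings: List[Reading] = field(default_factory=list)


@dataclass
class SingleWindow:
    # One sentence: an ordered run of cohorts.
    cohorts: List[Cohort] = field(default_factory=list)


class Window:
    # Everything in view at once: at most num_windows SingleWindows.
    def __init__(self, num_windows: int = 3):
        self.single_windows = deque(maxlen=num_windows)

    def push(self, sentence: SingleWindow) -> None:
        # deque(maxlen=...) drops the oldest sentence when full
        # (this eviction policy is an assumption of the sketch).
        self.single_windows.append(sentence)


def build_example() -> Window:
    # Illustrative cohorts in the style of Apertium analyses.
    y = Cohort("y", [Reading("yr", ["det", "def", "sp"])])
    de = Cohort("de", [Reading("te", ["n", "m", "sp"]),
                       Reading("de", ["n", "m", "sp"])])
    window = Window(num_windows=3)
    for _ in range(4):  # push one sentence more than fits
        window.push(SingleWindow([y, de]))
    return window
```

Calling `build_example()` leaves three SingleWindows in the Window, each holding two cohorts, the second of which is still ambiguous between two readings.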
Testing
Regression test status as of 22:07, 17 April 2008 (BST)
Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.
Regression test status as of 10:46, 3 July 2008 (UTC)
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.
Regression test status as of 07:30, 17 July 2008 (UTC)
T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.
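To see what changed between two of these runs without reading them side by side, something like the following might help. It is a minimal sketch that assumes the status output keeps the `T_Name: Success.` / `T_Name: Fail.` shape shown here. The function names are made up for illustration, and the sample strings are short excerpts of the 3 July and 17 July runs.

```python
import re


def parse_status(text):
    """Map each test name to True (Success) or False (Fail)."""
    results = {}
    for name, outcome in re.findall(r"(T_\w+): (Success|Fail)\.", text):
        results[name] = outcome == "Success"
    return results


def diff_runs(old, new):
    """Return (fixed, broken, added) test names between two parsed runs."""
    fixed = sorted(n for n in new if new[n] and old.get(n) is False)
    broken = sorted(n for n in new if not new[n] and old.get(n) is True)
    added = sorted(n for n in new if n not in old)
    return fixed, broken, added


# Excerpts only, not the full runs.
july_03 = ("T_AnyMinusSome: Fail. T_Barrier: Success. "
           "T_BasicAppend: Fail. T_EndlessSelect: Fail.")
july_17 = ("T_AnyMinusSome: Success. T_Barrier: Success. "
           "T_BasicAppend: Success. T_EndlessSelect: Fail. T_Joiner: Success.")

fixed, broken, added = diff_runs(parse_status(july_03), parse_status(july_17))
print("fixed:", fixed)
print("broken:", broken)
print("new tests:", added)
```

The same two functions work unchanged on the full status text, whether it is one test per line or all on one line.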
Current bugs
Wishlist
Ability to specify where a MAPPING tag should be added in the tag_list
Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches longest left-to-right strings, we may have to disambiguate between <code>ganga# i<vblex></code> and <code>ganga<vblex>+i<pr></code>. The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like <code>ganga# i<vblex><@FVMAIN></code>. The second one is worse. The + means that the multiword should be split into two before transfer, <code>^ganga<vblex>$ ^i<pr>$</code>. If the mapping tags go to the end, or even after the first word, we'll get <code>^ganga<vblex>+i<pr><@FVMAIN><@PART>$</code> or <code>^ganga<vblex><@FVMAIN><@PART>+i<pr>$</code> respectively, but we want <code>^ganga<vblex><@FVMAIN>+i<pr><@PART>$</code>.
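The difference between the unwanted and the wanted placement can be made concrete with a small string-level sketch. This is not how vislcg3 or cg-proc handle tags internally. The function names are hypothetical, and splitting on a bare `+` is a simplification that ignores any escaping.

```python
def append_at_end(lexical_unit, mapping_tags):
    """Current behaviour as described: all mapping tags go after the last tag."""
    suffix = "".join(f"<{tag}>" for tag in mapping_tags)
    return f"^{lexical_unit}{suffix}$"


def place_per_part(lexical_unit, tags_per_part):
    """Wanted behaviour: attach each mapping tag to its own '+'-joined part.

    tags_per_part maps a part index (0 = first word) to a list of tags.
    """
    parts = lexical_unit.split("+")  # naive split, an assumption of the sketch
    tagged = []
    for index, part in enumerate(parts):
        suffix = "".join(f"<{tag}>" for tag in tags_per_part.get(index, []))
        tagged.append(part + suffix)
    return "^" + "+".join(tagged) + "$"


unit = "ganga<vblex>+i<pr>"
print(append_at_end(unit, ["@FVMAIN", "@PART"]))
print(place_per_part(unit, {0: ["@FVMAIN"], 1: ["@PART"]}))
```

With per-part placement, splitting the result on `+` before transfer leaves each word carrying its own mapping tag.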
CG Syntax change: We could say something like
MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
- This is better done using Subreadings.