Difference between revisions of "Talk:Apertium and Constraint Grammar"

From Apertium
(whatever this was it's outdated now)
 
(2 intermediate revisions by the same user not shown)
Line 116: Line 116:
   
 
==Current bugs==
 
 
== Compiling vislcg3 on Mac ==
 
 
I kept getting this error
 
<pre>
ld: symbol(s) not found
collect2: ld returned 1
</pre>
 
on my Mac. According to [http://www.justatheory.com/computers/databases/postgresql/howto_avoid_tigers_readline.html this], it's related to GNU readline not being found. I tried installing GNU readline from both MacPorts and source, which didn't help. I then tried setting <code>CPPFLAGS=-I/usr/local/include LDFLAGS=-L/usr/local/lib</code>, and even moved Apple's readline files out of the way. Finally I gave up and tried the <code>./compile-mac.sh</code> script, but this didn't give me the cg-comp and cg-proc files. On my next try, after <code>make distclean</code>, it compiled (with no CPPFLAGS/LDFLAGS set). I still don't know why, but hey, it works. {{unsigned|Unhammer}}
 
 
:Strange. I don't have a mac to test this on, but on the other macs I've tried it compiled without problems. (Apart from having to install ICU) - [[User:Francis Tyers|Francis&nbsp;Tyers]] 22:42, 21 March 2009 (UTC)
 
::After vislcg3 was tuned to Mac (in 2006-2007?) I have '''never''' had any problems compiling it on the Mac (and never saw that error message). The only tweak is, as Francis says, that ICU must be installed. If the problem is still there, please write more, or contact Tino Didriksen. [[User:Trondtr|Trondtr]] 06:22, 26 August 2009 (UTC).
 
   
 
== Wishlist ==
 
=== Ability to specify where a MAPPING tag should be added in the tag_list ===
+
=== <strike>Ability to specify where a MAPPING tag should be added in the tag_list</strike> ===
 
Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches longest left-to-right strings, we may have to disambiguate between
 
 
<code>ganga# i<vblex></code> and <code>ganga<vblex>+i<pr></code>. The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like <code>ganga# i<vblex><@FVMAIN></code>. The second one is worse. The + means that the multiword should be split into two before transfer, <code>^ganga<vblex>$ ^i<pr>$</code>. If the mapping tags all go to the end, or all go after the first word, we'll get <code>^ganga<vblex>+i<pr><@FVMAIN><@PART>$</code> or <code>^ganga<vblex><@FVMAIN><@PART>+i<pr>$</code> respectively, whereas we want <code>^ganga<vblex><@FVMAIN>+i<pr><@PART>$</code>.
 
   
CG Syntax change:
+
'''CG Syntax change''':
 
We could say something like
 
 
<pre>
 
<pre>
Line 141: Line 129:
 
</pre>
 
</pre>
   
  +
: This is better done using [[Subreadings]].
Implementation:
 
In the file src/Grammar.cpp, tags are added to the end of list tags_list as they are read (this is the order we follow in outputting the tags). Perhaps MAPPING tags could be spliced into a certain position in this list?
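The splicing idea can be illustrated with a minimal sketch. This is Python for illustration only, not the C++ in src/Grammar.cpp; the function name <code>add_mapping_tag</code>, the <code>position</code> parameter and the flat tag representation are all assumptions made for the example.

```python
def add_mapping_tag(tags_list, mapping_tag, position=None):
    """Return a copy of tags_list with mapping_tag added.

    position=None mimics the behaviour described above (append at the end);
    an integer position splices the tag in before that index instead.
    """
    result = list(tags_list)  # leave the caller's list untouched
    if position is None:
        result.append(mapping_tag)
    else:
        result.insert(position, mapping_tag)
    return result


# Hypothetical flat representation of ganga<vblex>+i<pr> (baseform kept apart).
tags = ["<vblex>", "+i", "<pr>"]
tags = add_mapping_tag(tags, "<@FVMAIN>", position=1)  # right after <vblex>
tags = add_mapping_tag(tags, "<@PART>")                # default: at the end
reading = "ganga" + "".join(tags)
print(reading)  # ganga<vblex><@FVMAIN>+i<pr><@PART>
```

Keeping "append at the end" as the default means existing grammars would behave as before; only rules that ask for a position would see a different tag order.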
 

Latest revision as of 11:00, 18 September 2014

  • Window = whole of what we're looking at; several sentences at the same time.
  • SingleWindow = one sentence (for want of a better term). Usually there are 3 SingleWindows in a Window, but that is defined at runtime: it can be anywhere from 1 to hundreds, set with --num-windows.
  • Cohort = one

Testing

Regression test status as of 22:07, 17 April 2008 (BST)


Running tests...
T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 10:46, 3 July 2008 (UTC)

T_AnyMinusSome: Fail.
T_Barrier: Success.
T_BasicAppend: Fail.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Fail.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Fail.
T_EndlessSelect: Fail.
T_MapAdd_Different: Fail.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Fail.
T_RegExp_Select: Fail.
T_RemoveSingleTag: Fail.
T_ScanningTests: Fail.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.

Regression test status as of 07:30, 17 July 2008 (UTC)

T_AnyMinusSome: Success.
T_Barrier: Success.
T_BasicAppend: Success.
T_BasicContextTest: Success.
T_BasicDelimit: Success.
T_BasicIff: Success.
T_BasicRemove: Success.
T_BasicSelect: Success.
T_BasicSubstitute: Success.
T_CarefulBarrier: Success.
T_CompositeSelect: Success.
T_DontMatchEmptySet: Success.
T_EndlessSelect: Fail.
T_Joiner: Success.
T_MapAdd_Different: Success.
T_MatchBaseform: Success.
T_MatchWordform: Success.
T_MultipleSections: Success.
T_MultiWords: Success.
T_NegatedContextTest: Success.
T_RegExp_Map: Success.
T_RegExp_Select: Success.
T_RegExp_Substitute: Success.
T_RemoveSingleTag: Fail.
T_ScanningTests: Success.
T_Sections: Fail.
T_SetOp_FailFast: Success.
T_SetOp_OR: Success.
T_SpaceInWord: Success.
T_SuperBlanks: Success.
T_SuperBlanksNewline: Success.
T_Unification: Fail.
T_UnknownWord: Success.
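
The three status listings above are easier to compare mechanically than by eye. The following is a minimal Python sketch, assuming only the "T_Name: Success." / "T_Name: Fail." line format shown above; the helper names <code>parse_results</code> and <code>compare_runs</code> are made up for this example and are not part of the vislcg3 test suite.

```python
def parse_results(text):
    """Map each test name to True (Success) or False (Fail)."""
    results = {}
    for line in text.splitlines():
        name, sep, status = line.strip().partition(":")
        if not sep or not name.startswith("T_"):
            continue  # skip "Running tests..." and blank lines
        results[name] = status.strip().rstrip(".") == "Success"
    return results


def compare_runs(old, new):
    """Return (fixed, broken, added) test names between two runs."""
    fixed = sorted(n for n in new if new[n] and old.get(n) is False)
    broken = sorted(n for n in new if not new[n] and old.get(n) is True)
    added = sorted(n for n in new if n not in old)
    return fixed, broken, added


# Small excerpts copied from the April and 3 July listings above.
april = parse_results("""Running tests...
T_AnyMinusSome: Fail.
T_BasicAppend: Success.
T_ScanningTests: Success.
""")
july = parse_results("""T_AnyMinusSome: Fail.
T_BasicAppend: Fail.
T_MultiWords: Success.
T_ScanningTests: Fail.
""")
fixed, broken, added = compare_runs(april, july)
print("fixed:", fixed)
print("broken:", broken)
print("added:", added)
```

On these excerpts nothing is reported as fixed, T_BasicAppend and T_ScanningTests are reported as newly broken, and T_MultiWords is reported as a new test.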

Current bugs

Wishlist

Ability to specify where a MAPPING tag should be added in the tag_list

Tags in vislcg3 are "unordered", but the input order is preserved, and MAPPING tags are added to the end. However, since Apertium matches longest left-to-right strings, we may have to disambiguate between ganga# i<vblex> and ganga<vblex>+i<pr>. The first one is easy: "ganga# i" is seen as the baseform and there is just one tag, vblex, so we might get something like ganga# i<vblex><@FVMAIN>. The second one is worse. The + means that the multiword should be split into two before transfer, ^ganga<vblex>$ ^i<pr>$. If the mapping tags all go to the end, or all go after the first word, we'll get ^ganga<vblex>+i<pr><@FVMAIN><@PART>$ or ^ganga<vblex><@FVMAIN><@PART>+i<pr>$ respectively, whereas we want ^ganga<vblex><@FVMAIN>+i<pr><@PART>$.

CG Syntax change: We could say something like

MAP (@FVMAIN) TARGET VPart:0 (1* FOO);
MAP (@PART) TARGET VPart:1 (-1* BAR);
This is better done using Subreadings.
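
To make the desired output concrete, here is a minimal Python sketch that attaches mapping tags to individual parts of a "+"-joined lexical unit, in the spirit of the VPart:0 / VPart:1 idea. It is not CG-3's sub-reading mechanism; the function name <code>attach_mapping_tags</code> and the <code>tags_by_part</code> argument are assumptions for the example, and it naively assumes that "+" only ever appears as the joiner.

```python
def attach_mapping_tags(lexical_unit, tags_by_part):
    """Append mapping tags to the parts of an Apertium lexical unit.

    lexical_unit: a string such as "^ganga<vblex>+i<pr>$"
    tags_by_part: dict from part index (0 = first word) to a list of
                  mapping tag names, e.g. {0: ["@FVMAIN"], 1: ["@PART"]}
    """
    body = lexical_unit
    if body.startswith("^") and body.endswith("$"):
        body = body[1:-1]  # strip the ^...$ delimiters
    parts = body.split("+")  # assumption: "+" is only used as the joiner
    tagged = []
    for index, part in enumerate(parts):
        extra = "".join("<" + tag + ">" for tag in tags_by_part.get(index, []))
        tagged.append(part + extra)
    return "^" + "+".join(tagged) + "$"


# The multiword case: one mapping tag per part.
print(attach_mapping_tags("^ganga<vblex>+i<pr>$", {0: ["@FVMAIN"], 1: ["@PART"]}))
# ^ganga<vblex><@FVMAIN>+i<pr><@PART>$

# The single-baseform case: there is only part 0.
print(attach_mapping_tags("^ganga# i<vblex>$", {0: ["@FVMAIN"]}))
# ^ganga# i<vblex><@FVMAIN>$
```

Indexing tags by part keeps each mapping tag next to the word it belongs to, so the unit can still be split at "+" before transfer.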