Talk:Subreadings in Constraint Grammar

More discussion


<TinoDidriksen> +parts are hidden, currently...
<TinoDidriksen> Or rather, stuff before the + is hidden.
<|krvoje|> o_0
<TinoDidriksen> Also, the + is not part of the tag.
<TinoDidriksen> +htjeeti becomes a baseform "htjeeti"
<spectie> our thing is a horrible hack :((((
<TinoDidriksen> + is a horrible hack...
<spectie> it's great for apertium, but just doesn't fit into CG yet
<spectie> what cg-proc should do
<spectie> is treat the parts separated by '+' as separate cohorts
<TinoDidriksen> That can relatively easily be done, but sure about that? http://wiki.apertium.org/wiki/Subreadings_in_Constraint_Grammar did not mention that option.
You can't treat clitics as separate cohorts; how would that even work? Say you have ^foo bar/foo<tags>+bar<tags>/foobar<tags>$: should it be treated as "foo<tags>/foobar<tags> followed by a single reading bar<tags>", or as "foo<tags> followed by bar<tags>/foobar<tags>"? That sounds like a type of complexity we don't want. unhammer 07:53, 12 October 2011 (UTC)
<spectie> the problem is that sometimes we don't want that :))
<spectie> but i think that's a problem with apertium
<TinoDidriksen> Ah
<spectie> we should distinguish + and #
<spectie> + should be for separate cohorts
<spectie> # for separate parts of the same cohort
<spectie> or something
<spectie> e.g. compound words get #
<spectie> and attached clitics get +
<|krvoje|> bbiab, lunch
<spectie> ok
<TinoDidriksen> So, + should be regarded as a soft cohort split thingy...
<spectie> yes
<spectie> but we should also check with unhammer

I really think it makes sense to be able to distinguish these two things; we can't use '+' for both:

  1. Attached clitics / joined words (each word needs to be referred to separately)
  2. Compound words (we're only interested in the head)

I suggest coming up with some other marker for compounds (perhaps ~, although I think we've discussed this before?).

- Francis Tyers 13:52, 11 July 2011 (UTC)
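To make the two cases concrete, here is a rough Python sketch. It is an illustration only, not how cg-proc behaves. The `~` compound marker is the hypothetical one suggested above, the function names and the `foo<n>~bar<n><sg>` reading are made up, and taking the last part of a compound as its head is an assumption that will not hold for every language.

```python
def split_outside_tags(reading, sep):
    """Split `reading` on `sep`, ignoring any `sep` inside <...> tags."""
    parts, current, depth = [], [], 0
    for ch in reading:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            # Separator outside a tag: close the current part.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def clitic_parts(reading):
    # Case 1: '+' joins words that each need to be referred to separately.
    return split_outside_tags(reading, "+")


def compound_head(reading):
    # Case 2: hypothetical '~' joins compound parts; only the head matters.
    # ASSUMPTION: the head is the last part (right-headed compounds).
    return split_outside_tags(reading, "~")[-1]


print(clitic_parts("ne<adv>+kaout<vblex><pri><p1><pl>"))
print(compound_head("foo<n>~bar<n><sg>"))
```

With separate markers, a rule could address every part of a `+` reading while seeing only the head of a `~` reading, which is the behaviour the list above asks for.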

Some file

SECTION

SUBSTITUTE ("од") ("од:5") ("од") (-1 (adj));


^помладо/adj<pref><comp>+млад<adj><nt><sg><nom><ind>$ ^од/од<pr>$ ^30/30<num>$^./.<sent>$
MAP (@+FMAINV) TARGET VerbFin ;

^n'eus/ne<adv>+bezañ<vblex><pri><impers><sp>/ne<adv>+kaout<vblex><pri><p1><pl>$ ^kador/kador<n><f><sg>$ ^ebet/ebet<adv>$^./.<sent>$
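
For anyone who wants to poke at stream lines like the ones above, here is a minimal Python sketch that breaks a line into cohorts, readings and '+'-separated subreadings. It is not cg-proc's parser: `parse_cohorts` is a made-up helper, and it assumes there are no escaped characters and that '/', '+', '^' and '$' never occur inside lemmas or tags.

```python
import re

COHORT_RE = re.compile(r"\^(.*?)\$")               # text between ^ and $
READING_RE = re.compile(r"^([^<]*)((?:<[^>]*>)*)$")  # lemma, then <tag>s


def parse_cohorts(stream):
    """Return a list of (surface_form, readings); each reading is a list
    of (lemma, [tags]) subreadings in the order they appear."""
    cohorts = []
    for body in COHORT_RE.findall(stream):
        surface, *readings = body.split("/")
        parsed = []
        for reading in readings:
            subreadings = []
            for part in reading.split("+"):
                m = READING_RE.match(part)
                if m is None:
                    # Unexpected shape: keep the raw text, no tags.
                    subreadings.append((part, []))
                    continue
                tags = re.findall(r"<([^>]*)>", m.group(2))
                subreadings.append((m.group(1), tags))
            parsed.append(subreadings)
        cohorts.append((surface, parsed))
    return cohorts


line = ("^n'eus/ne<adv>+bezañ<vblex><pri><impers><sp>"
        "/ne<adv>+kaout<vblex><pri><p1><pl>$ ^kador/kador<n><f><sg>$")
for surface, readings in parse_cohorts(line):
    print(surface, readings)
```

Here `n'eus` comes out as one cohort with two readings of two subreadings each, which is the structure a rule would need to see in order to refer to `ne` and the verb separately.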