Autoconcord

From Apertium
Revision as of 00:36, 25 October 2010 by Jacob Nordfalk (talk | contribs)

Making the bidix concord with the monodices

The apertium-dixtools package contains a tool that automatically makes symbols (gender, number, ...) in the bidix agree with the monodices.

How does it work?

Some preparations are needed.

The tool looks in the monodices for a special autoconcord comment (the c attribute) on the paradigms:

<pardef n="ackord__n" c="autoconcord:nt,sp">
  <e>       <p><l></l>          <r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
  <e>       <p><l>et</l>        <r><s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>en</l>        <r><s n="n"/><s n="nt"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="avbrott">         <i>avbrott</i><par n="ackord__n"/></e>

This comment makes all entries using paradigm ackord__n have the autoconcord symbols 'nt' and 'sp'.
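As a rough illustration of this lookup (not the actual apertium-dixtools code), the following Python sketch reads the c attribute of each paradigm and passes its symbols on to the entries that use it. The function names paradigm_symbols and lemma_symbols are hypothetical, and the sample is a trimmed copy of the paradigm above wrapped in the usual dictionary/pardefs/section elements.

```python
import xml.etree.ElementTree as ET

# Trimmed monodix sample; the wrapping elements are assumed for illustration.
MONODIX = """<dictionary>
  <pardefs>
    <pardef n="ackord__n" c="autoconcord:nt,sp">
      <e><p><l></l><r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="avbrott"><i>avbrott</i><par n="ackord__n"/></e>
  </section>
</dictionary>"""

PREFIX = "autoconcord:"


def paradigm_symbols(root):
    """Map paradigm name -> autoconcord symbols taken from its c attribute."""
    result = {}
    for pardef in root.iter("pardef"):
        comment = pardef.get("c", "")
        if comment.startswith(PREFIX):
            result[pardef.get("n")] = comment[len(PREFIX):].split(",")
    return result


def lemma_symbols(root):
    """Map lemma -> autoconcord symbols inherited from the paradigm it uses."""
    by_paradigm = paradigm_symbols(root)
    result = {}
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        par = entry.find("par")
        # Entries inside pardefs have no lm attribute and are skipped.
        if lemma and par is not None and par.get("n") in by_paradigm:
            result[lemma] = by_paradigm[par.get("n")]
    return result


root = ET.fromstring(MONODIX)
print(lemma_symbols(root))  # {'avbrott': ['nt', 'sp']}
```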

The bidix contains

<e><l>avbrott</l><r>afbrydelse</r></e>


The right dix has the autoconcord symbols 'ut' and 'sgpl' for the lemma:

<pardef n="abe__n" c="autoconcord:ut,sgpl">
  <e>       <p><l></l>          <r><s n="n"/><s n="ut"/><s n="sg"/><s n="ind"/></r></p></e>
  <e>       <p><l>n</l>         <r><s n="n"/><s n="ut"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>r</l>         <r><s n="n"/><s n="ut"/><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l>rne</l>       <r><s n="n"/><s n="ut"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="afbrydelse">      <i>afbrydelse</i><par n="abe__n"/></e>


What does it do?

Autoconcord tries to make the autoconcord symbols of the left dix (nt,sp) concord with those of the right dix (ut,sgpl). It does so by pairing them one by one: nt-ut and sp-sgpl. Then it searches the bidix for paradigms with the special autoconcord comments "autoconcord:nt-ut" and "autoconcord:sp-sgpl":

<pardef n="_nt_ut" c="autoconcord:nt-ut">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>

<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl">
  <e r="LR"><p><l><s n="sp"/><s n="ind"/></l><r><s n="ND"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="sg"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l><s n="sg"/><s n="def"/></l><r><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l><s n="pl"/><s n="def"/></l><r><s n="pl"/><s n="def"/></r></p></e>
</pardef>

and then it will change the bidix entry from

<e><l>avbrott</l><r>afbrydelse</r></e>

to include the autoconcord paradigms in the bidix:

<e><l>avbrott</l><r>afbrydelse</r><par n="_nt_ut"/><par n="_sp_sgpl"/></e>
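The pairing step described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name concord_paradigms is hypothetical, the bidix paradigms are given as a plain dictionary from pair key to paradigm name, and returning None when the symbol lists differ in length or a pair has no paradigm is an assumption about how unresolved cases are handled.

```python
def concord_paradigms(left_symbols, right_symbols, bidix_paradigms):
    """Pair symbols position by position and look up the bidix paradigm for each pair.

    bidix_paradigms maps a pair key such as "nt-ut" to a paradigm name.
    Returns None if the lists differ in length or a pair has no paradigm
    (assumed here to mean the entry is left unchanged).
    """
    if len(left_symbols) != len(right_symbols):
        return None
    names = []
    for left, right in zip(left_symbols, right_symbols):
        name = bidix_paradigms.get(f"{left}-{right}")
        if name is None:
            return None
        names.append(name)
    return names


# Pair keys taken from the comments "autoconcord:nt-ut" and "autoconcord:sp-sgpl".
bidix_paradigms = {"nt-ut": "_nt_ut", "sp-sgpl": "_sp_sgpl"}
refs = concord_paradigms(["nt", "sp"], ["ut", "sgpl"], bidix_paradigms)
print("".join(f'<par n="{name}"/>' for name in refs))
# <par n="_nt_ut"/><par n="_sp_sgpl"/>
```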

Variations

Some autoconcord paradigms are not really useful. For example, sp-sp and sgpl-sgpl are trivial. You can avoid inserting them by appending /omit to the autoconcord comment of these paradigms in the bidix:

<pardef n="_sgpl_sgpl" c="autoconcord:sgpl-sgpl/omit">
  <e>       <i></i></e>
</pardef>

<pardef n="_sp_sp" c="autoconcord:sp-sp/omit">
  <e>       <i></i></e>
</pardef>

If you want to 'inline' a paradigm, that is, have the paradigm's symbols expanded directly in the entry, add /expand to the autoconcord comment:

<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>

Then the corrected bidix entry will be:

<e><l>avbrott</l><r>afbrydelse</r><par n="_nt_ut"/><par n="_sp_sgpl"/></e>
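To make the comment syntax of the bidix paradigms concrete, here is a small Python sketch that splits such a comment into the left symbol, the right symbol and the optional /omit or /expand suffix. The function name parse_bidix_comment is hypothetical, and the sketch only covers the forms shown on this page. It does not model what the tool does with each option.

```python
def parse_bidix_comment(comment):
    """Split e.g. 'autoconcord:nt-ut/expand' into ('nt', 'ut', 'expand').

    The option is None when no '/...' suffix is present.
    Returns None if the text is not an autoconcord pair comment.
    """
    prefix = "autoconcord:"
    if not comment.startswith(prefix):
        return None
    body, _, option = comment[len(prefix):].partition("/")
    left, sep, right = body.partition("-")
    if not sep:
        return None
    return left, right, option or None


print(parse_bidix_comment("autoconcord:nt-ut/expand"))  # ('nt', 'ut', 'expand')
print(parse_bidix_comment("autoconcord:sp-sp/omit"))    # ('sp', 'sp', 'omit')
print(parse_bidix_comment("autoconcord:sp-sgpl"))       # ('sp', 'sgpl', None)
```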

Preparations

Invocation

Usage: apertium-dixtools autoconcord [-prefix symbol(s)] [-replace symbols]  [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix [output.dix]
autoconcord -prepare [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix

Automatically makes symbols (gender, number, ...) in the bidix agree with the monodices
in the cases where the concordance can be resolved automatically beyond doubt.
 -leftMon and -rightMon specify the monodices file names. If not specified they will be guessed according to default naming schemes
 -prefix Only concord entries starting with this list of comma-separated symbols. Default: -prefix n
 -replace Replace (remove) these symbols during processing. Default: m,f,mf,ut,nt,un
 -prepare attempts to detect and insert autoconcord data into the monodices, 

Examples

$ apertium-dixtools autoconcord apertium-sv-da.sv-da.dix


$ apertium-dixtools autoconcord -prefix n -replace ut,nt,un apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.new


$ apertium-dixtools autoconcord -prepare -prefix n -replace m,f,mf,ut,nt,NUMBER:sgpl{sg+pl},NUMBER:sp apertium-sv-da.sv-da.dix
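If the tool is called from a script, the argument list can be assembled programmatically. The following Python sketch builds the command line from the options in the usage text above. The helper name autoconcord_command is hypothetical, and the sketch only constructs the arguments without running anything.

```python
import shlex


def autoconcord_command(bidix, output=None, prefix=None, replace=None,
                        left_mon=None, right_mon=None):
    """Build the argument list for 'apertium-dixtools autoconcord'.

    Only flags listed in the usage text are emitted; options left unset are
    omitted so that the tool falls back to its own defaults.
    """
    args = ["apertium-dixtools", "autoconcord"]
    if prefix:
        args += ["-prefix", ",".join(prefix)]
    if replace:
        args += ["-replace", ",".join(replace)]
    if left_mon:
        args += ["-leftMon", left_mon]
    if right_mon:
        args += ["-rightMon", right_mon]
    args.append(bidix)
    if output:
        args.append(output)
    return args


cmd = autoconcord_command(
    "apertium-sv-da.sv-da.dix",
    output="apertium-sv-da.sv-da.dix.new",
    prefix=["n"],
    replace=["ut", "nt", "un"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where apertium-dixtools is installed.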


There are also a number of generic options