Autoconcord

From Apertium
Revision as of 00:36, 25 October 2010 by Jacob Nordfalk

Making the bidix concord with the monodices

The apertium-dixtools package contains a tool that automatically makes symbols (gender, number, ...) in the bidix agree with the monodices.

How does it work?

Some preparations are needed.

The tool looks in the monodices for a special autoconcord comment (the c attribute) on the paradigms:

<pre>
<pardef n="ackord__n" c="autoconcord:nt,sp">
  <e>       <p><l></l>          <r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
  <e>       <p><l>et</l>        <r><s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>en</l>        <r><s n="n"/><s n="nt"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="avbrott">         <i>avbrott</i><par n="ackord__n"/></e>
</pre>

This comment makes all entries using paradigm ackord__n have the autoconcord symbols 'nt' and 'sp'.
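As an illustration of how such comments could be read, here is a minimal Python sketch. It is not the apertium-dixtools implementation: the function name `autoconcord_symbols` and the small wrapper document around the paradigm are assumptions made for the example. It collects the symbols from each paradigm's c attribute and hands them to every entry that uses that paradigm.

```python
import xml.etree.ElementTree as ET

# A cut-down monodix; the <dictionary>/<pardefs>/<section> wrapper is only
# there to make the fragment parseable for this sketch.
MONODIX = """
<dictionary>
  <pardefs>
    <pardef n="ackord__n" c="autoconcord:nt,sp">
      <e><p><l></l><r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="avbrott"><i>avbrott</i><par n="ackord__n"/></e>
  </section>
</dictionary>
"""

PREFIX = "autoconcord:"


def autoconcord_symbols(monodix_xml):
    """Map each lemma to the autoconcord symbols of the paradigm it uses."""
    root = ET.fromstring(monodix_xml)

    # Paradigm name -> list of symbols taken from the c="autoconcord:..." comment.
    by_paradigm = {}
    for pardef in root.iter("pardef"):
        comment = pardef.get("c", "")
        if comment.startswith(PREFIX):
            by_paradigm[pardef.get("n")] = comment[len(PREFIX):].split(",")

    # Lemma -> symbols, for entries that reference an annotated paradigm.
    by_lemma = {}
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        par = entry.find("par")
        if lemma and par is not None and par.get("n") in by_paradigm:
            by_lemma[lemma] = by_paradigm[par.get("n")]
    return by_lemma


print(autoconcord_symbols(MONODIX))  # {'avbrott': ['nt', 'sp']}
```

Entries inside the paradigm itself carry no lm attribute, so the sketch skips them and only reports dictionary entries.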

The bidix contains:

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>


The right dix has the autoconcord symbols 'ut' and 'sgpl' for the lemma afbrydelse:

<pre>
<pardef n="abe__n" c="autoconcord:ut,sgpl">
  <e>       <p><l></l>          <r><s n="n"/><s n="ut"/><s n="sg"/><s n="ind"/></r></p></e>
  <e>       <p><l>n</l>         <r><s n="n"/><s n="ut"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>r</l>         <r><s n="n"/><s n="ut"/><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l>rne</l>       <r><s n="n"/><s n="ut"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="afbrydelse">      <i>afbrydelse</i><par n="abe__n"/></e>
</pre>


What does it do?

Autoconcord tries to make the autoconcord symbols of the left dix (nt,sp) concord with those of the right dix (ut,sgpl). It does so by pairing them one by one, in order: nt-ut and sp-sgpl.
Then it searches the bidix for paradigms with the special autoconcord comments "autoconcord:nt-ut" and "autoconcord:sp-sgpl":

<pre>
<pardef n="_nt_ut" c="autoconcord:nt-ut">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>

<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl">
  <e r="LR"><p><l><s n="sp"/><s n="ind"/></l><r><s n="ND"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="sg"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l><s n="sg"/><s n="def"/></l><r><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l><s n="pl"/><s n="def"/></l><r><s n="pl"/><s n="def"/></r></p></e>
</pardef>
</pre>

and then it will change the bidix entry from

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>

to include the autoconcord paradigms in the bidix:

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p><par n="_nt_ut"/><par n="_sp_sgpl"/></e>
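The pairing and lookup steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's code: the names `pair_symbols`, `paradigms_for` and `BIDIX_PARADIGMS` are hypothetical. Returning `None` when the symbol lists differ in length, or when no matching bidix paradigm exists, is an assumption about how an unresolvable entry would be left alone.

```python
def pair_symbols(left, right):
    """Pair left and right autoconcord symbols positionally: nt-ut, sp-sgpl."""
    return [f"{l}-{r}" for l, r in zip(left, right)]


def paradigms_for(left, right, bidix_paradigms):
    """Return the bidix paradigm names to append to an entry, or None.

    bidix_paradigms maps a pair key such as "nt-ut" to the name of the
    bidix paradigm whose comment is "autoconcord:nt-ut".
    """
    if len(left) != len(right):
        return None  # assumption: unequal lists cannot be paired
    names = []
    for key in pair_symbols(left, right):
        name = bidix_paradigms.get(key)
        if name is None:
            return None  # assumption: no declared paradigm, entry is left untouched
        names.append(name)
    return names


# Pair keys collected from the bidix paradigms shown above.
BIDIX_PARADIGMS = {"nt-ut": "_nt_ut", "sp-sgpl": "_sp_sgpl"}

print(paradigms_for(["nt", "sp"], ["ut", "sgpl"], BIDIX_PARADIGMS))
# ['_nt_ut', '_sp_sgpl']
```

For the avbrott/afbrydelse pair this gives the two paradigm references that appear in the rewritten bidix entry.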

Variations

Some autoconcord paradigms are not really useful. For example, sp-sp and sgpl-sgpl are trivial. You can avoid insertion of these paradigms by appending /omit to their autoconcord comments in the bidix:

<pre>
<pardef n="_sgpl_sgpl" c="autoconcord:sgpl-sgpl/omit">
  <e>       <i></i></e>
</pardef>

<pardef n="_sp_sp" c="autoconcord:sp-sp/omit">
  <e>       <i></i></e>
</pardef>
</pre>

If you want to 'inline' a paradigm, that is, have the paradigm's symbols expanded directly in the entry, add /expand to the autoconcord comment:

<pre>
<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>
</pre>

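Then the corrected bidix entry will carry the nt and ut symbols directly, instead of a reference to the _nt_ut paradigm:

 <e><p><l>avbrott<s n="n"/><s n="nt"/></l><r>afbrydelse<s n="n"/><s n="ut"/></r></p><par n="_sp_sgpl"/></e>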
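To make the two variations concrete, the following Python sketch splits a bidix autoconcord comment into its symbol pair and optional flag, and decides what to do with the paradigm. It is a sketch under assumptions, not the tool's actual logic: the function names and the action labels 'skip', 'inline' and 'reference' are hypothetical and chosen only for this example.

```python
PREFIX = "autoconcord:"


def parse_bidix_comment(comment):
    """Split 'autoconcord:nt-ut/expand' into ('nt', 'ut', 'expand').

    The flag is None when the comment has no '/omit' or '/expand' suffix.
    """
    if not comment.startswith(PREFIX):
        return None
    body, _, flag = comment[len(PREFIX):].partition("/")
    left, _, right = body.partition("-")
    return left, right, flag or None


def action_for(comment):
    """Return 'skip', 'inline' or 'reference' for one bidix paradigm comment."""
    parsed = parse_bidix_comment(comment)
    if parsed is None:
        return None  # not an autoconcord paradigm at all
    flag = parsed[2]
    if flag == "omit":
        return "skip"       # trivial pair, nothing is added to the entry
    if flag == "expand":
        return "inline"     # symbols are written directly into the entry
    return "reference"      # a <par n="..."/> reference is appended


for c in ("autoconcord:sp-sp/omit", "autoconcord:nt-ut/expand", "autoconcord:sp-sgpl"):
    print(c, "->", action_for(c))
# autoconcord:sp-sp/omit -> skip
# autoconcord:nt-ut/expand -> inline
# autoconcord:sp-sgpl -> reference
```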

Preparations

Invocation

Usage: apertium-dixtools autoconcord [-prefix symbol(s)] [-replace symbols] [-leftMon mon1.dix] [-rightMon mon2.dix] bidix.dix [output.dix]
autoconcord -prepare [-leftMon mon1.dix] [-rightMon mon2.dix] bidix.dix

Automatically makes symbols (gender, number, ...) in the bidix agree with the monodices
in the cases where the concordance can be resolved automatically beyond doubt.
 -leftMon and -rightMon specify the monodices file names. If not specified they will be guessed according to default naming schemes.
 -prefix Only concord entries starting with this list of comma-separated symbols. Default: -prefix n
 -replace Replace (remove) these symbols during processing. Default: m,f,mf,ut,nt,un
 -prepare Attempts to detect and insert autoconcord data into the monodices.

Examples

$ apertium-dixtools autoconcord apertium-sv-da.sv-da.dix


$ apertium-dixtools autoconcord -prefix n -replace ut,nt,un apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.new


$ apertium-dixtools autoconcord -prepare -prefix n -replace m,f,mf,ut,nt,NUMBER:sgpl{sg+pl},NUMBER:sp apertium-sv-da.sv-da.dix


There are also a number of generic options