Autoconcord

From Apertium

Latest revision as of 08:39, 6 October 2014

Making the bidix concord with the monodices

The apertium-dixtools package contains a tool that automatically makes symbols (gender, number, ...) in the bidix agree with the monodices.

How does it work?

Some preparation is needed.

The tool looks in the monodices for a special autoconcord comment in the paradigms:

<pardef n="ackord__n" c="autoconcord:nt,sp">
  <e>       <p><l></l>          <r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
  <e>       <p><l>et</l>        <r><s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>en</l>        <r><s n="n"/><s n="nt"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="avbrott">         <i>avbrott</i><par n="ackord__n"/></e>

This comment gives all entries that use the paradigm ackord__n the autoconcord symbols 'nt' and 'sp'.
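To make the mechanism concrete, here is a minimal Python sketch that reads the `autoconcord:` comments from a monodix and maps each lemma to the symbols of its paradigm. It is not the dixtools implementation (which is written in Java); it assumes the monodix is well-formed XML and that each lemma references one autoconcord paradigm. The function name `autoconcord_symbols` and the cut-down dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX = "autoconcord:"


def autoconcord_symbols(monodix_xml: str) -> dict[str, list[str]]:
    """Map each lemma to the autoconcord symbols of the paradigm it uses."""
    root = ET.fromstring(monodix_xml)
    # Paradigm name -> symbols, read from c="autoconcord:nt,sp".
    par_symbols = {}
    for pardef in root.iter("pardef"):
        comment = pardef.get("c", "")
        if comment.startswith(PREFIX):
            par_symbols[pardef.get("n")] = comment[len(PREFIX):].split(",")
    # Lemma -> symbols of the first autoconcord paradigm the entry references.
    result = {}
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        if lemma is None:  # entries inside pardefs have no lm attribute
            continue
        for par in entry.iter("par"):
            if par.get("n") in par_symbols:
                result[lemma] = par_symbols[par.get("n")]
                break
    return result


# A cut-down (hypothetical) monodix using the paradigm from the example above.
SV = """<dictionary>
  <pardefs>
    <pardef n="ackord__n" c="autoconcord:nt,sp">
      <e><p><l></l><r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="avbrott"><i>avbrott</i><par n="ackord__n"/></e>
  </section>
</dictionary>"""

print(autoconcord_symbols(SV))  # {'avbrott': ['nt', 'sp']}
```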

The bidix contains the entry:

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>


The right monodix has the autoconcord symbols 'ut' and 'sgpl' for the corresponding lemma:

<pardef n="abe__n" c="autoconcord:ut,sgpl">
  <e>       <p><l></l>          <r><s n="n"/><s n="ut"/><s n="sg"/><s n="ind"/></r></p></e>
  <e>       <p><l>n</l>         <r><s n="n"/><s n="ut"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>r</l>         <r><s n="n"/><s n="ut"/><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l>rne</l>       <r><s n="n"/><s n="ut"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...

<e lm="afbrydelse">      <i>afbrydelse</i><par n="abe__n"/></e>

What does it do?

Autoconcord tries to make the autoconcord symbols of the left monodix (nt, sp) concord with those of the right monodix (ut, sgpl). It does so by pairing them one by one: nt-ut and sp-sgpl. Then it searches the bidix for paradigms with the special autoconcord comments "autoconcord:nt-ut" and "autoconcord:sp-sgpl":

<pardef n="_nt_ut" c="autoconcord:nt-ut">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>

<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl">
  <e r="LR"><p><l><s n="sp"/><s n="ind"/></l><r><s n="ND"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="sg"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l><s n="sg"/><s n="def"/></l><r><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l><s n="pl"/><s n="def"/></l><r><s n="pl"/><s n="def"/></r></p></e>
</pardef>

and then it will change the bidix entry from

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>

to include the autoconcord paradigms in the bidix:

 <e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p><par n="_nt_ut"/><par n="_sp_sgpl"/></e>
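The pairing and lookup can be sketched in Python as follows. This is a simplified illustration, not the actual dixtools code: it assumes the left and right symbol lists line up position by position, simply skips pairs for which no bidix paradigm exists, and uses hypothetical helper names (`concord_pardefs`, `add_concord_pars`).

```python
import xml.etree.ElementTree as ET

PREFIX = "autoconcord:"


def concord_pardefs(bidix: ET.Element) -> dict[str, str]:
    """Map keys such as 'nt-ut' (from c="autoconcord:nt-ut") to bidix pardef names."""
    found = {}
    for pardef in bidix.iter("pardef"):
        comment = pardef.get("c", "")
        if comment.startswith(PREFIX):
            found[comment[len(PREFIX):]] = pardef.get("n")
    return found


def add_concord_pars(entry: ET.Element, left: list[str], right: list[str],
                     pardefs: dict[str, str]) -> None:
    """Pair left and right symbols one by one and append a <par> per known pair."""
    for l_sym, r_sym in zip(left, right):
        name = pardefs.get(f"{l_sym}-{r_sym}")
        if name is not None:  # pairs without a bidix pardef are skipped
            ET.SubElement(entry, "par", n=name)


BIDIX = ET.fromstring(
    '<dictionary><pardefs>'
    '<pardef n="_nt_ut" c="autoconcord:nt-ut">'
    '<e><p><l><s n="nt"/></l><r><s n="ut"/></r></p></e></pardef>'
    '<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl">'
    '<e><p><l><s n="sg"/><s n="def"/></l><r><s n="sg"/><s n="def"/></r></p></e></pardef>'
    '</pardefs></dictionary>'
)

entry = ET.fromstring(
    '<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>')
# Symbols from the left (nt, sp) and right (ut, sgpl) monodices.
add_concord_pars(entry, ["nt", "sp"], ["ut", "sgpl"], concord_pardefs(BIDIX))
print(ET.tostring(entry, encoding="unicode"))
```

ElementTree serializes empty elements as `<par n="_nt_ut" />`, with a space before the slash; this is equivalent XML to the entry shown above.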


Note: The _ prefix in the pardef names has no special meaning; it just makes them easy to distinguish. The pardef names used for autoconcord can be anything.

Variations

Some autoconcord paradigms are not really useful to insert. For example, sp-sp and sgpl-sgpl are trivial. You can avoid inserting these paradigms by appending /omit to their autoconcord comments in the bidix:

<pardef n="_sgpl_sgpl" c="autoconcord:sgpl-sgpl/omit">
  <e>       <i></i></e>
</pardef>

<pardef n="_sp_sp" c="autoconcord:sp-sp/omit">
  <e>       <i></i></e>
</pardef>
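A sketch of how the /omit suffix could be honoured when choosing which paradigms to insert, assuming flags are separated from the symbol pair by `/` as in the comments above. The helper names and the dictionary of comments are hypothetical, and pairs without a matching paradigm are silently skipped here, which may differ from what the real tool does.

```python
from typing import Optional

PREFIX = "autoconcord:"


def parse_concord_comment(comment: str) -> Optional[tuple[str, set[str]]]:
    """Split 'autoconcord:sp-sp/omit' into ('sp-sp', {'omit'}); None otherwise."""
    if not comment.startswith(PREFIX):
        return None
    key, *flags = comment[len(PREFIX):].split("/")
    return key, set(flags)


def pars_to_insert(pairs: list[str], comments: dict[str, str]) -> list[str]:
    """Return pardef names to insert for 'left-right' pairs, skipping /omit ones.

    `comments` maps each bidix pardef name to its c="..." attribute.
    """
    by_key = {}
    for name, comment in comments.items():
        parsed = parse_concord_comment(comment)
        if parsed is not None:
            key, flags = parsed
            by_key[key] = (name, flags)
    result = []
    for pair in pairs:
        if pair in by_key and "omit" not in by_key[pair][1]:
            result.append(by_key[pair][0])
    return result


COMMENTS = {
    "_nt_ut": "autoconcord:nt-ut",
    "_sp_sgpl": "autoconcord:sp-sgpl",
    "_sp_sp": "autoconcord:sp-sp/omit",
    "_sgpl_sgpl": "autoconcord:sgpl-sgpl/omit",
}
print(pars_to_insert(["nt-ut", "sp-sp"], COMMENTS))  # ['_nt_ut']
```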

If you want to 'inline' a paradigm, that is, have the paradigm's symbols expanded directly in the entry, add /expand to the autoconcord comment:

<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>

Then the corrected bidix entry will be:

 <e><p><l>avbrott<s n="n"/><s n="nt"/></l><r>afbrydelse<s n="n"/><s n="ut"/></r></p><par n="_sp_sgpl"/></e>

Note that inline/expandable paradigms must have exactly one entry.
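The inlining can be sketched as copying the single entry's `<s>` symbols onto the end of the bidix entry's `<l>` and `<r>` sides. This Python sketch assumes the paradigm has already been identified as /expand; it enforces the one-entry rule by raising an error, and the function name `expand_pardef` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def expand_pardef(entry: ET.Element, pardef: ET.Element) -> None:
    """Inline the <s> symbols of a one-entry pardef into entry's <l> and <r>."""
    entries = pardef.findall("e")
    if len(entries) != 1:
        raise ValueError("an expandable paradigm must have exactly one entry")
    for side in ("l", "r"):
        target = entry.find(f"p/{side}")
        for sym in entries[0].findall(f"p/{side}/s"):
            target.append(copy.deepcopy(sym))  # copy, so the pardef is untouched


PARDEF = ET.fromstring(
    '<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">'
    '<e><p><l><s n="nt"/></l><r><s n="ut"/></r></p></e></pardef>')
entry = ET.fromstring(
    '<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>')
expand_pardef(entry, PARDEF)
print([s.get("n") for s in entry.find("p/l")])  # ['n', 'nt']
print([s.get("n") for s in entry.find("p/r")])  # ['n', 'ut']
```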

The -replace parameter

While processing the bidix entries, autoconcord first deletes all paradigms and the symbols to be replaced (usually gender symbols such as m, f, nt and ut). This is what makes the inlining/expansion of symbols explained above possible.

The -replace parameter specifies which symbols should be deleted if they appear in an entry. Default value is 'm,f,mf,ut,nt,un'.

If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bidix, you could write autoconcord rules to fix that (that is, to add or remove the <sint> tag in the bidix automatically). In that case you would, among other things, pass -replace sint as a parameter.
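The deletion step can be sketched in Python as below, assuming the -replace value is a plain comma-separated list of symbol names, as in the default. The function `strip_entry` is hypothetical, and it assumes (as in the examples on this page) that no text follows a removed `<s>` element.

```python
import xml.etree.ElementTree as ET

DEFAULT_REPLACE = "m,f,mf,ut,nt,un"


def strip_entry(entry: ET.Element, replace: str = DEFAULT_REPLACE) -> None:
    """Remove <par> children and any -replace symbols from a bidix entry."""
    to_remove = set(replace.split(","))
    for par in entry.findall("par"):
        entry.remove(par)
    for side in entry.findall("p/*"):  # the <l> and <r> elements
        for sym in side.findall("s"):
            if sym.get("n") in to_remove:
                side.remove(sym)  # assumes no text follows the removed <s>


entry = ET.fromstring(
    '<e><p><l>avbrott<s n="n"/><s n="nt"/></l>'
    '<r>afbrydelse<s n="n"/><s n="ut"/></r></p><par n="_sp_sgpl"/></e>')
strip_entry(entry)
print(ET.tostring(entry, encoding="unicode"))
# <e><p><l>avbrott<s n="n" /></l><r>afbrydelse<s n="n" /></r></p></e>
```

For the <sint> case, the same sketch would be called as `strip_entry(entry, "sint")`.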


Invocation

Usage: apertium-dixtools autoconcord [-prefix symbol(s)] [-replace symbols]  [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix [output.dix]
autoconcord -prepare [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix

Automatically makes symbols (gender, number, ...) in the bidix agree with the monodices
in the cases where the concordance can be resolved automatically beyond doubt.
 -leftMon and -rightMon specify the monodices file names. If not specified they will be guessed according to default naming schemes
 -prefix Only concord entries starting with this list of comma-separated symbols. Default: -prefix n
 -replace Replace (remove) these symbols during processing. Default: m,f,mf,ut,nt,un
 -prepare attempts to detect and insert autoconcord data into the monodices.

There are also a number of generic options.

If you don't provide an output filename, the new bidix will be written to a file with the original name plus a '.new' suffix.
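For reference, the option defaults and the '.new' output rule can be summarised with Python's `argparse`. This only mirrors the usage text above as a sketch; it is not how apertium-dixtools parses its arguments, the function name `parse_autoconcord_args` is hypothetical, and -prepare and the generic options are left out.

```python
import argparse


def parse_autoconcord_args(argv: list[str]) -> argparse.Namespace:
    """Parse options shaped like the autoconcord usage text (a sketch only)."""
    parser = argparse.ArgumentParser(prog="autoconcord")
    parser.add_argument("-prefix", default="n")
    parser.add_argument("-replace", default="m,f,mf,ut,nt,un")
    parser.add_argument("-leftMon")   # guessed from naming schemes if omitted
    parser.add_argument("-rightMon")  # guessed from naming schemes if omitted
    parser.add_argument("bidix")
    parser.add_argument("output", nargs="?")
    args = parser.parse_args(argv)
    # Without an explicit output file, append '.new' to the bidix file name.
    if args.output is None:
        args.output = args.bidix + ".new"
    args.prefix = args.prefix.split(",")
    args.replace = args.replace.split(",")
    return args


args = parse_autoconcord_args(["apertium-sv-da.sv-da.dix"])
print(args.output)  # apertium-sv-da.sv-da.dix.new
```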

When you use it, it's a good idea to format your dictionary first:

$ apertium-dixtools format apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.formatted

Check that the formatting is OK:

$ diff apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.formatted | less

Then do autoconcord:

$ mv apertium-sv-da.sv-da.dix.formatted apertium-sv-da.sv-da.dix
$ apertium-dixtools autoconcord apertium-sv-da.sv-da.dix

And check that the autoconcord corrections are OK:

$ diff apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.new | less

Working on word classes other than nouns

By default, only noun entries in the bidix are processed (-prefix n). To process, for example, both nouns and adjectives, use -prefix n,adj.
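One plausible reading of "entries starting with" these symbols is that the first symbol on the left side of the entry must be in the -prefix list. The Python sketch below assumes that interpretation; `matches_prefix` and the adjective entry are hypothetical, and the real tool may check the entry differently.

```python
import xml.etree.ElementTree as ET


def matches_prefix(entry: ET.Element, prefix: str = "n") -> bool:
    """True if the first symbol on the entry's left side is in the -prefix list."""
    first = entry.find("p/l/s")
    return first is not None and first.get("n") in prefix.split(",")


noun = ET.fromstring('<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>')
adj = ET.fromstring('<e><p><l>stor<s n="adj"/></l><r>stor<s n="adj"/></r></p></e>')
print(matches_prefix(noun), matches_prefix(adj))  # True False
print(matches_prefix(adj, "n,adj"))               # True
```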

Preparation of a language pair to use autoconcord

Manually putting autoconcord comments into paradigms can take some time. If you don't want to do it by hand, dixtools can do some of the work for you.

Here is an example for the sv-da pair:

$ apertium-dixtools autoconcord -prepare -prefix n -replace m,f,mf,ut,nt,NUMBER:sgpl{sg+pl},NUMBER:sp apertium-sv-da.sv-da.dix

As the command is very seldom used, you may want to check the source code, and perhaps even modify it. It is implemented in the method prepareBidixAndMonodixes() in the file AutoconcordBidix.java.