Autoconcord
Latest revision as of 08:39, 6 October 2014
Making the bidix concord with the monodices
The apertium-dixtools package contains a tool that automatically makes symbols (gender, number, ...) in the bidix agree with the monodices.
How does it work?
Some preparations are needed.
The tool looks in the monodices for a special autoconcord comment (the c attribute) in the paradigms:
<pardef n="ackord__n" c="autoconcord:nt,sp">
  <e>       <p><l></l>   <r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
  <e>       <p><l>et</l> <r><s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>en</l> <r><s n="n"/><s n="nt"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...
<e lm="avbrott">       <i>avbrott</i><par n="ackord__n"/></e>
This comment makes all entries using paradigm ackord__n have the autoconcord symbols 'nt' and 'sp'.
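The lookup described above can be illustrated with a small Python sketch. It is not the dixtools implementation (which is written in Java); the function name autoconcord_symbols and the trimmed-down monodix are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# A trimmed-down monodix, modelled on the example above (illustrative only).
MONODIX = """
<dictionary>
  <pardefs>
    <pardef n="ackord__n" c="autoconcord:nt,sp">
      <e><p><l></l><r><s n="n"/><s n="nt"/><s n="sp"/><s n="ind"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="avbrott"><i>avbrott</i><par n="ackord__n"/></e>
  </section>
</dictionary>
"""

PREFIX = "autoconcord:"


def autoconcord_symbols(monodix_xml):
    """Map each lemma to the autoconcord symbols of the paradigm it uses.

    Hypothetical helper: only the c="autoconcord:..." convention comes
    from the documentation, the rest is an assumption.
    """
    root = ET.fromstring(monodix_xml)

    # Collect the symbols declared in each paradigm's comment attribute.
    by_pardef = {}
    for pardef in root.iter("pardef"):
        comment = pardef.get("c", "")
        if comment.startswith(PREFIX):
            by_pardef[pardef.get("n")] = comment[len(PREFIX):].split(",")

    # Every entry with a lemma inherits the symbols of its paradigm.
    result = {}
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        par = entry.find("par")
        if lemma is not None and par is not None and par.get("n") in by_pardef:
            result[lemma] = by_pardef[par.get("n")]
    return result


symbols = autoconcord_symbols(MONODIX)
print(symbols)
```

Entries inside the pardef have no lm attribute, so only the dictionary entry for avbrott ends up in the result.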
The bidix contains:
<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>
The right monodix has the autoconcord symbols 'ut' and 'sgpl' for the lemma:
<pardef n="abe__n" c="autoconcord:ut,sgpl">
  <e>       <p><l></l>    <r><s n="n"/><s n="ut"/><s n="sg"/><s n="ind"/></r></p></e>
  <e>       <p><l>n</l>   <r><s n="n"/><s n="ut"/><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l>r</l>   <r><s n="n"/><s n="ut"/><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l>rne</l> <r><s n="n"/><s n="ut"/><s n="pl"/><s n="def"/></r></p></e>
</pardef>
...
<e lm="afbrydelse">    <i>afbrydelse</i><par n="abe__n"/></e>
What does it do?
Autoconcord will try to make the autoconcord symbols of the left dix (nt,sp) concord with those of the right dix (ut,sgpl). It does so by pairing them one by one: nt-ut and sp-sgpl. Then it searches the bidix for paradigms with the special autoconcord comments "autoconcord:nt-ut" and "autoconcord:sp-sgpl":
<pardef n="_nt_ut" c="autoconcord:nt-ut">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>
<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl">
  <e r="LR"><p><l><s n="sp"/><s n="ind"/></l><r><s n="ND"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="sg"/><s n="ind"/></r></p></e>
  <e r="RL"><p><l><s n="sp"/><s n="ind"/></l><r><s n="pl"/><s n="ind"/></r></p></e>
  <e>       <p><l><s n="sg"/><s n="def"/></l><r><s n="sg"/><s n="def"/></r></p></e>
  <e>       <p><l><s n="pl"/><s n="def"/></l><r><s n="pl"/><s n="def"/></r></p></e>
</pardef>
and then it will change the bidix entry from
<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>
to include the autoconcord paradigms in the bidix:
<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p><par n="_nt_ut"/><par n="_sp_sgpl"/></e>
Note: The _ prefix in the pardef names has no special meaning; it is only there to make them easy to distinguish. The pardef names used for autoconcord can be anything.
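As a rough illustration of the pairing step, the following Python sketch pairs the left and right symbols one by one and appends the matching bidix paradigms to the entry. The function concord and the lookup table PARDEFS_BY_PAIR are hypothetical names; skipping pairs without a matching paradigm is an assumption, not documented behaviour of the tool.

```python
import xml.etree.ElementTree as ET

# Pair -> bidix pardef name, as declared by c="autoconcord:<left>-<right>".
# (Hypothetical lookup table built from the two pardefs shown above.)
PARDEFS_BY_PAIR = {"nt-ut": "_nt_ut", "sp-sgpl": "_sp_sgpl"}


def concord(entry_xml, left_symbols, right_symbols, pardefs_by_pair):
    """Append one <par> per paired autoconcord symbol to a bidix entry."""
    entry = ET.fromstring(entry_xml)
    # Pair the symbols one by one: first with first, second with second.
    for left, right in zip(left_symbols, right_symbols):
        name = pardefs_by_pair.get(f"{left}-{right}")
        if name is not None:  # assumption: unknown pairs are left alone
            ET.SubElement(entry, "par", {"n": name})
    return ET.tostring(entry, encoding="unicode")


entry = '<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>'
result = concord(entry, ["nt", "sp"], ["ut", "sgpl"], PARDEFS_BY_PAIR)
print(result)
```

The printed entry carries the two par elements in the same order as the symbol pairs; ElementTree's serialisation may differ from the hand-written XML in whitespace inside empty tags.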
Variations
Some autoconcord paradigms are not really useful to insert. For example, sp-sp and sgpl-sgpl are trivial. You can avoid insertion of these paradigms by appending /omit to their autoconcord comments in the bidix:
<pardef n="_sgpl_sgpl" c="autoconcord:sgpl-sgpl/omit">
  <e>       <i></i></e>
</pardef>
<pardef n="_sp_sp" c="autoconcord:sp-sp/omit">
  <e>       <i></i></e>
</pardef>
If you want to 'inline' a paradigm, that is, have the paradigm's symbols expanded directly in the entry, add /expand to the autoconcord comment:
<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">
  <e>       <p><l><s n="nt"/></l><r><s n="ut"/></r></p></e>
</pardef>
Then the corrected bidix entry will be:
<e><p><l>avbrott<s n="n"/><s n="nt"/></l><r>afbrydelse<s n="n"/><s n="ut"/></r></p><par n="_sp_sgpl"/></e>
Note that inline/expandable paradigms must have exactly one entry.
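The three behaviours (plain insertion, /omit and /expand) can be sketched in Python as follows. The helper names parse_comment and apply_pardef, and the internal mode label "insert", are hypothetical; the sketch only mirrors what the text above describes and is not taken from the dixtools source.

```python
import xml.etree.ElementTree as ET


def parse_comment(comment):
    """Split 'autoconcord:nt-ut/expand' into ('nt-ut', 'expand')."""
    body = comment[len("autoconcord:"):]
    pair, _, mode = body.partition("/")
    return pair, mode or "insert"  # "insert" is a label used only here


def apply_pardef(entry, pardef):
    """Apply one bidix autoconcord paradigm to a bidix entry, in place."""
    _pair, mode = parse_comment(pardef.get("c"))
    if mode == "omit":
        return  # trivial pairing: nothing is added to the entry
    if mode == "expand":
        entries = pardef.findall("e")
        if len(entries) != 1:
            raise ValueError("expandable paradigms must have exactly one entry")
        # Copy the paradigm's symbols directly into the entry's <l> and <r>.
        for side in ("l", "r"):
            entry.find(f"p/{side}").extend(entries[0].findall(f"p/{side}/s"))
        return
    ET.SubElement(entry, "par", {"n": pardef.get("n")})


entry = ET.fromstring(
    '<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>')
pardefs = [ET.fromstring(x) for x in (
    '<pardef n="_nt_ut" c="autoconcord:nt-ut/expand">'
    '<e><p><l><s n="nt"/></l><r><s n="ut"/></r></p></e></pardef>',
    '<pardef n="_sp_sgpl" c="autoconcord:sp-sgpl"><e><i></i></e></pardef>',
    '<pardef n="_sp_sp" c="autoconcord:sp-sp/omit"><e><i></i></e></pardef>',
)]
for pardef in pardefs:
    apply_pardef(entry, pardef)
print(ET.tostring(entry, encoding="unicode"))
```

Raising an error for an expandable paradigm with more than one entry is a design choice of this sketch; how the real tool reacts in that case is not stated here.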
The -replace parameter
During processing of the bidix entries autoconcord will first delete all paradigms and the symbols to be replaced (usually gender symbols like m, f, nt and ut). This is to support inlining/expansions of the symbols as explained above.
The -replace parameter specifies which symbols should be deleted if they appear in an entry. Default value is 'm,f,mf,ut,nt,un'.
If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bidix, you could write autoconcord rules to fix that (i.e. adding/removing the <sint> in the bidix automatically). In that case you would, among other things, pass -replace sint as a parameter.
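A minimal Python sketch of the clean-up step described above, assuming it simply removes every par element and every listed symbol from both sides of the entry. The function name strip_entry is hypothetical; only the default symbol list comes from the documentation.

```python
import xml.etree.ElementTree as ET

# Default of the -replace parameter, as documented above.
DEFAULT_REPLACE = ("m", "f", "mf", "ut", "nt", "un")


def strip_entry(entry, replace=DEFAULT_REPLACE):
    """Remove all paradigms and all replaceable symbols from a bidix entry."""
    for par in entry.findall("par"):
        entry.remove(par)
    for side in entry.findall("p/l") + entry.findall("p/r"):
        for symbol in side.findall("s"):
            if symbol.get("n") in replace:
                side.remove(symbol)


entry = ET.fromstring(
    '<e><p><l>avbrott<s n="n"/><s n="nt"/></l>'
    '<r>afbrydelse<s n="n"/><s n="ut"/></r></p><par n="_sp_sgpl"/></e>')
strip_entry(entry)
print(ET.tostring(entry, encoding="unicode"))
```

After this step the entry is back to its bare form, so the paradigms and inlined symbols can be regenerated from the current state of the monodices.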
Invocation
Usage: apertium-dixtools autoconcord [-prefix symbol(s)] [-replace symbols] [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix [output.dix]
       autoconcord -prepare [-leftMon mon1.dix] [-rightMon mon1.dix] bidix.dix

Automatically makes symbols (gender, number, ...) in the bidix agree with the monodices
in the cases where the concordance beyound doubt can be resolved automatically.

  -leftMon and -rightMon specify the monodices file names.
           If not specified they will be guessed according to default naming schemes
  -prefix  Only concord entries starting with this list of comma-separated symbols. Default: -prefix n
  -replace Replace (remove) these symbols during processing. Default: m,f,mf,ut,nt,un
  -prepare attempts to detect and insert autoconcord data into the monodices,
There are also a number of generic options.
If you don't provide an output filename, the new bidix will be written to the original filename with a '.new' suffix appended.
Before using it, it is a good idea to format your dictionary first:
$ apertium-dixtools format apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.formatted
Check that the formatting is OK:
$ diff apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.formatted | less
Then run autoconcord:
$ mv apertium-sv-da.sv-da.dix.formatted apertium-sv-da.sv-da.dix
$ apertium-dixtools autoconcord apertium-sv-da.sv-da.dix
And check that the autoconcord corrections are OK:
$ diff apertium-sv-da.sv-da.dix apertium-sv-da.sv-da.dix.new | less
Working on other word classes than nouns
The default is to process only noun entries in the bidix (-prefix n). To process, for example, both nouns and adjectives, use -prefix n,adj.
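One possible reading of the -prefix filter is sketched below in Python: an entry is processed when the first symbol on its left side is one of the comma-separated values. This interpretation and the helper name matches_prefix are assumptions; the exact matching rule used by dixtools is not spelled out here.

```python
import xml.etree.ElementTree as ET


def matches_prefix(entry, prefix="n"):
    """Return True if the entry's first left-side symbol is in the prefix list.

    Assumption: '-prefix n,adj' means 'first symbol is n OR adj'.
    """
    wanted = set(prefix.split(","))
    first = entry.find("p/l/s")  # first <s> on the left side, if any
    return first is not None and first.get("n") in wanted


entries = [ET.fromstring(x) for x in (
    '<e><p><l>avbrott<s n="n"/></l><r>afbrydelse<s n="n"/></r></p></e>',
    '<e><p><l>stor<s n="adj"/></l><r>stor<s n="adj"/></r></p></e>',
    '<e><p><l>och<s n="cnjcoo"/></l><r>og<s n="cnjcoo"/></r></p></e>',
)]

nouns_only = [e.find("p/l").text for e in entries if matches_prefix(e)]
nouns_and_adjectives = [
    e.find("p/l").text for e in entries if matches_prefix(e, "n,adj")]
print(nouns_only, nouns_and_adjectives)
```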
Preparation of a language pair to use autoconcord
Manually putting autoconcord comments in the paradigms can take some time. If you don't want to do it manually, dixtools can do some of the work for you.
Here is an example of how to do it:
$ apertium-dixtools autoconcord -prepare -prefix n -replace m,f,mf,ut,nt,NUMBER:sgpl{sg+pl},NUMBER:sp apertium-sv-da.sv-da.dix
As the command is very seldom used, you may want to check the source code, and perhaps even modify it. It is the method prepareBidixAndMonodixes() in the file AutoconcordBidix.java.