Difference between revisions of "Basque and Spanish"

From Apertium
Revision as of 14:21, 26 June 2007

The idea

Mireia Ginestí is recycling Matxin data to build an Apertium-based system that would allow Spanish speakers to read Basque newspapers.

Some of the morphological choices in Matxin will be revised.

This document keeps track of decisions and raises open questions.

Declension (deklinabidea)?

For instance, case endings ("declension") will be treated as postpositions:

gizonentzat : gizon.n + a.det.pl + tzat.post

In principle, the absolutive will not be marked:

gizonak : gizon.n + a.det.pl

Determiners and postpositions will be given mnemonic lemmas, one per case.

gizonei : gizon.n + a.det.pl + i.post
Mirenekin : Miren.NP + kin.post
katuarentzat : katu.n + a.det.sg + tzat.post

Postpositions that can modify a noun phrase will be marked explicitly with an additional ko tag:

etxeetako : etxe.n + a.det.pl + ko.post.ko
Mikelekin : Mikel.NP + kin.post
Mikelekiko : Mikel.NP + kin.post.ko
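
The notation used in the examples above can be read mechanically. The following is a minimal sketch, not part of Apertium or Matxin: it assumes each line has the form "surface : lemma.tag + lemma.tag ...", and the names Morpheme, parse_analysis and modifies_noun_phrase are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    lemma: str
    tags: tuple


def parse_analysis(line):
    """Split 'surface : lemma.tag + lemma.tag ...' into (surface, [Morpheme, ...])."""
    surface, _, analysis = line.partition(":")
    morphemes = []
    for chunk in analysis.split("+"):
        # The first dot-separated field is the lemma, the rest are tags.
        lemma, *tags = chunk.strip().split(".")
        morphemes.append(Morpheme(lemma, tuple(tags)))
    return surface.strip(), morphemes


def modifies_noun_phrase(morphemes):
    """True when the last morpheme is a postposition carrying the extra 'ko' tag."""
    last = morphemes[-1]
    return last.tags[:1] == ("post",) and "ko" in last.tags[1:]


surface, morphemes = parse_analysis("Mikelekiko : Mikel.NP + kin.post.ko")
print(surface, [m.lemma for m in morphemes], modifies_noun_phrase(morphemes))
```

Keeping the ko mark as a tag on the postposition, rather than as a separate lemma, means kin.post and kin.post.ko share the same mnemonic lemma and differ only in whether they can modify a noun phrase.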

Possessives?

A problem appears with "possessives" like 'nire', 'gure', 'zuen', 'haien', 'bere'. Should they be treated as preadjectives ('izenlagun') or as genitive constructs?

nire: ni.pron.sg + ren.post.ko
haien : hura.pron.pl + ren.post.ko

Some tools

Check out the crossdics module to get access to the scripts:

 svn co https://apertium.svn.sourceforge.net/svnroot/apertium/crossdics

Complete information about gender

The bilingual dictionaries did not include gender information. The add-gender script, which uses the crossdics module, was used to fill it in from a morphological dictionary.

$ ./add-gender <morph-source> <bil> <out>
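
The idea behind this step can be illustrated with a small sketch. It is not the add-gender implementation: it uses plain Python dicts instead of the dictionary XML, and it assumes gender is looked up for the Spanish lemma, since Basque nouns carry no grammatical gender. The function name add_gender and the sample data are hypothetical.

```python
def add_gender(morph_gender, bilingual):
    """Return (source, target, gender) triples, with gender taken from morph_gender.

    morph_gender: dict mapping a target-language lemma to its gender (assumed shape).
    bilingual:    list of (source_lemma, target_lemma) pairs (assumed shape).
    """
    completed = []
    for source, target in bilingual:
        # None marks entries the morphological data knows nothing about,
        # so they can be reviewed by hand instead of being guessed.
        gender = morph_gender.get(target)
        completed.append((source, target, gender))
    return completed


morph_gender = {"casa": "f", "libro": "m"}
bilingual = [("etxe", "casa"), ("liburu", "libro"), ("mendi", "monte")]

for row in add_gender(morph_gender, bilingual):
    print(row)
```

Entries missing from the morphological data are kept with an empty gender rather than dropped, so the output has the same entries as the input.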

Assign paradigm

New entries in <bil> (those on the left side) can be inserted into <out> using the information available in the morphological dictionary <morph-source>.

$ ./assignparadigm <morph-source> <bil> <out>

Example:

Entry in eu_changes_morph.xml:

...
<e><p><l>goi-indize[IZE][ARR]</l><r>indize[IZE][ARR]</r></p></e>
...

The paradigm of indize is assigned to goi-indize.
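
To make the example entry concrete, here is a minimal sketch of how such an entry could be read. It is not the assignparadigm implementation: it assumes the left side holds the new word, the right side holds the word whose paradigm is reused, and the bracketed items are tags. The helper names split_side and read_change, and the check that both sides carry the same tags, are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

TAG = re.compile(r"\[([^\]]+)\]")


def split_side(text):
    """Split 'lemma[TAG1][TAG2]' into ('lemma', ['TAG1', 'TAG2'])."""
    lemma = text.split("[", 1)[0]
    return lemma, TAG.findall(text)


def read_change(entry_xml):
    """Return (new_lemma, model_lemma, tags) for one <e> entry."""
    entry = ET.fromstring(entry_xml)
    new_lemma, new_tags = split_side(entry.findtext("p/l"))
    model_lemma, model_tags = split_side(entry.findtext("p/r"))
    # Assumption: a paradigm is only reused when both sides carry the same tags.
    if new_tags != model_tags:
        raise ValueError("tags differ between the new entry and its model")
    return new_lemma, model_lemma, new_tags


entry = "<e><p><l>goi-indize[IZE][ARR]</l><r>indize[IZE][ARR]</r></p></e>"
new_lemma, model_lemma, tags = read_change(entry)
print(f"{new_lemma} takes the paradigm of {model_lemma} (tags: {tags})")
```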