Difference between revisions of "Talk:Matxin 1.0 New Language Pair HOWTO"

Revision as of 09:40, 17 June 2009

It would be interesting to see how the Spanish-Basque pair works in principle.

How is the dictionary set up? How are relations of subjects generated?

For example:

  • etxera joaten naiz: I am going home; Me voy a casa (bad: Etxera noa)
  • etxetik etortzen da: he comes from the house; él viene de la casa (bad: Hura etxetik dator)

What kind of generator is used? How are the grammatical forms transferred from Spanish to Basque?

A step-by-step guide would help: what happens after the user enters "Me voy a casa", and how does it get transferred to Basque?

Hey, I'm planning to add this. We've got to the end of the analysis stage (more or less), next up is the transfer stage. :) When I make a guide, I like to try and go as much as possible from start to finish. - Francis Tyers 23:31, 16 June 2009 (UTC)


él viene de la casa

<?xml version='1.0' encoding='UTF-8' ?>
<corpus>
<SENTENCE ord='1' alloc='0'>
<CHUNK ord='2' alloc='4' type='grup-verb' si='top'>
  <NODE ord='2' alloc='4' form='viene' lem='venir' mi='VMIP3S0'>
  </NODE>
  <CHUNK ord='1' alloc='0' type='s-adj' si='modnomatch'>
    <NODE ord='1' alloc='0' form='él' lem='él' mi='AQ0CS0'>
    </NODE>
  </CHUNK>
  <CHUNK ord='3' alloc='10' type='sp-de' si='sp-obj'>
    <NODE ord='3' alloc='10' form='de' lem='de' mi='SPS00'>
      <NODE ord='5' alloc='16' form='casa' lem='casa' mi='NCFS000'>
        <NODE ord='4' alloc='13' form='la' lem='el' mi='DA0FS0'>
        </NODE>
      </NODE>
    </NODE>
  </CHUNK>
</CHUNK>
</SENTENCE>
</corpus>


<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top'>
<NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>
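
A quick way to see which words did not get through the transfer step is to list the nodes carrying an `unknown` attribute. This is only a sketch: it assumes that `unknown='transfer'` marks a lemma that was not found during lexical transfer, which the dump suggests but this page does not confirm. The function name `untranslated_nodes` is made up for the illustration, and the XML declaration is left out of the embedded string.

```python
import xml.etree.ElementTree as ET

# Transfer-stage output copied from the dump above (XML declaration omitted).
TRANSFER_XML = """<corpus>
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top'>
<NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>
</corpus>"""


def untranslated_nodes(root):
    """Return (ref, lem, unknown) for every NODE that has an 'unknown' attribute."""
    return [
        (node.get("ref"), node.get("lem"), node.get("unknown"))
        for node in root.iter("NODE")
        if node.get("unknown") is not None
    ]


flagged = untranslated_nodes(ET.fromstring(TRANSFER_XML))
print(flagged)
```

On this dump the only flagged node is the one for 'él', which is also the only word left untranslated in the final result.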

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top'>
<NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top' length='1' trans='DU' cas='[ABS]'>
<NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' length='1' cas='[ABS]'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' length='3' cas='[ABS]'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top' length='1' trans='DU' cas='[ABS]'>
<NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' length='1' cas='[ABS]'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' length='3' cas='[ABS]'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ord='0' ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE ord='0' ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE ord='1' ref='3' alloc='10' UpCase='none' lem='' prep='de'>
<NODE ord='0' ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ord='2' ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE form='dator' ref ='2' alloc ='4' ord='0' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE form='él' ref ='1' alloc ='0' ord='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE form='' ref ='3' alloc ='10' ord='1' UpCase='none' lem='' prep='de'>
<NODE form='etxe' ref ='5' alloc ='16' ord='0' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE form='' ref ='4' alloc ='13' ord='2' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

Result: él etxe dator
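
The result line can be reproduced from the last dump by reading the `form` attributes in `ord` order. The sketch below assumes that `ord` on a CHUNK gives its position in the sentence and `ord` on a NODE gives its position inside its chunk. That reading is consistent with the output above, but it is an assumption, not a description of the actual Matxin generation code. The helper names `chunk_words` and `linearize` are made up for the example, and the XML declaration is left out of the embedded string.

```python
import xml.etree.ElementTree as ET

# Final-stage output copied from the dump above (XML declaration omitted).
FINAL_XML = """<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE form='dator' ref ='2' alloc ='4' ord='0' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
<NODE form='él' ref ='1' alloc ='0' ord='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
<NODE form='' ref ='3' alloc ='10' ord='1' UpCase='none' lem='' prep='de'>
<NODE form='etxe' ref ='5' alloc ='16' ord='0' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE form='' ref ='4' alloc ='13' ord='2' UpCase='none' lem='' mi='[NUMS]'>
</NODE>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>
</corpus>"""


def chunk_words(chunk):
    """Non-empty forms of the chunk's own nodes (not nested chunks), sorted by ord."""
    nodes = []

    def collect(node):
        nodes.append(node)
        for child in node.findall("NODE"):
            collect(child)

    for node in chunk.findall("NODE"):
        collect(node)
    nodes.sort(key=lambda n: int(n.get("ord")))
    return [n.get("form") for n in nodes if n.get("form")]


def linearize(sentence):
    """Join the words of all chunks in the sentence, chunks sorted by ord."""
    chunks = sorted(sentence.iter("CHUNK"), key=lambda c: int(c.get("ord")))
    words = []
    for chunk in chunks:
        words.extend(chunk_words(chunk))
    return " ".join(words)


result = linearize(ET.fromstring(FINAL_XML).find("SENTENCE"))
print(result)
```

Under that assumption the chunk order is 'él' (ord 0), the 'de la casa' chunk (ord 1), then the verb (ord 2). The two empty forms in the middle chunk contribute nothing, so only 'etxe' shows up there.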

There is an encoding error here, which is why it is not translating 'él'. - Francis Tyers 07:53, 17 June 2009 (UTC)

Exactly. Since él is entered as UTF-8 (él) and the encoding is also set to UTF-8, the question is: why? Muki987 09:40, 17 June 2009 (UTC)
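
One common way for this kind of symptom to appear, even when the input really is UTF-8, is a step in the pipeline that reads the bytes with a single-byte encoding such as Latin-1. The sketch below only illustrates that general mechanism. Whether it is what happens here is not established on this page, and the one-entry `lexicon` is a made-up stand-in for the bilingual dictionary (the Basque form 'hura' is taken from the example sentence further up).

```python
word = "él"

# The UTF-8 encoding of 'é' is two bytes, so the word is three bytes long.
utf8_bytes = word.encode("utf-8")

# A component that wrongly assumes Latin-1 turns those bytes into three characters.
misread = utf8_bytes.decode("latin-1")

# Hypothetical one-entry lexicon, keyed by the correctly decoded lemma.
lexicon = {"él": "hura"}

print(utf8_bytes)            # b'\xc3\xa9l'
print(misread)               # Ã©l
print(lexicon.get(word))     # hura
print(lexicon.get(misread))  # None
```

If any stage decodes its input this way, the lookup key no longer matches the dictionary entry and the word is passed through untranslated, even though both the input file and the XML declaration say UTF-8.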