Talk:Matxin 1.0 New Language Pair HOWTO
Revision as of 09:40, 17 June 2009
What would be interesting is how the Spanish-Basque pair works in principle.
How is the dictionary set up? How are grammatical relations, such as the subject, generated?
For example:
- etxera joaten naiz: I am going home; Me voy a casa (bad: Etxera noa)
- etxetik etortzen da: he comes from the house; él viene de la casa (bad: Hura etxetik dator)
What kind of generator is used, and how are the grammatical forms transferred from Spanish to Basque?
A step-by-step guide would be useful: what happens after the user enters "Me voy a casa", and how does it get transferred to Basque?
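As a rough illustration of the kind of change the question is about, the toy Python sketch below turns a Spanish preposition plus noun into a single Basque word with a case suffix, using the pairs from the examples above (a casa → etxera, de la casa → etxetik). It is not Matxin's code: the dictionary, the case tags [ALA] and [ABL] and the function transfer_pp are hypothetical, and the suffixes only hold for a vowel-final inanimate noun in the definite singular. In the pipeline output for "él viene de la casa" further down, the preposition is kept as prep='de' on an empty node and the chunk carries a cas attribute; this toy collapses those steps into one function.

```python
# Hypothetical Spanish -> Basque lemma entry, taken from the examples on this page.
LEMMAS = {"casa": "etxe"}

# Hypothetical mapping: Spanish preposition -> (Basque case tag, suffix).
# The suffixes are only right for a vowel-final inanimate noun in the
# definite singular; 'de' can also correspond to the genitive, ignored here.
PREP_TO_CASE = {
    "a": ("[ALA]", "ra"),    # allative: etxera, "to the house"
    "de": ("[ABL]", "tik"),  # ablative: etxetik, "from the house"
}


def transfer_pp(prep, noun_lemma):
    """Turn Spanish 'prep + (article) + noun' into one Basque word plus its case tag.

    The Spanish article is simply dropped: in Basque, definiteness is
    expressed by the ending attached to the noun.
    """
    case, suffix = PREP_TO_CASE[prep]
    return LEMMAS[noun_lemma] + suffix, case


print(transfer_pp("a", "casa"))   # ('etxera', '[ALA]')
print(transfer_pp("de", "casa"))  # ('etxetik', '[ABL]')
```

A real transfer module also has to choose between readings of "de" and handle consonant-final and animate nouns, which take different endings.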
- Hey, I'm planning to add this. We've got to the end of the analysis stage (more or less), next up is the transfer stage. :) When I make a guide, I like to try and go as much as possible from start to finish. - Francis Tyers 23:31, 16 June 2009 (UTC)
él viene de la casa
<?xml version='1.0' encoding='UTF-8' ?>
<corpus>
  <SENTENCE ord='1' alloc='0'>
    <CHUNK ord='2' alloc='4' type='grup-verb' si='top'>
      <NODE ord='2' alloc='4' form='viene' lem='venir' mi='VMIP3S0'> </NODE>
      <CHUNK ord='1' alloc='0' type='s-adj' si='modnomatch'>
        <NODE ord='1' alloc='0' form='él' lem='él' mi='AQ0CS0'> </NODE>
      </CHUNK>
      <CHUNK ord='3' alloc='10' type='sp-de' si='sp-obj'>
        <NODE ord='3' alloc='10' form='de' lem='de' mi='SPS00'>
          <NODE ord='5' alloc='16' form='casa' lem='casa' mi='NCFS000'>
            <NODE ord='4' alloc='13' form='la' lem='el' mi='DA0FS0'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8'?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top'>
      <NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top'>
      <NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top' length='1' trans='DU' cas='[ABS]'>
      <NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' length='1' cas='[ABS]'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' length='3' cas='[ABS]'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top' length='1' trans='DU' cas='[ABS]'>
      <NODE ref='2' alloc='4' UpCase='none' lem='_etorri_' mi='VMIP3S0' pos='[ADI][SIN]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' length='1' cas='[ABS]'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' length='3' cas='[ABS]'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
      <NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ref='1' alloc='0'>
    <CHUNK ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
      <NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'> </NODE>
      <CHUNK ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8'?>
<corpus >
  <SENTENCE ord='1' ref='1' alloc='0'>
    <CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
      <NODE ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'> </NODE>
      <CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
        <NODE ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
        <NODE ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
  <SENTENCE ord='1' ref='1' alloc='0'>
    <CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
      <NODE ord='0' ref='2' alloc='4' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'> </NODE>
      <CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
        <NODE ord='0' ref='1' alloc='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
        <NODE ord='1' ref='3' alloc='10' UpCase='none' lem='' prep='de'>
          <NODE ord='0' ref='5' alloc='16' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE ord='2' ref='4' alloc='13' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>

<?xml version='1.0' encoding='UTF-8'?>
<corpus >
  <SENTENCE ord='1' ref='1' alloc='0'>
    <CHUNK ord='2' ref='2' type='adi-kat' alloc='4' si='top' cas='[ABS]' trans='DU' length='1'>
      <NODE form='dator' ref ='2' alloc ='4' ord='0' lem='etorri' pos='[NAG]' mi='[ADT][A1][NR_HU]'> </NODE>
      <CHUNK ord='0' ref='1' type='adjs' alloc='0' si='modnomatch' cas='[ABS]' length='1'>
        <NODE form='él' ref ='1' alloc ='0' ord='0' UpCase='none' lem='él' parol='AQ0CS0' unknown='transfer'> </NODE>
      </CHUNK>
      <CHUNK ord='1' ref='3' type='post-izls' alloc='10' si='sp-obj' cas='[ABS]' length='3'>
        <NODE form='' ref ='3' alloc ='10' ord='1' UpCase='none' lem='' prep='de'>
          <NODE form='etxe' ref ='5' alloc ='16' ord='0' UpCase='none' lem='etxe' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
            <NODE form='' ref ='4' alloc ='13' ord='2' UpCase='none' lem='' mi='[NUMS]'> </NODE>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>
Result: él etxe dator
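The last stage of the output above already contains everything needed to read off this result: each CHUNK seems to carry an ord giving its position in the sentence, each NODE an ord giving its position inside its own chunk, and nodes with an empty form produce no word. That reading is inferred from this dump, not taken from Matxin's documentation. The Python sketch below applies it to a trimmed copy of the last stage (attributes such as alloc, cas and mi removed) and reproduces "él etxe dator"; the helper names chunk_nodes and linearize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the final stage shown above (only the attributes used here).
FINAL_STAGE = """<?xml version='1.0' encoding='UTF-8'?>
<corpus>
  <SENTENCE ord='1' ref='1'>
    <CHUNK ord='2' ref='2' type='adi-kat' si='top'>
      <NODE form='dator' ord='0' lem='etorri'/>
      <CHUNK ord='0' ref='1' type='adjs' si='modnomatch'>
        <NODE form='él' ord='0' lem='él' unknown='transfer'/>
      </CHUNK>
      <CHUNK ord='1' ref='3' type='post-izls' si='sp-obj'>
        <NODE form='' ord='1' lem='' prep='de'>
          <NODE form='etxe' ord='0' lem='etxe'>
            <NODE form='' ord='2' lem=''/>
          </NODE>
        </NODE>
      </CHUNK>
    </CHUNK>
  </SENTENCE>
</corpus>"""


def chunk_nodes(chunk):
    """Collect the NODEs owned by a CHUNK, descending through NODEs but not into nested CHUNKs."""
    owned = []
    stack = [child for child in chunk if child.tag == "NODE"]
    while stack:
        node = stack.pop()
        owned.append(node)
        stack.extend(child for child in node if child.tag == "NODE")
    return owned


def linearize(sentence):
    """Order chunks by their ord, then nodes inside each chunk by their ord; skip empty forms."""
    words = []
    for chunk in sorted(sentence.iter("CHUNK"), key=lambda c: int(c.get("ord"))):
        nodes = sorted(chunk_nodes(chunk), key=lambda n: int(n.get("ord")))
        words.extend(n.get("form") for n in nodes if n.get("form"))
    return " ".join(words)


# Encode to bytes so the parser honours the UTF-8 declaration.
root = ET.fromstring(FINAL_STAGE.encode("utf-8"))
for sentence in root.iter("SENTENCE"):
    print(linearize(sentence))  # él etxe dator
```

Read this way, the dump shows that the word order is already Basque (subject, complement, verb) and that the untranslated "él" is simply copied through from the node marked unknown='transfer'.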
- There is an encoding error here, which is why it is not translating 'él'. - Francis Tyers 07:53, 17 June 2009 (UTC)
Exactly. Since 'él' is entered as UTF-8 and the encoding is also set to UTF-8, the question is: why? Muki987 09:40, 17 June 2009 (UTC)