Siciliano y castellano/Informe final (Sicilian and Spanish: final report)

From Apertium

Revision as of 16:16, 22 August 2016

Commits

The list of all commits: https://apertium.projectjj.com/gsoc2016/uliana-sentsova.html

Monolingual Sicilian dictionary:

Bilingual Sicilian-Spanish dictionary: https://svn.code.sf.net/p/apertium/svn/incubator/apertium-scn-spa/


Project description

Dictionaries

The most challenging issues ... creating the dictionary:

1. Abundance of spelling forms

2. Accent system

On the example of verb paradigms: a regular verb can have … The accent in Sicilian verbs moves according to the pronunciation and sil…

parrari --> pàrranu

The stem must have up to …

3. Pronouns
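The accent shift in "parrari --> pàrranu" affects how a lemma is split in a dictionary. In an Apertium .dix file, the invariant part of a lemma goes in the entry and the varying endings go in a paradigm. The following minimal sketch is a hypothetical helper, not part of apertium-scn. It computes that invariant part as the longest common prefix of a lemma's forms. The form "parru" ('I speak') is added here for illustration.

```python
import os.path
import unicodedata


def split_paradigm(forms: list[str]) -> tuple[str, list[str]]:
    """Split the surface forms of one lemma into a shared stem and suffixes.

    The stem is the longest common prefix of all forms, compared after NFC
    normalization so that an accented vowel such as 'à' is one character.
    Everything after the stem has to be listed in the paradigm."""
    normalized = [unicodedata.normalize("NFC", f) for f in forms]
    stem = os.path.commonprefix(normalized)
    return stem, [f[len(stem):] for f in normalized]


# Without an accented form the shared stem is "parr"; the written accent
# on "pàrranu" shortens it to "p", so the paradigm must cover almost the whole word.
print(split_paradigm(["parrari", "parru"]))    # ('parr', ['ari', 'u'])
print(split_paradigm(["parrari", "pàrranu"]))  # ('p', ['arrari', 'àrranu'])
```

This shows why accent-shifting verbs need paradigms that start much earlier in the word than their unaccented forms alone would suggest.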


Constraint grammar

Constraint Grammar rules allow us to distinguish words with different grammatical tags and words with different lexical meanings based on the grammatical and lexical context. CG rules work both for disambiguation within one part of speech and between words of different categories.

The following cases of grammatical ambiguity were handled with CG rules in the Sicilian package.

  • Disambiguation within one part of speech. Forms within one verb paradigm coincide fairly often in Sicilian. For instance, all Sicilian verbs have coinciding first, second and third person forms in the Present Subjunctive. Regular verbs of the 2nd conjugation have the same Present Indicative form for the first and the second person, and for verbs of the first conjugation the third person singular of the Present Indicative usually coincides with the second person plural of the Imperative.
  • Disambiguation between words of different categories. Since "-a", "-i" and "-u" are standard endings for Sicilian nouns, adjectives and verb forms, Sicilian has many more ambiguous word forms than one might expect. Many Sicilian masculine nouns coincide with Present Indicative forms of regular verbs (for example, "munni" is both the plural of "munnu" and a present form of "munnari"), and feminine nouns can coincide with Imperative forms. Conversion as a word-formation process is another frequent source of ambiguous word forms in Sicilian.
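Context-based disambiguation of a form like "munni" can be illustrated with a toy Python imitation of a single CG rule of the shape `REMOVE (vblex) IF (-1C (num))`. This is a sketch only and does not replace the CG-3 engine that Apertium uses. The tags are simplified and assumed (person and number of the verb reading are omitted). The context "dui munni" ('two worlds') is a hypothetical test input, not one of the 61 rules in the package.

```python
# A reading is (lemma, set of tags); a cohort is (surface form, list of readings).
Reading = tuple[str, frozenset[str]]
Cohort = tuple[str, list[Reading]]


def remove_if_prev(cohorts: list[Cohort], target_tag: str, context_tag: str) -> list[Cohort]:
    """Toy version of a CG rule REMOVE (target_tag) IF (-1C (context_tag)).

    Readings carrying target_tag are dropped when every reading of the
    preceding cohort carries context_tag ('C' = careful context).
    As in CG, the last remaining reading of a cohort is never removed."""
    result: list[Cohort] = []
    for i, (form, readings) in enumerate(cohorts):
        context_ok = i > 0 and all(context_tag in tags for _, tags in cohorts[i - 1][1])
        kept = [r for r in readings if target_tag not in r[1]]
        if context_ok and kept:
            readings = kept
        result.append((form, readings))
    return result


sentence: list[Cohort] = [
    ("dui", [("dui", frozenset({"num"}))]),
    ("munni", [
        ("munnu", frozenset({"n", "m", "pl"})),    # plural noun 'worlds'
        ("munnari", frozenset({"vblex", "pri"})),  # present of the verb
    ]),
]

for form, readings in remove_if_prev(sentence, "vblex", "num"):
    print(form, [lemma for lemma, _ in readings])
```

After the numeral, only the noun reading of "munni" survives. Without the context, both readings are kept.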

Here is the list of ambiguous Sicilian and Spanish sentences that can be used to test the set of CG rules.

A good example of lexical ambiguity is the Sicilian noun "cristianu", which not only denotes a person of Christian faith but can also refer to a human being in general.

Total number of CG rules: 61.


Transfer rules

Sicilian and Spanish differ in structure, and these differences are handled by the use of transfer rules. Transfer rules improve the translation where syntactic differences between the languages prevent a direct translation.

  • Unlike Spanish, Sicilian no longer uses the synthetic future. It is replaced by periphrastic compound forms with common verbs such as "jiri", "vèniri" or "aviri".
  • In Sicilian, synthetic conditional forms are normally replaced by indicative or subjunctive forms.
  • Both Sicilian and Spanish have verb constructions with passive and modal meaning. Transfer rules are used to translate them correctly where the structure of these constructions differs between the two languages.
  • Transfer rules allow translating a non-reflexive verb in the … which is often the case.
  • Sicilian and Spanish have similar word order, but there are subtle differences, for example with articles and pronouns.
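The first case (synthetic future to periphrasis) is sketched below in Python, not in Apertium's own transfer-rule format. The sketch takes a Spanish synthetic future whose lemma is already Sicilian after bilingual lookup. It rewrites that unit as "aviri" + "a" + infinitive in the Apertium stream format. Several details are assumptions for illustration: the choice of "aviri a", the tag names (vblex, fti, vbhaver, pri, pr, inf) and the function name. The real apertium-scn-spa rules may choose among "jiri", "vèniri" and "aviri" differently.

```python
import re

# One lexical unit in the stream: ^lemma<tag1><tag2>...$
LU = re.compile(r"\^([^<$]+)((?:<[^>]+>)*)\$")


def future_to_periphrasis(lu: str, aux: str = "aviri") -> str:
    """Rewrite a synthetic future (vblex + fti) as aux(present) + 'a' + infinitive.

    Person/number tags move to the auxiliary; any other unit is returned unchanged."""
    m = LU.fullmatch(lu)
    if not m:
        return lu
    lemma, tagstr = m.groups()
    tags = re.findall(r"<([^>]+)>", tagstr)
    if len(tags) < 2 or tags[0] != "vblex" or tags[1] != "fti":
        return lu
    person_number = "".join(f"<{t}>" for t in tags[2:])
    return f"^{aux}<vbhaver><pri>{person_number}$ ^a<pr>$ ^{lemma}<vblex><inf>$"


print(future_to_periphrasis("^parrari<vblex><fti><p1><sg>$"))
# ^aviri<vbhaver><pri><p1><sg>$ ^a<pr>$ ^parrari<vblex><inf>$
```

Units in other tenses, such as `^parrari<vblex><pri><p1><sg>$`, pass through untouched.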

Total number of transfer rules: 40.


Corpora

To evaluate the quality of translation, different types of corpora

Six articles from the Sicilian Wikipedia were translated into Spanish.



Statistics

Coverage            Sicilian-Spanish (%)   Spanish-Sicilian (%)
Trimmed coverage    83.4                   (not given)

Coverage            Sicilian (%)           Spanish (%)
Raw coverage        85.5                   91.6

The number of lemmas in bilingual dictionary: 11,253.

The number of lemmas in Sicilian dictionary: 13,140.
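Apertium's analyser marks unknown words with an asterisk (`^word/*word$`). Naive coverage can therefore be estimated as the share of lexical units in the analyser output that are not marked unknown. This is a minimal sketch that assumes the input is analyser output in Apertium stream format and ignores escaped characters and superblanks. The sample analyses are made up, and the sketch is not necessarily how the figures above were computed.

```python
import re

# One lexical unit of analyser output: ^surface/analysis1/analysis2...$
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Percentage of lexical units that received at least one analysis.

    Unknown words appear as ^word/*word$, so any unit whose analyses
    start with '*' is counted as not covered."""
    total = known = 0
    for _surface, analyses in LU.findall(stream):
        total += 1
        if not analyses.startswith("*"):
            known += 1
    return 100.0 * known / total if total else 0.0


# Toy input: three analysed words and one unknown word.
sample = "^lu/lu<det><def><m><sg>$ ^munnu/munnu<n><m><sg>$ ^xyz/*xyz$ ^è/èssiri<vbser><pri><p3><sg>$"
print(f"{naive_coverage(sample):.1f}%")  # 75.0%
```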


Challenging issues

Examples of spelling variation: jardinu / iardinu / giardinu (‘garden’), palora / parola / paràula / palàura (‘word’).

Variant spellings of a single verb: cunjùnciri, cognùngiri, conjùngiri, cugnùnciri, cognùncici, coniùngiri, conjùnciri
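A very simple, hypothetical starting point for handling such variants is a table that maps every attested spelling to one canonical lemma. The groups below are the examples listed in this report. Treating the first spelling in each group as canonical is an assumption, and the function name is illustrative. This is not the method used in apertium-scn, nor the automatic merging algorithm mentioned under future work.

```python
import unicodedata

# Spelling variants from the report; the first spelling of each group is
# taken as the canonical lemma (an assumption for this illustration).
VARIANT_GROUPS = {
    "jardinu": ["jardinu", "iardinu", "giardinu"],
    "palora": ["palora", "parola", "paràula", "palàura"],
    "cunjùnciri": ["cunjùnciri", "cognùngiri", "conjùngiri", "cugnùnciri",
                   "cognùncici", "coniùngiri", "conjùnciri"],
}


def build_variant_index(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map every spelling (NFC-normalized) to its canonical lemma."""
    index: dict[str, str] = {}
    for canonical, spellings in groups.items():
        for spelling in spellings:
            key = unicodedata.normalize("NFC", spelling)
            if index.get(key, canonical) != canonical:
                raise ValueError(f"{spelling!r} is assigned to two lemmas")
            index[key] = canonical
    return index


index = build_variant_index(VARIANT_GROUPS)
print(index["giardinu"], index["palàura"], index["coniùngiri"])
```

A table like this has to be filled by hand. An automatic merge algorithm would have to discover the groups itself.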

TODO

Future work

Syntactic properties, more rules, an algorithm for automatically merging spelling variants. TODO

Resources

https://scn.wikipedia.org/wiki/P%C3%A0ggina_principali

https://scn.wiktionary.org/wiki/P%C3%A0ggina_principali

Bonner, Introduction to Sicilian Grammar

Il nuovo dizionario siciliano-italiano