Apertium cat-srd/ Apertium ita-srd: relata finale (revision 67214, Grfro3d, 2018-06-24T19:17:25Z) https://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=67214
<div>==Description of the work==<br />
<br />
The project for the "Google Summer of Code 2017" programme with the Apertium organization was to develop a rule-based machine translator between Catalan and Sardinian and to continue last year's project, apertium ita-srd. The idea came from the wish to build another support tool for the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font, Francis Tyers and Adrià Martín took part in this project.<br />
<br />
As shown in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], the project had two phases: a longer one, covering all of June and July, devoted to Catalan-Sardinian, and a shorter one in August, which laid the foundations for a new Sardinian-Italian translator.<br />
<br />
There were several reasons for developing another machine translator involving Sardinian, this time Catalan-Sardinian.<br />
First, Sardinian is an endangered language, and there is a need to produce as many resources as possible in Sardinian, especially in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard proposed in 2006 and adopted as the reference by Apertium. Work of this kind therefore helps to protect Sardinian and supports the full standardization of the language. Building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying a broader foundation for future work.<br />
<br />
As for Catalan in particular, the main reason for choosing it is that Catalan, like Sardinian, is one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino): it is spoken in the town of Alghero (S'Alighera) by nearly 33,000 speakers. In addition, Catalan is one of the languages for which Apertium has the most resources.<br />
<br />
Moreover, this open-source platform is suited to closely related Romance languages, and Catalan and Sardinian meet this requirement because of the Catalan influence and linguistic heritage present in Sardinian, dating from the Catalan-Aragonese occupation of Sardinia. In this way, the large body of Catalan texts, records and materials on Sardinian history will also become available in Sardinian itself, as will the publications and studies in sociolinguistics and language policy that are relevant to the situation of the Sardinian language.<br />
<br />
The translator could have many uses. For example, the Sardinian Wikipedia could grow by translating articles that are less detailed in Italian or that emphasize different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way of expressing the same concepts.<br />
<br />
To complete the work plan, we also intended to do some work on srd-ita in August, provided that the goals of the first, longer phase were met on schedule.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was already in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to bring the WER below 15%.<br />
<br />
Unlike last year, when building the Italian-Sardinian translator meant developing practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyser, this year we started from two languages that were already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other: vocabulary, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") much of the work went into a contrastive analysis of Catalan and Sardinian in order to write the "pending tests". Using the ita-srd "pending tests" as a reference as well, this brought out structural differences in interrogative forms, numerals, possessives, expressions of obligation, continuous forms, the past, future and conditional tenses and, above all, clitics.<br />
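Pending tests are kept on the wiki as `{{test|lang|source|expected}}` templates, the same form used on the "Sardo e italiano/Pending tests" page. As a sketch of how such a page could be checked automatically (this is not Apertium's own test script), the Python below extracts the test triples and compares each expected translation with the output of a `translate` function. Both `translate` and the `fake_output` table are hypothetical stand-ins for a call to the real translator.

```python
import re
from typing import Callable, Iterable

# Matches one wiki test template, e.g. {{test|ita|Ciao a tutti.|Salude a totus.}}
TEST_RE = re.compile(
    r"\{\{test\|(?P<lang>[^|]+)\|(?P<src>[^|]+)\|(?P<expected>[^}]+)\}\}"
)


def parse_tests(lines: Iterable[str]) -> list[tuple[str, str, str]]:
    """Extract (language, source sentence, expected translation) from wiki lines."""
    tests = []
    for line in lines:
        m = TEST_RE.search(line)
        if m:
            tests.append((m["lang"], m["src"].strip(), m["expected"].strip()))
    return tests


def run_tests(
    tests: list[tuple[str, str, str]], translate: Callable[[str], str]
) -> tuple[int, list[str]]:
    """Count passing tests and describe the failing ones."""
    passed, failures = 0, []
    for lang, src, expected in tests:
        got = translate(src).strip()
        if got == expected:
            passed += 1
        else:
            failures.append(f"[{lang}] {src!r}: expected {expected!r}, got {got!r}")
    return passed, failures


# Usage with a hypothetical stand-in for the real translator:
page = [
    "* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />",
    "* {{test|ita|Tutte le case.|Totu sas domos.}}<br />",
]
fake_output = {"Ciao a tutti.": "Salude a totus.", "Tutte le case.": "Totu sas domus."}
passed, failures = run_tests(parse_tests(page), lambda s: fake_output.get(s, ""))
print(f"{passed} passed, {len(failures)} failed")  # 1 passed, 1 failed
```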
<br />
===Sardinian morphological dictionary===<br />
We already had a Sardinian morphological dictionary of 51,800 words, developed during the previous GSoC (it included the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu], the regional Sardinian spell checker). A further 15,500 words were added, including 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. Most of them are scientific and technical terms and socio-political vocabulary. Apart from the proper nouns, which were mostly chosen by other criteria, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia].<br />
<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (mainly the gender assigned to some nouns).<br />
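A small example shows why a wrong paradigm matters. In Apertium's monolingual dictionaries (XML `.dix` files) each lemma is linked to an inflection paradigm, and that paradigm also supplies the gender tag, so a noun filed under the wrong paradigm comes out with the wrong article and agreement. The Python sketch below imitates this with invented paradigm names and simplified tags; it does not reproduce the `.dix` format.

```python
# Illustrative paradigms: suffix -> tags. Names and tags are invented for this sketch.
PARADIGMS = {
    "m_u_os": {"u": "n.m.sg", "os": "n.m.pl"},  # libr-u / libr-os
    "f_a_as": {"a": "n.f.sg", "as": "n.f.pl"},  # terr-a / terr-as
    "f_o_os": {"o": "n.f.sg", "os": "n.f.pl"},  # dom-o / dom-os (feminine)
    "m_o_os": {"o": "n.m.sg", "os": "n.m.pl"},  # same endings, but masculine
}
ARTICLES = {"m.sg": "su", "m.pl": "sos", "f.sg": "sa", "f.pl": "sas"}


def expand(stem: str, paradigm: str) -> list[tuple[str, str]]:
    """Generate every (surface form, tags) pair for one dictionary entry."""
    return [(stem + suffix, tags) for suffix, tags in PARADIGMS[paradigm].items()]


def with_article(stem: str, paradigm: str) -> list[str]:
    """Prefix each form with the definite article its gender and number require."""
    phrases = []
    for form, tags in expand(stem, paradigm):
        _, gender, number = tags.split(".")
        phrases.append(f"{ARTICLES[gender + '.' + number]} {form}")
    return phrases


print(with_article("dom", "f_o_os"))  # ['sa domo', 'sas domos']
print(with_article("dom", "m_o_os"))  # ['su domo', 'sos domos'] <- wrong gender propagates
```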
<br />
===Catalan morphological dictionary===<br />
Nearly 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 new morphological disambiguation rules were written and a few existing ones were modified.<br />
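The Catalan rules themselves are not reproduced here. To illustrate what a morphological disambiguation rule does, the sketch below removes one reading of an ambiguous word based on its right-hand neighbour, in the spirit of a Constraint Grammar REMOVE rule. The tag names and the rule are assumptions chosen for the example: Catalan ''el'' is both an article (''el llibre'') and an object pronoun (''el veig'').

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    readings: list[str]  # remaining candidate tags, e.g. ["det", "prn"]


def remove_if_next(tokens: list[Token], target: str, drop_tag: str, next_tag: str) -> list[Token]:
    """Drop `drop_tag` from every `target` token whose right neighbour is unambiguously `next_tag`.

    Like a Constraint Grammar REMOVE rule, it never deletes a token's last reading.
    """
    for tok, nxt in zip(tokens, tokens[1:]):
        if (tok.form.lower() == target
                and drop_tag in tok.readings
                and len(tok.readings) > 1
                and nxt.readings == [next_tag]):
            tok.readings.remove(drop_tag)
    return tokens


# "el llibre": el is an article here, so the pronoun reading goes.
s1 = remove_if_next([Token("el", ["det", "prn"]), Token("llibre", ["n"])], "el", "prn", "n")
# "el veig": el is an object pronoun here, so the article reading goes.
s2 = remove_if_next([Token("el", ["det", "prn"]), Token("veig", ["vblex"])], "el", "det", "vblex")
print(s1[0].readings, s2[0].readings)  # ['det'] ['prn']
```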
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These rules decide which of two or more possible translations is the most appropriate in a given context. (The bilingual dictionary also contains hundreds of choices between possible translations of a word but, unlike the rules, those choices are fixed and apply in every context.)<br />
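The difference between a context-dependent rule and a fixed dictionary choice can be sketched as follows. Apertium stores its lexical selection rules in its own rule files, which are not reproduced here. The Python below is only an illustration: the Catalan verb ''tocar'' with the Sardinian choices ''tocare'' (touch) and ''sonare'' (play an instrument) is a hypothetical example, not checked against the cat-srd dictionaries.

```python
from dataclasses import dataclass


@dataclass
class SelectionRule:
    source_lemma: str
    target_lemma: str
    context: set[str]   # source lemmas that trigger the rule
    window: int = 2     # how many words to look at on each side


# Hypothetical data: the context-free default from the bilingual dictionary and one rule.
DEFAULTS = {"tocar": "tocare"}
RULES = [SelectionRule("tocar", "sonare", {"guitarra", "piano", "violí"})]


def select(lemmas: list[str], i: int) -> str:
    """Translate lemmas[i]: the first rule whose context matches wins, else the default."""
    src = lemmas[i]
    for rule in RULES:
        if rule.source_lemma != src:
            continue
        left = lemmas[max(0, i - rule.window):i]
        right = lemmas[i + 1:i + 1 + rule.window]
        if rule.context & set(left + right):
            return rule.target_lemma
    return DEFAULTS.get(src, src)


print(select(["tocar", "el", "piano"], 0))  # sonare
print(select(["tocar", "la", "paret"], 0))  # tocare
```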
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These rules change the structure of the Catalan sentence to match the structure Sardinian requires. For example:<br />
<br />
* El '''meu''' llibre > Su libru '''meu''' ("my book")<br />
* '''Vaig''' menjar > '''Apo''' mandigadu ("I ate")<br />
* '''He''' anat > '''So''' andadu ("I have gone")<br />
* Vull saludar-'''lo''' > '''Lu''' chèrgio saludare ("I want to greet him")<br />
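Apertium writes structural transfer rules in its own XML format (`.t1x` and related files), and the Python sketch below does not reproduce that format. It only illustrates, on already-translated tokens with tag names invented for the example, the reordering in the first and last examples above: moving the possessive after the noun, and moving an enclitic object pronoun in front of the modal verb.

```python
# A token is (surface form, simplified tag); the tag names are invented for this sketch.
Token = tuple[str, str]


def reorder(tokens: list[Token], pattern: list[str], build) -> list[Token]:
    """Scan left to right; wherever the tags match `pattern`, replace the span with build(span)."""
    out, i, n = [], 0, len(pattern)
    while i < len(tokens):
        span = tokens[i:i + n]
        if [tag for _, tag in span] == pattern:
            out += build(span)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def move_possessive(tokens: list[Token]) -> list[Token]:
    """det + poss + n -> det + n + poss   (El meu llibre > Su libru meu)"""
    return reorder(tokens, ["det", "poss", "n"], lambda s: [s[0], s[2], s[1]])


def climb_clitic(tokens: list[Token]) -> list[Token]:
    """modal + inf + enclitic -> proclitic + modal + inf   (Vull saludar-lo > Lu chèrgio saludare)"""
    return reorder(tokens, ["vbmod", "inf", "prn.enc"],
                   lambda s: [(s[2][0], "prn.pro"), s[0], s[1]])


print(move_possessive([("su", "det"), ("meu", "poss"), ("libru", "n")]))
# [('su', 'det'), ('libru', 'n'), ('meu', 'poss')]
print(climb_clitic([("chèrgio", "vbmod"), ("saludare", "inf"), ("lu", "prn.enc")]))
# [('lu', 'prn.pro'), ('chèrgio', 'vbmod'), ('saludare', 'inf')]
```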
<br />
===Quality===<br />
Quality evaluation tests how the translator performs in practice. There are many ways to carry it out, and the choice of test texts depends on how the translator is meant to be used. In short, it measures how many words have to be changed before the text can be published.<br />
<br />
The word error rate (WER) is the metric that measures how many words have to be changed before the text can be published. According to the "Work plan", the goal was a rate below 15%. The WER obtained is 13.9%, measured on two randomly chosen 600-word texts from Wikipedia.<br />
<br />
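As a worked version of that definition: WER is the minimum number of word substitutions, insertions and deletions needed to turn the machine translation into the post-edited text, divided by the number of words in the post-edited text, so 13.9% means roughly 14 edited words per 100. The Python sketch below computes it with a standard word-level Levenshtein distance. It is an illustration, not Apertium's evaluation script, and it splits on whitespace only, so punctuation attached to a word counts as part of it.

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: word-level edit distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] = edit distance between the first i-1 hypothesis words and the first j reference words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,              # extra hypothesis word: delete it
                         cur[j - 1] + 1,           # missing reference word: insert it
                         prev[j - 1] + (h != r))   # substitute (free if the words match)
        prev = cur
    return prev[len(ref)] / max(len(ref), 1)


print(round(wer("su libru meu", "su libru tuo"), 3))  # 0.333
```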
The ''coverage'' of the translator (the percentage of words it recognizes) is 94.0%, measured on a large Wikipedia corpus.<br />
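Coverage is measured on the output of the morphological analyser. In Apertium's stream format each word appears as `^surface/analysis$`, and a word missing from the dictionary is marked with an asterisk (`^xyz/*xyz$`). The sketch below simply counts analysed versus unknown units; the sample analyses are made up, and a real script would also have to handle escaped `^`/`$` characters and formatting blocks, which this one ignores.

```python
import re

# One lexical unit in the Apertium stream, e.g. ^casa/casa<n><f><sg>$ ;
# words missing from the dictionary come out as ^xyz/*xyz$.
UNIT_RE = re.compile(r"\^(.*?)\$")


def coverage(stream: str) -> float:
    """Share of lexical units that received at least one analysis (0.0 for empty input)."""
    units = UNIT_RE.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


sample = "^L'/el<det><def><f><sg>$^Acròpoli/*Acròpoli$ ^és/ser<vbser><pri><p3><sg>$"
print(f"{coverage(sample):.1%}")  # 66.7%
```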
<br />
====Catalan text (chosen at random)====<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
====Machine translation into Sardinian====<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. What we managed to do was to start a manual morphological disambiguation of part of our corpora, to help the translator recognize the correct morphological analysis of each word, especially in texts that did not fully follow the standard.<br />
<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts that fully follow the LSC standard. Some 6,000 words of the first corpus and 11,800 of the second were tagged. Lack of time (two weeks for the task) meant that the tagging could not be reviewed, and for this reason it was not possible to build a morphological disambiguator for Sardinian, as we had intended.<br />
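A hand-tagged corpus like these is what a statistical disambiguator learns from, which is also why unreviewed tagging is a problem: the model learns any tagging errors along with the correct tags. The simplest possible illustration is a most-frequent-tag (unigram) baseline, sketched below. It is not the tagger Apertium uses, and both the tags and the tiny corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_unigram(tagged: list[tuple[str, str]]) -> dict[str, str]:
    """Map each (lower-cased) word form to the tag it carries most often in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for form, tag in tagged:
        counts[form.lower()][tag] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def tag_sentence(words: list[str], model: dict[str, str], unknown: str = "UNK") -> list[str]:
    """Tag each word with its most frequent training tag, or `unknown` if never seen."""
    return [model.get(w.lower(), unknown) for w in words]


# Invented mini-corpus of hand-tagged Sardinian (form, tag) pairs.
corpus = [("Isse", "prn"), ("istat", "vblex"), ("bene", "adv"),
          ("su", "det"), ("bene", "n"),
          ("cantat", "vblex"), ("bene", "adv")]
model = train_unigram(corpus)
print(tag_sentence(["Cantat", "bene", "meda"], model))  # ['vblex', 'adv', 'UNK']
```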
<br />
We also added 9 new transfer rules and corrected some existing ones. In practice, verb tenses are now translated correctly from Sardinian into Italian (specifically the future and the conditional). The translation of possessives has improved, and some cases of enclitics are handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
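The future and conditional fix comes down to this: Sardinian builds these tenses periphrastically (''at a èssere'', ''diat chèrrere''), whereas Italian uses single synthetic forms (''sarà'', ''vorrebbe''). The Python sketch below shows the idea of collapsing the periphrasis into one verb carrying a future (`fti`) or conditional (`cni`) tag. Those two tag names are modelled on Apertium's Romance dictionaries, but the token representation, the `dia`/`cond` analysis of the conditional auxiliary and the remaining tags are assumptions for the illustration; the rule is not taken from apertium-srd-ita, and the lemma itself would still be translated by the bilingual dictionary.

```python
Tok = tuple[str, tuple[str, ...]]  # (lemma, tags); tag names are illustrative


def synthesize_tenses(tokens: list[Tok]) -> list[Tok]:
    """Collapse Sardinian periphrastic future/conditional into one synthetic verb.

    àere<pri><pN><nN> + a<pr> + V<inf>  ->  V<fti><pN><nN>   (at a èssere  -> sarà)
    dia<cond><pN><nN> + V<inf>          ->  V<cni><pN><nN>   (diat chèrrere -> vorrebbe)
    """
    out, i = [], 0
    while i < len(tokens):
        lemma, tags = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
        if (lemma == "àere" and tags[:1] == ("pri",) and nxt == ("a", ("pr",))
                and nxt2 is not None and nxt2[1] == ("inf",)):
            out.append((nxt2[0], ("fti",) + tags[1:]))
            i += 3
        elif lemma == "dia" and tags[:1] == ("cond",) and nxt is not None and nxt[1] == ("inf",):
            out.append((nxt[0], ("cni",) + tags[1:]))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


print(synthesize_tenses([("àere", ("pri", "p3", "sg")), ("a", ("pr",)), ("èssere", ("inf",))]))
# [('èssere', ('fti', 'p3', 'sg'))]
```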
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules; an important one among these four disambiguates "sono" as either "so" (I am) or "sunt" (they are). In addition, we added a list of the countries of the world (provided by Diegu Corràine) and derived the corresponding demonyms from it. In total, the Italian-Sardinian bilingual dictionary has 1,400 more entries than at the start of GSoC. Cleaning up the errors in the Sardinian dictionary and adding these entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
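The ''sono'' case is a typical ambiguity: the Italian form is both first person singular (''io sono'', Sardinian ''so'') and third person plural (''loro sono'', ''ci sono'', Sardinian ''sunt''), as in the pending tests "Sono rimasto a casa" and "Ci sono tutti". The four rules themselves are not shown in this report; the sketch below is a hypothetical, heavily simplified left-context heuristic with invented tag strings, meant only to show the kind of decision involved.

```python
SONO = {"p1.sg": "so", "p3.pl": "sunt"}  # Sardinian forms for the two readings


def translate_sono(left: list[tuple[str, str]]) -> str:
    """Pick a Sardinian form for Italian 'sono' from the words to its left.

    left: (form, tags) pairs before 'sono'; tags are invented strings such as 'n.m.pl'.
    """
    for form, tags in reversed(left):
        word = form.lower()
        if word == "io":
            return SONO["p1.sg"]
        if word == "ci" or tags.endswith(".pl"):  # "ci sono" = "there are"; plural subject
            return SONO["p3.pl"]
    # No subject in sight: Italian often drops "io", so default to the first person (assumption).
    return SONO["p1.sg"]


print(translate_sono([("Io", "prn.p1.sg")]))                       # so
print(translate_sono([("I", "det.m.pl"), ("bambini", "n.m.pl")]))  # sunt
print(translate_sono([("Ci", "adv")]))                             # sunt
print(translate_sono([]))                                          # so
```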
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ LSC spell checker]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyser]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu (Italian-Sardinian glossary)]<br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Plans for the future==<br />
The work of the second phase of the project will be used to build a more accurate and up-to-date version of ita-srd. Completing the work we started will make it possible to build a morphological disambiguator, which will be useful for disambiguating the corpora and will be indispensable for developing other language pairs with Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results that we can consider good. I would like to thank Apertium for this opportunity, Francis Tyers and, above all, Hèctor Alòs i Font for working with me throughout these months.<br />
<br />
We thank Diegu Corràine, who helped us a great deal by providing many materials for this project and by giving sound advice whenever we asked him about the Sardinian language. Thanks also to Sa Gazeta for the texts in the corpus we built for this work. We must also mention the Region of Sardinia (Regione Sardigna), which has made freely available resources and tools that help in creating new open-source tools such as the one we have just completed.</div>
Sardo e italiano/Pending tests (revision 67173, Grfro3d, 2018-06-15T16:29:56Z) https://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67173
<div><br />
=Differences between Sardinian and Italian=<br />
<br />
==Use of "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Ordinal numbers==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Negative imperative==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gosi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|no liu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitive==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche·nde mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
* {{test|ita|C'erano fichi e ghiande.|B'aiat figu e lande.}}<br />
<br />
==andare + participle==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + participle==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitive==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse cheret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bolet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse non bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==bi + àere de + infinitu==<br />
* {{test|ita|Cosa c'è da fare?|Ite b'at de fàghere?}}<br />
<br />
==èssere a + infinitu==<br />
* {{test|ita|Cosa si deve fare?|Ite est a fàghere?}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi non istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi non mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Dia.|Diat.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Li diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
<br />
[[Category:Sardo e italiano|Pending tests]]</div>
Grfro3d https://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67172 Sardo e italiano/Pending tests 2018-06-15T16:26:01Z<p>Grfro3d: /* àere de + infinitu */</p>
<hr />
<div><br />
=Diferèntzias intro de sardu e italianu=<br />
<br />
==Usu de "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Nùmeros ordinales==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Imperativu negativu==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gosi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitivu==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche·nde mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
* {{test|ita|C'erano fichi e ghiande.|B'aiat figu e lande.}}<br />
<br />
==andare + partitzìpiu==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + partitzìpiu==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitu==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse chèret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bòlet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse non bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==bi + àere de + infinitu==<br />
* {{test|ita|Cosa c'è da fare?|Ite b'at de fàghere?}}<br />
<br />
==èssere a + infinitu==<br />
* {{test|ita|Cosa si deve fare?|Ite est a fàghere?}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Integerrima.|Intrega a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi non istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi non mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Dia.|Diat.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Li diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
<br />
[[Category:Sardo e italiano|Pending tests]]</div>
Grfro3d https://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67171 Sardo e italiano/Pending tests 2018-06-15T16:25:27Z<p>Grfro3d: /* èssere de + infinitu */</p>
<hr />
<div><br />
=Diferèntzias intro de sardu e italianu=<br />
<br />
==Usu de "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Nùmeros ordinales==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Imperativu negativu==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gosi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitivu==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche·nde mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
* {{test|ita|C'erano fichi e ghiande.|B'aiat figu e lande.}}<br />
<br />
==andare + partitzìpiu==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + partitzìpiu==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitu==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse chèret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bòlet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse non bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==àere de + infinitu==<br />
* {{test|ita|Cosa c'è da fare?|Ite b'at de fàghere?}}<br />
<br />
==èssere a + infinitu==<br />
* {{test|ita|Cosa si deve fare?|Ite est a fàghere?}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Integerrima.|Intrega a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi non istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi non mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Dia.|Diat.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
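The two-clitic clusters in the imperative and gerund tests above follow a regular correspondence, for example `Dammelo.` → `Dae·mi·lu.` and `Dandoglielo.` → `Dende·bi·lu.`. The following is a minimal Python sketch, not part of Apertium, that encodes that correspondence so expected outputs can be cross-checked by hand. It assumes the table covers only the indirect + direct pairs attested on this page; the names `map_cluster` and `enclitic_form` are hypothetical.

```python
from typing import List, Optional

# Italian indirect-object clitic (first element of the cluster) -> Sardinian form,
# as attested in the test pairs on this page (hypothetical helper table).
INDIRECT = {"glie": "bi", "me": "mi", "te": "ti", "ce": "nos", "ve": "bos", "se": "si"}

# Italian direct-object clitic (second element of the cluster) -> Sardinian form.
DIRECT = {"lo": "lu", "la": "la", "li": "los", "le": "las"}


def map_cluster(cluster: str) -> Optional[List[str]]:
    """Map an Italian two-clitic cluster such as 'glielo' to ['bi', 'lu'].

    Returns None for anything outside the table (single clitics, clusters
    with 'ne', locative 'ce', ...), which the tests treat case by case.
    """
    cluster = cluster.lower()
    for it_form, srd_form in INDIRECT.items():
        rest = cluster[len(it_form):]
        if cluster.startswith(it_form) and rest in DIRECT:
            return [srd_form, DIRECT[rest]]
    return None


def enclitic_form(sardinian_host: str, cluster: str) -> Optional[str]:
    """Join a Sardinian imperative/gerund and the mapped clitics with the middle dot."""
    clitics = map_cluster(cluster)
    if clitics is None:
        return None
    return "·".join([sardinian_host] + clitics)


if __name__ == "__main__":
    print(enclitic_form("Dae", "mela"))      # Dae·mi·la
    print(enclitic_form("Dende", "glielo"))  # Dende·bi·lu
    print(enclitic_form("Dage", "cele"))     # Dage·nos·las
```

The sketch deliberately returns `None` instead of guessing for single clitics such as `gli` or `le`, because the tests above accept more than one Sardinian form for them (`li`/`lis`, `las`/`li`).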
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
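Each entry on this page is a `{{test|ita|SOURCE|EXPECTED}}` template. The following is a minimal Python sketch for extracting these pairs from the wikitext, listing exact duplicates and comparing them against a translation function. It assumes that neither field contains `|` or `}}` and that surrounding whitespace inside a field is not significant. The function names are hypothetical, and `translate` is only a stand-in; in practice it might wrap the Apertium pipeline, which is not shown here.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

Entry = Tuple[str, str, str]  # (language, source sentence, expected translation)

# Assumes fields never contain "|" or "}}"; whitespace inside a field is trimmed.
TEST_RE = re.compile(r"\{\{test\|(\w+)\|([^|}]*)\|([^|}]*)\}\}")


def parse_tests(wikitext: str) -> List[Entry]:
    """Extract every {{test|lang|source|expected}} template, in page order."""
    return [(m.group(1), m.group(2).strip(), m.group(3).strip())
            for m in TEST_RE.finditer(wikitext)]


def find_duplicates(tests: List[Entry]) -> List[Entry]:
    """Return entries that occur more than once, in first-seen order."""
    return [t for t, n in Counter(tests).items() if n > 1]


def run_tests(tests: List[Entry], translate: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (source, expected, got) for every entry whose output differs."""
    failures = []
    for _lang, source, expected in tests:
        got = translate(source).strip()
        if got != expected:
            failures.append((source, expected, got))
    return failures


if __name__ == "__main__":
    page = (
        "* {{test|ita|Dallo.|Dae·lu.}}<br />\n"
        "* {{test|ita|Dallo.|Dae·lu.}}<br />\n"
        "* {{test|ita|Dalla.|Dae·la. }}<br />\n"
    )
    tests = parse_tests(page)
    print(find_duplicates(tests))  # [('ita', 'Dallo.', 'Dae·lu.')]
    # Toy stand-in for a real translator: a fixed lookup table.
    fake = {"Dallo.": "Dae·lu.", "Dalla.": "Dae·las."}
    print(run_tests(tests, lambda s: fake.get(s, "")))  # [('Dalla.', 'Dae·la.', 'Dae·las.')]
```

Trimming whitespace inside each field makes an expected output written with a stray trailing space compare equal to the same output without it; whether the wiki's own test runner trims in the same way is an assumption.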
<br />
[[Category:Sardo e italiano|Pending tests]]</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67170Sardo e italiano/Pending tests2018-06-15T15:01:07Z<p>Grfro3d: /* Cumparativos */</p>
<hr />
<div><br />
=Diferèntzias intro de sardu e italianu=<br />
<br />
==Usu de "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Nùmeros ordinales==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Imperativu negativu==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gasi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|no bi lu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitivu==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche·nde mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
* {{test|ita|C'erano fichi e ghiande.|B'aiat figu e lande.}}<br />
<br />
==andare + partitzìpiu==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + partitzìpiu==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitu==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse cheret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bòlet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse no bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Integerrima.|Intrega a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi non istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi non mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Dia.|Diat.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
<br />
[[Category:Sardo e italiano|Pending tests]]</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67169Sardo e italiano/Pending tests2018-06-15T14:31:15Z<p>Grfro3d: /* Partitivu */</p>
<hr />
<div><br />
=Diferèntzias intro de sardu e italianu=<br />
<br />
==Usu de "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Nùmeros ordinales==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Imperativu negativu==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gasi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|no bi lu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitivu==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche·nde mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
* {{test|ita|C'erano fichi e ghiande.|B'aiat figu e lande.}}<br />
<br />
==andare + partitzìpiu==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + partitzìpiu==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitu==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse cheret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bòlet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse no bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Integerrima.|Intrega a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
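The cluster tests above follow a regular pattern: an Italian indirect clitic (me, te, ce, ve, se, glie) joined to lo, la, li, le or ne corresponds to a Sardinian pair in the same order, with glie- rendered as bi (Glielo → Bi lu, Gliene → Bi nde). The Python sketch below encodes only the correspondences that appear in the tests on this page, placing the pair before an infinitive (Bi lu dare) or attaching it with a middle dot after an imperative or gerund (Dende·bi·lu). The tables and function names are hypothetical illustrations, not the rules actually used by apertium-ita-srd, and the output is lower-case.<br />

```python
# Hypothetical sketch: these tables are transcribed from the expected outputs
# of the tests on this page, not from the apertium-ita-srd transfer rules.

# Italian indirect (dative) clitic as it appears before lo/la/li/le/ne -> Sardinian
DATIVE = {"me": "mi", "te": "ti", "ce": "nos", "ve": "bos", "se": "si", "glie": "bi"}
# Italian direct (accusative) clitics and partitive "ne" -> Sardinian
ACCUSATIVE = {"lo": "lu", "la": "la", "li": "los", "le": "las", "ne": "nde"}


def sardinian_cluster(italian: str) -> list[str]:
    """Split an Italian two-clitic cluster such as 'glielo' or 'mela' and
    return the Sardinian clitics in the same order, e.g. ['bi', 'lu']."""
    italian = italian.lower()
    for it_dat, srd_dat in DATIVE.items():
        if italian.startswith(it_dat):
            rest = italian[len(it_dat):]
            if rest in ACCUSATIVE:
                return [srd_dat, ACCUSATIVE[rest]]
    raise ValueError(f"unsupported clitic cluster: {italian!r}")


def proclitic(cluster: str, verb: str) -> str:
    """Clitics written as separate words before the verb (infinitive tests)."""
    return " ".join(sardinian_cluster(cluster) + [verb])


def enclitic(verb: str, cluster: str) -> str:
    """Clitics joined to the verb with a middle dot (imperative and gerund tests)."""
    return "·".join([verb] + sardinian_cluster(cluster))


if __name__ == "__main__":
    print(proclitic("glielo", "dare"))
    print(enclitic("dende", "mela"))
    print(enclitic("dae", "glieli"))
```

Single clitics (Dallo → Dae·lu) and the ambiguous gli/le forms, which can map to li or lis, are deliberately left out, since they cannot be resolved from the cluster alone.<br />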
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Dia.|Dia.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
<br />
[[Category:Sardo e italiano|Pending tests]]</div>
Grfro3d
https://wiki.apertium.org/w/index.php?title=Sardo_e_italiano/Pending_tests&diff=67168
Sardo e italiano/Pending tests
2018-06-15T14:29:32Z<p>Grfro3d: /* Partitivu */</p>
<hr />
<div><br />
=Diferèntzias intro de sardu e italianu=<br />
<br />
==Usu de "totu"==<br />
<br />
* {{test|ita|Sono rimasto a casa per tutto il giorno.|So abarradu in domo totu sa die.}} <br />
* {{test|ita|Sono rimasto qui per tutto il tempo.|So abarradu inoghe totu su tempus.}} <br />
* {{test|ita|Me la sono mangiata tutta.|Mi l'apo mandigada totu.}} <br />
* {{test|ita|Tutti gli alberi.|Totu sos àrbores.}} <br />
* {{test|ita|Tutte le case.|Totu sas domos.}} <br />
* {{test|ita|Ci sono tutti.|Bi sunt totus.}} <br />
* {{test|ita|Ci sono tutte.|Bi sunt totus.}} <br />
* {{test|ita|Ciao a tutti.|Salude a totus.}}<br />
<br />
==Nùmeros ordinales==<br />
<br />
* {{test|ita|primo.|su primu.}} <br />
* {{test|ita|secondo.|segundu.}}<br />
* {{test|ita|terzo.|su de tres.}}<br />
* {{test|ita|quarto.|su de bator.}}<br />
* {{test|ita|quinto.|su de chimbe.}}<br />
* {{test|ita|sesto.|su de ses.}}<br />
* {{test|ita|settimo.|su de sete.}}<br />
* {{test|ita|ottavo.|su de oto.}}<br />
* {{test|ita|nono.|su de noe.}}<br />
* {{test|ita|decimo.|su de deghe.}}<br />
* {{test|ita|È arrivato il secondo.|Est arribadu su segundu.}}<br />
* {{test|ita|È arrivato secondo.|Est arribadu segundu.}}<br />
* {{test|ita|È arrivato il terzo.|Est arribadu su de tres.}}<br />
* {{test|ita|È arrivato terzo in gara.|Est arribadu su de tres in gara.}}<br />
* {{test|ita|È stato il terzo ad arrivare.|Est istadu su de tres a nche arribare.}}<br />
* {{test|ita|La seconda casa.|Sa segunda domo.}}<br />
* {{test|ita|La mia seconda casa.|Sa segunda domo mea.}}<br />
* {{test|ita|La terza casa.|Sa de tres domos.}}<br />
* {{test|ita|La mia terza casa.|Sa de tres de sas domos meas.}}<br />
* {{test|ita|Una terza famiglia.|Una de tres famìlias.}}<br />
<br />
==Imperativu negativu==<br />
<br />
* {{test|ita|non fare.|non fatzas.}} <br />
* {{test|ita|non fare così.|non fatzas gasi.}}<br />
* {{test|ita|non fare da cattivo.|non fatzas a malu.}}<br />
* {{test|ita|non fargli male.|no li fatzas male.}}<br />
* {{test|ita|non dirgli niente.|no li nàrgias nudda.}}<br />
* {{test|ita|non lo fare.|no lu fatzas.}}<br />
* {{test|ita|non farlo.|no lu fatzas.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non farglielo fare.|non bi lu fatzas fàghere.}}<br />
* {{test|ita|non farglielo.|non bi lu fatzas.}}<br />
* {{test|ita|non fate.|non fatzais.}}<br />
* {{test|ita|non fatelo.|no lu fatzais.}}<br />
* {{test|ita|non fateglielo.|non bi lu fatzais.}}<br />
* {{test|ita|non fategliela.|non bi la fatzais.}}<br />
<br />
==Partitivu==<br />
<br />
* {{test|ita|Bevo dell'acqua.|Bufo abba.}}<br />
* {{test|ita|Porta delle mele.|Bati·nche mela.}}<br />
* {{test|ita|Qualcuno di noi.|Calicunu de nois.}}<br />
<br />
==andare + partitzìpiu==<br />
<br />
* {{test|ita|Il fazzoletto va messo così.|Su mucadore andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non va messo così.|Su mucadore non andat postu gasi.}}<br />
* {{test|ita|Il fazzoletto andava messo così.|Su mucadore andaìat postu gasi.}}<br />
* {{test|ita|Il fazzoletto non andava messo così.|Su mucadore no andaìat postu gasi.}}<br />
<br />
<br />
==chèrrere + partitzìpiu==<br />
* {{test|ita|Quel film è da vedere.|Cuddu film cheret bidu.}}<br />
* {{test|ita|Quel film era da vedere.|Cuddu film cherìat bidu.}}<br />
* {{test|ita|Quel film sarebbe da vedere.|Cuddu film diat chèrrere bidu.}}<br />
<br />
==chèrrere + a + infinitu==<br />
<br />
* {{test|ita|Lui vuole che si faccia così.|Isse cheret a fàghere gasi.}}<br />
* {{test|ita|Lui non vuole che si faccia così.|Isse non bòlet a fàghere gasi.}}<br />
* {{test|ita|Lui voleva che si facesse così.|Isse bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui non voleva che si facesse così.|Isse no bolìat a fàghere gasi.}}<br />
* {{test|ita|Lui vorrà che si faccia così.|Isse at a chèrrere a fàghere gasi.}}<br />
<br />
==abarrare==<br />
<br />
* {{test|ita|stai calmo|abarra chietu}}<br />
<br />
==èssere de + infinitu==<br />
* {{test|ita|Il secchio era da mettere in quell'angolo.|Su puale fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non era da mettere in quell'angolo.|Su puale no fiat de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio sarà da mettere in quell'angolo.|Su puale at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non sarà da mettere in quell'angolo.|Su puale no at a èssere de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio è da mettere in quell'angolo.|Su puale est de pònnere in cudd'angrone.}}<br />
* {{test|ita|Il secchio non è da mettere in quell'angolo.|Su puale no est de pònnere in cudd'angrone.}}<br />
<br />
==Possessivos==<br />
<br />
* {{test|ita|Nella mia terra.|In sa terra mea.}}<br />
* {{test|ita|Nella loro terra.|In sa terra issoro.}}<br />
* {{test|ita|Le loro case.|Sas domos issoro.}}<br />
* {{test|ita|I loro amici.|Sos amigos issoro.}}<br />
<br />
==Superlativos==<br />
<br />
* {{test|ita|Tua figlia è molto studiosa.|Fìgia tua est meda istudiosa.}}<br />
* {{test|ita|Contentissimo.|Cuntentu a beru.}}<br />
* {{test|ita|Bellissimi.|Bellos a beru.}}<br />
* {{test|ita|Integerrima.|Intrega a beru.}}<br />
* {{test|ita|Il più bello di tutti.|Su prus bellu de totus.}}<br />
* {{test|ita|La più ricca del mondo.|Sa prus rica de su mundu.}}<br />
<br />
==Cumparativos==<br />
<br />
* {{test|ita|Carlo è più serio di Marco.|Carlo est prus seriu de Marco.}}<br />
* {{test|ita|Carlo è migliore.|Carlo est mègius.}}<br />
* {{test|ita|La figlia è così bella come la madre.|Sa fìgia est bella comente a sa mama.}}<br />
* {{test|ita|La figlia è tanto bella quanto la madre.|Sa fìgia est bella cantu a sa mama.}}<br />
* {{test|ita|La figlia non è meno bella della madre.|Sa fìgia no est prus pagu bella de sa mama.}}<br />
* {{test|ita|Marco è meno studioso di Carlo.|Marco est prus pagu istudiosu de Carlo.}}<br />
* {{test|ita|Tuo figlio è più intelligente che studioso.|Fìgiu tuo est prus abbistu chi istudiosu.}}<br />
* {{test|ita|È più facile promettere che mantenere.|Est prus fàtzile promìtere chi mantènnere.}}<br />
<br />
==Pronùmenes proclìticos==<br />
<br />
* {{test|ita|Alcuni glielo chiesero.|Calicunu bi lu at pedidu.}}<br />
<br />
* {{test|ita|Glielo.|Bi lu.}}<br />
* {{test|ita|Gliela.|Bi la.}}<br />
* {{test|ita|Glieli.|Bi los.}}<br />
* {{test|ita|Gliele.|Bi las.}}<br />
* {{test|ita|Gliene.|Bi nde.}}<br />
<br />
==Pronùmenes enclìticos==<br />
<br />
* {{test|ita|Dimmi dov'è!|Nara·mi ue est!}}<br />
* {{test|ita|No Maria, non posso dirtelo.|No Maria, non ti lu potzo nàrrere.}}<br />
<br />
===Imperativo===<br />
<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
<br />
<br />
* {{test|ita|Dagli un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dalle un libro.|Dae·li unu libru.}}<br />
* {{test|ita|Dagli un libro.|Dae·lis unu libru.}}<br />
<br />
<br />
* {{test|ita|Dagliene uno.|Dae·si·nde unu.}}<br />
<br />
* {{test|ita|Portacene una.|Bati·nche·nde una.}}<br />
<br />
<br />
* {{test|ita|Dà.|Dae.}}<br />
* {{test|ita|Date.|Dage.}}<br />
* {{test|ita|Datela.|Dage·la.}}<br />
* {{test|ita|Datele.|Dàge·li.}}<br />
* {{test|ita|Dategli.|Dàge·lis.}}<br />
* {{test|ita|Datelo.|Dàge·lu.}}<br />
* {{test|ita|Dateli.|Dàge·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Datemela.|Dage·mi·la.}}<br />
* {{test|ita|Datemele.|Dage·mi·las.}}<br />
* {{test|ita|Datemelo.|Dage·mi·lu.}}<br />
* {{test|ita|Datemeli.|Dage·mi·los.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Datecela.|Dage·nos·la.}}<br />
* {{test|ita|Datecele.|Dage·nos·las.}}<br />
* {{test|ita|Datecelo.|Dage·nos·lu.}}<br />
* {{test|ita|Dateceli.|Dage·nos·los.}}<br />
<br />
* {{test|ita|Dategliela.|Dage·bi·la.}}<br />
* {{test|ita|Dategliele.|Dage·bi·las.}}<br />
* {{test|ita|Dateglielo.|Dage·bi·lu.}}<br />
* {{test|ita|Dalla.|Dae·la.}}<br />
* {{test|ita|Dalle.|Dae·las.}}<br />
* {{test|ita|Dalle.|Dae·li.}}<br />
* {{test|ita|Dagli.|Dae·li.}}<br />
* {{test|ita|Dallo.|Dae·lu.}}<br />
* {{test|ita|Dalli.|Dae·los.}}<br />
* {{test|ita|Dammi.|Dae·mi.}}<br />
* {{test|ita|Dammela.|Dae·mi·la.}}<br />
* {{test|ita|Dammele.|Dae·mi·las.}}<br />
* {{test|ita|Dammelo.|Dae·mi·lu.}}<br />
* {{test|ita|Dammeli.|Dae·mi·los.}}<br />
<br />
* {{test|ita|Dagliela.|Dae·bi·la.}}<br />
* {{test|ita|Dagliele.|Dae·bi·las.}}<br />
* {{test|ita|Daglielo.|Dae·bi·lu.}}<br />
* {{test|ita|Daglieli.|Dae·bi·los.}}<br />
* {{test|ita|Datti.|Dae·ti.}}<br />
* {{test|ita|Dattela.|Dae·ti·la.}}<br />
* {{test|ita|Dattele.|Dae·ti·las.}}<br />
* {{test|ita|Dattelo.|Dae·ti·lu.}}<br />
* {{test|ita|Datteli.|Dae·ti·los.}}<br />
* {{test|ita|Datemi.|Dàge·mi.}}<br />
* {{test|ita|Dateci.|Dàge·nos.}}<br />
* {{test|ita|Dia.|Dia.}}<br />
* {{test|ita|Diate.|Diades.}}<br />
* {{test|ita|La dia.|La diat.}}<br />
* {{test|ita|Le dia.|Las diat.}}<br />
* {{test|ita|Le dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Lis diat.}}<br />
* {{test|ita|Gli dia.|Li diat.}}<br />
* {{test|ita|Lo dia.|Lu diat.}}<br />
* {{test|ita|Li dia.|Los diat.}}<br />
* {{test|ita|Mi dia.|Mi diat.}}<br />
* {{test|ita|Me la dia.|Mi la diat.}}<br />
* {{test|ita|Me le dia.|Mi las diat.}}<br />
* {{test|ita|Me lo dia.|Mi lu diat.}}<br />
* {{test|ita|Me li dia.|Mi los diat.}}<br />
* {{test|ita|Diamoci.|Diamus·nos.}}<br />
* {{test|ita|Diamocela.|Diamus·nos·la.}}<br />
* {{test|ita|Diamocele.|Diamus·nos·las.}}<br />
* {{test|ita|Diamocelo.|Diamus·nos·lu.}}<br />
* {{test|ita|Diamoceli.|Diamus·nos·los.}}<br />
* {{test|ita|Diamo.|Diamus.}}<br />
* {{test|ita|Diamogliela.|Diamus·bi·la.}}<br />
* {{test|ita|Diamogliele.|Diamus·bi·las.}}<br />
* {{test|ita|Diamoglielo.|Diamus·bi·lu.}}<br />
* {{test|ita|Diamoglieli.|Diamus·bi·los.}}<br />
* {{test|ita|Diamola.|Diamus·la.}}<br />
* {{test|ita|Diamole.|Diamus·las.}}<br />
* {{test|ita|Diamogli.|Diamus·li.}}<br />
* {{test|ita|Diamogli.|Diamus·lis.}}<br />
* {{test|ita|Diamolo.|Diamus·lu.}}<br />
* {{test|ita|Diamoli.|Diamus·los.}}<br />
<br />
* {{test|ita|Diano.|Diant.}}<br />
* {{test|ita|La diano.|La diant.}}<br />
* {{test|ita|Le diano.|Lis diant.}}<br />
* {{test|ita|Gli diano.|Lis diant.}}<br />
* {{test|ita|Lo diano.|Lu diant.}}<br />
* {{test|ita|Li diano.|Los diant.}}<br />
* {{test|ita|Mi diano.|Mi diant.}}<br />
* {{test|ita|Me la diano.|Mi la diant.}}<br />
* {{test|ita|Me le diano.|Mi las diant.}}<br />
* {{test|ita|Me lo diano.|Mi lu diant.}}<br />
* {{test|ita|Me li diano.|Mi los diant.}}<br />
* {{test|ita|Ci diano.|Nos diant.}}<br />
* {{test|ita|Ce la diano.|Nos la diant.}}<br />
* {{test|ita|Ce le diano.|Nos las diant.}}<br />
* {{test|ita|Ce lo diano.|Nos lu diant.}}<br />
* {{test|ita|Ce li diano.|Nos los diant.}}<br />
* {{test|ita|Vi diano.|Bos diant.}}<br />
* {{test|ita|Ci dia.|Nos diat.}}<br />
* {{test|ita|Ce la dia.|Nos la diat.}}<br />
* {{test|ita|Ce le dia.|Nos las diat.}}<br />
* {{test|ita|Ce lo dia.|Nos lu diat.}}<br />
* {{test|ita|Ce li dia.|Nos los diat.}}<br />
* {{test|ita|Si diano.|Si diant.}}<br />
* {{test|ita|Gliela diano.|Bi la diant.}}<br />
* {{test|ita|Gliele diano.|Bi las diant.}}<br />
* {{test|ita|Glielo diano.|Bi lu diant.}}<br />
* {{test|ita|Gliela dia.|Bi la diat.}}<br />
* {{test|ita|Gliele dia.|Bi las diat.}}<br />
* {{test|ita|Glielo dia.|Bi lu diat.}}<br />
* {{test|ita|Glieli dia.|Bi los diat.}}<br />
<br />
===Infinitivo===<br />
<br />
* {{test|ita|Dare.|Dare.}}<br />
<br />
* {{test|ita|Darla.|La dare.}}<br />
* {{test|ita|Darle.|Li dare.}}<br />
* {{test|ita|Darle.|Las dare.}}<br />
* {{test|ita|Dargli.|Li dare.}}<br />
* {{test|ita|Darglielo.|Bi lu dare.}}<br />
* {{test|ita|Dargliela.|Bi la dare.}}<br />
* {{test|ita|Darglieli.|Bi los dare.}}<br />
* {{test|ita|Dargliele.|Bi las dare.}}<br />
* {{test|ita|Darlo.|Lu dare.}}<br />
* {{test|ita|Darli.|Los dare.}}<br />
* {{test|ita|Darmi.|Mi dare.}}<br />
* {{test|ita|Darmela.|Mi la dare.}}<br />
* {{test|ita|Darmele.|Mi las dare.}}<br />
* {{test|ita|Darmelo.|Mi lu dare.}}<br />
* {{test|ita|Darmeli.|Mi los dare.}}<br />
* {{test|ita|Darci.|Nos dare.}}<br />
* {{test|ita|Darcela.|Nos la dare.}}<br />
* {{test|ita|Darcele.|Nos las dare.}}<br />
* {{test|ita|Darcelo.|Nos lu dare.}}<br />
* {{test|ita|Darceli.|Nos los dare.}}<br />
* {{test|ita|Darvi.|Bos dare.}}<br />
* {{test|ita|Darvela.|Bos la dare.}}<br />
* {{test|ita|Darvele.|Bos las dare.}}<br />
* {{test|ita|Darvelo.|Bos lu dare.}}<br />
* {{test|ita|Darveli.|Bos los dare.}}<br />
* {{test|ita|Darsi.|Si dare.}}<br />
* {{test|ita|Darsela.|Si la dare.}}<br />
* {{test|ita|Darsele.|Si las dare.}}<br />
* {{test|ita|Darselo.|Si lu dare.}}<br />
* {{test|ita|Darseli.|Si los dare.}}<br />
* {{test|ita|Darti.|Ti dare.}}<br />
* {{test|ita|Dartela.|Ti la dare.}}<br />
* {{test|ita|Dartele.|Ti las dare.}}<br />
* {{test|ita|Dartelo.|Ti lu dare.}}<br />
* {{test|ita|Darteli.|Ti los dare.}}<br />
* {{test|ita|Dartemi.|Ti mi dare.}}<br />
* {{test|ita|Darteci.|Ti nos dare.}}<br />
<br />
===Gerundio===<br />
<br />
* {{test|ita|Dando.|Dende.}}<br />
* {{test|ita|Dandola.|Dende·la.}}<br />
* {{test|ita|Dandole.|Dende·li.}}<br />
* {{test|ita|Dandole.|Dende·las.}}<br />
* {{test|ita|Dandogli.|Dende·li.}}<br />
* {{test|ita|Dandoglielo.|Dende·bi·lu.}}<br />
* {{test|ita|Dandogliela.|Dende·bi·la.}}<br />
* {{test|ita|Dandogliele.|Dende·bi·las.}}<br />
* {{test|ita|Dandoglieli.|Dende·bi·los.}}<br />
* {{test|ita|Dandolo.|Dende·lu.}}<br />
* {{test|ita|Dandoli.|Dende·los.}}<br />
* {{test|ita|Dandomi.|Dende·mi.}}<br />
* {{test|ita|Dandomela.|Dende·mi·la.}}<br />
* {{test|ita|Dandomele.|Dende·mi·las.}}<br />
* {{test|ita|Dandomelo.|Dende·mi·lu.}}<br />
* {{test|ita|Dandomeli.|Dende·mi·los.}}<br />
* {{test|ita|Dandoci.|Dende·nos.}}<br />
* {{test|ita|Dandocela.|Dende·nos·la.}}<br />
* {{test|ita|Dandocele.|Dende·nos·las.}}<br />
* {{test|ita|Dandocelo.|Dende·nos·lu.}}<br />
* {{test|ita|Dandoceli.|Dende·nos·los.}}<br />
* {{test|ita|Dandovi.|Dende·bos.}}<br />
* {{test|ita|Dandovela.|Dende·bos·la.}}<br />
* {{test|ita|Dandovele.|Dende·bos·las.}}<br />
* {{test|ita|Dandovelo.|Dende·bos·lu.}}<br />
* {{test|ita|Dandoveli.|Dende·bos·los.}}<br />
* {{test|ita|Dandosi.|Dende·si.}}<br />
* {{test|ita|Dandosela.|Dende·si·la.}}<br />
* {{test|ita|Dandosele.|Dende·si·las.}}<br />
* {{test|ita|Dandosegli.|Dende·si·lis.}}<br />
* {{test|ita|Dandoselo.|Dende·si·lu.}}<br />
* {{test|ita|Dandoseli.|Dende·si·los.}}<br />
* {{test|ita|Dandoti.|Dende·ti.}}<br />
* {{test|ita|Dandotela.|Dende·ti·la.}}<br />
* {{test|ita|Dandotele.|Dende·ti·las.}}<br />
* {{test|ita|Dandotelo.|Dende·ti·lu.}}<br />
* {{test|ita|Dandoteli.|Dende·ti·los.}}<br />
* {{test|ita|Dandoteci.|Dende·ti·nos.}}<br />
<br />
===Participio===<br />
<br />
* {{test|ita|Vistolo.|Bidu·lu.}}<br />
* {{test|ita|Vistola.|Bida·la.}}<br />
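Several source sentences on this page are listed more than once, sometimes with different expected outputs (for instance Gli dia. is expected both as Lis diat. and as Li diat., reflecting the ambiguity of Italian gli). A minimal Python sketch for extracting the `{{test|…}}` entries and listing such repeated sources, assuming the page's wikitext has been copied into a string; the helper names are hypothetical and this is not one of Apertium's own testing tools.<br />

```python
import re
from collections import defaultdict

# Matches one test line such as "* {{test|ita|Dallo.|Dae·lu}}".
# Group 1: language code, group 2: source sentence, group 3: expected output.
TEST_RE = re.compile(r"\{\{test\|([^|}]+)\|([^|}]*)\|([^}]*)\}\}")


def parse_tests(wikitext: str) -> list[tuple[str, str, str]]:
    """Return (language, source, expected) triples with outer spaces stripped."""
    return [tuple(part.strip() for part in m.groups())
            for m in TEST_RE.finditer(wikitext)]


def find_conflicts(tests):
    """Group expected outputs by (language, source) and keep only the sources
    listed more than once, whether as exact duplicates or alternatives."""
    seen = defaultdict(list)
    for lang, src, expected in tests:
        seen[(lang, src)].append(expected)
    return {key: outs for key, outs in seen.items() if len(outs) > 1}


if __name__ == "__main__":
    sample = (
        "* {{test|ita|Gli dia.|Lis diat.}}<br />\n"
        "* {{test|ita|Gli dia.|Li diat.}}<br />\n"
        "* {{test|ita|Dallo.|Dae·lu}}<br />\n"
    )
    for (lang, src), outs in find_conflicts(parse_tests(sample)).items():
        print(lang, src, outs)
```

Stripping the captured fields also hides stray spaces such as the one in Lis diant. }}, which would otherwise make two otherwise identical expectations look different.<br />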
<br />
[[Category:Sardo e italiano|Pending tests]]</div>
Grfro3d
https://wiki.apertium.org/w/index.php?title=Sardu_abbarra_bivu!&diff=67108
Sardu abbarra bivu!
2018-06-04T00:41:51Z<p>Grfro3d: </p>
<hr />
<div>'''Name''': Gianfranco Fronteddu<br />
<br />
'''E-mail address''': gfro3d@gmail.com<br />
<br />
'''Other information that may be useful to contact you:'''<br />
<br />
Telegram username: gianfro4moros; Skype: gianfranco.fronteddu88<br />
<br />
'''Why is it you are interested in Machine Translation?''' <br />
I am a translation student and have been fascinated by computational linguistics throughout my university studies. I approached this field of linguistics through the courses “Theory and techniques of translating” and “Applied linguistics” at the University of Cagliari. These courses gave me a general understanding of machine translation (MT) and its history, from the first engines of the 1950s, which were based on bilingual dictionaries and worked through word-for-word translation, up to the present day, including some significant milestones in the discipline: the IBM system of the 1950s, with 250 words and 6 grammar rules, and, in 1983, the first automatic translation program for PCs, which was immediately adopted by many large companies such as IBM. There are two main approaches to MT: statistical (SMT) and rule-based (RBMT), the latter including transfer-based translation. In the rule-based approach, words are translated from a purely linguistic point of view, by choosing the appropriate linguistic equivalent. Many well-known MT systems are rule-based; among the most popular are Apertium and Lucy Translator. The other main approach, SMT, relies on parallel corpora containing real texts and their corresponding translations, and aims to generate a translation using statistical methods based on bilingual and monolingual corpora. More recent approaches include neural MT and context-based MT, which chooses the best translation of a word by considering the words that surround it. Context-based MT has a major advantage over corpus-based MT: adding new languages is comparatively easy. To create a new language pair, it is not necessary to include corpora with millions of words, as statistical methods require; two smaller corpora and a dictionary containing rules to conjugate verbs and to make nouns and adjectives agree are enough.<br />
As for the effects of MT on professional translation, MT allows translators to increase their output (more words per hour) and to offer clients a broader range of services. However, MT is often criticised for allegedly producing poorer results. In my opinion, quality must be achieved through cooperation between the MT engine and the translator. In other words, nobody is better placed than a translator trained in MT to prepare texts and to correct the output, since they will be aware, even before translation takes place, of the problems that might arise during the translation phase.<br />
Finally, as described below, MT, together with other translation technologies, has proved to be a crucial tool for the survival of endangered languages.<br />
<br />
'''Why is it that you are interested in Apertium?'''<br />
Apertium originated as one of the MT engines of the OpenTrad project, funded by the Spanish Government. It was designed primarily to translate between closely related languages, although it has recently been extended to more divergent language pairs. New language pairs can be added by creating dictionaries and rules containing linguistic data in XML format.<br />
Because Apertium is an open-source project, anyone can contribute to its development, which is particularly relevant to minoritised language communities. As a speaker of a minoritised language, Sardinian, I would like to contribute so that my language becomes one of the language combinations offered by this tool. Sardinian is a Romance language descended from Latin and spoken on the island of Sardinia. After Roman rule, the island was dominated over the centuries by various peoples: Vandals, Pisans and Genoese, Aragonese and Spanish, and finally Piedmontese and Italians. The Sardinian language has survived every domination, although it has been influenced by the languages of each period. Today Sardinian is divided into two main varieties: Campidanese (https://www.ethnologue.com/language/sro), spoken in central and southern Sardinia, and Logudorese (https://www.ethnologue.com/language/src), spoken in central and northern Sardinia.<br />
Unfortunately, according to Ethnologue, the Sardinian language is in danger of extinction. Linguistic fragmentation and the differences between the various dialects have led to a gradual abandonment of Sardinian in favour of the national language, Italian. Sardinian survives as the primary language only in some areas of the island, for example in the centre. The UNESCO Atlas of the World's Languages in Danger (http://www.unesco.org/languages-atlas/index.php) reports that Logudorese is spoken mainly in central Sardinia by about 400,000 people, whereas Campidanese, spoken in the south, has about 900,000 speakers.<br />
To prevent its extinction, a language standardization project was launched to create a new grammar and a new spelling valid for all speakers, which took the name LSC (Common Sardinian Language). An MT engine would be very useful for the language: in particular, an RBMT system such as Apertium would facilitate written production, which is essential to complete the standardization process. In comparable cases such as Catalonia, MT has increased the presence of the Catalan language at various levels. For instance, since 1997, when a newspaper first started publishing a bilingual (Catalan/Spanish) daily edition, several newspapers, among them El Periódico, El País and La Vanguardia, have published Catalan editions, at least two of them in print.<br />
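As noted above, new Apertium pairs are built from dictionaries and rules written in XML. Purely as an illustration, the sketch below uses Python's standard `xml.etree.ElementTree` to build a minimal monolingual dictionary with the sections listed in the work plan (alphabet, symbols, paradigms and a standard section) and a single Sardinian noun, domo/domos ("house/houses"). The paradigm name dom/o__n follows the usual Apertium stem/suffix__category naming habit, and the tag set (n, f, sg, pl) is an assumption chosen for the example; a real Sardinian dictionary would start from the skeleton produced by apertium-init.py and be far larger.<br />

```python
import xml.etree.ElementTree as ET


def paradigm_entry(surface: str, lemma_suffix: str, tags: list[str]) -> ET.Element:
    """Build <e><p><l>surface</l><r>lemma_suffix<s n="tag"/>...</r></p></e>."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    ET.SubElement(p, "l").text = surface      # what appears in the text
    r = ET.SubElement(p, "r")
    r.text = lemma_suffix                     # what is added to the lemma stem
    for tag in tags:
        ET.SubElement(r, "s", n=tag)          # morphological tags
    return e


def build_monodix() -> ET.Element:
    dix = ET.Element("dictionary")
    # Letters that may occur inside words (Sardinian uses grave accents).
    ET.SubElement(dix, "alphabet").text = (
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÈÌÒÙàèìòù")
    # Symbol definitions for the tags used below (assumed tag set).
    sdefs = ET.SubElement(dix, "sdefs")
    for name in ("n", "f", "sg", "pl"):
        ET.SubElement(sdefs, "sdef", n=name)
    # One inflection paradigm: domo (singular) / domos (plural).
    pardefs = ET.SubElement(dix, "pardefs")
    pardef = ET.SubElement(pardefs, "pardef", n="dom/o__n")
    pardef.append(paradigm_entry("o", "o", ["n", "f", "sg"]))
    pardef.append(paradigm_entry("os", "o", ["n", "f", "pl"]))
    # The main (standard) section: stem "dom" plus the paradigm above.
    section = ET.SubElement(dix, "section", id="main", type="standard")
    entry = ET.SubElement(section, "e", lm="domo")
    ET.SubElement(entry, "i").text = "dom"
    ET.SubElement(entry, "par", n="dom/o__n")
    return dix


if __name__ == "__main__":
    print(ET.tostring(build_monodix(), encoding="unicode"))
```

Pretty-printing is deliberately left out: indenting the tree would add whitespace inside the `<r>` elements, which are meant to contain only the lemma suffix and the tags.<br />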
'''Which of the published tasks are you interested in? What do you plan to do?'''<br />
I am interested in adopting an unreleased language pair.<br />
Given my background in translation studies and my knowledge of MT and translation technologies, I plan to build the Italian-Sardinian language pair. Some work on Sardinian has already been carried out, as can be seen in the Apertium Incubator (http://wiki.apertium.org/wiki/Incubator): specifically, some dictionaries are available for the Catalan-Sardinian, Portuguese-Sardinian and Italian-Sardinian pairs.<br />
<br />
'''My proposal.'''<br />
Title: ''Sardu, abbarra vivu!'' (Sardinian, keep yourself alive!)<br />
The project I intend to carry out is the creation of an MT engine for the Italian-Sardinian language pair based on the Apertium platform. As pointed out above, MT is crucial for the survival of minoritised languages. Apertium, which has led the development of RBMT engines in recent years, provides an excellent framework for pairs of languages from the same family, without the need for linguistic corpora. Apertium's experience with several minoritised languages, such as Occitan, Asturian and Maltese, shows that such a project is viable.<br />
Google and Apertium would benefit from this project, not only because it would contribute to open-source software and minority languages, but especially because it would have a great impact on Sardinian society: at present, neither Google nor Apertium offers an MT system for Sardinian.<br />
As for the beneficiaries, the examples given above for similar cases (such as Catalan) show that the outcome of this project might also have a commercial impact: media such as newspapers, magazines and websites, as stakeholders in written production, could be interested in including the MT system in their publishing workflows and therefore in hiring experts to customise and improve the engine. Furthermore, such a tool could have an educational impact, because it would give new generations access to the Sardinian language.<br />
<br />
'''Include time needed to think, to program, to document and to disseminate.'''<br />
Since I have no previous experience in building language pairs or with the workings of Apertium, I expect to spend the first four weeks (from April 22nd to May 22nd 2016) acquiring knowledge and understanding of the Apertium framework. <br />
<br />
'''Work plan: from May 23rd to August 23rd 2016'''<br />
Week 1: From May 23rd to May 30th 2016. 30 hours. Look for existing Italian dictionaries; install lttoolbox (>= 3.3.0), apertium (>= 3.3.0) and a text editor; set up the basic XML skeleton files for the Sardinian and Italian morphological dictionaries (wget; python3 apertium-init.py ita; python3 apertium-init.py sc; python3 apertium-init.py ita-sc).<br />
Week 2: From May 31st to June 6th 2016. 30 hours. Create the three directories apertium-ita, apertium-sc and apertium-ita-sc; start building the Sardinian morphological dictionary (alphabet, symbols, paradigms and standard sections). For spelling rules I will refer to the LSC (Common Sardinian Language) standard (http://www.regione.sardegna.it/documenti/1_72_20060418160308.pdf).<br />
Week 3: From June 7th to June 14th 2016. 40 hours. Work on the Sardinian morphological dictionary. <br />
Week 4: From June 15th to June 19th 2016. 20 hours. Work on the Sardinian morphological dictionary. From June 20th to June 25th 2016: pause due to academic commitments.<br />
Deliverable #1, June 27th 2016 (Google midterm evaluation): Italian and Sardinian morphological dictionaries.<br />
Week 5: From June 27th to July 4th 2016. 30 hours. Learn how bilingual dictionaries are built. Create the bilingual dictionary file apertium-ita-sc.sc-ita.dix.<br />
Week 6: From July 5th to July 11th 2016. 30 hours. Start building the bilingual dictionary: create the basic XML skeleton and add entries that translate between Italian and Sardinian words.<br />
Week 7: From July 12th to July 19th 2016. 30 hours. Work on the bilingual dictionary.<br />
Week 8: From July 20th to July 27th 2016. 30 hours. Work on the bilingual dictionary.<br />
Deliverable #2, July 28th 2016: bilingual dictionary.<br />
Week 9: From July 29th to August 5th 2016. 30 hours. Create the transfer rule file: set up the basic skeleton and, in particular, the input/output rules for grammatical symbols.<br />
Week 10: From August 6th to August 11th 2016. 30 hours. Work on the transfer rule file, defining categories and symbols. We will also try to reuse work already done for existing language pairs (http://wiki.apertium.org/wiki/Incubator).<br />
Week 11: From August 12th to August 19th 2016. Review and finalisation of the project.<br />
Week 12: From August 19th to August 26th 2016. Submission of the project to the mentors for the final evaluation.<br />
Project complete: August 29th 2016.<br />
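The morphological dictionary work planned for Weeks 2 to 4 revolves around paradigms: a stem is listed once, and a paradigm supplies its endings together with their grammatical tags. The Python sketch below only imitates that idea so the structure is easy to see; it is not how Apertium works internally (lttoolbox compiles .dix files into finite-state transducers). The paradigm name, the tags (<n><m><sg>, <n><m><pl>) and the sample entry libru/libros are illustrative assumptions; the output mimics the ^surface/analysis$ stream format that Apertium tools print.

```python
# Toy imitation of lttoolbox-style paradigms (not Apertium's implementation).
# A paradigm lists (ending, tags) pairs; an entry combines a stem with a paradigm.

# Hypothetical paradigm, named in the "stem/ending__category" style.
PARADIGMS = {
    "libr/u__n": [("u", "<n><m><sg>"), ("os", "<n><m><pl>")],
}

# Hypothetical lexicon entries: (lemma, stem, paradigm name).
LEXICON = [("libru", "libr", "libr/u__n")]


def expand(lexicon, paradigms):
    """Map every generated surface form to the list of its analyses."""
    table = {}
    for lemma, stem, par in lexicon:
        for ending, tags in paradigms[par]:
            table.setdefault(stem + ending, []).append(lemma + tags)
    return table


def analyse(word, table):
    """Format one word like the Apertium stream: ^surface/analysis$ or ^surface/*surface$."""
    if word in table:
        return "^" + word + "/" + "/".join(table[word]) + "$"
    return "^" + word + "/*" + word + "$"


table = expand(LEXICON, PARADIGMS)
print(analyse("libros", table))  # ^libros/libru<n><m><pl>$
print(analyse("domo", table))    # ^domo/*domo$ (unknown word)
```

Adding another noun that inflects the same way then takes a single line in LEXICON, which is what keeps paradigm-based dictionaries compact.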
<br />

'''List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.'''<br />
Although I do not have a programming background, I am strongly determined to carry out this project and to make up for my limited knowledge of computational linguistics with maximum dedication. I nevertheless believe that my skills are adequate for this project, for the following reasons. <br />
First, my native language is Sardinian. I have always spoken it and know its characteristics in depth. I am aware of the progress made towards a standard language, and I know the phonetic, grammatical and literary features of the variants of my language.<br />
Second, my educational background. In 2015 I completed my Bachelor's Degree in Foreign Languages. During my studies I took various exams in translation and interpreting from Spanish and English into Italian. In the academic year 2011-2012 I was awarded an Erasmus scholarship, which allowed me to attend courses at the University of the Basque Country. During this experience I took part in an audiovisual translation and subtitling project, which then became the subject of my thesis. I also took a course called "Computers for translators", through which I acquired skills in the use of translation memories and CAT tools (SDL Trados and Wordfast). I have also received specific training in computational linguistics at the University of Cagliari, where I learned to use markup languages (XML and HTML) to create linguistic corpora. I am currently enrolled in a Master’s Degree in Translation of Specialised Texts at the University of Cagliari, where I have been trained in translation technologies and localisation. Thanks to my passion for translation technologies, the University of Cagliari selected me to teach a course on Computer Translation in March, in which I taught Audiovisual Translation and CAT tools for 40 hours. <br />
Finally, I have recently won an Erasmus+ Traineeship scholarship, thanks to which I will spend the next three months as an intern at the Tradumàtica Research Group of the UAB (Universitat Autònoma de Barcelona). During my internship in Barcelona, I will carry out tasks related to translation and localisation with the aid of CAT tools, focusing especially on free software and minoritised languages. I am convinced that what I will learn with the Tradumàtica Research Group will allow me to carry out the project I am proposing successfully.<br />
Although I have never developed an open-source project myself, I have participated in open-source projects that involved modifying software source code in order to translate it into other languages. For instance, I have taken part in the localisation of the open-source instant messaging client Telegram into Sardinian, for both the iPhone and Android platforms, and I expect to complete these translations over the next few weeks.<br />
<br />
'''List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.'''<br />
I can guarantee 30 hours per week for this project. I will finish my studies at the University of Cagliari in June, and during my internship in Barcelona I will be able to work on the project: my obligations leave me plenty of hours, especially at weekends, to devote to it. My stay in Barcelona will end on July 31st 2016; the only pause will be from June 19th to June 25th 2016, due to academic commitments. I plan to increase my working hours in the weeks before and after this pause so that it does not affect the overall schedule.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Sardu_abbarra_bivu!&diff=67107Sardu abbarra bivu!2018-06-04T00:41:23Z<p>Grfro3d: </p>
<hr />
<div>'''Name''': Gianfranco Fronteddu<br />
<br />
'''E-mail address''': gfro3d@gmail.com<br />
<br />
'''Other information that may be useful to contact you:'''<br />
<br />
Telegram username: gianfro4moros Skype: gianfranco.fronteddu88<br />
<br />
'''Why is it you are interested in Machine Translation?''' <br />
I am a translation student, and throughout my university studies I have been fascinated by computational linguistics. I approached this field through the courses “Theory and techniques of translating” and “Applied linguistics” at the University of Cagliari. These gave me a general understanding of Machine Translation (MT) and its history, from the first engines of the 1950s, based on bilingual dictionaries that worked through word-for-word translation, up to the present day, including some significant advances in the discipline (the IBM system of the 1950s, with 250 words and 6 grammar rules, and, in 1983, the first automatic translation program for PCs, which was immediately adopted by many large companies such as IBM).<br />
There are two main approaches to MT: statistical (SMT) and rule-based (RBMT), the latter including translation based on the principle of transfer. In this approach, words are translated from a purely linguistic point of view, by choosing the appropriate linguistic equivalent. Many well-known MT systems are rule-based; the most popular are surely Apertium and Lucy Translator. The other main approach, SMT, relies on parallel corpora containing real texts and their corresponding translations, and aims to generate translations through statistical methods based on bilingual and monolingual text corpora. Other recent approaches to MT include neural MT and context-based MT, which finds the best translation of a word by considering the words that surround it. Context-based MT has a major advantage over corpus-based MT: adding new languages is very easy. To create a new language pair, it is not necessary to include corpora of millions of words, as in statistical methods: two smaller corpora and a dictionary containing rules to conjugate verbs and to make nouns and adjectives agree are enough. <br />
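To make the idea of transfer more concrete, here is a toy Python sketch of the stages a transfer-based system such as Apertium chains together: analysis of the source text, lexical transfer through a bilingual dictionary, structural transfer, and generation of the target text. Everything in it is a hypothetical miniature (four Italian forms, two Sardinian lemmas, one agreement rule); real Apertium pairs use compiled dictionaries, a part-of-speech tagger and XML transfer rules instead.

```python
# Toy transfer pipeline: analysis -> lexical transfer -> structural transfer -> generation.
# All dictionaries below are hypothetical mini-samples, not Apertium data.

# Italian analyser: surface form -> (lemma, tags)
ITA_ANALYSES = {
    "il": ("il", ("det", "def", "m", "sg")),
    "i": ("il", ("det", "def", "m", "pl")),
    "libro": ("libro", ("n", "m", "sg")),
    "libri": ("libro", ("n", "m", "pl")),
}

# Bilingual dictionary: (Italian lemma, part of speech) -> Sardinian lemma
BIDIX = {("il", "det"): "su", ("libro", "n"): "libru"}

# Sardinian generator: (lemma, tags) -> surface form
SRD_GENERATION = {
    ("su", ("det", "def", "m", "sg")): "su",
    ("su", ("det", "def", "m", "pl")): "sos",
    ("libru", ("n", "m", "sg")): "libru",
    ("libru", ("n", "m", "pl")): "libros",
}


def analyse(sentence):
    return [ITA_ANALYSES[word] for word in sentence.split()]


def lexical_transfer(units):
    # Swap each lemma for its Sardinian equivalent, keeping the tags.
    return [(BIDIX[(lemma, tags[0])], tags) for lemma, tags in units]


def structural_transfer(units):
    """Toy rule: a determiner followed by a noun copies the noun's gender and number."""
    out = list(units)
    for i in range(len(out) - 1):
        (lem1, t1), (_, t2) = out[i], out[i + 1]
        if t1[0] == "det" and t2[0] == "n":
            out[i] = (lem1, t1[:2] + t2[1:3])
    return out


def generate(units):
    return " ".join(SRD_GENERATION[unit] for unit in units)


def translate(sentence):
    return generate(structural_transfer(lexical_transfer(analyse(sentence))))


print(translate("il libro"))  # su libru
print(translate("i libri"))   # sos libros
```

With these two words gender and number coincide in both languages, so the agreement rule leaves the tags unchanged; a rule like it only does visible work when the bilingual dictionary maps a noun to one of a different gender.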
As for the effects of MT on professional translation, MT allows translators to increase their capacity (more words per hour) and to offer a broader range of services to clients. However, MT is often criticised for allegedly producing poorer results; in my opinion, quality must be achieved through cooperation between the MT engine and the translator. In other words, no one is better placed than a translator trained in MT to prepare texts and correct the output, since they will be aware, even before translation takes place, of the problems that might arise during the translation phase.<br />
Finally, as described below, MT, together with other translation technologies, has proved to be a crucial tool for the survival of endangered languages.<br />
<br />
'''Why is it that you are interested in Apertium?'''<br />
Apertium originated as one of the MT engines of the OpenTrad project, funded by the Spanish Government. It was designed primarily to translate between closely related languages, although it has recently been extended to more divergent language pairs. New language pairs can be added by creating dictionaries and rules containing linguistic data in XML format.<br />
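As a rough illustration of what such an XML dictionary looks like (the bilingual dictionary is the Weeks 5 to 8 task in the work plan), the sketch below builds a tiny bilingual dictionary with Python's standard xml.etree.ElementTree module and looks words up in it. The element names (dictionary, sdefs, sdef, section, e, p, l, r, s) follow the usual .dix layout, but the entries libro → libru and casa → domo and the lookup helper are illustrative assumptions; the tree is not guaranteed to be complete enough for lttoolbox to compile, and real dictionaries are compiled rather than queried like this.

```python
import xml.etree.ElementTree as ET


def build_bidix(pairs):
    """Build a minimal .dix-style tree from (italian, sardinian, pos) triples."""
    root = ET.Element("dictionary")
    sdefs = ET.SubElement(root, "sdefs")
    # Declare every part-of-speech symbol used by the entries.
    for pos in sorted({pos for _, _, pos in pairs}):
        ET.SubElement(sdefs, "sdef", n=pos)
    section = ET.SubElement(root, "section", id="main", type="standard")
    for ita, srd, pos in pairs:
        pair = ET.SubElement(ET.SubElement(section, "e"), "p")
        # <l> holds the Italian side, <r> the Sardinian side.
        for side, lemma in (("l", ita), ("r", srd)):
            el = ET.SubElement(pair, side)
            el.text = lemma
            ET.SubElement(el, "s", n=pos)  # symbol tag, e.g. <s n="n"/>
    return root


def lookup(root, word, pos):
    """Return the right-hand (Sardinian) lemmas whose left side matches word and pos."""
    matches = []
    for p in root.iter("p"):
        left, right = p.find("l"), p.find("r")
        sym = left.find("s")
        if left.text == word and sym is not None and sym.get("n") == pos:
            matches.append(right.text)
    return matches


bidix = build_bidix([("libro", "libru", "n"), ("casa", "domo", "n")])
print(ET.tostring(bidix, encoding="unicode"))
print(lookup(bidix, "libro", "n"))  # ['libru']
```

Keeping the left side Italian and the right side Sardinian means one file can serve both translation directions, which is why bilingual entries are written as l/r pairs rather than as one-way mappings.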
Because Apertium is an open-source project, anyone can contribute to its development. This raises an interesting point about the involvement of minoritised language communities. As a speaker of a minoritised language, Sardinian, I would like to contribute so that my language can become one of the language combinations offered by this tool. Sardinian is a Romance language, descended from Latin and spoken on the island of Sardinia. After the Roman Empire, the island was dominated over the centuries by various peoples: Vandals, Pisans and Genoese, Aragonese and Spanish and, finally, Piedmontese and Italians. The Sardinian language has resisted every domination, although it has been linguistically influenced by the languages of each period. Today, Sardinian is a language system with two main variants: Campidanese (https://www.ethnologue.com/language/sro), spoken in central and southern Sardinia, and Logudorese (https://www.ethnologue.com/language/src), spoken in central and northern Sardinia.<br />
Unfortunately, according to Ethnologue, the Sardinian language is in danger of extinction. Linguistic fragmentation and the differences between the various dialects have led to a gradual abandonment of Sardinian in favour of the national language, Italian. Sardinian survives as the primary language only in some areas of the island, for example the central ones. The UNESCO Atlas of the World's Languages in Danger (http://www.unesco.org/languages-atlas/index.php) reports that Logudorese is spoken, mainly in the central part of Sardinia, by about 400,000 people, whereas Campidanese, spoken in the south, has about 900,000 speakers.<br />
To prevent its extinction, a language standardisation project was launched to create a new grammar and spelling valid for all speakers, known as LSC (Limba Sarda Comuna, Common Sardinian Language). An MT engine would be of great use to the language: above all, an RBMT system such as Apertium would ease written production, which is essential to complete the standardisation process. In neighbouring cases such as Catalonia, MT has increased the presence of Catalan at various levels. For instance, since 1997, when a Catalan newspaper first started publishing a bilingual daily edition (Catalan/Spanish), at least three other newspapers have followed the same path: El Periódico, El País and La Vanguardia, at least two of them in their print editions.<br />
'''Which of the published tasks are you interested in? What do you plan to do?'''<br />
I am interested in adopting an unreleased language pair.<br />
Given my background in translation and my knowledge of MT and translation technologies, I plan to build the Italian-Sardinian language pair. Some work on Sardinian has already been carried out, as can be seen in the Apertium Incubator (http://wiki.apertium.org/wiki/Incubator): specifically, some dictionaries are available for the Catalan-Sardinian, Portuguese-Sardinian and Italian-Sardinian pairs.<br />
<br />
'''My proposal.'''<br />
Title: ''Sardu, abbarra vivu!'' (Sardinian, keep yourself alive!)<br />
The project I intend to carry out is the creation of an MT engine for the Italian-Sardinian language pair, based on the Apertium platform. As pointed out above, MT is crucial for the survival of minoritised languages. Apertium, which has led the development of RBMT engines in recent years, provides an excellent framework for pairs of languages from the same family, without the need for linguistic corpora. Apertium's experience with several minoritised languages, such as Occitan, Asturian and Maltese, proves that such a project is viable.<br />
Google and Apertium would benefit from this project not only because it would contribute to open-source software and minority languages, but above all because it would have a great impact on Sardinian society: at present there is no MT system for Sardinian, either from Google or from Apertium.<br />
As for the beneficiaries, similar cases such as that of Catalan, mentioned above, show that the outcome of this project might also have a commercial impact: media such as newspapers, magazines and websites, as stakeholders in written production, could be interested in including the MT system in their publishing workflows and therefore in hiring experts to customise and improve the engine. Furthermore, such a tool could have an educational impact, since it could give new generations access to the Sardinian language.<br />
<br />
'''Include time needed to think, to program, to document and to disseminate.'''<br />
Since I have no previous experience in building language pairs or with the workings of Apertium, I expect to spend the first four weeks (from April 22nd to May 22nd 2016) acquiring knowledge and understanding of the Apertium framework. <br />
<br />
'''Work plan: from May 23rd to August 23rd 2016'''<br />
Week 1: From May 23rd to May 30th 2016. 30 hours. Look for existing Italian dictionaries; install lttoolbox (>= 3.3.0), apertium (>= 3.3.0) and a text editor; set up the basic XML skeleton files for the Sardinian and Italian morphological dictionaries (wget; python3 apertium-init.py ita; python3 apertium-init.py sc; python3 apertium-init.py ita-sc).<br />
Week 2: From May 31st to June 6th 2016. 30 hours. Create the three directories apertium-ita, apertium-sc and apertium-ita-sc; start building the Sardinian morphological dictionary (alphabet, symbols, paradigms and standard sections). For spelling rules I will refer to the LSC (Common Sardinian Language) standard (http://www.regione.sardegna.it/documenti/1_72_20060418160308.pdf).<br />
Week 3: From June 7th to June 14th 2016. 40 hours. Work on the Sardinian morphological dictionary. <br />
Week 4: From June 15th to June 19th 2016. 20 hours. Work on the Sardinian morphological dictionary. From June 20th to June 25th 2016: pause due to academic commitments.<br />
Deliverable #1, June 27th 2016 (Google midterm evaluation): Italian and Sardinian morphological dictionaries.<br />
Week 5: From June 27th to July 4th 2016. 30 hours. Learn how bilingual dictionaries are built. Create the bilingual dictionary file apertium-ita-sc.sc-ita.dix.<br />
Week 6: From July 5th to July 11th 2016. 30 hours. Start building the bilingual dictionary: create the basic XML skeleton and add entries that translate between Italian and Sardinian words.<br />
Week 7: From July 12th to July 19th 2016. 30 hours. Work on the bilingual dictionary.<br />
Week 8: From July 20th to July 27th 2016. 30 hours. Work on the bilingual dictionary.<br />
Deliverable #2, July 28th 2016: bilingual dictionary.<br />
Week 9: From July 29th to August 5th 2016. 30 hours. Create the transfer rule file: set up the basic skeleton and, in particular, the input/output rules for grammatical symbols.<br />
Week 10: From August 6th to August 11th 2016. 30 hours. Work on the transfer rule file, defining categories and symbols. We will also try to reuse work already done for existing language pairs (http://wiki.apertium.org/wiki/Incubator).<br />
Week 11: From August 12th to August 19th 2016. Review and finalisation of the project.<br />
Week 12: From August 19th to August 26th 2016. Submission of the project to the mentors for the final evaluation.<br />
Project complete: August 29th 2016.<br />
<br />

'''List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.'''<br />
Although I do not have a programming background, I am strongly determined to carry out this project and to make up for my limited knowledge of computational linguistics with maximum dedication. I nevertheless believe that my skills are adequate for this project, for the following reasons. <br />
First, my native language is Sardinian. I have always spoken it and know its characteristics in depth. I am aware of the progress made towards a standard language, and I know the phonetic, grammatical and literary features of the variants of my language.<br />
Second, my educational background. In 2015 I completed my Bachelor's Degree in Foreign Languages. During my studies I took various exams in translation and interpreting from Spanish and English into Italian. In the academic year 2011-2012 I was awarded an Erasmus scholarship, which allowed me to attend courses at the University of the Basque Country. During this experience I took part in an audiovisual translation and subtitling project, which then became the subject of my thesis. I also took a course called "Computers for translators", through which I acquired skills in the use of translation memories and CAT tools (SDL Trados and Wordfast). I have also received specific training in computational linguistics at the University of Cagliari, where I learned to use markup languages (XML and HTML) to create linguistic corpora. I am currently enrolled in a Master’s Degree in Translation of Specialised Texts at the University of Cagliari, where I have been trained in translation technologies and localisation. Thanks to my passion for translation technologies, the University of Cagliari selected me to teach a course on Computer Translation in March, in which I taught Audiovisual Translation and CAT tools for 40 hours. <br />
Finally, I have recently won an Erasmus+ Traineeship scholarship, thanks to which I will spend the next three months as an intern at the Tradumàtica Research Group of the UAB (Universitat Autònoma de Barcelona). During my internship in Barcelona, I will carry out tasks related to translation and localisation with the aid of CAT tools, focusing especially on free software and minoritised languages. I am convinced that what I will learn with the Tradumàtica Research Group will allow me to carry out the project I am proposing successfully.<br />
Although I have never developed an open-source project myself, I have participated in open-source projects that involved modifying software source code in order to translate it into other languages. For instance, I have taken part in the localisation of the open-source instant messaging client Telegram into Sardinian, for both the iPhone and Android platforms, and I expect to complete these translations over the next few weeks.<br />
<br />
'''List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.'''<br />
I can guarantee 30 hours per week for this project. I will finish my studies at the University of Cagliari in June, and during my internship in Barcelona I will be able to work on the project: my obligations leave me plenty of hours, especially at weekends, to devote to it. My stay in Barcelona will end on July 31st 2016; the only pause will be from June 19th to June 25th 2016, due to academic commitments. I plan to increase my working hours in the weeks before and after this pause so that it does not affect the overall schedule.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Sardu_abbarra_bivu!&diff=67106Sardu abbarra bivu!2018-06-04T00:39:59Z<p>Grfro3d: Created page with "'''Name''': Gianfranco Fronteddu '''E-mail address''': gfro3d@gmail.com '''Other information that may be useful to contact you:''' Telegram username: gianfro4moros Skype: gian..."</p>
<hr />
<div>'''Name''': Gianfranco Fronteddu<br />
'''E-mail address''': gfro3d@gmail.com<br />
'''Other information that may be useful to contact you:'''<br />
Telegram username: gianfro4moros Skype: gianfranco.fronteddu88<br />
<br />
'''Why is it you are interested in Machine Translation?''' <br />
I am a translation student, and throughout my university studies I have been fascinated by computational linguistics. I approached this field through the courses “Theory and techniques of translating” and “Applied linguistics” at the University of Cagliari. These gave me a general understanding of Machine Translation (MT) and its history, from the first engines of the 1950s, based on bilingual dictionaries that worked through word-for-word translation, up to the present day, including some significant advances in the discipline (the IBM system of the 1950s, with 250 words and 6 grammar rules, and, in 1983, the first automatic translation program for PCs, which was immediately adopted by many large companies such as IBM).<br />
There are two main approaches to MT: statistical (SMT) and rule-based (RBMT), the latter including translation based on the principle of transfer. In this approach, words are translated from a purely linguistic point of view, by choosing the appropriate linguistic equivalent. Many well-known MT systems are rule-based; the most popular are surely Apertium and Lucy Translator. The other main approach, SMT, relies on parallel corpora containing real texts and their corresponding translations, and aims to generate translations through statistical methods based on bilingual and monolingual text corpora.<br />
Other recent approaches to MT include neural MT and context-based MT, which finds the best translation of a word by considering the words that surround it. Context-based MT has a major advantage over corpus-based MT: adding new languages is very easy. To create a new language pair, it is not necessary to include corpora of millions of words, as in statistical methods: two smaller corpora and a dictionary containing rules to conjugate verbs and to make nouns and adjectives agree are enough. <br />
As for the effects of MT on professional translation, MT allows translators to increase their capacity (more words per hour) and to offer a broader range of services to clients. However, MT is often criticised for allegedly producing poorer results; in my opinion, quality must be achieved through cooperation between the MT engine and the translator. In other words, no one is better placed than a translator trained in MT to prepare texts and correct the output, since they will be aware, even before translation takes place, of the problems that might arise during the translation phase.<br />
Finally, as described below, MT, together with other translation technologies, has proved to be a crucial tool for the survival of endangered languages.<br />
<br />
'''Why is it that you are interested in Apertium?'''<br />
Apertium originated as one of the MT engines of the OpenTrad project, funded by the Spanish Government. It was designed primarily to translate between closely related languages, although it has recently been extended to more divergent language pairs. New language pairs can be added by creating dictionaries and rules containing linguistic data in XML format.<br />
Because Apertium is an open-source project, anyone can contribute to its development. This raises an interesting point about the involvement of minoritised language communities. As a speaker of a minoritised language, Sardinian, I would like to contribute so that my language can become one of the language combinations offered by this tool. Sardinian is a Romance language, descended from Latin and spoken on the island of Sardinia. After the Roman Empire, the island was dominated over the centuries by various peoples: Vandals, Pisans and Genoese, Aragonese and Spanish and, finally, Piedmontese and Italians. The Sardinian language has resisted every domination, although it has been linguistically influenced by the languages of each period. Today, Sardinian is a language system with two main variants: Campidanese (https://www.ethnologue.com/language/sro), spoken in central and southern Sardinia, and Logudorese (https://www.ethnologue.com/language/src), spoken in central and northern Sardinia.<br />
Unfortunately, according to Ethnologue, the Sardinian language is in danger of extinction. Linguistic fragmentation and the differences between the various dialects have led to a gradual abandonment of Sardinian in favour of the national language, Italian. Sardinian survives as the primary language only in some areas of the island, for example the central ones. The UNESCO Atlas of the World's Languages in Danger (http://www.unesco.org/languages-atlas/index.php) reports that Logudorese is spoken, mainly in the central part of Sardinia, by about 400,000 people, whereas Campidanese, spoken in the south, has about 900,000 speakers.<br />
To prevent its extinction, a language standardisation project was launched to create a new grammar and spelling valid for all speakers, known as LSC (Limba Sarda Comuna, Common Sardinian Language). An MT engine would be of great use to the language: above all, an RBMT system such as Apertium would ease written production, which is essential to complete the standardisation process. In neighbouring cases such as Catalonia, MT has increased the presence of Catalan at various levels. For instance, since 1997, when a Catalan newspaper first started publishing a bilingual daily edition (Catalan/Spanish), at least three other newspapers have followed the same path: El Periódico, El País and La Vanguardia, at least two of them in their print editions.<br />
'''Which of the published tasks are you interested in? What do you plan to do?'''<br />
I am interested in adopting an unreleased language pair.<br />
Given my background in translation and my knowledge of MT and translation technologies, I plan to build the Italian-Sardinian language pair. Some work on Sardinian has already been carried out, as can be seen in the Apertium Incubator (http://wiki.apertium.org/wiki/Incubator): specifically, some dictionaries are available for the Catalan-Sardinian, Portuguese-Sardinian and Italian-Sardinian pairs.<br />
<br />
'''My proposal.'''<br />
Title: ''Sardu, abbarra vivu!'' (Sardinian, keep yourself alive!)<br />
The project I intend to carry out is the creation of an MT engine for the Italian-Sardinian language pair, based on the Apertium platform. As pointed out above, MT is crucial for the survival of minoritised languages. Apertium, which has led the development of RBMT engines in recent years, provides an excellent framework for pairs of languages from the same family, without the need for linguistic corpora. Apertium's experience with several minoritised languages, such as Occitan, Asturian and Maltese, proves that such a project is viable.<br />
Google and Apertium would benefit from this project not only because it would contribute to open-source software and minority languages, but above all because it would have a great impact on Sardinian society: at present there is no MT system for Sardinian, either from Google or from Apertium.<br />
As for the beneficiaries, similar cases such as that of Catalan, mentioned above, show that the outcome of this project might also have a commercial impact: media such as newspapers, magazines and websites, as stakeholders in written production, could be interested in including the MT system in their publishing workflows and therefore in hiring experts to customise and improve the engine. Furthermore, such a tool could have an educational impact, since it could give new generations access to the Sardinian language.<br />
<br />
'''Include time needed to think, to program, to document and to disseminate.'''<br />
As I have no previous experience in building language pairs or with how Apertium works, I expect to spend the first four weeks (from April 22nd to May 22nd 2016) learning and understanding the Apertium framework.<br />
<br />
'''Work plan from May 23rd to August 23rd 2016'''<br />
Week 1: From May 23rd to May 30th 2016. 30 hours. Look for existing Italian dictionaries; install lttoolbox (>= 3.3.0), apertium (>= 3.3.0) and a text editor; set up the basic XML skeleton files for the Sardinian and Italian morphological dictionaries (wget; python3 apertium-init.py ita; python3 apertium-init.py sc; python3 apertium-init.py ita-sc).<br />
Week 2: From May 31st to June 6th 2016. 30 hours. Create the three directories apertium-ita, apertium-sc and apertium-ita-sc; start building the Sardinian morphological dictionary (alphabet, symbols, paradigms and standard sections). For spelling rules I will follow LSC (Limba Sarda Comuna, Common Sardinian language) (http://www.regione.sardegna.it/documenti/1_72_20060418160308.pdf).<br />
Week 3: From June 7th to June 14th 2016. 40 hours. Work on the Sardinian morphological dictionary.<br />
Week 4: From June 15th to June 19th 2016. 20 hours. Work on the Sardinian morphological dictionary. Pause from June 20th to June 25th due to academic commitments.<br />
Deliverable #1 June 27th 2016. Google midterm evaluation: Italian and Sardinian morphological dictionaries.<br />
Week 5: From June 27th to July 4th 2016. 30 hours. Learn how bilingual dictionaries are built. Create the bilingual dictionary file apertium-ita-sc.ita-sc.dix.<br />
Week 6: From July 5th to July 11th 2016. 30 hours. Start building the bilingual dictionary: create the basic XML skeleton and add entries translating between Italian and Sardinian words.<br />
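A bilingual dictionary entry in Apertium pairs a left-side lemma (here Italian, the first language of ita-sc) with a right-side lemma (here Sardinian), each followed by its grammatical symbols. The Python sketch below builds one such entry with the standard library's xml.etree.ElementTree. The example pair casa > domo ("house") and the single n (noun) symbol are illustrative assumptions; real entries usually carry more symbols, such as gender.<br />

```python
import xml.etree.ElementTree as ET


def bidix_entry(left_lemma: str, right_lemma: str, pos: str) -> str:
    """Build a minimal bilingual-dictionary entry: <e><p><l/><r/></p></e>.

    The left side is the first language of the pair (Italian in ita-sc),
    the right side the second one (Sardinian). `pos` becomes an <s n="..."/> tag.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", left_lemma), ("r", right_lemma)):
        element = ET.SubElement(pair, side)
        element.text = lemma                # the lemma text comes first...
        ET.SubElement(element, "s", n=pos)  # ...followed by the symbol tag
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(bidix_entry("casa", "domo", "n"))
    # <e><p><l>casa<s n="n" /></l><r>domo<s n="n" /></r></p></e>
```

Generating entries programmatically like this can help when adding many word pairs from a list, although hand-checking each pair is still needed.<br />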
Week 7: From July 12th to July 19th. 30 hours. Work on the bilingual dictionary.<br />
Week 8: From July 20th to July 27th. 30 hours. Work on the bilingual dictionary.<br />
Deliverable #2 July 28th 2016: Bilingual dictionary.<br />
Week 9: From July 29th to August 5th. 30 hours. Create the transfer rule file. Set up the basic skeleton, in particular the input/output rules for grammatical symbols.<br />
Week 10: From August 6th to August 11th 2016. 30 hours. Work on the transfer rule file, defining categories and symbols. We will also try to reuse work already done for existing language pairs (http://wiki.apertium.org/wiki/Incubator).<br />
Week 11: From August 12th to August 19th. Review and finalisation of the project.<br />
Week 12: From August 19th to August 26th. Submission of the project to the mentors for the final evaluation.<br />
Project complete: August 29th 2016.<br />
<br />

'''List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.'''<br />
Although I do not have a programmer's profile, I am strongly determined to carry out this project and to compensate for my lack of knowledge of computational linguistics with maximum dedication. I feel, however, that my skills are adequate for this project for the following reasons.<br />
Firstly, my native language is Sardinian. I have always spoken Sardinian and I know the characteristics of the language in depth. I am aware of the progress made towards a standard language, and I know the phonetic, grammatical and literary features of the varieties of my language.<br />
Secondly, my educational background. In 2015 I completed my Bachelor's Degree in Foreign Languages. During my studies I passed various exams in translation and interpreting from Spanish and English into Italian. In the academic year 2011-2012 I was awarded an Erasmus scholarship, through which I attended courses at the University of the Basque Country. During this experience I took part in an audiovisual translation and subtitling project, which then became the subject of my thesis. I also took a course called "Computers for translators", through which I acquired skills in the use of translation memories and CAT tools (SDL Trados and Wordfast). I have also received specific training in computational linguistics at the University of Cagliari, where I learned to use markup languages (XML and HTML) to create linguistic corpora. I am currently enrolled in a Master's Degree in Translation of Specialised Texts at the University of Cagliari, where I have been trained in translation technologies and localisation. My passion for translation technologies led the University of Cagliari to select me to teach a course on Computer Translation in March, in which I taught Audiovisual Translation and CAT tools for 40 hours.<br />
Finally, I have recently won an Erasmus+ Traineeship scholarship, thanks to which I will spend the next three months as an intern at the Tradumàtica Research Group at the UAB (Universitat Autònoma de Barcelona). During my internship in Barcelona, I will carry out tasks related to translation and localisation with the aid of CAT tools, with a particular focus on free software and minoritised languages. I am convinced that what I will learn with the Tradumàtica Research Group at the UAB will help me carry out the project I am proposing successfully.<br />
Although I have never developed an open-source project myself, I have participated in open-source projects that involved modifying software source code in order to translate it into other languages. For instance, I have participated in the localisation of the open-source instant messaging client Telegram into Sardinian, for both iPhone and Android, and I expect to complete these translations over the next few weeks.<br />
<br />
'''List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.'''<br />
I can guarantee 30 hours per week for this project. I will finish my studies at the University of Cagliari in June, and I will be able to work on the project during my internship in Barcelona. My obligations therefore leave me plenty of time, especially at weekends, to devote to it. My stay in Barcelona will end on 31 July 2016. There will be one pause, from 19 to 25 June 2016, due to academic commitments; I plan to increase my working hours in the weeks before and after it so that it does not affect the overall schedule.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64455Apertium cat-srd/ Apertium ita-srd: relata finale2017-09-21T16:18:34Z<p>Grfro3d: /* Ditzionàriu morfològicu sardu */</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for participation in the "Google Summer of Code 2017" programme with the Apertium organisation was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea stems from the wish to develop another tool to support the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font, Francis Tyers and Adrià Martín took part in this project.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, which lasted throughout June and July, devoted to Catalan-Sardinian, and a shorter one, carried out in August, laying the foundations for a new Sardinian-Italian translator.<br />
<br />
The decision to develop another machine translator for Sardinian, this time Catalan-Sardinian, was due to several reasons.<br />
First, the endangered state of the Sardinian language and the need to increase the number of activities carried out in Sardinian as much as possible, particularly in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard of 2006 that Apertium has adopted as its reference. Work such as this therefore helps protect Sardinian and supports the full standardisation of the language. So, building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying an even broader foundation for future work.<br />
<br />
The choice of Catalan, on the other hand, was due above all to the fact that Catalan, like Sardinian, is one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino); it is spoken in the city of Alghero (S'Alighera) by about 33,000 speakers. In addition, Catalan is one of the languages for which Apertium has the most resources.<br />
<br />
Moreover, this open-source platform is suited to closely related Romance languages such as Catalan and Sardinian, which meet this requirement because of the Catalan influence and linguistic heritage present in Sardinian, dating from the period of the Catalan-Aragonese occupation of Sardinia. In this way, the large body of texts, records and materials in Catalan on Sardinian history will also become available in Sardinian itself, as will all the publications and studies on sociolinguistics and language policy relevant to the situation of the Sardinian language.<br />
<br />
This translator could have many uses: for example, the Sardinian Wikipedia could grow partly through the translation of articles that are less detailed in Italian or that emphasise different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way to express the same concepts.<br />
<br />
To complete the work plan, we also intended to do something for srd-ita in August, provided that the objectives of the first, longer phase were met on schedule.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with about 2645 words in the bilingual dictionary, a "trimmed coverage" of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the Italian-Sardinian translator required building practically the whole Sardinian morphological dictionary and also improving aspects of the Italian morphological analyser, this year we started from two languages that were already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and in the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was carried out to create the "pending tests". Also taking the ita-srd "pending tests" as a reference, we identified structural differences in interrogative forms, numerals, possessives, expressions of obligation and continuous forms, the past, future and conditional, and above all clitics.<br />
<br />
===Sardinian morphological dictionary===
As for the Sardinian morphological dictionary, we already had one with 51,800 words (including the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu], the regional Sardinian spell checker), developed during the previous GSoC. 15,500 more words were added: 1300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. These are mostly scientific and technical terminology and socio-political vocabulary. Except for proper nouns, for which mostly other criteria were followed, the words to be added were selected from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia].<br />
<br />
The dictionary was then cleaned up, removing many non-standard words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===
About 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===
For Catalan, 15 morphological disambiguation rules were written and a few others were modified.<br />
<br />
===Lexical selection rules===
The translator has 274 lexical selection rules. These rules choose which of two or more possible translations is the most suitable in a given context. (Separately, the bilingual dictionary contains hundreds of choices between different possible translations of a word but, unlike the rules, such a choice applies in every context.)<br />
<br />
===Transfer rules===
The translator has 78 transfer rules. These rules modify the structure of the Catalan sentence to adapt it to the structure required in Sardinian. For example:<br />
<br />
<li>El '''meu''' llibre > Su libru '''meu'''<br />
<br />
<li>'''Vaig''' menjar > '''Apo''' mandigadu<br />
<br />
<li>'''He''' anat > '''So''' andadu<br />
<br />
<li>Vull saludar-'''lo''' > '''Lu''' chèrgio saludare<br />
<br />
===Quality===
Quality evaluation is used to test how the translator works in practice. There are many ways to do it, and the texts chosen depend on what the translator is meant to be used for: in short, it serves to estimate how many words have to be changed before the text can be published.<br />
<br />
The ''Word Error Rate'' (WER) is the indicator that measures the proportion of words that must be changed before the text can be published. According to the "Work plan", the goal was a rate below 15%. The translation error rate is 13.9% (figure obtained with the WER indicator on two randomly chosen 600-word texts from Wikipedia).<br />
<br />
The ''coverage'' of the translator (percentage of recognised words) is 94.0% (figure obtained from a large Wikipedia corpus).<br />
<br />
====Catalan text (chosen at random)====
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
====Machine translation into Sardinian====
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: apertium srd-ita (August 2017)==
In the second phase of the project we worked towards a new srd-ita translator. What we were able to do was begin a manual morphological disambiguation of part of our corpora, to help the translator recognise the correct morphology of each word, especially in texts that did not fully conform to the standard.<br />
<br />
We processed two corpora: one journalistic and more dialectal, the other taken directly from literary texts that fully conform to the LSC standard. 6000 words of the first corpus were tagged, and 11,800 of the second. Lack of time (two weeks for the task) did not allow the tagging to be reviewed. For this reason it was not possible to create a morphological disambiguator for Sardinian, as we had intended.<br />
<br />
Nine new transfer rules were also added and some existing ones corrected. In particular, verb tenses (notably the future and the conditional) are now translated correctly from Sardinian into Italian. The translation of possessives has been improved, and some cases of enclitics are now handled (Sardinian can have up to three enclitics, whereas Italian cannot have more than two).<br />
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules (an important one of the four disambiguates "sono" as "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms from it. In total, the Italian-Sardinian bilingual dictionary has 1400 more entries than at the start of GSoC. Cleaning the errors out of the Sardinian dictionary and adding the new entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
<br />
==Resources==
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==
The work of the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd. Completing the work we started will make it possible to build a morphological disambiguator, which will be useful for disambiguating the corpora and will be indispensable for developing other language pairs with Sardinian in the future.<br />
<br />
==Conclusions==
The project ended with results that we can consider good. I would like to thank Apertium for this opportunity, and Francis Tyers and above all Hèctor Alòs i Font for working with me throughout these months.<br />
<br />
We thank Diegu Corràine, who helped us a great deal by giving us plenty of material for preparing this project and by advising us well whenever we asked him anything about the Sardinian language. Thanks also to Sa Gazeta for the texts that form part of the corpus we built for our work. We must also mention the Region of Sardinia (Regione Sardigna), which has made available free resources and tools that help the creation of new open-source tools like the one we have just prepared.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64265Apertium cat-srd and ita-srd/GSoC 20172017-09-02T15:56:52Z<p>Grfro3d: /* Work and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work including the code and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
The apertium-cat-srd language pair is in this repository: https://svn.code.sf.net/p/apertium/svn/trunk/apertium-cat-srd<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for participation in the Google Summer of Code 2017 programme with Apertium was the development of a rule-based machine translation system from Catalan to Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, as was done last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to do some work in August as well on improving the translator from Italian to Sardinian, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified at the end of July that the cat-srd results were good, we spent some time on the Italian-to-Sardinian translator.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==
The first coding phase, lasting until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and had 2645 lemmas in the bilingual dictionary, a [http://wiki.apertium.org/wiki/Calculating_coverage Coverage] of about 77% and a [http://wiki.apertium.org/wiki/WER WER] error rate of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when it was necessary to develop almost the whole Sardinian morphological dictionary and also to improve aspects of the Italian morphological analyser, this year we started from two languages that were already well developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and in the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was done to create "pending tests". Also taking the ita-srd "pending tests" as a reference, we identified structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed in the previous year. 15,500 more words have been added: c. 1300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to be added were selected from a [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia] corpus.<br />
<br />
The Sardinian dictionary was then cleaned up, removing [https://sc.wikipedia.org/wiki/Limba_Sarda_Comuna non-standard (LSC)] words and correcting mistakes in the assignment of paradigms (especially in the gender assigned to some nouns).<br />
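Why a wrong gender in a paradigm assignment matters can be shown with a toy model. Nouns ending in -e, such as tzitade ("city"), do not reveal their gender from the ending, so the gender has to be recorded with the entry; if it is wrong, agreement (here, the definite article) comes out wrong. The paradigm names, endings and data structures below are simplifications for illustration only, not the lttoolbox .dix format or the actual apertium-srd paradigms.<br />

```python
from dataclasses import dataclass

# Hypothetical, simplified noun paradigms: (singular ending, plural ending).
# Real Apertium paradigms are XML <pardef> blocks inside the .dix file.
PARADIGMS = {
    "libr/u__n": ("u", "os"),    # libru / libros
    "limb/a__n": ("a", "as"),    # limba / limbas
    "tzitad/e__n": ("e", "es"),  # tzitade / tzitades (gender not visible)
}

# Definite articles by (gender, number); elision before vowels (s') is ignored.
ARTICLES = {("m", "sg"): "su", ("m", "pl"): "sos",
            ("f", "sg"): "sa", ("f", "pl"): "sas"}


@dataclass
class Entry:
    stem: str
    paradigm: str
    gender: str  # "m" or "f", stored with the entry, not derived from the ending


def generate(entry: Entry) -> dict[str, str]:
    """Return article + noun for the singular and plural forms of an entry."""
    sg_ending, pl_ending = PARADIGMS[entry.paradigm]
    forms = {}
    for number, ending in (("sg", sg_ending), ("pl", pl_ending)):
        article = ARTICLES[(entry.gender, number)]
        forms[number] = f"{article} {entry.stem}{ending}"
    return forms


if __name__ == "__main__":
    print(generate(Entry("tzitad", "tzitad/e__n", "f")))  # {'sg': 'sa tzitade', 'pl': 'sas tzitades'}
    print(generate(Entry("tzitad", "tzitad/e__n", "m")))  # wrong gender: {'sg': 'su tzitade', 'pl': 'sos tzitades'}
```

The second call shows the kind of error that the clean-up of gender assignments is meant to remove.<br />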
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names have also been added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
15 new morphological disambiguation rules for Catalan have been written and a few more have been modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose between two or more possible translations in a given context.<br />
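In Apertium, lexical selection rules are written in XML (.lrx files, applied by apertium-lex-tools). The Python sketch below only mimics the idea: a default translation comes from the bilingual dictionary and is overridden when a trigger word appears within a window around the ambiguous word. The Catalan word banc ("bank" or "bench"), its English glosses and the trigger words are illustrative assumptions, not one of the 274 rules of apertium-cat-srd.<br />

```python
from dataclasses import dataclass

# Toy bilingual options; the first one acts as the default translation.
# English glosses are used purely for illustration.
BIDIX = {"banc": ["bank", "bench"]}


@dataclass(frozen=True)
class Rule:
    lemma: str            # source lemma the rule applies to
    triggers: frozenset   # context lemmas that activate the rule
    window: int           # number of tokens inspected on each side
    choice: str           # translation selected when the rule fires


RULES = [
    Rule("banc", frozenset({"diners", "crèdit"}), 3, "bank"),
    Rule("banc", frozenset({"seure", "parc"}), 3, "bench"),
]


def select(lemmas: list[str]) -> list[str]:
    """Replace each ambiguous lemma with one translation using context rules."""
    output = []
    for i, lemma in enumerate(lemmas):
        options = BIDIX.get(lemma)
        if not options:
            output.append(lemma)       # nothing to choose: pass through
            continue
        choice = options[0]            # default when no rule fires
        for rule in RULES:
            if rule.lemma != lemma:
                continue
            left = lemmas[max(0, i - rule.window):i]
            right = lemmas[i + 1:i + 1 + rule.window]
            if rule.triggers.intersection(left + right):
                choice = rule.choice
                break                  # first matching rule wins
        output.append(choice)
    return output


if __name__ == "__main__":
    print(select(["seure", "al", "banc"]))           # ['seure', 'al', 'bench']
    print(select(["el", "banc", "dona", "crèdit"]))  # ['el', 'bank', 'dona', 'crèdit']
```

Keeping a default in the dictionary and letting rules override it mirrors the distinction drawn in the text between context-free choices in the bilingual dictionary and context-dependent rules.<br />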
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
<li>El '''meu''' llibre > Su libru '''meu'''<br />
<br />
<li>'''Vaig''' menjar > '''Apo''' mandigadu<br />
<br />
<li>'''He''' anat > '''So''' andadu<br />
<br />
<li>Vull saludar-'''lo''' > '''Lu''' chèrgio saludare<br />
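Apertium writes structural transfer rules in XML (.t1x and related files) that match sequences of lexical categories. As a rough Python sketch of what the first and last examples above do, assume the words have already been translated one by one (su meu libru, chèrgio saludare lu) and a rule only needs to reorder them. The category labels and the tuple-based rule format are hypothetical; real rules also handle tags, agreement and multi-stage chunking.<br />

```python
# Each token is (form, category); the category labels are hypothetical.
Token = tuple[str, str]

# (category pattern, output order as indexes into the matched tokens)
RULES = [
    (("det", "poss", "n"), (0, 2, 1)),     # su meu libru -> su libru meu
    (("vblex", "inf", "enc"), (2, 0, 1)),  # chèrgio saludare lu -> lu chèrgio saludare
]


def transfer(tokens: list[Token]) -> list[str]:
    """Scan left to right, applying the first rule whose pattern matches."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        for pattern, order in RULES:
            window = tokens[i:i + len(pattern)]
            if tuple(cat for _, cat in window) == pattern:
                out.extend(window[k] for k in order)
                i += len(pattern)
                break
        else:  # no rule matched at position i: copy the token unchanged
            out.append(tokens[i])
            i += 1
    return [form for form, _ in out]


if __name__ == "__main__":
    print(transfer([("su", "det"), ("meu", "poss"), ("libru", "n")]))
    # ['su', 'libru', 'meu']
    print(transfer([("chèrgio", "vblex"), ("saludare", "inf"), ("lu", "enc")]))
    # ['lu', 'chèrgio', 'saludare']
```

The periphrastic past examples (Vaig menjar > Apo mandigadu) would need rules that also change lemmas and tags, which this sketch does not model.<br />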
<br />
===Quality===<br />
Quality assessment is used to see how the translator works in practice.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was a rate below 15%. The translation error rate is 13.9% (computed with the WER indicator on randomly selected Wikipedia texts, 600 words).<br />
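WER is the word-level edit distance (substitutions, insertions and deletions) between the machine output and a post-edited reference, divided by the number of words in the reference. Apertium has its own evaluation script for this (apertium-eval-translator); the Python function below is an independent minimal sketch of the same formula, not that script, and the sample sentences are illustrative.<br />

```python
def word_error_rate(mt_output: str, post_edited: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j] = edits turning the first i MT words into the first j reference words
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            substitution = prev[j - 1] + (hyp[i - 1] != ref[j - 1])
            cur[j] = min(prev[j] + 1,     # delete an MT word
                         cur[j - 1] + 1,  # insert a reference word
                         substitution)    # keep or substitute
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("sa libru meu", "su libru meu"))  # one substitution in 3 words
```

For example, on a 600-word reference a WER of 13.9% would correspond to about 83 word edits.<br />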
<br />
The ''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (percentage of recognized words) is 94% (number obtained from a large Wikipedia corpus).<br />
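Coverage is usually measured by running a corpus through the morphological analyser (lt-proc) and counting how many tokens come out unknown; in the analyser's output stream each token is a lexical unit ^surface/analyses$, and unknown words are marked with an asterisk, as in ^xyzzy/*xyzzy$. The Python sketch below computes this ratio on an already analysed stream. It is a simplification (for instance, it ignores escaped ^ and $ characters) rather than the exact procedure of the Calculating coverage wiki page, and the sample tags are illustrative.<br />

```python
import re

# One lexical unit in the analyser's output stream: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_stream: str) -> float:
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_stream)
    if not units:
        raise ValueError("no lexical units found")
    # Unknown words are output as ^word/*word$
    unknown = sum(1 for unit in units if "/*" in unit)
    return 1 - unknown / len(units)


if __name__ == "__main__":
    sample = "^casa/casa<n><f><sg>$ ^xyzzy/*xyzzy$ ^gran/gran<adj><mf><sg>$ ^i/i<cnjcoo>$"
    print(coverage(sample))  # 0.75
```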
<br />
====Text in Catalan (chosen randomly)====<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
====Machine translation to Sardinian====<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July 29th - August 29th)==
In the second phase of the project we laid the foundations of a new translator from Sardinian to Italian. We began a manual morphological disambiguation of the corpora, which will help the translator recognise the correct morphology of each word.<br />
<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in standard LSC. From the first, 6000 words were disambiguated; from the second, 11,800.<br />
<br />
Nine new transfer rules have been added and some previously written ones have been corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and enclitics has also been improved (Sardinian can have up to three enclitics, whereas Italian cannot have more than two).<br />
<br />
We also made some improvements to the Italian morphological analyser, adding 4 morphological disambiguation rules (including one for disambiguating "sono" as "so" or "sunt"). Additionally, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1400 words have been added to the ita-srd bilingual dictionary. Cleaning the mistakes out of the Sardinian dictionary and adding new entries will soon make it possible to create a new version of the ita-srd translator.<br />
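The Italian form sono is ambiguous between "I am" (Sardinian so) and "they are" (sunt). The rules actually added to the Italian analyser are not reproduced here; as a toy illustration only, the Python sketch below keeps one reading depending on the preceding token. The heuristic and the tag strings are assumptions loosely modelled on Apertium-style tags; in particular, sentences without an explicit subject (e.g. Sono stanco) stay ambiguous.<br />

```python
# Readings of Italian "sono": "(io) sono" = 1st person singular,
# "(loro) sono" = 3rd person plural. Tag strings are illustrative.
P1_SG = "essere<vbser><pri><p1><sg>"
P3_PL = "essere<vbser><pri><p3><pl>"
TO_SARDINIAN = {P1_SG: "so", P3_PL: "sunt"}


def disambiguate_sono(tokens):
    """tokens: list of (form, [readings]). Narrows the readings of 'sono'
    using a toy left-context heuristic; other tokens are left untouched."""
    result = []
    for i, (form, readings) in enumerate(tokens):
        if form.lower() == "sono" and len(readings) > 1 and i > 0:
            prev_form, prev_readings = tokens[i - 1]
            if prev_form.lower() == "io":
                readings = [r for r in readings if r == P1_SG]
            elif prev_form.lower() == "loro" or any("<pl>" in r for r in prev_readings):
                readings = [r for r in readings if r == P3_PL]
        result.append((form, readings))
    return result


if __name__ == "__main__":
    sentence = [("i", ["il<det><def><m><pl>"]),
                ("ragazzi", ["ragazzo<n><m><pl>"]),
                ("sono", [P1_SG, P3_PL])]
    form, readings = disambiguate_sono(sentence)[2]
    print(TO_SARDINIAN[readings[0]])  # sunt
```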
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64155Apertium cat-srd and ita-srd/GSoC 20172017-08-28T17:35:22Z<p>Grfro3d: /* Work and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work including the code and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for participation in the Google Summer of Code 2017 programme with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to do some work on srd-ita in August as well, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we decided to devote ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29 - July 29)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a [http://wiki.apertium.org/wiki/Calculating_coverage coverage] of about 77% and a [http://wiki.apertium.org/wiki/WER WER] of 34.8%. The goal was to reach 90% coverage and to lower the WER to less than 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was carried out to create "pending tests". Taking the ita-srd "pending tests" as a further reference, this analysis highlighted structural differences in numerals, possessive forms, obligation constructions and continuous forms, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous GSoC. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to be added were selected starting from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (especially regarding the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
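<br />
To make the idea concrete, the Python sketch below models lexical selection as a context check: each source word has a default translation that applies everywhere, and a rule swaps in an alternative when a given word follows it. It is only an illustration under simplified assumptions; Apertium's real rules are written in the pair's own rule files in a richer format, and the words used here (`src_word`, `ctx_word`, `trg_default`, `trg_alt`) are hypothetical placeholders.<br />
<br />
```python
from typing import Dict, List, Optional, Tuple

# A hypothetical rule: (source word, word that must follow it, translation to choose).
Rule = Tuple[str, str, str]

def select_translation(
    tokens: List[str],
    index: int,
    candidates: Dict[str, List[str]],
    rules: List[Rule],
) -> str:
    """Choose a translation for tokens[index].

    The first candidate is the default, used in every context; a rule
    overrides it only when its context word immediately follows.
    """
    word = tokens[index]
    options = candidates[word]
    next_word: Optional[str] = tokens[index + 1] if index + 1 < len(tokens) else None
    for source, context, choice in rules:
        if source == word and context == next_word and choice in options:
            return choice
    return options[0]  # no rule matched: fall back to the default

# Placeholder data for illustration only, not taken from apertium-cat-srd.
CANDIDATES = {"src_word": ["trg_default", "trg_alt"]}
RULES: List[Rule] = [("src_word", "ctx_word", "trg_alt")]

print(select_translation(["src_word", "ctx_word"], 0, CANDIDATES, RULES))  # trg_alt
print(select_translation(["src_word", "other"], 0, CANDIDATES, RULES))     # trg_default
```
<br />
Keeping the default separate from the rules mirrors the distinction between a fixed dictionary choice and a context-dependent one.<br />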
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
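<br />
The first and last examples above can be modelled, very roughly, as pattern-and-reorder operations over tagged words: the possessive moves after the noun, and the enclitic pronoun moves before the finite verb. The Python sketch below implements just these two toy rules. It is not how Apertium expresses transfer (the pair's rules use Apertium's XML transfer format); the simplified tags and the tiny bilingual lookup are assumptions made for this illustration, built only from the example sentences.<br />
<br />
```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, simplified tag); the tags are made up for this sketch

# Toy bilingual lookup built only from the examples above.
BIDIX: Dict[str, str] = {
    "el": "su", "meu": "meu", "llibre": "libru",
    "vull": "chèrgio", "saludar": "saludare", "lo": "lu",
}

def transfer(tokens: List[Token]) -> List[Token]:
    """Apply two toy reordering rules, then translate word by word."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        tags = [t for _, t in tokens[i:i + 3]]
        if tags == ["det", "poss", "n"]:
            # el meu llibre -> su libru meu: possessive moves after the noun
            det, poss, noun = tokens[i:i + 3]
            out.extend([det, noun, poss])
            i += 3
        elif tags == ["verb", "inf", "enc"]:
            # vull saludar-lo -> lu chèrgio saludare: enclitic becomes proclitic
            verb, inf, clitic = tokens[i:i + 3]
            out.extend([clitic, verb, inf])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return [(BIDIX.get(w, w), t) for w, t in out]

def render(tokens: List[Token]) -> str:
    """Join the (lowercase) output words with spaces."""
    return " ".join(w for w, _ in tokens)

print(render(transfer([("el", "det"), ("meu", "poss"), ("llibre", "n")])))
print(render(transfer([("vull", "verb"), ("saludar", "inf"), ("lo", "enc")])))
```
<br />
The two calls produce "su libru meu" and "lu chèrgio saludare"; capitalization is ignored for simplicity.<br />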
<br />
===Quality===<br />
Quality evaluation shows how the translator works in practice. There are many ways to do it, and the choice of texts depends on how the translator will be used. Put simply, it measures how many words have to be changed before the text can be published.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The final WER is 13.9% (calculated on randomly chosen Wikipedia texts of 600 words).<br />
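<br />
For reference, WER is commonly computed as the word-level edit distance (substitutions, insertions, deletions) between the machine translation and its post-edited version, divided by the number of words in the post-edited reference. The Python below is a minimal independent sketch of that definition, assuming simple whitespace tokenization; it is not the script used to obtain the 13.9% figure.<br />
<br />
```python
from typing import List

def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if the words match)
            )
        prev = curr
    return prev[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Edits needed to turn the MT output into the reference, per reference word."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)

print(wer("su libru meu", "su libru meu"))  # 0.0
print(wer("su libru meu", "su meu libru"))  # two substitutions out of three words
```
<br />
If the evaluated texts total 600 reference words, a WER of 13.9% corresponds to roughly 83 edited words.<br />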
<br />
''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (the percentage of recognized words) is 94% (measured on a large Wikipedia corpus).<br />
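<br />
Coverage is usually measured by running the morphological analyzer over a corpus and counting how many tokens receive at least one analysis. Apertium's analyzer marks unknown words with an asterisk (for example `^xyz/*xyz$`), so a minimal sketch can simply count those. This Python example assumes analyzer output in Apertium stream format and ignores backslash-escaped special characters; the sample analyses are illustrative only.<br />
<br />
```python
import re
from typing import Tuple

# A lexical unit looks like ^surface/analysis1/analysis2$; unknown words
# come out as ^surface/*surface$. Escaped characters (\^, \$, \/) are not handled.
LU_PATTERN = re.compile(r"\^(.*?)\$")

def coverage(stream: str) -> Tuple[int, int, float]:
    """Return (known tokens, total tokens, coverage ratio) for analyzer output."""
    units = LU_PATTERN.findall(stream)
    total = len(units)
    unknown = sum(
        1 for unit in units
        if any(analysis.startswith("*") for analysis in unit.split("/")[1:])
    )
    known = total - unknown
    ratio = known / total if total else 0.0
    return known, total, ratio

sample = "^su/su<det><def><m><sg>$ ^libru/libru<n><m><sg>$ ^xyz/*xyz$"
print(coverage(sample))  # 2 of 3 tokens known
```
<br />
At 94% coverage, about 6 tokens in every 100 are unknown to the analyzer.<br />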
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July 29 - August 29)==<br />
In the second phase of the project we worked towards a new srd-ita translator. What we managed to do was begin a manual morphological disambiguation of the corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in fully standard LSC. 6,000 words of the first corpus and 11,800 of the second were tagged.<br />
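<br />
A hand-tagged corpus like this is the raw material for a statistical disambiguator. As a simple illustration of how it can be used, the Python sketch below learns a most-frequent-tag (unigram) baseline: for every word form it remembers the tag seen most often in the tagged data. This baseline is chosen only for illustration; it is not Apertium's tagger, nor the disambiguator the project aimed to build, and the sample data are placeholders.<br />
<br />
```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

def train_unigram(tagged: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each (lowercased) word form to the tag it carries most often."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for form, label in tagged:
        counts[form.lower()][label] += 1
    # most_common(1) keeps the first-seen tag when two tags are tied.
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}

def apply_tags(words: List[str], model: Dict[str, str], fallback: str = "unknown") -> List[Tuple[str, str]]:
    """Tag new text with the learned choices; unseen words get the fallback tag."""
    return [(w, model.get(w.lower(), fallback)) for w in words]

# Hypothetical hand-tagged sample; word forms and tags are placeholders.
corpus = [("form_a", "tag_x"), ("form_a", "tag_y"), ("form_a", "tag_x"), ("form_b", "tag_z")]
model = train_unigram(corpus)
print(apply_tags(["Form_A", "form_c"], model))
```
<br />
Such a baseline ignores context entirely, which is why reviewed tagged data and a context-aware model are needed for a usable disambiguator.<br />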
<br />
Nine new transfer rules were added and some existing ones corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives was also improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules (a very important one was the disambiguation of "sono" as "so" ("I am") or "sunt" ("they are")). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning up the errors in the Sardinian dictionary and adding new entries will make it possible to produce a new version of ita-srd quickly.<br />
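<br />
The "sono" case shows why such rules are needed: Italian "sono" can be first person singular ("io sono", Sardinian "so") or third person plural ("loro sono", Sardinian "sunt"). The four rules actually added are not reproduced here; the Python below is only a hypothetical heuristic that looks at the preceding subject pronoun and leaves the word ambiguous when there is no such cue.<br />
<br />
```python
from typing import List, Optional

# Hypothetical cue lists: subject pronouns that settle the person and number of "sono".
SINGULAR_CUES = {"io"}
PLURAL_CUES = {"loro", "essi", "esse"}

def translate_sono(tokens: List[str], index: int) -> Optional[str]:
    """Pick "so" or "sunt" for tokens[index] == "sono" from the previous word.

    Returns None when the context gives no cue, leaving the ambiguity
    for other rules to resolve.
    """
    if tokens[index].lower() != "sono":
        raise ValueError("expected the word 'sono'")
    prev = tokens[index - 1].lower() if index > 0 else ""
    if prev in SINGULAR_CUES:
        return "so"
    if prev in PLURAL_CUES:
        return "sunt"
    return None

print(translate_sono(["io", "sono"], 1))    # so
print(translate_sono(["Loro", "sono"], 1))  # sunt
print(translate_sono(["sono"], 0))          # None
```
<br />
A real rule would also need to handle full noun-phrase subjects (for example a plural noun before "sono"), which this sketch deliberately leaves unresolved.<br />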
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64153Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T17:24:21Z<p>Grfro3d: /* Descritzione de su traballu */</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for participation in the "Google Summer of Code 2017" program with the Apertium organization was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another support tool for the Sardinian language, following the same path as the work done last year for apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font, Francis Tyers and A. Martín took part in this project.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, which lasted the whole of June and July, devoted to Catalan-Sardinian, and a shorter one, carried out in August, which laid the foundations for a new Sardinian-Italian translator.<br />
<br />
The decision to develop another machine translator for Sardinian, this time Catalan-Sardinian, is due to several reasons.<br />
First of all, there is the endangered state of the Sardinian language and the need to expand as many activities as possible in Sardinian, especially in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard proposed in 2006 and adopted by Apertium as its reference. This means that work like this helps to protect Sardinian and to fully standardize the language. So, building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying an even broader foundation for further work in the future.<br />
The choice of developing a translator with Catalan, in turn, is due above all to the fact that Catalan is, like Sardinian, one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino), spoken in the city of Alghero by about 33,000 speakers. Moreover, Catalan is one of the languages for which Apertium has the most resources.<br />
Furthermore, this open-source platform is well suited to closely related Romance languages such as Catalan and Sardinian, which meet this requirement thanks to the Catalan influence and linguistic heritage present in Sardinian, dating from the period of Catalan-Aragonese rule in Sardinia. In this way, the large amount of texts, records and materials in Catalan on Sardinian history will also become available in Sardinian, as will all the publications and studies on sociolinguistics and language policy relevant to the situation of the Sardinian language.<br />
<br />
This translator could have many uses: for example, the Sardinian Wikipedia could also grow through the translation of articles that are less detailed in Italian or that emphasize different aspects. Translating from another language such as Catalan as well can further increase the amount of information, while also offering another way of expressing the same concepts.<br />
<br />
To complete the work plan, we intended to do something for srd-ita in August as well, provided that the objectives of the first, longer phase were met on schedule. Once we had verified that the cat-srd results at the end of July were good, we decided to dedicate ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29 - July 29)==<br />
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a "trimmed coverage" of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER to below 15%.<br />
Unlike last year, when developing the Italian-Sardinian translator required building practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already well developed on the Apertium platform. We were able to concentrate exclusively on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of work went into contrastive analysis between Catalan and Sardinian to create the "pending tests". Also taking the ita-srd "pending tests" as a reference, structural differences emerged in interrogative forms, numerals, possessives, obligation constructions and continuous forms, the past, future and conditional tenses, and above all clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
As for the Sardinian morphological dictionary, we already had one with 51,800 words (including the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu]), developed during the previous GSoC. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, for which mostly other criteria were followed, the words to be added were selected starting from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then cleaned up, removing many non-standard words and correcting errors in paradigm assignment (especially regarding the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for the Catalan dictionary, and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is the most appropriate in a given context. (Moreover, the bilingual dictionary contains hundreds of choices between different possible translations of a word but, unlike the rules, such a choice is applied in every context.)<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the Catalan sentence to adapt it to the structure required in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality evaluation is used to test how the translator works in practice. There are many ways to do it, and the texts chosen depend on how the translator is going to be used. In short, it measures how many words have to be changed before the text can be published.<br />
<br />
The "Word Error Rate" (WER) is the indicator of how many words must be changed before the text can be published. According to the "Work plan", the goal was to reach a rate below 15%. The error rate of the translation is 13.9% (figure obtained with the WER indicator on two randomly chosen Wikipedia texts of 600 words).<br />
<br />
The translator's ''coverage'' (percentage of recognized words) is 94.0% (figure obtained from a large Wikipedia corpus).<br />
<br />
===Text in Catalan (chosen at random)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. What we were able to do was begin a manual morphological disambiguation of part of our corpora, to help the translator recognize the correct morphology of each word, especially in texts that were not fully standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in fully standard LSC. About 6,000 words of the first corpus and 11,800 of the second were tagged. Lack of time (two weeks for the task) did not allow the tagging to be reviewed, so it was not possible to create a morphological disambiguator for Sardinian, as we had intended.<br />
Nine new transfer rules were also added and some existing ones were corrected. In practice, verb tenses are now translated correctly from Sardinian to Italian (specifically the future and the conditional). The translation of possessives has improved, and some cases of enclitics are handled (Sardinian can have up to three enclitics, whereas Italian cannot have more than two).<br />
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules (an important one among the four was the disambiguation of "sono" as "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) and from it derived the corresponding demonyms. In total, the Italian-Sardinian bilingual dictionary has 1,400 more words than at the start of GSoC. Cleaning errors out of the Sardinian dictionary and adding the new entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd. Finishing the work we have started will lead to a morphological disambiguator, which will be useful for disambiguating the corpora and indispensable for developing other language pairs with Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results that we can consider good. I would like to thank Apertium for this opportunity, Francis Tyers and above all Hèctor Alòs i Font for working with me all these months.<br />
We thank Diegu Corràine, who helped us a great deal by providing many materials for this project and by giving us sound advice whenever we asked him something about the Sardinian language. Thanks also to Sa Gazeta for the texts that form part of the corpus we created for our work. We must also mention the Region of Sardinia, which has made available free resources and tools that help in the creation of new open-source tools like the one we have just prepared.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64151Apertium cat-srd and ita-srd/GSoC 20172017-08-28T17:20:55Z<p>Grfro3d: /* Resources */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for my participation in the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year's work.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also do some work on srd-ita in August, provided that the objectives of the first, longer phase were met on schedule. Once we had verified that the cat-srd results at the end of July were good, we turned our attention to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29 - July 29)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a [http://wiki.apertium.org/wiki/Calculating_coverage coverage] of about 77% and a [http://wiki.apertium.org/wiki/WER WER] of 34.8%. The goal was to reach 90% coverage and to lower the WER to less than 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was carried out to create "pending tests". Taking the ita-srd "pending tests" as a further reference, this analysis highlighted structural differences in numerals, possessive forms, obligation constructions and continuous forms, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous GSoC. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to be added were selected starting from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (especially regarding the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality evaluation shows how the translator works in practice. There are many ways to do it, and the choice of texts depends on how the translator will be used. Put simply, it measures how many words have to be changed before the text can be published.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The final WER is 13.9% (calculated on randomly chosen Wikipedia texts of 600 words).<br />
<br />
''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (the percentage of recognized words) is 94% (measured on a large Wikipedia corpus).<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July 29 - August 29)==<br />
In the second phase of the project we worked towards a new srd-ita translator. What we managed to do was begin a manual morphological disambiguation of the corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in fully standard LSC. 6,000 words of the first corpus and 11,800 of the second were tagged.<br />
<br />
Nine new transfer rules were added and some existing ones corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives was also improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules (a very important one was the disambiguation of "sono" as "so" ("I am") or "sunt" ("they are")). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning up the errors in the Sardinian dictionary and adding new entries will make it possible to produce a new version of ita-srd quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64149Apertium cat-srd and ita-srd/GSoC 20172017-08-28T17:17:09Z<p>Grfro3d: /* Sardinian morphological dictionary */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for my participation in the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year's work.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, in June and July, devoted to Catalan-Sardinian, and a shorter one, in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we decided to also work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had confirmed at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, was devoted to the Catalan-Sardinian translator. Thanks to work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, with 2,645 entries in the bilingual dictionary, a [http://wiki.apertium.org/wiki/Calculating_coverage coverage] of about 77% and a [http://wiki.apertium.org/wiki/WER WER] of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator meant building almost the entire morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of vocabulary and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, during May and the first week of June ("Community Bonding") we carried out an extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we highlighted structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
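Pending tests can be pictured as pairs of a source sentence and the translation it should eventually produce, re-run as rules are added. The sketch below is a hypothetical harness, not Apertium's own test tooling: <code>translate</code> stands for whatever runs the cat-srd pipeline, and <code>stub_translate</code> is a made-up stand-in so the example runs on its own.

```python
from typing import Callable, List, Tuple

# A pending test pairs a source sentence with the translation we expect.
PendingTest = Tuple[str, str]


def failing_tests(
    tests: List[PendingTest], translate: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every test the translator still gets wrong."""
    failures = []
    for source, expected in tests:
        actual = translate(source)
        # Only surrounding whitespace is ignored; case and punctuation must match.
        if actual.strip() != expected.strip():
            failures.append((source, expected, actual))
    return failures


# Pairs taken from the transfer-rule examples in this report.
TESTS: List[PendingTest] = [
    ("El meu llibre", "Su libru meu"),
    ("Vaig menjar", "Apo mandigadu"),
]


def stub_translate(sentence: str) -> str:
    """Hypothetical stand-in for a real cat-srd pipeline: it knows one phrase
    and returns anything else unchanged."""
    known = {"El meu llibre": "Su libru meu"}
    return known.get(sentence, sentence)


if __name__ == "__main__":
    for source, expected, actual in failing_tests(TESTS, stub_translate):
        print(f"PENDING {source!r}: expected {expected!r}, got {actual!r}")
```

Keeping the failures as data rather than raising on the first mismatch makes it easy to see how many pending tests remain as the rules grow.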
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terms or socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/Catalan Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
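The report does not say exactly how words were picked from the Catalan Wikipedia. One common approach, shown here only as an assumption, is to run the analyser over a corpus and rank the unknown forms by frequency. In Apertium stream format each token appears as <code>^surface/analysis$</code> and an unknown word as <code>^form/*form$</code>; the sample input below is made up, and escaped characters are not handled.

```python
import re
from collections import Counter
from typing import List, Tuple

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \^ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def unknown_words(stream: str, top: int = 10) -> List[Tuple[str, int]]:
    """Rank the surface forms the analyser did not recognise by frequency.

    An unknown word is emitted as ^form/*form$, so its analysis part starts with '*'.
    """
    counts: Counter = Counter()
    for unit in LEXICAL_UNIT.findall(stream):
        surface, _, analyses = unit.partition("/")
        if analyses.startswith("*"):
            counts[surface.lower()] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Made-up analyser output for illustration only.
    sample = (
        "^La/el<det>$ ^fotosíntesi/*fotosíntesi$ ^i/i<cnjcoo>$ "
        "^fotosíntesi/*fotosíntesi$ ^clorofil·la/*clorofil·la$"
    )
    print(unknown_words(sample))
```

The most frequent unknown forms are the ones whose addition raises coverage the most, which is why frequency is a natural ordering for this kind of work.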
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 new morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
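Apertium's real lexical selection rules are written in the XML format of apertium-lex-tools; the Python below only models the underlying idea of choosing a translation from context words. The Catalan lemma <code>banc</code> (both "bank" and "bench") is used as an illustration, the cue lists are hypothetical, and the English glosses stand in for the actual Sardinian forms.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical rules: source lemma -> [(context cue lemmas, chosen translation)].
# Translations are English glosses standing in for the Sardinian forms.
RULES: Dict[str, List[Tuple[Set[str], str]]] = {
    "banc": [
        ({"diners", "compte", "crèdit"}, "bank"),  # financial sense
        ({"seure", "parc", "plaça"}, "bench"),     # seat sense
    ],
}
DEFAULTS: Dict[str, str] = {"banc": "bank"}


def select_translation(lemmas: Sequence[str], i: int, window: int = 3) -> Optional[str]:
    """Pick a translation for lemmas[i] from cue words within +/- `window` positions.

    Returns None when the lemma has no selection rules (it is not ambiguous).
    """
    lemma = lemmas[i]
    if lemma not in RULES:
        return None
    start, end = max(0, i - window), min(len(lemmas), i + window + 1)
    context = set(lemmas[start:i]) | set(lemmas[i + 1:end])
    for cues, translation in RULES[lemma]:  # the first matching rule wins
        if cues & context:
            return translation
    return DEFAULTS.get(lemma)  # no cue found: fall back to the default sense


if __name__ == "__main__":
    print(select_translation(["seure", "a", "el", "banc"], 3))
```

Falling back to a default translation mirrors the usual design choice: rules only need to be written for the less frequent sense.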
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
* ''El meu llibre'' ("my book") > ''Su libru meu''<br />
* ''Vaig menjar'' ("I ate") > ''Apo mandigadu''<br />
* ''He anat'' ("I have gone") > ''So andadu''<br />
* ''Vull saludar-lo'' ("I want to greet him") > ''Lu chèrgio saludare''<br />
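Apertium transfer rules are written in its own XML rule files; the Python below only sketches the pattern-action idea behind the first example, where Sardinian places the possessive after the noun. The simplified categories (<code>det</code>, <code>poss</code>, <code>n</code>) and the assumption that lemmas have already been looked up in the bilingual dictionary are illustrative, not the translator's actual tag set.

```python
from typing import List, Tuple

# (lemma already translated by the bilingual dictionary, simplified category)
Token = Tuple[str, str]


def reorder_possessives(tokens: List[Token]) -> List[Token]:
    """Rewrite determiner + possessive + noun as determiner + noun + possessive."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        categories = [cat for _, cat in tokens[i:i + 3]]
        if categories == ["det", "poss", "n"]:
            det, poss, noun = tokens[i:i + 3]
            out.extend([det, noun, poss])  # possessive moves after the noun
            i += 3
        else:
            out.append(tokens[i])  # no match: copy the token unchanged
            i += 1
    return out


if __name__ == "__main__":
    # "El meu llibre" after a hypothetical bilingual lookup: el->su, meu->meu, llibre->libru
    phrase = [("su", "det"), ("meu", "poss"), ("libru", "n")]
    print(" ".join(lemma for lemma, _ in reorder_possessives(phrase)))  # su libru meu
```

Matching left to right and consuming the whole pattern once it fires is the same longest-match, non-overlapping behaviour one would expect from a chunk-based transfer step.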
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used. Put simply, it measures how many words have to be changed before the translated text can be published.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that have to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The measured WER is 13.9%, calculated on randomly selected Wikipedia texts of 600 words.<br />
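WER is conventionally computed as the word-level edit distance (substitutions, insertions and deletions) between the machine translation and its post-edited version, divided by the number of words in the post-edited text. The sketch below implements that textbook definition; it is not Apertium's own evaluation script (apertium-eval-translator), whose tokenization and normalization may differ.

```python
from typing import List


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance between MT output and its post-edited version,
    divided by the number of words in the post-edited (reference) text."""
    hyp: List[str] = hypothesis.split()
    ref: List[str] = reference.split()
    # prev[j] holds the edit distance between the hypothesis words seen so far
    # and the first j reference words (row 0: empty hypothesis).
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # hypothesis word must be deleted
                curr[j - 1] + 1,         # reference word must be inserted
                prev[j - 1] + (h != r),  # substitution, free if the words match
            )
        prev = curr
    return prev[len(ref)] / len(ref) if ref else 0.0


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c d"))  # one substitution in four words
```

With this definition, a WER of 13.9% on 600 words corresponds to roughly 83 word edits.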
<br />
''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
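Coverage can be estimated from the analyser's output in Apertium stream format, where unknown words appear as <code>^form/*form$</code>. The sketch below is a naive version under stated assumptions: the whole output is one string, escaped characters are ignored, and punctuation tokens count as units, which the scripts described on the coverage page may treat differently. The sample input is made up.

```python
import re

# One lexical unit: ^surface/analysis1/...$ (escaped \^ and \$ are not handled).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis.

    Unknown words appear as ^form/*form$, so their analysis part starts with '*'.
    """
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if not unit.partition("/")[2].startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    sample = "^El/el<det>$ ^turó/*turó$ ^és/ser<vbser>$ ^alt/alt<adj>$"
    print(f"{naive_coverage(sample):.0%}")  # 75%
```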
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we laid the groundwork for a new srd-ita translator. We began a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphological analysis of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in flawless LSC. 6,000 words were added from the first and 11,800 from the second.<br />
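Apertium's statistical part-of-speech tagger can be trained on hand-disambiguated text; the idea of learning from such a corpus can be illustrated with a much simpler unigram model, shown below. This is a swapped-in technique, not the tagger Apertium uses, and the analysis strings (<code>una&lt;det&gt;</code>, <code>una&lt;num&gt;</code>) are simplified, hypothetical tags.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train_unigram(tagged: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each lower-cased surface form to the analysis it was given most often
    in the hand-disambiguated corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for form, analysis in tagged:
        counts[form.lower()][analysis] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def choose_analysis(form: str, candidates: List[str], model: Dict[str, str]) -> str:
    """Prefer the learned analysis if the analyser offers it; otherwise take the first."""
    preferred = model.get(form.lower())
    return preferred if preferred in candidates else candidates[0]


if __name__ == "__main__":
    corpus = [("una", "una<det>"), ("Una", "una<det>"), ("una", "una<num>")]
    model = train_unigram(corpus)
    print(choose_analysis("Una", ["una<num>", "una<det>"], model))  # una<det>
```

A unigram model ignores context entirely, which is exactly why real taggers also learn tag sequences; the sketch only shows how hand-disambiguated words become training data.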
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic combinations has also improved (in Sardinian up to three enclitics can occur together, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules; a particularly important one disambiguates "sono" between the readings translated as "so" (I am) and "sunt" (they are). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
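Italian "sono" is both "I am" (Sardinian "so") and "they are" (Sardinian "sunt"). The actual disambiguation rule is not shown in this report, so the sketch below is a hypothetical heuristic: it looks a few tokens to the left for a subject cue. The tag sets (<code>n</code>, <code>pl</code>, and so on) and the default are illustrative assumptions.

```python
from typing import FrozenSet, List, Tuple

# (Italian surface form, simplified tag set); the tags are illustrative only.
Token = Tuple[str, FrozenSet[str]]


def sono_to_sardinian(tokens: List[Token], i: int, window: int = 3) -> str:
    """Translate Italian 'sono' at position i as 'so' (1st sg) or 'sunt' (3rd pl).

    Hypothetical heuristic: scan up to `window` tokens to the left for a subject cue.
    """
    for form, tags in reversed(tokens[max(0, i - window):i]):
        word = form.lower()
        if word == "io":
            return "so"
        if word in {"loro", "essi", "esse"} or {"n", "pl"} <= tags:
            return "sunt"  # 3rd person plural pronoun or plural noun subject
    return "so"  # hypothetical default when no cue is found


if __name__ == "__main__":
    sentence = [("I", frozenset({"det", "pl"})), ("ragazzi", frozenset({"n", "pl"})),
                ("sono", frozenset({"vbser"}))]
    print(sono_to_sardinian(sentence, 2))  # sunt
```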
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyzer]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu]<br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64144Apertium cat-srd and ita-srd/GSoC 20172017-08-28T16:28:06Z<p>Grfro3d: /* First phase: Apertium cat-srd (May, 29th - July, 29th) */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64143Apertium cat-srd and ita-srd/GSoC 20172017-08-28T16:27:39Z<p>Grfro3d: /* First phase: Apertium cat-srd (May, 29th - July, 29th) */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64142Apertium cat-srd and ita-srd/GSoC 20172017-08-28T16:27:10Z<p>Grfro3d: /* First phase: Apertium cat-srd (May, 29th - July, 29th) */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
My project for Google Summer of Code 2017 with Apertium was to develop a rule-based machine translation system between Catalan and Sardinian and to continue last year's project, apertium ita-srd. The idea came from the desire to build another tool to support the Sardinian language, following the same approach as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, in June and July, devoted to Catalan-Sardinian, and a shorter one, in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we decided to also work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had confirmed at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, was devoted to the Catalan-Sardinian translator. Thanks to work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, with 2,645 entries in the bilingual dictionary, a [http://wiki.apertium.org/wiki/Calculating_coverage coverage] of about 77% and a [http://wiki.apertium.org/wiki/WER Word Error Rate (WER)] of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator meant building almost the entire morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of vocabulary and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, during May and the first week of June ("Community Bonding") we carried out an extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we highlighted structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terms or socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/Catalan Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
In Catalan dictionary they were written 15 of morphological disambiguation rules and someone else has been modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
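As a rough illustration of what a lexical selection rule does, the Python sketch below chooses a translation according to the word that follows. It is not Apertium's rule format or engine (real rules are written for Apertium's own lexical selection module); the Rule class, the function name and the Catalan-Sardinian word pairs are hypothetical and were chosen only for the example.<br />

```python
from dataclasses import dataclass


@dataclass
class Rule:
    source: str     # source-language (Catalan) word the rule applies to
    next_word: str  # word that must immediately follow it for the rule to fire
    target: str     # translation chosen when the rule fires


def select_translation(tokens: list[str], i: int,
                       defaults: dict[str, str], rules: list[Rule]) -> str:
    """Translate tokens[i]: the first matching rule wins, otherwise use the default."""
    word = tokens[i]
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    for rule in rules:
        if rule.source == word and rule.next_word == following:
            return rule.target
    # No rule matched: fall back to the default translation, marking unknown words with '*'.
    return defaults.get(word, "*" + word)


# Hypothetical data for illustration only (not entries from apertium-cat-srd).
DEFAULTS = {"banc": "bancu"}
RULES = [Rule(source="banc", next_word="central", target="banca")]

print(select_translation(["el", "banc", "central"], 1, DEFAULTS, RULES))      # banca
print(select_translation(["el", "banc", "del", "parc"], 1, DEFAULTS, RULES))  # bancu
```

The default plays the role of the translation used when no rule applies; a rule only overrides it in the context it describes.<br />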
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
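The first example above (possessive placement) can be sketched as a toy Python transfer step that looks up each word and then moves the possessive after its noun. This is a simplified illustration, not Apertium's transfer rule language; the tag names (det, poss, n), the three-word lexicon and the function names are assumptions made for the sketch.<br />

```python
Token = tuple[str, str]  # (word, simplified part-of-speech tag); tags are hypothetical

# Toy bilingual lexicon covering only "El meu llibre > Su libru meu".
BIDIX = {"el": "su", "meu": "meu", "llibre": "libru"}


def reorder_possessive(tokens: list[Token]) -> list[Token]:
    """Rewrite determiner + possessive + noun as determiner + noun + possessive."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        tags = [tag for _, tag in tokens[i:i + 3]]
        if tags == ["det", "poss", "n"]:
            det, poss, noun = tokens[i:i + 3]
            out += [det, noun, poss]
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def translate(tokens: list[Token]) -> str:
    # Lexical transfer: look each word up (lower-cased; unknown words get '*').
    looked_up = [(BIDIX.get(w.lower(), "*" + w), tag) for w, tag in tokens]
    # Structural transfer: apply the reordering pattern.
    return " ".join(w for w, _ in reorder_possessive(looked_up))


print(translate([("El", "det"), ("meu", "poss"), ("llibre", "n")]))  # su libru meu
```

Capitalisation is not restored in this sketch. The other examples (periphrastic past, choice of auxiliary, clitic placement) would each need further patterns of the same kind.<br />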
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, it measures how many words must be changed before the text can be published.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that must be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The translation WER is 13.9% (calculated on randomly chosen 600-word texts from Wikipedia).<br />
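WER is usually computed as the word-level edit distance (substitutions, insertions and deletions) between the machine translation and its post-edited reference, divided by the number of words in the reference. The Python function below is a minimal sketch of that definition; the name wer is ours, and Apertium's own evaluation scripts may tokenise text differently.<br />

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word-level edit distance between hypothesis and reference, divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j] holds the edit distance between the first (i - 1) hypothesis words
    # and the first j reference words; curr is the row being filled for word i.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute h with r (free if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


print(f"{wer('su libru', 'su libru meu'):.3f}")  # 0.333 (one missing word out of three)
```

At the reported 13.9%, a 600-word text would need roughly 83 word edits (0.139 × 600 ≈ 83).<br />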
<br />
''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (percentage of recognised words) is 94% (measured on a large Wikipedia corpus).<br />
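Coverage is the share of tokens for which the morphological analyser returns at least one analysis. A minimal sketch, assuming the Apertium stream format in which each lexical unit looks like ^surface/analysis$ and unknown words are marked with an asterisk (^word/*word$); the regular expression ignores escaped characters, and the sample stream is made up.<br />

```python
import re

# One lexical unit of the stream: ^surface/analysis1/analysis2...$
# (simplified: backslash-escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units whose analysis is not the unknown-word marker '*'."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        raise ValueError("no lexical units found")
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Made-up three-token stream; "xyzzy" stands for a word missing from the dictionary.
sample = "^la/el<det><def><f><sg>$ ^ciutat/ciutat<n><f><sg>$ ^xyzzy/*xyzzy$"
print(f"{coverage(sample):.1%}")  # 66.7%
```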
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (29 July - 29 August)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We began manually disambiguating part of our corpora morphologically, which should help the translator recognise the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in fully standard LSC. About 6,000 words of the first and 11,800 of the second were tagged.<br />
<br />
Nine new transfer rules were added and some existing ones corrected. Tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some cases of enclitics also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding four morphological disambiguation rules (an important one disambiguates "sono" as either "so" or "sunt"). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of ita-srd quickly.<br />
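The "sono" case mentioned above can be illustrated with a toy rule: Italian "sono" is either first person singular ("io sono", Sardinian "so") or third person plural ("loro sono", Sardinian "sunt"), and a preceding subject pronoun can select the reading. The Python sketch below is hypothetical: the reading labels, the cue list and the function name are invented for illustration and are not the rules actually added to the Italian dictionary.<br />

```python
# Readings of Italian "sono" and their Sardinian translations (from the example in the text).
TRANSLATION = {"p1.sg": "so", "p3.pl": "sunt"}

# Hypothetical cue words: a preceding subject pronoun selects one reading.
CUES = {"io": "p1.sg", "loro": "p3.pl", "essi": "p3.pl", "esse": "p3.pl"}


def disambiguate_sono(tokens: list[str], i: int) -> list[str]:
    """Return the readings of tokens[i] ("sono") that survive the rule."""
    if tokens[i].lower() != "sono":
        raise ValueError("this toy rule only handles 'sono'")
    previous = tokens[i - 1].lower() if i > 0 else None
    if previous in CUES:
        return [CUES[previous]]
    return sorted(TRANSLATION)  # no cue: keep both readings for later stages


for sentence in (["io", "sono", "qui"], ["loro", "sono", "qui"], ["sono", "qui"]):
    readings = disambiguate_sono(sentence, sentence.index("sono"))
    print(sentence, "->", [TRANSLATION[r] for r in readings])
```

Leaving the word ambiguous when no cue is present is a deliberate choice here; a real disambiguator would combine many more contexts, such as agreement with a plural noun subject.<br />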
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work from the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64141Apertium cat-srd and ita-srd/GSoC 20172017-08-28T16:25:33Z<p>Grfro3d: /* Quality */</p>
<hr />
<div>== Google Summer of Code 2017: Gianfranco Fronteddu, final report ==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for participation in the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian and the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year's work.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, spanning June and July, devoted to Catalan-Sardinian, and a shorter one in August, to begin preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to work on srd-ita in August as well, provided the objectives of the first, longer phase were met on schedule. Once we had confirmed that the cat-srd results at the end of July were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first coding phase, lasting until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to bring the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving parts of the Italian morphological analyser, this year we started from two languages already developed on the Apertium platform. This let us focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was done to create "pending tests". Referring also to the ita-srd "pending tests", this highlighted structural differences in numerals, possessive forms, obligation formulas and continuous forms, the past, future and conditional tenses, and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had one of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia].<br />
The dictionary was then revised, removing many non-normative words and correcting errors in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing rules were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, it measures how many words must be changed before the text can be published.<br />
<br />
''[http://wiki.apertium.org/wiki/WER Word Error Rate (WER)]'' measures the proportion of words that must be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The translation WER is 13.9% (calculated on randomly chosen 600-word texts from Wikipedia).<br />
<br />
''[http://wiki.apertium.org/wiki/Calculating_coverage Coverage]'' (percentage of recognised words) is 94% (measured on a large Wikipedia corpus).<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (29 July - 29 August)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We began manually disambiguating part of our corpora morphologically, which should help the translator recognise the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in fully standard LSC. About 6,000 words of the first and 11,800 of the second were tagged.<br />
<br />
Nine new transfer rules were added and some existing ones corrected. Tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some cases of enclitics also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding four morphological disambiguation rules (an important one disambiguates "sono" as either "so" or "sunt"). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of ita-srd quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work from the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64140Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:16:44Z<p>Grfro3d: /* Prima fase: Apertium cat-srd (29 de Maju - 29 de Trìulas) */</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for participation in the “Google Summer of Code 2017” programme with the Apertium organisation was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font and Francis Tyers took part in this project.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, carried out in August, laying the groundwork for a new Sardinian-Italian translator.<br />
<br />
The decision to develop another machine translator for Sardinian, this time Catalan-Sardinian, was due to several reasons.<br />
First, the endangered state of the Sardinian language and the need to produce as many tools as possible in Sardinian, especially in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard proposed in 2006 and adopted as the reference by Apertium. This makes work like this useful for protecting Sardinian and for the full standardisation of the language. So, building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying an even broader foundation for further work in the future.<br />
The choice of Catalan, on the other hand, is due above all to the fact that Catalan is, like Sardinian, one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino), spoken in the town of Alghero (S'Alighera) by about 33,000 speakers. Moreover, Catalan is one of the languages for which Apertium has the most resources.<br />
Furthermore, this open-source platform is well suited to closely related Romance languages such as Catalan and Sardinian, which meet this requirement because of the Catalan influence and linguistic heritage present in Sardinian, dating from the period of Catalan-Aragonese rule in Sardinia. In this way, the large body of texts, records and materials in Catalan about Sardinian history will also become available in Sardinian itself, as will the publications and studies in sociolinguistics and language policy that are relevant to the situation of the Sardinian language.<br />
<br />
This translator could have many uses: the growth of the Sardinian Wikipedia could also come from translating articles that are less detailed in Italian or that emphasise different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way of expressing the same concepts.<br />
<br />
To complete the work plan, we intended to do some work on srd-ita in August as well, provided that the objectives of the first, longer phase were met on schedule. Once we had confirmed at the end of July that the cat-srd results were good, we decided to devote ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first phase covered the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a Word Error Rate (WER) of 34.8%. The goal was to reach 90% coverage and to bring the WER below 15%.<br />
Unlike last year, when developing the Italian-Sardinian translator required building practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyser, this year we started from two languages already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Catalan and Sardinian was done to create the "pending tests". Taking the ita-srd "pending tests" as a reference as well, structural differences emerged in interrogative forms, numerals, possessives, obligation formulas and continuous forms, the past, future and conditional tenses, and above all clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
As for the Sardinian morphological dictionary, we already had one with 51,800 words (including the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu]), developed during the previous GSoC. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of these are scientific and technical terms and socio-political vocabulary. Except for the proper names, for which mostly other criteria were followed, the words to add were selected starting from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia].<br />
The dictionary was then revised, removing many non-normative words and correcting errors in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for the Catalan dictionary, and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These rules choose which of two or more possible translations is most appropriate in a given context. (In addition, the bilingual dictionary contains hundreds of choices between different possible translations of a word but, unlike the rules, these choices apply regardless of context.)<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These rules modify the structure of the Catalan sentence to fit the structure required in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment tests how the translator performs in practice. There are many ways to do it, and the choice of texts depends on how the translator is going to be used; in short, it measures how many words must be changed before the text can be published.<br />
<br />
The Word Error Rate (WER) measures how many words must be changed before the text can be published. According to the "Work Plan", the goal was a rate below 15%. The translation error rate is 13.9% (WER calculated on two randomly chosen Wikipedia texts of 600 words).<br />
<br />
The translator's ''coverage'' (percentage of recognised words) is 94.0% (measured on a large Wikipedia corpus).<br />
<br />
===Catalan text (chosen at random)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. We began the manual morphological disambiguation of part of our corpora, to help the translator recognise the correct morphology of each word, especially in texts that do not fully follow the standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts fully conforming to LSC. About 6,000 words of the first corpus were tagged, and 11,800 of the second. Lack of time (two weeks for the task) did not allow us to review the tagging, so it was not possible to build a morphological disambiguator for Sardinian as we had intended.<br />
Nine new transfer rules were also added, and some existing ones corrected. In practice, verb tenses are now translated correctly from Sardinian to Italian (in particular the future and the conditional). The translation of possessives has improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also did some work on the Italian dictionary, adding four morphological disambiguation rules (an important one among the four disambiguates "sono" as "so" or "sunt"). We also added a list of the countries of the world (provided by Diegu Corràine) and derived the corresponding demonyms from it. In total, the Italian-Sardinian bilingual dictionary has 1,400 more words than at the start of GSoC. Cleaning errors out of the Sardinian dictionary and adding these entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd. Completing the work we started is needed to build a morphological disambiguator, which will be useful for disambiguating the corpora and indispensable for developing other language pairs with Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results we can consider good. I would like to thank Apertium for this opportunity, Francis Tyers, and above all Hèctor Alòs i Font for working with me throughout these months.<br />
We thank Diegu Corràine, who helped us a great deal by giving us many materials for this project and by advising us well whenever we asked him something about the Sardinian language. Thanks also to Sa Gazeta for the texts that are part of the corpus we built for this work. We must also mention the Regione Sardigna, which has made available free resources and tools that support the creation of new open-source tools like the one we have just prepared.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64139Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:15:53Z<p>Grfro3d: /* Descritzione de su traballu */</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for participation in the “Google Summer of Code 2017” programme with the Apertium organisation was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font and Francis Tyers took part in this project.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, carried out in August, laying the groundwork for a new Sardinian-Italian translator.<br />
<br />
The decision to develop another machine translator for Sardinian, this time Catalan-Sardinian, was due to several reasons.<br />
First, the endangered state of the Sardinian language and the need to produce as many tools as possible in Sardinian, especially in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard proposed in 2006 and adopted as the reference by Apertium. This makes work like this useful for protecting Sardinian and for the full standardisation of the language. So, building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying an even broader foundation for further work in the future.<br />
The choice of Catalan, on the other hand, is due above all to the fact that Catalan is, like Sardinian, one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino), spoken in the town of Alghero (S'Alighera) by about 33,000 speakers. Moreover, Catalan is one of the languages for which Apertium has the most resources.<br />
Furthermore, this open-source platform is well suited to closely related Romance languages such as Catalan and Sardinian, which meet this requirement because of the Catalan influence and linguistic heritage present in Sardinian, dating from the period of Catalan-Aragonese rule in Sardinia. In this way, the large body of texts, records and materials in Catalan about Sardinian history will also become available in Sardinian itself, as will the publications and studies in sociolinguistics and language policy that are relevant to the situation of the Sardinian language.<br />
<br />
This translator could have many uses: the growth of the Sardinian Wikipedia could also come from translating articles that are less detailed in Italian or that emphasise different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way of expressing the same concepts.<br />
<br />
Pro cumpletare su pranu de traballu, bi fiat sa voluntade de fàghere calicuna cosa in su mese de austu fintzas pro srd-ita, semper chi sos obietivos de sa prima fase prus longa s'èsserent cumpridos in sos tempos disinnados. Comente amus pòdidu averguare chi sos risultados pro cat-srd a ùrtimos de Trìulas fiant bonos amus, detzìdidu de nos dedicare a srd-ita.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section. It started with about 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to bring the WER below 15%.<br />
The situation differed from last year. Developing the Italian-Sardinian translator then required building practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyser. This year we started from two languages that were already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we worked extensively on a contrastive analysis of Catalan and Sardinian in order to write the "pending tests". We also used the ita-srd "pending tests" as a reference. Structural differences emerged in interrogatives, numerals, possessives, expressions of obligation, continuous forms, the past, future and conditional tenses, and above all clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
We already had a Sardinian morphological dictionary with 51,800 words, developed during the previous GSoC. It included the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu]. We added 15,500 more words: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. Most of them are scientific and technical terms and socio-political vocabulary. The words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia], except for the proper nouns, for which other criteria were mostly followed.<br />
The dictionary was then cleaned up. Many non-standard forms were removed, and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
About 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These rules choose which of two or more possible translations is the most appropriate in a given context. (In addition, the bilingual dictionary contains hundreds of choices between different possible translations of a word. Unlike the rules, however, these choices apply in every context.)<br />
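The following toy sketch in Python makes the distinction concrete by showing context-driven lexical selection. It is not Apertium's rule formalism, which uses its own XML rule files. The rule format, the window size and all words in the example are hypothetical placeholders. A rule fires when a given context word appears near the ambiguous word. Otherwise a default translation is kept, much like the context-independent choice made in the bilingual dictionary.

```python
from typing import Sequence, Tuple

# Hypothetical rule format: (source word, context word, preferred translation).
# Real Apertium lexical-selection rules are written in XML and are richer than this.
Rule = Tuple[str, str, str]


def select_translation(
    tokens: Sequence[str],
    position: int,
    candidates: Sequence[str],
    rules: Sequence[Rule],
    window: int = 2,
) -> str:
    """Pick one of several candidate translations for tokens[position].

    A rule fires when its context word occurs within `window` tokens on
    either side; otherwise the first (default) candidate is kept.
    """
    source = tokens[position]
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    context = set(left) | set(right)
    for src, ctx, target in rules:
        if src == source and ctx in context and target in candidates:
            return target
    return candidates[0]


# All words below are placeholders, not real dictionary entries.
rules = [("ambiguous", "clue", "translation_b")]
print(select_translation(["x", "ambiguous", "clue"], 1, ["translation_a", "translation_b"], rules))  # translation_b
print(select_translation(["x", "ambiguous", "y"], 1, ["translation_a", "translation_b"], rules))     # translation_a
```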
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These rules modify the structure of the Catalan sentence to adapt it to the structure required in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu (''my book'')<br />
<br />
Vaig menjar > Apo mandigadu (''I ate'')<br />
<br />
He anat > So andadu (''I have gone'')<br />
<br />
Vull saludar-lo > Lu chèrgio saludare (''I want to greet him'')<br />
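The first example can be illustrated with a minimal Python sketch of one structural transfer rule, which moves the possessive after the noun. This is only an illustration under simplifying assumptions. The coarse tags (`det`, `poss`, `n`) and the tuple representation are hypothetical. Real Apertium transfer rules are written in Apertium's own XML format and operate on full morphological tags.

```python
from typing import List, Tuple

# (surface form, coarse tag); the tag names are a simplified, hypothetical tagset.
Token = Tuple[str, str]


def reorder_possessive(tokens: List[Token]) -> List[Token]:
    """Rewrite every DET POSS NOUN sequence as DET NOUN POSS."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if [tag for _, tag in window] == ["det", "poss", "n"]:
            det, poss, noun = window
            out.extend([det, noun, poss])  # the possessive moves after the noun
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


# "El meu llibre", with each word already looked up in a toy bilingual dictionary:
lexical = [("su", "det"), ("meu", "poss"), ("libru", "n")]
print(" ".join(form for form, _ in reorder_possessive(lexical)))  # su libru meu
```

The sketch assumes bilingual lookup has already happened. It therefore only shows the reordering step that distinguishes the Catalan and Sardinian word order.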
<br />
===Quality===<br />
Quality evaluation tests how the translator performs in practice. There are many ways to carry it out, and the choice of test texts depends on what the translator will be used for. In short, it measures how many words have to be changed before the text can be published.<br />
<br />
The word error rate (WER) is the metric that gives the proportion of words that have to be changed before the text can be published. According to the "Work plan", the goal was a rate below 15%. The error rate of the translation is 13.9%, measured with WER on two randomly chosen 600-word texts from Wikipedia.<br />
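The following minimal Python sketch shows how such a figure can be computed. It assumes the usual definition of WER: the word-level edit distance (substitutions, insertions, deletions) between the raw machine translation and its post-edited, publishable version, divided by the length of the post-edited version. Apertium's own evaluation scripts may differ in details such as tokenisation and punctuation handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference, over reference length.

    reference:  the post-edited (publishable) text
    hypothesis: the raw machine translation
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("su libru meu", "su libru"))  # 0.333... (one missing word out of three)
```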
<br />
The translator's ''coverage'' (percentage of recognised words) is 94.0%, measured on a large Wikipedia corpus.<br />
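Coverage can be measured on the output of the morphological analyser, where Apertium marks unknown words with an asterisk (for example `^xyz/*xyz$`). The following Python sketch assumes that stream format and ignores escaped special characters. The sample analyses are illustrative only, and the function name is hypothetical.

```python
import re

# Matches one lexical unit ^...$ in Apertium stream format.
# Simplification: escaped \^ and \$ inside words are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text: str) -> float:
    """Share of lexical units with at least one analysis (unknown words start with '*')."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        raise ValueError("no lexical units found")
    known = 0
    for unit in units:
        analyses = unit.split("/")[1:]  # the first field is the surface form
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / len(units)


sample = "^Su/su<det><def><m><sg>$ ^libru/libru<n><m><sg>$ ^xyz/*xyz$"
print(f"{coverage(sample):.1%}")  # 66.7%
```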
<br />
===Catalan text (chosen at random)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. We managed to begin manual morphological disambiguation of part of our corpora. This helps the translator recognise the correct morphology of each word, especially in texts that did not fully follow the standard.<br />
We worked on two corpora: a journalistic one that is more dialectal, and another taken directly from literary texts that fully conform to the LSC standard. About 6,000 words of the first corpus and 11,800 words of the second were tagged. Lack of time (two weeks for the task) did not allow us to review the tagging. For this reason it was not possible to build a morphological disambiguator for Sardinian, as we had intended.<br />
We also added 9 new transfer rules and corrected some of the existing ones. In practice, verb tenses are now translated correctly from Sardinian into Italian (specifically the future and the conditional). The translation of possessives has been improved, and some cases of enclitics are now handled (Sardinian can have up to three enclitics, whereas Italian cannot have more than two).<br />
<br />
We also did some work on the Italian dictionary and added 4 morphological disambiguation rules. An important one was the disambiguation of "sono" as "so" (''I am'') or "sunt" (''they are''). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms from it. In total, the Italian-Sardinian bilingual dictionary has 1,400 more entries than at the start of GSoC. Cleaning up the errors in the Sardinian dictionary and adding these entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
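The following toy Python sketch shows the kind of decision the "sono" rule has to make. It assumes the only evidence is an explicit subject pronoun right before the verb. The two readings are shown directly as the Sardinian forms. The cue lists and the function are hypothetical. The rules actually added to the Italian side are not shown in this report and may rely on other context.

```python
from typing import Optional

# Toy cue lists (an assumption for illustration): subject pronouns that fix the reading.
FIRST_SINGULAR = {"io"}
THIRD_PLURAL = {"loro", "essi", "esse"}


def translate_sono(previous_word: Optional[str]) -> Optional[str]:
    """Choose Sardinian 'so' (I am) or 'sunt' (they are) for Italian 'sono'.

    Returns None when the preceding word gives no evidence, leaving the
    decision to other rules or to a statistical tagger.
    """
    if previous_word is None:
        return None
    word = previous_word.lower()
    if word in FIRST_SINGULAR:
        return "so"
    if word in THIRD_PLURAL:
        return "sunt"
    return None


print(translate_sono("Io"))    # so
print(translate_sono("loro"))  # sunt
print(translate_sono("ieri"))  # None
```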
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ LSC spell checker]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyser]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Italian-Sardinian glossary]<br />
* [http://limbasnatziones.tempusnostru.it/home.page/ limbasnatziones.tempusnostru.it]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work from both phases of the project will be used to create a more accurate and up-to-date version of ita-srd. Completing the work we started will lead to a morphological disambiguator. It will be useful for disambiguating corpora and indispensable for developing other language pairs with Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results that we consider good. I would like to thank Apertium for this opportunity, and Francis Tyers and, above all, Hèctor Alòs i Font for working with me throughout these months.<br />
We thank Diegu Corràine, who helped us a great deal by providing many materials for this project and by giving us sound advice every time we asked him something about the Sardinian language. Thanks also to Sa Gazeta for the texts that are part of the corpus we built for this work. We must also mention the Regione Sardigna (the Sardinian regional government), which has made available free resources and tools that support the creation of new open-source tools like the one we have just built.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64138Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:13:50Z<p>Grfro3d: /* Règulas de trasferimentu */</p>
Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64137Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:12:26Z<p>Grfro3d: /* Prima fase: Apertium cat-srd (29 de Maju - 29 de Trìulas) */</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for the “Google Summer of Code 2017” programme with the Apertium organisation had two parts. It developed a rule-based machine translator between Catalan and Sardinian, and it continued last year's project, apertium ita-srd. The idea came from the wish to develop another tool to support the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
Gianfranco Fronteddu, Hèctor Alòs i Font and Francis Tyers took part in this project.<br />
<br />
As described in the “Work plan”, there were two phases. The longer one, lasting through June and July, was devoted to Catalan-Sardinian. The shorter one, carried out in August, laid the groundwork for a new Sardinian-Italian translator.<br />
<br />
The decision to develop another machine translator for Sardinian, this time Catalan-Sardinian, was motivated by several reasons.<br />
First, there is the endangered state of the Sardinian language and the need to increase, as far as possible, the activities carried out in Sardinian. This applies especially to open-source technology and to Limba Sarda Comuna (LSC), the regional orthographic proposal of 2006 that Apertium takes as its reference. A project like this one therefore helps protect Sardinian and supports the full standardisation of the language. Building on last year's work, we decided to create another language pair and to improve the existing one. This increases the amount of resources in LSC and lays an even broader foundation for future work.<br />
The choice of Catalan in particular is due, first of all, to the fact that Catalan, like Sardinian, is one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino). It is spoken in the city of Alghero by about 33,000 speakers. Moreover, Catalan is one of the languages for which Apertium has the most resources.<br />
Furthermore, this open-source platform is well suited to closely related Romance languages. Catalan and Sardinian meet this requirement because of the Catalan influence and linguistic heritage present in Sardinian, dating from the period of Catalan-Aragonese rule in Sardinia. As a result, the large body of texts, records and materials in Catalan on Sardinian history will also become available in Sardinian itself. The same applies to the publications and studies on sociolinguistics and language policy that are relevant to the situation of Sardinian.<br />
<br />
This translator could have many uses. The Sardinian Wikipedia could also grow by translating articles that are less detailed in Italian or that emphasise different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way of expressing the same concepts.<br />
<br />
To complete the work plan, we also intended to do something for srd-ita in August, provided that the goals of the first, longer phase were met on schedule. By the end of July we could confirm that the results for cat-srd were good, so we decided to devote the remaining time to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section. It started with about 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to bring the WER below 15%.<br />
The situation differed from last year. Developing the Italian-Sardinian translator then required building practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyser. This year we started from two languages that were already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we worked extensively on a contrastive analysis of Catalan and Sardinian in order to write the "pending tests". We also used the ita-srd "pending tests" as a reference. Structural differences emerged in interrogatives, numerals, possessives, expressions of obligation, continuous forms, the past, future and conditional tenses, and above all clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
We already had a Sardinian morphological dictionary with 51,800 words, developed during the previous GSoC. It included the lemmas of the [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore Ortogràficu Regionale Sardu]. We added 15,500 more words: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. Most of them are scientific and technical terms and socio-political vocabulary. The words to add were selected from the [https://ca.wikipedia.org/wiki/Portada/ Catalan Wikipedia], except for the proper nouns, for which other criteria were mostly followed.<br />
The dictionary was then cleaned up. Many non-standard forms were removed, and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
About 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These rules choose which of two or more possible translations is the most appropriate in a given context. (In addition, the bilingual dictionary contains hundreds of choices between different possible translations of a word. Unlike the rules, however, these choices apply in every context.)<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These rules modify the structure of the Catalan sentence to adapt it to the structure required in Sardinian. For example:<br />
El meu llibre > Su libru meu (''my book'')<br />
Vaig menjar > Apo mandigadu (''I ate'')<br />
He anat > So andadu (''I have gone'')<br />
Vull saludar-lo > Lu chèrgio saludare (''I want to greet him'')<br />
<br />
===Quality===<br />
Quality evaluation tests how well the translator works in practice. There are many ways to do it, and the test texts are chosen according to how the translator is meant to be used. In short, the evaluation measures how many words have to be changed before the text can be published.<br />
<br />
The word error rate (WER) is the metric that gives the proportion of words that have to be changed before the text can be published. According to the "Work plan", the goal was to bring it below 15%. The error rate of the translation is 13.9% (WER computed on two randomly selected Wikipedia texts of 600 words).<br />
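WER is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the machine output and a reference, divided by the number of words in the reference. The following is a minimal Python sketch. It assumes the reference is the post-edited version of the output and that words are split on whitespace. Apertium's own evaluation tools may tokenise differently, so figures need not match exactly.<br />

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    reference:  the post-edited (publishable) text
    hypothesis: the raw machine translation
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # reference word missing from the hypothesis
                cur[j - 1] + 1,          # extra word in the hypothesis
                prev[j - 1] + (r != h),  # substitution (costs 0 if the words match)
            )
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("sa tzitade arta", "sa tzitade alta")
print(f"{wer:.1%}")  # 33.3%
print(wer < 0.15)    # False
```

Under this definition, the reported 13.9% corresponds to roughly 14 word edits per 100 words of reference text.<br />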
<br />
''Coverage'' (the percentage of words the translator recognises) is 94.0% (measured on a large Wikipedia corpus).<br />
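Coverage can be computed from the morphological analyser's output in the Apertium stream format (for example the output of `lt-proc`), where words the dictionary does not know are marked with `*`. The Python sketch below counts every lexical unit, punctuation included. It ignores escaped special characters. The sample stream is made up.<br />

```python
import re

# A lexical unit in the Apertium stream format: ^surface/analysis1/analysis2...$
UNIT = re.compile(r"\^([^$]*)\$")


def coverage(analysed: str) -> float:
    """Share of lexical units that received at least one analysis.

    Assumes unknown words look like ^form/*form$ (analysis prefixed with '*')
    and ignores escaped '^', '/' or '$' inside words.
    """
    units = UNIT.findall(analysed)
    if not units:
        raise ValueError("no lexical units found")
    known = 0
    for unit in units:
        _, _, analyses = unit.partition("/")
        if analyses and not analyses.startswith("*"):
            known += 1
    return known / len(units)


stream = "^la/el<det><def><f><sg>$ ^ciutat/ciutat<n><f><sg>$ ^alta/alt<adj><f><sg>$ ^Cecròpia/*Cecròpia$"
print(f"{coverage(stream):.1%}")  # 75.0%
```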
<br />
===Catalan text (randomly selected)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. We managed to begin a manual morphological disambiguation of part of our corpora. The aim was to help the translator recognise the correct morphology of each word, especially in texts that did not fully conform to the standard.<br />
We worked on two corpora: a journalistic one, closer to dialectal usage, and one taken directly from literary texts that fully conform to the LSC standard. Around 6,000 words of the first corpus were tagged, and 11,800 of the second. Lack of time (two weeks for the task) did not allow the tagging to be reviewed. As a result, it was not possible to build a morphological disambiguator for Sardinian, as we had intended.<br />
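A hand-tagged corpus like the ones just described is the training material for a statistical disambiguator. Any tagging errors left unreviewed would be learned as if they were correct. The simplest possible illustration is a most-frequent-tag baseline, sketched below in Python. Apertium's own statistical `apertium-tagger` is considerably more sophisticated, and the tag names and corpus fragment here are assumptions.<br />

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged: list[tuple[str, str]]) -> dict[str, str]:
    """For every word form in a hand-tagged corpus, keep its most frequent tag."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for form, tag in tagged:
        counts[form.lower()][tag] += 1
    return {form: tags.most_common(1)[0][0] for form, tags in counts.items()}


def tag_words(words: list[str], model: dict[str, str], unknown: str = "UNK") -> list[str]:
    """Tag each word with its most frequent training tag, or `unknown` if unseen."""
    return [model.get(word.lower(), unknown) for word in words]


# Made-up hand-tagged fragment (not from the project corpora).
corpus = [("Sa", "det"), ("tzitade", "n"), ("est", "vbser"), ("bella", "adj"),
          ("sa", "det"), ("bella", "adj"), ("sa", "det"), ("bella", "n")]
model = train_most_frequent_tag(corpus)
print(tag_words(["sa", "bella", "domo"], model))  # ['det', 'adj', 'UNK']
```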
We also added 9 new transfer rules and corrected some of the existing ones. In practice, verb tenses are now translated correctly from Sardinian into Italian (specifically the future and the conditional). The translation of possessives has improved, and some cases of enclitics are handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
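Sardinian forms the future and the conditional periphrastically (e.g. *apo a cantare*, *dia cantare*), while Italian uses single synthetic forms (*canterò*, *canterei*). The Python sketch below shows, under simplified assumptions, how a transfer step can collapse such a chunk into a single verb token for the Italian generator. It uses Apertium-style tags such as `fti` (future indicative) and `cni` (conditional). The `cond_aux` tag and the chunk representation are hypothetical.<br />

```python
Token = tuple[str, tuple[str, ...]]  # (lemma, tags)
PERSON_NUMBER = ("p1", "p2", "p3", "sg", "pl")


def person_number(tags: tuple[str, ...]) -> tuple[str, ...]:
    """Keep only the person and number tags, in their original order."""
    return tuple(t for t in tags if t in PERSON_NUMBER)


def srd_to_ita_tense(chunk: list[Token]) -> list[Token]:
    """Collapse a Sardinian periphrastic tense into one Italian verb token.

    future:      àere<pri> + a + infinitive  ->  verb<fti>  (apo a cantare -> canterò)
    conditional: aux<cond> + infinitive      ->  verb<cni>  (dia cantare -> canterei)
    Person and number move from the auxiliary to the main verb.
    """
    if (len(chunk) == 3 and chunk[0][0] == "àere" and "pri" in chunk[0][1]
            and chunk[1][0] == "a" and "inf" in chunk[2][1]):
        return [(chunk[2][0], ("vblex", "fti") + person_number(chunk[0][1]))]
    if len(chunk) == 2 and "cond_aux" in chunk[0][1] and "inf" in chunk[1][1]:
        return [(chunk[1][0], ("vblex", "cni") + person_number(chunk[0][1]))]
    return chunk


future = [("àere", ("vbhaver", "pri", "p1", "sg")), ("a", ("pr",)), ("cantare", ("vblex", "inf"))]
print(srd_to_ita_tense(future))  # [('cantare', ('vblex', 'fti', 'p1', 'sg'))]
```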
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules. An important one among the four disambiguates "sono" as either "so" ('I am') or "sunt" ('they are'). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms from it. In total, the Italian-Sardinian bilingual dictionary has 1,400 more entries than at the start of GSoC. Cleaning up the errors in the Sardinian dictionary and adding these entries will soon make it possible to prepare a new version of the Italian-Sardinian translator.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ LSC spell checker]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyser]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Italian-Sardinian glossary]<br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work from the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd. Finishing the work we started is needed to build a morphological disambiguator. This will be useful for disambiguating the corpora and indispensable for developing other language pairs involving Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results that we can consider good. I would like to thank Apertium for this opportunity, and Francis Tyers and above all Hèctor Alòs i Font for working with me all these months.<br />
We thank Diegu Corràine, who helped us a great deal by providing many materials for the preparation of this project and by giving us sound advice whenever we asked him anything about the Sardinian language. Thanks also to Sa Gazeta for the texts that form part of the corpus we built for this work. We must also mention the Regione Sardigna, which has made available free resources and tools that support the creation of new open-source tools such as the one we have just prepared.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64136Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:08:34Z<p>Grfro3d: /* Descritzione de su traballu */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64135Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:07:14Z<p>Grfro3d: /* Risorsas */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd/_Apertium_ita-srd:_relata_finale&diff=64134Apertium cat-srd/ Apertium ita-srd: relata finale2017-08-28T16:05:24Z<p>Grfro3d: Created page with "==Descritzione de su traballu== Su progetu pro sa partetzipatzione a su programma “Google Summer of Code 2017” cun s’organizatzione Apertium est istadu s’isvilupu de ..."</p>
<hr />
<div>==Description of the work==<br />
<br />
The project for taking part in the “Google Summer of Code 2017” program with the Apertium organization was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the wish to develop another tool to support the Sardinian language, following the same path as last year's work on apertium ita-srd.<br />
<br />
As can be seen in the “Work plan”, there were two phases: a longer one, which lasted through June and July, devoted to Catalan-Sardinian, and a shorter one, carried out in August, which laid the foundations for a new Sardinian-Italian translator.<br />
<br />
The choice to develop another machine translator for Sardinian, this time Catalan-Sardinian, was made for several reasons.<br />
The first is the endangered state of the Sardinian language and the need to carry out as many projects as possible in Sardinian, especially in open-source technology and in Limba Sarda Comuna (LSC), the regional orthographic standard proposed in 2006 and adopted as the reference by Apertium. This makes work like this a help both in protecting Sardinian and in fully standardizing the language. So, building on last year's work, we decided to create another language pair and to improve the existing one, increasing the amount of resources in LSC and laying an even broader foundation for further work in the future.<br />
The choice of Catalan in particular is due above all to the fact that Catalan, like Sardinian, is one of the five minority languages of Sardinia (Sardinian, Algherese Catalan, Gallurese, Sassarese and Tabarchino); it is spoken in the town of Alghero (S'Alighera) by about 33,000 people. In addition, Catalan is one of the languages for which Apertium has the most resources.<br />
Moreover, this open-source platform is well suited to closely related Romance languages such as Catalan and Sardinian, which meet this requirement because of the Catalan influence and linguistic heritage in Sardinian, dating from the period of Catalan-Aragonese rule in Sardinia. In this way, the large body of texts, records and materials in Catalan about Sardinian history will also become available in Sardinian itself, as will the publications and studies on sociolinguistics and language policy that are relevant to the situation of the Sardinian language.<br />
<br />
This translator could have many uses: the Sardinian Wikipedia could also grow by translating articles that are less detailed in Italian or that emphasize different aspects. Translating from another language such as Catalan can further increase the amount of available information, while also offering another way of expressing the same concepts.<br />
<br />
To complete the work plan, we also intended to do some work on srd-ita in August, provided that the objectives of the first, longer phase had been met on schedule. Having verified at the end of July that the cat-srd results were good, we decided to devote ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (29 May - 29 July)==<br />
The first phase concerned the development of the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was in the "staging" section and started with about 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
Unlike last year, when developing the Italian-Sardinian translator required building practically the whole Sardinian morphological dictionary and also improving parts of the Italian morphological analyser, this year we started from two languages that were already well developed on the Apertium platform. We could therefore concentrate exclusively on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 schedule, in May and the first week of June (the "Community Bonding" period) much work went into a contrastive analysis of Catalan and Sardinian to create the "pending tests". Using the ita-srd "pending tests" as a reference as well, we found structural differences in interrogative forms, numerals, possessives, expressions of obligation and continuous forms, the past, future and conditional tenses and, above all, clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
As for the Sardinian morphological dictionary, we already had one with 51,800 words (including the lemmas of the Curretore Regionale Ortogràficu Sardu), developed during the previous GSoC. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. These are mostly scientific and technical terms and socio-political vocabulary. Except for the proper nouns, for which other criteria were mostly followed, the words to add were selected starting from the Catalan Wikipedia.<br />
The dictionary was then cleaned up, removing many non-standard words and correcting errors in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
About 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These rules choose which of two or more possible translations is the most appropriate in a given context. (Separately, the bilingual dictionary contains hundreds of fixed choices among the possible translations of a word; unlike the rules, these choices apply in every context.)<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These rules modify the structure of the Catalan sentence to fit the structure required in Sardinian. For example:<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality evaluation is used to test how the translator works in practice. There are many ways to carry it out, and the choice of test texts depends on how the translator is meant to be used; put simply, it measures how many words have to be changed before the text can be published.<br />
<br />
The ''Word Error Rate'' (WER) is the metric that measures the proportion of words that must be changed before the text can be published. According to the "Work plan", the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on two randomly chosen 600-word texts from Wikipedia).<br />
<br />
''Coverage'' of the translator (the percentage of words recognized) is 94.0% (measured on a large Wikipedia corpus).<br />
<br />
===Text in Catalan (chosen at random)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la "ciutat alta" i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa "tzitade arta" e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 2017)==<br />
In the second phase of the project we worked towards a new srd-ita translator. What we managed to do was start a manual morphological disambiguation of part of our corpora, to help the translator recognize the correct morphology of each word, especially in texts that did not fully follow the standard.<br />
We worked on two corpora: a journalistic one, more dialectal, and another taken directly from literary texts fully conforming to the LSC standard. Of the first corpus, 6,000 words were tagged; of the second, 11,800. Lack of time (two weeks for the task) did not allow us to review the tagging, so it was not possible to build a morphological disambiguator for Sardinian as we had intended.<br />
We also added 9 new transfer rules and corrected some of the existing ones. Verb tenses are now translated correctly from Sardinian to Italian (in particular the future and the conditional). The translation of possessives has improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also did some work on the Italian dictionary, adding 4 morphological disambiguation rules (an important one among them disambiguates "sono" as either "so" (I am) or "sunt" (they are)). In addition, we added a list of the world's countries (provided by Diegu Corràine) and from it we also derived the corresponding demonyms. In total, the Italian-Sardinian bilingual dictionary has 1,400 more words than at the start of GSoC. Cleaning the errors out of the Sardinian dictionary and adding the new entries will make it possible to prepare a new version of the Italian-Sardinian translator shortly.<br />
<br />
==Resources==<br />
* Curretore ortogràficu LSC<br />
* Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari<br />
* Normativa ortografica Limba Sarda Comuna<br />
* Analitzadore hunspell<br />
* Glossàriu italianu-sardu<br />
* Limbanaztiones.com<br />
* Vocabolario Sardo Logudorese-Italiano<br />
* Diccionari de la llengua catalana<br />
* Sa Gazeta<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used to create a more accurate and up-to-date version of ita-srd. Completing the work we started will make it possible to build a morphological disambiguator, which will be useful for disambiguating the corpora and will be indispensable for developing other language pairs involving Sardinian in the future.<br />
<br />
==Conclusions==<br />
The project ended with results that we can consider good. I would like to thank Apertium for this opportunity, and Francis Tyers and especially Hèctor Alòs i Font for working with me throughout these months.<br />
We thank Diegu Corràine, who helped us a great deal by providing plenty of material for preparing this project and by giving us good advice whenever we asked him about the Sardinian language. Thanks also to Sa Gazeta for the texts that make up part of the corpus we built for this work. We must also mention the Region of Sardinia (Regione Sardigna), which made available free resources and tools that support the creation of new open-source tools such as the one we have just prepared.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64122Apertium cat-srd and ita-srd/GSoC 20172017-08-28T13:37:05Z<p>Grfro3d: /* Description */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for taking part in the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also do some work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Having verified at the end of July that the cat-srd results were good, we decided to devote ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already well developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 schedule, in May and the first week of June (the "Community Bonding" period) a great deal of contrastive analysis between Catalan and Sardinian was done to create "pending tests". Using the ita-srd "pending tests" as a reference as well, we identified structural differences in numerals, possessives, expressions of obligation, continuous forms, the past, future and conditional tenses, and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. They are mostly scientific and technical terminology and socio-political vocabulary. Except for proper nouns, for which other criteria were followed, the words to add were selected starting from the [https://ca.wikipedia.org/wiki/Portada/Catalan Catalan Wikipedia].<br />
The dictionary was then cleaned up, removing many non-standard words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
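To make the idea concrete, here is a toy sketch of context-based lexical selection in Python. It is not Apertium's lexical selection rule format or its implementation. The Catalan word ''estació'' (which can mean "station" or "season") and the context words are assumptions chosen for illustration. The English glosses in angle brackets stand in for the Sardinian lemmas a real rule would pick.

```python
# Toy context-based lexical selection (illustrative only; not Apertium's rule format).
# Each ambiguous source lemma maps to candidate translations; the first is the default.
CANDIDATES = {
    "estació": ["<station>", "<season>"],  # English glosses stand in for Sardinian lemmas
}

# Hypothetical rules: (source lemma, context word, chosen translation).
RULES = [
    ("estació", "tren", "<station>"),
    ("estació", "primavera", "<season>"),
    ("estació", "any", "<season>"),
]


def select(tokens: list[str], i: int, window: int = 2) -> str:
    """Pick a translation for tokens[i] by looking at words within `window` positions."""
    lemma = tokens[i]
    options = CANDIDATES.get(lemma)
    if options is None:
        return lemma  # not ambiguous in this toy lexicon: pass through unchanged
    context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    for src, ctx_word, choice in RULES:
        if src == lemma and ctx_word in context and choice in options:
            return choice
    return options[0]  # no rule matched: fall back to the default translation


if __name__ == "__main__":
    print(select(["la", "estació", "de", "tren"], 1))       # <station>
    print(select(["la", "estació", "de", "primavera"], 1))  # <season>
    print(select(["una", "estació"], 1))                     # <station> (default)
```

Keeping a default translation and letting rules override it in specific contexts mirrors the split described above between fixed dictionary choices and context-dependent rules.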
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
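The first example moves the possessive after the noun. Below is a toy Python version of that single pattern. It is not Apertium's transfer formalism, where rules are written in XML. The tag names loosely imitate Apertium conventions, and the three-word bilingual dictionary is a hypothetical stand-in covering only this example.

```python
# Toy structural transfer for one pattern from the examples above:
# Catalan DET.def + POSS + NOUN  ->  Sardinian DET.def + NOUN + POSS
# ("El meu llibre" -> "Su libru meu"). Illustrative only; not Apertium's rule format.
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    tags: tuple[str, ...]


# Hypothetical mini bilingual dictionary covering just this example.
BIDIX = {"el": "su", "meu": "meu", "llibre": "libru"}


def translate_lemmas(tokens: list[Token]) -> list[Token]:
    """Lexical transfer: replace each lemma with its Sardinian equivalent."""
    return [Token(BIDIX.get(t.lemma, t.lemma), t.tags) for t in tokens]


def reorder_possessive(tokens: list[Token]) -> list[Token]:
    """Structural transfer: move a prenominal possessive after the noun."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if (len(window) == 3
                and "def" in window[0].tags
                and "pos" in window[1].tags
                and "n" in window[2].tags):
            out.extend([window[0], window[2], window[1]])  # det, noun, possessive
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def generate(tokens: list[Token]) -> str:
    """Naive surface generation: join lemmas and capitalise the first letter."""
    text = " ".join(t.lemma for t in tokens)
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    source = [
        Token("el", ("det", "def", "m", "sg")),
        Token("meu", ("det", "pos", "m", "sg")),
        Token("llibre", ("n", "m", "sg")),
    ]
    print(generate(reorder_possessive(translate_lemmas(source))))  # Su libru meu
```

A real pipeline would also handle agreement and generate inflected surface forms from lemmas and tags. The sketch only shows the reordering step.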
<br />
===Quality===<br />
Quality evaluation shows how the translator performs in practice. There are many ways to carry it out, and the choice of test texts depends on how the translator will be used; put simply, it measures how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' is the metric that measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on two randomly chosen 600-word texts from Wikipedia).<br />
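In principle, WER can be computed as a word-level edit distance between the raw machine translation and its post-edited version, divided by the length of the post-edited text. The Python sketch below makes two assumptions: whitespace tokenization, and the post-edited text as the reference. Apertium's own evaluation scripts may tokenize and normalize differently, so this sketch is not guaranteed to reproduce the 13.9% figure exactly. The Sardinian sentences in the usage line are invented for illustration.

```python
# Word error rate: word-level edit distance between the machine translation
# and its post-edited (publishable) version, divided by the post-edited length.


def wer(hypothesis: str, reference: str) -> float:
    """Return the WER of `hypothesis` (MT output) against `reference` (post-edited text)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 MT words and the first j reference words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # extra word in the MT output (deletion)
                cur[j - 1] + 1,          # word missing from the MT output (insertion)
                prev[j - 1] + (h != r),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(wer("su libru tuo est bonu", "su libru meu est bonu"))  # 0.2
```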
<br />
''Translator coverage'' (the percentage of words recognized) is 94% (measured on a large Wikipedia corpus).<br />
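Coverage can be estimated from the morphological analyser's output. In Apertium's stream format, each word appears as `^surface/analysis$`, and a word the analyser does not know is marked with `*`, as in `^xyz/*xyz$`. The Python sketch below is a simplified illustration, not the script behind the 94% figure. It assumes the analysed corpus is already available as a string, and it ignores escaped characters and formatting blanks.

```python
import re

# One lexical unit in Apertium's stream format: ^surface/analysis1/analysis2...$
# Simplified: escaped characters and formatting blanks are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units whose analysis is not marked unknown ('*')."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        raise ValueError("no lexical units found")
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    sample = ("^El/el<det><def><m><sg>$ ^meu/meu<det><pos><m><sg>$ "
              "^xyz/*xyz$ ^llibre/llibre<n><m><sg>$")
    print(coverage(sample))  # 0.75
```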
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we worked in preparation for a new srd-ita translator. What we managed to do was begin a manual morphological disambiguation of part of our corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: a journalistic one, more dialectal, and another taken directly from literary texts written in fully standard LSC. Of the first corpus, 6,000 words were tagged; of the second, 11,800.<br />
<br />
We added 9 new transfer rules and corrected some of the existing ones. Verb tenses are now translated correctly from Sardinian to Italian (in particular the future and the conditional). The translation of possessives has also improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a very important one disambiguates "sono" as either "so" (I am) or "sunt" (they are)). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning the errors out of the Sardinian dictionary and adding new entries will make it possible to release a new version of the ita-srd translator soon.<br />
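The "sono" rule is a good example of what morphological disambiguation does. The Italian form is ambiguous between "io sono" (I am) and "loro sono" (they are), and the chosen reading decides whether the Sardinian output is "so" or "sunt". Below is a toy Python sketch, not the actual Apertium rule. It picks the reading from the nearest known subject to its left. The reading labels, the mini lexicon and the first-person default for subjectless sentences are all assumptions made for illustration.

```python
# Toy disambiguation of Italian "sono": 1st person singular ("io sono", I am)
# or 3rd person plural ("loro sono", they are). The reading picks the Sardinian
# form "so" or "sunt". Illustrative only; not Apertium's actual rules or tagger.

SARDINIAN_FORM = {"p1.sg": "so", "p3.pl": "sunt"}

# Hypothetical mini lexicon giving the person/number of a few possible subjects.
SUBJECT_AGREEMENT = {
    "io": "p1.sg",
    "loro": "p3.pl",
    "essi": "p3.pl",
    "ragazzi": "p3.pl",
    "libri": "p3.pl",
}


def disambiguate_sono(words: list[str], i: int) -> str:
    """Return 'p1.sg' or 'p3.pl' for the ambiguous form words[i] == 'sono'."""
    # Scan leftwards for the nearest word whose person/number we know.
    for word in reversed(words[:i]):
        agreement = SUBJECT_AGREEMENT.get(word.lower())
        if agreement is not None:
            return agreement
    # No known subject (e.g. a pro-drop sentence): assume 1st person singular.
    return "p1.sg"


if __name__ == "__main__":
    for sentence in (["io", "sono", "qui"], ["i", "ragazzi", "sono", "qui"]):
        reading = disambiguate_sono(sentence, sentence.index("sono"))
        print(" ".join(sentence), "->", SARDINIAN_FORM[reading])
    # io sono qui -> so
    # i ragazzi sono qui -> sunt
```

The fixed default is the weakest part of such a rule. A sentence like "Sono stanchi" (they are tired) would need the following adjective's agreement to be resolved correctly.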
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64121Apertium cat-srd and ita-srd/GSoC 20172017-08-28T13:23:16Z<p>Grfro3d: /* Work and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for taking part in the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also do some work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Having verified at the end of July that the cat-srd results were good, we decided to devote ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to earlier work by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started with 2,645 words in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already well developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other, in terms of words and of morphological and syntactic structures.<br />
<br />
Following the GSoC 2017 schedule, in May and the first week of June (the "Community Bonding" period) a great deal of contrastive analysis between Catalan and Sardinian was done to create "pending tests". Using the ita-srd "pending tests" as a reference as well, we identified structural differences in numerals, possessives, expressions of obligation, continuous forms, the past, future and conditional tenses, and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. They are mostly scientific and technical terminology and socio-political vocabulary. Except for proper nouns, for which other criteria were followed, the words to add were selected starting from the [https://ca.wikipedia.org/wiki/Portada/Catalan Catalan Wikipedia].<br />
The dictionary was then cleaned up, removing many non-standard words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and a few existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality evaluation shows how the translator performs in practice. There are many ways to carry it out, and the choice of test texts depends on how the translator will be used; put simply, it measures how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' is the metric that measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on two randomly chosen 600-word texts from Wikipedia).<br />
<br />
''Translator coverage'' (the percentage of words recognized) is 94% (measured on a large Wikipedia corpus).<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we worked in preparation for a new srd-ita translator. What we managed to do was begin a manual morphological disambiguation of part of our corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: a journalistic one, more dialectal, and another taken directly from literary texts written in fully standard LSC. Of the first corpus, 6,000 words were tagged; of the second, 11,800.<br />
<br />
We added 9 new transfer rules and corrected some of the existing ones. Verb tenses are now translated correctly from Sardinian to Italian (in particular the future and the conditional). The translation of possessives has also improved, and some cases of enclitics are now handled (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a very important one disambiguates "sono" as either "so" (I am) or "sunt" (they are)). We also added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning the errors out of the Sardinian dictionary and adding new entries will make it possible to release a new version of the ita-srd translator soon.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Curretore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyzer]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64120Apertium cat-srd and ita-srd/GSoC 20172017-08-28T13:22:14Z<p>Grfro3d: /* Work and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits here https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
My project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea came from the desire to build another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, spanning June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also work on srd-ita in August, provided that the objectives of the first, longer phase were met on schedule. Once we had verified at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, starting from 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when building the translator required developing almost the entire Sardinian morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphology and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we carried out extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we identified structural differences in numerals, possessive forms, expressions of obligation, progressive tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Apart from the proper names, which were selected by other criteria, these are mostly scientific and technical terms and socio-political vocabulary, chosen on the basis of the Catalan [https://ca.wikipedia.org/wiki/Portada/Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
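To illustrate what a transfer rule does with the first example above (''El meu llibre > Su libru meu''), here is a minimal Python sketch of the reordering alone. It is not Apertium's transfer formalism (real rules live in the pair's XML transfer files); the token format and the tags "det", "poss" and "n" are simplifying assumptions, and the words are assumed to have already been translated by the bilingual dictionary.<br />

```python
from typing import List, Tuple

# Hypothetical token format: (lemma, coarse tag), already translated into Sardinian.
Token = Tuple[str, str]


def reorder_possessive(tokens: List[Token]) -> List[str]:
    """Toy transfer step: rewrite 'det + poss + n' as 'det + n + poss'.

    Catalan places the possessive before the noun ("el meu llibre"),
    whereas Sardinian places it after ("su libru meu").
    """
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if [tag for _, tag in window] == ["det", "poss", "n"]:
            det, poss, noun = window
            out.extend([det, noun, poss])  # move the possessive after the noun
            i += 3
        else:
            out.append(tokens[i])  # no pattern match: copy the token unchanged
            i += 1
    return [lemma for lemma, _ in out]


print(reorder_possessive([("su", "det"), ("meu", "poss"), ("libru", "n")]))
# prints ['su', 'libru', 'meu']
```

Matching on a fixed three-token window keeps the sketch short; a real rule would also have to handle agreement (gender and number) and longer noun phrases.<br />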
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that must be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The WER obtained is 13.9%, calculated on randomly selected Wikipedia texts of 600 words.<br />
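A WER figure of this kind can be computed with a word-level edit distance. The following Python sketch assumes the common definition (substitutions, insertions and deletions needed to turn the machine translation into its post-edited version, divided by the number of words in the post-edited version). It is an illustration only, not the evaluation script actually used for the figures above.<br />

```python
def word_error_rate(mt_output: str, post_edited: str) -> float:
    """WER: word-level edit distance between the MT output and its
    post-edited version, divided by the length of the post-edited text."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited reference is empty")
    # dist[i][j] = minimum number of edits turning hyp[:i] into ref[:j]
    dist = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dist[i][0] = i  # delete all i words
    for j in range(len(ref) + 1):
        dist[0][j] = j  # insert all j words
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            same = hyp[i - 1] == ref[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # delete a word from the MT output
                dist[i][j - 1] + 1,  # insert a missing word
                dist[i - 1][j - 1] + (0 if same else 1),  # keep or substitute
            )
    return dist[len(hyp)][len(ref)] / len(ref)


# One substituted word out of four gives a WER of 0.25
print(word_error_rate("a b c d", "a x c d"))
```

Read this way, a WER of 13.9% means that roughly 14 words in every 100 needed editing.<br />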
<br />
''Translator coverage'' (the percentage of words recognized) is 94%, measured on a large Wikipedia corpus.<br />
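Naive coverage can be estimated from the output of the Apertium morphological analyser (lt-proc), which marks unknown words with an asterisk, as in ^xyz/*xyz$. The Python sketch below assumes that simple stream format without escaped special characters, and the sample analyses are illustrative. It does not reproduce the ''trimmed'' coverage mentioned earlier, which additionally depends on the bilingual dictionary.<br />

```python
import re


def naive_coverage(stream: str) -> float:
    """Share of lexical units that the analyser recognised.

    Assumes lt-proc stream format: ^surface/analysis1/analysis2$,
    with unknown words written as ^surface/*surface$.
    """
    units = re.findall(r"\^([^$]*)\$", stream)
    if not units:
        raise ValueError("no lexical units found in the stream")
    known = 0
    for unit in units:
        _, _, analyses = unit.partition("/")
        if not analyses.startswith("*"):  # '*' marks an unknown word
            known += 1
    return known / len(units)


sample = "^El/el<det><def><m><sg>$ ^meu/meu<det><pos><m><sg>$ ^xyzzy/*xyzzy$"
print(naive_coverage(sample))  # two of the three units are recognised
```

Counting lexical units rather than whitespace tokens keeps multiword units from being counted twice.<br />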
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we worked in preparation for a new srd-ita translator. We began a manual morphological disambiguation of the corpora, which should help the translator recognize the correct morphological analysis of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: one of journalistic texts with more dialectal features, and another taken directly from literary texts written in fully standard LSC. 6,000 words from the first were added, and 11,800 from the second.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some enclitic clusters has also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules. An especially important one disambiguates "sono", which can correspond to Sardinian "so" ("I am") or "sunt" ("they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyzer]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64119Apertium cat-srd and ita-srd/GSoC 20172017-08-28T13:19:17Z<p>Grfro3d: </p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu Final report==<br />
<br />
===Work and Commits===<br />
You can see my work and a full list of commits and modified files here https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
My project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea came from the desire to build another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, spanning June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also work on srd-ita in August, provided that the objectives of the first, longer phase were met on schedule. Once we had verified at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, starting from 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when building the translator required developing almost the entire Sardinian morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphology and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we carried out extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we identified structural differences in numerals, possessive forms, expressions of obligation, progressive tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Apart from the proper names, which were selected by other criteria, these are mostly scientific and technical terms and socio-political vocabulary, chosen on the basis of the Catalan [https://ca.wikipedia.org/wiki/Portada/Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that must be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The WER obtained is 13.9%, calculated on randomly selected Wikipedia texts of 600 words.<br />
<br />
''Translator coverage'' (the percentage of words recognized) is 94%, measured on a large Wikipedia corpus.<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we worked in preparation for a new srd-ita translator. We began a manual morphological disambiguation of the corpora, which should help the translator recognize the correct morphological analysis of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: one of journalistic texts with more dialectal features, and another taken directly from literary texts written in fully standard LSC. 6,000 words from the first were added, and 11,800 from the second.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some enclitic clusters has also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules. An especially important one disambiguates "sono", which can correspond to Sardinian "so" ("I am") or "sunt" ("they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyzer]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64109Apertium cat-srd and ita-srd/GSoC 20172017-08-27T20:08:46Z<p>Grfro3d: /* Modified files and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files here https://apertium.projectjj.com/gsoc2017/gfro3d.html.<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
My project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea came from the desire to build another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, spanning June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also work on srd-ita in August, provided that the objectives of the first, longer phase were met on schedule. Once we had verified at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, starting from 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when building the translator required developing almost the entire Sardinian morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphology and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we carried out extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we identified structural differences in numerals, possessive forms, expressions of obligation, progressive tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Apart from the proper names, which were selected by other criteria, these are mostly scientific and technical terms and socio-political vocabulary, chosen on the basis of the Catalan [https://ca.wikipedia.org/wiki/Portada/Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that must be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The WER obtained is 13.9%, calculated on randomly selected Wikipedia texts of 600 words.<br />
<br />
''Translator coverage'' (the percentage of words recognized) is 94%, measured on a large Wikipedia corpus.<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we worked in preparation for a new srd-ita translator. We began a manual morphological disambiguation of the corpora, which should help the translator recognize the correct morphological analysis of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: one of journalistic texts with more dialectal features, and another taken directly from literary texts written in fully standard LSC. 6,000 words from the first were added, and 11,800 from the second.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some enclitic clusters has also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules. An especially important one disambiguates "sono", which can correspond to Sardinian "so" ("I am") or "sunt" ("they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Hunspell analyzer]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64108Apertium cat-srd and ita-srd/GSoC 20172017-08-27T20:08:25Z<p>Grfro3d: /* Modified files and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files here [https://apertium.projectjj.com/gsoc2017/gfro3d.html].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
My project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea came from the desire to build another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, spanning June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we planned to also work on srd-ita in August, provided that the objectives of the first, longer phase were met on schedule. Once we had verified at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, starting from 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when building the translator required developing almost the entire Sardinian morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphology and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June (the "Community Bonding" period) we carried out extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we identified structural differences in numerals, possessive forms, expressions of obligation, progressive tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Apart from the proper names, which were selected by other criteria, these are mostly scientific and technical terms and socio-political vocabulary, chosen on the basis of the Catalan [https://ca.wikipedia.org/wiki/Portada/Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
In the Catalan dictionary, 15 new morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
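The idea can be illustrated with a toy Python sketch (Python 3.9+) that picks one of several candidate translations for an ambiguous source word by checking a small window of neighbouring words. The words, candidates and rules below are hypothetical placeholders, and the sketch does not reproduce the format in which the pair's actual rules are written.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # ambiguous source word
    context: str      # word that must appear near the source word
    translation: str  # translation chosen when the context matches


# Hypothetical candidates and rules, for illustration only.
CANDIDATES = {"bank": ["bank(finance)", "bank(river)"]}
RULES = [
    Rule("bank", "river", "bank(river)"),
    Rule("bank", "money", "bank(finance)"),
]


def select(tokens: list[str], window: int = 2) -> list[str]:
    """Replace each ambiguous token with the translation chosen by the first matching rule."""
    out = []
    for i, tok in enumerate(tokens):
        options = CANDIDATES.get(tok)
        if options is None:
            out.append(tok)  # not ambiguous: keep the token unchanged
            continue
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        choice = options[0]  # default when no rule matches
        for rule in RULES:
            if rule.source == tok and rule.context in context:
                choice = rule.translation
                break
        out.append(choice)
    return out


print(select(["the", "river", "bank"]))  # ['the', 'river', 'bank(river)']
```

Falling back to the first candidate when no rule fires is just one simple design choice; a real system might instead use weights or a default learned from corpora.<br />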
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu (''my book'')<br />
<br />
Vaig menjar > Apo mandigadu (''I ate'')<br />
<br />
He anat > So andadu (''I have gone'')<br />
<br />
Vull saludar-lo > Lu chèrgio saludare (''I want to greet him'')<br />
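Apertium's transfer rules are written in its own XML-based rule files; the Python sketch below only mimics the effect of the first example, moving the possessive after the noun once the words have already been translated one by one. The tag names ("det", "poss", "n") are simplified assumptions, not the pair's real tagset.<br />

```python
# A token is (word, tag); the tags are simplified, hypothetical labels.
Token = tuple[str, str]


def postpose_possessive(tokens: list[Token]) -> list[Token]:
    """Rewrite DET POSS N as DET N POSS, as in 'el meu llibre' -> 'su libru meu'."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        tags = [tag for _, tag in tokens[i:i + 3]]
        if tags == ["det", "poss", "n"]:
            det, poss, noun = tokens[i:i + 3]
            out.extend([det, noun, poss])  # move the possessive after the noun
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


# Words already translated one by one from Catalan 'el meu llibre':
print(postpose_possessive([("su", "det"), ("meu", "poss"), ("libru", "n")]))
# [('su', 'det'), ('libru', 'n'), ('meu', 'poss')]
```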
<br />
===Quality===<br />
Quality evaluation shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used. Put simply, it means counting how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The WER achieved is 13.9%, calculated on randomly chosen Wikipedia texts of 600 words.<br />
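Assuming the post-edited, publishable text is the reference and the raw machine translation is the hypothesis, WER can be sketched as a word-level edit distance divided by the number of reference words. This is a generic illustration; the tool actually used for the 13.9% figure may tokenize and normalize text differently.<br />

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference text is empty")
    # dist[i][j]: edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Post-edited text as reference, raw machine translation as hypothesis:
print(word_error_rate("su libru meu", "su libru tuo"))  # 0.3333333333333333
```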
<br />
''Translator coverage'' (the percentage of words recognized) is 94%, measured on a large Wikipedia corpus.<br />
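Coverage can be estimated from the morphological analyzer's output. The sketch below assumes that output is in Apertium's stream format, where an unknown word is marked with `*`, and it ignores backslash-escaped characters, which a real parser would have to handle. The sample string and its tags are made up for illustration.<br />

```python
import re

# One lexical unit in Apertium stream format, e.g. ^libru/libru<n><m><sg>$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        raise ValueError("no lexical units found")
    known = 0
    for unit in units:
        analyses = unit.split("/")[1:]
        # Unknown words come out as ^word/*word$: the analysis starts with '*'.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / len(units)


sample = "^Su/su<det><def><m><sg>$ ^libru/libru<n><m><sg>$ ^xyzz/*xyzz$"
print(coverage(sample))  # 0.6666666666666666
```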
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine translation into Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (July, 29th - August, 29th)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We began a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: a journalistic one, closer to dialectal usage, and a literary one taken directly from texts written in fully standard LSC. Of the first, 6,000 words were added; of the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some enclitic combinations also improved (Sardinian allows up to three enclitics, whereas Italian allows at most two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules; a very important one disambiguates "sono" between "so" (''I am'') and "sunt" (''they are''). Additionally, we added a list of the world's countries (provided by Diegu Corràine) and obtained the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning mistakes out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
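The "sono" case can be illustrated with a toy Python sketch: Italian "sono" is both first-person singular (''io sono'') and third-person plural (''loro sono''), so it gets two readings, and a rule removes one of them when an explicit subject pronoun precedes the verb. The reading labels and the two rules are hypothetical simplifications, not the 4 rules actually added; real text often has a noun subject or no subject pronoun at all, in which case the sketch leaves the ambiguity unresolved.<br />

```python
# Sardinian forms for the two readings of Italian "sono" (labels are hypothetical).
SARDINIAN = {"p1.sg": "so", "p3.pl": "sunt"}


def sono_readings(tokens: list[str], i: int) -> set[str]:
    """Readings of tokens[i] ('sono') left after two toy removal rules."""
    readings = {"p1.sg", "p3.pl"}      # io sono / loro sono
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous == "io":               # explicit first-person subject
        readings.discard("p3.pl")
    elif previous == "loro":           # explicit third-person plural subject
        readings.discard("p1.sg")
    return readings


def translate_sono(tokens: list[str], i: int) -> str:
    """Sardinian form if only one reading is left, else both forms joined by '/'."""
    return "/".join(sorted(SARDINIAN[r] for r in sono_readings(tokens, i)))


print(translate_sono(["io", "sono", "qui"], 1))    # so
print(translate_sono(["loro", "sono", "qui"], 1))  # sunt
print(translate_sono(["sono", "qui"], 0))          # so/sunt
```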
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbasnatziones]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64026Apertium cat-srd and ita-srd/GSoC 20172017-08-27T14:12:16Z<p>Grfro3d: /* Sardinian morphological dictionary */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64025Apertium cat-srd and ita-srd/GSoC 20172017-08-27T14:03:40Z<p>Grfro3d: /* Second phase: Apertium srd-ita (August, 29th) */</p>
<hr />
<div></div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64024Apertium cat-srd and ita-srd/GSoC 20172017-08-27T14:02:47Z<p>Grfro3d: /* Transfer rules */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for participation in the Google Summer of Code 2017 program with Apertium was the development of a Ruled-Based Machine Translation between Catalan and Sardinian and the continuation of last year's project, apertium ita-srd. This idea comes from the desire to develop another tool to help Sardinian language, following the same way last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: one longer, which lasted during June and July, devoted to Catalan Sardinian and the other, shorter, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], it was decided to do something even in the month of August for srd-ita, assuming that the objectives of the first longest phase were achieved on schedule. When we could verify that the results for cat-srd at the end of July were good, we decided to dedicate ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase lasting until the second evaluation of GSoC concerned Catalan-Sardinian translator. The translator, thanks to the work done by Francis Tyers, was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section and started from a number of 2645 in the bilingual dictionary, a trimmed coverage of about 77% and a WER error rate of 34.8 %. The goal was to get 90% coverage and to lower the WER to less than 15%.<br />
<br />
Unlike last year, during which to develop the translator it was necessary to develop almost all the morphological dictionary and also improve aspects of the Italian morphological analyzer, this year started from two languages already developed on the Apertium platform. We could focus mainly on transferring from one language to another, speaking of words, morphological and syntactic structures.<br />
<br />
In observing the dates of the GSoC 2017 program, in May and in the first week of June ("Community Bounding") a great deal of contrasting analysis between Italian and Sardinian has been done to create "pending tests". Also, referring to the "pending test" of ita-srd, structural differences in numerals, possessive forms, duty formulas and continuous tences, past tences, future, conditional and clitic were highlighted.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/CROS] lemmas, developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. There were added 15,500 more words: 1300 nouns, 800 adjectives, 300 adverbs 250 verbs and 12,500 own names. It is, for the most part, scientific and technical terminology, and socio-political vocabulary. Except for proper names, for which other criteria were followed, the selection of the words to be introduced was made by the [https://ca.wikipedia.org/wiki/Portada/Catalan Wikipedia].<br />
Then the dictionary was adjusted, removing many words that were not normative and correcting mistakes in the assignment of paradigms (especially speaking of the genre assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Also in the Catalan morphological dictionary there was an addition of proper names, almost 10,000.<br />
<br />
===Morphological disambiguation in catalan===<br />
In Catalan dictionary they were written 15 of morphological disambiguation rules and someone else has been modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
<br />
Vaig menjar > Apo mandigadu<br />
<br />
He anat > So andadu<br />
<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
The quality assessment is used to see how the translator works in practice. There are many ways to do it, and the choice of the texts depends on how the translator will be used: simply, you need to calculate how many words you have to change in order to publish the text.<br />
<br />
''Word Error Rate (WER)'' is the indicator that indicates the words that need to be changed in order to publish the text. According to the Work Plan, the goal was to get a lower percentage of 15%. The rate of errors in translation is 13.9% (number obtained by WER indicator calculated on texts taken randomly of 600 words from Wikipedia).<br />
<br />
''Translator coverage'' (percentage of recognized words) is 94% (number obtained from a large corpus of Wikipedia).<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August, 29th)==<br />
In the second phase of the project we worked in preparation for a new translator srd-ita. What we could do was start a manual morphological disambiguation of the corpora that has to help the translator to recognize the correct morphology of each word, especially in texts which don't respect the standard orthographic LSC.<br />
We have treated two corpora: one journalist and more dialect, and other taken directly from literary texts written in perfect LSC. Of the first, 6000 words were added, of the second 11800.<br />
<br />
They have been added 9 new transfer rules and correct some of those that were already there. Now tenses from the Sardinian to Italian are translated correctly. It also improved the translation of possessive and is some cases of enclitics (Sardinian there may be up to three enclitics, whereas in Italian can not there be more than two).<br />
<br />
We also worked on the Italian dictionary, adding four morphological disambiguation rules; a particularly important one disambiguates "sono" between "so" ('I am') and "sunt" ('they are'). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Removing errors from the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work done in the second phase will be used to create a more accurate and up-to-date version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64023Apertium cat-srd and ita-srd/GSoC 20172017-08-27T14:02:08Z<p>Grfro3d: /* First phase: Apertium cat-srd (May, 29th - July, 29th) */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for my participation in the Google Summer of Code 2017 program with Apertium had two parts: the development of a rule-based machine translator between Catalan and Sardinian, and the continuation of last year's project, apertium ita-srd. The idea came from the wish to build another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], the project had two phases: a longer one, covering June and July, devoted to Catalan-Sardinian, and a shorter one in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], we decided to also work on srd-ita in August, provided that the objectives of the first, longer phase had been met on schedule. Once we had verified at the end of July that the cat-srd results were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May, 29th - July, 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the [https://svn.code.sf.net/p/apertium/svn/staging/apertium-cat-srd/ "staging"] section, starting from 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when building the translator required developing almost the entire morphological dictionary and improving parts of the Italian morphological analyser, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphology and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, during May and the first week of June ("Community Bonding") we carried out extensive contrastive analysis between Italian and Sardinian to create "pending tests". Drawing also on the ita-srd "pending tests", we identified structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
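<br />
A pending test pairs a source sentence with the translation the pair is expected to produce; it stops being "pending" once the translator's output matches. The sketch below only illustrates that check and is not the Apertium test harness: the translator is assumed to be a plain Python function, `toy_translate` is a hypothetical stand-in, and the test pairs are borrowed from the transfer examples on this page.<br />
```python
from typing import Callable

# Hypothetical pending tests: (Catalan source, expected Sardinian output).
PENDING_TESTS = [
    ("El meu llibre", "Su libru meu"),
    ("Vaig menjar", "Apo mandigadu"),
    ("He anat", "So andadu"),
]


def run_pending_tests(translate: Callable[[str], str],
                      tests: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return the failing tests as (source, expected, actual) triples."""
    failures = []
    for source, expected in tests:
        actual = translate(source).strip()
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in that knows only two sentences; a real run
    would call the cat-srd pipeline instead."""
    known = {"El meu llibre": "Su libru meu", "He anat": "So andadu"}
    return known.get(sentence, "*" + sentence)


failures = run_pending_tests(toy_translate, PENDING_TESTS)
for source, expected, actual in failures:
    print(f"FAIL {source!r}: expected {expected!r}, got {actual!r}")
print(f"{len(PENDING_TESTS) - len(failures)}/{len(PENDING_TESTS)} passed")
```
Keeping the tests as plain data makes it easy to re-run them after every dictionary or rule change and watch failures turn into passes.<br />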
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas), developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper nouns. Apart from the proper nouns, the new words are mostly scientific and technical terminology and socio-political vocabulary. The proper nouns were chosen according to other criteria; the remaining words were selected from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then cleaned up: many non-standard words were removed and errors in paradigm assignment were corrected (especially the grammatical gender assigned to some nouns).<br />
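<br />
Why a wrong paradigm matters: in an Apertium-style morphological dictionary a lemma inherits both its endings and its tags, including gender, from the paradigm it is attached to. The following is a rough Python model of that idea, not the real .dix format; the paradigm names (with an invented gender suffix) and tag strings are hypothetical. Two Sardinian nouns in -u, masculine ''libru'' and feminine ''manu'', share their endings, so only the paradigm decides the gender, and with it the agreeing article (''sa manu'', not ''su manu'').<br />
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    sg: str      # singular ending
    pl: str      # plural ending
    gender: str  # gender tag the paradigm assigns ("m" or "f")


# Hypothetical paradigm names, loosely modelled on Apertium's naming style.
PARADIGMS = {
    "libr/u__n_m": Paradigm("u", "os", "m"),  # su libru, sos libros
    "man/u__n_f": Paradigm("u", "os", "f"),   # sa manu, sas manos
}


def analyses(lemma: str, paradigm_name: str) -> dict[str, str]:
    """Map each inflected form of `lemma` to its analysis string."""
    p = PARADIGMS[paradigm_name]
    stem = lemma[: len(lemma) - len(p.sg)]
    return {
        stem + p.sg: f"{lemma}<n><{p.gender}><sg>",
        stem + p.pl: f"{lemma}<n><{p.gender}><pl>",
    }


print(analyses("manu", "libr/u__n_m"))  # wrong paradigm: 'manu' tagged <m>
print(analyses("manu", "man/u__n_f"))   # right paradigm: 'manu' tagged <f>
```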
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper nouns were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
Fifteen morphological disambiguation rules were written for Catalan, and some existing ones were modified.<br />
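<br />
A morphological disambiguation rule removes readings that are impossible in context. The sketch below models one such rule in the spirit of a Constraint Grammar SELECT rule; it is not one of the 15 rules mentioned above, and the lemmas and tag strings are illustrative rather than copied from apertium-cat. Catalan ''la'' can be an article or an object pronoun; before a word that can only be a verb (''la menjo'', 'I eat it') only the pronoun reading survives.<br />
```python
# Each token: (surface form, list of candidate readings).
Sentence = list[tuple[str, list[str]]]


def select_pronoun_before_verb(sentence: Sentence) -> Sentence:
    """Hypothetical rule: if a token is ambiguous and has a pronoun reading,
    and the next token can only be a verb, keep just the pronoun reading."""
    result = []
    for i, (form, readings) in enumerate(sentence):
        nxt = sentence[i + 1][1] if i + 1 < len(sentence) else []
        next_is_verb = bool(nxt) and all("<vblex>" in r for r in nxt)
        pronouns = [r for r in readings if "<prn>" in r]
        if len(readings) > 1 and pronouns and next_is_verb:
            readings = pronouns
        result.append((form, readings))
    return result


sentence = [
    ("la", ["el<det><def><f><sg>", "ell<prn><pro><p3><f><sg>"]),
    ("menjo", ["menjar<vblex><pri><p1><sg>"]),
]
for form, readings in select_pronoun_before_verb(sentence):
    print(form, readings)
```
In ''la casa'' the rule does not fire and both readings remain; a separate rule would have to choose the article there.<br />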
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
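<br />
Apertium writes lexical selection rules in its own XML format; the Python below only models the idea of choosing among several dictionary translations by looking at the context. The example is hypothetical: it assumes Catalan ''estar'' can be rendered as Sardinian ''istare'' or ''èssere'', and that ''èssere'' is right before an adjective or past participle, as in the sample translation further down this page (''està situada'' becomes ''est situada'').<br />
```python
# Hypothetical candidate translations; the first one is the default.
CANDIDATES = {"estar": ["istare", "èssere"]}


def select_translation(lemma: str, next_tags: str) -> str:
    """Pick a translation for `lemma` given the tags of the following word.
    Illustrative rule only: before an adjective or past participle,
    'estar' acts as a copula and becomes 'èssere'."""
    options = CANDIDATES.get(lemma, [lemma])
    if lemma == "estar" and ("<adj>" in next_tags or "<pp>" in next_tags):
        return "èssere"
    return options[0]


print(select_translation("estar", "<vblex><pp><f><sg>"))  # està situada
print(select_translation("estar", "<adv>"))               # default reading
```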
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
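<br />
The first example above moves the possessive after the noun. Below is a toy version of such a rule working on (lemma, tag) pairs after lexical transfer. The flat token representation and the tag names are assumptions; real Apertium transfer rules are written in XML, and morphological generation (capitalisation, agreement) is left out here.<br />
```python
Token = tuple[str, str]  # (translated lemma, simplified part-of-speech tag)


def reorder_possessive(tokens: list[Token]) -> list[Token]:
    """Hypothetical transfer rule: ARTICLE + POSSESSIVE + NOUN
    becomes ARTICLE + NOUN + POSSESSIVE; other tokens pass through."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        window = [tag for _, tag in tokens[i:i + 3]]
        if window == ["det_def", "det_pos", "n"]:
            art, poss, noun = tokens[i:i + 3]
            out.extend([art, noun, poss])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


# "El meu llibre" after lookup in the bilingual dictionary.
tokens = [("su", "det_def"), ("meu", "det_pos"), ("libru", "n")]
print(" ".join(lemma for lemma, _ in reorder_possessive(tokens)))  # su libru meu
```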
<br />
===Quality===<br />
Quality evaluation shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used. The simplest method is to count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that must be changed before the translated text can be published. According to the Work Plan, the goal was a WER below 15%. The final WER is 13.9%, calculated on randomly chosen Wikipedia texts totalling 600 words.<br />
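<br />
WER can be computed as the word-level edit distance (substitutions, insertions, deletions) between the machine translation and its post-edited version, divided by the length of the post-edited text. This is a minimal sketch of that standard definition, not the exact script used for the figure above. In the usage line, a hypothetical uncorrected word order is compared against ''Su libru meu''.<br />
```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate of `hypothesis` (MT output) against
    `reference` (post-edited text)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i hyp words and first j ref words
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a hypothesis word
                         cur[j - 1] + 1,      # insert a reference word
                         prev[j - 1] + cost)  # keep or substitute
        prev = cur
    return prev[len(ref)] / len(ref)


print(f"{wer('su meu libru', 'su libru meu'):.2f}")  # 0.67: two edits out of three words
```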
<br />
''Translator coverage'' (the percentage of words the translator recognizes) is 94%, measured on a large Wikipedia corpus.<br />
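<br />
Coverage can be measured on the morphological analyser's output in the Apertium stream format, where each token looks like <code>^surface/analysis$</code> and unknown words are marked with an asterisk (<code>^xyzw/*xyzw$</code>). The simplified counter below assumes that format and ignores escaped characters and formatting blocks. The sample analyses are illustrative.<br />
```python
import re

# One stream-format token: everything between '^' and '$'.
TOKEN = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Share of tokens that received at least one analysis."""
    tokens = TOKEN.findall(stream)
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if "/*" in t)
    return 1 - unknown / len(tokens)


sample = "^sa/su<det><def><f><sg>$ ^tzitade/tzitade<n><f><sg>$ ^xyzw/*xyzw$"
print(f"{coverage(sample):.0%}")  # 67%
```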
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August, 29th)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. The main task was to start the manual morphological disambiguation of Sardinian corpora, which will help the translator recognize the correct morphological analysis of each word, especially in texts that do not follow the standard LSC orthography.<br />
We worked on two corpora: one journalistic and closer to dialect, the other taken directly from literary texts written in fully standard LSC. We added 6,000 disambiguated words from the first and 11,800 from the second.<br />
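<br />
Manual disambiguation means choosing, for every token with more than one analysis, the reading that is correct in context. The helper below lists the tokens in an analysed sentence that still need a human decision. It assumes the Apertium stream format (<code>^surface/reading1/reading2$</code>); the Sardinian readings in the sample are illustrative only.<br />
```python
import re

TOKEN = re.compile(r"\^([^$]*)\$")


def ambiguous_tokens(stream: str) -> list[tuple[str, list[str]]]:
    """Return (surface form, readings) for every token with 2+ analyses."""
    result = []
    for token in TOKEN.findall(stream):
        surface, *readings = token.split("/")
        if len(readings) > 1:
            result.append((surface, readings))
    return result


sample = ("^est/èssere<vbser><pri><p3><sg>$ "
          "^connota/connòschere<vblex><pp><f><sg>/connotu<adj><f><sg>$")
for surface, readings in ambiguous_tokens(sample):
    print(surface, "->", readings)
```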
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian into Italian. The translation of possessives and of some enclitic sequences has also improved (Sardinian allows up to three enclitics in a row, whereas Italian allows at most two).<br />
<br />
We also worked on the Italian dictionary, adding four morphological disambiguation rules; a particularly important one disambiguates "sono" between "so" ('I am') and "sunt" ('they are'). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Removing errors from the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
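<br />
Italian ''sono'' can be first person singular ('I am', Sardinian ''so'') or third person plural ('they are', Sardinian ''sunt''). The actual rules are not reproduced on this page. The sketch below shows one plausible heuristic, purely as an assumption: it looks at the previous word and leaves the ambiguity open when that word gives no clue. The tags are illustrative.<br />
```python
# Sardinian translation for each surviving reading of Italian "sono".
SARDINIAN = {"p1_sg": "so", "p3_pl": "sunt"}


def disambiguate_sono(prev_word: str, prev_tags: str) -> list[str]:
    """Hypothetical heuristic for Italian 'sono' based on the previous word.
    Returns the surviving person/number keys (both if undecided)."""
    if prev_word.lower() == "io":
        return ["p1_sg"]
    if prev_word.lower() == "loro" or "<pl>" in prev_tags:
        return ["p3_pl"]
    return ["p1_sg", "p3_pl"]  # leave the choice to later stages


for prev, tags in [("io", "<prn>"), ("ragazzi", "<n><m><pl>"), ("non", "<adv>")]:
    keys = disambiguate_sono(prev, tags)
    print(prev, "sono ->", [SARDINIAN[k] for k in keys])
```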
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work done in the second phase will be used to create a more accurate and up-to-date version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64022Apertium cat-srd and ita-srd/GSoC 20172017-08-27T14:00:13Z<p>Grfro3d: /* Quality */</p>
Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64021Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:59:49Z<p>Grfro3d: /* Resources */</p>
Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64020Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:58:52Z<p>Grfro3d: /* Sardinian morphological dictionary */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Also, with reference to the "pending tests" of ita-srd, structural differences were highlighted in numerals, possessive forms, constructions expressing obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas) developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then revised, removing many non-normative words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and a few others were modified.<br />
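<br />
The following minimal Python sketch illustrates what a morphological disambiguation rule does, in the spirit of Constraint Grammar REMOVE rules. It does not use the Apertium rule format. The token representation, the coarse tags (`det`, `prn`, `n`, `vblex`) and the single rule for Catalan "la" (article before a noun, clitic pronoun before a verb) are illustrative assumptions.<br />
```python
from typing import List, Set, Tuple

# A token is (surface form, set of candidate readings). Readings are simplified
# to bare part-of-speech tags here; real analyses carry full tag sequences.
Token = Tuple[str, Set[str]]


def disambiguate(tokens: List[Token]) -> List[Token]:
    """Apply one toy context rule to det/prn-ambiguous tokens such as Catalan "la"."""
    result: List[Token] = []
    for i, (surface, readings) in enumerate(tokens):
        readings = set(readings)  # copy so the input is not modified
        nxt = tokens[i + 1][1] if i + 1 < len(tokens) else set()
        if {"det", "prn"} <= readings:
            if nxt == {"vblex"}:
                # Before an unambiguous verb, discard the article reading.
                readings.discard("det")
            elif nxt == {"n"}:
                # Before an unambiguous noun, discard the pronoun reading.
                readings.discard("prn")
        result.append((surface, readings))
    return result


sentence = [("la", {"det", "prn"}), ("veig", {"vblex"})]
print(disambiguate(sentence))  # [('la', {'prn'}), ('veig', {'vblex'})]
```
A real rule set has to handle many more contexts and ambiguity classes. Each rule still follows this pattern: discard readings that the context rules out.<br />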
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
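<br />
A toy Python sketch can illustrate the first example above (possessive reordering). Real Apertium transfer rules are written in XML (.t1x files) and operate on fully analysed lexical units. Here the three-word lexicon, the coarse tags and the function name `transfer` are hypothetical, built only from the pair "El meu llibre > Su libru meu".<br />
```python
# Hypothetical toy lexicon and tags derived from the example pair; not Apertium data.
BIDIX = {"el": "su", "meu": "meu", "llibre": "libru"}
POS = {"el": "det", "meu": "poss", "llibre": "n"}


def transfer(sentence: str) -> str:
    words = sentence.lower().split()
    tags = [POS.get(w, "unk") for w in words]
    out = list(words)
    i = 0
    while i + 2 < len(words):
        # Pattern det + poss + n -> det + n + poss: the Catalan prenominal
        # possessive becomes postnominal in Sardinian.
        if tags[i:i + 3] == ["det", "poss", "n"]:
            out[i + 1], out[i + 2] = words[i + 2], words[i + 1]
            i += 3
        else:
            i += 1
    # Lexical transfer; unknown words are marked with "*" as Apertium does.
    translated = [BIDIX.get(w, "*" + w) for w in out]
    result = " ".join(translated)
    return result[:1].upper() + result[1:]


print(transfer("El meu llibre"))  # Su libru meu
```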
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you need to count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on 600 words of randomly chosen Wikipedia text).<br />
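<br />
To make the metric concrete, here is a minimal Python sketch of word-level WER. It computes the edit distance (substitutions, insertions, deletions) between the raw machine translation and its post-edited version, divided by the length of the post-edited text. This is the standard Levenshtein-based formulation and an assumption here: the tool actually used for the 13.9% figure may differ in details such as tokenisation. At 13.9%, roughly 83 of every 600 words needed changing.<br />
```python
def wer(mt_output: str, post_edited: str) -> float:
    """Word error rate of mt_output against its post-edited reference."""
    hyp = mt_output.split()
    ref = post_edited.split()
    # prev[j] holds the edit distance between the hyp words seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete h from the MT output
                cur[j - 1] + 1,          # insert r into the MT output
                prev[j - 1] + (h != r),  # substitute h with r (free if equal)
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)


print(wer("su libru meu", "su libru meu"))  # 0.0
```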
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
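<br />
Coverage can be estimated by counting how many tokens receive an analysis. The minimal Python sketch below assumes the input is morphological-analyser output in the Apertium stream format (`^surface/analysis$`), where unknown words are marked with an asterisk. For simplicity it ignores escaped characters in the stream, and the sample analyses are illustrative.<br />
```python
import re

# One lexical unit, e.g. ^llibre/llibre<n><m><sg>$ or, if unknown, ^xyz/*xyz$.
# Simplification: backslash-escaped ^, / and $ inside units are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that the analyser recognised."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


sample = "^El/el<det><def><m><sg>$ ^meu/meu<det><pos><m><sg>$ ^llibre/llibre<n><m><sg>$ ^xyz/*xyz$"
print(coverage(sample))  # 0.75
```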
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in flawless LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic constructions was also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so", "I am", or "sunt", "they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64019Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:58:25Z<p>Grfro3d: /* Sardinian morphological dictionary */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Also, with reference to the "pending tests" of ita-srd, structural differences were highlighted in numerals, possessive forms, constructions expressing obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the [http://www.sardegnacultura.it/cds/cros-lsc/ CROS] lemmas) developed during the previous [https://apertium.projectjj.com/gsoc2016/gfro3d.html/ GSoC 2016]. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to add were selected from the [https://ca.wikipedia.org/wiki/Portada Catalan Wikipedia].<br />
The dictionary was then revised, removing many non-normative words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you need to count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on 600 words of randomly chosen Wikipedia text).<br />
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in flawless LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic constructions was also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so", "I am", or "sunt", "they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64018Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:54:06Z<p>Grfro3d: /* Description */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the [http://wiki.apertium.org/wiki/Catalan_and_Sardinian/Work_plan "Work Plan"], it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Also, with reference to the "pending tests" of ita-srd, structural differences were highlighted in numerals, possessive forms, constructions expressing obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas) developed during the previous GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to add were selected from the Catalan Wikipedia.<br />
The dictionary was then revised, removing many non-normative words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you need to count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on 600 words of randomly chosen Wikipedia text).<br />
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in flawless LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic constructions was also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so", "I am", or "sunt", "they are"). In addition, we added a list of the world's countries (provided by Diegu Corràine) and derived the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64017Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:51:22Z<p>Grfro3d: /* Work and Commits */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Modified files and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we turned to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Also, with reference to the "pending tests" of ita-srd, structural differences were highlighted in numerals, possessive forms, constructions expressing obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
Regarding the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas) developed during the previous GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terminology and socio-political vocabulary. Except for proper names, for which other criteria were followed, the words to add were selected from the Catalan Wikipedia.<br />
The dictionary was then revised, removing many non-normative words and correcting mistakes in paradigm assignment (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and a few others were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you need to count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that need to be changed before the text can be published. According to the Work Plan, the goal was to get below 15%. The translation error rate is 13.9% (WER calculated on 600 words of randomly chosen Wikipedia text).<br />
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August, 29th)==<br />
In the second phase of the project we worked in preparation for a new translator srd-ita. What we could do was start a manual morphological disambiguation of the corpora that has to help the translator to recognize the correct morphology of each word, especially in texts which don't respect the standard orthographic LSC.<br />
We have treated two corpora: one journalist and more dialect, and other taken directly from literary texts written in perfect LSC. Of the first, 6000 words were added, of the second 11800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic combinations was also improved (in Sardinian a verb can take up to three enclitics, whereas in Italian it can take no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
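The "sono" ambiguity exists because the Italian form is both first-person singular ("io sono", Sardinian "so") and third-person plural ("loro sono", Sardinian "sunt"). The toy Python rule below is only a sketch of context-based disambiguation: it looks back a few tokens for a subject pronoun. The cue lists, the three-token window and the function name are all hypothetical. This is not the formalism of the four rules mentioned above.

```python
from typing import List, Optional

# Hypothetical cue lists: subject pronouns that force one reading of "sono".
FIRST_SINGULAR = {"io"}                   # io sono   -> Sardinian "so"
THIRD_PLURAL = {"loro", "essi", "esse"}   # loro sono -> Sardinian "sunt"
WINDOW = 3                                # how many preceding tokens to inspect (assumption)


def sardinian_for_sono(tokens: List[str], index: int) -> Optional[str]:
    """Choose 'so' or 'sunt' for the Italian token 'sono' at tokens[index]; None if unresolved."""
    if tokens[index].lower() != "sono":
        raise ValueError("tokens[index] is not 'sono'")
    # Scan backwards, starting from the nearest preceding token.
    for word in reversed(tokens[max(0, index - WINDOW):index]):
        w = word.lower()
        if w in FIRST_SINGULAR:
            return "so"
        if w in THIRD_PLURAL:
            return "sunt"
    return None  # no cue found: leave the ambiguity to other rules or to the tagger


print(sardinian_for_sono(["io", "sono", "stanco"], 1))  # so
```

When no pronoun is present (as in most real sentences), the sketch returns `None`. A real rule set would need further cues, such as the number of a preceding noun phrase.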
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64016Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:50:54Z<p>Grfro3d: /* Work and Commit */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Work and Commits===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a Rule-Based Machine Translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, we decided to also do some work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had verified at the end of July that the cat-srd results were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER to below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Using the ita-srd "pending tests" as a reference, structural differences were identified in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas), developed during GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to be added were selected based on the Catalan Wikipedia.<br />
The dictionary was then revised: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
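In Apertium, structural transfer rules are written in XML rule files (such as `.t1x`) and match tagged lexical units rather than surface words. The Python sketch below is a simplified illustration only. It applies the first two reorderings listed above to lowercase surface tokens, using a toy lexicon built solely from those examples. Both the lexicon and the `transfer` function are hypothetical.

```python
from typing import List

# Toy bilingual lexicon built only from the examples above (hypothetical, not the real cat-srd dictionary).
DETERMINERS = {"el": "su"}
POSSESSIVES = {"meu": "meu"}
NOUNS = {"llibre": "libru"}
PARTICIPLES = {"menjar": "mandigadu"}  # Catalan infinitive -> Sardinian past participle


def transfer(tokens: List[str]) -> List[str]:
    """Apply two illustrative structural rules to a lowercase Catalan token list."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        # Rule 1: Det + Poss + Noun -> Det + Noun + Poss (el meu llibre -> su libru meu)
        if (len(window) == 3 and window[0] in DETERMINERS
                and window[1] in POSSESSIVES and window[2] in NOUNS):
            out += [DETERMINERS[window[0]], NOUNS[window[2]], POSSESSIVES[window[1]]]
            i += 3
        # Rule 2: periphrastic past "vaig" + infinitive -> "apo" + participle (vaig menjar -> apo mandigadu)
        elif len(window) >= 2 and window[0] == "vaig" and window[1] in PARTICIPLES:
            out += ["apo", PARTICIPLES[window[1]]]
            i += 2
        else:
            out.append(tokens[i])  # no rule matched: copy the token unchanged
            i += 1
    return out


print(" ".join(transfer("el meu llibre".split())))  # su libru meu
```

The greedy left-to-right matching loosely mirrors how a first-level chunker picks the longest pattern at each position, but real Apertium rules also propagate tags such as gender and number. This sketch omits that.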
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the translated text can be published. According to the Work Plan, the goal was a WER below 15%. The translation error rate achieved is 13.9% (WER calculated on randomly selected Wikipedia texts of 600 words).<br />
<br />
''Translator coverage'' (the percentage of words recognized by the translator) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We began a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in perfect LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic combinations was also improved (in Sardinian a verb can take up to three enclitics, whereas in Italian it can take no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64015Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:49:52Z<p>Grfro3d: /* Commit */</p>
<hr />
<div>== Google Summer of Code 2017 Gianfranco Fronteddu==<br />
<br />
===Work and Commit===<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
== Student Information ==<br />
<br />
'''Name:''' Gianfranco Fronteddu<br />
<br />
'''Location:''' Casteddu, Sardigna<br />
<br />
'''E-mail:''' gfro3d@gmail.com<br />
<br />
'''IRC:''' gianfranco<br />
<br />
'''SourceForge:''' gfro3d<br />
<br />
'''Telegram:''' gianfro4moros<br />
<br />
'''Skype:''' gianfranco.fronteddu88<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a Rule-Based Machine Translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, we decided to also do some work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had verified at the end of July that the cat-srd results were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER to below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Using the ita-srd "pending tests" as a reference, structural differences were identified in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas), developed during GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to be added were selected based on the Catalan Wikipedia.<br />
The dictionary was then revised: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the translated text can be published. According to the Work Plan, the goal was a WER below 15%. The translation error rate achieved is 13.9% (WER calculated on randomly selected Wikipedia texts of 600 words).<br />
<br />
''Translator coverage'' (the percentage of words recognized by the translator) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We began a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in perfect LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic combinations was also improved (in Sardinian a verb can take up to three enclitics, whereas in Italian it can take no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64014Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:46:25Z<p>Grfro3d: /* Resources */</p>
<hr />
<div>==Commit==<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a Rule-Based Machine Translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, we decided to also do some work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had verified at the end of July that the cat-srd results were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER to below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Using the ita-srd "pending tests" as a reference, structural differences were identified in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas), developed during GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to be added were selected based on the Catalan Wikipedia.<br />
The dictionary was then revised: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to carry it out, and the choice of texts depends on how the translator will be used; put simply, you count how many words have to be changed before the text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the translated text can be published. According to the Work Plan, the goal was a WER below 15%. The translation error rate achieved is 13.9% (WER calculated on randomly selected Wikipedia texts of 600 words).<br />
<br />
''Translator coverage'' (the percentage of words recognized by the translator) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we worked on preparing a new srd-ita translator. We began a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: one journalistic and more dialectal, the other taken directly from literary texts written in perfect LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic combinations was also improved (in Sardinian a verb can take up to three enclitics, whereas in Italian it can take no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules (a particularly important one disambiguates "sono" as either "so" or "sunt"). In addition, we added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning errors out of the Sardinian dictionary and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
* [http://dlc.iec.cat/ Institut d'Estudis Catalans: Diccionari de la llengua catalana]<br />
* [http://www.sagazeta.info/ Sa Gazeta]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64013Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:43:31Z<p>Grfro3d: /* Resources */</p>
<hr />
<div>==Commit==<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a Rule-Based Machine Translation system between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, we decided to also do some work on srd-ita in August, provided that the objectives of the first, longer phase had been achieved on schedule. Once we had verified at the end of July that the cat-srd results were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a WER of 34.8%. The goal was to reach 90% coverage and to lower the WER to below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire Sardinian morphological dictionary and also improving parts of the Italian morphological analyzer, this year we started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a great deal of contrastive analysis between Italian and Sardinian was carried out to create "pending tests". Using the ita-srd "pending tests" as a reference, structural differences were identified in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
For the Sardinian morphological dictionary, we already had a dictionary of 51,800 words (including the CROS lemmas), developed during GSoC 2016. A further 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. These are mostly scientific and technical terms and socio-political vocabulary. Except for proper names, which were chosen according to other criteria, the words to be added were selected based on the Catalan Wikipedia.<br />
The dictionary was then revised: many non-normative words were removed and errors in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
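Lexical selection can be pictured with a small model. The sketch below is a toy Python illustration, not Apertium's implementation: in Apertium these rules are written in XML `.lrx` files and applied by apertium-lex-tools. The Catalan lemma ''estació'', its two Sardinian candidates (''istajone'' "season", ''istatzione'' "station"), the context words and the two-word window are all assumptions chosen for illustration, not rules taken from apertium-cat-srd.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionRule:
    """Hypothetical rule: choose `target` for `source` when `context` is nearby."""
    source: str
    context: str
    target: str


# Hypothetical bilingual entries: the first candidate is the default translation.
CANDIDATES = {"estació": ["istajone", "istatzione"]}

# Hypothetical rules; the first matching rule wins.
RULES = [
    SelectionRule("estació", "tren", "istatzione"),
    SelectionRule("estació", "any", "istajone"),
]


def select_translations(lemmas, window=2):
    """Return {position: chosen translation} for every ambiguous source lemma."""
    choices = {}
    for i, lemma in enumerate(lemmas):
        candidates = CANDIDATES.get(lemma)
        if not candidates:
            continue  # not ambiguous in this toy table
        context = lemmas[max(0, i - window):i] + lemmas[i + 1:i + 1 + window]
        choice = candidates[0]  # fall back to the default translation
        for rule in RULES:
            if rule.source == lemma and rule.context in context:
                choice = rule.target
                break
        choices[i] = choice
    return choices


print(select_translations(["la", "estació", "de", "tren"]))  # {1: 'istatzione'}
```

Keeping a default translation and letting rules override it mirrors the idea described above: rules only intervene where the context gives a reason to prefer another candidate.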
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
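The first example (possessive placement) can be modelled as a pattern that matches a determiner, a possessive and a noun, and reorders them. This is only a toy Python sketch: Apertium's real transfer rules are written in XML (`.t1x` and related files) and work on full morphological tags. The three-word lexicon and the simplified tags `det`, `poss` and `n` are assumptions made for the illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, simplified part-of-speech tag)

# Hypothetical word-for-word lexicon covering only the example phrase.
LEXICON = {"el": "su", "meu": "meu", "llibre": "libru"}


def reorder_possessive(tokens: List[Token]) -> List[Token]:
    """Rewrite every DET POSS N sequence as DET N POSS, left to right."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if [tag for _, tag in window] == ["det", "poss", "n"]:
            det, poss, noun = window
            out.extend([det, noun, poss])  # Sardinian puts the possessive after the noun
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def translate(tagged: List[Token]) -> str:
    """Lexical transfer, then structural transfer, then recapitalise."""
    tokens = [(LEXICON[word.lower()], tag) for word, tag in tagged]
    sentence = " ".join(word for word, _ in reorder_possessive(tokens))
    return sentence[:1].upper() + sentence[1:]


print(translate([("El", "det"), ("meu", "poss"), ("llibre", "n")]))  # Su libru meu
```

The other three examples (the periphrastic past, auxiliary selection and clitic climbing) would each need patterns of their own.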
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to do it, and the choice of evaluation texts depends on how the translator will be used. The measure used here is simple: count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The translator's WER is 13.9%, calculated on randomly chosen Wikipedia texts of 600 words.<br />
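WER is normally computed as the word-level edit distance between the machine translation and its post-edited version, divided by the length of the post-edited reference: WER = (substitutions + deletions + insertions) / reference words. For example, a WER of 13.9% on a 600-word reference corresponds to roughly 83 edited words. The Python sketch below implements this textbook definition; it is not necessarily the exact tool used to obtain the figures above.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the hypothesis words seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # drop a hypothesis word (deletion)
                curr[j - 1] + 1,         # add a missing reference word (insertion)
                prev[j - 1] + (h != r),  # substitution, free if the words match
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # 0.25
```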
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
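Coverage is usually measured by running the morphological analyzer over a corpus and counting the words it cannot analyze, which the Apertium stream format marks with an asterisk (for example `^xyz/*xyz$`). The Python sketch below computes that ratio from analyzer output. The sample stream is made up for illustration, and the simple regular expression ignores escaped `^`, `$` and `/` characters, which a real tool would have to handle.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        raise ValueError("no lexical units found in the stream")
    unknown = 0
    for unit in units:
        _surface, _, analyses = unit.partition("/")
        if not analyses or analyses.startswith("*"):
            unknown += 1  # the analyzer did not recognize this word
    return (len(units) - unknown) / len(units)


sample = "^la/el<det><def><f><sg>$ ^casa/casa<n><f><sg>$ ^xyz/*xyz$ ^gran/gran<adj><mf><sg>$"
print(f"{coverage(sample):.0%}")  # 75%
```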
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: a journalistic one with more dialectal features, and one taken directly from literary texts written in fully standard LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic clusters also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules; an important one disambiguates "sono" between Sardinian "so" (I am) and "sunt" (they are). We also added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning the Sardinian dictionary of mistakes and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
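The "sono" case can be illustrated with a toy rule that keeps only the reading compatible with a subject pronoun found to the left. This simplified Python sketch is not the rule actually added to the Italian module; the pronoun list and the reading labels (`p1`/`sg`, `p3`/`pl`) are assumptions made for the illustration.

```python
# The two readings of Italian "sono" and their Sardinian translations.
READINGS = {("p1", "sg"): "so", ("p3", "pl"): "sunt"}

# Hypothetical subject pronouns that select one reading.
SUBJECT_PRONOUNS = {"io": ("p1", "sg"), "loro": ("p3", "pl"), "essi": ("p3", "pl")}


def translate_sono(words):
    """For each occurrence of 'sono', list the Sardinian forms still possible."""
    results = []
    for i, word in enumerate(words):
        if word.lower() != "sono":
            continue
        readings = list(READINGS)  # start with both readings
        # Scan leftwards for the nearest subject pronoun.
        for left in reversed(words[:i]):
            reading = SUBJECT_PRONOUNS.get(left.lower())
            if reading is not None:
                readings = [reading]
                break
        results.append([READINGS[r] for r in readings])
    return results


print(translate_sono(["Io", "sono", "qui"]))    # [['so']]
print(translate_sono(["Loro", "sono", "qui"]))  # [['sunt']]
print(translate_sono(["Sono", "stanchi"]))      # [['so', 'sunt']]
```

Without a pronoun the ambiguity remains, which is why real rules also have to look at other context, since Italian often drops the subject.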
<br />
==Resources==<br />
<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/ Correctore ortogràficu LSC]<br />
* [http://www.sardegnacultura.it/documenti/7_81_20080107092727.pdf Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari]<br />
* [http://www.sardegnacultura.it/documenti/7_25_20060427093224.pdf Normativa ortografica Limba Sarda Comuna]<br />
* [http://www.sardegnacultura.it/cds/cros-lsc/cros.oxt Analitzadore hunspell]<br />
* [http://www.sardegnacultura.it/documenti/7_108_20090205130512.pdf Glossàriu italianu-sardu] <br />
* [http://limbasnatziones.tempusnostru.it/home.page/ Limbanaztiones.com]<br />
* [http://vocabolariocasu.isresardegna.it/ Vocabolario Sardo Logudorese-Italiano]<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64012Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:41:42Z<p>Grfro3d: /* Quality */</p>
<hr />
<div>==Commit==<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving aspects of the Italian morphological analyzer, this year's work started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a substantial contrastive analysis of Italian and Sardinian was carried out to create "pending tests". Drawing also on the pending tests of ita-srd, it highlighted structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
The Sardinian morphological dictionary, developed during GSoC 2016, already contained 51,800 words (including the CROS lemmas). Another 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terms or socio-political vocabulary. Except for proper names, which were chosen by other criteria, the words to add were selected from the Catalan Wikipedia.<br />
The dictionary was then cleaned up: many non-standard words were removed and mistakes in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to do it, and the choice of evaluation texts depends on how the translator will be used. The measure used here is simple: count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The translator's WER is 13.9%, calculated on randomly chosen Wikipedia texts of 600 words.<br />
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: a journalistic one with more dialectal features, and one taken directly from literary texts written in fully standard LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic clusters also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules; an important one disambiguates "sono" between Sardinian "so" (I am) and "sunt" (they are). We also added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning the Sardinian dictionary of mistakes and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
* Correctore ortogràficu LSC<br />
* Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari<br />
* Normativa ortografica Limba Sarda Comuna<br />
* Analitzadore hunspell<br />
* Glossàriu italianu-sardu<br />
* Limbanaztiones.com<br />
* Sardo Logudorese-Italiano<br />
* Diccionari de la llengua catalana<br />
* Sa gazeta<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3dhttps://wiki.apertium.org/w/index.php?title=Apertium_cat-srd_and_ita-srd/GSoC_2017&diff=64011Apertium cat-srd and ita-srd/GSoC 20172017-08-27T13:41:22Z<p>Grfro3d: /* Quality */</p>
<hr />
<div>==Commit==<br />
You can see a full list of commits and modified files [https://apertium.projectjj.com/gsoc2017/gfro3d.html here].<br />
<br />
==Description==<br />
The project for the Google Summer of Code 2017 program with Apertium was the development of a rule-based machine translator between Catalan and Sardinian, together with the continuation of last year's project, apertium ita-srd. The idea comes from the desire to develop another tool to support the Sardinian language, following the same path as last year.<br />
<br />
As can be seen in the "Work Plan", there were two phases: a longer one, lasting through June and July, devoted to Catalan-Sardinian, and a shorter one, held in August, to start preparing a new Sardinian-Italian translator.<br />
To complete the work plan, it was decided to also work on srd-ita in August, provided that the objectives of the first, longer phase were achieved on schedule. Once we had verified that the cat-srd results at the end of July were good, we devoted ourselves to srd-ita.<br />
<br />
==First phase: Apertium cat-srd (May 29th - July 29th)==<br />
The first coding phase, which lasted until the second GSoC evaluation, concerned the Catalan-Sardinian translator. Thanks to the work done by Francis Tyers, the translator was initially in the "staging" section and started with 2,645 entries in the bilingual dictionary, a trimmed coverage of about 77% and a word error rate (WER) of 34.8%. The goal was to reach 90% coverage and to lower the WER below 15%.<br />
<br />
Unlike last year, when developing the translator required building almost the entire morphological dictionary and improving aspects of the Italian morphological analyzer, this year's work started from two languages already developed on the Apertium platform. We could therefore focus mainly on transfer from one language to the other: words, morphological structures and syntactic structures.<br />
<br />
Following the GSoC 2017 timeline, in May and the first week of June ("Community Bonding") a substantial contrastive analysis of Italian and Sardinian was carried out to create "pending tests". Drawing also on the pending tests of ita-srd, it highlighted structural differences in numerals, possessive forms, expressions of obligation, continuous tenses, past tenses, the future, the conditional and clitics.<br />
<br />
===Sardinian morphological dictionary===<br />
The Sardinian morphological dictionary, developed during GSoC 2016, already contained 51,800 words (including the CROS lemmas). Another 15,500 words were added: 1,300 nouns, 800 adjectives, 300 adverbs, 250 verbs and 12,500 proper names. Most of them are scientific and technical terms or socio-political vocabulary. Except for proper names, which were chosen by other criteria, the words to add were selected from the Catalan Wikipedia.<br />
The dictionary was then cleaned up: many non-standard words were removed and mistakes in paradigm assignment were corrected (especially the gender assigned to some nouns).<br />
<br />
===Catalan morphological dictionary===<br />
Almost 10,000 proper names were also added to the Catalan morphological dictionary.<br />
<br />
===Morphological disambiguation in Catalan===<br />
For Catalan, 15 morphological disambiguation rules were written and some existing ones were modified.<br />
<br />
===Lexical selection rules===<br />
The translator has 274 lexical selection rules. These are rules that choose which of two or more possible translations is most appropriate in a given context.<br />
<br />
===Transfer rules===<br />
The translator has 78 transfer rules. These are rules that modify the structure of the sentence in Catalan to fit the structure needed in Sardinian. For example:<br />
<br />
El meu llibre > Su libru meu<br />
Vaig menjar > Apo mandigadu<br />
He anat > So andadu<br />
Vull saludar-lo > Lu chèrgio saludare<br />
<br />
===Quality===<br />
Quality assessment shows how the translator performs in practice. There are many ways to do it, and the choice of evaluation texts depends on how the translator will be used. The measure used here is simple: count how many words have to be changed before the translated text can be published.<br />
<br />
''Word Error Rate (WER)'' measures the proportion of words that have to be changed before the text can be published. According to the Work Plan, the goal was a WER below 15%. The translator's WER is 13.9%, calculated on randomly chosen Wikipedia texts of 600 words.<br />
<br />
''Translator coverage'' (the percentage of recognized words) is 94%, measured on a large Wikipedia corpus.<br />
<br />
<br />
===Text in Catalan (chosen randomly)===<br />
L'Acròpoli d'Atenes és l'acròpoli grega més important. L'Acròpoli era, literalment, la “ciutat alta” i estava present a la majoria de ciutats gregues, amb una doble funció: defensiva i com a seu dels principals llocs de culte. L'Acròpoli d'Atenes està situada sobre un turó a uns 165 metres per sobre del nivell de la ciutat. També és coneguda com a Cecròpia en honor del llegendari home serp, Cècrops, rei d'Atenes.<br />
<br />
===Machine-translation to Sardinian===<br />
S'Acròpoli de Atene est s'acròpoli grega prus importante. S'Acròpoli fiat, literalmente, sa “tzitade arta” e fiat presente a sa majoria de tzitades gregas, cun una dòpia funtzione: difensora e comente a sede de sos printzipales logos de cultu. S'Acròpoli de Atene est situada subra unu montigru a unos 165 metros in subra de su livellu de sa tzitade. Puru est connota comente a Cecròpia in onore de su legendàriu òmine colovra, Cècrops, re de Atene.<br />
<br />
==Second phase: Apertium srd-ita (August 29th)==<br />
In the second phase of the project we prepared the ground for a new srd-ita translator. We started a manual morphological disambiguation of corpora, which should help the translator recognize the correct morphology of each word, especially in texts that do not follow the LSC orthographic standard.<br />
We worked on two corpora: a journalistic one with more dialectal features, and one taken directly from literary texts written in fully standard LSC. From the first, 6,000 words were added; from the second, 11,800.<br />
<br />
Nine new transfer rules were added and some existing ones were corrected. Verb tenses are now translated correctly from Sardinian to Italian. The translation of possessives and of some enclitic clusters also improved (Sardinian allows up to three enclitics, whereas Italian allows no more than two).<br />
<br />
We also worked on the Italian dictionary, adding 4 morphological disambiguation rules; an important one disambiguates "sono" between Sardinian "so" (I am) and "sunt" (they are). We also added a list of the world's countries (provided by Diegu Corràine) together with the corresponding demonyms. Since the beginning of GSoC 2017, 1,400 words have been added to the ita-srd bilingual dictionary. Cleaning the Sardinian dictionary of mistakes and adding new entries will make it possible to develop a new version of the ita-srd dictionary quickly.<br />
<br />
==Resources==<br />
* Correctore ortogràficu LSC<br />
* Dizionario universale della lingua di Sardegna Italiano-Sardo-Italiano, Edes, 2006, Cagliari<br />
* Normativa ortografica Limba Sarda Comuna<br />
* Analitzadore hunspell<br />
* Glossàriu italianu-sardu<br />
* Limbanaztiones.com<br />
* Sardo Logudorese-Italiano<br />
* Diccionari de la llengua catalana<br />
* Sa gazeta<br />
<br />
==Future plans==<br />
The work of the second phase of the project will be used for the creation of a more accurate and updated version of ita-srd.</div>Grfro3d