Difference between revisions of "User:Capsot/Proposal oci-fra/fra-oci Translator"
Revision as of 20:23, 25 March 2018
Possible Mentor: Hèctor Alòs
Contents
- 1 Skills and experience
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in Apertium?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 Reasons why Google and Apertium should sponsor it
- 6 A description of how and who it will benefit in society
- 7 Work plan
Skills and experience
My native languages are Catalan and French, and I have mastered Occitan and Spanish to a high professional level. I also have a very good command of Italian and English, and I can understand and speak basic Ukrainian.
Besides my varied teaching experience (high school and university) and translating experience (many translations from Catalan or French into Occitan, for instance), I am a linguist interested in many languages, especially the Romance family, and I specialize in Occitan and Catalan dialectology. I coauthored a Catalan-Occitan/Occitan-Catalan dictionary with Patrici Pojada in 2005.
I expect to complete my thesis on Catalan and Occitan dialectology, which is already close to completion, in 2018. Since 2016 I have been a member of the Acadèmia Aranesa dera Lengua Occitana of the Aran Valley (Val d’Aran, Catalonia); previously I was a member of the Grop de Lingüistica Occitana, which asked me to compile an Occitan lexicon of new-technology terms with TERMCAT.
I also contributed to the Aranese Comission deth Traductor (2008), which helped shape the vocabulary included in Gema Ramírez and Carme Armentano’s Apertium Occitan translator.
Although I had no real previous coding experience, I have made many contributions to the Occitan and Catalan Wikipedias, so I already had some knowledge of wiki syntax. Over the last few weeks I have learned the basic commands while working on the oci-fra file with Hèctor Alòs.
Why is it you are interested in machine translation?
I have been a Wikipedia editor (mainly on the Occitan and Catalan versions) for a long time, and I have seen how machine translation helps expand the content of the Catalan Viquipèdia, which has very good translation tools. Automated translation can therefore help bring in articles from other Wikipedias and save a great deal of time and energy for small communities like the Occitan one.
Why is it that you are interested in Apertium?
I first came across the Apertium translation project many years ago, while collaborating as a linguist on the first Occitan translation tools in the Val d’Aran, an effort to build an Occitan translator covering two linguistic varieties (standard Occitan and Aranese). The Apertium community already has many good translation tools; its members share a genuine interest in all languages and treat each of them as equal, with no hierarchy between dominant and minoritized languages, which I particularly appreciate. The collaborative atmosphere is also very pleasant: many people have helped me with technical issues quickly and kindly.
I hope to contribute to and enrich the projects of the Apertium community with my knowledge and command of languages.
Which of the published tasks are you interested in? What do you plan to do?
Reasons why Google and Apertium should sponsor it
A description of how and who it will benefit in society
Work plan
- Note: French → Occitan is the main direction of the project.
| Week | Dates | Description | Bidix (excl. np), planned | Coverage (%), planned | WER (%), planned | Testvoc | Evaluation | Bidix, actual | Coverage (%), actual | WER (%) | Errors | Done? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | French > Occitan | ~5,700 | | | | | | | | | |
| 1 | 14 May–20 May | Improving Occitan monodix; adding prn, pr, cnj*, basic adv to bidix | ~6,000 | ~84.0% | | | | | | | | |
| 2 | 21 May–27 May | Adding n, adj, adv to the bidix from the French Wiktionary | ~12,000 | ~86.0% | | | | | | | | |
| 3 | 28 May–3 June | Adding vblex to the bidix from the French Wiktionary; beginning to add missing words in decreasing order of frequency fra > oci | ~14,000 | ~88.0% | | | | | | | | |
| 4 | 4 June–10 June | Adding words; transfer rules fra > oci | ~16,000 | ~89.0% | | | | | | | | |
| 5 | 11 June–15 June; Deliverable #1: French to Occitan translator | Adding words; transfer rules fra > oci | ~18,000 | ~89.5% | ~25% | | | | | | | |
| 6 | 18 June–24 June | Adding words; transfer rules fra > oci; begin testvoc fra > oci | ~20,000 | ~90.0% | | pr, cnj*, adv, prn, det | | | | | | |
| 7 | 25 June–1 July | Adding words; transfer rules fra > oci; testvoc fra > oci | ~21,000 | ~90.5% | | vblex | | | | | | |
| 8 | 2 July–8 July | Adding words; transfer rules fra > oci; testvoc fra > oci | ~22,000 | ~91.0% | | adj | | | | | | |
| 9 | 9 July–13 July; Deliverable #2: French to Occitan translator | Transfer rules fra > oci; testvoc fra > oci | ~22,000 | ~91.0% | ~15% | n | | | | | | |
| 0 | | Occitan > French | ~22,000 | | | | | | | | | |
| 10 | 16 July–22 July | Adding missing words in decreasing order of frequency oci > fra; transfer rules oci > fra; testvoc oci > fra | ~22,500 | ~88.0% | | pr, cnj*, adv, prn, det | | | | | | |
| 13 | 23 July–29 July | Adding words; transfer rules oci > fra; testvoc oci > fra | ~23,000 | ~89.0% | | n, adj | | | | | | |
| 11 | 30 July–5 August | Adding words; transfer rules oci > fra; testvoc oci > fra | ~23,500 | ~90.0% | | vblex | | | | | | |
| 12* | 6 August–9 August | Final improvements | | | | | | | | | | |
| 12** | 10 August–14 August; Deliverable #3: Occitan to French translator | Final evaluation | ~23,500 | ~90.0% | ~30% | | | | | | | |
To compute the figures
- Errors (compute in apertium-fra-oci/dev)
$ bash dev/testvoc/generation.sh fra-oci | wc -l # in apertium-oci-fra
$ bash dev/testvoc/generation.sh oci-fra | wc -l # in apertium-oci-fra
- Bidix (compute in apertium-oci-fra)
$ cat apertium-oci-fra.oci-fra.dix | grep '<l' | grep -v '"np"' | wc -l
- Coverage (compute in apertium-oci-fra)
$ cat ../apertium-fra/corpus/corpus_fra_wp100000.txt | apertium -d . fra-oci-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/fra-oci.coverage.txt
$ calc `cat /tmp/fra-oci.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/fra-oci.coverage.txt | wc -l`
$ cat ../apertium-cat/corpus/corpus_oci_wp100000.txt | apertium -d . oci-fra-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/oci-fra.coverage.txt
$ calc `cat /tmp/oci-fra.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/oci-fra.coverage.txt | wc -l`
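The coverage figure is the share of lexical units that the analyser recognises: lines without the `*` unknown-word mark, divided by all lines. Below is a minimal sketch of the same ratio as a shell function. It assumes the one-unit-per-line file written by the `sed` step, and it uses `awk` in place of `calc`. The function name `coverage` is hypothetical and is not part of Apertium.

```shell
# Hypothetical helper (not part of Apertium): print coverage as a percentage.
# $1 is a file with one lexical unit per line, e.g. /tmp/fra-oci.coverage.txt.
coverage() {
  # LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
  LC_ALL=C awk '
    !/\*/ { known++ }    # lines without "*" are words the analyser knows
    END   { printf "%.1f\n", (NR ? 100 * known / NR : 0) }
  ' "$1"
}
```

It could be called as `coverage /tmp/fra-oci.coverage.txt` or `coverage /tmp/oci-fra.coverage.txt` once those files exist. Printing a percentage rather than a raw fraction makes the result directly comparable with the coverage columns of the work plan.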