Difference between revisions of "User:Capsot/Proposal oci-fra/fra-oci Translator"
Revision as of 19:50, 25 March 2018
Possible Mentor: Hèctor Alòs
Contents
- 1 Skills and experience
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in Apertium?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 Reasons why Google and Apertium should sponsor it
- 6 A description of how and who it will benefit in society
- 7 Work plan
Skills and experience
Why is it you are interested in machine translation?
Why is it that you are interested in Apertium?
Which of the published tasks are you interested in? What do you plan to do?
Reasons why Google and Apertium should sponsor it
A description of how and who it will benefit in society
Work plan
- Note: The French → Occitan part of the project is the main direction.
| Week | Dates | Description | Planned bidix (excl. np) | Planned coverage (%) | Planned WER (%) | Testvoc | Evaluation | Actual bidix | Actual coverage (%) | WER (%) | Errors | Done? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | French > Occitan | | ~5,700 | | | | | | | | | |
| 1 | 14 May to 20 May | Improving Occitan monodix; adding prn, pr, cnj*, basic adv to bidix | ~6,000 | ~84.0% | | | | | | | | |
| 2 | 21 May to 27 May | Adding n, adj, adv to the bidix from the French Wiktionary | ~12,000 | ~86.0% | | | | | | | | |
| 3 | 28 May to 3 June | Adding vblex to the bidix from the French Wiktionary; beginning to add missing words in decreasing order of frequency fra > oci | ~14,000 | ~88.0% | | | | | | | | |
| 4 | 4 June to 10 June | Adding words; transfer rules fra > oci | ~16,000 | ~89.0% | | | | | | | | |
| 5 | 11 June to 15 June; Deliverable #1: French to Occitan translator | Adding words; transfer rules fra > oci | ~18,000 | ~89.5% | ~25% | | | | | | | |
| 6 | 18 June to 24 June | Adding words; transfer rules fra > oci; begin testvoc fra > oci | ~20,000 | ~90.0% | | pr, cnj*, adv, prn, det | | | | | | |
| 7 | 25 June to 1 July | Adding words; transfer rules fra > oci; testvoc fra > oci | ~21,000 | ~90.5% | | vblex | | | | | | |
| 8 | 2 July to 8 July | Adding words; transfer rules fra > oci; testvoc fra > oci | ~22,000 | ~91.0% | | adj | | | | | | |
| 9 | 9 July to 13 July; Deliverable #2: French to Occitan translator | Transfer rules fra > oci; testvoc fra > oci | ~22,000 | ~91.0% | ~15% | n | | | | | | |
| 0 | Occitan > French | | ~22,000 | | | | | | | | | |
| 10 | 16 July to 22 July | Adding missing words in decreasing order of frequency oci > fra; transfer rules oci > fra; testvoc oci > fra | ~22,500 | ~88.0% | | pr, cnj*, adv, prn, det | | | | | | |
| 13 | 23 July to 29 July | Adding words; transfer rules oci > fra; testvoc oci > fra | ~23,000 | ~89.0% | | n, adj | | | | | | |
| 11 | 30 July to 5 August | Adding words; transfer rules oci > fra; testvoc oci > fra | ~23,500 | ~90.0% | | vblex | | | | | | |
| 12* | 6 August to 9 August | Final improvements | | | | | | | | | | |
| 12** | 10 August to 14 August; Deliverable #3: Occitan to French translator | Final evaluation | ~23,500 | ~90.0% | ~30% | | | | | | | |
How to compute the figures
- Errors (compute in apertium-fra-oci/dev)
$ bash dev/testvoc/generation.sh fra-oci | wc -l # in apertium-oci-fra
$ bash dev/testvoc/generation.sh oci-fra | wc -l # in apertium-oci-fra
- Bidix (compute in apertium-oci-fra)
$ cat apertium-oci-fra.oci-fra.dix | grep '<l' | grep -v '"np"' | wc -l
- Coverage (compute in apertium-oci-fra)
$ cat ../apertium-fra/corpus/corpus_fra_wp100000.txt | apertium -d . fra-oci-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/fra-oci.coverage.txt
$ calc `cat /tmp/fra-oci.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/fra-oci.coverage.txt | wc -l`
$ cat ../apertium-cat/corpus/corpus_oci_wp100000.txt | apertium -d . oci-fra-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/oci-fra.coverage.txt
$ calc `cat /tmp/oci-fra.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/oci-fra.coverage.txt | wc -l`
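The coverage figure is the share of lines in the coverage file that contain no asterisk, since the analyser marks unknown forms with `*`. The following is a minimal sketch of the same ratio expressed as a percentage with `awk`, for machines where `calc` is not installed. It also lists the most frequent unknown forms, which may help when adding missing words in decreasing order of frequency. The function names `coverage_pct` and `top_unknown` are hypothetical, and the sketch assumes the one-unit-per-line file written by the `sed` step above, with unknown units of the form `^form/*form$`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of Apertium.
# Input: a coverage file with one lexical unit per line, where unknown
# words carry an asterisk (assumed shape: ^form/*form$).

coverage_pct() {
    # $1: path to the coverage file.
    # Counts lines without '*' over all lines, like the grep/wc pair,
    # and prints the ratio as a percentage with one decimal.
    awk '
        !/\*/ { known++ }
        END   { if (NR > 0) printf "%.1f\n", 100 * known / NR }
    ' "$1"
}

top_unknown() {
    # $1: path to the coverage file; $2: number of forms to show (default 20).
    # Keeps only unknown units, strips everything up to "/*" and from "$" on,
    # then ranks the remaining surface forms by frequency.
    grep -F '*' "$1" \
        | sed 's/^.*\/\*//; s/\$.*$//' \
        | sort | uniq -c | sort -rn \
        | head -n "${2:-20}"
}
```

After sourcing the file, a call such as `coverage_pct /tmp/fra-oci.coverage.txt` prints the percentage, and `top_unknown /tmp/fra-oci.coverage.txt 50` lists the fifty most frequent unknown forms with their counts.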