Difference between revisions of "User:Capsot/Proposal oci-fra/fra-oci Translator"

From Apertium

Revision as of 19:50, 25 March 2018

Possible Mentor: Hèctor Alòs

Skills and experience

Why is it you are interested in machine translation?

Why is it that you are interested in Apertium?

Which of the published tasks are you interested in? What do you plan to do?

Reasons why Google and Apertium should sponsor it

A description of how and who it will benefit in society

Work plan

  • Note: French → Occitan is the main direction of the project.
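{|class=wikitable
! Week !! Dates !! Description !! Planned bidix<br/>(excluding np) !! Planned coverage<br/>(%) !! Planned WER<br/>(%) !! Testvoc !! Evaluation !! Actual bidix !! Actual coverage<br/>(%) !! WER<br/>(%) !! Errors !! Done?
|-
| 0 || <b>French → Occitan</b> || || ~5,700 || || || || || || || || ||
|-
| 1 || 14–20 May || Improving the Occitan monodix<br/>Adding prn, pr, cnj*, basic adv to the bidix || ~6,000 || ~84.0% || || || || || || || ||
|-
| 2 || 21–27 May || Adding n, adj, adv to the bidix from the French Wiktionary || ~12,000 || ~86.0% || || || || || || || ||
|-
| 3 || 28 May – 3 June || Adding vblex to the bidix from the French Wiktionary<br/>Beginning to add missing words (fra > oci) in decreasing order of frequency || ~14,000 || ~88.0% || || || || || || || ||
|-
| 4 || 4–10 June || Adding words<br/>Transfer rules fra > oci || ~16,000 || ~89.0% || || || || || || || ||
|-
| 5 || <b>11–15 June<br/>Deliverable #1: French to Occitan translator</b> || Adding words<br/>Transfer rules fra > oci || <b>~18,000</b> || <b>~89.5%</b> || <b>~25%</b> || || || || || || ||
|-
| 6 || 18–24 June || Adding words<br/>Transfer rules fra > oci<br/>Begin testvoc fra > oci || ~20,000 || ~90.0% || || pr, cnj*, adv, prn, det || || || || || ||
|-
| 7 || 25 June – 1 July || Adding words<br/>Transfer rules fra > oci<br/>Testvoc fra > oci || ~21,000 || ~90.5% || || vblex || || || || || ||
|-
| 8 || 2–8 July || Adding words<br/>Transfer rules fra > oci<br/>Testvoc fra > oci || ~22,000 || ~91.0% || || adj || || || || || ||
|-
| 9 || <b>9–13 July<br/>Deliverable #2: French to Occitan translator</b> || Transfer rules fra > oci<br/>Testvoc fra > oci || <b>~22,000</b> || <b>~91.0%</b> || <b>~15%</b> || n || || || || || ||
|-
| 0 || <b>Occitan → French</b> || || ~22,000 || || || || || || || || ||
|-
| 10 || 16–22 July || Adding missing words (oci > fra) in decreasing order of frequency<br/>Transfer rules oci > fra<br/>Testvoc oci > fra || ~22,500 || ~88.0% || || pr, cnj*, adv, prn, det || || || || || ||
|-
| 11 || 23–29 July || Adding words<br/>Transfer rules oci > fra<br/>Testvoc oci > fra || ~23,000 || ~89.0% || || n, adj || || || || || ||
|-
| 12 || 30 July – 5 August || Adding words<br/>Transfer rules oci > fra<br/>Testvoc oci > fra || ~23,500 || ~90.0% || || vblex || || || || || ||
|-
| 13* || 6–9 August || Final improvements || || || || || || || || || ||
|-
| 13** || <b>10–14 August<br/>Deliverable #3: Occitan to French translator</b> || Final evaluation || <b>~23,500</b> || <b>~90.0%</b> || <b>~30%</b> || || || || || || ||
|}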
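The table tracks planned and actual WER, but no command for it is listed. A minimal sketch, assuming the MT output and a post-edited reference are plain-text files with one aligned sentence per line, and taking WER as the total word-level edit distance divided by the number of reference words. The function name `wer` and this file layout are hypothetical, not part of apertium-oci-fra.

```shell
#!/usr/bin/env bash
# Hypothetical helper: word error rate (%) of MT output against a
# post-edited reference. Assumes both files are line-aligned.
# WER = sum of word-level Levenshtein distances / reference word count.
wer() {
  local ref=$1 hyp=$2
  awk '
    NR == FNR { refline[FNR] = $0; next }   # first file: store reference lines
    {
      n = split(refline[FNR], r, " ")       # reference words
      m = split($0, h, " ")                 # MT output words
      for (j = 0; j <= m; j++) prev[j] = j
      for (i = 1; i <= n; i++) {
        cur[0] = i
        for (j = 1; j <= m; j++) {
          cost = (r[i] == h[j]) ? 0 : 1
          d = prev[j-1] + cost                     # match or substitution
          if (prev[j] + 1 < d) d = prev[j] + 1     # deletion
          if (cur[j-1] + 1 < d) d = cur[j-1] + 1   # insertion
          cur[j] = d
        }
        for (j = 0; j <= m; j++) prev[j] = cur[j]
      }
      edits += prev[m]
      words += n
    }
    END { if (words) printf "%.1f\n", 100 * edits / words }
  ' "$ref" "$hyp"
}

# Usage (hypothetical file names):
#   wer postedited.oci mt-output.oci
```

Summing edits and words over all lines before dividing gives a corpus-level figure, so long sentences weigh more than short ones.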

How to compute the figures

Testvoc errors (run in apertium-oci-fra)
$ bash dev/testvoc/generation.sh fra-oci | wc -l  # run in apertium-oci-fra
$ bash dev/testvoc/generation.sh oci-fra | wc -l  # run in apertium-oci-fra
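To see where the failures come from, the error count can be split by Apertium's debug markers: `@` marks a lemma with no bidix translation and `#` marks a form the generator could not produce. A minimal sketch, assuming each line printed by `generation.sh` is one failing entry that still carries these markers; `testvoc_summary` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split testvoc output (read from stdin) by marker.
# Assumes one failing entry per line; a line may contain both markers.
testvoc_summary() {
  awk '
    index($0, "@") > 0 { bidix++ }   # no translation in the bidix
    index($0, "#") > 0 { gen++ }     # generation failed
    END { printf "total=%d bidix(@)=%d generation(#)=%d\n", NR, bidix, gen }
  '
}

# Usage, in apertium-oci-fra:
#   bash dev/testvoc/generation.sh fra-oci | testvoc_summary
```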
Bidix entries, excluding proper nouns (run in apertium-oci-fra)
$ grep '<l' apertium-oci-fra.oci-fra.dix | grep -v '"np"' | wc -l
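Since the work plan grows the bidix category by category (n, adj, adv, vblex, ...), a per-category breakdown of the total can be useful. A minimal sketch, assuming each entry sits on one line as `<e><p><l>lemma<s n="POS"/>...</l>...</e>`, with the part of speech as the first tag on the left (Occitan) side; proper nouns (np) are skipped as in the count above. `bidix_by_pos` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count bidix entries per part of speech,
# skipping proper nouns (np). Assumes one <e> entry per line.
bidix_by_pos() {
  grep '<l>' "$1" \
    | grep -v '"np"' \
    | sed -n 's/.*<l>[^"]*<s n="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Usage, in apertium-oci-fra:
#   bidix_by_pos apertium-oci-fra.oci-fra.dix
```

Entries split over several lines, or written with `<i>` instead of `<l>`/`<r>`, are not counted by this sketch.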
Coverage (run in apertium-oci-fra)
$ cat ../apertium-fra/corpus/corpus_fra_wp100000.txt | apertium -d . fra-oci-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/fra-oci.coverage.txt
$ calc `cat /tmp/fra-oci.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/fra-oci.coverage.txt | wc -l`

$ cat ../apertium-cat/corpus/corpus_oci_wp100000.txt | apertium -d . oci-fra-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/oci-fra.coverage.txt
$ calc `cat /tmp/oci-fra.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/oci-fra.coverage.txt | wc -l`
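The coverage ratio above relies on the `calc` calculator. A minimal sketch that computes the same ratio as a percentage with `awk` instead, reading the one-token-per-line output of the `sed` step from stdin; as with `grep -v '\*'`, a token containing `*` is treated as unknown. `coverage` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive coverage (%) from one-token-per-line
# morphological output on stdin. Tokens containing '*' count as unknown,
# the same criterion as grep -v '\*'.
coverage() {
  awk '
    { total++ }
    index($0, "*") == 0 { known++ }
    END { if (total) printf "%.2f\n", 100 * known / total }
  '
}

# Usage, in apertium-oci-fra:
#   apertium -d . fra-oci-morph < ../apertium-fra/corpus/corpus_fra_wp100000.txt \
#     | sed 's/\$\W*\^/$\n^/g' | coverage
```

This avoids the temporary file and the second pass over it, at the cost of not keeping the tokenised output for later inspection.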