Difference between revisions of "Romanian and Catalan/Workplan"

From Apertium
(13 intermediate revisions by the same user not shown)
Line 5: Line 5:
! style="width: 15%" | Dates
! style="width: 15%" | Dates
! style="width: 45%" | Work done
! style="width: 45%" | Work done
! style="width: 5%" | Bidix
! style="width: 6%" | Bidix
! style="width: 15%" | WER / PER
! style="width: 15%" | WER / PER
! style="width: 10%" | Coverage
! style="width: 9%" | Coverage
|-
|-
! Post-application period
! Post-application period
Line 37: Line 37:
* Improve freqlist generation script (explains the leap in coverage)
* Improve freqlist generation script (explains the leap in coverage)
| style="text-align:center" | 17,026 (~17,000)
| style="text-align:center" | 17,026 (~17,000)
| style="text-align:center" | ron > cat (~32%)<br>cat > ron (~59%)
| style="text-align:center" | ron > cat ~31% (~32%)<br>cat > ron ~53% (~59%)
| style="text-align:center" | ron 83.6% (81.1%)<br>cat 86.2% (80%)
| style="text-align:center" | ron 83.6% (81.1%)<br>cat 86.2% (80%)
|-
|-
Line 43: Line 43:
| style="text-align:center" | 28 May - 3 June
| style="text-align:center" | 28 May - 3 June
|
|
* Add new entries to dictionaries
| style="text-align:center" | (~19,000)
* Fix broken bidix entries
| style="text-align:center" | ron > cat (~30%)<br>cat > ron (~58%)
* Plan transfer rule changes
| style="text-align:center" | ron (81.9%)<br>cat (80.9%)
| style="text-align:center" | 18,487 (~19,000)
| style="text-align:center" | ron > cat ~31% (~30%)<br>cat > ron ~53% (~58%)
| style="text-align:center" | ron 84.9% (81.9%)<br>cat 86.4% (80.9%)
|-
|-
! 4
! 4
| style="text-align:center" | 4 June - 10 June
| style="text-align:center" | 4 June - 10 June
|
|
* Add new entries to dictionaries
| style="text-align:center" | (~21,000)
* Fix broken bidix entries
| style="text-align:center" | ron > cat (~28%)<br>cat > ron (~57%)
* Upgrade transfer rules and write rules for new patterns
| style="text-align:center" | ron (82.7%)<br>cat (81.7%)
* Add CG to Romanian to improve disambiguation
| style="text-align:center" | 20,324 (~21,000)
| style="text-align:center" | ron > cat ~31% (~28%)<br>cat > ron ~53% (~57%)
| style="text-align:center" | ron 85.4% (82.7%)<br>cat 86.8% (81.7%)
|-
|-
! 5
! 5
| style="text-align:center" | 11 June - 17 June
| style="text-align:center" | 11 June - 17 June
|
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Upgrade transfer rules and write rules for new patterns
* Add more Romanian CG rules
* Improve evaluation of test texts with diff files
'''First evaluation'''
'''First evaluation'''
| style="text-align:center" | (~23,000)
| style="text-align:center" | 22,381 (~23,000)
| style="text-align:center" | ron > cat (~26%)<br>cat > ron (~56%)
| style="text-align:center" | ron > cat ~30% (~26%)<br>cat > ron ~53% (~56%)
| style="text-align:center" | ron (83.4%)<br>cat (82.4%)
| style="text-align:center" | ron 86.1% (83.4%)<br>cat 87.4% (82.4%)
|-
|-
! 6
! 6
| style="text-align:center" | 18 June - 24 June
| style="text-align:center" | 18 June - 24 June
|
|
* Add new entries to dictionaries
| style="text-align:center" | (~25,000)
* Fix broken bidix entries
| style="text-align:center" | ron > cat (~25%)<br>cat > ron (~53%)
* Write new transfer rules
| style="text-align:center" | ron (84.1%)<br>cat (83%)
| style="text-align:center" | 22,640 (~25,000)
| style="text-align:center" | ron > cat ~30% (~25%)<br>cat > ron ~51% (~53%)
| style="text-align:center" | ron 86.4% (84.1%)<br>cat 88.2% (83%)
|-
|-
! 7
! 7
| style="text-align:center" | 25 June - 1 July
| style="text-align:center" | 25 June - 1 July
|
|
* Upgrade Indonesian-Malaysian pair to monolingual package system (testvoc clean)
| style="text-align:center" | (~27,000)
* Upgrade Welsh-English pair to monolingual package system
| style="text-align:center" | ron > cat (~24%)<br>cat > ron (~50%)
* Testvoc fixes (Romanian-Catalan and Welsh-English)
| style="text-align:center" | ron (84.7%)<br>cat (83.6%)
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
|-
|-
! 8
! 8
| style="text-align:center" | 2 July - 8 July
| style="text-align:center" | 2 July - 8 July
|
|
* Testvoc fixes (Romanian-Catalan and Welsh-English)
| style="text-align:center" | (~29,000)
| style="text-align:center" | ron > cat (~23%)<br>cat > ron (~47%)
! style="text-align:center" |
| style="text-align:center" | ron (85.3%)<br>cat (84.2%)
! style="text-align:center" |
! style="text-align:center" |
|-
|-
! 9
! 9
| style="text-align:center" | 9 July - 15 July
| style="text-align:center" | 9 July - 15 July
|
|
* Upgrade Catalan-Italian pair to monolingual package system (testvoc clean)
* Testvoc fixes (Romanian-Catalan)
'''Second evaluation'''
'''Second evaluation'''
| style="text-align:center" | (~31,000)
! style="text-align:center" |
| style="text-align:center" | ron > cat (~22%)<br>cat > ron (~45%)
! style="text-align:center" |
| style="text-align:center" | ron (85.8%)<br>cat (84.7%)
! style="text-align:center" |
|-
|-
! 10
! 10
| style="text-align:center" | 16 July - 22 July
| style="text-align:center" | 16 July - 22 July
|
|
* Upgrade Afrikaans-Dutch pair to monolingual package system (testvoc clean)
* Testvoc fixes (Romanian-Catalan and Welsh-English)
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
Line 101: Line 124:
| style="text-align:center" | 23 July - 29 July
| style="text-align:center" | 23 July - 29 July
|
|
* Add new entries to dictionaries
! style="text-align:center" |
* Fix broken bidix entries
! style="text-align:center" |
* Write new transfer rules
! style="text-align:center" |
* Add Romanian CG rules
| style="text-align:center" | 22,995 (~27,000)
| style="text-align:center" | ron > cat ~29% (~24%)<br>cat > ron ~46% (~50%)
| style="text-align:center" | ron 86.8% (84.7%)<br>cat 88.7% (83.6%)
|-
|-
! 12
! 12
| style="text-align:center" | 30 July - 5 August
| style="text-align:center" | 30 July - 5 August
|
|
* Fix broken bidix entries
! style="text-align:center" |
* Write new transfer rules
! style="text-align:center" |
* Add Romanian CG rules
! style="text-align:center" |
| style="text-align:center" | 23,009 (~29,000)
| style="text-align:center" | ron > cat ~29% (~23%)<br>cat > ron ~46% (~47%)
| style="text-align:center" | ron 86.8% (85.3%)<br>cat 88.7% (84.2%)
|-
|-
! 13
! 13
| style="text-align:center" | 6 August - 14 August
| style="text-align:center" | 6 August - 14 August
|
|
* Fix broken bidix entries
* Write new transfer rules
'''Final evaluation'''
'''Final evaluation'''
! style="text-align:center" |
| style="text-align:center" | 23,015 (~31,000)
! style="text-align:center" |
| style="text-align:center" | ron > cat ~29% (~22%)<br>cat > ron ~46% (~45%)
! style="text-align:center" |
| style="text-align:center" | ron 86.8% (85.8%)<br>cat 88.7% (84.7%)
|}
|}

Latest revision as of 10:56, 14 August 2018

You can find the detailed goals for each week here.

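{| class="wikitable"
! Week
! style="width: 15%" | Dates
! style="width: 45%" | Work done
! style="width: 6%" | Bidix
! style="width: 15%" | WER / PER
! style="width: 9%" | Coverage
|-
! Post-application period
| style="text-align:center" | 28 March - 13 May
|
* Build frequency lists for Romanian and Catalan
* Fix broken bidix entries
* Improve testvoc scripts
| style="text-align:center" | 12,819
| style="text-align:center" | ron > cat (~36%)<br>cat > ron (~61%)
| style="text-align:center" | ron (79%)<br>cat (78%)
|-
! 1
| style="text-align:center" | 14 May - 20 May
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Fix transfer rules that didn't work as expected
| style="text-align:center" | 14,890 (~15,000)
| style="text-align:center" | ron > cat ~33% (~34%)<br>cat > ron ~56% (~60%)
| style="text-align:center" | ron 82.2% (80.1%)<br>cat 82.7% (79.1%)
|-
! 2
| style="text-align:center" | 21 May - 27 May
|
* Add new entries to dictionaries
* Fix broken bidix entries (adj clean)
* Rewrite ron-cat transfer rules to use chunking
* Improve freqlist generation script (explains the leap in coverage)
| style="text-align:center" | 17,026 (~17,000)
| style="text-align:center" | ron > cat ~31% (~32%)<br>cat > ron ~53% (~59%)
| style="text-align:center" | ron 83.6% (81.1%)<br>cat 86.2% (80%)
|-
! 3
| style="text-align:center" | 28 May - 3 June
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Plan transfer rule changes
| style="text-align:center" | 18,487 (~19,000)
| style="text-align:center" | ron > cat ~31% (~30%)<br>cat > ron ~53% (~58%)
| style="text-align:center" | ron 84.9% (81.9%)<br>cat 86.4% (80.9%)
|-
! 4
| style="text-align:center" | 4 June - 10 June
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Upgrade transfer rules and write rules for new patterns
* Add CG to Romanian to improve disambiguation
| style="text-align:center" | 20,324 (~21,000)
| style="text-align:center" | ron > cat ~31% (~28%)<br>cat > ron ~53% (~57%)
| style="text-align:center" | ron 85.4% (82.7%)<br>cat 86.8% (81.7%)
|-
! 5
| style="text-align:center" | 11 June - 17 June
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Upgrade transfer rules and write rules for new patterns
* Add more Romanian CG rules
* Improve evaluation of test texts with diff files
'''First evaluation'''
| style="text-align:center" | 22,381 (~23,000)
| style="text-align:center" | ron > cat ~30% (~26%)<br>cat > ron ~53% (~56%)
| style="text-align:center" | ron 86.1% (83.4%)<br>cat 87.4% (82.4%)
|-
! 6
| style="text-align:center" | 18 June - 24 June
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Write new transfer rules
| style="text-align:center" | 22,640 (~25,000)
| style="text-align:center" | ron > cat ~30% (~25%)<br>cat > ron ~51% (~53%)
| style="text-align:center" | ron 86.4% (84.1%)<br>cat 88.2% (83%)
|-
! 7
| style="text-align:center" | 25 June - 1 July
|
* Upgrade Indonesian-Malaysian pair to monolingual package system (testvoc clean)
* Upgrade Welsh-English pair to monolingual package system
* Testvoc fixes (Romanian-Catalan and Welsh-English)
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
|-
! 8
| style="text-align:center" | 2 July - 8 July
|
* Testvoc fixes (Romanian-Catalan and Welsh-English)
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
|-
! 9
| style="text-align:center" | 9 July - 15 July
|
* Upgrade Catalan-Italian pair to monolingual package system (testvoc clean)
* Testvoc fixes (Romanian-Catalan)
'''Second evaluation'''
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
|-
! 10
| style="text-align:center" | 16 July - 22 July
|
* Upgrade Afrikaans-Dutch pair to monolingual package system (testvoc clean)
* Testvoc fixes (Romanian-Catalan and Welsh-English)
! style="text-align:center" |
! style="text-align:center" |
! style="text-align:center" |
|-
! 11
| style="text-align:center" | 23 July - 29 July
|
* Add new entries to dictionaries
* Fix broken bidix entries
* Write new transfer rules
* Add Romanian CG rules
| style="text-align:center" | 22,995 (~27,000)
| style="text-align:center" | ron > cat ~29% (~24%)<br>cat > ron ~46% (~50%)
| style="text-align:center" | ron 86.8% (84.7%)<br>cat 88.7% (83.6%)
|-
! 12
| style="text-align:center" | 30 July - 5 August
|
* Fix broken bidix entries
* Write new transfer rules
* Add Romanian CG rules
| style="text-align:center" | 23,009 (~29,000)
| style="text-align:center" | ron > cat ~29% (~23%)<br>cat > ron ~46% (~47%)
| style="text-align:center" | ron 86.8% (85.3%)<br>cat 88.7% (84.2%)
|-
! 13
| style="text-align:center" | 6 August - 14 August
|
* Fix broken bidix entries
* Write new transfer rules
'''Final evaluation'''
| style="text-align:center" | 23,015 (~31,000)
| style="text-align:center" | ron > cat ~29% (~22%)<br>cat > ron ~46% (~45%)
| style="text-align:center" | ron 86.8% (85.8%)<br>cat 88.7% (84.7%)
|}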
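
The WER / PER column reports error rates as percentages. As a rough illustration of the word error rate part only, and not the tool used to produce the figures on this page, the sketch below computes WER as the word-level edit distance between a reference translation and the system output, divided by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenisation are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Hypothetical helper for illustration; tokenises on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives a percentage comparable in form to the values in the table. For example, `word_error_rate("a b c d", "a x c")` needs one substitution and one deletion against four reference words, so it returns `0.5`.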