Difference between revisions of "User:Capsot/GSOC 2018 Data"
Hectoralos (talk | contribs) m
(2 intermediate revisions by 2 users not shown)
[[Category:Occitan and French]]
Latest revision as of 13:11, 18 August 2018
Statistics
{| class="wikitable"
| || '''Bidix''' || '''Coverage''' || '''WER'''
|-
|fra-oci|| 41,000 (27,000 without family names) || 92.3% || 10%
|-
|oci-fra|| 41,000 (27,000 without family names) || 92.9% || Not calculated (due to non-functional morphological disambiguator)
|}
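The page does not say how the WER figure was computed. As an illustration only, the usual definition (word-level edit distance between a reference translation and the system output, divided by the number of reference words) can be sketched as follows. The function name `word_error_rate` and the placeholder sentences are hypothetical and are not taken from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word dropped
                curr[j - 1] + 1,                  # extra hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real evaluation data: one substitution in four words.
print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, a WER of 10% corresponds to roughly one word-level edit per ten reference words.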
Rules
{| class="wikitable"
| ||'''Disambiguation Rules''' || '''Lexical Selection Rules''' || '''Transfer Rules'''
|-
|fra-oci|| 678 || 855 (85 without counting the selection rules for anthroponyms) || t1x: 130<br/>t2x: 14<br/>t3x: 1<br/>t4x: 5 (for punctuation)
|-
|oci-fra|| 87 || 928 (151 without counting the selection rules for anthroponyms) || t1x: 51<br/>t2ax: 21 (mostly for the insertion or not of the subject pronoun and agreement between subject and attribute)<br/>t2bx: 4 (inclusion or not of adverb "ne" in negation)<br/>t2cx: 1 (for the partitive article after verb)<br/>t3x: 2<br/>t4x: 5 (for punctuation)
|}
- PS (oci-fra): A corpus of 14,000 words was manually disambiguated in order to build a morphological disambiguator, but we could not obtain a working prob file.
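The per-stage counts in the Rules table can be totalled with a few lines of Python. This is only a convenience sketch: the figures are copied from the table, while the dictionary names (`transfer_rules`, `lexical_selection`) are chosen here for illustration. The anthroponym figure is derived by subtraction and is not stated directly on the page.

```python
# Transfer rule counts per stage, copied from the Rules table.
transfer_rules = {
    "fra-oci": {"t1x": 130, "t2x": 14, "t3x": 1, "t4x": 5},
    "oci-fra": {"t1x": 51, "t2ax": 21, "t2bx": 4, "t2cx": 1, "t3x": 2, "t4x": 5},
}

# Lexical selection rules: (total, total without anthroponym rules).
lexical_selection = {
    "fra-oci": (855, 85),
    "oci-fra": (928, 151),
}

# Sum the transfer rules over all stages for each direction.
transfer_totals = {
    pair: sum(stages.values()) for pair, stages in transfer_rules.items()
}

# Derive how many lexical selection rules target anthroponyms.
anthroponym_rules = {
    pair: total - without_names
    for pair, (total, without_names) in lexical_selection.items()
}

for pair in transfer_rules:
    print(pair, "transfer rules:", transfer_totals[pair],
          "| anthroponym selection rules:", anthroponym_rules[pair])
```

With the figures above this gives 150 transfer rules for fra-oci and 84 for oci-fra, and 770 and 777 anthroponym selection rules respectively.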