User:Capsot/GSOC 2018 Data
Revision as of 22:37, 14 August 2018
Statistics
Pair | Bidix entries | Coverage | WER
fra-oci | 41,000 (27,000 without family names) | 92.3% | 10%
oci-fra | 41,000 (27,000 without family names) | 92.9% | Not calculated (the morphological disambiguator is not functional)
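The page does not state how coverage and WER were measured. As an illustration only: naive coverage is commonly taken as the share of corpus tokens that the analyser recognises, and WER as the word-level edit distance between the system output and a post-edited reference, divided by the reference length. The Python sketch below assumes pre-tokenised input and a hypothetical in-memory set of known forms (`known`) standing in for the analyser. It is not the tooling used to produce the figures above.

```python
from typing import Sequence, Set


def naive_coverage(tokens: Sequence[str], known: Set[str]) -> float:
    """Share of tokens found in `known` (a hypothetical stand-in for the analyser)."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in known)
    return hits / len(tokens)


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[m] / n


if __name__ == "__main__":
    ref = "lo gat manja".split()
    hyp = "lo can manja".split()
    print(word_error_rate(ref, hyp))  # one substitution out of three words
    print(naive_coverage(["lo", "gat", "manja", "xyz"], {"lo", "gat", "manja"}))
```

With one substituted word in a three-word reference, the WER is 1/3. Real evaluations usually run over a whole post-edited test corpus rather than a single sentence.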
Rules
Pair | Disambiguation rules | Lexical selection rules | Transfer rules
fra-oci | 678 | 855 (85 excluding the selection rules for anthroponyms) | t1x: 130, t2x: 14, t3x: 1, t4x: 5 (punctuation)
oci-fra | 87 | 928 (151 excluding the selection rules for anthroponyms) | t1x: 51, t2ax: 21 (mostly whether to insert the subject pronoun, and subject-attribute agreement), t2bx: 4 (whether to include the adverb "ne" in negation), t2cx: 1 (partitive article after a verb), t3x: 2, t4x: 5 (punctuation)
PS: (oci-fra) A 14,000-word corpus was manually disambiguated to train a morphological disambiguator, but we could not get a working prob file.