User:Capsot/GSOC 2018 Data

From Apertium

Latest revision as of 13:11, 18 August 2018

Statistics

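| Pair    | Bidix entries                         | Coverage | WER                                                                  |
|---------|---------------------------------------|----------|----------------------------------------------------------------------|
| fra-oci | 41,000 (27,000 without family names)  | 92.3%    | 10%                                                                  |
| oci-fra | 41,000 (27,000 without family names)  | 92.9%    | Not calculated (the morphological disambiguator is not functional)  |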
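The page does not define the Coverage and WER columns. As a hedged sketch, assuming the usual definitions (naive coverage: share of corpus tokens the morphological analyser recognises; WER: word-level edit distance between the MT output and a reference translation, for example a post-edited one, divided by the reference length), the two figures could be computed as below. The function names and the lt-proc-style token format are illustrative assumptions, not the tooling actually used to produce the numbers above.

```python
def naive_coverage(analysed_tokens):
    """Share of tokens the analyser recognises.

    Assumes (hypothetically) that each token is an lt-proc style lexical
    unit such as "^lo/lo<det>$", where an unknown word appears as "^xyz/*xyz$".
    """
    if not analysed_tokens:
        return 0.0
    known = sum(1 for tok in analysed_tokens if "/*" not in tok)
    return known / len(analysed_tokens)


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


print(wer("lo gat es negre", "lo gat èra negre"))               # 0.25
print(naive_coverage(["^lo/lo<det>$", "^xyzzy/*xyzzy$"]))     # 0.5
```

One substituted word out of a four-word reference gives a WER of 0.25, i.e. 25%; a WER of 10% would correspond to roughly one edit per ten reference words.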

Rules

| Pair    | Disambiguation rules | Lexical selection rules                                        | Transfer rules |
|---------|----------------------|----------------------------------------------------------------|----------------|
| fra-oci | 678                  | 855 (85 without counting the selection rules for anthroponyms)  | t1x: 130; t2x: 14; t3x: 1; t4x: 5 (for punctuation) |
| oci-fra | 87                   | 928 (151 without counting the selection rules for anthroponyms) | t1x: 51; t2ax: 21 (mostly for inserting or omitting the subject pronoun and for agreement between subject and attribute); t2bx: 4 (whether to include the adverb "ne" in negation); t2cx: 1 (for the partitive article after a verb); t3x: 2; t4x: 5 (for punctuation) |
  • PS: (oci-fra) A corpus of 14,000 words was manually disambiguated to train a morphological disambiguator, but we could not produce a working .prob file.