Talk:Infrastructure discussion

From Apertium

This is how things happen at Tromsø

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
Dan Dan+N+Prop+Mal+Sg+Attr
Dan Dan+N+Prop+Mal+Sg+Acc
Dan Dan+N+Prop+Mal+Sg+Gen
Dan Dan+N+Prop+Mal+Sg+Nom
Dan dat+Pron+Dem+Sg+Acc
Dan dat+Pron+Dem+Sg+Gen
Dan dat+Pron+Pers+Sg3+Acc
Dan dat+Pron+Pers+Sg3+Gen
Dan D+N+ACR+Ess

mun mun+Pron+Pers+Sg1+Nom

lean leat+V+IV+Ind+Prs+Sg1
lean leat+V+IV+PrfPrc

dahkan dahkat+V+TV+Actio+Acc
dahkan dahkat+V+TV+Actio+Gen
dahkan dahkat+V+TV+Actio+Nom
dahkan dahkat+V+TV+PrfPrc
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Gen
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Nom

. .+CLB

(18:27:25) ttrosterud: -bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg
"<Dan>"
        "dat" Pron Pers Sg3 Acc
        "dat" Pron Dem Sg Acc
        "Dan" N Prop Mal Sg Gen
        "Dan" N Prop Mal Sg Attr
        "dat" Pron Pers Sg3 Gen
        "D" N ACR Ess
        "Dan" N Prop Mal Sg Acc
        "dat" Pron Dem Sg Gen
        "Dan" N Prop Mal Sg Nom
"<mun>"
        "mun" Pron Pers Sg1 Nom
"<lean>"
        "leat" V IV Ind Prs Sg1
        "leat" V IV PrfPrc
"<dahkan>"
        "dahkat" V TV Actio Gen
        "dahkat" V TV PrfPrc
        "dahkat" V* TV Der3 Der/n N Sg Nom
        "dahkat" V TV Actio Acc
        "dahkat" V* TV Der3 Der/n N Sg Gen
        "dahkat" V TV Actio Nom
"<.>"
        "." CLB

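The reshaping that lookup2cg performs above (one lookup line per analysis in, one cohort per word form out) can be sketched roughly as follows. This is a simplified illustration, not the actual lookup2cg script: the function name `lookup_to_cg` is hypothetical, the input is assumed to be `form<whitespace>lemma+Tag+Tag` lines with blank lines between tokens, and the sketch neither reorders readings nor marks derivations the way the real tool does (note the `V*` in the output above).

```python
def lookup_to_cg(lookup_text):
    """Turn lookup-style lines into a CG-style cohort listing (simplified)."""
    cohorts = []        # list of [wordform, [reading, ...]] in input order
    new_cohort = True   # the start of input or a blank line opens a new cohort
    for raw in lookup_text.splitlines():
        line = raw.strip()
        if not line:
            new_cohort = True
            continue
        # Prefer a tab separator when present, else split on the first whitespace run.
        sep = "\t" if "\t" in line else None
        form, analysis = line.split(sep, 1)
        # Assumption: "+" only separates the lemma from its tags.
        lemma, *tags = analysis.strip().split("+")
        reading = " ".join(['"%s"' % lemma] + tags)
        if new_cohort or cohorts[-1][0] != form:
            cohorts.append([form, []])
            new_cohort = False
        cohorts[-1][1].append(reading)
    out = []
    for form, readings in cohorts:
        out.append('"<%s>"' % form)
        out.extend("\t" + r for r in readings)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "lean leat+V+IV+Ind+Prs+Sg1\nlean leat+V+IV+PrfPrc\n\n. .+CLB\n"
    print(lookup_to_cg(sample))
```

Readings come out in input order here, whereas the real output above is ordered differently, so the sketch only shows the change of format.
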
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle 
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.657588 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
        "dat" Pron Pers Sg3 Acc @OBJ 
"<mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ 
"<lean>"
        "leat" V IV Ind Prs Sg1 @+FAUXV 
"<dahkan>"
        "dahkat" V TV PrfPrc @-FMAINV 
"<.>"
        "." CLB 

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle | vislcg3 -g gt/sme/src/sme-dep.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.088736 seconds.      
Grammar has 2 sections, 57 rules, 835 sets, 7955 tags.
Grammar has dependency rules.
Parsing grammar took 0.65936 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
        "dat" Pron Pers Sg3 Acc @OBJ #1->4 
"<mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ #2->3 
"<lean>"
        "leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0 
"<dahkan>"
        "dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3 
"<.>"
        "." CLB #5->0 

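In the dependency output above, each reading ends in `#n->m`, meaning that token n has token m as its head (0 is the root). A minimal sketch of reading those relations back into a table, assuming exactly the layout shown above; the names `parse_dependencies` and `children_of` are hypothetical and not part of vislcg3:

```python
import re

# Matches the "#self->head" relation at the end of a reading line.
DEP = re.compile(r"#(\d+)->(\d+)")


def parse_dependencies(cg_text):
    """Return (index, wordform, function_tag, head_index) for each reading."""
    rows = []
    form = None
    for line in cg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"<') and stripped.endswith('>"'):
            form = stripped[2:-2]          # cohort line: "<wordform>"
        elif stripped.startswith('"') and form is not None:
            match = DEP.search(stripped)   # reading line: "lemma" tags #n->m
            if match is None:
                continue
            # The first tag starting with "@" is taken as the syntactic function.
            func = next((t for t in stripped.split() if t.startswith("@")), None)
            rows.append((int(match.group(1)), form, func, int(match.group(2))))
    return rows


def children_of(rows, head):
    """Word forms whose head is the token with index `head`."""
    return [form for _, form, _, h in rows if h == head]
```

For the sentence above, `children_of(rows, 3)` would collect the dependents of the auxiliary "lean". Banner lines such as the version and timing messages are skipped because they do not start with a double quote.
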
Multiwords in xfst

(09:56:30) spectre: quick question... how do you deal with multiword units in xfst? (e.g. lemmas where there is a space in the middle, like "United Kingdom<PN>")
(09:58:04) ttrosterud: two ways
(09:58:15) ttrosterud: the preprocessor must know them
(09:58:17) ttrosterud: so:
(09:58:22) ttrosterud: New% York
(09:58:29) ttrosterud: where % literalizes the space
(09:58:39) ttrosterud: sorry that was in xfst
(09:58:47) spectre: ok
(09:58:49) ttrosterud: in the preprocessor it must be given as
(09:59:00) ttrosterud: I
live
in
New York
(09:59:09) ttrosterud: then in xfst (or rather in lexc)
(09:59:11) ttrosterud: I write
(09:59:30) ttrosterud: New% York namelex ;
London namelex ;
(09:59:31) ttrosterud: etc
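
The two halves of this answer (a preprocessor that emits one token per line and keeps known multiwords whole, and lexc entries where `%` escapes the space) can be sketched as below. This is only an illustration under stated assumptions: `tokenize` and `lexc_entry` are hypothetical names, not the Tromsø preprocess tool; punctuation is not split off; and only spaces are escaped, although lexc has other special characters that would need `%` as well.

```python
def tokenize(text, multiwords):
    """Greedy longest-match tokenizer that keeps known multiword units whole."""
    words = text.split()
    known = {tuple(m.split()) for m in multiwords}
    max_len = max((len(m) for m in known), default=1)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in known:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No multiword starts here: emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


def lexc_entry(lemma, continuation="namelex"):
    """Format a lexc entry, escaping each space in the lemma with '%'."""
    return "%s %s ;" % (lemma.replace(" ", "% "), continuation)


if __name__ == "__main__":
    # One token per line, as the preprocessor output is described above.
    print("\n".join(tokenize("I live in New York", ["New York"])))
    print(lexc_entry("New York"))
    print(lexc_entry("London"))
```

The continuation class `namelex` is taken from the example entries above. The multiword list passed to `tokenize` would have to be kept in sync with the lexicon by hand in this sketch.
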