Talk:Infrastructure discussion
This is how things happen at Tromsø
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
Dan Dan+N+Prop+Mal+Sg+Attr
Dan Dan+N+Prop+Mal+Sg+Acc
Dan Dan+N+Prop+Mal+Sg+Gen
Dan Dan+N+Prop+Mal+Sg+Nom
Dan dat+Pron+Dem+Sg+Acc
Dan dat+Pron+Dem+Sg+Gen
Dan dat+Pron+Pers+Sg3+Acc
Dan dat+Pron+Pers+Sg3+Gen
Dan D+N+ACR+Ess
mun mun+Pron+Pers+Sg1+Nom
lean leat+V+IV+Ind+Prs+Sg1
lean leat+V+IV+PrfPrc
dahkan dahkat+V+TV+Actio+Acc
dahkan dahkat+V+TV+Actio+Gen
dahkan dahkat+V+TV+Actio+Nom
dahkan dahkat+V+TV+PrfPrc
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Gen
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Nom
. .+CLB
(18:27:25) ttrosterud: -bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg
"<Dan>"
"dat" Pron Pers Sg3 Acc
"dat" Pron Dem Sg Acc
"Dan" N Prop Mal Sg Gen
"Dan" N Prop Mal Sg Attr
"dat" Pron Pers Sg3 Gen
"D" N ACR Ess
"Dan" N Prop Mal Sg Acc
"dat" Pron Dem Sg Gen
"Dan" N Prop Mal Sg Nom
"<mun>"
"mun" Pron Pers Sg1 Nom
"<lean>"
"leat" V IV Ind Prs Sg1
"leat" V IV PrfPrc
"<dahkan>"
"dahkat" V TV Actio Gen
"dahkat" V TV PrfPrc
"dahkat" V* TV Der3 Der/n N Sg Nom
"dahkat" V TV Actio Acc
"dahkat" V* TV Der3 Der/n N Sg Gen
"dahkat" V TV Actio Nom
"<.>"
"." CLB
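The reformatting done by the lookup2cg step can be pictured with a short Python sketch. This is not the real lookup2cg script: `parse_line` and `lookup_to_cg` are hypothetical names, and the sketch assumes each lookup line is a word form followed by one analysis, separated by a tab or a single space as in the transcript above. It starts a new cohort whenever the word form changes, indents readings with a tab (the transcript shows them flush left), and does not reproduce details such as the `V*` marking on derived readings or the order in which readings are printed.

```python
from itertools import groupby


def parse_line(line):
    """Split one lookup line into (word form, lemma, list of tags)."""
    line = line.strip()
    # Assumption: form and analysis are separated by a tab, else by a space.
    sep = "\t" if "\t" in line else " "
    form, _, analysis = line.rpartition(sep)
    lemma, *tags = analysis.split("+")
    return form, lemma, tags


def lookup_to_cg(lookup_output):
    """Turn lookup-style output into CG-style cohorts and readings."""
    rows = [parse_line(l) for l in lookup_output.splitlines() if l.strip()]
    out = []
    # Consecutive lines with the same word form become one cohort.
    for form, group in groupby(rows, key=lambda row: row[0]):
        out.append(f'"<{form}>"')
        for _, lemma, tags in group:
            out.append("\t" + " ".join([f'"{lemma}"', *tags]))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        "mun mun+Pron+Pers+Sg1+Nom",
        "lean leat+V+IV+Ind+Prs+Sg1",
        "lean leat+V+IV+PrfPrc",
        ". .+CLB",
    ])
    print(lookup_to_cg(sample))
```

Grouping on consecutive identical forms is a simplification: two adjacent tokens with the same form would be merged into one cohort, so a real converter needs an explicit token boundary in its input.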
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.657588 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
"dat" Pron Pers Sg3 Acc @OBJ
"<mun>"
"mun" Pron Pers Sg1 Nom @SUBJ
"<lean>"
"leat" V IV Ind Prs Sg1 @+FAUXV
"<dahkan>"
"dahkat" V TV PrfPrc @-FMAINV
"<.>"
"." CLB
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle | vislcg3 -g gt/sme/src/sme-dep.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.088736 seconds.
Grammar has 2 sections, 57 rules, 835 sets, 7955 tags.
Grammar has dependency rules.
Parsing grammar took 0.65936 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
"dat" Pron Pers Sg3 Acc @OBJ #1->4
"<mun>"
"mun" Pron Pers Sg1 Nom @SUBJ #2->3
"<lean>"
"leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0
"<dahkan>"
"dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3
"<.>"
"." CLB #5->0
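In the dependency output above, `#n->m` means that token n has token m as its head, with 0 marking the root. A minimal Python sketch for reading these relations back out of the text follows. The function names `parse_dependencies` and `children_of` are hypothetical, and the sketch assumes one surviving reading per cohort, as in this example; if several readings remain, only the first is used.

```python
import re

COHORT = re.compile(r'^"<(.*)>"$')   # word form line, e.g. "<Dan>"
DEP = re.compile(r'#(\d+)->(\d+)')   # dependency tag, e.g. #1->4


def parse_dependencies(cg_output):
    """Return a list of (token index, word form, head index) tuples."""
    tokens = []
    form = None
    for line in cg_output.splitlines():
        line = line.strip()
        cohort = COHORT.match(line)
        if cohort:
            form = cohort.group(1)
            continue
        dep = DEP.search(line)
        if dep and form is not None:
            tokens.append((int(dep.group(1)), form, int(dep.group(2))))
            form = None  # ignore any further readings of this cohort
    return tokens


def children_of(tokens, head):
    """List the word forms whose head is the token with index `head`."""
    return [form for _, form, h in tokens if h == head]


if __name__ == "__main__":
    sample = '''"<Dan>"
"dat" Pron Pers Sg3 Acc @OBJ #1->4
"<mun>"
"mun" Pron Pers Sg1 Nom @SUBJ #2->3
"<lean>"
"leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0
"<dahkan>"
"dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3
"<.>"
"." CLB #5->0'''
    deps = parse_dependencies(sample)
    print(deps)
    print(children_of(deps, 3))
```

Banner lines such as the version and grammar statistics are skipped automatically because they match neither pattern.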
Multiwords in xfst
(09:56:30) spectre: quick question... how do you deal with multiword units in xfst? (e.g. lemmas where there is a space in the middle, "United Kingdom<PN>")
(09:58:04) ttrosterud: two ways
(09:58:15) ttrosterud: the preprocessor must know them
(09:58:17) ttrosterud: so:
(09:58:22) ttrosterud: New% York
(09:58:29) ttrosterud: where % literalizes the space
(09:58:39) ttrosterud: sorry that was in xfst
(09:58:47) spectre: ok
(09:58:49) ttrosterud: in the preprocessor it must be given as
(09:59:00) ttrosterud: I live in New York
(09:59:09) ttrosterud: then in xfst (or rather in lexc)
(09:59:11) ttrosterud: I write
(09:59:30) ttrosterud: New% York namelex ; London namelex ;
(09:59:31) ttrosterud: etc
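The two halves of this answer can be illustrated with a small Python sketch: writing a lexc entry with `%` before each space, and a tokenizer that keeps known multiword units together. This is only an illustration under stated assumptions, not the actual preprocessor: `lexc_escape`, `lexc_entry` and `tokenize` are hypothetical names, the multiword list is passed in as a plain set, and punctuation is not handled.

```python
def lexc_escape(lemma):
    """Put % before each space so lexc treats it as a literal character."""
    return lemma.replace(" ", "% ")


def lexc_entry(lemma, continuation="namelex"):
    """Format one lexicon line: lemma, continuation lexicon, semicolon."""
    return f"{lexc_escape(lemma)} {continuation} ;"


def tokenize(text, multiwords):
    """Greedy longest-match tokenizer that keeps listed multiword units whole."""
    words = text.split()
    # Longest multiword unit, measured in words (1 if the set is empty).
    max_len = max((len(m.split()) for m in multiwords), default=1)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in multiwords:
                tokens.append(candidate)
                i += n
                break
    return tokens


if __name__ == "__main__":
    names = {"New York", "United Kingdom"}
    print(tokenize("I live in New York", names))
    for name in ["New York", "London"]:
        print(lexc_entry(name))
```

Keeping the same list of names for both functions reflects the point made in the chat: the tokenizer and the lexicon have to agree on which multiword units exist.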