Talk:Infrastructure discussion
This is how things happen at Tromsø
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
Dan	Dan+N+Prop+Mal+Sg+Attr
Dan	Dan+N+Prop+Mal+Sg+Acc
Dan	Dan+N+Prop+Mal+Sg+Gen
Dan	Dan+N+Prop+Mal+Sg+Nom
Dan	dat+Pron+Dem+Sg+Acc
Dan	dat+Pron+Dem+Sg+Gen
Dan	dat+Pron+Pers+Sg3+Acc
Dan	dat+Pron+Pers+Sg3+Gen
Dan	D+N+ACR+Ess
mun	mun+Pron+Pers+Sg1+Nom
lean	leat+V+IV+Ind+Prs+Sg1
lean	leat+V+IV+PrfPrc
dahkan	dahkat+V+TV+Actio+Acc
dahkan	dahkat+V+TV+Actio+Gen
dahkan	dahkat+V+TV+Actio+Nom
dahkan	dahkat+V+TV+PrfPrc
dahkan	dahkat+V+TV+Der3+Der/n+N+Sg+Gen
dahkan	dahkat+V+TV+Der3+Der/n+N+Sg+Nom
.	.+CLB

(18:27:25) ttrosterud:
-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg
"<Dan>"
	"dat" Pron Pers Sg3 Acc
	"dat" Pron Dem Sg Acc
	"Dan" N Prop Mal Sg Gen
	"Dan" N Prop Mal Sg Attr
	"dat" Pron Pers Sg3 Gen
	"D" N ACR Ess
	"Dan" N Prop Mal Sg Acc
	"dat" Pron Dem Sg Gen
	"Dan" N Prop Mal Sg Nom
"<mun>"
	"mun" Pron Pers Sg1 Nom
"<lean>"
	"leat" V IV Ind Prs Sg1
	"leat" V IV PrfPrc
"<dahkan>"
	"dahkat" V TV Actio Gen
	"dahkat" V TV PrfPrc
	"dahkat" V* TV Der3 Der/n N Sg Nom
	"dahkat" V TV Actio Acc
	"dahkat" V* TV Der3 Der/n N Sg Gen
	"dahkat" V TV Actio Nom
"<.>"
	"." CLB

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.657588 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
	"dat" Pron Pers Sg3 Acc @OBJ
"<mun>"
	"mun" Pron Pers Sg1 Nom @SUBJ
"<lean>"
	"leat" V IV Ind Prs Sg1 @+FAUXV
"<dahkan>"
	"dahkat" V TV PrfPrc @-FMAINV
"<.>"
	"." CLB

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle | vislcg3 -g gt/sme/src/sme-dep.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.088736 seconds.
Grammar has 2 sections, 57 rules, 835 sets, 7955 tags.
Grammar has dependency rules.
Parsing grammar took 0.65936 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
	"dat" Pron Pers Sg3 Acc @OBJ #1->4
"<mun>"
	"mun" Pron Pers Sg1 Nom @SUBJ #2->3
"<lean>"
	"leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0
"<dahkan>"
	"dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3
"<.>"
	"." CLB #5->0
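For readers wondering what the lookup2cg step in the session above does to the analyser output, here is a rough Python sketch of the reshaping: each "wordform, lemma+Tag+Tag" line becomes a reading under a "<wordform>" cohort. This is not the actual lookup2cg script. The function name lookup_to_cg is made up for illustration, the input is assumed to be tab-separated, and the sketch does not reproduce extras visible in the real output, such as the V* marking on derived forms or the reordering of readings.

```python
def lookup_to_cg(lookup_output: str) -> str:
    """Reshape 'wordform<TAB>lemma+Tag+Tag' lines into CG-style cohorts.

    Hypothetical helper, not the real lookup2cg: consecutive lines that
    share a wordform are grouped into one cohort.
    """
    lines_out = []
    previous_form = None
    for line in lookup_output.splitlines():
        if not line.strip():
            # A blank line ends the current cohort.
            previous_form = None
            continue
        form, _, analysis = line.partition("\t")
        if form != previous_form:
            # New wordform: open a new cohort.
            lines_out.append(f'"<{form}>"')
            previous_form = form
        # First '+'-separated field is the lemma, the rest are tags.
        lemma, *tags = analysis.split("+")
        lines_out.append("\t" + " ".join([f'"{lemma}"'] + tags))
    return "\n".join(lines_out)


if __name__ == "__main__":
    sample = (
        "mun\tmun+Pron+Pers+Sg1+Nom\n"
        "lean\tleat+V+IV+Ind+Prs+Sg1\n"
        "lean\tleat+V+IV+PrfPrc\n"
        ".\t.+CLB\n"
    )
    print(lookup_to_cg(sample))
```

Grouping by consecutive identical wordforms is a simplification: two adjacent identical tokens would end up in one cohort unless the input separates them with a blank line.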
Multiwords in xfst
(09:56:30) spectre: quick question... how do you deal with multiword units in xfst? (e.g. lemmas where there is a space in the middle, "United Kingdom<PN>")
(09:58:04) ttrosterud: two ways
(09:58:15) ttrosterud: the preprocessor must know them
(09:58:17) ttrosterud: so:
(09:58:22) ttrosterud: New% York
(09:58:29) ttrosterud: where % literalizes the space
(09:58:39) ttrosterud: sorry that was in xfst
(09:58:47) spectre: ok
(09:58:49) ttrosterud: in the preprocessor it must be given as
(09:59:00) ttrosterud: I live in New York
(09:59:09) ttrosterud: then in xfst (or rather in lexc)
(09:59:11) ttrosterud: I write
(09:59:30) ttrosterud: New% York namelex ;
London namelex ;
(09:59:31) ttrosterud: etc
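To make the two halves of the answer above concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not the Tromsø preprocess script: tokenize and lexc_entry are hypothetical names, the multiword list is passed in by hand, and punctuation handling is left out. The first function keeps known multiword units together as single tokens (longest match first), and the second writes a lemma as a lexc entry with the space escaped by %, using the namelex continuation class from the chat.

```python
def tokenize(text, multiwords):
    """Split on whitespace, but keep listed multiword units as one token."""
    words = text.split()
    known = {tuple(m.split()) for m in multiwords}
    max_len = max((len(k) for k in known), default=1)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in known:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No multiword unit starts here: emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


def lexc_entry(lemma, continuation="namelex"):
    """Format a lemma as a lexc entry, escaping spaces with '%'."""
    return lemma.replace(" ", "% ") + " " + continuation + " ;"


if __name__ == "__main__":
    print(tokenize("I live in New York", ["New York"]))
    for name in ["New York", "London"]:
        print(lexc_entry(name))
```

With the list ["New York"], the sentence "I live in New York" comes out as four tokens, the last being "New York", and the two entries print as "New% York namelex ;" and "London namelex ;".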