Difference between revisions of "Talk:Infrastructure discussion"

From Apertium
Latest revision as of 11:42, 23 April 2008

This is how things happen at Tromsø

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
Dan Dan+N+Prop+Mal+Sg+Attr
Dan Dan+N+Prop+Mal+Sg+Acc
Dan Dan+N+Prop+Mal+Sg+Gen
Dan Dan+N+Prop+Mal+Sg+Nom
Dan dat+Pron+Dem+Sg+Acc
Dan dat+Pron+Dem+Sg+Gen
Dan dat+Pron+Pers+Sg3+Acc
Dan dat+Pron+Pers+Sg3+Gen
Dan D+N+ACR+Ess

mun mun+Pron+Pers+Sg1+Nom

lean leat+V+IV+Ind+Prs+Sg1
lean leat+V+IV+PrfPrc

dahkan dahkat+V+TV+Actio+Acc
dahkan dahkat+V+TV+Actio+Gen
dahkan dahkat+V+TV+Actio+Nom
dahkan dahkat+V+TV+PrfPrc
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Gen
dahkan dahkat+V+TV+Der3+Der/n+N+Sg+Nom

. .+CLB

(18:27:25) ttrosterud: -bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg
"<Dan>"
        "dat" Pron Pers Sg3 Acc
        "dat" Pron Dem Sg Acc
        "Dan" N Prop Mal Sg Gen
        "Dan" N Prop Mal Sg Attr
        "dat" Pron Pers Sg3 Gen
        "D" N ACR Ess
        "Dan" N Prop Mal Sg Acc
        "dat" Pron Dem Sg Gen
        "Dan" N Prop Mal Sg Nom
"<mun>"
        "mun" Pron Pers Sg1 Nom
"<lean>"
        "leat" V IV Ind Prs Sg1
        "leat" V IV PrfPrc
"<dahkan>"
        "dahkat" V TV Actio Gen
        "dahkat" V TV PrfPrc
        "dahkat" V* TV Der3 Der/n N Sg Nom
        "dahkat" V TV Actio Acc
        "dahkat" V* TV Der3 Der/n N Sg Gen
        "dahkat" V TV Actio Nom
"<.>"
        "." CLB

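The step from the `lo` output to the `lookup2cg` output above is essentially a reformatting: each blank-line-separated group of `wordform lemma+Tag+Tag` lines becomes one cohort with one indented reading per analysis. A minimal Python sketch of that idea follows. It is not the actual `lookup2cg` script, which is not shown here; the function name `lookup_to_cg` is hypothetical, readings are kept in input order, and the `V*` marking of derivations seen in the real output is not reproduced.

```python
def lookup_to_cg(lookup_output):
    """Turn 'wordform analysis' lines into CG-style cohorts (illustrative only)."""
    lines_out = []
    current_form = None
    for line in lookup_output.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current cohort.
            current_form = None
            continue
        # Assumption: a tab separates form and analysis; fall back to whitespace.
        if "\t" in line:
            form, analysis = line.split("\t", 1)
        else:
            form, analysis = line.split(None, 1)
        if current_form is None:
            lines_out.append('"<%s>"' % form)
            current_form = form
        # The first '+'-separated field is the lemma, the rest are tags.
        lemma, *tags = analysis.split("+")
        lines_out.append('\t"%s" %s' % (lemma, " ".join(tags)))
    return "\n".join(lines_out)


sample = "\n".join([
    "mun mun+Pron+Pers+Sg1+Nom",
    "",
    "lean leat+V+IV+Ind+Prs+Sg1",
    "lean leat+V+IV+PrfPrc",
    "",
    ". .+CLB",
])
print(lookup_to_cg(sample))
```

Unknown words and lemmas that themselves contain `+` are not handled in this sketch.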

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle 
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.657588 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
        "dat" Pron Pers Sg3 Acc @OBJ 
"<mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ 
"<lean>"
        "leat" V IV Ind Prs Sg1 @+FAUXV 
"<dahkan>"
        "dahkat" V TV PrfPrc @-FMAINV 
"<.>"
        "." CLB 

-bash-3.00$ echo "Dan mun lean dahkan." | preprocess | lo | lookup2cg | vislcg3 -g gt/sme/src/sme-dis.rle | vislcg3 -g gt/sme/src/sme-dep.rle
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
VISL CG-3 Disambiguator version 0.9.3.3362
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Parsing grammar took 0.088736 seconds.      
Grammar has 2 sections, 57 rules, 835 sets, 7955 tags.
Grammar has dependency rules.
Parsing grammar took 0.65936 seconds.
Grammar has 27 sections, 3284 rules, 3658 sets, 8514 tags.
25 rules cannot be skipped by index.
"<Dan>"
        "dat" Pron Pers Sg3 Acc @OBJ #1->4 
"<mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ #2->3 
"<lean>"
        "leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0 
"<dahkan>"
        "dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3 
"<.>"
        "." CLB #5->0 

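The `#N->M` marks in the dependency output above can be read as "word N depends on word M", with 0 conventionally standing for the root. As a rough illustration of how that output could be consumed downstream, here is a small Python sketch that extracts the arcs. The function name `parse_dependencies` is hypothetical, and it assumes one reading per cohort, as in the disambiguated output above.

```python
import re

# Matches the dependency mark at the end of a reading, e.g. "#4->3".
DEP_RE = re.compile(r"#(\d+)->(\d+)")


def parse_dependencies(cg_output):
    """Return a list of (id, wordform, lemma, head_id) tuples."""
    arcs = []
    wordform = None
    for line in cg_output.splitlines():
        stripped = line.strip()
        # Cohort lines look like "<Dan>".
        if stripped.startswith('"<') and stripped.endswith('>"'):
            wordform = stripped[2:-2]
            continue
        match = DEP_RE.search(stripped)
        # Reading lines start with the quoted lemma; log lines are skipped.
        if wordform is not None and stripped.startswith('"') and match:
            lemma = stripped.split('"')[1]
            arcs.append((int(match.group(1)), wordform, lemma,
                         int(match.group(2))))
    return arcs


sample = '''25 rules cannot be skipped by index.
"<Dan>"
        "dat" Pron Pers Sg3 Acc @OBJ #1->4
"<mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ #2->3
"<lean>"
        "leat" <aux> V IV Ind Prs Sg1 @FS-STA #3->0
"<dahkan>"
        "dahkat" <mv> V TV PrfPrc @ICL-AUX< #4->3
"<.>"
        "." CLB #5->0
'''

for word_id, form, lemma, head in parse_dependencies(sample):
    print(word_id, form, lemma, "->", head)
```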
Multiwords in xfst

(09:56:30) spectre: quick question... how do you deal with multiword units in xfst? (e.g. lemmas where there is a space in the middle "United Kingdom<PN>")
(09:58:04) ttrosterud: two ways
(09:58:15) ttrosterud: the preprocessor must know them
(09:58:17) ttrosterud: so:
(09:58:22) ttrosterud: New% York
(09:58:29) ttrosterud: where % literalizes the space
(09:58:39) ttrosterud: sorry that was in xfst
(09:58:47) spectre: ok
(09:58:49) ttrosterud: in the preprocessor it must be given as
(09:59:00) ttrosterud: I
live
in
New York
(09:59:09) ttrosterud: then in xfst (or rather in lexc)
(09:59:11) ttrosterud: I write
(09:59:30) ttrosterud: New% York namelex ;
London namelex ;
(09:59:31) ttrosterud: etc
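
The two halves of the answer above (the preprocessor has to know the multiword units, and lexc needs the space escaped with `%`) can be sketched roughly in Python. This is not the Tromsø `preprocess` tool, whose implementation is not shown here. The names `tokenize` and `lexc_escape` are hypothetical, punctuation is not handled, and only the space is escaped.

```python
def lexc_escape(lemma):
    """Escape spaces with % for a lexc entry, e.g. 'New York' -> 'New% York'."""
    return lemma.replace(" ", "% ")


def tokenize(text, multiwords):
    """Split on whitespace, but keep known multiword units as single tokens."""
    words = text.split()
    # Pre-split the multiword entries and try longer ones first.
    entries = sorted(
        (entry for entry in (mw.split() for mw in multiwords) if entry),
        key=len,
        reverse=True,
    )
    tokens = []
    i = 0
    while i < len(words):
        for entry in entries:
            if words[i:i + len(entry)] == entry:
                tokens.append(" ".join(entry))
                i += len(entry)
                break
        else:
            # No multiword unit starts here; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


# One token per line, as in the chat example.
print("\n".join(tokenize("I live in New York", ["New York"])))

# Lexicon lines in the style quoted above.
for name in ["New York", "London"]:
    print(lexc_escape(name) + " namelex ;")
```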