Difference between revisions of "North Saami and Lule Saami"
		
		
		
		
		
		
| (37 intermediate revisions by 3 users not shown) | |||
| Line 1: | Line 1: | ||
| {{TOCD}} | {{TOCD}} | ||
| This page gives some details about the North Sámi to Lule Sámi translator. | |||
| ==Files== | |||
| *<code>apertium-sme-smj.sme.dix</code> — Northern Sami transducer | |||
| *<code>apertium-sme-smj.sme-smj.dix</code> — Transfer lexicon | |||
| *<code>apertium-sme-smj.smj.dix</code> — Lule Sami transducer | |||
| *<code>apertium-sme-smj.sme-smj.rlx</code> — Constraint grammar  | |||
| *<code>apertium-sme-smj.sme-smj.t1x</code> — Transfer rule file (level 1 -- Local re-ordering, chunking) | |||
| *<code>apertium-sme-smj.sme-smj.t2x</code> — Transfer rule file (level 2 -- Phrase and chunk re-ordering) | |||
| *<code>apertium-sme-smj.sme-smj.t3x</code> — Transfer rule file (level 3 -- Final touches) | |||
| ==Linguistic issues== | |||
| ==TODO== | |||
| * [[North Saami - Lule Saami quasicode for the transfer files]] | |||
| * Mapped tags from the CG contain characters that are special in Apertium, for example '>' (which delimits tags) and '-' (which causes problems with pretransfer). These characters need to be replaced before the stream reaches the later stages. | |||
| * [[North Saami - Lule Saami testing notes]] | |||
| ::Example: | |||
| :::<code>^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$</code> or <code>^prošeakta<N><Sg><Nom><@<SPRED>$</code> | |||
| :::These come from the CG tags @SUBJ> and @<SPRED. | |||
| * Re-train the HMM-based POS tagger on a Sami corpus. | |||
| * Closed categories in sme analyser | |||
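The special-character item in the list above could be handled by a small stream filter placed after <code>cg-proc</code>. The following is a minimal sketch, not the project's actual fix: it assumes the angle brackets inside mapping tags are swapped for arrow characters, which is how the tags appear in the example output further down this page (for instance <code>@SUBJ→</code> and <code>@←OBJ</code>). The function name <code>normalise_cg_tags</code> is hypothetical, and the '-' in tags such as <code>@-FMAINV</code> is deliberately left untouched.

```shell
# Sketch only: rewrite '<' and '>' inside CG mapping tags so they no longer
# clash with Apertium's tag delimiters. normalise_cg_tags is a hypothetical
# helper name, not part of Apertium or CG.
normalise_cg_tags() {
  # 1st expression: trailing '>'  (@SUBJ>  -> @SUBJ→)
  # 2nd expression: leading  '<'  (@<SPRED -> @←SPRED)
  # 3rd expression: leading  '>'  (@>N     -> @→N)
  sed -e 's/<@\([^<>]\{1,\}\)>>/<@\1→>/g' \
      -e 's/<@<\([^<>]\{1,\}\)>/<@←\1>/g' \
      -e 's/<@>\([^<>]\{1,\}\)>/<@→\1>/g'
}

# Demo on the two problem cases quoted in the TODO item.
printf '%s\n' '^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$ ^prošeakta<N><Sg><Nom><@<SPRED>$' \
  | normalise_cg_tags
```

The filter reads the Apertium stream on standard input and writes it to standard output, so under these assumptions it could sit in a pipeline directly after <code>cg-proc</code>.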
| ==Technical issues== | |||
| ==Reminders== | |||
| * [[North Saami - Lule Saami tagset mismatches]] | |||
| * In the transfer rule files, don't forget to escape the '+' character in tags, for example:  | |||
| ::'''no:''' <code><attr-item tags="@+FMAINV"/></code> ,  | |||
| ::'''yes:''' <code><attr-item tags="@\+FMAINV"/></code> | |||
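A quick way to catch the unescaped form is to search the transfer rule files for a '+' inside a <code>tags</code> attribute that has no backslash in front of it. This is a rough sketch using plain <code>grep</code>; the helper name <code>find_unescaped_plus</code> is hypothetical, and the pattern only looks at <code>tags="..."</code> attributes written on a single line.

```shell
# find_unescaped_plus is a hypothetical helper, not an Apertium tool.
# It prints each line (with its line number) where a tags="..." attribute
# contains a '+' that is not preceded by a backslash.
find_unescaped_plus() {
  # ([^"\\]|\\.)* walks through the attribute value, consuming escaped
  # pairs such as \+ as a unit, so the final \+ can only hit a bare '+'.
  grep -nE 'tags="([^"\\]|\\.)*\+' "$@"
}

# Demo on the two variants shown above; only the first one is reported.
printf '%s\n' '<attr-item tags="@+FMAINV"/>' '<attr-item tags="@\+FMAINV"/>' \
  | find_unescaped_plus
# prints: 1:<attr-item tags="@+FMAINV"/>
```

With file arguments, for example <code>find_unescaped_plus apertium-sme-smj.sme-smj.t1x</code>, the same check runs over a rule file directly; <code>grep</code> exits with a non-zero status when nothing is found.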
| == | ==Evaluation== | ||
| *[[/Regression tests]] | |||
| ;Analysing some Northern Sami text: | |||
| *[[/Pending tests]] | |||
| ==See also== | |||
| <pre> | |||
| $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ | |||
| gii beare sáhttá čállit artihkkaliid." |  lt-proc sme-smj.automorf.bin | |||
| * The project also has [http://giellatekno.uit.no/doc/mt/smesmj/smesmj.html a home page at Giellatekno] | |||
| ^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$  | |||
| * [[Integration and tagset conversion with Giellatekno]] | |||
| ^máŋggagielat/máŋggagielat<A><Attr>/máŋggagielat<A><Sg><Nom>$ ^prošeakta/prošeakta<N><Sg><Nom>$  | |||
| ^man/man<ADV>/mii<Pron><Interr><Sg><Gen>/mii<Pron><Interr><Sg><Acc>/mii<Pron><Rel><Sg><Gen>/mii<Pron><Rel><Sg><Acc>$ ^ulbmilin/ulbmil<N><Ess>$  | |||
| ^lea/leat<V><IV><Ind><Prs><Sg3>$ ^ráhkadit/ráhkadit<V><TV><Inf>/ráhkadit<V><TV><Ind><Prs><Pl3>/ráhkadit<V><TV><Ind><Prt><Sg2>$  | |||
| ^almmolaš/almmolaš<A><Attr>/almmolaš<A><Sg><Nom>$ ^diehtosátnegirjji/diehtosátnegirji<N><Sg><Acc>$  | |||
| ^gosa/gosa<ADV>/gossat<V><IV><VGen>/gossat<V><IV><Imprt><Prs><ConNeg>/gossat<V><IV><Imprt><Prs><Sg2>/gossat<V><IV><Ind><Prs><ConNeg>$  | |||
| ^gii/gii<Pron><Interr><Sg><Nom>/gii<Pron><Rel><Sg><Nom>$ ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3>$  | |||
| ^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$ ^artihkkaliid/artihkal<N><Pl><Gen>/artihkal<N><Pl><Acc>$. | |||
| </pre> | |||
| *[[North Saami Lule Saami reminder issues|Reminders]] | |||
| ;Disambiguating and annotating text with Constraint grammar: | |||
| <pre> | |||
| $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ | |||
| gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin  | |||
| ^Wikipedia/Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat/máŋggagielat<A><Attr><@→N>$  | |||
| ^prošeakta/prošeakta<N><Sg><Nom><@←SPRED>$ ^man/mii<Pron><Rel><Sg><Gen><@→N>$ ^ulbmilin/ulbmil<N><Ess><@SPRED→>$  | |||
| ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit/ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš/almmolaš<A><Attr><@→N>$  | |||
| ^diehtosátnegirjji/diehtosátnegirji<N><Sg><Acc><@←OBJ>$ ^gosa/gosa<ADV>$ ^gii/gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare/beare<ADV>$  | |||
| ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit/čállit<V><TV><Inf><@-FMAINV>$ ^artihkkaliid/artihkal<N><Pl><Acc><@←OBJ>$. | |||
| </pre> | |||
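To see how much ambiguity a stage leaves behind, one can count the lexical units that still carry more than one reading. The sketch below assumes the stream format shown in the output on this page, where a unit is written as <code>^surface/reading/reading$</code>; the helper name <code>count_ambiguous_units</code> is hypothetical, and lemmas containing a literal '/' are not handled.

```shell
# count_ambiguous_units is a hypothetical helper, not an Apertium tool.
# It reads an Apertium stream on stdin and prints how many lexical units
# have more than one reading.
count_ambiguous_units() {
  # grep -o isolates each ^...$ unit on its own line; awk then splits on '/'.
  # A unit with a single reading has 2 fields (surface form + reading).
  grep -o '\^[^$]*\$' | awk -F'/' 'NF > 2 { n++ } END { print n + 0 }'
}

# Demo on the first two units of the analyser output shown earlier.
printf '%s\n' '^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$' \
  | count_ambiguous_units
# prints: 1
```

Running the same count on the output of <code>lt-proc</code> and again on the output of <code>cg-proc</code> gives a rough picture of how many units the constraint grammar resolved.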
| ;Finishing off the disambiguation with Apertium's HMM tagger: | |||
| <pre> | |||
| $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ | |||
| gii beare sáhttá čállit artihkkaliid." |  lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob  | |||
| ^Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat<A><Attr><@→N>$ ^prošeakta<N><Sg><Nom><@←SPRED>$  | |||
| ^mii<Pron><Rel><Sg><Gen><@→N>$ ^ulbmil<N><Ess><@SPRED→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit<V><TV><Inf><@-FMAINV>$  | |||
| ^almmolaš<A><Attr><@→N>$ ^diehtosátnegirji<N><Sg><Acc><@←OBJ>$ ^gosa<ADV>$ ^gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare<ADV>$  | |||
| ^sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit<V><TV><Inf><@-FMAINV>$ ^artihkal<N><Pl><Acc><@←OBJ>$. | |||
| </pre> | |||
| ;Applying lexical transfer and chunking: | |||
| <pre> | |||
| $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ | |||
| gii beare sáhttá čállit artihkkaliid." |  lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob | \ | |||
| apertium-transfer apertium-sme-smj.sme-smj.t1x sme-smj.t1x.bin sme-smj.autobil.bin | |||
| ^nom<SN><@SUBJ→><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ ^verb<SV><@+FMAINV>{^liehket<V><Ind><Prs><Sg3>$}$ ^nom<SN><@→N>{^@máŋggagielat<A><Attr>$}$  | |||
| ^nom<SN><Sg><Nom>{^prosjækta<N><Sg><Nom>$}$ ^pronom<SN><@→N><Sg><Gen>{^mij<Pron><Rel><Sg><Gen>$}$ ^nom<SN><@SPRED→><Ess>{^ulmme<N><Ess>$}$  | |||
| ^verb<SV><@+FMAINV>{^liehket<V><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^dahkat<V><Inf>$}$ ^nom<SN><@→N>{^almulasj<A><Attr>$}$  | |||
| ^nom<SN><@←OBJ><Sg><Acc>{^@diehtosátnegirji<N><Sg><Acc>$}$ ^adv<Adv>{^ADV><ADV>$}$ ^pronom<SN><@SUBJ→><Sg><Nom>{^guhti<Pron><Rel><Sg><Nom>$}$  | |||
| ^adv<Adv>{^beru<ADV>$}$ ^verb<SV>{^sáhttet<V><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^tjállet<V><Inf>$}$  | |||
| ^nom<SN><@←OBJ><Pl><Acc>{^artihkal<N><Pl><Acc>$}$. | |||
| </pre> | |||
| ==External links== | ==External links== | ||
| * [http://www.divvun.no/doc/lang/ | * [http://www.divvun.no/doc/lang/smj/docu-smj-grammartags.html Lule Sámi: The grammatical tags] | ||
| [[Category: | [[Category:North Saami and Lule Saami|*]] | ||
| [[Category:North Saami]] | |||
| [[Category:Lule Saami]] | |||
Latest revision as of 13:22, 5 January 2016
This page gives some details about the North Sámi to Lule Sámi translator.
Linguistic issues
Technical issues
Evaluation
See also
- The project also has a home page at Giellatekno
- Integration and tagset conversion with Giellatekno

