Difference between revisions of "North Saami and Lule Saami"

From Apertium
{{TOCD}}
This page gives some details about the North Sámi to Lule Sámi translator.
==Files==
*<code>apertium-sme-smj.sme.dix</code> &mdash; Northern Sami transducer
*<code>apertium-sme-smj.sme-smj.dix</code> &mdash; Transfer lexicon
*<code>apertium-sme-smj.smj.dix</code> &mdash; Lule Sami transducer
*<code>apertium-sme-smj.sme-smj.rlx</code> &mdash; Constraint grammar
*<code>apertium-sme-smj.sme-smj.t1x</code> &mdash; Transfer rule file (level 1 -- Local re-ordering, chunking)
*<code>apertium-sme-smj.sme-smj.t2x</code> &mdash; Transfer rule file (level 2 -- Phrase and chunk re-ordering)
*<code>apertium-sme-smj.sme-smj.t3x</code> &mdash; Transfer rule file (level 3 -- Final touches)
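The files above are used one after another in the translation pipeline. As a rough illustration, the Python sketch below builds the shell pipeline from a list of stages. The first four stages and their file names are copied from the commands under Testing. The interchunk, postchunk and generation stages, and the compiled names <code>sme-smj.t2x.bin</code>, <code>sme-smj.t3x.bin</code> and <code>sme-smj.autogen.bin</code>, are assumptions that this page does not document. A full mode may also need steps such as <code>apertium-pretransfer</code>, which the commands here leave out.

```python
from __future__ import annotations

import shlex

# Stages as used in the Testing commands on this page (after `echo ... |`).
DOCUMENTED_STAGES = [
    ["lt-proc", "sme-smj.automorf.bin"],           # morphological analysis (North Sami)
    ["cg-proc", "sme-smj.rlx.bin"],                # Constraint Grammar disambiguation
    ["apertium-tagger", "-g", "sme-smj.prob"],     # HMM tagger
    ["apertium-transfer", "apertium-sme-smj.sme-smj.t1x",
     "sme-smj.t1x.bin", "sme-smj.autobil.bin"],    # level-1 transfer and chunking
]

# ASSUMPTION: the later levels are not shown on this page. The programs exist
# in Apertium, but these compiled file names are hypothetical.
ASSUMED_LATER_STAGES = [
    ["apertium-interchunk", "apertium-sme-smj.sme-smj.t2x", "sme-smj.t2x.bin"],
    ["apertium-postchunk", "apertium-sme-smj.sme-smj.t3x", "sme-smj.t3x.bin"],
    ["lt-proc", "-g", "sme-smj.autogen.bin"],      # generation (hypothetical file name)
]


def pipeline(stages: list[list[str]]) -> str:
    """Quote each stage's arguments and join the stages with '|'."""
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(pipeline(DOCUMENTED_STAGES))
    print(pipeline(DOCUMENTED_STAGES + ASSUMED_LATER_STAGES))
```

Keeping each stage as an argument list and quoting it with <code>shlex.join</code> avoids hand-written quoting mistakes when a stage is added or changed.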
 
   
==Linguistic issues==
* [[North Saami - Lule Saami quasicode for the transfer files]]
* [[North Saami - Lule Saami testing notes]]

==TODO==
* Mapped tags in the CG use characters that are special in Apertium, for example '>' (which delimits tags) and '-' (which causes problems with pretransfer). These should be replaced with characters that are safe in the Apertium stream.
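::Example:
:::<code>^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$</code> or <code>^prošeakta<N><Sg><Nom><@<SPRED>$</code>
:::The first comes from the CG tag <code>@SUBJ></code>, whose final '>' closes the Apertium tag early; the second contains '<', which is likewise a tag delimiter.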
 
* Re-train the HMM-based POS tagger on a Sami corpus.
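A minimal sketch of one way to address the mapped-tag item above: rewrite the clashing characters whenever a CG mapping tag is written into the Apertium stream. The replacement table is an assumption. The '>' to '%' entry matches what the CG output under Testing suggests (<code>@SUBJ></code> shows up there as <code>@SUBJ%</code>), while the '-' to '_' entry is hypothetical, and the helper names are illustrative rather than part of the pair.

```python
from __future__ import annotations

# Hypothetical helpers: make CG mapping tags safe before they are written
# into the Apertium stream as <...> tags.

# '>' -> '%' matches the CG output on this page (@SUBJ> appears as @SUBJ%).
# '-' -> '_' is an ASSUMED replacement chosen only for this sketch.
# '<' (as in @<SPRED) is also a tag delimiter; it is left unchanged here,
# as in the output on this page, but could be added to the table.
REPLACEMENTS: dict[str, str] = {">": "%", "-": "_"}


def escape_mapping_tag(tag: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Replace stream-unsafe characters in a CG mapping tag such as '@SUBJ>'."""
    if not tag.startswith("@"):
        raise ValueError(f"not a CG mapping tag: {tag!r}")
    return "".join(table.get(ch, ch) for ch in tag)


def unescape_mapping_tag(tag: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Undo escape_mapping_tag by applying the table in reverse."""
    reverse = {v: k for k, v in table.items()}
    return "".join(reverse.get(ch, ch) for ch in tag)


def as_apertium_tag(tag: str) -> str:
    """Wrap an escaped mapping tag in Apertium's tag delimiters."""
    return f"<{escape_mapping_tag(tag)}>"


if __name__ == "__main__":
    for cg_tag in ["@SUBJ>", "@-FMAINV", "@+FAUXV"]:
        print(cg_tag, "->", as_apertium_tag(cg_tag))
```

Unescaping is only reliable if the replacement characters ('%' and '_' here) never occur in the original CG tags.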
 
   
==Technical issues==
* [[North Saami - Lule Saami tagset mismatches]]

==Evaluation==
*[[/Regression tests]]
*[[/Pending tests]]

==Testing==
;Analysing some Northern Sami text:
<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin

^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$
^máŋggagielat/*máŋggagielat$ ^prošeakta/prošeakta<N><Sg><Nom>$ ^man/man<ADV>$ ^ulbmilin/*ulbmilin$ ^lea/leat<V><IV><Ind><Prs><Sg3>$
^ráhkadit/ráhkadit<V><TV><Inf>/ráhkadit<V><TV><Ind><Prs><Pl3>/ráhkadit<V><TV><Ind><Prt><Sg2>$ ^almmolaš/almmolaš<A><Attr>/almmolaš<A><Sg><Nom>$
^diehtosátnegirjji/*diehtosátnegirjji$
^gosa/gosa<ADV>/gossat<V><IV><VGen>/gossat<V><IV><Imprt><Prs><ConNeg>/gossat<V><IV><Imprt><Prs><Sg2>/gossat<V><IV><Ind><Prs><ConNeg>$ ^gii/*gii$
^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3>$ ^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$ ^artihkkaliid/*artihkkaliid$.
</pre>
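The analyser output above is in Apertium's stream format: each lexical unit is written as <code>^surface/reading1/reading2/...$</code>, and a reading that starts with <code>*</code> marks a word the analyser does not know. The Python sketch below parses that format to list unknown words and compute a naive coverage figure. It is a simplification: backslash escapes and superblanks are not handled, and the function names are illustrative.

```python
import re

# A lexical unit in lt-proc / cg-proc output: ^surface/reading1/reading2...$
# Simplification: backslash escapes and [superblanks] are not handled.
LU_RE = re.compile(r"\^([^$]*)\$")
TAG_RE = re.compile(r"<([^<>]*)>")


def parse_units(stream):
    """Yield (surface, readings) for every ^...$ unit in the stream."""
    for body in LU_RE.findall(stream):
        surface, *readings = body.split("/")
        yield surface, readings


def split_reading(reading):
    """'leat<V><IV><Ind><Prs><Sg3>' -> ('leat', ['V', 'IV', 'Ind', 'Prs', 'Sg3'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG_RE.findall(reading)


def is_unknown(readings):
    """Unknown words get a single reading that starts with '*'."""
    return len(readings) == 1 and readings[0].startswith("*")


def unknown_words(stream):
    """Surface forms of all units the analyser could not analyse."""
    return [surface for surface, readings in parse_units(stream) if is_unknown(readings)]


def coverage(stream):
    """Naive coverage: share of lexical units that received an analysis."""
    units = list(parse_units(stream))
    if not units:
        return 0.0
    known = sum(1 for _, readings in units if not is_unknown(readings))
    return known / len(units)


if __name__ == "__main__":
    sample = ("^lea/leat<V><IV><Ind><Prs><Sg3>$ ^máŋggagielat/*máŋggagielat$ "
              "^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$")
    for surface, readings in parse_units(sample):
        print(surface, [split_reading(r) for r in readings])
    print("unknown:", unknown_words(sample), "coverage:", coverage(sample))
```

On the analysis output above, 5 of the 16 lexical units are unknown (máŋggagielat, ulbmilin, diehtosátnegirjji, gii and artihkkaliid), giving a naive coverage of 11/16, about 69%. The same functions apply to the cg-proc output, which keeps the surface form, but not directly to the tagger output, which drops it. Note also that <code>split_reading</code> returns the tag <code>@SUBJ</code> for the unescaped stream form of <code>@SUBJ></code>, silently losing the final '>', which is the clash described under TODO.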
 
   
;Disambiguating and annotating text with Constraint grammar:
<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin

^Wikipedia/Wikipedia<N><Prop><Sg><Nom><@SUBJ%>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat/*máŋggagielat$
^prošeakta/prošeakta<N><Sg><Nom><@<SPRED>$ ^man/man<ADV>$ ^ulbmilin/*ulbmilin$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$
^ráhkadit/ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš/almmolaš<A><Sg><Nom><@%SUBJ>$ ^diehtosátnegirjji/*diehtosátnegirjji$ ^gosa/gosa<ADV>$ ^gii/*gii$
^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit/čállit<V><TV><Inf><@-FMAINV>$ ^artihkkaliid/*artihkkaliid$
</pre>

;Finishing off the disambiguation with Apertium's HMM tagger:
<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob

^Wikipedia<N><Prop><Sg><Nom><@SUBJ%>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^*máŋggagielat$ ^prošeakta<N><Sg><Nom><@<SPRED>$ ^man<ADV>$
^*ulbmilin$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš<A><Sg><Nom><@%SUBJ>$ ^*diehtosátnegirjji$
^gosa<ADV>$ ^*gii$ ^beare<ADV>$ ^sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit<V><TV><Inf><@-FMAINV>$ ^*artihkkaliid$
</pre>

;Applying lexical transfer and chunking:
<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob | \
apertium-transfer apertium-sme-smj.sme-smj.t1x sme-smj.t1x.bin sme-smj.autobil.bin

^nom<SN><@SUBJ%><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ ^default{^liehk<Sg3><@+FMAINV>$}$ ^unknown{^*máŋggagielat$}$
^nom<SN><Sg><Nom>{^prosjækta<N><Sg><Nom>$}$ ^default{^man<ADV>$}$ ^unknown{^*ulbmilin$}$ ^default{^liehk<Sg3><@+FMAINV>$}$
^default{^dahkat<V><TV><Inf><@-FMAINV>$}$ ^default{^almulasj<A><Sg><Nom><@%SUBJ>$}$ ^unknown{^*diehtosátnegirjji$}$ ^default{^<ADV>$}$
^unknown{^*gii$}$ ^default{^@beare<ADV>$}$ ^default{^sáhttet<V><IV><Ind><Prs><Sg3><@+FAUXV>$}$ ^default{^tjállet<Inf><@-FMAINV>$}$
^unknown{^*artihkkaliid$}$
</pre>

==See also==
* The project also has [http://giellatekno.uit.no/doc/mt/smesmj/smesmj.html a home page at Giellatekno]
* [[Integration and tagset conversion with Giellatekno]]
*[[North Saami Lule Saami reminder issues|Reminders]]

==External links==
* [http://www.divvun.no/doc/lang/smj/docu-smj-grammartags.html Lule Sámi: The grammatical tags]

[[Category:North Saami and Lule Saami|*]]
[[Category:North Saami]]
[[Category:Lule Saami]]
[[Category:Language pairs]]
 

Latest revision as of 13:22, 5 January 2016