Difference between revisions of "North Saami and Lule Saami"

From Apertium
(10 intermediate revisions by the same user not shown)
{{TOCD}}

==Files==

*<code>apertium-sme-smj.sme.dix</code> &mdash; Northern Sami transducer
*<code>apertium-sme-smj.sme-smj.dix</code> &mdash; Transfer lexicon
*<code>apertium-sme-smj.smj.dix</code> &mdash; Lule Sami transducer
*<code>apertium-sme-smj.sme-smj.rlx</code> &mdash; Constraint grammar
*<code>apertium-sme-smj.sme-smj.t1x</code> &mdash; Transfer rule file (level 1 -- Local re-ordering, chunking)
*<code>apertium-sme-smj.sme-smj.t2x</code> &mdash; Transfer rule file (level 2 -- Phrase and chunk re-ordering)
*<code>apertium-sme-smj.sme-smj.t3x</code> &mdash; Transfer rule file (level 3 -- Final touches)


==TODO==

* <s>Mapped tags in the CG use characters that are special in Apertium, for example '>' (used for delimiting tags) and '-' (causes problems with pretransfer). These should be replaced somehow.
::Example:
:::<code>^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$</code> or <code>^prošeakta<N><Sg><Nom><@<SPRED>$</code>
:::This comes from the CG tag @SUBJ></s>
::Replaced > with → and < with ←
* Re-train the HMM-based POS tagger on a Sami corpus.
* Closed categories in sme analyser
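
The replacement of > and < noted in the first item can be sketched in Python. This is only an illustration: <code>sanitise_cg_tag</code> is a hypothetical helper, not part of Apertium or the CG tools, and it assumes that the only characters to swap are the angle brackets inside a single mapped tag.

```python
# Hypothetical helper: swap the angle brackets inside a CG mapping tag
# for arrows so they no longer clash with Apertium's <tag> delimiters.
ARROWS = str.maketrans({">": "→", "<": "←"})


def sanitise_cg_tag(tag: str) -> str:
    """Return the tag with '>' replaced by '→' and '<' by '←'."""
    return tag.translate(ARROWS)


print(sanitise_cg_tag("@SUBJ>"))   # @SUBJ→
print(sanitise_cg_tag("@<SPRED"))  # @←SPRED
```

A translation table keeps the mapping in one place, so further problem characters (such as '-') could be added later if they turn out to need the same treatment.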


==Reminders==

* In the transfer rule files, don't forget to escape the '+' character in tags, for example:
::'''no:''' <code><attr-item tags="@+FMAINV"/></code>
::'''yes:''' <code><attr-item tags="@\+FMAINV"/></code>
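
The effect of the missing backslash can be illustrated with Python's <code>re</code> module. This sketch assumes that the <code>tags</code> attribute is interpreted as a regular expression, which is what the need for escaping suggests; Apertium's own matcher may differ in detail, and <code>matches</code> is a hypothetical helper used only for the demonstration.

```python
import re


def matches(pattern: str, tag: str) -> bool:
    """True if the whole tag matches the regular expression pattern."""
    return re.fullmatch(pattern, tag) is not None


# Unescaped: '@+' means "one or more '@'", so the literal tag is not matched.
print(matches(r"@+FMAINV", "@+FMAINV"))   # False

# Escaped: '\+' matches a literal plus sign.
print(matches(r"@\+FMAINV", "@+FMAINV"))  # True

# The unescaped pattern matches strings such as '@@FMAINV' instead.
print(matches(r"@+FMAINV", "@@FMAINV"))   # True
```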

==Testing==

;Analysing some Northern Sami text:

<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin

^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$
^máŋggagielat/máŋggagielat<A><Attr>/máŋggagielat<A><Sg><Nom>$ ^prošeakta/prošeakta<N><Sg><Nom>$
^man/man<ADV>/mii<Pron><Interr><Sg><Gen>/mii<Pron><Interr><Sg><Acc>/mii<Pron><Rel><Sg><Gen>/mii<Pron><Rel><Sg><Acc>$ ^ulbmilin/ulbmil<N><Ess>$
^lea/leat<V><IV><Ind><Prs><Sg3>$ ^ráhkadit/ráhkadit<V><TV><Inf>/ráhkadit<V><TV><Ind><Prs><Pl3>/ráhkadit<V><TV><Ind><Prt><Sg2>$
^almmolaš/almmolaš<A><Attr>/almmolaš<A><Sg><Nom>$ ^diehtosátnegirjji/diehtosátnegirji<N><Sg><Acc>$
^gosa/gosa<ADV>/gossat<V><IV><VGen>/gossat<V><IV><Imprt><Prs><ConNeg>/gossat<V><IV><Imprt><Prs><Sg2>/gossat<V><IV><Ind><Prs><ConNeg>$
^gii/gii<Pron><Interr><Sg><Nom>/gii<Pron><Rel><Sg><Nom>$ ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3>$
^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$ ^artihkkaliid/artihkal<N><Pl><Gen>/artihkal<N><Pl><Acc>$.
</pre>
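
To see how ambiguous the analyser output is, the number of analyses per word can be counted. The following Python sketch is not part of the Apertium tools: <code>count_analyses</code> is a hypothetical helper, and it assumes the simple case where each lexical unit has the shape <code>^surface/analysis/...$</code> with no escaped '/' or '$' inside it.

```python
import re

# One lexical unit: '^', the surface form, zero or more '/analysis' parts, '$'.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def count_analyses(stream: str) -> list:
    """Return (surface form, number of analyses) pairs in stream order."""
    counts = []
    for surface, rest in LEXICAL_UNIT.findall(stream):
        # 'rest' is '' or '/a1/a2/...'; splitting yields one empty leading item.
        counts.append((surface, len(rest.split("/")) - 1))
    return counts


sample = ("^lea/leat<V><IV><Ind><Prs><Sg3>$ "
          "^gii/gii<Pron><Interr><Sg><Nom>/gii<Pron><Rel><Sg><Nom>$")
print(count_analyses(sample))  # [('lea', 1), ('gii', 2)]
```

Words with more than one analysis are the ones the constraint grammar and the tagger still have to disambiguate.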

;Disambiguating and annotating text with Constraint grammar:

<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin

^Wikipedia/Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat/máŋggagielat<A><Attr><@→N>$
^prošeakta/prošeakta<N><Sg><Nom><@←SPRED>$ ^man/mii<Pron><Rel><Sg><Gen><@→N>$ ^ulbmilin/ulbmil<N><Ess><@SPRED→>$
^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit/ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš/almmolaš<A><Attr><@→N>$
^diehtosátnegirjji/diehtosátnegirji<N><Sg><Acc><@←OBJ>$ ^gosa/gosa<ADV>$ ^gii/gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare/beare<ADV>$
^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit/čállit<V><TV><Inf><@-FMAINV>$ ^artihkkaliid/artihkal<N><Pl><Acc><@←OBJ>$.
</pre>


;Finishing off the disambiguation with Apertium's HMM tagger:

<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob

^Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat<A><Attr><@→N>$ ^prošeakta<N><Sg><Nom><@←SPRED>$
^mii<Pron><Rel><Sg><Gen><@→N>$ ^ulbmil<N><Ess><@SPRED→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit<V><TV><Inf><@-FMAINV>$
^almmolaš<A><Attr><@→N>$ ^diehtosátnegirji<N><Sg><Acc><@←OBJ>$ ^gosa<ADV>$ ^gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare<ADV>$
^sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit<V><TV><Inf><@-FMAINV>$ ^artihkal<N><Pl><Acc><@←OBJ>$.
</pre>


;Applying lexical transfer and chunking:

<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob | \
apertium-transfer apertium-sme-smj.sme-smj.t1x sme-smj.t1x.bin sme-smj.autobil.bin

^nom<SN><@SUBJ→><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ ^verb<SV><@+FMAINV>{^liehket<V><Ind><Prs><Sg3>$}$ ^nom<SN><@→N>{^@máŋggagielat<A><Attr>$}$
^nom<SN><Sg><Nom>{^prosjækta<N><Sg><Nom>$}$ ^pronom<SN><@→N><Sg><Gen>{^mij<Pron><Rel><Sg><Gen>$}$ ^nom<SN><@SPRED→><Ess>{^ulmme<N><Ess>$}$
^verb<SV><@+FMAINV>{^liehket<V><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^dahkat<V><Inf>$}$ ^nom<SN><@→N>{^almulasj<A><Attr>$}$
^nom<SN><@←OBJ><Sg><Acc>{^@diehtosátnegirji<N><Sg><Acc>$}$ ^adv<Adv>{^ADV><ADV>$}$ ^pronom<SN><@SUBJ→><Sg><Nom>{^guhti<Pron><Rel><Sg><Nom>$}$
^adv<Adv>{^beru<ADV>$}$ ^verb<SV>{^sáhttet<V><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^tjállet<V><Inf>$}$
^nom<SN><@←OBJ><Pl><Acc>{^artihkal<N><Pl><Acc>$}$.
</pre>


; Running through the whole system
<pre>
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | apertium -d . sme-smj

Wikipedia l @máŋggagielat prosjækta man ulmmen l dahkat almulasj @diehtosátnegirji #ADV>
guhti beru sáhttá tjállet artihkkalijt
</pre>


*[[/Regression tests]]
*[[/Pending tests]]

==External links==
* [http://www.divvun.no/doc/lang/sme/docu-sme-grammartags.html Northern Sami: The grammatical tags]

[[Category:Language pairs]]
[[Category:Northern Sámi and Lule Sámi]]

Revision as of 14:22, 17 June 2010

This page gives some details about the North Sámi to Lule Sámi translator.

Standardisation

  • Leahppi go gávdnan gusade?
    • Læhppe gu gávnnam gusáda?
    • Lihppe gu gávnnam gusáda?

TODO

Tagset mismatches

eará -- ietjá
$ echo "eará" | osme
191480 0
eará	eará+Pron+Indef+Sg+Nom
eará	eará+Pron+Indef+Sg+Gen
eará	eará+Pron+Indef+Sg+Acc
eará	eará+Pron+Indef+Attr

$ echo "ietjá+Pron+Indef+Attr" | dsmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
ietjá+Pron+Indef+Attr	ietjá+Pron+Indef+Attr	+?
buot -- divnna
$ echo "buot" | osme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
buot	buot+Adv
buot	buot+Pron+Indef

$ echo "divnna" | osmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
divnna	divnna+Pron+Indef+Sg+Nom
divnna	divnna+Pron+Indef+Attr
ieš#guhtet -- iesj#guhtik
$ echo "iešguđetge" | osme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
iešguđetge	ieš#guhtege+Pron+Indef+Pl+Nom
iešguđetge	ieš#guhtet+Pron+Indef+Acc+Foc/ge
iešguđetge	ieš#guhtet+Pron+Indef+Gen+Foc/ge
iešguđetge	ieš#guđet+Pron+Indef+Foc/ge

$ echo "iesj#guhtik+Pron+Indef+Gen+Foc/ge" | dsmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
iesj#guhtik+Pron+Indef+Gen+Foc/ge	iesj#guhtik+Pron+Indef+Gen+Foc/ge	+?

$ echo "iesj#guhtik+Pron+Indef+Foc/ge" | dsmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
iesj#guhtik+Pron+Indef+Foc/ge	iesj#guhtik#ge
iesj#guhtik+Pron+Indef+Foc/ge	iesj#guhtikge
iesj#guhtik+Pron+Indef+Foc/ge	iesjguhtik#ge
iesj#guhtik+Pron+Indef+Foc/ge	iesjguhtikge
maid -- aj
$ echo aj | osmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
aj	aj+Pcle

$ echo maid | osme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
maid	mii+Pron+Interr+Pl+Acc
maid	mii+Pron+Interr+Pl+Gen
maid	mii+Pron+Interr+Sg+Acc
maid	mii+Pron+Rel+Pl+Acc
maid	mii+Pron+Rel+Pl+Gen
maid	mii+Pron+Rel+Sg+Acc
maid	maid+Adv
maid	maid+Interj
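
The mismatches above show up as lines ending in +?. A small Python sketch can pick such lines out of a saved transcript. It assumes the tab-separated columns seen in the transcripts and that +? in the last column marks a form the transducer could not handle; failed_lookups is a hypothetical helper, not one of the analyser or generator commands.

```python
def failed_lookups(output: str) -> list:
    """Return the input strings whose lookup result ends in the '+?' marker."""
    failed = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Progress bars and blank lines have no tab and are skipped.
        if len(fields) >= 2 and fields[-1] == "+?":
            failed.append(fields[0])
    return failed


sample = (
    "0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%\n"
    "ietjá+Pron+Indef+Attr\tietjá+Pron+Indef+Attr\t+?\n"
    "aj\taj+Pcle\n"
)
print(failed_lookups(sample))  # ['ietjá+Pron+Indef+Attr']
```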

Reminders

  • In the transfer rule files, don't forget to escape the '+' character in tags, for example:
no: <attr-item tags="@+FMAINV"/> ,
yes: <attr-item tags="@\+FMAINV"/>

Testing

$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji \
  gosa gii beare sáhttá čállit artihkkaliid." | apertium -d . sme-smj
Wikipedia la moattegielak prosjækta/prosjäkta #mij ulmmen la dahkat almulasj #diehtobáhkogirjje  
  #masi guhti beru máhttá tjállet artihkkalijt.

See also

External links