Difference between revisions of "North Saami and Lule Saami"
Revision as of 16:54, 7 October 2008
Files
- apertium-sme-smj.sme.dix: Northern Sami transducer
- apertium-sme-smj.sme-smj.dix: Transfer lexicon
- apertium-sme-smj.smj.dix: Lule Sami transducer
- apertium-sme-smj.sme-smj.rlx: Constraint grammar
- apertium-sme-smj.sme-smj.t1x: Transfer rule file (level 1 -- Local re-ordering, chunking)
- apertium-sme-smj.sme-smj.t2x: Transfer rule file (level 2 -- Phrase and chunk re-ordering)
- apertium-sme-smj.sme-smj.t3x: Transfer rule file (level 3 -- Final touches)
TODO
- Mapped tags in the CG use characters that are special in Apertium, for example '>' (used for delimiting tags) and '-' (causes problems with pretransfer). These should be replaced somehow.
  - Example: ^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$ or ^prošeakta<N><Sg><Nom><@<SPRED>$
  - This comes from the CG tag @SUBJ>
- Re-train the HMM-based POS tagger on a Sami corpus.
- Closed categories in sme analyser
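One possible way to handle the first TODO item is sketched below. It assumes that a leading '<' or trailing '>' in a CG syntactic tag can be swapped for an arrow character, which matches tags such as @SUBJ→ and @←SPRED seen in the test output on this page. The function name `sanitize_cg_tag` is hypothetical and not part of Apertium or CG, and the sketch does not address the '-' problem with pretransfer.

```python
# Hypothetical helper, not part of Apertium or CG.
# Assumption: arrows are an acceptable stand-in for '<' and '>' in mapped tags.

def sanitize_cg_tag(tag: str) -> str:
    """Replace a leading '<' or trailing '>' in a CG syntactic tag with an arrow."""
    if not tag.startswith("@"):
        # Only syntactic (mapped) tags start with '@'; leave everything else alone.
        return tag
    body = tag[1:]
    if body.startswith("<"):
        body = "←" + body[1:]
    if body.endswith(">"):
        body = body[:-1] + "→"
    return "@" + body


if __name__ == "__main__":
    for cg_tag in ["@SUBJ>", "@<SPRED", "@+FMAINV"]:
        print(cg_tag, "becomes", sanitize_cg_tag(cg_tag))
```

Tags without angle brackets, such as @+FMAINV, pass through unchanged.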
Reminders
- In the transfer rule files, don't forget to escape the '+' character in tags, for example:
  - no: <attr-item tags="@+FMAINV"/>
  - yes: <attr-item tags="@\+FMAINV"/>
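If attr-item lines are generated by a script rather than typed by hand, the escaping can be automated. The following is a minimal sketch; the helper names `escape_plus` and `attr_item` are hypothetical, and only the '+' character is handled (presumably it needs escaping because the tag is read as a regular expression, but that is an assumption here).

```python
import re

# Hypothetical helpers for generating attr-item lines; not part of Apertium.

def escape_plus(tag: str) -> str:
    """Escape every '+' that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)\+", r"\\+", tag)


def attr_item(tag: str) -> str:
    """Format a tag as an attr-item element with '+' escaped."""
    return '<attr-item tags="{}"/>'.format(escape_plus(tag))


if __name__ == "__main__":
    print(attr_item("@+FMAINV"))
```

The lookbehind keeps the helper idempotent, so a tag that is already escaped is not escaped twice.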
Testing
- Analysing some Northern Sami text
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin
^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$
^máŋggagielat/*máŋggagielat$ ^prošeakta/prošeakta<N><Sg><Nom>$ ^man/man<ADV>$ ^ulbmilin/ulbmil<N><Ess>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$
^ráhkadit/ráhkadit<V><TV><Inf>/ráhkadit<V><TV><Ind><Prs><Pl3>/ráhkadit<V><TV><Ind><Prt><Sg2>$ ^almmolaš/almmolaš<A><Attr>/almmolaš<A><Sg><Nom>$
^diehtosátnegirjji/*diehtosátnegirjji$
^gosa/gosa<ADV>/gossat<V><IV><VGen>/gossat<V><IV><Imprt><Prs><ConNeg>/gossat<V><IV><Imprt><Prs><Sg2>/gossat<V><IV><Ind><Prs><ConNeg>$
^gii/gii<Pron><Interr><Sg><Nom>/gii<Pron><Rel><Sg><Nom>$ ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3>$
^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$ ^artihkkaliid/artihkal<N><Pl><Gen>/artihkal<N><Pl><Acc>$.
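The analyser output above can be inspected with a small script, for instance to list which words are still ambiguous or unknown. The sketch below is only illustrative: the function names are hypothetical, it assumes each unit has the shape ^surface/analysis/...$ as in the output above, and it ignores any backslash escaping the stream format may use.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$ (escaping is ignored here).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_analyses(stream):
    """Return a list of (surface form, [analyses]) pairs."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((match.group(1), analyses))
    return units


def ambiguous_words(units):
    """Surface forms that still have more than one analysis."""
    return [surface for surface, analyses in units if len(analyses) > 1]


def unknown_words(units):
    """Surface forms whose analysis is marked with '*' (not in the analyser)."""
    return [s for s, analyses in units if any(a.startswith("*") for a in analyses)]


if __name__ == "__main__":
    sample = ("^gii/gii<Pron><Interr><Sg><Nom>/gii<Pron><Rel><Sg><Nom>$ "
              "^beare/beare<ADV>$ ^diehtosátnegirjji/*diehtosátnegirjji$")
    parsed = parse_analyses(sample)
    print("ambiguous:", ambiguous_words(parsed))
    print("unknown:", unknown_words(parsed))
```

Running the same check on the output of the later pipeline stages gives a rough view of how much ambiguity each stage removes.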
- Disambiguating and annotating text with Constraint grammar
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin
^Wikipedia/Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat/*máŋggagielat$
^prošeakta/prošeakta<N><Sg><Nom><@←SPRED>$ ^man/man<ADV>$ ^ulbmilin/ulbmil<N><Ess><@SPRED→>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$
^ráhkadit/ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš/almmolaš<A><Sg><Nom><@←SUBJ>$ ^diehtosátnegirjji/*diehtosátnegirjji$ ^gosa/gosa<ADV>$
^gii/gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit/čállit<V><TV><Inf><@-FMAINV>$
^artihkkaliid/artihkal<N><Pl><Acc><@←OBJ>$.
- Finishing off the disambiguation with Apertium's HMM tagger
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob
^Wikipedia<N><Prop><Sg><Nom><@SUBJ→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^*máŋggagielat$ ^prošeakta<N><Sg><Nom><@←SPRED>$ ^man<ADV>$
^ulbmil<N><Ess><@SPRED→>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš<A><Sg><Nom><@←SUBJ>$ ^*diehtosátnegirjji$
^gosa<ADV>$ ^gii<Pron><Rel><Sg><Nom><@SUBJ→>$ ^beare<ADV>$ ^sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit<V><TV><Inf><@-FMAINV>$
^artihkal<N><Pl><Acc><@←OBJ>$.
- Applying lexical transfer and chunking
$ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \
gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob | \
apertium-transfer apertium-sme-smj.sme-smj.t1x sme-smj.t1x.bin sme-smj.autobil.bin
^nom<SN><@SUBJ→><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ ^verb<SV><@+FMAINV>{^liehket<V><IV><Ind><Prs><Sg3>$}$ ^unknown{^*máŋggagielat$}$
^nom<SN><Sg><Nom>{^prosjækta<N><Sg><Nom>$}$ ^default{^man<ADV>$}$ ^nom<SN><@SPRED→><Ess>{^ulmme<N><Ess>$}$
^verb<SV><@+FMAINV>{^liehket<V><IV><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^dahkat<V><TV><Inf>$}$ ^default{^almulasj<A><Sg><Nom><@←SUBJ>$}$
^unknown{^*diehtosátnegirjji$}$ ^default{^<ADV>$}$ ^default{^<Pron><Rel><Sg><Nom><@SUBJ→>$}$ ^default{^@beare<ADV>$}$
^verb<SV>{^sáhttet<V><IV><Ind><Prs><Sg3>$}$ ^verb<SV><@-FMAINV>{^tjállet<V><TV><Inf>$}$ ^nom<SN><Pl><Acc>{^artihkkal<N><Pl><Acc>$}$.
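The chunked output above contains several words that did not come through lexical transfer cleanly: lemmas starting with '*', lemmas starting with '@', and empty lemmas such as ^<ADV>$. The sketch below lists them so they can be checked against the transfer lexicon. It is a rough illustration under stated assumptions: the function names are hypothetical, the chunk shape ^name<tags>{^lemma<tags>$ ...}$ is inferred from the output above, and treating '@' and empty lemmas as missing lexicon entries is an assumption rather than something this page states.

```python
import re

# Shapes inferred from the output above; stream-format escaping is ignored.
CHUNK = re.compile(r"\^([^<{^$]*)((?:<[^>]*>)*)\{(.*?)\}\$")
WORD = re.compile(r"\^([^<$]*)((?:<[^>]*>)*)\$")
TAG = re.compile(r"<([^>]*)>")


def parse_chunks(stream):
    """Split transfer output into chunks with their tags and inner words."""
    chunks = []
    for match in CHUNK.finditer(stream):
        words = [(w.group(1), TAG.findall(w.group(2)))
                 for w in WORD.finditer(match.group(3))]
        chunks.append({"name": match.group(1),
                       "tags": TAG.findall(match.group(2)),
                       "words": words})
    return chunks


def suspicious_lemmas(chunks):
    """Lemmas that are empty or start with '*' or '@' (assumed to need attention)."""
    return [lemma for chunk in chunks for lemma, _ in chunk["words"]
            if lemma == "" or lemma[0] in "*@"]


if __name__ == "__main__":
    sample = ("^nom<SN><@SUBJ→><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ "
              "^unknown{^*máŋggagielat$}$ ^default{^<ADV>$}$ ^default{^@beare<ADV>$}$")
    print(suspicious_lemmas(parse_chunks(sample)))
```

Because the tag pattern stops at the first '>', this only works once mapped tags no longer contain '<' or '>' themselves, as with the arrow forms in the output above.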