Difference between revisions of "North Saami and Lule Saami"
(43 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
{{TOCD}} |
{{TOCD}} |
||
+ | This page gives some details about the North Sámi to Lule Sámi translator. |
||
− | ==Files== |
||
− | *<code>apertium-sme-smj.sme.dix</code> — Northern Sami transducer |
||
− | *<code>apertium-sme-smj.sme-smj.dix</code> — Transfer lexicon |
||
− | *<code>apertium-sme-smj.smj.dix</code> — Lule Sami transducer |
||
− | *<code>apertium-sme-smj.sme-smj.rlx</code> — Constraint grammar |
||
− | *<code>apertium-sme-smj.sme-smj.t1x</code> — Transfer rule file (level 1 -- Local re-ordering, chunking) |
||
− | *<code>apertium-sme-smj.sme-smj.t2x</code> — Transfer rule file (level 2 -- Phrase and chunk re-ordering) |
||
− | *<code>apertium-sme-smj.sme-smj.t3x</code> — Transfer rule file (level 3 -- Final touches) |
||
+ | ==Linguistic issues== |
||
− | ==TODO== |
||
+ | * [[North Saami - Lule Saami quasicode for the transfer files]] |
||
− | * Mapped tags in the CG contain characters that are special in Apertium, for example '>' (which delimits tags) and '-' (which causes problems with pretransfer). These characters should be replaced somehow. |
||
+ | * [[North Saami - Lule Saami testing notes]] |
||
− | ::Example: |
||
− | :::<code>^Wikipedia<N><Prop><Sg><Nom><@SUBJ>>$</code> or <code>^prošeakta<N><Sg><Nom><@<SPRED>$</code> |
||
− | :::This comes from the CG tag @SUBJ> |
||
− | * Re-train the HMM-based POS tagger on a Sami corpus. |
||
+ | ==Technical issues== |
||
− | ==Testing== |
||
+ | * [[North Saami - Lule Saami tagset mismatches]] |
||
− | ;Analysing some Northern Sami text: |
||
+ | ==Evaluation== |
||
− | <pre> |
||
− | $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ |
||
− | gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin |
||
+ | *[[/Regression tests]] |
||
− | ^Wikipedia/Wikipedia<N><Prop><Sg><Nom>/Wikipedia<N><Prop><Sg><Gen>/Wikipedia<N><Prop><Sg><Acc>$ ^lea/leat<V><IV><Ind><Prs><Sg3>$ |
||
+ | *[[/Pending tests]] |
||
− | ^máŋggagielat/*máŋggagielat$ ^prošeakta/prošeakta<N><Sg><Nom>$ ^man/man<ADV>$ ^ulbmilin/*ulbmilin$ ^lea/leat<V><IV><Ind><Prs><Sg3>$ |
||
− | ^ráhkadit/ráhkadit<V><TV><Inf>/ráhkadit<V><TV><Ind><Prs><Pl3>/ráhkadit<V><TV><Ind><Prt><Sg2>$ ^almmolaš/almmolaš<A><Attr>/almmolaš<A><Sg><Nom>$ |
||
− | ^diehtosátnegirjji/*diehtosátnegirjji$ |
||
− | ^gosa/gosa<ADV>/gossat<V><IV><VGen>/gossat<V><IV><Imprt><Prs><ConNeg>/gossat<V><IV><Imprt><Prs><Sg2>/gossat<V><IV><Ind><Prs><ConNeg>$ ^gii/*gii$ |
||
− | ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3>$ ^čállit/čállit<V><TV><Inf>/čállit<V><TV><Ind><Prs><Pl1>$ ^artihkkaliid/*artihkkaliid$. |
||
− | </pre> |
||
+ | ==See also== |
||
− | ;Disambiguating and annotating text with Constraint grammar: |
||
+ | * The project also has [http://giellatekno.uit.no/doc/mt/smesmj/smesmj.html a home page at Giellatekno] |
||
− | <pre> |
||
+ | * [[Integration and tagset conversion with Giellatekno]] |
||
− | $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ |
||
− | gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin |
||
+ | *[[North Saami Lule Saami reminder issues|Reminders]] |
||
− | ^Wikipedia/Wikipedia<N><Prop><Sg><Nom><@SUBJ%>$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^máŋggagielat/*máŋggagielat$ |
||
− | ^prošeakta/prošeakta<N><Sg><Nom><@<SPRED>$ ^man/man<ADV>$ ^ulbmilin/*ulbmilin$ ^lea/leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ |
||
− | ^ráhkadit/ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš/almmolaš<A><Sg><Nom><@%SUBJ>$ ^diehtosátnegirjji/*diehtosátnegirjji$ ^gosa/gosa<ADV>$ ^gii/*gii$ |
||
− | ^beare/beare<ADV>$ ^sáhttá/sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit/čállit<V><TV><Inf><@-FMAINV>$ ^artihkkaliid/*artihkkaliid$ |
||
− | </pre> |
||
+ | ==External links== |
||
− | ;Finishing off the disambiguation with Apertium's HMM tagger: |
||
+ | * [http://www.divvun.no/doc/lang/smj/docu-smj-grammartags.html Lule Sámi: The grammatical tags] |
||
− | <pre> |
||
− | $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ |
||
− | gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob |
||
+ | [[Category:North Saami and Lule Saami|*]] |
||
− | ^Wikipedia<N><Prop><Sg><Nom><@SUBJ%>$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^*máŋggagielat$ ^prošeakta<N><Sg><Nom><@<SPRED>$ ^man<ADV>$ |
||
− | ^*ulbmilin$ ^leat<V><IV><Ind><Prs><Sg3><@+FMAINV>$ ^ráhkadit<V><TV><Inf><@-FMAINV>$ ^almmolaš<A><Sg><Nom><@%SUBJ>$ ^*diehtosátnegirjji$ |
||
+ | [[Category:Lule Saami]] |
||
− | ^gosa<ADV>$ ^*gii$ ^beare<ADV>$ ^sáhttit<V><IV><Ind><Prs><Sg3><@+FAUXV>$ ^čállit<V><TV><Inf><@-FMAINV>$ ^*artihkkaliid$ |
||
− | </pre> |
||
− | |||
− | ;Applying lexical transfer and chunking: |
||
− | |||
− | <pre> |
||
− | $ echo "Wikipedia lea máŋggagielat prošeakta man ulbmilin lea ráhkadit almmolaš diehtosátnegirjji gosa \ |
||
− | gii beare sáhttá čállit artihkkaliid." | lt-proc sme-smj.automorf.bin | cg-proc sme-smj.rlx.bin | apertium-tagger -g sme-smj.prob | \ |
||
− | apertium-transfer apertium-sme-smj.sme-smj.t1x sme-smj.t1x.bin sme-smj.autobil.bin |
||
− | |||
− | ^nom<SN><@SUBJ%><Sg><Nom>{^Wikipedia<N><Prop><Sg><Nom>$}$ ^default{^liehket<V><IV><Ind><Prs><Sg3><@+FMAINV>$}$ ^unknown{^*máŋggagielat$}$ |
||
− | ^nom<SN><Sg><Nom>{^prosjækta<N><Sg><Nom>$}$ ^default{^man<ADV>$}$ ^default{^@ulbmil<N><Ess><@SPRED>>$}$ |
||
− | ^default{^liehket<V><IV><Ind><Prs><Sg3><@+FMAINV>$}$ ^default{^dahkat<V><TV><Inf><@-FMAINV>$}$ ^default{^almulasj<A><Sg><Nom><@%SUBJ>$}$ |
||
− | ^unknown{^*diehtosátnegirjji$}$ ^default{^<ADV>$}$ ^unknown{^*gii$}$ ^default{^@beare<ADV>$}$ ^default{^sáhttet<V><IV><Ind><Prs><Sg3><@+FAUXV>$}$ |
||
− | ^default{^tjállet<V><TV><Inf><@-FMAINV>$}$ ^unknown{^*artihkkaliid$}$. |
||
− | |||
− | </pre> |
||
− | |||
Latest revision as of 13:22, 5 January 2016
This page gives some details about the North Sámi to Lule Sámi translator.
Linguistic issues
Technical issues
Evaluation
See also
- The project also has a home page at Giellatekno
- Integration and tagset conversion with Giellatekno