North Saami and Finnish
Latest revision as of 08:19, 12 April 2017
This page is for discussing the Northern Sámi and Finnish translator (apertium-fin-sme). Some pending things to think about:
Setup
- How to set up the sme and fin analysers (example from sme-smn)
General todo list
- For old list items see completed tasks
- How are compounds dealt with in Omorfi and in the GTSVN analysers? Do they always split in the same places? If not, we probably have to add those that don't as lexicalised entries in the transducers.
  - Compounds in sme and fin are similar, and we should strive to translate dynamic compounds.
- Can we get access to the Álgu database?
  - Yes, we can. The data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a rough script for converting the XML format to CSV at /langtech/trunk/words/dicts/scripts/algu_xml_to_csv.py. The database is relational, so it is probably better to import it into an actual database and select the necessary words there rather than working from CSV. It contains a lot of words and Finnish-sme pairs, so it is an amazing resource that could yield a huge improvement in word coverage with fairly little work.
- Another possible source for paired words and sentences: Open-tran. It contains translation strings for Linux GUI software, allows searching in any language pair, and includes Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.
- Lex choice build xsl script: Add colon number on the Finnish side.
- Find frequent multiwords, perhaps taking advantage of mwetoolkit. Are there any existing multiword resources for Finnish?
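For the Álgu item above, importing the CSV into a relational database could look roughly like the sketch below. It assumes a CSV with columns named fin and sme; those column names, the table name word_pairs and the function load_pairs are hypothetical, since the actual output format of algu_xml_to_csv.py is not described here. The two sample rows are word pairs taken from the derivation table on this page.

```python
import csv
import io
import sqlite3


def load_pairs(csv_file, db):
    """Load (fin, sme) rows from a CSV file object into a word_pairs table.

    The column names 'fin' and 'sme' are assumptions for illustration only.
    """
    db.execute("CREATE TABLE IF NOT EXISTS word_pairs (fin TEXT, sme TEXT)")
    reader = csv.DictReader(csv_file)
    db.executemany(
        "INSERT INTO word_pairs (fin, sme) VALUES (?, ?)",
        ((row["fin"], row["sme"]) for row in reader),
    )
    db.commit()


# In-memory demonstration; a real run would open the exported CSV file instead.
sample = io.StringIO("fin,sme\nsaamelainen,sápmelaš\nrahaton,ruđaheapme\n")
db = sqlite3.connect(":memory:")
load_pairs(sample, db)
```

Once the pairs are in a table, selecting the necessary words becomes an ordinary SQL query rather than a pass over the CSV.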
Comparisons of Northern Sámi and Finnish
Passive: Derivation / Inflection
In Finnish (and according to omorfi), the passive is mostly inflectional and can exist in both finite and non-finite forms (as may be seen in the section on infinitives). Finnish passives are somewhat impersonal but encode an animate agent of some sort (e.g. laboratoriossa usein räjähde-tään 'In the laboratory (people) often explode(+Pass)' vs. laboratoriossa usein räjähtä-ä 'In the laboratory (things) often explode(+Act, +3pSg)'). This seems to be reflected, more or less, in omorfi's morphological analysis, which includes a "Pe4" (4th person) tag. Finite passives in Finnish have only one form.
Northern Sámi, by comparison, has a derivational passive, which allows more inflectional possibilities, such as marking more persons than just the Finnish "fourth" person. The derivational passive has two morphological forms, a long form (-juvvot) and a short form (-ot). Most verbs have one or the other; some can have both. Based on Nickel, some forms seem to be more or less lexicalized, with meanings distinct from plain verb+passive.
These examples aren't necessarily semantically connected; they are just meant to show how the tagging works.
Fin:
- päästetään = V Pass Ind Prs Pe4
Sme:
- bessojuvvot = V IV Der2 Der/PassL V Inf
- firrot = V TV Der/PassS V Inf
How do we deal with this?
Non-finite verb forms
Finnish participle
- sanova (PrsPrc) = Prs Prc
- sanonut = Prf Prc
- sanoma =
- sanottu =
- sanottuaan =
Finnish 1st infinitive
- sanoa (Inf1 Sg Lat) = Inf
- sanoakseen (Inf1 Sg Tra Sg3/Pl3) = vai X
Finnish 2nd infinitive
- sanoen (Inf2 Sg Ins) = Ger
- sanoessa (Inf2 Sg Ine) = Ger
- sanottaen (V Pass Inf2 Ins) = Pass Ger
- sanottaessa (V Pass Inf2 Ine) = Pass Ger
Finnish 3rd infinitive
- Hän meni sanomaan sen. (Inf3 Ill) = Inf (dadjat)
- Hän oli sanomassa sen. (Inf3 Ine) = Actio Essive (dadjame)
- Hän tuli sanomasta sen. (Inf3 Ela) = Actio Locative (dadjamis)
- Hän pääsi sisään sanomalla salasanan. (Inf3 Ade) = Der_n N Sg Com (dadjamiin)
- Hän meni sanomatta mitään. (Inf3 Abe) = V TV VAbess (dajakeahttá)
Finnish 4th infinitive
- According to Fred's grammar this is the 4th infinitive, but it looks like it is a derivation in omorfi
- sanominen <V><Der_minen> = dadjan
Finnish 5th infinitive
- According to the grammar there is a fifth infinitive, but that too is a derivation(?) in omorfi
- sanomaisillaan (V Der/maisilla Sg3/Pl3)
Noun phrases
Both Northern Sámi and Finnish order noun suffixes in this way:
NOUN-Pl-Case-Possessive-CliticParticles
Possessive markers are much less common in Northern Sámi, but morphological analyzers will handle them.
Constituent order within noun phrases is similar:
Det Num Adj+ Noun
Where Det can be either a demonstrative pronoun or a pronoun denoting possession (i.e., a personal pronoun in the genitive).
Cases
Northern Sámi has 7 cases: nominative, accusative, genitive, locative, illative, comitative, essive.
- Accusative and Genitive are often syncretic, except in some numbers and some pronouns.
- Comitative and Essive are the same in singular and plural
Finnish has 15 cases (and several additional case-like suffixes that apply only to adverbials). That is a lot, so here are the significant facts, without a string of opaque Latinate terms:
- Structural cases: 4. nominative, partitive, accusative, genitive
- Locative cases: 6. An internal and external set (3 cases each) that show goal, location, and source.
- Stative cases: 2. state, goal state; rarely a third - source state
- Additional: 2 instructive/instrumental cases (with, without), 1 comitative case (plural only)
Where Finnish distinguishes internality and externality with locative and stative cases, there is no such distinction in Northern Sámi. Northern Sámi uses the locative for source and location, and the illative for goal. Thus, cases can roughly be transferred this way:
- (fin) Internal Source, Internal Location, External Source, External Location → Locative
- (fin) Internal Goal, External Goal → Illative
- (fin) Partitive, Accusative, Genitive → AccGen
Of course, the last set ending in AccGen will have to be distinguished with certain numbers and pronouns.
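The rough case transfer above can be sketched as a lookup table. This is only an illustration: the Finnish tag names (Ine, Ela, Ade, Abl, Ill, All, Par, Acc, Gen) are assumed abbreviations, not necessarily the ones omorfi emits, and the function name transfer_case is hypothetical. Cases that the list does not mention are passed through unchanged.

```python
# Rough Finnish -> Northern Sámi case transfer, following the list above.
# Tag names are assumed abbreviations for illustration only.
FIN_TO_SME_CASE = {
    "Ine": "Loc",     # internal location
    "Ela": "Loc",     # internal source
    "Ade": "Loc",     # external location
    "Abl": "Loc",     # external source
    "Ill": "Ill",     # internal goal
    "All": "Ill",     # external goal
    "Par": "AccGen",  # structural cases collapse to AccGen
    "Acc": "AccGen",
    "Gen": "AccGen",
}


def transfer_case(fin_case):
    """Return the sme case for a Finnish case tag; unmapped tags pass through."""
    return FIN_TO_SME_CASE.get(fin_case, fin_case)
```

The AccGen target would still need to be split into accusative and genitive for the numerals and pronouns that distinguish them.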
Case agreement
Most adjectives just have a predicative and attributive form, but some do agree in number with the subject.
Váralaš > váralaččat
- (fin) Muovipussit ovat vaarallisia. → Plásttetseahkat leat váralaččat.
earálágán > earálágánat
- (fin) Ihmiset ovat erilaisia. → Olbmot leat earálágánat.
Pronouns and demonstratives within DPs also agree with their head nouns, although there is some amount of syncretism when they are attributes. In the plural, however, the illative and locative forms are not syncretic and agree with a plural head noun.
Case | Independent | As Attribute | Head Noun | Attr Pl | Head Noun Pl |
---|---|---|---|---|---|
Nom | mii | mii (Nom) | beana (Nom) | mat (Nom Pl) | beatnagat (Nom Pl) |
Gen | man | man (Gen/Acc) | beatnaga (Gen/Acc) | maid (Gen/Acc Pl) | beatnagiid (Gen/Acc Pl) |
Acc | man | man (Gen/Acc) | beatnaga (Gen/Acc) | maid (Gen/Acc Pl) | beatnagiid (Gen/Acc Pl) |
Ill | masa | man (Gen/Acc) | beatnagii (Ill) | maidda (Ill Pl) | beatnagiidda (Ill Pl) |
Loc | mas | man (Gen/Acc) | beatnagis (Loc) | main (Loc Pl) | beatnagiin (Loc Pl) |
Com | mainna | mainna (Com) | beatnagiin (Com) | maiguin (Com Pl) | beatnagiiguin (Com Pl) |
Ess | manin | manin (Ess) | beanan (Ess) | manin (Ess) | beanan (Ess) |
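The table above can be read as a small rule: in the singular, an attributive pronoun uses its Gen/Acc form with illative and locative head nouns, while in the plural it agrees with the head noun. A minimal sketch of that rule follows; the function name attribute_case and the tag spellings are illustrative assumptions, and the rule covers only the pattern shown for mii.

```python
# Case of an attributive pronoun (e.g. mii) given the head noun's case,
# read off the table above. Tag spellings are illustrative.
SG_ATTR_CASE = {
    "Nom": "Nom",
    "Gen": "Gen/Acc",
    "Acc": "Gen/Acc",
    "Ill": "Gen/Acc",  # syncretic: man beatnagii
    "Loc": "Gen/Acc",  # syncretic: man beatnagis
    "Com": "Com",
    "Ess": "Ess",
}


def attribute_case(head_case, plural=False):
    """Return the case form the attribute takes for a given head noun case."""
    if head_case in ("Gen", "Acc"):
        # Genitive and accusative are syncretic in both numbers.
        return "Gen/Acc"
    if plural:
        # In the plural the attribute agrees with the head noun.
        return head_case
    return SG_ATTR_CASE[head_case]
```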
This pattern holds for other demonstratives and for numerals, except that numerals do not show the same Gen/Acc syncretism: a numeral may mark genitive and accusative separately even though the head noun has a syncretic Gen/Acc form.
Case | Independent | As Attribute | Head Noun | Attr Pl | Head Noun Pl |
---|---|---|---|---|---|
Nom | okta | okta (Nom) | gáma (Nom) | ovttat (Nom Pl) | gápmagat (Nom Pl) |
Gen | ovtta | ovtta (Gen) | gápmaga (Gen/Acc) | ovttaid (Gen/Acc) | gápmagiid (Gen/Acc) |
Acc | ovtta | ovtta (Acc) | gápmaga (Gen/Acc) | ovttaid (Gen/Acc) | gápmagiid (Gen/Acc) |
Ill | oktii | ovtta (Gen/Acc) | gápmagii (Ill) | ovttaide (Ill Pl) | gápmagiidda (Ill Pl) |
Loc | ovttas | ovtta (Gen/Acc) | gápmagis (Loc) | ovttain (Loc Pl) | gápmagiin (Loc Pl) |
Com | ovttain | ovttain (Com) | gápmagiin (Com) | ovttaiguin (Com Pl) | gápmagiiguin (Com Pl) |
Ess | oktan | oktan (Ess) | gáman (Ess) | oktan (Ess) | gáman (Ess) |
Case | Independent | As Attribute | Head Noun | Attr Pl | Head Noun Pl |
---|---|---|---|---|---|
Nom | guokte | guokte (Nom) | gápmaga (Gen/Acc) | guovttit (Nom Pl) | gápmagat (Nom Pl) |
Gen | guovtti | guovtti (Gen) | gápmaga (Gen/Acc) | guvttiid (Gen/Acc) | gápmagiid (Gen/Acc) |
Acc | guokte | guokte (Acc) | gápmaga (Gen/Acc) | guvttiid (Gen/Acc) | gápmagiid (Gen/Acc) |
Ill | guoktái | guovtti (Gen/Acc) | gápmagii (Ill) | guvttiide (Ill Pl) | gápmagiidda (Ill Pl) |
Loc | guovttis | guovtti (Gen/Acc) | gápmagis (Loc) | guvttiin (Loc Pl) | gápmagiin (Loc Pl) |
Com | guvttiin | guvttiin (Com) | gápmagiin (Com) | guvttiiguin (Com Pl) | gápmagiiguin (Com Pl) |
Ess | guoktin | guoktin (Ess) | gáman (Ess) | guoktin (Ess) | gáman (Ess) |
This pattern is not exactly the same as for the numeral one, whose genitive and accusative forms are syncretic (ovtta).
Some other attributes, such as goappašat and guktot 'both (pl)', have separate patterns: syncretic Gen/Acc/Ill (illative agreement is sometimes acceptable), agreement in the locative, and optional agreement in the comitative.
Case | Goappašat case | Noun Case |
---|---|---|
Nom Pl | AGR | AGR |
Gen Pl | AGR | AGR |
Acc Pl | AGR | AGR |
Ill Pl | Gen/Acc Pl OR Ill Pl | Ill Pl |
Loc Pl | AGR | AGR |
Com Pl | Gen/Acc Pl OR Com Pl | Com Pl |
Case | Guktot case | Noun Case |
---|---|---|
Nom Pl | AGR | AGR |
Gen Pl | AGR | AGR |
Acc Pl | AGR | AGR |
Ill Pl | Gen/Acc Pl | Ill Pl |
Loc Pl | AGR | AGR |
Com Pl | Gen/Acc Pl | Com Pl |
Determiner agreement:
- (fin) se talo → dat viessu
- (fin) sen talon → dan viesu
- (fin) sitä taloa → dan viesu
- (fin) sitä taloa → dat viesu (predicative)
- (fin) siihen taloon → dan viessui
- (fin) sille talolle → dan viessui
- (fin) siinä talossa → dan viesus
- (fin) sillä talolla → dan viesus
- (fin) siitä talosta → dan viesus
- (fin) siltä talolta → dan viesus
- (fin) siksi taloksi → danin viessun
- (fin) sinä talona → danin viessun
- (fin) ne talot → dat viesut
- (fin) niiden talojen → daid viesuid
- (fin) niitä taloja → daid viesuid
- (fin) niitä taloja → dat viesut (predicative)
- (fin) niihin taloihin → dan viessuide
- (fin) niille taloille → daid viesuide
- (fin) niissä taloissa → daid viesuin
- (fin) niillä taloilla → daid viesuin
- (fin) niistä taloista → daid viesuin
- (fin) niiltä taloilta → daid viesuin
- (fin) niiksi taloiksi → danin viessun
- (fin) niinä taloina → danin viessun
We need a list of all words that behave like this; there seem to be several others that I haven't found in Svonni, such as seamma.
Adjectives
Adjectives in Northern Sámi can have two separate forms depending on whether they are attributive or predicative. The attributive adjectives mostly do not agree in number with the head noun, but predicative adjectives do. Attributive adjectives do not agree in case with the head noun.
In Finnish, adjectives always agree in number and case with the head noun, and agree in number when they occur in predicates (although there is some variation as to whether or not the predicative adjective is partitive plural or nominative plural).
Numerals
- Numerals in Finnish can have a possessive suffix: yksinänsä/yksi<Num><Card><Pl><Ess><PxPl3>; in Northern Sámi they cannot.
- Ordinals are tagged as numerals in Finnish and as adjectives in Northern Sámi.
Derivation
Tag | Type | Example | Analysis | in North Sámi | Gloss |
---|---|---|---|---|---|
Der/inen | N→Adj | "muovinen" | muovi+N+Der/inen+Pos+Sg+Nom | plastihkas ráhkaduvvon | plastihkka+n.loc build+v.pass.pp |
Der/ja | V→N | "kirjoja" | kirjoa+V+Der/ja+Sg+Nom | | kirjoa-ja = write-er (writer) ? |
Der/lainen | N→Adj | "saamelainen" | saame+N+Der/lainen+Pos+Sg+Nom | sápmelaš | -laš |
Der/llinen | N→Adj | "kirjallinen" | kirja+N+Der/llinen+Sg+Nom | | kirja-llinen = book-ish (literary)? |
Der/minen | | | | | marks deverbal nouns ? |
Der/oi | | | | | |
Der/sti | Adj→Adv | | | | derives an adverb from an adjective ? -ly |
Der/tar | | | | | |
Der/ton | N→Adj | "rahaton" | raha+N+Der/ton+Sg+Nom | ruđaheapme | ruht + -heapme |
Der/tse | | | | | |
Der/ttain | | | | | |
Der/u | | | | | |
Der/vs | | | | | |
There are some cases where both a derived and a lexicalised entry might be in one analyser, but only one or the other in the other analyser. For example:
saamelainen [LEMMA='saamelainen'][POS=ADJECTIVE][KTN=38][CMP=POS][NUM=SG][CASE=NOM]
saamelainen [LEMMA='saame'][POS=NOUN][KTN=8][GUESS=DERIVE][DRV=LAINEN][CMP=POS][NUM=SG][CASE=NOM]
saamelainen [LEMMA='saame'][POS=NOUN][KTN=8][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
[LEMMA='lainen'][POS=NOUN][KTN=38][NUM=SG][CASE=NOM]
saamelainen saamelainen+A+Pos+Sg+Nom
saamelainen saame+N+Der/lainen+Pos+Sg+Nom
saamelainen saame+N+Sg+Nom#lainen+N+Sg+Nom
versus:
sápmelaš sápmelaš+A+Sg+Nom
sápmelaš sápmelaš+A+Attr
How to deal with this will be one of the main challenges. For example, do we add more entries, or do we remove entries? Is there a way to do either of those automatically?
The sme analyser gives only the lexicalised analysis because a postprocessor, the Perl script lookup2cg, chooses the lexicalised one. When the fin output is run through the same script, the result is compatible:
$ echo saamelainen | ufin | lookup2cg
"<saamelainen>"
"saamelainen" A Pos Sg Nom
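The idea of preferring the lexicalised analysis can be sketched as follows. This is not a description of what lookup2cg actually does; it is a minimal illustration that assumes analyses are strings in the `+Tag` format shown above, counts derivation tags (`+Der/`) and compound boundaries (`#`), and keeps the analyses with the fewest. The function name prefer_lexicalised is hypothetical.

```python
def prefer_lexicalised(analyses):
    """Keep the analyses with the fewest derivation and compound boundaries."""
    if not analyses:
        return []

    def cost(analysis):
        # One point per derivation tag and per compound boundary.
        return analysis.count("+Der/") + analysis.count("#")

    best = min(cost(a) for a in analyses)
    return [a for a in analyses if cost(a) == best]


fin = [
    "saamelainen+A+Pos+Sg+Nom",
    "saame+N+Der/lainen+Pos+Sg+Nom",
    "saame+N+Sg+Nom#lainen+N+Sg+Nom",
]
print(prefer_lexicalised(fin))  # ['saamelainen+A+Pos+Sg+Nom']
```

A filter of this kind removes analyses rather than adding entries, so a word that has only a derived analysis in one analyser keeps that analysis.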
Files
Source files
File | Description | Notes |
---|---|---|
apertium-sme-fin.sme-fin.dix | Transfer lexicon / Bilingual dictionary | |
apertium-sme-fin.sme.twol | Morphophonology for Sámi | This file is copied as is from Giellatekno SVN. No changes should be made to the local version. |
apertium-sme-fin.fin-sme.rlx | Constraint Grammar for Finnish | |
apertium-sme-fin.fin-sme.t1x | Chunker file for Finnish→Northern Sámi | |
Compiled and binary files
File | Description | Notes |
---|---|---|
fin-sme.prob | Tagger HMM probability file | This file needs to be trained when the CG is fully converted. |
fin-sme.rlx.bin | Compiled Constraint Grammar for Finnish | |
fin-sme.autobil.bin | Compiled transfer lexicon | |
fin-sme.t1x.bin | Compiled transfer rules | These are first-stage transfer rules, mostly for chunking and local reordering. |