Difference between revisions of "Talk:Agglutination"
Revision as of 21:24, 6 April 2009
In Hungarian, a word usually has about 2500 forms. A Hungarian dictionary listing every form would therefore contain 1 million * 2500 words, that is 2.5 GWords or approximately 20 GBytes. Computers cannot handle that, and handling it would make no sense.
Yes, hunspell handles that perfectly, it also handles vowel harmony.
What about using Apertium to handle Hungarian? Muki987 10:56, 6 April 2009 (UTC)
- I've played about with hunmorph — one of its limitations iirc is that it cannot do generation, only analysis. My personal preference for handling languages like Hungarian and Finnish etc. is to use something like SFST (see also Omorfi). The problem of course is then to get someone to write the actual code. - Francis Tyers 12:21, 6 April 2009 (UTC)
In Hungarian there is simply too much to generate. I am not sure about Finnish, Turkish, Basque and Persian, but they also have a lot of forms. Here is an example:
- ház (house)
- házhoz to the ..
- háztól from the..
- házig up to..
- háznak of the..
- háznál at the
- házba into..
- házban in the...
- házból from the...
- házról about ...
- házra on top of the...
- házon on the ....
- házzá become a ...
- házat (accusative) - 14
- házam (my house; repeat all of the previous suffixes on this stem, for example:) - 28
- házamhoz ...
...
- házamat ...
- házad (your house repeat all previous to this) -42
- háza (his, her, its house repeat all previous to this) - 56
- házunk (our house repeat all previous to this) -- 70
- házatok (your (plural) house repeat all previous to this) - 84
- házuk (their house repeat all previous to this) - 98
- házé (of the house repeat all previous to this) - 112
- házamé (of my house -repeat all previous to this) - 126
...
- házuké (of their house -repeat all previous to this) - 210
- házak (plural - repeat all previous for this up to here) 420
...
- házacska (a little house - repeat all previous up to here) 840
- házikó (a little house - repeat all previous, except last) 1260
- házas (married - repeat all previous for this up to here, except the last 2) 1680
...
- As you can see, it is almost trivial to get thousands of forms for each noun, even without a grammar book.
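The first part of the tally above can be checked with a few lines of Python. This is only a counting sketch: the suffixes and stems are copied from the list, and the names `CASE_SUFFIXES`, `STEMS` and `bare_forms` are made up for the illustration. Suffixes are concatenated only onto the bare stem ház, because the possessed stems would need assimilation and linking rules that this sketch does not model.

```python
# Case endings exactly as they appear on "ház" in the list above
# (the empty string stands for the unsuffixed form).
CASE_SUFFIXES = ["", "hoz", "tól", "ig", "nak", "nál", "ba", "ban",
                 "ból", "ról", "ra", "on", "zá", "at"]

# The bare stem plus the six possessed stems named in the list.
STEMS = ["ház", "házam", "házad", "háza", "házunk", "házatok", "házuk"]

# Plain concatenation reproduces the listed forms for the bare stem only.
bare_forms = ["ház" + suffix for suffix in CASE_SUFFIXES]

# Every stem takes the same number of case forms, so the count multiplies.
total = len(STEMS) * len(CASE_SUFFIXES)

print(len(bare_forms), "forms for the bare stem")
print(total, "forms once the six possessors are added")
```

Each further layer in the list (the -é forms, the plural, the diminutives) multiplies or adds to this count in the same way, which is why the total passes a thousand so quickly.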
In my opinion, if we get a word such as házaitokétól (which is not unusual), we need an analysis tool that shows:
- házaitokétól (from something of your houses)
- this is from ház
- it is plural
- it carries the suffix -tól, which corresponds to the preposition "from"
- the possessor is "you" (plural)
- the houses own something
- the owned thing is singular (otherwise it would be házaitokéitól)
With this knowledge we can construct the English (or Spanish, German, etc...) form. Muki987 20:14, 6 April 2009 (UTC)
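The analysis of házaitokétól described above can be sketched as simple suffix stripping. This is a toy illustration, not how hunmorph or any real analyser works: the table `SUFFIXES`, the lexicon `STEMS` and the function `analyse` are hypothetical and cover only the morphemes of this one example, with no vowel harmony or ambiguity handling.

```python
# Hypothetical toy table, ordered from the outermost suffix inwards.
SUFFIXES = [
    ("tól", "case suffix -tól ('from')"),
    ("é", "possessive -é ('that of'), one owned thing"),
    ("tok", "possessor: you (plural)"),
    ("ai", "plural of the possessed noun"),
]

# Hypothetical one-word lexicon.
STEMS = {"ház": "house"}


def analyse(word):
    """Strip known suffixes from the right; return (stem, tags) or None."""
    rest = word
    tags = []
    for suffix, tag in SUFFIXES:
        if rest.endswith(suffix):
            tags.append(tag)
            rest = rest[: -len(suffix)]
    if rest not in STEMS:
        return None  # the remainder is not a known stem
    # Report the tags from the stem outwards.
    return rest, list(reversed(tags))


result = analyse("házaitokétól")
if result is not None:
    stem, tags = result
    print(stem, "=", STEMS[stem])
    for tag in tags:
        print(" +", tag)
```

A real analyser would also have to try alternative segmentations and apply vowel harmony, which is the work that tools such as hunmorph or an SFST-based transducer take over.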
Omorfi
http://www.ling.helsinki.fi/cgi-bin/omor/omorfi-cgi-demo.py
Omorfi - Demo of Finnish Morphology
These demos are based on the HFST implementation of Finnish morphology using SFST, and Nykysuomen sanalista. A guesser is used for missing words. For more information see HFST home page.
Wordform Nykysuomen has no known analyses. The 6 best baseform and paradigm guesses were chosen:
- 1. Nykysuomen 32 noun sg nom
- 2. Nykysuomi 7 noun sg acc
- 3. Nykysuomi 7 noun sg gen
- 4. Nykysuomi 7 noun sg ins
- 5. Nykysuomi 25 noun sg acc
- 6. Nykysuomi 25 noun sg gen
As far as I can see here: http://wiki.apertium.org/wiki/Omorfi
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." | fst-proc omorfi/src/omorfi.sfstc
^kaikki/kaikki<noun><7><a><sg><nom>$ ^ihmiset/ihminen<noun><38><pl><acc>/ihminen<noun><38><pl><nom>$
^syntyvät/syntyä<verb><52><j><act><pcpva><pl><acc>/syntyä<verb><52><j><act><pcpva><pl><nom>/syntyä<verb><52><j><act><indv><pres><pl3>$
^vapaina/vapaa<noun><17><pl><ess>$ ^ja/*ja$ ^tasavertaisina/*tasavertaisina$ ^arvoltaan/arvo<noun><1><sg><abl><pl3>/arvo<noun><1><sg><abl><sg3>$ ^ja/*ja$
^oikeuksiltaan/oikeus<noun><40><pl><abl><pl3>/oikeus<noun><40><pl><abl><sg3>$.
Omorfi also only analyses, and that's it. I do not see any difference from hunmorph, do you?
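To make the fst-proc output above easier to read, here is a minimal Python sketch that splits it into surface forms and analyses. It assumes only what is visible in the sample: each unit is written `^surface/reading/...$`, tags are in angle brackets, and a reading starting with `*` marks an unknown word. The function name `parse_stream` is made up for the illustration, and escaped characters are not handled.

```python
import re


def parse_stream(text):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples."""
    units = []
    # Each lexical unit sits between '^' and '$'.
    for body in re.findall(r"\^(.*?)\$", text):
        surface, *readings = body.split("/")
        analyses = []
        for reading in readings:
            if reading.startswith("*"):
                continue  # unknown word: no analysis available
            lemma = reading.split("<", 1)[0]
            tags = re.findall(r"<([^>]+)>", reading)
            analyses.append((lemma, tags))
        units.append((surface, analyses))
    return units


sample = ("^vapaina/vapaa<noun><17><pl><ess>$ ^ja/*ja$ "
          "^arvoltaan/arvo<noun><1><sg><abl><pl3>/arvo<noun><1><sg><abl><sg3>$")

for surface, analyses in parse_stream(sample):
    print(surface, analyses if analyses else "(unknown)")
```

Read this way, the output lists the lemma and tags for each word, and some words such as arvoltaan get more than one reading.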