Difference between revisions of "Talk:Agglutination"
(Removing all content from page) |
|||
(15 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | {{TOCD}} |
||
− | In Hungarian, a word usually has 2500 forms. |
||
− | A Hungarian dictionary listing all forms would therefore contain 1 million * 2500 words, |
||
− | that is 2.5 GWords, or approximately 20 GBytes, which cannot be handled by computers, |
||
− | and handling it would make no sense anyway. |
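The size estimate above can be checked with a few lines of Python. This is only a sketch: the figure of 8 bytes per stored word form is an assumption, chosen because it reproduces the 20 GByte estimate; the comment itself does not give a per-word size.

```python
# Back-of-the-envelope check of the full-form dictionary size.
LEMMAS = 1_000_000        # number of dictionary entries, as stated above
FORMS_PER_LEMMA = 2_500   # inflected forms per entry, as stated above
BYTES_PER_FORM = 8        # ASSUMPTION: average storage per word form

total_forms = LEMMAS * FORMS_PER_LEMMA
total_bytes = total_forms * BYTES_PER_FORM

print(f"{total_forms / 1e9:.1f} G word forms")
print(f"{total_bytes / 1e9:.1f} GBytes")
```

With a different per-form size the total scales linearly, so the order of magnitude depends mainly on the two figures quoted in the comment.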
||
− | |||
− | Yes, hunspell handles that perfectly; it also handles vowel harmony. |
||
− | |||
− | What about using Apertium to handle Hungarian? [[User:Muki987|Muki987]] 10:56, 6 April 2009 (UTC) |
||
− | |||
− | :I've played about with [[hunmorph]] — one of its limitations iirc is that it cannot do generation, only analysis. My personal preference for handling languages like Hungarian and Finnish etc. is to use something like [[SFST]] (see also [[Omorfi]]). The problem of course is then to get someone to write the actual code. - [[User:Francis Tyers|Francis Tyers]] 12:21, 6 April 2009 (UTC) |
||
− | |||
− | In Hungarian there is simply too much to generate. I am not sure about Finnish, Turkish, Basque and Persian, but they also have a lot. Here is an example: |
||
− | |||
− | :Hello, there is not "simply too much to generate"; there are languages much more agglutinative than Hungarian that have FST morphologies. For an example, see [http://www.morphologic.hu/downloads/publications/na/2008_lrec-saltmil_ural_na.pdf this paper]. - [[User:Francis Tyers|Francis Tyers]] 07:10, 7 April 2009 (UTC) |
||
− | |||
− | * ház (house) |
||
− | * házhoz (to the ...) |
||
− | * háztól (from the ...) |
||
− | * házig (up to the ...) |
||
− | * háznak (of the ...) |
||
− | * háznál (at the ...) |
||
− | * házba (into the ...) |
||
− | * házban (in the ...) |
||
− | * házból (from the ...) |
||
− | * házról (about the ...) |
||
− | * házra (on top of the ...) |
||
− | * házon (on the ...) |
||
− | * házzá (become a ...) |
||
− | * házat (it, accusative) - 14 |
||
− | * házam (my house - repeat all previous to this like:) -- 28 |
||
− | ** házamhoz ... |
||
− | ... |
||
− | ** házamat ... |
||
− | * házad (your house repeat all previous to this) -42 |
||
− | * háza (his, her, its house repeat all previous to this) - 56 |
||
− | * házunk (our house repeat all previous to this) -- 70 |
||
− | * házatok (your house, plural "you"; repeat all previous to this) - 84 |
||
− | * házuk (their house repeat all previous to this) - 98 |
||
− | * házé (of the house repeat all previous to this) - 112 |
||
− | * házamé (of my house -repeat all previous to this) - 126 |
||
− | ... |
||
− | * házuké (of their house -repeat all previous to this) - 210 |
||
− | * házak (plural - repeat all previous for this up to here) 420 |
||
− | ... |
||
− | * házacska (a little house - repeat all previous up to here) 840 |
||
− | * házikó (a little house - repeat all previous, except the last) 1260 |
||
− | * házas (married- repeat all previous for this up to here, except the last 2) 1680 |
||
− | |||
− | ... |
||
− | * As you can see, it is almost trivial to get thousands of word forms for each noun, even without a grammar book. |
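The first stage of the count in the list above (14 case slots times 7 possessor variants, giving 98) can be reproduced as a cross product. This is a hedged sketch: the two lists are copied from the examples above, the variable names are hypothetical, and the code pairs labels only rather than concatenating strings, because real surface forms involve sound changes (for example háza + -hoz gives házához) that plain concatenation would get wrong.

```python
from itertools import product

# Case slots taken from the list above; "(none)" stands for the bare form.
CASE_ENDINGS = ["(none)", "-hoz", "-tól", "-ig", "-nak", "-nál", "-ba",
                "-ban", "-ból", "-ról", "-ra", "-on", "-zá", "-at"]

# Bare stem plus the six possessed stems from the list above.
POSSESSED_STEMS = ["ház", "házam", "házad", "háza",
                   "házunk", "házatok", "házuk"]

# Every (stem, case) pair is one slot in the paradigm.
slots = list(product(POSSESSED_STEMS, CASE_ENDINGS))

print(len(CASE_ENDINGS))  # case slots per stem
print(len(slots))         # slots after adding the possessor layer
```

Each further suffix layer (such as -é, the plural, or the diminutives) multiplies the number of slots again, which is why the total grows so quickly.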
||
− | |||
− | In my opinion, if we get a word such as házaitokétól (which is not unusual), we need an analysis tool that shows: |
||
− | * házaitokétól (from something of your houses) |
||
− | * this is from ház |
||
− | * it is plural |
||
− | * it corresponds to the preposition "from" |
||
− | * it corresponds to plural "you" |
||
− | * the houses own something |
||
− | * the owned thing is singular (otherwise it would be házaitokéitól) |
||
− | |||
− | With this knowledge we can construct the English (or Spanish, German, etc...) form. [[User:Muki987|Muki987]] 20:14, 6 April 2009 (UTC) |
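The kind of analysis described above can be sketched as right-to-left suffix stripping. This is a toy illustration under strong assumptions, not how hunmorph or any real analyser works: the suffix table covers only the one example word, the function name `analyse` and the labels are hypothetical, and vowel harmony and stem alternations are ignored.

```python
# Toy suffix table, ordered from the outermost (rightmost) suffix inward.
# ASSUMPTION: only the suffixes needed for "házaitokétól" are listed.
SUFFIX_LAYERS = [
    ("tól", "case: ablative ('from')"),
    ("é", "possessed thing, singular ('that of')"),
    ("tok", "possessor: 2nd person plural ('your')"),
    ("ai", "plural of the possessed noun ('houses')"),
]


def analyse(word, stems=("ház",)):
    """Strip known suffixes from the right; return (stem, features) or None."""
    features = []
    rest = word
    for suffix, label in SUFFIX_LAYERS:
        if rest.endswith(suffix):
            rest = rest[: -len(suffix)]
            features.append((suffix, label))
    if rest not in stems:
        return None  # unknown stem: the toy analyser gives up
    # Report the suffixes in the order they appear in the word.
    return rest, list(reversed(features))


stem, features = analyse("házaitokétól")
print(stem)
for suffix, label in features:
    print(f"  -{suffix}: {label}")
```

The stem and feature list produced this way is the information a transfer step would need in order to build the English (or Spanish, German, etc.) phrase.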
||
− | |||
− | ::I copied the discussion with Jimregan to my discussion page. We can continue there. [[User:Muki987|Muki987]] 10:14, 7 April 2009 (UTC) |
||
− | |||
− | ==Comparison of Omorfi and Hunmorph== |
||
− | ===Omorfi=== |
||
− | http://www.ling.helsinki.fi/cgi-bin/omor/omorfi-cgi-demo.py |
||
− | |||
− | Omorfi - Demo of Finnish Morphology |
||
− | |||
− | These demos are based on the HFST implementation of Finnish morphology using SFST, and Nykysuomen sanalista. A guesser is used for missing words. For more information see HFST home page. |
||
− | Wordform Nykysuomen has no known analyses. The 6 best baseform and paradigm guesses were chosen: |
||
− | <pre> |
||
− | *1. Nykysuomen 32 noun sg nom |
||
− | *2. Nykysuomi 7 noun sg acc |
||
− | *3. Nykysuomi 7 noun sg gen |
||
− | *4. Nykysuomi 7 noun sg ins |
||
− | *5. Nykysuomi 25 noun sg acc |
||
− | *6. Nykysuomi 25 noun sg gen |
||
− | </pre> |
||
− | As far as I can see here: http://wiki.apertium.org/wiki/Omorfi |
||
− | <pre> |
||
− | $ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." | fst-proc omorfi/src/omorfi.sfstc |
||
− | |||
− | ^kaikki/kaikki<noun><7><a><sg><nom>$ ^ihmiset/ihminen<noun><38><pl><acc>/ihminen<noun><38><pl><nom>$ |
||
− | ^syntyvät/syntyä<verb><52><j><act><pcpva><pl><acc>/syntyä<verb><52><j><act><pcpva><pl><nom>/syntyä<verb><52><j><act><indv><pres><pl3>$ |
||
− | ^vapaina/vapaa<noun><17><pl><ess>$ |
||
− | ^ja/*ja$ ^tasavertaisina/*tasavertaisina$ ^arvoltaan/arvo<noun><1><sg><abl><pl3>/arvo<noun><1><sg><abl><sg3>$ ^ja/*ja$ |
||
− | ^oikeuksiltaan/oikeus<noun><40><pl><abl><pl3>/oikeus<noun><40><pl><abl><sg3>$. |
||
− | </pre> |
||
− | |||
− | Omorfi also only analyses, and that's it. I do not see any difference from hunmorph, do you? [[User:Muki987|Muki987]] 21:24, 6 April 2009 (UTC) |
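For reading the fst-proc output quoted above, a small parser can split each `^surface/analysis/...$` unit into its surface form and its list of analyses. This is a minimal sketch: the function name `parse_stream` is hypothetical, and the regular expression ignores escaping and other details of the real stream format.

```python
import re


def parse_stream(line):
    """Split '^surface/analysis1/analysis2$' units into (surface, analyses)."""
    units = []
    for body in re.findall(r"\^(.*?)\$", line):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


sample = "^vapaina/vapaa<noun><17><pl><ess>$ ^ja/*ja$"
for surface, analyses in parse_stream(sample):
    # In the output quoted above, unanalysed words carry a leading '*'.
    known = [a for a in analyses if not a.startswith("*")]
    print(surface, known)
```

In the sample, `vapaina` has one analysis and `ja` has none, matching the quoted output.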
||
− | |||
− | ===Hunmorph=== |
||
− | <pre> |
||
− | $ echo "ablakot" | ocamorph --aff lexicons/morphdb.hu/out/morphdb_hu.aff --dic lexicons/morphdb.hu/out/morphdb_hu.dic |
||
− | </pre> |
||
− | and you get |
||
− | |||
− | <pre> |
||
− | > ablakot ablak/NOUN> |
||
− | </pre> |
||
− | |||
− | IMHO this is pretty much the same as what Omorfi produces. What do you think? [[User:Muki987|Muki987]] 21:28, 6 April 2009 (UTC) |
||
− | |||
− | ::The difference is that in Omorfi, you can go the other way. From |
||
− | |||
− | :::<code>^syntyä<verb><52><j><act><pcpva><pl><acc>$</code> → <code>syntyvät</code> |
||
− | |||
− | ::Can you do that in hunmorph? It was my understanding that you couldn't. - [[User:Francis Tyers|Francis Tyers]] 07:10, 7 April 2009 (UTC) |
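The difference between analysis and generation can be illustrated with a toy table that is read in both directions. This is only a sketch of the idea: a real transducer encodes the relation compactly instead of listing pairs, the pairs below are copied from the fst-proc output quoted earlier on this page, and the names `analyse_form` and `generate_form` are hypothetical.

```python
from collections import defaultdict

# (surface form, analysis) pairs copied from the output quoted on this page.
PAIRS = [
    ("syntyvät", "syntyä<verb><52><j><act><pcpva><pl><acc>"),
    ("syntyvät", "syntyä<verb><52><j><act><pcpva><pl><nom>"),
    ("syntyvät", "syntyä<verb><52><j><act><indv><pres><pl3>"),
    ("vapaina", "vapaa<noun><17><pl><ess>"),
]

# The same relation, indexed once per direction.
_by_surface = defaultdict(list)
_by_analysis = defaultdict(list)
for surface, analysis in PAIRS:
    _by_surface[surface].append(analysis)
    _by_analysis[analysis].append(surface)


def analyse_form(surface):
    """Surface form -> all analyses (the direction both tools support)."""
    return list(_by_surface.get(surface, []))


def generate_form(analysis):
    """Analysis -> surface forms (the direction asked about here)."""
    return list(_by_analysis.get(analysis, []))


print(analyse_form("syntyvät"))
print(generate_form("syntyä<verb><52><j><act><pcpva><pl><acc>"))
```

An analyser that only stores the first index can answer `analyse_form` but not `generate_form`, which is the limitation being asked about.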
||
− | |||
− | :::Btw, if you want to look at an agglutinative language pair currently in SVN, check out [[Northern Sámi and Lule Sámi]] — the transducers were generated from full-form lists, which is not the ideal way to do it. A better way would have been to somehow compile the XFST source code using a free compiler (for example SFST/HFST), but unfortunately that isn't possible yet :( - [[User:Francis Tyers|Francis Tyers]] 07:17, 7 April 2009 (UTC) |