Difference between revisions of "Talk:Agglutination"

@@ Line 1: / Line 1: @@
-{{TOCD}}
-In Hungarian a word usually has 2500 forms.
-A Hungarian dictionary listing all forms would therefore contain 1 million * 2500 words,
-that is 2.5 GWords or approximately 20 GBytes, which cannot be handled by computers,
-and handling it would make no sense.
-Yes, hunspell handles that perfectly, it also handles vowel harmony.
-What about using Apertium to handle Hungarian? [[User:Muki987|Muki987]] 10:56, 6 April 2009 (UTC)
-:I've played about with [[hunmorph]] &mdash; one of its limitations iirc is that it cannot do generation, only analysis. My personal preference for handling languages like Hungarian and Finnish etc. is to use something like [[SFST]] (see also [[Omorfi]]). The problem of course is then to get someone to write the actual code. - [[User:Francis Tyers|Francis Tyers]] 12:21, 6 April 2009 (UTC)
-In Hungarian there is simply too much to generate. I am not sure about Finnish, Turkish, Basque and Persian, but they also have a lot. Let me give you an example:
-:Hello, there is not "simply too much to generate", there are languages much more agglutinative than Hungarian that have FST morphologies. For example see [http://www.morphologic.hu/downloads/publications/na/2008_lrec-saltmil_ural_na.pdf this paper]. - [[User:Francis Tyers|Francis Tyers]] 07:10, 7 April 2009 (UTC)
-::The above statement is wrong. [[User:Muki987|Muki987]] 19:35, 14 May 2009 (UTC)
-* ház (house)
-* házhoz to the ..
-* háztól from the..
-* házig up to..
-* háznak of the..
-* háznál at the
-* házba into..
-* házban in the...
-* házból  from the...
-* házról about ...
-* házra on top of the...
-* házon  on the ....
-* házzá become a ...
-* házat (accusative)  - 14
-* házam (my house - repeat all previous to this like:)  -- 28
-** házamhoz ...
-...
-** házamat ...
-* házad (your house repeat all previous to this) -42
-* háza (his, her, its house repeat all previous to this) - 56
-* házunk (our house repeat all previous to this)  -- 70
-* házatok (your house repeat all previous to this) - 84
-* házuk (their house repeat all previous to this)  - 98
-* házé (of the house repeat all previous to this)   - 112
-* házamé (of my house -repeat all previous to this)  - 126
-...
-* házuké (of their house -repeat all previous to this) - 210
-* házak (plural - repeat all previous for this up to here) 420
-...
-* házacska (a little house - repeat all previous up to here) 840
-* házikó (a little house - repeat all previous, except the last) 1260
-* házas (married- repeat all previous for this up to here, except the last 2) 1680
-...
-* As you can see, it is almost trivial to get thousands of forms for each noun, even without a grammar book.
-In my opinion, if we get a word such as házaitokétól (which is not unusual), we need an analysis tool that shows:
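The counting in the list above can be sketched as a product of suffix slots. This is only a rough sketch: the slot sizes are read off the list, the idea that every slot combines freely with every other is an assumption, and the names <code>CASE_FORMS</code>, <code>POSSESSORS</code>, <code>POSSESSIVE_E</code> and <code>NUMBER</code> are hypothetical.

```python
# Slot sizes read off the hand count above (assumptions, not grammar facts).
CASE_FORMS = 14      # ház, házhoz, ..., házat
POSSESSORS = 7       # none, my, your, his/her/its, our, your (pl.), their
POSSESSIVE_E = 2     # without or with -é ("that of the ...")
NUMBER = 2           # singular or plural


def forms_per_stem():
    """Number of forms if every slot combines freely with every other."""
    return CASE_FORMS * POSSESSORS * POSSESSIVE_E * NUMBER


def forms_for_stems(stems):
    """Total over a list of related stems, e.g. diminutives and derivations."""
    return len(stems) * forms_per_stem()


print(forms_per_stem())                                         # 392
print(forms_for_stems(["ház", "házacska", "házikó", "házas"]))  # 1568
```

The product lands in the same range as the hand count (420 after the plural, 1680 after házas). The number of forms grows multiplicatively with each suffix slot.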
-* házaitokétól  (from something of your houses)
-* this is from ház
-* it is plural
-* it corresponds to the preposition "from"
-* the possessor is "plural you"
-* the houses own something
-* the owned thing is singular (otherwise it would be házaitokéitól)
-With this knowledge we can construct the English (or Spanish, German, etc...) form. [[User:Muki987|Muki987]] 20:14, 6 April 2009 (UTC)
-::I copied the discussion with Jimregan onto my discussion page. We can continue there. [[User:Muki987|Muki987]] 10:14, 7 April 2009 (UTC)
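The analysis of házaitokétól listed above can be sketched as stripping suffixes from right to left. This is a toy illustration, not how hunmorph or an FST works: the suffix table covers only this one example, and the slot names and labels are hypothetical.

```python
# Toy suffix table, ordered from the outermost suffix inwards.
# The labels are illustrative, not hunmorph or Apertium tags.
SUFFIX_SLOTS = [
    ("case", {"tól": "from (ablative)"}),
    ("possessed thing", {"éi": "plural", "é": "singular"}),
    ("possessor", {"tok": "plural you"}),
    ("number", {"ai": "plural"}),
]


def analyse(word):
    """Return (stem, features) by peeling off one suffix per slot."""
    features = {}
    rest = word
    for slot, suffixes in SUFFIX_SLOTS:
        # Try longer suffixes first so that "éi" wins over "é".
        for suffix in sorted(suffixes, key=len, reverse=True):
            if rest.endswith(suffix):
                features[slot] = suffixes[suffix]
                rest = rest[: -len(suffix)]
                break
    return rest, features


stem, features = analyse("házaitokétól")
print(stem, features)
```

With these features (stem ház, plural, possessor "plural you", a singular possessed thing, case "from") a target-language form such as "from something of your houses" can be assembled.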
-==Comparison of Omorfi and Hunmorph==
-===Omorfi===
-http://www.ling.helsinki.fi/cgi-bin/omor/omorfi-cgi-demo.py
-Omorfi - Demo of Finnish Morphology
-These demos are based on the HFST implementation of Finnish morphology using SFST and Nykysuomen sanalista. A guesser is used for missing words. For more information, see the HFST home page.
-Wordform Nykysuomen has no known analyses. The 6 best baseform and paradigm guesses were chosen:
-<pre>
-*1. 	Nykysuomen 	32 noun 	sg nom
-*2. 	Nykysuomi 	7 noun 	sg acc
-*3. 	Nykysuomi 	7 noun 	sg gen
-*4. 	Nykysuomi 	7 noun 	sg ins
-*5. 	Nykysuomi 	25 noun 	sg acc
-*6. 	Nykysuomi 	25 noun 	sg gen
-</pre>
-As far as I can see here: http://wiki.apertium.org/wiki/Omorfi
-<pre>
-$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." | fst-proc omorfi/src/omorfi.sfstc
-^kaikki/kaikki<noun><7><a><sg><nom>$ ^ihmiset/ihminen<noun><38><pl><acc>/ihminen<noun><38><pl><nom>$
-^syntyvät/syntyä<verb><52><j><act><pcpva><pl><acc>/syntyä<verb><52><j><act><pcpva><pl><nom>/syntyä<verb><52><j><act><indv><pres><pl3>$
-^vapaina/vapaa<noun><17><pl><ess>$
-^ja/*ja$ ^tasavertaisina/*tasavertaisina$ ^arvoltaan/arvo<noun><1><sg><abl><pl3>/arvo<noun><1><sg><abl><sg3>$ ^ja/*ja$
-^oikeuksiltaan/oikeus<noun><40><pl><abl><pl3>/oikeus<noun><40><pl><abl><sg3>$.
-</pre>
-Omorfi also only analyses, and that's it. I do not see any difference from hunmorph, do you? [[User:Muki987|Muki987]] 21:24, 6 April 2009 (UTC)
-===Hunmorph===
-<pre>
-$ echo "ablakot" | ocamorph --aff lexicons/morphdb.hu/out/morphdb_hu.aff --dic lexicons/morphdb.hu/out/morphdb_hu.dic
-</pre>
-and you get
-<pre>
-> ablakot ablak/NOUN>
-</pre>
-IMHO this is pretty much the same as what Omorfi produces. What do you think? [[User:Muki987|Muki987]] 21:28, 6 April 2009 (UTC)
-::The difference is that in Omorfi, you can go the other way. From
-:::<code>^syntyä<verb><52><j><act><pcpva><pl><acc>$</code> → <code>syntyvät</code>
-::Can you do that in hunmorph? It was my understanding that you couldn't. - [[User:Francis Tyers|Francis Tyers]] 07:10, 7 April 2009 (UTC)
-:::Btw, if you want to look at an agglutinative language pair currently in SVN, check out [[Northern Sámi and Lule Sámi]] &mdash; the transducers were generated from full-form lists, which is not the ideal way to do it. A better way would have been to somehow compile the XFST source code using a free compiler (for example SFST/HFST), but unfortunately that isn't possible yet :( - [[User:Francis Tyers|Francis Tyers]] 07:17, 7 April 2009 (UTC)
-==Moses==
-I compared Moses to Apertium, and as far as I can see, Apertium is much better, cleaner and more usable. Moses is like Google's translation: not bad in certain situations, but it will never have acceptable quality. Unfortunately.
-==Matxin==
-Unfortunately I cannot read Spanish. I can read English, Hungarian and German.
-If anybody translated it into English (using Apertium), it would be a great help for me.
-:We have Catalan→English, Welsh→English, Spanish→English, Galician→English. - [[User:Francis Tyers|Francis Tyers]] 07:12, 8 April 2009 (UTC)
-::I mean, I would like to be able to read the user's guide and description. (Descripción del sistema de traducción es-eu Matxin) [[User:Muki987|Muki987]] 19:57, 8 April 2009 (UTC)
-:::For Apertium, you can check out the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf official docs] linked from the [[documentation]] page, they're a bit out of date, but for a broad overview they are valid. For Matxin, I have translated the documentation into English, but they haven't put it on their site yet. If you [[Special:Emailuser/Francis Tyers|email me]] I can send you a copy. - [[User:Francis Tyers|Francis Tyers]] 21:43, 8 April 2009 (UTC)
-===Hungarian-whatever (inflecting)===
-However, I have an idea for Hungarian-whatever-Hungarian translation.
-A Hungarian sentence would be preprocessed, and each word would be annotated with its morphological analysis.
-For example:
-Ma megyek a házba.  (Today (ma) I go (megyek)  into the house (a házba))
-With hunmorph I get this:
- > ma
- ma/ADV
- ma/NOUN
- > megyek
- megy/VERB<PERS<1>>
- > a
- a/ART
- > házba
- ház/NOUN<CAS<ILL>>
-I would feed this into Apertium.
-I would expect a usable translation for non-agglutinative languages, and for agglutinative languages some output that hunmorph could translate back into a more readable form.
-:Not even any need for a pre-process, there is no reason why we cannot replace the Apertium morphological analyser with hunmorph (apart from the fact that it is written in Ocaml and slow!) ;). If you're interested in doing this I'll look at converting the output of hunmorph to apertium 'standard' (see [[Apertium stream format]]). - [[User:Francis Tyers|Francis Tyers]] 07:12, 8 April 2009 (UTC)
-::I wonder why OCaml is that slow. It is a compiled language, as fast as C/C++. Also hunpos, which also delivers word types, works much faster (and is also an OCaml project!). [[User:Muki987|Muki987]] 12:10, 8 April 2009 (UTC)
-:::I don't know, it's strange. It might be either to do with the language, or with the program itself -- but when I tried it last it was ~212 words/second. Which is at least an order of magnitude slower than lttoolbox/sfst. - [[User:Francis Tyers|Francis Tyers]] 21:43, 8 April 2009 (UTC)
-::Is there system documentation for Apertium somewhere that would speed up my understanding of its structure and logic? I believe Apertium is the right project for me, and I would first like to get a system overview. Thanks. [[User:Muki987|Muki987]] 12:10, 8 April 2009 (UTC)
-:::See my answer above. - [[User:Francis Tyers|Francis Tyers]] 21:43, 8 April 2009 (UTC)
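The conversion of hunmorph output to the Apertium stream format discussed in this section could look roughly like the sketch below. It assumes the hunmorph output has the shape shown above (a <code>&gt; word</code> line followed by one <code>lemma/TAGS</code> line per analysis). The function names <code>to_stream</code> and <code>format_unit</code> are hypothetical, and the tags are simply flattened and lowercased; real Apertium tag names would need a proper mapping table.

```python
import re


def format_unit(surface, analyses):
    """Build one ^surface/analysis1/analysis2$ lexical unit."""
    # Words without an analysis are marked with "*", as in ^ja/*ja$ above.
    if not analyses:
        analyses = ["*" + surface]
    return "^" + surface + "/" + "/".join(analyses) + "$"


def to_stream(hunmorph_output):
    """Convert '> word' / 'lemma/TAGS' lines into stream-format units."""
    units = []
    surface, analyses = None, []
    for line in hunmorph_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new surface form starts; flush the previous one.
            if surface is not None:
                units.append(format_unit(surface, analyses))
            surface, analyses = line[1:].strip(), []
        else:
            lemma, _, tags = line.partition("/")
            # Flatten nested tags: NOUN<CAS<ILL>> becomes <noun><cas><ill>.
            flat = "".join("<" + t.lower() + ">" for t in re.findall(r"\w+", tags))
            analyses.append(lemma + flat)
    if surface is not None:
        units.append(format_unit(surface, analyses))
    return " ".join(units)


sample = "> ma\nma/ADV\nma/NOUN\n> megyek\nmegy/VERB<PERS<1>>\n> a\na/ART\n> házba\nház/NOUN<CAS<ILL>>"
print(to_stream(sample))
```

For the example sentence this prints <code>^ma/ma&lt;adv&gt;/ma&lt;noun&gt;$ ^megyek/megy&lt;verb&gt;&lt;pers&gt;&lt;1&gt;$ ^a/a&lt;art&gt;$ ^házba/ház&lt;noun&gt;&lt;cas&gt;&lt;ill&gt;$</code>.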
-===Whatever (inflecting) -Hungarian===
-I go into the house
-Apertium knows from the rules:
- I go = megyek
- the: a
- Into the house: ház/NOUN<CAS<ILL>> ->hunlex or whatever translates into: házba
-What do you think?
-[[User:Muki987|Muki987]] 20:59, 7 April 2009 (UTC)
-:Yep, no problem. - [[User:Francis Tyers|Francis Tyers]] 07:12, 8 April 2009 (UTC)


Latest revision as of 07:55, 7 July 2009
