Difference between revisions of "User talk:Muki987"
==Useful links==

* [http://wiki.apertium.org/wiki/Special:Contributions/Muki987 my contributions]
* [http://wiki.apertium.org/wiki/Special:Contributions/Francis_Tyers F. Tyers]
* [http://wiki.apertium.org/wiki/Special:Contributions/Jimregan regan's]

:'With this knowledge we can construct the English' -- How? You don't seem to have given thought to that part.

:'háza (his, her, its house repeat all previous to this) - 56' -- it strikes me as a) unlikely that you can chain all possible possessives in this manner and b) that you can do something useful that will convey an understandable meaning in another language even if it is.

:'házas (married- repeat all previous for this up to here, except the last 2) 1680' -- a married house? Really?

:'házacska' -- are there no lexicalised diminutives in Hungarian? I can theoretically add '-let' to any noun in English, but 'piglet' has a separate translation to most languages, and 'hamlet' is not a diminutive of 'ham'.

:Just because you can theoretically infer meaning from an analysis doesn't mean that results will translate. -- [[User:Jimregan|Jimregan]] 05:24, 7 April 2009 (UTC)
|||
==Basque==

I found some useful Basque introductions:

* [http://www.hermanboel.eu/en-language_basque_lesson.htm#definite hermanboel]
* [http://www.buber.net/Basque/Euskara/lesson.2.html buber]
* [http://learnlanguagefromluton.net/basque.html learn]
* [http://ixa.si.ehu.es/Ixa ixa]
* [http://en.wikipedia.org/wiki/Basque_language#Grammar en wiki]

>'With this knowledge we can construct the English' -- How? You don't seem to have given thought to that part.

Of course I did. Whatever I can do as a human translator, the machine can also do, if I tell it how. I am absolutely optimistic about this, and am looking for the proper technology.
|||
:Great. 'To a person with a hammer, all problems look like nails'. I say that Hunmorph is your hammer; you are mixing derivational processes with agglutination and 'normal' morphology. Just because all of these things can be treated the same doesn't mean it always makes sense to do so, which is the point underlying everything I said. -- [[User:Jimregan|Jimregan]] 12:57, 7 April 2009 (UTC) |
|||
::Your point remains unclear to me, but it might not be worth seeking clarity in this case, since your text seems philosophical to me. I am a practical person, not the philosophical type. I am rather new to using Hunmorph, anyway. [[User:Muki987|Muki987]] 18:16, 7 April 2009 (UTC)
|||
:::My point is that you have one solution to one problem; you're trying to use that solution for other problems. Clear? -- [[User:Jimregan|Jimregan]] 10:46, 8 April 2009 (UTC) |
|||
::::Sorry, no. Please explain what you want to say in more detail, with examples. Also explain why you are saying that. [[User:Muki987|Muki987]] 11:49, 8 April 2009 (UTC)
|||
:::::I'll make it as simple as possible: you think there's only one problem; there are many more. You are ignoring them because you have one solution, and think it will work for them all. It won't -- [[User:Jimregan|Jimregan]] 15:52, 8 April 2009 (UTC) |
|||
::::::You misunderstand me completely. I see very clearly that we have lots of things to do; I am just addressing one of them, that's all: the main one for me at the moment. Once that is fixed, I'll continue with the rest, or even better, since many of the problems are common to us, we can solve the rest together. What I addressed is no problem for prefix type language pairs, but it is very clearly a problem for me. My primary focus is English-Hungarian and German-Hungarian; second, English-German; third, German-English; fourth, Hungarian-German and Hungarian-English. The other option is to say that it is impossible to write a translator, which I think is simply wrong. [[User:Muki987|Muki987]] 18:14, 8 April 2009 (UTC)
|||
:::::::If I misunderstood you; good. Because it seemed clear to me that you were ignoring other issues, thinking that they would all be solved by using HunMorph. I'd much rather make you angry now than see you work for months, only to find you have to redo everything. For English-Hungarian, you will most likely find it easier to treat certain types of words - derivatives - as separate words, rather than forms of a base word. It's still ''possible'' to do otherwise, but it causes a lot of unnecessary complication, and will have undesired side-effects. I find it hard to believe that Hungarian-German will be much different. (Or, honestly, anything other than Hungarian-Finnish and Hungarian-Estonian). |
|||
:::::::: Yes, I agree with all that, especially with the remark comparing the German-Hungarian relation to the English-German one. While translating lots of texts from both, I was faced with the same problems almost all the time. The only exception is that in English the same word has many meanings, which fortunately is not the case for German or Hungarian. <strong>In the English-German relation</strong> the only grammatical problem I saw (besides the use of wrong words or expressions, which is a general problem for any language pair) is the position of verbs, which often come at the end of a structure in German, while in English they are in the middle of it.
|||
::::::::<strong>Example:</strong> |
|||
::::::::Er dachte, sie würde in die Schule gehen. |
|||
::::::::He thought, she would go to school. |
|||
::::::::Er dachte, sie würde gehen in die Schule -- is unusual in German, and sounds un-German. One can see such bad structures in shallowly translated texts.
|||
::::::::I believe Apertium has standard tools to fix this. I also think that handling words close to the stem is simpler than complicating our lives with derivatives (ház-házas: házas can, and clearly should, be considered as an independent word). My example was just set up to illustrate the great number of derivatives (I even forgot some), and not to suggest any special way to translate that word. [[User:Muki987|Muki987]] 09:31, 9 April 2009 (UTC)
|||
:::::::::Unfortunately, we don't have an effective way of dealing with this example -- but, we (well, Francis and I) recently learned something about our transfer architecture that can possibly be used to deal with this (n-level transfer), but neither of us have had a chance to experiment with it yet. (At least, I haven't; perhaps Francis has). -- [[User:Jimregan|Jimregan]] 10:58, 9 April 2009 (UTC) |
|||
:::::::::: Word order is very important, not only in English-German, but also in German-Hungarian relation. |
|||
::::::::::He thought, she would go to school. |
|||
::::::::::Azt hitte, iskolába megy. (azt=that, hitte=thought, iskolába=to school, megy=go), when we want to say that she goes to school, and not elsewhere.
|||
::::::::::Azt hitte, megy az iskolába (when we want to say that she went, and did not fly or swim).
|||
::::::::::This also exists in German: we put the part we want to stress at the beginning.
|||
::::::::::In die Schule wollte sie gehen. (To school she wanted to go) |
|||
::::::::::Gehen wollte sie in die Schule. (She wanted to go to school) [[User:Muki987|Muki987]] 11:18, 9 April 2009 (UTC) |
|||
:::::::Have a look at apertium-eu-es (Basque->Spanish). It's one direction only, but using HunMorph would limit you to only being able to translate from Hungarian anyway. (AFAIK, the main reason eu-es is one direction only is because Matxin already exists for the other direction) -- [[User:Jimregan|Jimregan]] 09:01, 9 April 2009 (UTC) |
|||
:::::::: Yes, I am doing that at the moment, thanks. My priority is, as you know, English-Hungarian, German-Hungarian first. [[User:Muki987|Muki987]] 09:31, 9 April 2009 (UTC) |
|||
>'háza (his, her, its house repeat all previous to this) - 56' -- it strikes me as a) unlikely that you can chain all possible possessives in this manner and b) that you can do something useful that will convey an understandable meaning in another language even if it is. |
|||
ház- házam, házad, háza, házunk, házatok, házuk (my house, your house, his/her/its house, our house, your house, their house). All relations to MY HOUSE are then expressed as in the case of ház:
|||
házban- házamban |
|||
házra- házamra |
|||
etc... |
|||
It is simple and understandable in all cultural languages.
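The stacking described above (a possessive ending followed by the same case endings that attach to the bare noun) can be illustrated with a small sketch. It is a toy under stated assumptions: the possessed forms of ház and the endings -ban and -ra are taken from the examples above; the helper name <code>add_case</code> is made up for illustration; and the lengthening of a stem-final a/e before an ending (háza + ban gives házában) is a regular feature of Hungarian spelling that is added here and was not mentioned in the discussion.

```python
# Possessed forms of "ház" exactly as listed in the discussion above.
POSSESSED = ["ház", "házam", "házad", "háza", "házunk", "házatok", "házuk"]

# The two case endings used in the examples above (back-vowel forms only).
CASE_ENDINGS = ["ban", "ra"]

# Assumption added for this sketch: a stem-final short a/e lengthens
# before an ending (háza + ban -> házában).
LENGTHEN = {"a": "á", "e": "é"}


def add_case(stem, ending):
    """Attach a case ending to a (possibly possessed) noun form."""
    if stem[-1] in LENGTHEN:
        stem = stem[:-1] + LENGTHEN[stem[-1]]
    return stem + ending


# Every possessed form takes the same endings as the bare noun.
forms = {stem: [add_case(stem, e) for e in CASE_ENDINGS] for stem in POSSESSED}

for stem, inflected in forms.items():
    print(stem, "->", ", ".join(inflected))
```

The table grows multiplicatively (7 stems times 2 endings here), which is the combinatorial effect the counts in the original examples were pointing at.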
|||
:You aren't addressing my point. 'repeat all previous to this', implying that you can have some combination meaning 'my your his their house'. -- [[User:Jimregan|Jimregan]] 12:57, 7 April 2009 (UTC) |
|||
:: "Repeat all previous" means that I can express the relations to "my house", "your house" ... "their house" by using exactly the same inflections as for "house". The example above uses "ban" = in; all the others work in exactly the same way. [[User:Muki987|Muki987]] 18:16, 7 April 2009 (UTC)
|||
:::So, referring just to grammatical cases? Ok, that answers my question -- [[User:Jimregan|Jimregan]] 10:46, 8 April 2009 (UTC) |
|||
::::If you want to express it like that. Neither English, nor Hungarian have grammatical cases in fact, just to be precise. [[User:Muki987|Muki987]] 11:56, 8 April 2009 (UTC) |
|||
:::::Hungarian ''does'' have grammatical cases; 'I usually quote 17 following those established by Antal László in 1977' -- the first group in your set of examples are grammatical cases -- [[User:Jimregan|Jimregan]] 15:52, 8 April 2009 (UTC) |
|||
:::::: If I were you, I would be much more modest in my statements. Antal László's linguistic ideas are disputable. In Hungary, nobody speaks about n cases, because that is simply counterproductive. It is also counterproductive for foreigners learning Hungarian. I see now that it is useful for translation, so I will use the concept, but for this purpose only. [[User:Muki987|Muki987]] 18:14, 8 April 2009 (UTC)
|||
::::::: You misunderstand me here; I was not being immodest: all of the literature I could find in English is in agreement. It may be the case that the views are disputed, but that is not represented in English writings about Hungarian grammar, as far as I have seen at least. How are they considered, then? Because it may be the case that it could be easier to translate to and from Hungarian if the set of suffixes I would regard as case endings were instead treated as enclitic postfixes (that is, by splitting off the suffix and treating it as if it was a separate word: see, for example, how 'dímelo' in Spanish is split into 'decir<vblex><imp><p2><sg>+prpers<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>'). Does Hungarian have vowel harmony, like Finnish? That may complicate things, but I think there's a (relatively) easy way around it. -- [[User:Jimregan|Jimregan]] 09:01, 9 April 2009 (UTC)
|||
:::::::: I see. You consider the English literature authoritative on questions of Hungarian grammar? I would not do that. It is in fact written by people who are illiterate from the linguistic point of view. In my opinion, only native-speaker authors who agree with most other native-speaker Hungarian linguists are authoritative (of course, this includes linguists of the past, not only of the present).
|||
:::::::::No, no; I don't speak Hungarian, so I can't check the literature in Hungarian: I have to rely on literature in English. -- [[User:Jimregan|Jimregan]] 10:52, 9 April 2009 (UTC) |
|||
:::::::: <strong>You are, however, IMHO absolutely correct</strong> if you say that we must classify postfixes for translation purposes: we MUST find a way to match our postfixes to the prepositions, and therefore we must classify them. We can call the classification anything. IMHO the best name is classification, but we can also call them cases, although that has very little to do with German type cases.
|||
:::::::::Great. See Francis' example, below: this is basically what I propose, for analysis. For generation, I propose taking that 'pseudo word' system, and converting it into a string of tags, much like HunMorph generates, in a set of Hungarian-only rules (most of our rules are based on knowledge of both languages, but this set could be reused among language pairs using Hungarian). -- [[User:Jimregan|Jimregan]] 10:52, 9 April 2009 (UTC) |
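The 'pseudo word' analysis proposed above can be sketched as a suffix splitter that turns a case ending into a separate unit. This is only a minimal illustration: the tiny lexicon, the function name <code>split_suffix</code> and the choice of the tags <code>&lt;n&gt;</code> and <code>&lt;post&gt;</code> are assumptions made for the example, and the output merely imitates the shape of the Spanish 'dímelo' analysis quoted earlier. It is not the output of HunMorph or of any existing Apertium dictionary.

```python
# Hypothetical mini-lexicon of noun stems (all taken from examples on this page).
LEXICON = {"ház", "ajtó", "szék"}

# Case endings mentioned in the discussion: ban (in), ba/be (into), ra.
SUFFIXES = ["ban", "ba", "be", "ra"]


def split_suffix(surface):
    """Return 'stem<n>+ending<post>' if the form is a known stem plus an ending."""
    # Try longer endings first so that a long ending is never shadowed
    # by a shorter one.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = surface[: -len(suffix)]
        if surface.endswith(suffix) and stem in LEXICON:
            return f"{stem}<n>+{suffix}<post>"
    return None  # unknown form: leave it for other rules


for word in ["házban", "ajtóba", "székbe", "házra"]:
    print(word, "->", split_suffix(word))
```

For generation the same table would be read in the other direction: a transfer rule emits the stem and the pseudo word, and a Hungarian-only step joins them into one surface form.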
|||
:::::::: <strong>Yes, we do have vowel harmony;</strong> however, this is almost trivial: high words get high endings, low words get low endings, and there are a few exceptions that can be handled by rules. eéiíöüőű are high vowels, aáouóú are low vowels. Every postfix has a low and a high form, for example ba, be (into): ajtó-ba, szék-be. If the word is mixed (mixed words are typically taken over from foreign languages), for example rádió, either the last syllable decides or we use the low form, for example rádió-ba. Exceptions are some ancient words, for etymological reasons, for example derék, derékba; íj, íjba. They are only a handful of words, no problem. [[User:Muki987|Muki987]] 09:31, 9 April 2009 (UTC)
|||
::::::::: Well, exceptions are exceptions, and every language has them. You're right; vowel harmony is not a big problem (at least, not in my opinion) -- but it does mean I need to ask you for more examples, as one set of suffixes is not enough. I know I can take them from Hunspell/Hunmorph (in fact, it would be a requirement IMO, to be able to reuse that data as quickly and easily as possible), but I'd rather focus on one small dataset to begin with, and expand later -- [[User:Jimregan|Jimregan]] 10:52, 9 April 2009 (UTC) |
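The harmony rule described above can be written down in a few lines. The sketch below is a simplification under stated assumptions: the vowel classes and the exceptions derék and íj come from the discussion; mixed words always take the low form here (the discussion also allows the last syllable to decide, which is not modelled); and the function names are made up for illustration.

```python
# Vowel classes as given in the discussion (with the long vowels ő and ű).
HIGH = set("eéiíöőüű")
LOW = set("aáoóuú")

# High-vowel words that nevertheless take low endings (from the discussion).
LOW_EXCEPTIONS = {"derék", "íj"}


def takes_low_ending(word):
    """Decide whether a word takes the low (back-vowel) form of an ending."""
    w = word.lower()
    if w in LOW_EXCEPTIONS:
        return True
    vowels = [ch for ch in w if ch in HIGH or ch in LOW]
    # Simplification: any low vowel makes the word low, so mixed words
    # such as "rádió" take the low form.
    return any(v in LOW for v in vowels)


def attach(word, low_form, high_form):
    """Attach the harmonising variant of an ending, e.g. ba/be."""
    return word + (low_form if takes_low_ending(word) else high_form)


for noun in ["ajtó", "szék", "rádió", "derék", "íj"]:
    print(noun, "->", attach(noun, "ba", "be"))
```

Keeping the exceptions in a plain word list matches the suggestion that they are few enough to be handled individually; a larger dataset from Hunspell/Hunmorph would replace the hand-written sets.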
|||
==Basque== |
|||
Note that in our Basque→Spanish system we do something similar with Basque cases. For example, a typical way of representing "hegoak" would be: |
|||
<pre> |
<pre> |
||
- [[User:Francis Tyers|Francis Tyers]] 10:13, 9 April 2009 (UTC) |
||
: I see, Matxin can handle both Basque-Spanish and Spanish-Basque, so I'll look thoroughly into that. Basque is also a hun language, as far as I know, very similar to Hungarian. [[User:Muki987|Muki987]] 10:32, 9 April 2009 (UTC)
||
::Actually, the Matxin system cannot handle Basque→Spanish, as there is no dependency analysis for Basque. Apertium is used for Basque→Spanish and Matxin for Spanish→Basque. As far as I know, Basque does not have any living relatives. - [[User:Francis Tyers|Francis Tyers]] 10:44, 9 April 2009 (UTC) |
||
:::Basque has lots of living relatives: Hungarian, Armenian, Turkish, Azerbaijani, Uyghur, Finnish, Estonian, Persian, Japanese (through Ainu = hunnish influence), Quechua (the Inca language in South America), ancient Egyptian (no longer living, but the hieroglyphs show a great past), Etruscan (also no longer living, but with a great past), Hindi, and more. [[User:Muki987|Muki987]] 11:30, 9 April 2009 (UTC)
||
== Headline text == |
|||
::::I do not agree, although the issue is not really pertinent to our current discussion. - [[User:Francis Tyers|Francis Tyers]] 11:53, 9 April 2009 (UTC) |
|||
==Considerations for prefix groups and possessions== |
||
===Adding plural=== |
||
*With Peter's coffee and teas - Péter kávé-já-val és teá-i-val - "i" is plural possession for tea
||
:Oh. That's interesting, that plurality 'goes with' the possessive. Not really an extra problem, but it is interesting. -- [[User:Jimregan|Jimregan]] 11:31, 9 April 2009 (UTC) |
|||
===Remark=== |
||
These kinds of structures caused me the most manual work when translating texts from English/German; therefore it is very important to set up their proper translation. Thanks in advance for any criticism/thoughts/comments. [[User:Muki987|Muki987]] 10:12, 9 April 2009 (UTC)
||
:Yes; they pose quite a problem, because the phrase boundary needs to be detected: in 'Peter's coffee and tea', 'coffee and tea' is the part that's possessed, but in the sentence 'I drank Peter's coffee and tea was spilled on the ground' only 'coffee' is possessed. We can use CG to add boundaries here, but it will be a lot of work. -- [[User:Jimregan|Jimregan]] 11:29, 9 April 2009 (UTC) |
|||
::An interesting example. This is a fourth kind of structure closing signal: a noun immediately followed by a verb also closes the structure:
||
:::- [[User:Francis Tyers|Francis Tyers]] 13:36, 9 April 2009 (UTC) |
||
::::This particular example aside, the premise is sound: English has ambiguities that can cause difficulties in determining phrase boundaries. -- [[User:Jimregan|Jimregan]] 14:13, 9 April 2009 (UTC) |
|||
:::::Indeed. - [[User:Francis Tyers|Francis Tyers]] 14:23, 9 April 2009 (UTC) |
|||
:::Yes. Apertium's transfer works on left to right longest match. We were hoping that someone would be interested in integrating CG's dependency analysis for GSoC, which would help to resolve these ambiguities; at the moment, we have to simply pick the most common cases, and fail in others. -- [[User:Jimregan|Jimregan]] 13:32, 9 April 2009 (UTC) |
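Left-to-right longest match, and the way it goes wrong on the 'Peter's coffee and tea' example, can be shown with a toy chunker. This is a sketch of the matching strategy only, not Apertium's transfer code: the pattern list, the simplified part-of-speech tags and the function name <code>lrlm_chunk</code> are all assumptions made for the illustration.

```python
# Hypothetical patterns over simplified part-of-speech tags.
PATTERNS = [
    ("np", "gen", "n"),                 # Peter 's coffee
    ("np", "gen", "n", "cnjcoo", "n"),  # Peter 's coffee and tea
]


def lrlm_chunk(tags, patterns):
    """Group tags left to right, always taking the longest matching pattern."""
    chunks = []
    i = 0
    while i < len(tags):
        best = 0
        for pattern in patterns:
            n = len(pattern)
            if n > best and tuple(tags[i:i + n]) == pattern:
                best = n
        if best == 0:
            best = 1  # no pattern starts here: pass one token through
        chunks.append(tags[i:i + best])
        i += best
    return chunks


# "Peter's coffee and tea": the longest pattern covers the whole phrase.
print(lrlm_chunk(["np", "gen", "n", "cnjcoo", "n"], PATTERNS))

# "I drank Peter's coffee and tea was spilled": the same pattern still
# wins, so "tea" is wrongly grouped with the possessive.
print(lrlm_chunk(
    ["prn", "vblex", "np", "gen", "n", "cnjcoo", "n", "vbser", "vblex"],
    PATTERNS,
))
```

Because the matcher never looks past the end of the pattern, it cannot see that a verb follows 'tea'; that missing right-hand context is what boundary marking or dependency analysis would have to supply.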
|||
===Some tests=== |
||
házas means married, and also a man/woman who has a house
||
In the case of ing (shirt), inges means someone who wears a shirt
||
::Ah; now I see what you mean. I thought you meant that the suffix meant married, not the word -- [[User:Jimregan|Jimregan]] 10:46, 8 April 2009 (UTC) |
|||
:Then it's a derivation, and better treated as a separate word. -- [[User:Jimregan|Jimregan]] 12:57, 7 April 2009 (UTC) |
|||
>'házacska' -- are there no lexicalised diminutives in Hungarian? I can theoretically add '-let' to any noun in English, but 'piglet' has a separate translation to most languages, and 'hamlet' is not a diminutive of 'ham'. |
|||
acska or ikó is the diminutive. It is the same thing as pig-piglet. |
|||
:I know what a diminutive is; did you understand my question? 'piglet' is a diminutive of pig, but it is a separate word in its own right, which would have its own translation -- it is lexicalised. Many (most) other diminutives are unproductive, and can be safely treated in terms of the original word. -- [[User:Jimregan|Jimregan]] 12:57, 7 April 2009 (UTC) |
|||
::Yes, there might be some words whose diminutive form modifies the original word's meaning; however, I can't think of even a single one in Hungarian at the moment. Piglet means little pig or a child pig. What do you want with these words and examples? English is very hard to translate because lots of words, like prime and the like, have tens of very different meanings. This is a very specific English problem; Hungarian and German do not have it. Are you addressing this problem? If yes, can you see any practical solution for it? [[User:Muki987|Muki987]] 18:16, 7 April 2009 (UTC)
|||
:::You're changing the issue again. If you want a German example; 'piglet' should be translated as 'Ferkel', not 'Schweinchen'; 'Mädchen', which is a lexicalised diminutive, should not be considered a form of 'Mäd'. |
|||
:::Word sense disambiguation is not a problem specific to English. -- [[User:Jimregan|Jimregan]] 10:46, 8 April 2009 (UTC) |
|||
:::: Not specific to English, but sharper in English than in any other cultural language. What are your ideas for solving it? [[User:Muki987|Muki987]] 11:54, 8 April 2009 (UTC)
||
::::: [[Word sense disambiguation]] |
||
::::: We don't currently have a good working lexical selection module, but it is one of the ideas we're hoping to get implemented through GSOC. - [[User:Francis Tyers|Francis Tyers]] 21:48, 8 April 2009 (UTC) |
||
* -ka/ke: fóka (seal), róka (fox), csóka (jackdaw), pulyka (turkey), szarka (magpie) |
||
* -cska/cske: macska (cat), kecske (goat), fecske (swallow), szöcske (grasshopper) |
||
...which answers my question; yes, Hungarian ''does'' have lexicalised diminutives. -- [[User:Jimregan|Jimregan]] 10:50, 8 April 2009 (UTC) |
|||
:You see, you get better answers in fickipedia. You are right, this is an issue for translations, but it is one of the issues that can easily be covered. [[User:Muki987|Muki987]] 11:54, 8 April 2009 (UTC)
||
::Yes; it's an issue; one that you weren't considering. -- [[User:Jimregan|Jimregan]] 15:52, 8 April 2009 (UTC) |
|||
:::Sure, and a lot of others also not. One after the other. [[User:Muki987|Muki987]] 18:14, 8 April 2009 (UTC) |
||
>Just because you can theoretically infer meaning from an analysis doesn't mean that results will translate. -- Jimregan 05:24, 7 April 2009 (UTC) |
|||
I have translated so much already that I can say: you cannot say anything in any cultural language that cannot be translated into another one.
|||
I hope you do not want to claim that there are untranslatable things? I would strongly disagree with that assumption, and would ask you to give me at least one example. [[User:Muki987|Muki987]] 08:13, 7 April 2009 (UTC)
|||
==Continuation JR== |
|||
:Ok, let me refine what I meant: the results won't translate in a meaningful way. There are all sorts of ways of inferring from derivational processes what a word 'means', but they tend to be useful only to linguists/translators who can then determine the best way to represent that in the target language. |
|||
:Yes, there are certain words that are not directly translatable between languages: their concepts may be conveyed in other ways, but it's an explanation, not a translation. -- [[User:Jimregan|Jimregan]] 12:57, 7 April 2009 (UTC) |
|||
:::I call an explanation in the target language a way of translating it. For example, in German Hammelsprung means a sort of voting in which those who say yes leave the room through some doors, and those who say no through others. IMHO this cannot be translated directly into any language, but must be explained; I then call the explanation a translation, which is what it is. What do you think? [[User:Muki987|Muki987]] 18:09, 7 April 2009 (UTC)
|||
::::'Hammelsprung' -> 'parliamentary division', or just 'division', in context. That's the kind of translation MT should give: something as closely equivalent as possible, that fits into the same context. Your long explanation doesn't. -- [[User:Jimregan|Jimregan]] 11:02, 8 April 2009 (UTC) |
|||
::::: Well, that might be the case for German-English, but as far as I know, not the case for German-Hungarian. C'est la vie. [[User:Muki987|Muki987]] 12:00, 8 April 2009 (UTC) |
|||
:::::Maybe. Still, it's better to use something shorter, that fits into the same general category, than to give a long winded explanation. -- [[User:Jimregan|Jimregan]] 15:55, 8 April 2009 (UTC) |
|||
:::::: The shortest possible explanation, but it must be understandable for every reader [[User:Muki987|Muki987]] 18:17, 8 April 2009 (UTC) |
|||
::::::: By that, yes, you're right that if a 'close equivalent' is used, that should be as understandable as possible. On the other hand, it's perfectly acceptable to use specific terminology, which may not be understandable to everyone. -- [[User:Jimregan|Jimregan]] 08:38, 9 April 2009 (UTC) |
|||
:::::::: Yes, if one exists. I doubt we have something like that in Hungarian, but I might be wrong. I can imagine "kimenős szavazás" (voting by/at leaving), but I would not understand that without further explanation. [[User:Muki987|Muki987]] 09:02, 9 April 2009 (UTC)
|||
::This is rather off the topic of the discussion, this page is more to discuss methods of representing agglutinative morphology in Apertium, rather than the translation problems of agglutinative languages (which are also interesting, but better reserved for another page, or the [[contact|mailing list]]). :) - [[User:Francis Tyers|Francis Tyers]] 08:21, 7 April 2009 (UTC) |
||
:::Glad to hear that you are convinced Apertium technology is suitable for agglutinative languages. Having gone through the English-SerboCroatian example, I was not that sure. I am in the evaluation phase at the moment, and I am looking at all existing technologies. In my opinion, Google's translation technology, with its statistical, grammar-free approach, will never reach the quality of a grammar-oriented one like Apertium. It will forever remain on the surface, with no real prospect of improvement. However, for some situations it is very helpful. That was my first step in this direction. We can continue this subject on my discussion page, if xxx wants. [[User:Muki987|Muki987]] 10:02, 7 April 2009 (UTC)
||
::Regarding other free grammar-focussed MT engines, you might also check out Open Logos and [[Matxin]]. Open Logos has the downside of not supporting UTF-8 and not having very active development, while Matxin requires a dependency grammar to be written in [[Freeling]] format. If you want to go from English→Hungarian then this might be the answer, as they already have one written for English, but for Hungarian→English, it might take some extra development time. The [[Constraint grammar]] formalism for disambiguation and syntactic annotation might also be interesting. I'm quite happy to discuss other options and if you have any questions, please contact us on the mailing list, personally or through IRC. - [[User:Francis Tyers|Francis Tyers]] 10:36, 7 April 2009 (UTC)
||
:::::: Skype is completely free and very, very easy to use. Telephoning abroad always costs a lot, no matter how you try. [[User:Muki987|Muki987]] 10:24, 11 April 2009 (UTC)
||
:::::::No problem. |
|||
:::::::Well, humour doesn't tend to translate well. ;) |
|||
:::::::When I use free in terms of software, I mean free as in [http://www.gnu.org/philosophy/free-sw.html free software] (yay ambiguity!). - [[User:Francis Tyers|Francis Tyers]] 10:32, 11 April 2009 (UTC) |
|||
::::::::For me, as a poor person, free means whether I have to pay for it or not. I do not have much time for, or interest in, moralistic philosophising. [[User:Muki987|Muki987]] 10:58, 11 April 2009 (UTC)
|||
:::::::::If you were a truly poor person, you wouldn't be watching Monty Python videos on the internet, nor preoccupying yourself with the vagaries of English expressions or Basque linguistics. So, best to leave the high horse where it belongs, in the stable. :) - [[User:Francis Tyers|Francis Tyers]] 11:27, 11 April 2009 (UTC) |
|||
:::::::::: No, I have never watched a Monty Python video on the Internet. I am a practical person, and it is harder to understand spoken text than to read it. I read it and did not really understand it (maybe that is why I took it as primitive hostility). Occupying oneself with English expressions, Basque linguistics or translating texts is a typical activity of poor people. Nobody pays a cent today for linguistics, grammar, word collections or translations. However, I am so poor that there are obviously people who even envy me for being a poor person. [[User:Muki987|Muki987]] 11:43, 11 April 2009 (UTC)
|||
::::::::::: I don't think you need to justify anything in the face of Francis' ''bourgeois'' ignorance of the vast difference between absolute and relative poverty. -- [[User:Jimregan|Jimregan]] 13:35, 11 April 2009 (UTC) |
|||
::::::::::: I have an idea that might prevent you spending your time on something that isn't what you want or need. Could you send me ten sentences in Hungarian of varying levels of complexity, with linguist's glosses in English and translation in English. e.g. |
||
I use Freeling 1.5. Please copy your console output for this sentence to this page. [[User:Muki987|Muki987]] 08:26, 23 April 2009 (UTC) |
||
:For which sentence? I'm using Freeling 2.0 - [[User:Francis Tyers|Francis Tyers]] 08:32, 23 April 2009 (UTC) |
|||
::Why should I have taken a raincoat and an umbrella, when my aunt, living in Georgia, told, it will be nice weather. [[User:Muki987|Muki987]] 08:46, 23 April 2009 (UTC) |
|||
:::Here it is, although it isn't really grammatical English. - [[User:Francis Tyers|Francis Tyers]] 08:50, 23 April 2009 (UTC) |
|||
<pre> |
|||
$ echo "Why should I have taken a raincoat and an umbrella, when my aunt, living in Georgia, told, it will be nice weather." | \ |
|||
./analyzer -f ../../data/config/en.cfg |
|||
DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. |
|||
WRB(why)/top/(Why why WRB -) [ |
|||
MD(should)/modnorule/(should should MD -) |
|||
PRP(i)/modnorule/(I i PRP -) |
|||
verb/modnorule/(taken take VBN -) [ |
|||
VB*<have>/modnorule/(have have VBP -) |
|||
] |
|||
Z(a)/modnorule/(a 1 Z -) |
|||
grup-n/modnorule/(raincoat raincoat NN -) |
|||
CC(and)/modnorule/(and and CC -) |
|||
sn/modnorule/(umbrella umbrella NN -) [ |
|||
DT/modnorule/(an an DT -) |
|||
] |
|||
Fc(,)/modnorule/(, , Fc -) |
|||
WRB(when)/modnorule/(when when WRB -) |
|||
PRP$(my)/modnorule/(my my PRP$ -) |
|||
grup-n/modnorule/(aunt aunt NN -) |
|||
Fc(,)/modnorule/(, , Fc -) |
|||
verb/modnorule/(living live VBG -) |
|||
IN(in)/modnorule/(in in IN -) |
|||
grup-n/modnorule/(Georgia georgia NNP -) |
|||
Fc(,)/modnorule/(, , Fc -) |
|||
verb/modnorule/(told tell VBD -) |
|||
Fc(,)/modnorule/(, , Fc -) |
|||
PRP(it)/modnorule/(it it PRP -) |
|||
verb/modnorule/(be be VB -) [ |
|||
MD/modnorule/(will will MD -) |
|||
] |
|||
grup-n/modnorule/(weather weather NN -) [ |
|||
adj/modnorule/(nice nice JJ -) |
|||
] |
|||
Fp(.)/modnorule/(. . Fp -) |
|||
] |
|||
</pre> |
|||
Yes, exactly. The same as for FreeLing 1.5. If we compare this to the web interface output, we see that it is very different. For example, here all words are modnorule except the top one, while the web output shows different function types.
|||
:It could be a difference between SVN (or Freeling 2.1) and Freeling 2.0. The person to ask would be Lluís Padró, I've emailed you his email address. - [[User:Francis Tyers|Francis Tyers]] 09:07, 23 April 2009 (UTC) |
|||
Also, the tree itself is completely different. Its top is "told" on the web, but "why" here. And so on... Why is that? [[User:Muki987|Muki987]] 09:00, 23 April 2009 (UTC)
|||
:I'm not sure, it could have something to do with it not being a grammatical sentence in English ? - [[User:Francis Tyers|Francis Tyers]] 09:07, 23 April 2009 (UTC) |
|||
::OK. Change it to a grammatical one, and you still see the big differences. Why? [[User:Muki987|Muki987]] 09:22, 23 April 2009 (UTC) |
|||
I entered this same question on freeling forum, however it seems to be dead. |
|||
http://garraf.epsevg.upc.es/freeling/index.php?option=com_simpleboard&Itemid=55&func=view&catid=3&id=883#883 |
|||
[[User:Muki987|Muki987]] 09:22, 23 April 2009 (UTC) |
|||
::They sometimes take a while to reply. I think it might be a holiday up there too (La Diada de Sant Jordi) - [[User:Francis Tyers|Francis Tyers]] 09:51, 23 April 2009 (UTC) |
|||
A more standard rendering of the sentence below: |
|||
<pre> |
|||
$ echo "Why should I have taken a raincoat and an umbrella? My aunt who lives in Georgia said that the weather would be nice." | \ |
|||
./analyzer -f ../../data/config/en.cfg |
|||
DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. |
|||
DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. |
|||
WRB(why)/top/(Why why WRB -) [ |
|||
MD(should)/modnorule/(should should MD -) |
|||
PRP(i)/modnorule/(I i PRP -) |
|||
verb/modnorule/(taken take VBN -) [ |
|||
VB*<have>/modnorule/(have have VBP -) |
|||
] |
|||
Z(a)/modnorule/(a 1 Z -) |
|||
grup-n/modnorule/(raincoat raincoat NN -) |
|||
CC(and)/modnorule/(and and CC -) |
|||
sn/modnorule/(umbrella umbrella NN -) [ |
|||
DT/modnorule/(an an DT -) |
|||
] |
|||
Fit(?)/modnorule/(? ? Fit -) |
|||
] |
|||
PRP$(my)/top/(My my PRP$ -) [ |
|||
grup-n/modnorule/(aunt aunt NN -) |
|||
WP(who)/modnorule/(who who WP -) |
|||
verb/modnorule/(lives live VBZ -) |
|||
IN(in)/modnorule/(in in IN -) |
|||
grup-n/modnorule/(Georgia georgia NNP -) |
|||
verb/modnorule/(said say VBD -) |
|||
IN(that)/modnorule/(that that IN -) |
|||
sn/modnorule/(weather weather NN -) [ |
|||
DT/modnorule/(the the DT -) |
|||
] |
|||
verb/modnorule/(be be VB -) [ |
|||
MD/modnorule/(would will MD -) |
|||
] |
|||
adj/modnorule/(nice nice JJ -) |
|||
Fp(.)/modnorule/(. . Fp -) |
|||
] |
|||
</pre> |
|||
Although the analysis is still a bit of a mystery. They both seem to come out fine in the web interface. - [[User:Francis Tyers|Francis Tyers]] 10:00, 23 April 2009 (UTC) |
|||
The SVN gives: |
|||
<pre> |
|||
$ echo "Why should I have taken a raincoat and an umbrella, my aunt who lives in Georgia said that the weather would be nice." | \ |
|||
./analyzer -f ~/source/FREELING/local/share/FreeLing/config/en.cfg |
|||
sub-cl/top/(Why why WRB -) [ |
|||
mod-chunk/modnomatch/(should should MD -) |
|||
sv/cmod/(taken take VBN -) [ |
|||
vb-have/aux/(have have VBP -) |
|||
sn-chunk/ncsubj/(I i PRP -) |
|||
sn-coor/dobj/(and and CC -) [ |
|||
sn-chunk/conj/(raincoat raincoat NN -) [ |
|||
DT/det/(a a DT -) |
|||
] |
|||
sn-chunk/conj/(umbrella umbrella NN -) [ |
|||
DT/det/(an a DT -) |
|||
] |
|||
] |
|||
] |
|||
sf-brk/modnomatch/(, , Fc -) |
|||
sn-chunk/modnomatch/(aunt aunt NN -) [ |
|||
PRP$/ncmod-poss/(my my PRP$ -) |
|||
rel-cl/cmod/(who who WP -) [ |
|||
rel/ccomp/(lives live VBZ -) [ |
|||
sp-chunk/ncmod/(in in IN -) [ |
|||
sv/cmod/(said say VBD -) [ |
|||
n-chunk/ncsubj/(Georgia georgia NNP -) |
|||
] |
|||
] |
|||
] |
|||
] |
|||
] |
|||
sub-cl/modnomatch/(that that IN -) [ |
|||
sv/cmod/(be be VB -) [ |
|||
mod-chunk/aux/(would would MD -) |
|||
sn-chunk/ncsubj/(weather weather NN -) [ |
|||
DT/det/(the the DT -) |
|||
] |
|||
attrib/ncmod/(nice nice JJ -) |
|||
] |
|||
] |
|||
st-brk/modnomatch/(. . Fp -) |
|||
] |
|||
</pre> |
|||
Which is much better. - [[User:Francis Tyers|Francis Tyers]] 11:06, 23 April 2009 (UTC) |
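One way to make the difference between the runs concrete is to count the function labels (the second field, e.g. <code>modnorule</code> or <code>ncsubj</code>) in each tree. A minimal Python sketch, assuming the one-node-per-line layout shown in the outputs above; the regular expression and the name <code>function_counts</code> are made up for illustration and are not part of FreeLing:

```python
import re
from collections import Counter

# One node per line: label/function/(form lemma tag ...), optionally followed by " [".
# This pattern is an assumption derived from the console output pasted on this page.
NODE = re.compile(
    r"^\s*(?P<label>\S+?)/(?P<func>[^/\s]+)/"
    r"\((?P<form>\S+) (?P<lemma>\S+) (?P<tag>[^\s)]+)"
)


def function_counts(tree_text):
    """Count how often each dependency function label occurs in a tree dump."""
    counts = Counter()
    for line in tree_text.splitlines():
        match = NODE.match(line)
        if match:  # closing brackets and blank lines are skipped
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    sample = """WRB(why)/top/(Why why WRB -) [
  MD(should)/modnorule/(should should MD -)
  verb/modnorule/(taken take VBN -) [
    VB*<have>/modnorule/(have have VBP -)
  ]
]"""
    for func, n in function_counts(sample).most_common():
        print(func, n)
```

A tree in which nearly every node is <code>modnorule</code> suggests that the labelling rules did not fire, which is the pattern visible in the 2.0 output.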
|||
The equivalent apertium output (although slightly mangled for translation en→ca) would be: |
|||
<pre> |
|||
^Adv<adv><itg>{^why<adv><itg>$}$ |
|||
^inf<SV><inf><PD><ND>{^should<3>$}$ |
|||
^prnsubj<SN><p1><mf><sg>{^prpers<prn><p1><mf><sg>$}$ |
|||
^have_pp<SV><vblex><pri><PD><ND>{^have<vbhaver><3><4><5>$ ^take<vblex><pp><m><sg>$}$ |
|||
^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^coat<n><4>$}$ |
|||
^cnj<cnjcoo>{^and<cnjcoo>$}$ |
|||
^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^umbrella<n><4>$}$ |
|||
^coma<cm>{^,<cm>$}$ |
|||
^det_nom<SN><DET><GD><sg>{^my<det><pos><3><4><sp>$ ^aunt<n><4>$}$ |
|||
^reladj<REL><an><mf><sp>{^who<rel><an><3><4>$}$ |
|||
^verbcj<SV><vblex><pri><p3><sg>{^live<vblex><3><4><5>$}$ |
|||
^pr<PREP>{^in<pr>$}$ |
|||
^nom<SN><UNDET><sg>{^Georgia<np><loc><4>$}$ |
|||
^verbcj_perif<SV><reporting><ifip><PD><ND>{^anar<vaux><4><5>$ ^say<vblex><inf>$}$ |
|||
^cnj<cnjsub>{^that<cnjsub>$}$ |
|||
^det_nom<SN><DET><GD><sg>{^the<det><def><3><4><sp>$ ^weather<n><4>$}$ |
|||
^verbcj<SV><vbser><cni><PD><ND>{^be<vbser><3><4><5>$}$ |
|||
^adj<SA><GD><ND>{^nice<adj><2><3>$}$ |
|||
^punt<sent>{^.<sent>$}$ |
|||
</pre> |
|||
We can actually collapse <code>det_nom cnj det_nom</code> into e.g. <code>det_nom_cnj_det_nom</code>, but collapsing the relatives would probably be harder. The benefit of the FreeLing output is that 'my aunt who lives in Georgia' is expressed as one chunk that can be moved. Both ways have their benefits; for hu→en I'd go with Apertium and for en→hu probably FreeLing/Matxin or a hybrid of Apertium/Matxin. - [[User:Francis Tyers|Francis Tyers]] 11:27, 23 April 2009 (UTC)
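The collapsing step can be illustrated on chunk names alone. In Apertium itself this would be written as a transfer rule; the Python below is only a sketch of the pattern match. It ignores the chunk tags and the numbered tag links inside the chunks, and the helper names <code>chunk_names</code> and <code>collapse</code> are hypothetical:

```python
import re

# The chunk name is whatever follows the leading "^" up to the first "<" or "{".
CHUNK_NAME = re.compile(r"\^([^<{]+)")


def chunk_names(lines):
    """Extract the chunk name from each line of a chunk stream dump."""
    names = []
    for line in lines:
        match = CHUNK_NAME.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def collapse(names, pattern=("det_nom", "cnj", "det_nom")):
    """Replace every occurrence of `pattern` by one joined chunk name."""
    out = []
    i = 0
    while i < len(names):
        if tuple(names[i:i + len(pattern)]) == pattern:
            out.append("_".join(pattern))
            i += len(pattern)
        else:
            out.append(names[i])
            i += 1
    return out


if __name__ == "__main__":
    stream = [
        "^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^coat<n><4>$}$",
        "^cnj<cnjcoo>{^and<cnjcoo>$}$",
        "^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^umbrella<n><4>$}$",
    ]
    print(collapse(chunk_names(stream)))
```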
|||
==Versions== |
|||
I do not understand why you are using FreeLing 2.0 or 2.1, when 1.5 is clearly suggested for Matxin.
|||
I also do not understand how there can be such fundamental differences between FreeLing 2.1 (the SVN version) and 1.5, which I use. I get the following:
|||
<pre> |
|||
Why should I have taken a raincoat and an umbrella, my aunt who lives in Georgia said that the weather would be nice. |
|||
grup-n/top/(Why why NN) [ |
|||
MD(should)/modnorule/(should should MD) |
|||
NP(i)/modnorule/(I i NP) |
|||
verb/modnorule/(taken take VBN) [ |
|||
VB*<have>/modnorule/(have have VBP) |
|||
] |
|||
IN(a)/modnorule/(a a IN) |
|||
grup-n/modnorule/(raincoat raincoat NN) |
|||
CC(and)/modnorule/(and and CC) |
|||
sn/modnorule/(umbrella umbrella NN) [ |
|||
DT/modnorule/(an an DT) |
|||
] |
|||
Fc(,)/modnorule/(, , Fc) |
|||
PP$(my)/modnorule/(my my PP$) |
|||
grup-n/modnorule/(aunt aunt NN) |
|||
WP(who)/modnorule/(who who WP) |
|||
verb/modnorule/(lives live VBZ) |
|||
IN(in)/modnorule/(in in IN) |
|||
NP(georgia)/modnorule/(Georgia georgia NP) |
|||
verb/modnorule/(said say VBD) |
|||
IN(that)/modnorule/(that that IN) |
|||
sn/modnorule/(weather weather NN) [ |
|||
DT/modnorule/(the the DT) |
|||
] |
|||
verb/modnorule/(be be VBP) [ |
|||
MD/modnorule/(would would MD) |
|||
] |
|||
adj/modnorule/(nice nice JJ) |
|||
Fp(.)/modnorule/(. . Fp) |
|||
] |
|||
</pre> |
|||
Which is fundamentally different from your output. If 1.5 is the valid version for Matxin, why are you using the SVN version of FreeLing?
|||
[[User:Muki987|Muki987]] 13:00, 23 April 2009 (UTC) |
|||
::The problem is that the people who develop Matxin are shy of committing to their SVN repository. For internal development they are using FreeLing 2.0/2.1 and lttoolbox 3.1; they just haven't committed it to their SVN yet (you can see the last commit is from sometime in November!). So, for testing we should try the version of FreeLing that they are using. I've sent them an email asking if they can send us a snapshot of what they have locally. - [[User:Francis Tyers|Francis Tyers]] 13:19, 23 April 2009 (UTC)
|||
::PS. Lluís responded on the FreeLing forum. - [[User:Francis Tyers|Francis Tyers]] 13:37, 23 April 2009 (UTC) |
|||
:The point is, a really good dependency tagger sees that the sentence has 3 almost independent parts:
|||
<pre> |
|||
1. Why should I have taken a raincoat and an umbrella (where raincoat and umbrella belong together) |
|||
2. My aunt lives in Georgia, told |
|||
3. It will be nice weather |
|||
</pre> |
|||
[[User:Muki987|Muki987]] 11:54, 24 April 2009 (UTC) |
|||
::Yep, exactly :) - [[User:Francis Tyers|Francis Tyers]] 12:22, 24 April 2009 (UTC) |
|||
==Dog us== |
|||
http://www.smh.com.au/cgi-bin/common/popupPrintArticle.pl?path=/articles/2008/09/30/1222651083043.html |
|||
I got it here: |
|||
http://sujitpal.blogspot.com/2008/11/ir-math-in-java-hmm-based-pos.html |
|||
Do you know the expression: "Failure will dog us"? |
|||
Does this mean, failure will follow us? |
|||
Thanks, [[User:Muki987|Muki987]] 10:42, 25 April 2009 (UTC) |
|||
:Yes, this is an expression that can be used. It means that "Failure will follow us and cause us trouble". You can probably disambiguate with a rule that says "choose infinitive if -1 modal and +1 personal pronoun". - [[User:Francis Tyers|Francis Tyers]] 10:45, 25 April 2009 (UTC) |
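The rule suggested above can be sketched in a few lines of Python. This is only an illustration, assuming Penn-style tags as in the FreeLing output on this page (<code>MD</code> modal, <code>PRP</code> personal pronoun, <code>VB</code> infinitive, <code>NN</code> noun) and a made-up input format of (form, candidate tags) pairs; it is not how any particular tagger implements the rule:

```python
def disambiguate(tokens):
    """Pick one tag per token.

    tokens: list of (form, [candidate tags]) pairs.
    Rule: choose the infinitive (VB) if the previous token can be a modal
    and the next token can be a personal pronoun; otherwise keep the
    first candidate as the default reading.
    """
    result = []
    for i, (form, tags) in enumerate(tokens):
        chosen = tags[0]
        if len(tags) > 1 and "VB" in tags:
            prev_is_modal = i > 0 and "MD" in tokens[i - 1][1]
            next_is_pronoun = i + 1 < len(tokens) and "PRP" in tokens[i + 1][1]
            if prev_is_modal and next_is_pronoun:
                chosen = "VB"
        result.append((form, chosen))
    return result


if __name__ == "__main__":
    sentence = [
        ("Failure", ["NN"]),
        ("will", ["MD"]),
        ("dog", ["NN", "VB"]),
        ("us", ["PRP"]),
    ]
    print(disambiguate(sentence))
```

In "The dog barks" neither context condition holds, so "dog" keeps its default noun reading.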
|||
::Thanks. Seems to be used quite seldom in real life. Shows, that dog can be a verb, if used in a verb environment. He dogs a cat, for example. Or the dogs dogged a cat. Or the wolves dogged the rabbit. Or my big mistake dogged me 10 years long. At least the blogspot indicates this. [[User:Muki987|Muki987]] 11:06, 25 April 2009 (UTC) |
|||
Yes, but outside certain fixed semantic environments, it sounds odd. |
|||
<pre> |
|||
*He dogs a cat |
|||
*The dogs dogged a cat |
|||
The wolves dogged the rabbit. |
|||
*My big mistake dogged me for 10 years long. |
|||
My mistake dogged me for ten years. |
|||
</pre> |
|||
The ones marked with '*' are not wrong syntactically, but I would say they sound ''very'' strange. - [[User:Francis Tyers|Francis Tyers]] 11:14, 25 April 2009 (UTC) |
|||
::What is wrong with |
|||
<pre> |
|||
The wolves dogged the rabbit. |
|||
My mistake dogged me for ten years. |
|||
</pre> |
|||
? |
|||
:: How to write them syntactically correctly? [[User:Muki987|Muki987]] 12:21, 25 April 2009 (UTC) |
|||
:::No those are correct, I mean they sound good. - [[User:Francis Tyers|Francis Tyers]] 12:33, 25 April 2009 (UTC) |
|||
::::Thanks, [[User:Muki987|Muki987]] 12:36, 25 April 2009 (UTC) |
|||
==Fixes== |
|||
It looks great, I noticed a couple of issues, although I haven't tested it, try making [http://wiki.apertium.org/w/index.php?title=Talk%3AApertium_New_Language_Pair_HOWTO&diff=12266&oldid=12265 these edits] and seeing how the result turns out. - [[User:Francis Tyers|Francis Tyers]] 05:36, 30 April 2009 (UTC) |
|||
== Hi [[User:Jimregan|regan,]]== |
|||
The expressions <b>semi-crazed rants</b> and <b>you are exactly the wrong sort of person to work with translation in any form</b> are your expressions, you try to apply to others. [[User:Jimregan|regan]], you think, something entitles you to use that kind of language? |
|||
[[User:Jimregan|regan]], your <b>semi-crazed rants</b> in all subjects are rather <b>primitive</b>, and not amusing at all. |
|||
What do you think, who you are, that you would like to decide, <b>who may say what</b>? You believe, you are a <b>soviet commissar?</b> Or a <b>kapo of the barrack</b>? Have you let <b>check your mental state</b>? If you behave like that, <b>you, [[User:Jimregan|regan]] are exactly the wrong sort of person to work with translation in any form. </b> |
|||
<i>Of course, present language killers and people idiotizers on the TV screens and in radios, newspapers and magazines written by idiots, foreign advertisers and similar state-supported criminals try to push foreign words, which is not good for any language, and makes tools, like wordnet necessary. </i> |
|||
19:00, 17 May 2009 (UTC) |
|||
==Dep analysis== |
|||
I try to dependency analyse this sentence, since it is complicated enough: |
|||
*1. ''I think that if you have an agenda that you want to push of this kind, then you are exactly the wrong sort of person to work with translation in any form.'' |
|||
While looking at it, I think it is grammatically incorrect.
|||
*2. ''I think that if you have an agenda that you want to push of this kind, then...'' |
|||
Should not that be: |
|||
*3. ''I think that if you have an agenda of this kind, that you want to push, then...'' ? |
|||
Are both 2 and 3 correct, or is 3 wrong? |
|||
In other words, is "push of" a valid structure?
|||
:::Both variants sound ok to my ears, although I would say: |
|||
::::"I think that if you have an agenda of this kind that you want to push, ..." |
|||
:::(without extra comma) - [[User:Francis Tyers|Francis Tyers]] 16:10, 20 May 2009 (UTC) |
|||
If the sentence is no good in the first form, is it still understandable? |
|||
''of this kind'' is an attribute of the agenda. Let's assume the attribute is ''blue''.
|||
*4. ''I think that if you have a blue agenda, that you want to push, then...'' |
|||
*5. ''I think that if you have an agenda that you want to push blue, then...'' |
|||
*6. ''I think that if you have an agenda that you want to push, and that is blue, then...'' |
|||
I changed ''of this kind'' to ''blue''. Is 5 in that form not completely bad? Is 6 ok? Does "of this kind" implicitly say: ''and that is of this kind''?
|||
Dependency analysis shows very different results, and if version 2 is correct, I will report that to FreeLing; otherwise not.
|||
<pre> |
|||
If you have an agenda of this kind, that you want to push, then you are good. |
|||
sub-adv/top/(If if IN -) [ |
|||
sv/modnorule/(have have VBP -) [ |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
sn-chunk/dobj/(agenda agenda NN -) [ |
|||
DT/det/(an a DT -) |
|||
] |
|||
] |
|||
sp-chunk/modnorule/(of of IN -) [ |
|||
sn-chunk/dobj/(kind kind NN -) [ |
|||
DT/det/(this this DT -) |
|||
] |
|||
] |
|||
sf-brk/modnorule/(, , Fc -) |
|||
rel-cl/modnorule/(that that WDT -) [ |
|||
rel/ccomp/(want want VBP -) [ |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
sp-chunk/ncmod/(to to TO -) [ |
|||
sv/cmod/(push push VB -) |
|||
] |
|||
] |
|||
] |
|||
sf-brk/modnorule/(, , Fc -) |
|||
claus/modnorule/(are be VBP -) [ <----------------------------- are |
|||
adv/cmod/(then then RB -) |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
attrib/ncmod/(good good JJ -) |
|||
st-brk/ta/(. . Fp -) |
|||
] |
|||
] |
|||
If you have an agenda that you want to push of this kind, then you are good. |
|||
sub-adv/top/(If if IN -) [ |
|||
sv/modnorule/(have have VBP -) [ |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
sn-chunk/dobj/(agenda agenda NN -) [ |
|||
DT/det/(an a DT -) |
|||
] |
|||
] |
|||
claus/modnorule/(are be VBP -) [ <---------------------------- are |
|||
sub-cl/modnomatch/(that that IN -) [ |
|||
sv/modnorule/(want want VBP -) [ |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
sp-chunk/ncmod/(to to TO -) [ |
|||
sv/cmod/(push push VB -) [ |
|||
sp-chunk/ncmod/(of of IN -) [ |
|||
sn-chunk/dobj/(kind kind NN -) [ |
|||
DT/det/(this this DT -) |
|||
] |
|||
] |
|||
] |
|||
] |
|||
] |
|||
sf-brk/modnorule/(, , Fc -) |
|||
] |
|||
adv/cmod/(then then RB -) |
|||
sn-chunk/ncsubj/(you you PRP -) |
|||
attrib/ncmod/(good good JJ -) |
|||
st-brk/ta/(. . Fp -) |
|||
] |
|||
] |
|||
</pre> |
|||
In my opinion the second case is misinterpreted. What do you think?[[User:Muki987|Muki987]] 19:05, 20 May 2009 (UTC) |
|||
:It is wrong, it should attach to the direct object. But on the other hand, the wording is strange, and although I'd say it was grammatically ok, it does ''sound'' weird. If this were a linguistic example I'd label it with <code>?</code> - [[User:Francis Tyers|Francis Tyers]] 20:32, 20 May 2009 (UTC)
|||
Thanks, then I had better not report it now; there are much more important things in dependency analysis than weird expressions. Maybe later. [[User:Muki987|Muki987]] 21:16, 20 May 2009 (UTC)
|||
==Wiki== |
|||
Not sure what you mean. I've changed the image you added to the [[Documentation of Matxin]] page so it fits better. - [[User:Francis Tyers|Francis Tyers]] 09:04, 26 May 2009 (UTC) |
|||
: Go to en.wikipedia.org, select a random article, and select the edit tab. You will see that at the upper left of the edit window there are numerous images, for example for <nowiki>[[word]]</nowiki>, to sign an article, to insert an image, to select bold or italics. If it is still not clear, I can insert a screenshot. [[User:Muki987|Muki987]] 09:20, 26 May 2009 (UTC)
|||
::Aha, ok, let me see if I can remember where to change that :) - [[User:Francis Tyers|Francis Tyers]] 09:52, 26 May 2009 (UTC) |
|||
:::Fixed after a bit of faffing. - [[User:Francis Tyers|Francis Tyers]] 10:16, 26 May 2009 (UTC) |
|||
: Thanks a million, now it's fun to edit. :-) [[User:Muki987|Muki987]] 11:33, 26 May 2009 (UTC) |
|||
==Documentation of Matxin== |
|||
Hi, I would prefer that the images not be so large, especially when they are rather garish. If you like I will remake the images, but I can't do it until e.g. 22nd June. Could you please put in the original Spanish for the Generation section? - [[User:Francis Tyers|Francis Tyers]] 08:50, 27 May 2009 (UTC) |
|||
: You mean with more decent colors? That's OK for me, but a small picture is unusable. I work with it.
|||
If you insist on small ones, let me know and I will set up a private Matxin page for myself.
|||
Here is the Spanish text:
|||
3.3. Formato tras generación |
|||
Los cambios más importantes son la reordenación por medio del valor recalculado para el |
|||
atributo ord, y la generación morfológica de ciertos nodos (edun ->ditudalako, patata -> patatak). |
|||
El resultado es la frase “At entatu hirukoitz batek Bagad astintzen du” |
|||
[[User:Muki987|Muki987]] 09:38, 27 May 2009 (UTC) |
|||
:Ok, translated it. Regarding the images, I've made them a bit larger and giving a re-working I think the text could be clearly visible at this size. - [[User:Francis Tyers|Francis Tyers]] 11:03, 27 May 2009 (UTC) |
|||
==of==
|||
<pre> |
|||
Is usage of of like: |
|||
"This is the house of Peter and of Martha" grammatically ok? |
|||
Ez Peter háza és Martha. |
|||
Das ist das Haus von Peter und von Martha, |
|||
or |
|||
only the form |
|||
"This is the house of Peter and Martha" is correct? |
|||
Ez Peter és Martha háza. |
|||
Das ist das Haus von Peter und Martha. |
|||
</pre> |
|||
:Both are ok, I would probably say in speech "This is Peter and Martha's house", but consider e.g. "The Department of Health and Social Security". I would say single use of 'of' is more typical, but I wouldn't mark the first as ungrammatical. PS. I made some changes to your test sentences, I hope they are ok. - [[User:Francis Tyers|Francis Tyers]] 10:10, 2 June 2009 (UTC) |
|||
Thanks. |
|||
Is "This is Peter's and Martha's house" also correct, just unusual?
|||
Is "This is Henry, good old Peter, little Otto and Martha's house" the way it would be used?
|||
(not "Henry's, good old Peter's, little Otto's and Martha's house"?)
|||
:Both are fine, probably the former is more frequent. - [[User:Francis Tyers|Francis Tyers]] 10:45, 2 June 2009 (UTC) |
|||
;my |
|||
I saw on the web "My father and son went together to Spain"; however, someone told me this is very unusual. Maybe it was just written by a non-English person? The usual form is "My father and my son ..." (on Google).
|||
:Both are fine, although probably we'd duplicate the possessive here for disambiguation. - [[User:Francis Tyers|Francis Tyers]] 10:45, 2 June 2009 (UTC) |
|||
;to |
|||
Is "He went to England and to Spain this summer" as good as "He went to England and Spain this summer"? |
|||
:In my opinion, duplicating the preposition seems to add emphasis, "He went to England ''and to Spain'' this summer." - [[User:Francis Tyers|Francis Tyers]] 10:45, 2 June 2009 (UTC)
|||
Thanks. |
|||
==Otto== |
|||
Otto, who is an engineer and works for BMW, told, that he likes football |
|||
<pre> |
|||
en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "Otto, who is an engineer and works for BMW, told, that \ |
|||
he likes football" | ./analyzer -f en.cfg |
|||
sv/top/(told tell VBD -) [ |
|||
n-chunk/ncsubj/(Otto otto NNP -) [ |
|||
sf-brk/modnomatch/(, , Fc -) |
|||
rel-cl/cmod/(who who WP -) [ |
|||
rel/ccomp/(is be VBZ -) [ |
|||
sn-chunk/dobj/(engineer engineer NN -) [ |
|||
DT/det/(an a DT -) |
|||
] |
|||
sn-coor/modnomatch/(and and CC -) [ |
|||
n-chunk/modnomatch/(works work NNS -) |
|||
] |
|||
sp-chunk/ncmod/(for for IN -) [ |
|||
n-chunk/dobj/(BMW bmw NNP -) |
|||
] |
|||
] |
|||
] |
|||
sf-brk/modnomatch/(, , Fc -) |
|||
] |
|||
sub-cl/modnomatch/(that that IN -) [ |
|||
sf-brk/modnorule/(, , Fc -) |
|||
sv/modnorule/(likes like VBZ -) [ |
|||
sn-chunk/ncsubj/(he he PRP -) |
|||
n-chunk/dobj/(football football NN -) |
|||
] |
|||
] |
|||
] |
|||
</pre> |
|||
Otto, who is an engineer and works for BMW told, that he likes football |
|||
<pre> |
|||
en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "Otto, who is an engineer and works for BMW told, \ |
|||
that he likes football" | ./analyzer -f en.cfg |
|||
n-chunk/top/(Otto otto NNP -) [ |
|||
sf-brk/modnomatch/(, , Fc -) |
|||
rel-cl/cmod/(who who WP -) [ |
|||
rel/ccomp/(is be VBZ -) [ |
|||
sn-chunk/dobj/(engineer engineer NN -) [ |
|||
DT/det/(an a DT -) |
|||
] |
|||
sn-coor/modnomatch/(and and CC -) [ |
|||
n-chunk/modnomatch/(works work NNS -) |
|||
] |
|||
] |
|||
] |
|||
adv/modnomatch/(for for IN -) [ |
|||
vb-chunk/aux/(told tell VBD -) [ |
|||
n-chunk/ncsubj/(BMW bmw NNP -) |
|||
] |
|||
sf-brk/modnomatch/(, , Fc -) |
|||
] |
|||
sub-cl/modnomatch/(that that IN -) [ |
|||
sv/modnorule/(likes like VBZ -) [ |
|||
sn-chunk/ncsubj/(he he PRP -) |
|||
n-chunk/dobj/(football football NN -) |
|||
] |
|||
] |
|||
] |
|||
</pre> |
|||
What do you think, is the comma absolutely necessary after BMW?
|||
:I would put: |
|||
::Otto, who is an engineer and works for BMW, told us that he likes football. |
|||
:or |
|||
::Otto, who is an engineer and works for BMW, said that he likes football. |
|||
:There is no comma necessary after 'that'. - [[User:Francis Tyers|Francis Tyers]] 13:06, 8 June 2009 (UTC) |
|||
==dep== |
|||
Please check my issues on the FreeLing linguistic part. Today I tried 5 very basic, simple sentences (I cannot imagine simpler ones), and 4 of them show serious errors. There are such fundamental problems in FreeLing's dependency analysis that I have more and more the feeling that I cannot use it for English. It seems very useful at first sight, really impressive; at second sight, much less so.
|||
It is much worse for English than for Spanish, and I have the feeling that English has no priority for the authors. Its configuration is anything but simple, and not even the configuration file names are documented.
|||
What is your impression? |
|||
:Could you show me the sentences that you tried ? The English version is much less mature than the Spanish version, and understandably it is less of a priority as it is being developed in a Catalan university. The file names are I believe documented in the PDF of the documentation. I've been planning to write a 'HOWTO' for Matxin, but need to get some things in order here first. It should be done by mid July or so. - [[User:Francis Tyers|Francis Tyers]] 15:21, 8 June 2009 (UTC) |
|||
Here are the sentences. The expressions are replaced by a placeholder word (expr plus a number), as below, and this is what is presented to FreeLing.
|||
*Thank you, said Henry, and Otto also said: thank you. |
|||
*I asked him: How do you do? He answered: Fine, thank you. |
|||
::"Thank you" and "How do you do?" are interjections. They would be added as multiwords (if the system were aimed at analysing speech).
|||
*It was raining cats and dogs in all August that year. |
|||
::'''Good''' It was raining cats and dogs for all of August that year. |
|||
::'''Bad''' In August that year it was raining cats and dogs |
|||
::'''Bad''' In August that year it was strong raining |
|||
::'''Bad''' In August it was strong raining |
|||
:::"strong raining" does not make sense, it should be "In August it was raining heavily". Incidentally FreeLing seems to analyse this correctly. |
|||
:::Note: I've yet to hear "raining cats and dogs" (which would be a multiword) used outside of the present, e.g. "It's raining cats and dogs". Nor have I heard it used in past narrative, nor in the news. If you [http://www.google.es/search?hl=en&q=%22raining+cats+and+dogs%22+site%3Anews.bbc.co.uk&btnG=Google+Search&aq=f&oq= search] on the BBC site, the top situations it comes up in are 1) talking about expressions, 2) jokes, 3) clichés. |
|||
*The young pair will go Dutch that evening. |
|||
:::Note: "go Dutch" is a multiword, it is not semantically compositional. Although in any case the parse seems correct. |
|||
*He was in the black for long time, he was the blue-eyed boy of the manager. |
|||
::"The blue-eyed boy of the manager was in the black for a long time." This kind of subject repetition is unusual, unless you're talking about two distinct people with "he", in which case it sounds strange anyway. "Both he and the blue-eyed boy of the manager were in the black for a long time". |
|||
*I'll go to England next year. |
|||
::This one is correct, the sentence sounds perfectly normal and the error is in FreeLing |
|||
*expr0 , said Henry , and Otto also said: expr1 . . |
|||
*I asked him: expr2 ? He answered: Fine , expr3 . . |
|||
*It was expr4 in all August that year . . |
|||
*The young pair will go Dutch that evening . . |
|||
*He was expr6 for long time , he was the expr5 of the manager . . |
|||
*I shall go to England next year . |
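The placeholder substitution used for the sentences above can be sketched as follows. This is a minimal Python illustration, assuming a hand-made list of expressions; the function names <code>mask_expressions</code> and <code>unmask</code> are hypothetical, and unlike the listing above the sketch does not tokenise punctuation:

```python
import re


def mask_expressions(sentence, expressions, start=0):
    """Replace each occurrence of a known expression by exprN.

    Occurrences are numbered left to right, so the same expression
    gets a new number every time it appears.
    Returns the masked sentence and a placeholder -> original mapping.
    """
    # Longest expressions first, so that a longer match wins over a shorter one.
    alternatives = "|".join(
        re.escape(e) for e in sorted(expressions, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    mapping = {}
    counter = start

    def replace(match):
        nonlocal counter
        key = f"expr{counter}"
        mapping[key] = match.group(0)
        counter += 1
        return key

    return pattern.sub(replace, sentence), mapping


def unmask(text, mapping):
    """Put the original expressions back after analysis."""
    return re.sub(r"expr\d+", lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    masked, mapping = mask_expressions(
        "It was raining cats and dogs in all August that year.",
        ["raining cats and dogs", "thank you"],
    )
    print(masked)
    print(unmask(masked, mapping))
```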
|||
Is "in all August" OK or should it be "in the whole August"?
|||
It especially frustrated me that the last, very simple sentence shows a clear fault in the FreeLing dependency analysis. No matter how we express the motion, FreeLing cannot handle "I move (no matter how) to England next year" properly.
|||
::This last one is quite strange that it fails, but the other sentences are not exactly "simple". Here are five sentences that I've taken from the BBC today (more or less at random): |
|||
* '''Good'''. Environment minister Jane Kennedy said she could not support him as leader. '''Bad?''' |
|||
* '''Bad'''. The investigation has been focusing on whether the plane's speed sensors stopped working properly just before it crashed in turbulent weather. |
|||
* '''Good'''. Authorities are making deep cuts to tackle the budget deficit. |
|||
* '''Bad'''. "That happened one month before the ballot opened, so it had quite a rallying effect," he said. '''Good?''' |
|||
* '''Bad'''. The defence ministry said it was closing Gabon's air, land and sea borders. |
|||
:::These are not simple, but they are everyday sentences, and FreeLing does fairly badly (although the co-ordination problem in the last one is a known bug). Now for some simple sentences:
|||
* '''Good'''. The boy kicks the ball to the girl. |
|||
* '''Bad'''. The boy kicks the ball to the girl with the telescope. '''Good?''' |
|||
:::The prepositional phrase 'with the telescope' is not attached in the right place. Although, this is genuinely ambiguous, is he kicking the ball with the telescope, or is he kicking the ball to the girl with the telescope? The ambiguity is resolved semantically, either statistically kick (ball) with (telescope) is much less frequent than to (girl) with (telescope). Or logically... "telescope is a scientific instrument and not used for kicking balls". |
|||
* '''Good'''. The cat runs. |
|||
* '''Good'''. He follows the same route every day. |
|||
* <nowiki>*</nowiki>He followed the same route next day. |
|||
* '''Good'''. I go to sleep every night. |
|||
* <nowiki>*</nowiki>I go sleep in my bed next day. |
|||
* <nowiki>*</nowiki>I went to sleep in my bed next day.
|||
I printed out the PDF version; the file names are missing. Probably there was one config file at first, and as it grew it became more and more files.
|||
Yes, the quality of the English version is significantly worse than that of the Spanish one. I tried to get the Spanish files working for English; maybe it is hopeless, who knows... I checked: the Spanish files use special commands tied to Spanish words (para, etc.) and the English ones to English words (of, and, etc.), so porting is probably not simple at all. Probably English files are missing, because the Spanish directory es/dep is full of additional word files. I gave it up :-(
|||
::It would require quite a lot of work; if it were easy they would probably have done it by now.
|||
Do you know of any working dependency analyzer on the market (commercial is also OK) for any of English, German or Hungarian?
|||
::I don't know anything about commercial software, but if you search in Google, there are various dependency analysers for English available, [http://w3.msi.vxu.se/~jha/maltparser/ this one] for example is a statistical dependency parser and can be integrated into FreeLing. - [[User:Francis Tyers|Francis Tyers]] 22:05, 8 June 2009 (UTC)
|||
:::One piece of advice... when making test sentences, take them from webpages (e.g. the BBC); this way we will save a lot of time otherwise spent going through grammatically incorrect sentences. If you would like a sentence that exhibits a feature of English, ask me, but I may not respond to future discussions about sentences such as "strong raining", or may just mark them as <nowiki>*</nowiki>. If you really must make up your own sentences, at least Google the parts of them that you are not sure about, for example "strong raining" (412 hits) versus "raining heavily" (131,000 hits). - [[User:Francis Tyers|Francis Tyers]] 08:11, 9 June 2009 (UTC)
|||
==oak&malt== |
|||
Why should I have taken a raincoat and an umbrella, when my aunt who lives in Georgia said that the weather would be nice. |
|||
<pre> |
|||
1 Why why WRB WRB B-ADVP 0 ROOT _ _ |
|||
2 should should MD MD O 0 ROOT _ _ |
|||
3 I I PRP PRP B-NP 2 SBJ _ _ |
|||
4 have have VB VB B-VP 2 VC _ _ |
|||
5 taken take VBN VBN B-PP 4 VC _ _ |
|||
6 a a DT DT B-NP 7 NMOD _ _ |
|||
7 raincoat raincoat NN NN I-NP 5 OBJ _ _ |
|||
8 and and CC CC O 7 CC _ _ |
|||
9 an an DT DT B-NP 10 NMOD _ _ |
|||
10 umbrella umbrella NN NN I-NP 7 COORD _ _ |
|||
11 , , , , O 5 P _ _ |
|||
12 when when WRB WRB B-ADVP 19 ADV _ _ |
|||
13 my my PRP$ PRP$ B-NP 14 NMOD _ _ |
|||
14 aunt aunt NN NN I-NP 19 SBJ _ _ |
|||
15 who who WP WP B-NP 16 SBJ _ _ |
|||
16 lives live VBZ VBZ B-VP 14 NMOD _ _ |
|||
17 in in IN IN B-PP 16 ADV _ _ |
|||
18 Georgia Georgia NNP NNP B-NP 17 PMOD _ _ |
|||
19 said say VBD VBD B-VP 5 ADV _ _ |
|||
20 that that IN IN B-SBAR 23 VMOD _ _ |
|||
21 the the DT DT B-NP 22 NMOD _ _ |
|||
22 weather weather NN NN I-NP 23 SBJ _ _ |
|||
23 would would MD MD B-VP 19 OBJ _ _ |
|||
24 be be VB VB I-VP 23 VC _ _ |
|||
25 nice nice JJ JJ B-ADJP 24 PRD _ _ |
|||
26 . . . . O 2 P _ _ |
|||
</pre> |
|||
Never before had ski racing, a sport dominated by monosyllabic mountain men, seen the likes of Alberto Tomba, the flamboyant Bolognese flatlander who at 21 captured two gold medals at the Calgary olympics. |
|||
<pre> |
|||
1 Never never RB RB B-ADVP 37 DEP _ _ |
|||
2 before before IN IN B-PP 37 ADV _ _ |
|||
3 had have VBN VBN B-NP 4 NMOD _ _ |
|||
4 ski ski NN NN I-NP 5 SBJ _ _ |
|||
5 racing race VBG VBG B-VP 2 PMOD _ _ |
|||
6 , , , , O 37 P _ _ |
|||
7 a a DT DT B-NP 8 NMOD _ _ |
|||
8 sport sport NN NN I-NP 37 DEP _ _ |
|||
9 dominated dominate VBN VBN B-VP 8 NMOD _ _ |
|||
10 by by IN IN B-PP 9 LGS _ _ |
|||
11 monosyllabic monosyllabic JJ JJ B-NP 13 NMOD _ _ |
|||
12 mountain mountain NN NN I-NP 13 NMOD _ _ |
|||
13 men man NNS NNS I-NP 10 PMOD _ _ |
|||
14 , , , , O 8 P _ _ |
|||
15 seen see VBN VBN B-VP 8 NMOD _ _ |
|||
16 the the DT DT B-NP 17 NMOD _ _ |
|||
17 likes like NNS NNS I-NP 15 OBJ _ _ |
|||
18 of of IN IN B-PP 17 NMOD _ _ |
|||
19 Alberto Alberto NNP NNP B-NP 20 NMOD _ _ |
|||
20 Tomba Tomba NNP NNP I-NP 18 PMOD _ _ |
|||
21 , , , , O 20 P _ _ |
|||
22 the the DT DT B-NP 25 NMOD _ _ |
|||
23 flamboyant flamboyant JJ JJ I-NP 25 NMOD _ _ |
|||
24 Bolognese Bolognese NNP NNP I-NP 25 NMOD _ _ |
|||
25 flatlander flatlander NNP NNP I-NP 20 NMOD _ _ |
|||
26 who who WP WP B-NP 0 ROOT _ _ |
|||
27 at at IN IN B-PP 0 ROOT _ _ |
|||
28 21 21 CD CD B-NP 32 NMOD _ _ |
|||
29 captured capture VBN VBN I-NP 32 NMOD _ _ |
|||
30 two two CD CD I-NP 32 NMOD _ _ |
|||
31 gold gold NN NN I-NP 32 NMOD _ _ |
|||
32 medals medal NNS NNS I-NP 27 PMOD _ _ |
|||
33 at at IN IN B-PP 32 ADV _ _ |
|||
34 the the DT DT B-NP 36 NMOD _ _ |
|||
35 Calgary Calgary NNP NNP I-NP 36 NMOD _ _ |
|||
36 olympics olympics NN NN I-NP 33 PMOD _ _ |
|||
37 . . . . O 0 ROOT _ _ |
|||
</pre> |
|||
Otto and Martha go to Italy, Spain and France. |
|||
<pre> |
|||
1 Otto Otto NNP NNP B-NP 4 SBJ _ _ |
|||
2 and and CC CC I-NP 1 CC _ _ |
|||
3 Martha Martha NNP NNP I-NP 1 COORD _ _ |
|||
4 go go VB VB B-VP 0 ROOT _ _ |
|||
5 to to TO TO B-PP 4 ADV _ _ |
|||
6 Italy Italy NNP NNP B-NP 5 PMOD _ _ |
|||
7 , , , , O 6 P _ _ |
|||
8 Spain Spain NNP NNP B-NP 6 COORD _ _ |
|||
9 and and CC CC O 6 CC _ _ |
|||
10 France France NNP NNP B-NP 6 COORD _ _ |
|||
11 . . . . O 4 P _ _ |
|||
</pre> |
|||
Otto, Peter and Martha go to Italy, Spain and France. |
|||
<pre> |
|||
1 Otto Otto NNP NNP B-NP 6 SBJ _ _ |
|||
2 , , , , O 1 P _ _ |
|||
3 Peter Peter NNP NNP B-NP 1 COORD _ _ |
|||
4 and and CC CC O 1 CC _ _ |
|||
5 Martha Martha NNP NNP B-NP 1 COORD _ _ |
|||
6 go go VB VB B-VP 0 ROOT _ _ |
|||
7 to to TO TO B-PP 6 ADV _ _ |
|||
8 Italy Italy NNP NNP B-NP 7 PMOD _ _ |
|||
9 , , , , O 8 P _ _ |
|||
10 Spain Spain NNP NNP B-NP 8 COORD _ _ |
|||
11 and and CC CC O 8 CC _ _ |
|||
12 France France NNP NNP B-NP 8 COORD _ _ |
|||
13 . . . . O 6 P _ _ |
|||
</pre> |
|||
Dear Otto, good old Peter and friendly Martha go to warm Italy, warmer Spain and cool France. |
|||
<pre> |
|||
1 Dear dear RB RB B-ADVP 10 ADV _ _ |
|||
2 Otto Otto NNP NNP B-NP 10 ADV _ _ |
|||
3 , , , , O 10 P _ _ |
|||
4 good good JJ JJ B-NP 10 SBJ _ _ |
|||
5 old old JJ JJ I-NP 10 VMOD _ _ |
|||
6 Peter Peter NNP NNP I-NP 10 SBJ _ _ |
|||
7 and and CC CC I-NP 6 CC _ _ |
|||
8 friendly friendly JJ JJ I-NP 9 NMOD _ _ |
|||
9 Martha Martha NNP NNP I-NP 6 COORD _ _ |
|||
10 go go VB VB B-VP 0 ROOT _ _ |
|||
11 to to TO TO I-VP 12 VMOD _ _ |
|||
12 warm warm VB VB I-VP 10 OBJ _ _ |
|||
13 Italy Italy NNP NNP B-NP 12 OBJ _ _ |
|||
14 , , , , O 12 P _ _ |
|||
15 warmer warmer JJR JJR B-NP 20 DEP _ _ |
|||
16 Spain Spain NNP NNP I-NP 20 DEP _ _ |
|||
17 and and CC CC O 16 CC _ _ |
|||
18 cool cool JJ JJ B-NP 19 NMOD _ _ |
|||
19 France France NNP NNP I-NP 16 COORD _ _ |
|||
20 . . . . O 10 P _ _ |
|||
</pre> |
|||
John and Martha's apple and pear were sweet. |
|||
<pre> |
|||
1 John John NNP NNP B-NP 4 NMOD _ _ |
|||
2 and and CC CC I-NP 1 CC _ _ |
|||
3 Martha's Martha's NNP NNP I-NP 1 COORD _ _ |
|||
4 apple apple NN NN I-NP 7 SBJ _ _ |
|||
5 and and CC CC I-NP 4 CC _ _ |
|||
6 pear pear NN NN I-NP 4 COORD _ _ |
|||
7 were were VBD VBD B-VP 0 ROOT _ _ |
|||
8 sweet sweet JJ JJ B-ADJP 7 PRD _ _ |
|||
9 . . . . O 7 P _ _ |
|||
</pre> |
|||
In the final days of the war, Hitler and his new wife, Eva Braun, committed suicide in his underground bunker in Berlin, as the city was overrun by the Red Army of the Soviet Union. |
|||
<pre> |
|||
1 In in IN IN B-PP 18 ADV _ _ |
|||
2 the the DT DT B-NP 4 NMOD _ _ |
|||
3 final final JJ JJ I-NP 4 NMOD _ _ |
|||
4 days day NNS NNS I-NP 1 PMOD _ _ |
|||
5 of of IN IN B-PP 4 NMOD _ _ |
|||
6 the the DT DT B-NP 7 NMOD _ _ |
|||
7 war war NN NN I-NP 5 PMOD _ _ |
|||
8 , , , , O 7 P _ _ |
|||
9 Hitler Hitler NNP NNP B-NP 7 COORD _ _ |
|||
10 and and CC CC O 7 CC _ _ |
|||
11 his his PRP$ PRP$ B-NP 13 NMOD _ _ |
|||
12 new new JJ JJ I-NP 13 NMOD _ _ |
|||
13 wife wife NN NN I-NP 7 COORD _ _ |
|||
14 , , , , O 13 P _ _ |
|||
15 Eva Eva NNP NNP B-NP 16 NMOD _ _ |
|||
16 Braun Braun NNP NNP I-NP 13 NMOD _ _ |
|||
17 , , , , O 13 P _ _ |
|||
18 committ committ VBD VBD B-VP 0 ROOT _ _ |
|||
19 suicide suicide NN NN I-VP 18 OBJ _ _ |
|||
20 in in IN IN B-PP 18 ADV _ _ |
|||
21 his his PRP$ PRP$ B-NP 23 NMOD _ _ |
|||
22 undergr undergr JJ JJ I-NP 23 NMOD _ _ |
|||
23 bunker bunker NN NN I-NP 20 PMOD _ _ |
|||
24 in in IN IN B-PP 23 ADV _ _ |
|||
25 Berlin Berlin NNP NNP B-NP 24 PMOD _ _ |
|||
26 , , , , O 18 P _ _ |
|||
27 as as IN IN B-SBAR 30 VMOD _ _ |
|||
28 the the DT DT B-NP 29 NMOD _ _ |
|||
29 city city NN NN I-NP 30 SBJ _ _ |
|||
30 was be VBD VBD B-VP 18 ADV _ _ |
|||
31 overrun overrun VBN VBN I-VP 30 VC _ _ |
|||
32 by by IN IN B-PP 31 LGS _ _ |
|||
33 the the DT DT B-NP 35 NMOD _ _ |
|||
34 Red Red NNP NNP I-NP 35 NMOD _ _ |
|||
35 Army Army NNP NNP I-NP 32 PMOD _ _ |
|||
36 of of IN IN B-PP 35 NMOD _ _ |
|||
37 the the DT DT B-NP 39 NMOD _ _ |
|||
38 Soviet Soviet NNP NNP I-NP 39 NMOD _ _ |
|||
39 Union Union NNP NNP I-NP 36 PMOD _ _ |
|||
40 . . . . O 18 P _ _ |
|||
</pre> |
|||
Well, as far as I can see, MaltParser is far behind FreeLing in analysis depth and quality. What do you think? |
|||
::I find it very hard to read the output. Is there a "graphical" output mode ? - [[User:Francis Tyers|Francis Tyers]] 13:41, 9 June 2009 (UTC) |
|||
I could not find any. Documentation focuses on experimenting with different algorithms rather than product usage. |
|||
The original text of the documentation: |
|||
Currently, MaltParser only supports tab-separated data files, which means that a sentence in a data file in the CoNLL data format could look like this (and then it shows the file format above). |
|||
::I searched for CoNLL visualisation in Google and came up with [http://w3.msi.vxu.se/~nivre/research/MaltEval.html this], perhaps it might work ? - [[User:Francis Tyers|Francis Tyers]] 13:50, 9 June 2009 (UTC) |
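
Until a proper visualiser turns up, the table output can be made easier to read with a few lines of script. The following is only a minimal sketch in Python, assuming ten whitespace-separated columns with the token number first, the word form second, the parent's number seventh and the relation name eighth (as in the tables above). The function names <code>parse_rows</code> and <code>render_tree</code> are made up for this illustration and are not part of MaltParser.

```python
from collections import defaultdict

# Parser output for "Otto and Martha go to Italy, Spain and France."
SAMPLE = """\
1 Otto Otto NNP NNP B-NP 4 SBJ _ _
2 and and CC CC I-NP 1 CC _ _
3 Martha Martha NNP NNP I-NP 1 COORD _ _
4 go go VB VB B-VP 0 ROOT _ _
5 to to TO TO B-PP 4 ADV _ _
6 Italy Italy NNP NNP B-NP 5 PMOD _ _
7 , , , , O 6 P _ _
8 Spain Spain NNP NNP B-NP 6 COORD _ _
9 and and CC CC O 6 CC _ _
10 France France NNP NNP B-NP 6 COORD _ _
11 . . . . O 4 P _ _
"""


def parse_rows(text):
    """Return (id, form, head, relation) for every well-formed row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), fields[1], int(fields[6]), fields[7]))
    return rows


def render_tree(rows):
    """Return the tree as indented lines, children in sentence order."""
    children = defaultdict(list)
    info = {}
    for tok_id, form, head, rel in rows:
        children[head].append(tok_id)
        info[tok_id] = (form, rel)

    lines = []

    def walk(node, level):
        form, rel = info[node]
        lines.append("  " * level + f"{form} ({rel})")
        for child in children[node]:
            walk(child, level + 1)

    # Every token whose parent is 0 starts a tree of its own.
    for root in children[0]:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(parse_rows(SAMPLE))))
```

Each level of indentation is one step further from the root, so co-ordinated words such as Italy, Spain and France show up as a group under their head. The sketch does not guard against cycles in the parent column.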
|||
==Dep tree depth calculation== |
|||
In the table, the seventh field (18, 4, 4, 1, 4, 7, ...) always gives the line number of the parent node. |
|||
<pre> |
|||
# if table looks like: |
|||
#1 2 3 4 5 6 7 8 9 10 |
|||
#... |
|||
#1 In in IN IN B-PP 18 ADV _ _ |
|||
#2 the the DT DT B-NP 4 NMOD _ _ |
|||
#3 final final JJ JJ I-NP 4 NMOD _ _ |
|||
#4 days day NNS NNS I-NP 1 PMOD _ _ |
|||
#5 of of IN IN B-PP 4 NMOD _ _ |
|||
#6 the the DT DT B-NP 7 NMOD _ _ |
|||
#7 war war NN NN I-NP 5 PMOD _ _ |
|||
#8 , , , , O 7 P _ _ |
|||
#9 Hitler Hitler NNP NNP B-NP 7 COORD _ _ |
|||
#10 and and CC CC O 7 CC _ _ |
|||
#11 his his PRP$ PRP$ B-NP 13 NMOD _ _ |
|||
#12 new new JJ JJ I-NP 13 NMOD _ _ |
|||
#13 wife wife NN NN I-NP 7 COORD _ _ |
|||
#14 , , , , O 13 P _ _ |
|||
#15 Eva Eva NNP NNP B-NP 16 NMOD _ _ |
|||
#16 Braun Braun NNP NNP I-NP 13 NMOD _ _ |
|||
#17 , , , , O 13 P _ _ |
|||
#18 committ committ VBD VBD B-VP 0 ROOT _ _ |
|||
#19 suicide suicide NN NN I-VP 18 OBJ _ _ |
|||
#20 in in IN IN B-PP 18 ADV _ _ |
|||
#21 his his PRP$ PRP$ B-NP 23 NMOD _ _ |
|||
#22 undergr undergr JJ JJ I-NP 23 NMOD _ _ |
|||
#23 bunker bunker NN NN I-NP 20 PMOD _ _ |
|||
#24 in in IN IN B-PP 23 ADV _ _ |
|||
#25 Berlin Berlin NNP NNP B-NP 24 PMOD _ _ |
|||
#26 , , , , O 18 P _ _ |
|||
#27 as as IN IN B-SBAR 30 VMOD _ _ |
|||
#28 the the DT DT B-NP 29 NMOD _ _ |
|||
#29 city city NN NN I-NP 30 SBJ _ _ |
|||
#30 was be VBD VBD B-VP 18 ADV _ _ |
|||
#31 overrun overrun VBN VBN I-VP 30 VC _ _ |
|||
#32 by by IN IN B-PP 31 LGS _ _ |
|||
#33 the the DT DT B-NP 35 NMOD _ _ |
|||
#34 Red Red NNP NNP I-NP 35 NMOD _ _ |
|||
#35 Army Army NNP NNP I-NP 32 PMOD _ _ |
|||
# |
|||
# then the depth of 32 (how far 32 is from 0) is 4. Why? |
|||
# 32 points to 31 |
|||
# 31 points to 30 |
|||
# 30 points to 18 |
|||
# 18 points to 0 (0 is always the last) |
|||
# |
|||
# depth of 33 is 6. |
|||
# 33 35 |
|||
# 35 32 |
|||
# 32 31 |
|||
# 31 30 |
|||
# 30 18 |
|||
# 18 0 |
|||
# |
|||
# and so on.... |
|||
</pre> |
|||
The depth calculated this way is the y coordinate of the point. |
|||
The depths in the Hitler sentence: |
|||
<pre> |
|||
$VAR1 = \{ |
|||
'32' => 4, |
|||
'33' => 6, |
|||
'21' => 4, |
|||
'7' => 5, |
|||
'26' => 2, |
|||
'17' => 7, |
|||
'2' => 4, |
|||
'1' => 2, |
|||
'18' => 1, |
|||
'30' => 2, |
|||
'16' => 7, |
|||
'25' => 5, |
|||
'27' => 3, |
|||
'28' => 4, |
|||
'40' => 2, |
|||
'20' => 2, |
|||
'14' => 7, |
|||
'24' => 4, |
|||
'10' => 6, |
|||
'31' => 3, |
|||
'35' => 5, |
|||
'11' => 7, |
|||
'22' => 4, |
|||
'13' => 6, |
|||
'23' => 3, |
|||
'29' => 3, |
|||
'6' => 6, |
|||
'39' => 7, |
|||
'36' => 6, |
|||
'3' => 4, |
|||
'9' => 6, |
|||
'12' => 7, |
|||
'15' => 8, |
|||
'38' => 8, |
|||
'8' => 6, |
|||
'4' => 3, |
|||
'34' => 6, |
|||
'37' => 8, |
|||
'19' => 2, |
|||
'5' => 4 |
|||
}; |
|||
</pre> |
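
The depth calculation described above can be sketched in a few lines. This is only an illustration in Python, not the script that produced the dump above; the list <code>HEADS</code> is the seventh column of the Hitler sentence copied by hand, and the function name <code>depths</code> is made up.

```python
# Seventh column (parent line number) of the Hitler sentence, tokens 1..40.
HEADS = [18, 4, 4, 1, 4, 7, 5, 7, 7, 7,
         13, 13, 7, 13, 16, 13, 13, 0, 18, 18,
         23, 23, 20, 23, 24, 18, 30, 29, 30, 18,
         30, 31, 35, 35, 32, 35, 39, 39, 36, 18]


def depths(heads):
    """Map each token number to the number of steps needed to reach 0."""
    parent = {i + 1: head for i, head in enumerate(heads)}
    result = {}
    for token in parent:
        steps, node = 0, token
        while node != 0:
            node = parent[node]
            steps += 1
            if steps > len(parent):
                # More steps than tokens means the parents form a loop.
                raise ValueError(f"cycle reached from token {token}")
        result[token] = steps
    return result


if __name__ == "__main__":
    for token, depth in sorted(depths(HEADS).items()):
        print(token, depth)
```

For token 32 this walks 32, 31, 30, 18, 0 and gives 4; for token 33 it gives 6, in line with the worked example.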
|||
==Visualising Matxin/Freeling== |
|||
I believe that the Matxin people have an XSL style sheet which will convert an analysis into an SVG. You could try asking them about it on their mailing list. - [[User:Francis Tyers|Francis Tyers]] 14:46, 17 June 2009 (UTC) |
|||
==They were playing or they were arrested== |
|||
<pre> |
|||
en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were playing during the day." | ./analyzer -f en.cfg |
|||
claus/top/(playing play VBG -) [ |
|||
vb-be/aux/(were be VBD -) |
|||
sn-chunk/ncsubj/(They they PRP -) |
|||
sp-chunk/ncmod/(during during IN -) [ |
|||
sn-chunk/dobj/(day day NN -) [ |
|||
DT/det/(the the DT -) |
|||
] |
|||
] |
|||
st-brk/ta/(. . Fp -) |
|||
] |
|||
en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were arrested during the day." | ./analyzer -f en.cfg |
|||
claus/top/(arrested arrest VBN -) [ |
|||
vb-be/aux/(were be VBD -) |
|||
sn-chunk/ncsubj/(They they PRP -) |
|||
sp-chunk/ncmod/(during during IN -) [ |
|||
sn-chunk/dobj/(day day NN -) [ |
|||
DT/det/(the the DT -) |
|||
] |
|||
] |
|||
st-brk/ta/(. . Fp -) |
|||
] |
|||
en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were old during the day." | ./analyzer -f en.cfg |
|||
claus/top/(were be VBD -) [ |
|||
sn-chunk/ncsubj/(They they PRP -) |
|||
n-chunk/dobj/(old old NN -) |
|||
sp-chunk/ncmod/(during during IN -) [ |
|||
sn-chunk/dobj/(day day NN -) [ |
|||
DT/det/(the the DT -) |
|||
] |
|||
] |
|||
st-brk/ta/(. . Fp -) |
|||
] |
|||
</pre> |
|||
I think that gerund and passive constructions are generally misinterpreted by FreeLing. "were" is the verb and that should be the root. What do you think? [[User:Muki987|Muki987]] 20:37, 19 June 2009 (UTC) |
Latest revision as of 20:42, 19 June 2009
Contents
- 1 Useful links
- 2 Basque
- 3 Headline text
- 4 Considerations for prefix groups and possessions
- 5 Expressions
- 6 Further subjects
- 7 Resources
- 8 Lexicon
- 9 Look after
- 10 Examples
- 11 Corpora
- 12 Hunspell for generation
- 13 Speed
- 14 POS taggers
- 15 dependency
- 16 Versions
- 17 Dog us
- 18 Fixes
- 19 Hi regan,
- 20 Dep analysis
- 21 Wiki
- 22 Documentation of Matxin
- 23 of
- 24 Otto
- 25 dep
- 26 oak&malt
- 27 Dep tree depth calculation
- 28 Visualising Matxin/Freeling
- 29 They were playing or they were arrested
Useful links
Basque
I found some useful Basque introductions:
Note that in our Basque→Spanish system we do something similar with Basque cases. For example, a typical way of representing "hegoak" would be:
^hegoak/hego<n>+a<det><art><pl>/ hego<n>+a<det><art><sg>+k<post>$
"<hegoak>" "hego" IZE ARR DEK ABS NUMP MUGM "hego" IZE ARR DEK ERG NUMS MUGM
Note how in our representation the case is marked as a postfix, k<post>, where in the more traditional analysis it is marked as a case, ERG (ergative). Compared with Basque→Spanish, Hungarian→English would be easier in terms of word order:
S O V
Txinako Poliziak, datu ofizialen arabera, 1.317 pertsona atzeman zituen
la Policía de China, según los datos oficiales, 1.317 personas capturó
The Chinese police, according to official data, 1,317 people detained.
`According to official data the Chinese police detained 1,317 people.'
- Francis Tyers 10:13, 9 April 2009 (UTC)
- I see, Matxin can handle both Basque-Spanish and Spanish-Basque, so I'll look into that thoroughly. Basque is also a Hun language, as far as I know, very similar to Hungarian. Muki987 10:32, 9 April 2009 (UTC)
- Actually, the Matxin system cannot handle Basque→Spanish, as there is no dependency analysis for Basque. Apertium is used for Basque→Spanish and Matxin for Spanish→Basque. As far as I know, Basque does not have any living relatives. - Francis Tyers 10:44, 9 April 2009 (UTC)
- That's important for me to know, thanks. Muki987 11:30, 9 April 2009 (UTC)
- Does the main diagram "How Apertium works" differ between Apertium and Matxin? If yes, where? If not, what is the difference between Matxin and Apertium (apart from character encoding)? Muki987 12:37, 9 April 2009 (UTC)
- Basque has lots of living relatives: Hungarian, Armenian, Turkish, Azerbaijani, Uyghur, Finnish, Estonian, Persian, Japanese (through Ainu = Hunnish influence), Quechua (the Inca language in South America), Ancient Egyptian (no longer living, but the hieroglyphs show a great past), Etruscan (also no longer living, but with a great past), Hindi, and more. Muki987 11:30, 9 April 2009 (UTC)
Headline text
Considerations for prefix groups and possessions
Prefix groups
In English, one prefix (preposition) can govern several nouns, for example: I travel to England, France and Spain. This is translated as: Utazom Angliá-ba, Franciaország-ba és Spanyolország-ba ("-" added for clarity). Utazom: I travel, Angliába: to England, ... , Spanyolországba: to Spain.
In English the prefix + nouns structure is closed by:
- a full stop (ending the sentence)
- a verb - I travel to England and Spain and will carry a bag - the word "will" closes the scope of "to".
- a new prefix - I travel to England and Spain with train or aeroplane - the word "with" closes the scope of "to".
- Co-ordinated noun phrase with case agreement. I would probably do this kind of thing in pre-transfer with a constraint grammar. Basically write a rule which does: "add accusative case to nouns following the preposition 'to' until a new preposition, verb or end-of-sentence". - Francis Tyers 10:22, 9 April 2009 (UTC)
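
The rule described above can be illustrated outside of any grammar formalism. The following Python sketch is not constraint grammar syntax and not part of Apertium; the function name <code>spread_case</code>, the tag names and the hand-tagged sentence are all assumptions made for the illustration. It marks every noun after the chosen preposition until a verb, another preposition or the end of the sentence closes the scope.

```python
# Tag names are assumptions, loosely modelled on Apertium-style tags.
VERB_TAGS = {"vblex", "vaux"}
NOUN_TAGS = {"n", "np"}


def spread_case(tokens, prep="to", case="acc"):
    """tokens: list of (word, pos). Returns a list of (word, pos, case or None)."""
    tagged = []
    in_scope = False
    for word, pos in tokens:
        mark = None
        if pos == "pr":
            # Any preposition ends the old scope; only `prep` opens a new one.
            in_scope = word.lower() == prep
        elif pos in VERB_TAGS or pos == "sent":
            in_scope = False  # a verb or the end of the sentence closes the scope
        elif pos in NOUN_TAGS and in_scope:
            mark = case
        tagged.append((word, pos, mark))
    return tagged


# Hand-tagged example sentence (the tagging itself is an assumption).
SENTENCE = [
    ("I", "prn"), ("travel", "vblex"), ("to", "pr"),
    ("England", "np"), (",", "cm"), ("France", "np"),
    ("and", "cnjcoo"), ("Spain", "np"),
    ("with", "pr"), ("a", "det"), ("bag", "n"), (".", "sent"),
]

if __name__ == "__main__":
    for word, pos, mark in spread_case(SENTENCE):
        print(word, pos, mark or "-")
```

In this example England, France and Spain receive the case mark, while "bag" does not, because "with" has already closed the scope of "to". Which case tag is actually needed for Hungarian is left as a parameter.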
Possessions
In English the possessor may come before the possession: Peter's coffee and tea
but also after it: the coffee and tea of Peter
In Hungarian the possessor always comes strictly before the possession, so both phrases above must be translated as: Péter kávé-ja és teá-ja (again, "-" added just for clarity).
In English the possession structure is closed by:
- a full stop (ending the sentence)
- a verb - Peter's coffee and tea looks like a bag - the word "looks" closes the scope of the possession structure
- a new prefix - Peter's coffee and tea with sugar - the word "with" closes the scope of the possession structure
In a possession relation of the "the coffee and tea of Peter" type: when an enumeration of nouns starts, the translator must watch out. If the enumeration ends with "of", it is a possession structure and must be translated as such.
Combination of possession and prefix
- With Peter's coffee and tea - Péter kávé-já-val és teá-já-val - "ja" marks possession, "val"/"vel" is "with"
- With the coffee and tea of Peter - as above
Adding plural
- With Peter's coffee and teas - Péter kávé-já-val és teá-i-val - "i" is the plural possession marker for tea
Remark
These kinds of structures caused me the most manual work when I translated texts from English/German, so it is very important to get their translation right. Thanks in advance for any criticism, thoughts or comments. Muki987 10:12, 9 April 2009 (UTC)
- An interesting example. This is a fourth kind of structure-closing signal: a noun immediately followed by a verb also closes the structure:
- I drank Peter's coffee and children played near to us.
- I saw Peter's coffee and tea smell like sugar - this sentence is ambiguous even in English - what smells like sugar, both or only the tea? Would a comma after coffee limit the possession to coffee? Muki987 11:48, 9 April 2009 (UTC)
- I'm not sure if this is ambiguous. The ambiguity is resolved by the inflection of the verb in this particular case.
- ?I saw Peter's coffee and tea smell like sugar
- I saw [Peter's coffee] and [tea] smell like sugar
- I saw [Peter's coffee and tea] smell like sugar
- - Francis Tyers 13:36, 9 April 2009 (UTC)
Some tests
Peter's coffee and tee are sweet
- el café de Peter y tee es dulce
Peter's mice and cats are sweet
- los ratones de Peter y los gatos son dulces
Peter's mice and Peter's cats are sweet
- los ratones de Peter y los gatos de Peter son dulces
This clearly shows that, in the case of possession, Apertium does not consider multiple possessions. Since this is also ambiguous in English, it is a feature.
- Peter's coffee and tea are sweet
- el café de Peter y el té son dulces
- In this case, rules are not available for co-ordinated noun phrases, and so the translations come out rather badly. It is not to say that it is impossible to do in Apertium, just that so far for English→Spanish we have had more important things to work on. - Francis Tyers 13:27, 10 April 2009 (UTC)
Same is true for attributes:
Little boys and girls were playing
- Pocos chicos y las chicas jugaban
Little boys and little girls were playing
- Pocos chicos y pocas chicas jugaban
The attribute is only applied to the neighbouring noun. Again, a feature. Muki987 13:14, 10 April 2009 (UTC)
- Part of the problem above is in part-of-speech tagging: "Little" is being taken as a quantifier. If you try with 'small' the result is better.
- Small boys and girls were playing
- Chicas y chicos pequeños jugaban
- In the 'small' case, too, only the boys are affected. Apertium is rather consistent here. But let's consider this a feature, since English is also ambiguous here. Muki987 09:48, 11 April 2009 (UTC)
- The ambiguity is preserved in the sentence — this is a feature. - Francis Tyers 10:41, 11 April 2009 (UTC)
- Small boys and small girls were playing
- Los Chicos pequeños y las chicas pequeñas jugaban
- Here both of the translations are ok. - Francis Tyers 13:27, 10 April 2009 (UTC)
Expressions
What about expressions? For example "look after one's fences": this can appear in forms such as "looking" and "looked", but at present it is not handled at all:
- Peter looked after Martha's fences
- Peter miraba después de las vallas de Martha
The expression is not recognized at all (intended meaning: Peter acted in Martha's interest).
Is something planned for this? Are there working examples available? 20-30% of our speech consists of expressions! Muki987 13:38, 10 April 2009 (UTC)
Further subjects
>'házas (married- repeat all previous for this up to here, except the last 2) 1680' -- a married house? Really?
That word is a bit of an exception, since it has two meanings: házas means married, and also a man or woman who has a house. In the case of ing (shirt), inges means someone who wears a shirt.
- Not specific to English, but more pronounced in English than in any other major language. What are your ideas for solving it? Muki987 11:54, 8 April 2009 (UTC)
- We don't currently have a good working lexical selection module, but it is one of the ideas we're hoping to get implemented through GSOC. - Francis Tyers 21:48, 8 April 2009 (UTC)
From Wikipedia:
In some cases, the diminutive suffix has become part of the basic form. These are no longer regarded as diminutive forms:
Animals
- -ka/ke: fóka (seal), róka (fox), csóka (jackdaw), pulyka (turkey), szarka (magpie)
- -cska/cske: macska (cat), kecske (goat), fecske (swallow), szöcske (grasshopper)
- You see, you get better answers in fickipedia. You are right, this is an issue for translation, but it is one of the issues that can easily be covered. Muki987 11:54, 8 April 2009 (UTC)
- Sure, and a lot of others also not. One after the other. Muki987 18:14, 8 April 2009 (UTC)
- This is rather off the topic of the discussion, this page is more to discuss methods of representing agglutinative morphology in Apertium, rather than the translation problems of agglutinative languages (which are also interesting, but better reserved for another page, or the mailing list). :) - Francis Tyers 08:21, 7 April 2009 (UTC)
- Glad to hear that you are convinced Apertium technology is suitable for agglutinative languages. Having gone through the English-SerboCroatian example I was not so sure. At the moment I am in the evaluation phase, looking at all existing technologies. In my opinion, Google's translation technology, with its statistical, grammar-free approach, will never reach the quality of a grammar-oriented one like Apertium. It will forever remain on the surface, with no real prospect of improvement. However, for some situations it is very helpful. That was my first step in this direction. We can continue this subject on my discussion page, if xxx wants. Muki987 10:02, 7 April 2009 (UTC)
- Regarding other free grammar-focussed MT engines, you might also check out Open Logos and Matxin. Open Logos has the downside of not supporting UTF-8 and not having very active development, while Matxin requires a dependency grammar to be written in Freeling format. If you want to go from English→Hungarian then this might be the answer, as they already have one written for English, but for Hungarian→English, it might take some extra development time. The Constraint grammar formalism for disambiguation and syntactic annotation might also be interesting. I'm quite happy to discuss other options and if you have any questions, please contact us on the mailing list, personally or through IRC. - Francis Tyers 10:36, 7 April 2009 (UTC)
- PS. Are you the one asking on the hunmorph list about generation ('morp visszafele')? :) - Francis Tyers 12:00, 7 April 2009 (UTC)
- Yes, I am an old language "rabbit" :-) Peter H. says, hunlex knows something similar, we are waiting for Victor, the author, he might know..... Muki987 18:04, 7 April 2009 (UTC)
Resources
Perhaps we could make a page of free resources for Hungarian ? - Francis Tyers 12:59, 9 April 2009 (UTC)
Sure, why not. As I go ahead, I'll think of the idea, and collect things. Muki987 13:31, 9 April 2009 (UTC)
Lexicon
I took a look at the lexicon you pointed me to. It looks ok, although it isn't tagged for part-of-speech. There are lots of "set phrases" (which we can extract into a translation memory) and "multiwords" (nice!) Some questions on the format:
writer
szerző
írnok
író

write up
előnyös színben tüntet fel
feldicsér
feldolgoz
kidolgoz
megír
naprakész állapotba hoz
Is it:
<word1 in English>
<trans1 in Hungarian>
<trans2 in Hungarian>

<word2 in English>
<trans1 in Hungarian>
...
E.g. blank line, English, Hungarian ... blank line ?
- blank line is the sign for "next word" Muki987 20:11, 10 April 2009 (UTC)
Is the first translation always the most frequent one ?
- No, they are alphabetical. Muki987 20:11, 10 April 2009 (UTC)
The first step in conversion might be to take out the entries which only have one translation into a file which looks like
program ; program
workweek ; munkahét
workshop ; műhely
...
Then analyse the left side with the English analyser from Apertium, and the right side with hunmorph.
^program/program<n><sg>$ ; program/NOUN
^workweek/*workweek$ ; munkahét/NOUN
^workshop/workshop<n><sg>$ ; műhely/NOUN
^work/work<n><sg>/work<vblex><inf>/work<vblex><pres>$ ; alkotás/NOUN @ alkot/VERB[GERUND]/NOUN @ alkot/VERB[GERUND]/NOUN
...
Then extract the entries which have analyses in both analysers which agree for parts of speech.
^program/program<n><sg>$ ; program/NOUN
^workshop/workshop<n><sg>$ ; műhely/NOUN
^work/work<n><sg>$ ; alkotás/NOUN
...
And convert that to Apertium format:
<e><p><l>program<s n="n"/></l><r>program<s n="NOUN"/></r></p></e>
<e><p><l>workshop<s n="n"/></l><r>műhely<s n="NOUN"/></r></p></e>
<e><p><l>work<s n="n"/></l><r>alkotás<s n="NOUN"/></r></p></e>
...
This could largely be done automatically, but would need to be manually checked. I would focus on the most frequent open-category words, closed-category words can be done better by hand from scratch. - Francis Tyers 13:48, 10 April 2009 (UTC)
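
The first and the last of these steps could be scripted roughly as follows. This is only a sketch in Python and not the set of commands actually used; it assumes the lexicon layout discussed above (one English headword per block, its translations on the following lines, a blank line between blocks). The function names are hypothetical, and the part-of-speech tags are simply passed in, because the analysis step with Apertium and hunmorph is not reproduced here.

```python
import re
from xml.sax.saxutils import escape

# A few blocks in the assumed lexicon layout.
SAMPLE = """\
program
program

workweek
munkahét

workshop
műhely

writer
szerző
írnok
író
"""


def read_entries(text):
    """Split the lexicon into (headword, [translations]) pairs."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            entries.append((lines[0], lines[1:]))
    return entries


def single_translation_pairs(entries):
    """Keep only the headwords that have exactly one translation."""
    return [(en, hu[0]) for en, hu in entries if len(hu) == 1]


def to_bidix_entry(en, hu, left_tag="n", right_tag="NOUN"):
    """Format one pair as an entry in the style shown above."""
    return (f'<e><p><l>{escape(en)}<s n="{left_tag}"/></l>'
            f'<r>{escape(hu)}<s n="{right_tag}"/></r></p></e>')


if __name__ == "__main__":
    for en, hu in single_translation_pairs(read_entries(SAMPLE)):
        print(f"{en} ; {hu}")
        print(to_bidix_entry(en, hu))
```

The sketch does not handle multiword entries and does not check that the two analysers agree on the part of speech, so its output would still need the manual check mentioned above.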
- I've emailed you a "first pass" along with some of the commands I used for creating it. If you have Apertium installed you will be able to see basic noun transfer. I still haven't figured out a way to get hunmorph to do generation though. PS. I would like to ask your permission to put the generated file in our incubator - Francis Tyers 14:47, 10 April 2009 (UTC)
- Received, thanks a million. :-) I have to install apertium on my pc, learn how to use this, and test the results. Sounds very promising!!!! Muki987 20:11, 10 April 2009 (UTC)
- Ok cool :) Can I put the dictionary I extracted in our incubator? It isn't mandatory, but it might be useful to someone at some point. - Francis Tyers 20:21, 10 April 2009 (UTC)
- Sure, you can. I think, however, it needs a lot of work before it is really usable. Muki987 21:12, 10 April 2009 (UTC)
- Thanks, and yes it definitely does, but one of my principles, is release as soon as possible :) A lot of times I've come across stuff that would be useful to me, but people don't want to release it until it is "finished" -- which sometimes doesn't happen then you lose all the work... If you rifle through the incubator you'll see a lot of junk that I've just played around with and put up there... in case someone else finds it useful some day. - Francis Tyers 21:20, 10 April 2009 (UTC)
Look after
I just spoke to another developer and he says that "vigilar" is a better translation than "cuidar". So I've added it to the SVN and the change should be live in ~12 hours.
English dictionary:
<e lm="look after"><i>look</i><par n="accept__vblex"/><p><l><b/>after</l><r><g><b/>after</g></r></p></e>
English--Spanish bilingual dictionary:
<e srl="look_after"><p><l>look<g><b/>after</g><s n="vblex"/></l><r>vigilar<s n="vblex"/></r></p></e>
And now:
$ echo "Peter looked after Martha's fences" | apertium -d . en-es
Peter vigiló las vallas de Martha
Much better :) - Francis Tyers 20:33, 10 April 2009 (UTC)
- Not really. It completely misses the meaning of the expression: he takes care of her interests. "vallas" means fences, not interests.
- Please also see the examples on your page, not a single working expression :-( Muki987 21:27, 10 April 2009 (UTC)
- I don't agree. I am a native speaker of English, and from that expression I take "Peter looked after the fences of Martha". If you would like to talk by telephone, I'll email you my phone number, but I think you have quite an unusual idea of what English is. - Francis Tyers 21:52, 10 April 2009 (UTC)
- By the way, in case you had any doubt, I wouldn't expect the translator to be able to translate "alri' there ma' gizzu gleg", "here y'ar" or "this int nowt" either. - Francis Tyers 22:12, 10 April 2009 (UTC)
- I also took the opportunity to ask a speaker of American English and he suggested that you might have wanted to say "mend fences" ? Which means "to resolve past conflicts, or put differences aside" -- personally I've never heard of that one, but perhaps you got mixed up? - Francis Tyers 07:53, 11 April 2009 (UTC)
- Ahh, so what I thought was a usual English expression (look after one's fences == take care of one's interests) is not a common expression at all? I have to rely on dictionaries I find here and there, just like Jim with Hungarian grammar. So now I have changed my picture; thanks for your input as a native English speaker. I use Skype, and we could talk over Skype at any time; however, writing is probably more effective. Muki987 09:56, 11 April 2009 (UTC)
- No, it definitely isn't a common expression! :) Did you ever see the Monty Python sketch "Hungarian phrasebook (youtube)"? I don't use Skype as there is unfortunately not a free client for it. - Francis Tyers 10:13, 11 April 2009 (UTC)
- Good to know (the expression), thanks!
- That sketch is not funny for me at all.
- Skype is completely free and very, very easy to use. Phoning abroad always costs a lot, no matter how you try. Muki987 10:24, 11 April 2009 (UTC)
- I have an idea that might prevent you spending your time on something that isn't what you want or need. Could you send me ten sentences in Hungarian of varying levels of complexity, with linguist's glosses in English and translation in English. e.g.
- 1. John lát egy almát
- ~~ John see+VERB an+ART apple+ACC
- ~~ John sees an apple
- This would be an "easy" sentence to translate, could you send 10 examples of others from "easy" to hard? I'll then look at them and let you know which ones will be able to be translated with Apertium, which with Matxin, and which will not be able to be translated (with machine translation). - Francis Tyers 12:20, 11 April 2009 (UTC)
Examples
I have put some addresses onto my page.
- Yes, I saw :) Incidentally, our lead developer has a print out of one of the descriptions of the MetaMorfo parser on his desk. We're planning to do something similar one of these days... - Francis Tyers 21:26, 11 April 2009 (UTC)
Glosses
- John and Martha's apple and pear were sweet
- John és Martha almája és körtéje édesek voltak
- John and Martha apple-his and pear-his sweet-s be+past+plural,3 person
- A remark: in the present tense, John és Martha almája és körtéje édes - John and Martha apple+his and pear-his sweet (no verb is needed in this case, and the adjective is in the singular)
- The ski jumpers were on the top of the hill
- A síugrók a hegy tetején voltak
- The ski-jumpers the hill top-its-on be+past+plural,3 person
- In present: A síugrók a hegy tetején vannak -- the ski-jumpers the hill top-its-on be+plural,3.person
- however: The ski jumpers on the top of the hill are friendly
- A síugrók a hegy tetején barátságosak - the ski-jumpers the hill top-its-on friendly-s -- No need for verb here.
- He travelled in a nice coach
- Jó kocsiban utazott
- good coach-in travel+past+singular,3 person
Interesting here that the verb comes final. How would you say, for example "John saw Martha" and "John saw Martha through his telescope"? - Francis Tyers 19:17, 12 April 2009 (UTC)
- John saw Martha
- John látta Marthát
- John see+past+sng.3person Martha+ACC
- John saw Martha through his telescope
- John látta Mártát teleszkópján
- John see+past+sng.3person Martha+ACC telescope+his+on
- John saw Martha through his telescope
- John látta Mártát teleszkópján keresztül
- John see+past+sng.3person Martha+ACC telescope+his+on through
-- this version is a bit overkill with the "keresztül", the previous is more practical
- Ok, so it seems that with adverbial phrases "in a nice coach", "on the top of the hill" and with adjective complements, they come before the verb. How about:
- John saw through his telescope
Would be:
- John teleszkópján látta -- John telescope-his-on see+sng+3.pers
? - Francis Tyers 20:14, 12 April 2009 (UTC)
- It depends on the context.
- John saw through his telescope that the Moon is yellow
- John teleszkópján látta, hogy a hold sárga -- John telescope-his-on see+sng+3.pers, that the moon yellow.
- John looked through his telescope to see a part of the moon.
- John teleszkópjába nézett, hogy lássa a hold egy részét. -- John telescope-his-into look+past,sng,3.pers because see+imperativ+sng+3.pers the moon one part.
- It is also important that each verb has two conjugated forms: one used with a definite object and one used without
- I watch = nézek or nézem. Nézek: I watch down: generally, nothing special. Nézem: I watch the cat.
- ... (each person)
- They watch = néznek or nézik. They watch down: néznek they watch the cat: nézik
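The two conjugations above can be written down as a small lookup table. This is only a sketch for the verb forms quoted in this thread; the function name and the person labels ("1sg", "3pl") are made up for illustration, and a real generator would of course need the full paradigm.

```python
# Forms of "néz" (to watch) quoted in the discussion above.
# Key: (person, whether the verb has a definite object).
FORMS = {
    ("1sg", False): "nézek",   # I watch (in general)
    ("1sg", True): "nézem",    # I watch the cat
    ("3pl", False): "néznek",  # they watch (in general)
    ("3pl", True): "nézik",    # they watch the cat
}


def conjugate_nez(person, definite_object):
    """Return the present-tense form of 'néz' for the given person."""
    return FORMS[(person, definite_object)]


print(conjugate_nez("1sg", True))
print(conjugate_nez("3pl", False))
```

For translation into Hungarian this means the generator has to know whether the English verb has a definite object before it can pick the form.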
1. example genitiv and grouping with "and"
Ok, let's start:
John's and Martha's apple and pear were sweet- John and Martha's apple and pear were sweet
- John y Martha la manzana y la pera eran dulces - Apertium
- John y la manzana de Martha y la pera eran agradables - promt
- John és Martha almája és körtéje édesek voltak - webforditas - John and Martha apple-his and pear-his sweet-s were.
- errors:
- Ignores the 's after John.- apertium
- Ignores that both apple and pear belong to John and Martha - apertium & promt
- "John's and Martha's apple and pear were sweet" I would qualify as ungrammatical.
- "John and Martha's apple and pear were sweet". The 's possessive in English is clitic and applies to the whole phrase.
- Not that it makes a real difference to the translation quality. Co-ordinated noun phrases require extra rules that we haven't written in English→Spanish. This is not an issue of the power of the engine, but rather the number of rules. "Martha's apple was sweet" → "La manzana de Martha era dulce". The principle is the same. Incidentally, FreeLing (and therefore Matxin) also makes a mess of parsing this.
- I'm calculating how frequent this construction is in the Europarl corpus, and came up with this example:
- The resources and capabilities of this country's agriculture and industry
- Los recursos y capacidades de la agricultura de este país e industria
- Los recursos y capacidades de la agricultura de este país e industria - promt
- Ennek az országnak a mezőgazdaságának és iparának az erőforrásai és képességei - webforditas
- It is still broken, but less so. Again, the cause is the fixed-length pattern for co-ordinated phrases after a genitive. The rule which fires above matches the pattern "DET NOM1 GENITIU NOM2", where we would need "DET NOM1 GENITIU NOM2 CC NOM3". - Francis Tyers 21:06, 11 April 2009 (UTC)
- This pattern appears in approximately 0.4% of the two million sentences in the Europarl corpus. - Francis Tyers 22:02, 11 April 2009 (UTC)
- John and Martha's apple and pear were sweet
- John y la manzana de Martha y la pera eran agradables - promt
- John y Martha la manzana y la pera eran dulces - apertium
- John és Martha almája és körtéje édesek voltak
errors:
- promt ignores that John is also an owner, and ignores that apple and pear belong together
- apertium ignores both as owners
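The fixed-length pattern problem mentioned above can be shown with a toy matcher. This is a minimal sketch and not Apertium's transfer engine: the tag names DET, NOM, GENITIU and CC come from the comment above, while the rule names and the matching function are assumptions made up for illustration.

```python
# Toy left-to-right longest-match over part-of-speech tags.
# Rule names ("gen", "gen_coord") are hypothetical.
RULES = {
    "gen": ("DET", "NOM", "GENITIU", "NOM"),
    "gen_coord": ("DET", "NOM", "GENITIU", "NOM", "CC", "NOM"),
}


def match_rules(tags, rules):
    """Return (rule_name, start, end) for the longest rule at each position."""
    matches = []
    i = 0
    while i < len(tags):
        best = None
        for name, pattern in rules.items():
            n = len(pattern)
            if tuple(tags[i:i + n]) == pattern and (best is None or n > best[1]):
                best = (name, n)
        if best is None:
            i += 1  # no rule starts here, skip one token
        else:
            matches.append((best[0], i, i + best[1]))
            i += best[1]
    return matches


# Tags for "this country's agriculture and industry"
tags = ["DET", "NOM", "GENITIU", "NOM", "CC", "NOM"]

# With only the short rule, "and industry" is left outside the match.
short_only = match_rules(tags, {"gen": RULES["gen"]})
# With the longer rule added, the whole co-ordinated phrase is covered.
with_coord = match_rules(tags, RULES)
print(short_only)
print(with_coord)
```

With only the four-token rule the co-ordinated noun is stranded, which corresponds to the "de este país e industria" output above; adding the six-token rule covers the whole phrase.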
2 . example word combination
- The ski jumpers were on the top of the hill
- Los jerseys de esquí eran en la parte superior del cerro
- Los saltadores de esquí eran en la parte superior del cerro - apertium
- Los saltadores de esquí estaban en la cumbre de la colina - promt
- A síugrók a domb tetőjén voltak - webforditas; should be: A síugrók a hegy tetején voltak (The ski-jumpers the hill top-its-on were).
- errors:
- Apertium does not know the "ski jumpers" combination.
- webforditas generates the wrong form tetőjén; it should be tetején.
- Here the Promt output is much better. Adding the "ski jumper" multiword is easy, but that doesn't fix the problem with "en la parte superior", we'd have to look at the dictionary for that one. Again, this would be an incremental improvement, nothing "insurmountable". - Francis Tyers 21:33, 11 April 2009 (UTC)
Did you teach Apertium this? Web testing (http://xixona.dlsi.ua.es/apertium-www/index.php?id=translatetext) shows:
- Los jerseys de esquí eran en la parte superior del cerro Muki987 21:41, 11 April 2009 (UTC)
- Let me update that (it updates every ~12 hours), but now I can do it manually. - Francis Tyers 21:46, 11 April 2009 (UTC)
- PS. You can use the testing interface here. It also allows you to see the translation steps (see "print intermediate representation"). - Francis Tyers 19:23, 12 April 2009 (UTC)
3. example single word
- He travelled in a nice coach
- Viajó en un entrenador guapo - apertium
- Él viajó en un entrenador agradable - promt
- Egy jó edzőben utazott - webforditas - A good in-coach travelled, should be: Jó kocsiban utazott: good coach-in travelled.
- Errors:
None of them recognizes that "coach" is not only a trainer but also a vehicle one can sit in. This shows that word selection (3.4 in the docs) does not work reliably. Apertium also leaves out "Él". Besides that, "Egy" is unnecessary and ugly in the webforditas output. Muki987 21:37, 11 April 2009 (UTC)
- Word selection is not turned on, because the current module we have does not work. I mentioned this a couple of days ago (see above in the talk page). - Francis Tyers
- I see. I will not test word selection. It is one of the most critical features generally, besides expression selection (multiword selection) Muki987 21:42, 11 April 2009 (UTC)
- Yes, when dealing with distant languages it is one of the most important features. This is why we have it as #1 on our projects list for Google Summer of Code :) - Francis Tyers 21:45, 11 April 2009 (UTC)
- Strongly agree. I would say, word selection is prio 1, expression selection immediately after word selection, and after these comes the rest. The mechanisms in apertium look good and well configurable. Muki987 21:53, 11 April 2009 (UTC)
- My main complaint after word selection (when working on unrelated languages) is that we can't (so far) do recursive pattern matching. But we're working on it... - Francis Tyers 22:01, 11 April 2009 (UTC)
- I start now installation and set up on my machine, I do not see any insurmountable problems. We shall see after setting up, how to continue. I must understand the mechanisms and handling first. If you need tests for word selection or expressions, just let me know, I am glad to help at testing. Muki987 21:53, 11 April 2009 (UTC)
- Ok, thanks. I have been working on some tricks for limited rule-based lexical selection, you can find them in category:Lexical selection. These are just proofs of concept though. Working with unrelated languages is quite new for us (yes, even after 3-4 years!), most of our work goes on related languages. PS. I would be interested in seeing glosses for the Hungarian translations above (if you have time) - Francis Tyers 22:01, 11 April 2009 (UTC)
- Done Muki987 18:48, 12 April 2009 (UTC)
- Can you check if my changes to annotations are correct? - Francis Tyers 19:17, 12 April 2009 (UTC)
- I could not find any changes. Where are they? The first was not gott, I corrected it. Muki987 19:28, 12 April 2009 (UTC)
- I put them at the top of this section. - Francis Tyers 19:51, 12 April 2009 (UTC)
- Found, thanks. Corrected as well: the "on" was missing at the end. Muki987 20:03, 12 April 2009 (UTC)
Corpora
You can also add Hungarian Wikipedia to that list. We have some scripts for processing it too, see Calculating coverage. - Francis Tyers 11:20, 12 April 2009 (UTC)
Hunspell for generation
I get the same error that you posted to the list "NO DATA". The analysis works reasonably well. We would probably need to change the tagset, but that wouldn't be more than a few search/replace operations on the .aff and .dic files. "[" and "/" are reserved characters, and "<" cannot be embedded (for more information see Apertium stream format). In terms of analysis it is quite unfortunate that it doesn't do tokenise-as-you-analyse (as lttoolbox does), but it can probably be adapted to do this. - Francis Tyers 19:12, 12 April 2009 (UTC)
- Yes, here some analysis results with hunpos:
en@anonymous:~/tmp/download/forditas/hunpos/hunpos-1.0-linux$ ./hunpos-tag ../hu_szeged_kr.model
model loaded
tagger compiled
holnap elmegyek moziba .
holnap ADV elmegyek VERB<PERS<1>> moziba NOUN<CAS<ILL>> . PUNCT
Márta és János almái és körtéi édesek .
Márta NOUN és CONJ János NOUN almái NOUN<PLUR><POSS> és CONJ körtéi NOUN<PLUR><POSS> édesek ADJ<PLUR> . PUNCT
Jó kocsiba ülj !
Jó ADJ kocsiba NOUN<CAS<ILL>> ülj VERB<SUBJUNC-IMP><PERS<2>> ! PUNCT
Changing the tagset names is absolutely no problem. I am very experienced in awk, and such changes are done in minutes.
Two weeks ago I received working analysis results from Német László on the OOo LanguageTool list. The problem is the encoding of the dictionary: it must be 8-bit. I am looking into that. Muki987 19:53, 12 April 2009 (UTC)
Yes, I found that if I switched to an 8bit terminal the analysis worked fine. I think it might be something to do with the encoding of the aff/dic files for Hungarian (which seem to be in 8bit). I was intending to get around to change it, but didn't try it yet.
PS. Here is how I imagine Apertium tags:
elmegyek VERB<PERS<1>> → ^elmegyek/el<VERB><PRES><PERS1>$
moziba NOUN<CAS<ILL>> → ^moziba/mozi<NOUN><ILL>$
almái NOUN<PLUR><POSS> → ^almái/alma<NOUN><PLUR><POSS>$
édesek ADJ<PLUR> → ^édesek/édes<ADJ><PLUR>$
kocsiba NOUN<CAS<ILL>> → ^kocsiba/kocsi<NOUN><ILL>$
ülj VERB<SUBJUNC-IMP><PERS<2>> → ^ülj/ül<VERB><SUBJ_IMP><PERS2>$
- Francis Tyers 20:15, 12 April 2009 (UTC)
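The tag mapping imagined above is mostly mechanical, so it can be sketched in a few lines of Python. This is an illustration only: the function names are made up, the lemma has to be supplied separately because the hunpos output shown above carries no lemma, and tags that are not in the hunpos output at all (such as PRES in the first example) cannot be produced this way.

```python
import re

# Renaming taken from the examples above; any other tag is passed through.
RENAME = {"SUBJUNC-IMP": "SUBJ_IMP"}


def flatten_tag(hunpos_tag):
    """Flatten a nested hunpos tag such as NOUN<CAS<ILL>> into a flat list."""
    # Split on angle brackets and drop the empty pieces.
    parts = [p for p in re.split(r"[<>]", hunpos_tag) if p]
    flat = []
    i = 0
    while i < len(parts):
        part = parts[i]
        if part == "CAS" and i + 1 < len(parts):
            flat.append(parts[i + 1])            # CAS<ILL> -> ILL
            i += 2
        elif part == "PERS" and i + 1 < len(parts):
            flat.append("PERS" + parts[i + 1])   # PERS<1> -> PERS1
            i += 2
        else:
            flat.append(RENAME.get(part, part))
            i += 1
    return flat


def to_stream(surface, lemma, hunpos_tag):
    """Build one lexical unit in the ^surface/lemma<tag>...$ form."""
    tags = "".join("<%s>" % t for t in flatten_tag(hunpos_tag))
    return "^%s/%s%s$" % (surface, lemma, tags)


print(to_stream("moziba", "mozi", "NOUN<CAS<ILL>>"))
print(to_stream("ülj", "ül", "VERB<SUBJUNC-IMP><PERS<2>>"))
```

Keeping the renamings in a table means a change of tagset only touches `RENAME`, which is the same idea as the search/replace on the .aff and .dic files mentioned earlier.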
- How to switch to an 8 bit terminal (or how to convert the aff/dic files)? Muki987 21:06, 12 April 2009 (UTC)
- In gnome-terminal, you can go to "Terminal -> Change character encoding". To convert the aff/dic files I'd use iconv, e.g.
$ cat morphdb_hu.aff | iconv -f latin2 -t utf-8 > morphdb_hu.aff.u8
- - Francis Tyers 21:18, 12 April 2009 (UTC)
- Actually, I just tried this and it doesn't seem to work. - Francis Tyers 22:41, 12 April 2009 (UTC)
- Got the solution: download the last hu_HU.aff and hu_HU.dic from magyarispell.sf.net, and they work fine:
en@anonymous:~/program/humorph$ ls
analyze hu_HU.aff morphdb_hu.aff morphdb_hu.dic
chmorph hu_HU.dic morphdb_hu.aff.u8 morphdb_hu.dic.u8
en@anonymous:~/program/humorph$ echo program | ./chmorph hu_HU.aff hu_HU.dic /dev/stdin NOM ACC
programot
en@anonymous:~/program/humorph$ echo programot | ./analyze hu_HU.aff hu_HU.dic /dev/stdin
> programot
analyze(programot) = st:program po:noun ts:NOM is:ACC
stem(programot) = program
- Could you give me a direct link to those files? I couldn't find them — my Hungarian really is non-existent :( - Francis Tyers 20:44, 16 April 2009 (UTC)
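On the iconv attempt above: the thread does not say why the converted file failed, so the following is only a guess at one thing worth checking. Hunspell affix files declare their character set in a "SET" line, and a file whose bytes are converted to UTF-8 while that line still names the old encoding is inconsistent. A minimal Python sketch that converts the bytes and rewrites the declaration together (the function name and the sample data are made up for illustration):

```python
import re


def convert_aff_text(raw_bytes, source_encoding="iso8859_2"):
    """Decode affix-file bytes and rewrite the SET line to declare UTF-8."""
    text = raw_bytes.decode(source_encoding)
    # The affix file names its character set on a line of the form "SET <name>".
    text = re.sub(r"(?m)^SET\s+\S+", "SET UTF-8", text, count=1)
    return text.encode("utf-8")


# Tiny made-up sample, not taken from the real morphdb_hu.aff.
sample = "SET ISO8859-2\nTRY őű\n".encode("iso8859_2")
converted = convert_aff_text(sample).decode("utf-8")
print(converted)
```

The matching .dic file would need the same byte conversion, and other encoding-dependent entries in the real files may need attention too.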
- The slowness arises because the affix tables and the flags on the .dic words are much more complicated than for prefix languages like English, Spanish, etc. Reading the data in is much slower, but once loaded it works at the same speed. For example, Hungarian spell checking runs at the same speed as German or English spell checking; only start-up is slower for Hungarian dictionaries.
- Muki987 20:01, 15 April 2009 (UTC)
- Ok, I'll check this out when I get home. Is there a binary format for the .aff/.dic files? Perhaps this might speed it up? - Francis Tyers 08:16, 16 April 2009 (UTC)
- No. Even if there were, the amount of data is much higher for Hungarian, than for prefix languages. To read in 55 MB is slower than to read in 3 MB. Muki987 08:42, 16 April 2009 (UTC)
- Yes, but there are ways of reducing the size. For example, a compiled Apertium dictionary typically takes up ~10% of the size due to the re-use of suffixes. E.g. if you have 'house +s -> house +n+pl', 'computer +s -> computer +n+pl', 'cat +s -> cat +n+pl' you only need to store the part '+s -> +n+pl' once. Also, parsing binary files can be quicker than parsing text files. - Francis Tyers 09:02, 16 April 2009 (UTC)
- Yes, I agree with you. I shall propose adding an optional translation phase to the hunmorph group: for example, after reading everything in, write all structures to a file ("freeze" them), and the next time the program is called simply read in the frozen data. Muki987 09:13, 16 April 2009 (UTC)
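The suffix re-use described above can be sketched as follows. This is a toy illustration, not the lttoolbox format: the paradigm name is invented, and the singular entry ('' -> +n+sg) is an assumption added so that the bare stem also gets an analysis.

```python
# One shared paradigm: surface suffix -> analysis tags.
# The "+s -> +n+pl" part is stored once, however many stems use it.
PARADIGMS = {
    "regular_noun": {"": "+n+sg", "s": "+n+pl"},
}

# Each stem only stores a reference to its paradigm.
STEMS = {
    "house": "regular_noun",
    "computer": "regular_noun",
    "cat": "regular_noun",
}


def analyse(word):
    """Return all analyses of word as stem plus tags."""
    results = []
    for stem, paradigm in STEMS.items():
        if word.startswith(stem):
            suffix = word[len(stem):]
            tags = PARADIGMS[paradigm].get(suffix)
            if tags is not None:
                results.append(stem + tags)
    return results


print(analyse("cats"))
print(analyse("computers"))
```

Adding a thousand more regular nouns adds a thousand short stem entries but no new suffix rules, which is where the size saving comes from.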
Speed
Having checked the chmorph source, I see no need for speed optimization. The program creates a hunspell class and uses its generate method to get the morphological data. We, as the translator application, can simply link libhunspell to our application, create the hunspell class once (that's expensive = slow), and then use that class for our translation procedure until we are finished. What do you think? Muki987 20:15, 16 April 2009 (UTC)
- It is fine but it will be better for translating long documents than short ones, as the apertium pipeline loads the data each time. PS. I got Matxin to build!! - Francis Tyers 20:29, 16 April 2009 (UTC)
- I use hunspell every day for all kinds of documents, with no load slowness problem at all. We shall see when we work with the translation system, I am optimistic. Muki987 20:51, 16 April 2009 (UTC)
- Ok, great. - Francis Tyers 22:11, 22 April 2009 (UTC)
- I saw, you documented Matxin, I'll follow the suggestions asap. Thanks! Muki987 20:51, 16 April 2009 (UTC)
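The "create once, then reuse" idea from this section, as a minimal Python sketch. The class below is a hypothetical stand-in and does not call libhunspell; it only counts how often the expensive construction happens, which is the point being discussed.

```python
class SlowAnalyzer:
    """Stand-in for an analyser that is expensive to construct (hypothetical)."""

    loads = 0  # how many times the "dictionary" has been loaded

    def __init__(self):
        # In a real program the affix and dictionary files would be read here.
        SlowAnalyzer.loads += 1

    def analyse(self, word):
        # Placeholder analysis: just lower-case the word.
        return word.lower()


def translate_all(sentences):
    """Analyse every sentence with a single analyser instance."""
    analyzer = SlowAnalyzer()  # built once, before the loop
    return [[analyzer.analyse(w) for w in s.split()] for s in sentences]


result = translate_all(["Jó kocsiban utazott", "John látta Marthát"])
print(result)
print(SlowAnalyzer.loads)
```

As noted in the reply above, this helps most for long documents; a pipeline that starts a new process for every short text still pays the load cost each time.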
POS taggers
I am looking for the tagger best suited to English-Hungarian and German-Hungarian. I tested treetagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/), freeling and apertium. Treetagger is not open source and fails to show lemmas for German; however, at the moment it is the only working German tagger. It is very fast.
- I think there are corpora tagged for German, so it shouldn't be hard to train any tagger. I'll have to go looking. There are also morphological resources for German lying around somewhere, it is just a matter of changing them to Apertium or Freeling format. - Francis Tyers 22:06, 22 April 2009 (UTC)
- Also see apertium-de-en in the incubator. There is a dictionary partially converted, morphological analysis for German here. - Francis Tyers 22:13, 22 April 2009 (UTC)
In my opinion apertium gives a more stable and more detailed analysis than freeling.
- This was because you were using freeling "tagged" output, but apertium "analysed" output. See below. - Francis Tyers 22:06, 22 April 2009 (UTC)
Freeling
Henry and Martha went home when they saw an aeroplane fly through the sky. Henry henry NN and and CC Martha martha NP went go VBD home home NN when when NN they they PRP saw see VBD an an DT aeroplane aeroplane NN fly fly VBP through through IN the the DT sky sky NN . . Fp ** OutputFormat=morfo in en.cfg ** $ echo "Henry and Martha went home when they saw an aeroplane fly through the sky." | ./analyzer -f ../../data/config/en.cfg Henry henry NNP 1 and and CC 0.999723 and JJ 0.000163167 and NN 0.000114217 Martha martha NNP 1 went go VBD 0.499505 wend VBD 0.499505 wend VBN 0.00098912 home home NN 0.953125 home RB 0.046875 when when WRB 0.999667 when IN 0.000333111 they they PRP 1 saw see VBD 0.947674 saw NN 0.0523256 an an DT 1 aeroplane aeroplane NN 1 fly fly VB 0.698718 fly VBP 0.211538 fly NN 0.0833333 f RB 0.00641026 through through IN 0.2 through JJ 0.2 through RB 0.2 through 0.2 RP RP 0.2 the the DT 0.9998 the JJ 0.000121792 the NN 3.72143e-05 the VB 2.02987e-05 the VBP 2.02987e-05 sky sky NN 1 . . Fp 1 ** OutputFormat=tagged in en.cfg ** $ echo "Henry and Martha went home when they saw an aeroplane fly through the sky." | ./analyzer -f ../../data/config/en.cfg Henry henry NNP 1 and and CC 0.999723 Martha martha NNP 1 went go VBD 0.499505 wend VBD 0.499505 home home NN 0.953125 when when WRB 0.999667 they they PRP 1 saw see VBD 0.947674 an an DT 1 aeroplane aeroplane NN 1 fly fly VB 0.698718 through RP RP 0.2 the the DT 0.9998 sky sky NN 1 . . Fp 1 ** OutputFormat=dep in en.cfg ** echo "Henry and Martha went home when they saw an aeroplane fly through the sky." 
| ./analyzer -f ../../data/config/en.cfg grup-n/top/(Henry henry NNP -) [ CC(and)/modnorule/(and and CC -) grup-n/modnorule/(Martha martha NNP -) verb/modnorule/(went go VBD -) grup-n/modnorule/(home home NN -) WRB(when)/modnorule/(when when WRB -) PRP(they)/modnorule/(they they PRP -) verb/modnorule/(saw see VBD -) sn/modnorule/(aeroplane aeroplane NN -) [ DT/modnorule/(an an DT -) ] verb/modnorule/(fly fly VB -) RP(through)/modnorule/(through RP RP -) sn/modnorule/(sky sky NN -) [ DT/modnorule/(the the DT -) ] Fp(.)/modnorule/(. . Fp -) ]
Apertium
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-anmor /tmp/x ^Henry/Henry<np><ant><m><sg>$ ^and/and<cnjcoo>$ ^Martha/Martha<np><ant><f><sg>$ ^went/go<vblex><past>$ ^home/home<adv>/home<n><sg>$^,/,<cm>$ ^when/when<cnjadv>/when<adv><itg>$ ^they/prpers<prn><subj><p3><mf><pl>$ ^saw/saw<n><sg>/saw<vblex><inf>/saw<vblex><pres>/see<vblex><past>$^,/,<cm>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><tn><mf><sg>/that<rel><an><mf><sp>$ ^an/a<det><ind><sg>$ ^aeroplane/*aeroplane$ ^flied/*flied$ ^on/on<adv>/on<pr>$ ^the/the<det><def><sp>$ ^sky/sky<n><sg>$^./.<sent>$^./.<sent>$
$ echo "Henry and Martha went home when they saw an aeroplane fly through the sky." | lt-proc en-ca.automorf.bin ^Henry/Henry<np><ant><m><sg>$ ^and/and<cnjcoo>$ ^Martha/Martha<np><ant><f><sg>$ ^went/go<vblex><past>$ ^home/home<adv>/home<n><sg>$ ^when/when<cnjadv>/when<adv><itg>$ ^they/prpers<prn><subj><p3><mf><pl>$ ^saw/saw<n><sg>/saw<vblex><inf>/saw<vblex><pres>/see<vblex><past>$ ^an/a<det><ind><sg>$ ^aeroplane/aeroplane<n><sg>$ ^fly/fly<n><sg>/fly<vblex><inf>/fly<vblex><pres>$ ^through/through<pr>$ ^the/the<det><def><sp>$ ^sky/sky<n><sg>$^./.<sent>$ $ echo "Henry and Martha went home when they saw an aeroplane fly through the sky." | lt-proc en-ca.automorf.bin | apertium-tagger -g en-ca.prob ^Henry<np><ant><m><sg>$ ^and<cnjcoo>$ ^Martha<np><ant><f><sg>$ ^go<vblex><past>$ ^home<adv>$ ^when<adv><itg>$ ^prpers<prn><subj><p3><mf><pl>$ ^see<vblex><past>$ ^a<det><ind><sg>$ ^aeroplane<n><sg>$ ^fly<n><sg>$ ^through<pr>$ ^the<det><def><sp>$ ^sky<n><sg>$^.<sent>$
The question
For me apertium seems to be the better one. Do you know any example, where freeling does a better job? Thanks, Muki987 21:07, 22 April 2009 (UTC)
- The benefit with freeling is that it does parsing too.. see above examples. - Francis Tyers 22:02, 22 April 2009 (UTC)
- Is your en.cfg file of freeling the standard one I use? If not, could you please email it to me? Thanks. Muki987 22:43, 22 April 2009 (UTC)
- Yes, the standard one, just changing the output format. Francis Tyers 22:48, 22 April 2009 (UTC)
If there are differences between my version and your version, you can try changing the tagger from:
#### Tagger options
Tagger=relax
to:
#### Tagger options
Tagger=hmm
- Francis Tyers 07:33, 23 April 2009 (UTC)
dependency
Why should I have taken a raincoat and an umbrella while my aunt, living in Georgia, told, it will be nice weather. grup-n/top/(Why why NN) [ MD(should)/modnorule/(should should MD) NP(i)/modnorule/(I i NP) verb/modnorule/(taken take VBN) [ VB*<have>/modnorule/(have have VBP) ] IN(a)/modnorule/(a a IN) grup-n/modnorule/(raincoat raincoat NN) CC(and)/modnorule/(and and CC) sn/modnorule/(umbrella umbrella NN) [ DT/modnorule/(an an DT) ] IN(while)/modnorule/(while while IN) PP$(my)/modnorule/(my my PP$) grup-n/modnorule/(aunt aunt NN) Fc(,)/modnorule/(, , Fc) verb/modnorule/(living live VBG) IN(in)/modnorule/(in in IN) NP(georgia)/modnorule/(Georgia georgia NP) Fc(,)/modnorule/(, , Fc) verb/modnorule/(told tell VBD) Fc(,)/modnorule/(, , Fc) grup-n/modnorule/(it it NN) verb/modnorule/(be be VBP) [ MD/modnorule/(will will MD) ] grup-n/modnorule/(weather weather NN) [ adj/modnorule/(nice nice JJ) ] Fp(.)/modnorule/(. . Fp) ]
I get the above output when running freeling from my console. However, http://garraf.epsevg.upc.es/freeling/demo.php shows a very different picture. For example, there the top is "tell", "Why" is of type WRB, and the func values are cmod, ncmod, ..., while in my output func is almost always modnorule. How can I get the same output on my console as on the web? Muki987 08:12, 23 April 2009 (UTC)
- Which version are you using, SVN or 2.0? When I put the example below into the web interface I get the same result. - Francis Tyers 08:20, 23 April 2009 (UTC)
$ echo "Why should I have taken a raincoat and an umbrella while my aunt, living in Georgia, said that the weather would be nice." | \ ./analyzer -f ../../data/config/en.cfg DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. WRB(why)/top/(Why why WRB -) [ MD(should)/modnorule/(should should MD -) PRP(i)/modnorule/(I i PRP -) verb/modnorule/(taken take VBN -) [ VB*<have>/modnorule/(have have VBP -) ] Z(a)/modnorule/(a 1 Z -) grup-n/modnorule/(raincoat raincoat NN -) CC(and)/modnorule/(and and CC -) sn/modnorule/(umbrella umbrella NN -) [ DT/modnorule/(an an DT -) ] IN(while)/modnorule/(while while IN -) PRP$(my)/modnorule/(my my PRP$ -) grup-n/modnorule/(aunt aunt NN -) Fc(,)/modnorule/(, , Fc -) verb/modnorule/(living live VBG -) IN(in)/modnorule/(in in IN -) grup-n/modnorule/(Georgia georgia NNP -) Fc(,)/modnorule/(, , Fc -) verb/modnorule/(said say VBD -) IN(that)/modnorule/(that that IN -) sn/modnorule/(weather weather NN -) [ DT/modnorule/(the the DT -) ] verb/modnorule/(be be VB -) [ MD/modnorule/(would will MD -) ] adj/modnorule/(nice nice JJ -) Fp(.)/modnorule/(. . Fp -) ]
I use Freeling 1.5. Please copy your console output for this sentence to this page. Muki987 08:26, 23 April 2009 (UTC)
- For which sentence? I'm using Freeling 2.0 - Francis Tyers 08:32, 23 April 2009 (UTC)
- Why should I have taken a raincoat and an umbrella, when my aunt, living in Georgia, told, it will be nice weather. Muki987 08:46, 23 April 2009 (UTC)
- Here it is, although it isn't really grammatical English. - Francis Tyers 08:50, 23 April 2009 (UTC)
$ echo "Why should I have taken a raincoat and an umbrella, when my aunt, living in Georgia, told, it will be nice weather." | \ ./analyzer -f ../../data/config/en.cfg DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. WRB(why)/top/(Why why WRB -) [ MD(should)/modnorule/(should should MD -) PRP(i)/modnorule/(I i PRP -) verb/modnorule/(taken take VBN -) [ VB*<have>/modnorule/(have have VBP -) ] Z(a)/modnorule/(a 1 Z -) grup-n/modnorule/(raincoat raincoat NN -) CC(and)/modnorule/(and and CC -) sn/modnorule/(umbrella umbrella NN -) [ DT/modnorule/(an an DT -) ] Fc(,)/modnorule/(, , Fc -) WRB(when)/modnorule/(when when WRB -) PRP$(my)/modnorule/(my my PRP$ -) grup-n/modnorule/(aunt aunt NN -) Fc(,)/modnorule/(, , Fc -) verb/modnorule/(living live VBG -) IN(in)/modnorule/(in in IN -) grup-n/modnorule/(Georgia georgia NNP -) Fc(,)/modnorule/(, , Fc -) verb/modnorule/(told tell VBD -) Fc(,)/modnorule/(, , Fc -) PRP(it)/modnorule/(it it PRP -) verb/modnorule/(be be VB -) [ MD/modnorule/(will will MD -) ] grup-n/modnorule/(weather weather NN -) [ adj/modnorule/(nice nice JJ -) ] Fp(.)/modnorule/(. . Fp -) ]
Yes, exactly. The same as for freeling 1.5. If we compare this to the web screen output, we see that it is very different. For example, here all words are modnorule except the top one, while on the web screen there are different func types.
- It could be a difference between SVN (or Freeling 2.1) and Freeling 2.0. The person to ask would be Lluís Padró, I've emailed you his email address. - Francis Tyers 09:07, 23 April 2009 (UTC)
Also the tree itself is completely different. Top of it is "told" on web, here "why". And so on.... Why that? Muki987 09:00, 23 April 2009 (UTC)
- I'm not sure, it could have something to do with it not being a grammatical sentence in English ? - Francis Tyers 09:07, 23 April 2009 (UTC)
- OK. Change it to a grammatical one, and you still see the big differences. Why? Muki987 09:22, 23 April 2009 (UTC)
I entered this same question on freeling forum, however it seems to be dead.
http://garraf.epsevg.upc.es/freeling/index.php?option=com_simpleboard&Itemid=55&func=view&catid=3&id=883#883 Muki987 09:22, 23 April 2009 (UTC)
- They sometimes take a while to reply. I think it might be a holiday up there too (La Diada de Sant Jordi) - Francis Tyers 09:51, 23 April 2009 (UTC)
A more standard rendering of the sentence below:
$ echo "Why should I have taken a raincoat and an umbrella? My aunt who lives in Georgia said that the weather would be nice." | \ ./analyzer -f ../../data/config/en.cfg DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. DEPENDENCIES: NO HEAD Found!!! Check your chunking grammar and your dependency-building rules. WRB(why)/top/(Why why WRB -) [ MD(should)/modnorule/(should should MD -) PRP(i)/modnorule/(I i PRP -) verb/modnorule/(taken take VBN -) [ VB*<have>/modnorule/(have have VBP -) ] Z(a)/modnorule/(a 1 Z -) grup-n/modnorule/(raincoat raincoat NN -) CC(and)/modnorule/(and and CC -) sn/modnorule/(umbrella umbrella NN -) [ DT/modnorule/(an an DT -) ] Fit(?)/modnorule/(? ? Fit -) ] PRP$(my)/top/(My my PRP$ -) [ grup-n/modnorule/(aunt aunt NN -) WP(who)/modnorule/(who who WP -) verb/modnorule/(lives live VBZ -) IN(in)/modnorule/(in in IN -) grup-n/modnorule/(Georgia georgia NNP -) verb/modnorule/(said say VBD -) IN(that)/modnorule/(that that IN -) sn/modnorule/(weather weather NN -) [ DT/modnorule/(the the DT -) ] verb/modnorule/(be be VB -) [ MD/modnorule/(would will MD -) ] adj/modnorule/(nice nice JJ -) Fp(.)/modnorule/(. . Fp -) ]
Although the analysis is still a bit of a mystery. They both seem to come out fine in the web interface. - Francis Tyers 10:00, 23 April 2009 (UTC)
The SVN gives:
$ echo "Why should I have taken a raincoat and an umbrella, my aunt who lives in Georgia said that the weather would be nice." | \ ./analyzer -f ~/source/FREELING/local/share/FreeLing/config/en.cfg sub-cl/top/(Why why WRB -) [ mod-chunk/modnomatch/(should should MD -) sv/cmod/(taken take VBN -) [ vb-have/aux/(have have VBP -) sn-chunk/ncsubj/(I i PRP -) sn-coor/dobj/(and and CC -) [ sn-chunk/conj/(raincoat raincoat NN -) [ DT/det/(a a DT -) ] sn-chunk/conj/(umbrella umbrella NN -) [ DT/det/(an a DT -) ] ] ] sf-brk/modnomatch/(, , Fc -) sn-chunk/modnomatch/(aunt aunt NN -) [ PRP$/ncmod-poss/(my my PRP$ -) rel-cl/cmod/(who who WP -) [ rel/ccomp/(lives live VBZ -) [ sp-chunk/ncmod/(in in IN -) [ sv/cmod/(said say VBD -) [ n-chunk/ncsubj/(Georgia georgia NNP -) ] ] ] ] ] sub-cl/modnomatch/(that that IN -) [ sv/cmod/(be be VB -) [ mod-chunk/aux/(would would MD -) sn-chunk/ncsubj/(weather weather NN -) [ DT/det/(the the DT -) ] attrib/ncmod/(nice nice JJ -) ] ] st-brk/modnomatch/(. . Fp -) ]
Which is much better. - Francis Tyers 11:06, 23 April 2009 (UTC)
The equivalent apertium output (although slightly mangled for translation en→ca) would be:
^Adv<adv><itg>{^why<adv><itg>$}$ ^inf<SV><inf><PD><ND>{^should<3>$}$ ^prnsubj<SN><p1><mf><sg>{^prpers<prn><p1><mf><sg>$}$ ^have_pp<SV><vblex><pri><PD><ND>{^have<vbhaver><3><4><5>$ ^take<vblex><pp><m><sg>$}$ ^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^coat<n><4>$}$ ^cnj<cnjcoo>{^and<cnjcoo>$}$ ^det_nom<SN><DET><GD><sg>{^a<det><ind><3><4>$ ^umbrella<n><4>$}$ ^coma<cm>{^,<cm>$}$ ^det_nom<SN><DET><GD><sg>{^my<det><pos><3><4><sp>$ ^aunt<n><4>$}$ ^reladj<REL><an><mf><sp>{^who<rel><an><3><4>$}$ ^verbcj<SV><vblex><pri><p3><sg>{^live<vblex><3><4><5>$}$ ^pr<PREP>{^in<pr>$}$ ^nom<SN><UNDET><sg>{^Georgia<np><loc><4>$}$ ^verbcj_perif<SV><reporting><ifip><PD><ND>{^anar<vaux><4><5>$ ^say<vblex><inf>$}$ ^cnj<cnjsub>{^that<cnjsub>$}$ ^det_nom<SN><DET><GD><sg>{^the<det><def><3><4><sp>$ ^weather<n><4>$}$ ^verbcj<SV><vbser><cni><PD><ND>{^be<vbser><3><4><5>$}$ ^adj<SA><GD><ND>{^nice<adj><2><3>$}$ ^punt<sent>{^.<sent>$}$
We can actually collapse det_nom cnj det_nom into e.g. det_nom_cnj_det_nom, but probably collapsing the relatives would be harder. The benefit of the FreeLing output is that 'my aunt who lives in Georgia' is expressed as one chunk that can be moved. Both ways have their benefits, for hu→en I'd go with Apertium and for en→hu probably FreeLing/Matxin or a hybrid of Apertium/Matxin. - Francis Tyers 11:27, 23 April 2009 (UTC)
Versions
I do not understand why you are using freeling 2.0 or 2.1 when 1.5 is clearly the version suggested for matxin.
I also do not understand how there can be such fundamental differences between freeling 2.1 (the svn version) and 1.5, which I use. I get the following:
Why should I have taken a raincoat and an umbrella, my aunt who lives in Georgia said that the weather would be nice. grup-n/top/(Why why NN) [ MD(should)/modnorule/(should should MD) NP(i)/modnorule/(I i NP) verb/modnorule/(taken take VBN) [ VB*<have>/modnorule/(have have VBP) ] IN(a)/modnorule/(a a IN) grup-n/modnorule/(raincoat raincoat NN) CC(and)/modnorule/(and and CC) sn/modnorule/(umbrella umbrella NN) [ DT/modnorule/(an an DT) ] Fc(,)/modnorule/(, , Fc) PP$(my)/modnorule/(my my PP$) grup-n/modnorule/(aunt aunt NN) WP(who)/modnorule/(who who WP) verb/modnorule/(lives live VBZ) IN(in)/modnorule/(in in IN) NP(georgia)/modnorule/(Georgia georgia NP) verb/modnorule/(said say VBD) IN(that)/modnorule/(that that IN) sn/modnorule/(weather weather NN) [ DT/modnorule/(the the DT) ] verb/modnorule/(be be VBP) [ MD/modnorule/(would would MD) ] adj/modnorule/(nice nice JJ) Fp(.)/modnorule/(. . Fp) ]
This is fundamentally different from your output. If 1.5 is the valid version for matxin, why are you using the svn version of freeling? Muki987 13:00, 23 April 2009 (UTC)
- The problem is that the people who develop Matxin are shy of committing to their SVN repository. For internal development they are using freeling 2.0/2.1 and lttoolbox 3.1, just they haven't committed it to their SVN yet (you can see the last commit is from sometime in November!). So, for testing we should try the version of FreeLing that they are using. I've sent them an email asking if they can send us a snapshot of what they have locally. - Francis Tyers 13:19, 23 April 2009 (UTC)
- PS. Lluís responded on the FreeLing forum. - Francis Tyers 13:37, 23 April 2009 (UTC)
- The point is that a really good dependency tagger sees that the sentence has 3 almost independent parts:
1. Why should I have taken a raincoat and an umbrella (where raincoat and umbrella belong together)
2. My aunt lives in Georgia, told
3. It will be nice weather
Muki987 11:54, 24 April 2009 (UTC)
- Yep, exactly :) - Francis Tyers 12:22, 24 April 2009 (UTC)
Dog us
I got it here: http://sujitpal.blogspot.com/2008/11/ir-math-in-java-hmm-based-pos.html
Do you know the expression: "Failure will dog us"? Does this mean, failure will follow us?
Thanks, Muki987 10:42, 25 April 2009 (UTC)
- Yes, this is an expression that can be used. It means that "Failure will follow us and cause us trouble". You can probably disambiguate with a rule that says "choose infinitive if -1 modal and +1 personal pronoun". - Francis Tyers 10:45, 25 April 2009 (UTC)
- Thanks. Seems to be used quite seldom in real life. Shows, that dog can be a verb, if used in a verb environment. He dogs a cat, for example. Or the dogs dogged a cat. Or the wolves dogged the rabbit. Or my big mistake dogged me 10 years long. At least the blogspot indicates this. Muki987 11:06, 25 April 2009 (UTC)
Yes, but outside certain fixed semantic environments, it sounds odd.
*He dogs a cat
*The dogs dogged a cat
The wolves dogged the rabbit.
*My big mistake dogged me for 10 years long.
My mistake dogged me for ten years.
The ones marked with '*' are not wrong syntactically, but I would say they sound very strange. - Francis Tyers 11:14, 25 April 2009 (UTC)
- What is wrong with
The wolves dogged the rabbit. My mistake dogged me for ten years.
?
- How to write them syntactically correctly? Muki987 12:21, 25 April 2009 (UTC)
- No those are correct, I mean they sound good. - Francis Tyers 12:33, 25 April 2009 (UTC)
- Thanks, Muki987 12:36, 25 April 2009 (UTC)
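The rule suggested above ("choose infinitive if -1 modal and +1 personal pronoun") can be sketched roughly as follows. This is only an illustration in Python, not the syntax of any real disambiguation tool; the word lists, the function name `disambiguate` and the use of the Penn-style tags `VB` (base form of the verb) and `NN` (noun) are assumptions made for the example.

```python
# Hypothetical word lists; a real system would take these from its lexicon.
MODALS = {"will", "would", "can", "could", "shall", "should", "may", "might", "must"}
PERSONAL_PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "it",
                     "we", "us", "they", "them"}

def disambiguate(tokens, readings):
    """Apply the rule: keep only the VB reading of an ambiguous token
    when the previous word is a modal and the next word is a personal pronoun.

    tokens   -- list of word forms
    readings -- dict mapping a token index to its set of candidate tags
    """
    chosen = {}
    for i, candidates in readings.items():
        prev_is_modal = i > 0 and tokens[i - 1].lower() in MODALS
        next_is_pronoun = (i + 1 < len(tokens)
                           and tokens[i + 1].lower() in PERSONAL_PRONOUNS)
        if "VB" in candidates and prev_is_modal and next_is_pronoun:
            chosen[i] = {"VB"}           # rule fires: verb reading wins
        else:
            chosen[i] = set(candidates)  # rule does not apply: stay ambiguous
    return chosen

tokens = ["Failure", "will", "dog", "us"]
readings = {2: {"NN", "VB"}}             # "dog" can be a noun or a verb
chosen = disambiguate(tokens, readings)
print(chosen)
```

In "Failure will dog us" the rule fires and only the verb reading is left; in "The dog barked" neither context condition holds, so both readings are kept for some other rule to decide.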
Fixes
It looks great. I noticed a couple of issues, although I haven't tested it; try making these edits and see how the result turns out. - Francis Tyers 05:36, 30 April 2009 (UTC)
Hi regan,
The expressions semi-crazed rants and you are exactly the wrong sort of person to work with translation in any form are your expressions, you try to apply to others. regan, you think, something entitles you to use that kind of language?
regan, your semi-crazed rants in all subjects are rather primitive, and not amusing at all.
What do you think, who you are, that you would like to decide, who may say what? You believe, you are a soviet commissar? Or a kapo of the barrack? Have you let check your mental state? If you behave like that, you, regan are exactly the wrong sort of person to work with translation in any form.
Of course, present language killers and people idiotizers on the TV screens and in radios, newspapers and magazines written by idiots, foreign advertisers and similar state-supported criminals try to push foreign words, which is not good for any language, and makes tools, like wordnet necessary.
19:00, 17 May 2009 (UTC)
Dep analysis
I try to dependency analyse this sentence, since it is complicated enough:
- 1. I think that if you have an agenda that you want to push of this kind, then you are exactly the wrong sort of person to work with translation in any form.
While looking at that, I think it is grammatically incorrect and erroneous.
- 2. I think that if you have an agenda that you want to push of this kind, then...
Should not that be:
- 3. I think that if you have an agenda of this kind, that you want to push, then... ?
Are both 2 and 3 correct, or is 3 wrong? In other words, is "push of" a valid structure?
- Both variants sound ok to my ears, although I would say:
- "I think that if you have an agenda of this kind that you want to push, ..."
- (without extra comma) - Francis Tyers 16:10, 20 May 2009 (UTC)
If the sentence is no good in the first form, is it still understandable?
of this kind is an attribute of the agenda. Let's assume, the attribute is blue.
- 4. I think that if you have a blue agenda, that you want to push, then...
- 5. I think that if you have an agenda that you want to push blue, then...
- 6. I think that if you have an agenda that you want to push, and that is blue, then...
I changed "of this kind" to "blue". Is 5 in that form not completely bad? Is 6 ok? Does "of this kind" implicitly say: and that is of this kind?
Dependency analysis shows very different results, and if version 2 is correct, I report that to freeling- otherwise not.
If you have an agenda of this kind, that you want to push, then you are good.

 sub-adv/top/(If if IN -) [
   sv/modnorule/(have have VBP -) [
     sn-chunk/ncsubj/(you you PRP -)
     sn-chunk/dobj/(agenda agenda NN -) [
       DT/det/(an a DT -)
     ]
   ]
   sp-chunk/modnorule/(of of IN -) [
     sn-chunk/dobj/(kind kind NN -) [
       DT/det/(this this DT -)
     ]
   ]
   sf-brk/modnorule/(, , Fc -)
   rel-cl/modnorule/(that that WDT -) [
     rel/ccomp/(want want VBP -) [
       sn-chunk/ncsubj/(you you PRP -)
       sp-chunk/ncmod/(to to TO -) [
         sv/cmod/(push push VB -)
       ]
     ]
   ]
   sf-brk/modnorule/(, , Fc -)
   claus/modnorule/(are be VBP -) [     <----------------------------- are
     adv/cmod/(then then RB -)
     sn-chunk/ncsubj/(you you PRP -)
     attrib/ncmod/(good good JJ -)
     st-brk/ta/(. . Fp -)
   ]
 ]

If you have an agenda that you want to push of this kind, then you are good.

 sub-adv/top/(If if IN -) [
   sv/modnorule/(have have VBP -) [
     sn-chunk/ncsubj/(you you PRP -)
     sn-chunk/dobj/(agenda agenda NN -) [
       DT/det/(an a DT -)
     ]
   ]
   claus/modnorule/(are be VBP -) [     <---------------------------- are
     sub-cl/modnomatch/(that that IN -) [
       sv/modnorule/(want want VBP -) [
         sn-chunk/ncsubj/(you you PRP -)
         sp-chunk/ncmod/(to to TO -) [
           sv/cmod/(push push VB -) [
             sp-chunk/ncmod/(of of IN -) [
               sn-chunk/dobj/(kind kind NN -) [
                 DT/det/(this this DT -)
               ]
             ]
           ]
         ]
       ]
       sf-brk/modnorule/(, , Fc -)
     ]
     adv/cmod/(then then RB -)
     sn-chunk/ncsubj/(you you PRP -)
     attrib/ncmod/(good good JJ -)
     st-brk/ta/(. . Fp -)
   ]
 ]
In my opinion the second case is misinterpreted. What do you think?Muki987 19:05, 20 May 2009 (UTC)
- It is wrong, it should attach to the direct object. But on the other hand, the wording is strange, and although I'd say it was grammatically ok, it does sound weird. If this were a linguistic example I'd label it with
?
- Francis Tyers 20:32, 20 May 2009 (UTC)
Thanks, then I had better not report it now; there are much more important things in dep analysis than weird expressions. Maybe later. Muki987 21:16, 20 May 2009 (UTC)
Wiki
Not sure what you mean. I've changed the image you added to the Documentation of Matxin page so it fits better. - Francis Tyers 09:04, 26 May 2009 (UTC)
- Go into en.wikipedia.org, select a random article, select the edit tab. You will see that in the upper left of the window there are numerous images, for example for [[word]], to sign an article, to insert an image, to select bold or italics. If still not clear, I can insert a screenshot. Muki987 09:20, 26 May 2009 (UTC)
- Aha, ok, let me see if I can remember where to change that :) - Francis Tyers 09:52, 26 May 2009 (UTC)
- Fixed after a bit of faffing. - Francis Tyers 10:16, 26 May 2009 (UTC)
- Thanks a million, now it's fun to edit. :-) Muki987 11:33, 26 May 2009 (UTC)
Documentation of Matxin
Hi, I would prefer that the images not be so large, especially when they are rather garish. If you like I will remake the images, but I can't do it until e.g. 22nd June. Could you please put in the original Spanish for the Generation section? - Francis Tyers 08:50, 27 May 2009 (UTC)
- You mean with more decent colors? That's OK for me, but a small picture is unusable. I work with it.
If you insist on small ones, let me know, then I set up a private matxin page for myself.
Here is the Spanish text:
3.3. Formato tras generación
Los cambios más importantes son la reordenación por medio del valor recalculado para el atributo ord, y la generación morfológica de ciertos nodos (edun ->ditudalako, patata -> patatak).
El resultado es la frase “At entatu hirukoitz batek Bagad astintzen du” Muki987 09:38, 27 May 2009 (UTC)
- Ok, translated it. Regarding the images, I've made them a bit larger, and given a re-working I think the text could be clearly visible at this size. - Francis Tyers 11:03, 27 May 2009 (UTC)
of
Is usage of "of" like "This is the house of Peter and of Martha" grammatically OK? (Ez Peter háza és Martha. Das ist das Haus von Peter und von Martha.) Or is only the form "This is the house of Peter and Martha" correct? (Ez Peter és Martha háza. Das ist das Haus von Peter und Martha.)
- Both are ok, I would probably say in speech "This is Peter and Martha's house", but consider e.g. "The Department of Health and Social Security". I would say single use of 'of' is more typical, but I wouldn't mark the first as ungrammatical. PS. I made some changes to your test sentences, I hope they are ok. - Francis Tyers 10:10, 2 June 2009 (UTC)
Thanks.
Is "This is Peter's and Martha's house" also correct, just unusual?
"This is Henry, good old Peter, little Otto and Martha's house" is the way it would be used? (Not "Henry's, good old Peter's, little Otto's and Martha's house"?)
- Both are fine, probably the former is more frequent. - Francis Tyers 10:45, 2 June 2009 (UTC)
- my
I saw in the web "My father and son went together to Spain", however someone told, this is very unusual, maybe it was just written by non-English person? Usual is: "My father and my son ..." (in google).
- Both are fine, although probably we'd duplicate the possessive here for disambiguation. - Francis Tyers 10:45, 2 June 2009 (UTC)
- to
Is "He went to England and to Spain this summer" as good as "He went to England and Spain this summer"?
- In my opinion, duplicating the preposition seems to add emphasis: "He went to England and to Spain this summer." - Francis Tyers 10:45, 2 June 2009 (UTC)
Thanks.
Otto
Otto, who is an engineer and works for BMW, told, that he likes football
 en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "Otto, who is an engineer and works for BMW, told, that \
 he likes football" | ./analyzer -f en.cfg
 sv/top/(told tell VBD -) [
   n-chunk/ncsubj/(Otto otto NNP -) [
     sf-brk/modnomatch/(, , Fc -)
     rel-cl/cmod/(who who WP -) [
       rel/ccomp/(is be VBZ -) [
         sn-chunk/dobj/(engineer engineer NN -) [
           DT/det/(an a DT -)
         ]
         sn-coor/modnomatch/(and and CC -) [
           n-chunk/modnomatch/(works work NNS -)
         ]
         sp-chunk/ncmod/(for for IN -) [
           n-chunk/dobj/(BMW bmw NNP -)
         ]
       ]
     ]
     sf-brk/modnomatch/(, , Fc -)
   ]
   sub-cl/modnomatch/(that that IN -) [
     sf-brk/modnorule/(, , Fc -)
     sv/modnorule/(likes like VBZ -) [
       sn-chunk/ncsubj/(he he PRP -)
       n-chunk/dobj/(football football NN -)
     ]
   ]
 ]
Otto, who is an engineer and works for BMW told, that he likes football
 en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "Otto, who is an engineer and works for BMW told, \
 that he likes football" | ./analyzer -f en.cfg
 n-chunk/top/(Otto otto NNP -) [
   sf-brk/modnomatch/(, , Fc -)
   rel-cl/cmod/(who who WP -) [
     rel/ccomp/(is be VBZ -) [
       sn-chunk/dobj/(engineer engineer NN -) [
         DT/det/(an a DT -)
       ]
       sn-coor/modnomatch/(and and CC -) [
         n-chunk/modnomatch/(works work NNS -)
       ]
     ]
   ]
   adv/modnomatch/(for for IN -) [
     vb-chunk/aux/(told tell VBD -) [
       n-chunk/ncsubj/(BMW bmw NNP -)
     ]
     sf-brk/modnomatch/(, , Fc -)
   ]
   sub-cl/modnomatch/(that that IN -) [
     sv/modnorule/(likes like VBZ -) [
       sn-chunk/ncsubj/(he he PRP -)
       n-chunk/dobj/(football football NN -)
     ]
   ]
 ]
What do you think, is the comma absolutely necessary after BMW?
- I would put:
- Otto, who is an engineer and works for BMW, told us that he likes football.
- or
- Otto, who is an engineer and works for BMW, said that he likes football.
- There is no comma necessary after 'that'. - Francis Tyers 13:06, 8 June 2009 (UTC)
dep
Please check my issues on the freeling/linguistic part. Today I tried 5 very basic, simple sentences (I cannot imagine simpler ones), and 4 of them show serious errors. There are such fundamental problems in the freeling dep analysis that I have more and more the feeling that I cannot use it for English. It seems very useful at first sight, really impressive; at second sight much less so.
It is much worse for English than for Spanish, and I have the feeling that English has no priority for the authors. Its configuration is anything but simple, and not even the configuration file names are documented.
What is your impression?
- Could you show me the sentences that you tried ? The English version is much less mature than the Spanish version, and understandably it is less of a priority as it is being developed in a Catalan university. The file names are I believe documented in the PDF of the documentation. I've been planning to write a 'HOWTO' for Matxin, but need to get some things in order here first. It should be done by mid July or so. - Francis Tyers 15:21, 8 June 2009 (UTC)
Here are the sentences. The expressions are replaced by placeholders (expr0, expr1, ...), as below, and this is what is presented to freeling.
- Thank you, said Henry, and Otto also said: thank you.
- I asked him: How do you do? He answered: Fine, thank you.
- "Thank you" and "How do you do?" are interjections. They would be added as multiwords (if the system was aimed at analysing speech).
- It was raining cats and dogs in all August that year.
- Good It was raining cats and dogs for all of August that year.
- Bad In August that year it was raining cats and dogs
- Bad In August that year it was strong raining
- Bad In August it was strong raining
- "strong raining" does not make sense, it should be "In August it was raining heavily". Incidentally FreeLing seems to analyse this correctly.
- Note: I've yet to hear "raining cats and dogs" (which would be a multiword) used outside of the present, e.g. "It's raining cats and dogs". Nor have I heard it used in past narrative, nor in the news. If you search on the BBC site, the top situations it comes up in are 1) talking about expressions, 2) jokes, 3) clichés.
- The young pair will go Dutch that evening.
- Note: "go Dutch" is a multiword, it is not semantically compositional. Although in any case the parse seems correct.
- He was in the black for long time, he was the blue-eyed boy of the manager.
- "The blue-eyed boy of the manager was in the black for a long time." This kind of subject repetition is unusual, unless you're talking about two distinct people with "he", in which case it sounds strange anyway. "Both he and the blue-eyed boy of the manager were in the black for a long time".
- I'll go to England next year.
- This one is correct, the sentence sounds perfectly normal and the error is in FreeLing
- expr0 , said Henry , and Otto also said: expr1 . .
- I asked him: expr2 ? He answered: Fine , expr3 . .
- It was expr4 in all August that year . .
- The young pair will go Dutch that evening . .
- He was expr6 for long time , he was the expr5 of the manager . .
- I shall go to England next year .
Is "in all August" OK or should it be "in the whole August"?
It especially frustrated me that the last, very simple sentence shows a clear fault in the freeling dep analysis. No matter how we express the motion, freeling cannot handle "I move (no matter how) to England next year" properly.
- This last one is quite strange that it fails, but the other sentences are not exactly "simple". Here are five sentences that I've taken from the BBC today (more or less at random):
- Good. Environment minister Jane Kennedy said she could not support him as leader. Bad?
- Bad. The investigation has been focusing on whether the plane's speed sensors stopped working properly just before it crashed in turbulent weather.
- Good. Authorities are making deep cuts to tackle the budget deficit.
- Bad. "That happened one month before the ballot opened, so it had quite a rallying effect," he said. Good?
- Bad. The defence ministry said it was closing Gabon's air, land and sea borders.
- These are not simple, but they are every day sentences, and FreeLing does fairly badly (although the co-ordination problem in the last one is a known bug). Now for some simple sentences:
- Good. The boy kicks the ball to the girl.
- Bad. The boy kicks the ball to the girl with the telescope. Good?
- The prepositional phrase 'with the telescope' is not attached in the right place. Although, this is genuinely ambiguous, is he kicking the ball with the telescope, or is he kicking the ball to the girl with the telescope? The ambiguity is resolved semantically, either statistically kick (ball) with (telescope) is much less frequent than to (girl) with (telescope). Or logically... "telescope is a scientific instrument and not used for kicking balls".
- Good. The cat runs.
- Good. He follows the same route every day.
- *He followed the same route next day.
- Good. I go to sleep every night.
- *I go sleep in my bed next day.
- *I went to sleep in my bed next day
I printed out the pdf version, the file names are missing. Probably it was first one config file, and as it grew, became more and more.
Yes, the quality of the English version is significantly worse, than that of the Spanish one. I try to get the Spanish file working for English, maybe hopeless, who knows.... Checked, Spanish uses special commands related to spanish words (para, etc...) and English with English words (of, and, etc) so porting is probably not simple at all. Probably English files are missing, because Spanish directory es/dep is full with additional word files. I gave it up :-(
- It would require quite a lot of work; if it were easy they would probably have done it by now.
Do you know (also commercial ok) any working dependency analyzer on the market (Any of English, German, Hungarian)?
- I don't know anything about commercial software, but if you search in Google, there are various dependency analysers for English available, this one for example is a statistical dependency parser and can be integrated in to FreeLing. - Francis Tyers 22:05, 8 June 2009 (UTC)
- One piece of advice... when making test sentences, take them from webpages (e.g. the BBC); this way we will save a lot of time that would otherwise be spent going through grammatically incorrect sentences. If you would like a sentence that exhibits a feature of English, ask me, but I may not respond to future discussions about sentences such as "strong raining", or may just mark them as *. If you really must make up your own sentences, at least Google the parts of them that you are not sure about, for example "strong raining" (412 hits) vs. "raining heavily" (131,000 hits). - Francis Tyers 08:11, 9 June 2009 (UTC)
oak&malt
Why should I have taken a raincoat and an umbrella, when my aunt who lives in Georgia said that the weather would be nice.
 1 Why why WRB WRB B-ADVP 0 ROOT _ _
 2 should should MD MD O 0 ROOT _ _
 3 I I PRP PRP B-NP 2 SBJ _ _
 4 have have VB VB B-VP 2 VC _ _
 5 taken take VBN VBN B-PP 4 VC _ _
 6 a a DT DT B-NP 7 NMOD _ _
 7 raincoat raincoat NN NN I-NP 5 OBJ _ _
 8 and and CC CC O 7 CC _ _
 9 an an DT DT B-NP 10 NMOD _ _
 10 umbrella umbrella NN NN I-NP 7 COORD _ _
 11 , , , , O 5 P _ _
 12 when when WRB WRB B-ADVP 19 ADV _ _
 13 my my PRP$ PRP$ B-NP 14 NMOD _ _
 14 aunt aunt NN NN I-NP 19 SBJ _ _
 15 who who WP WP B-NP 16 SBJ _ _
 16 lives live VBZ VBZ B-VP 14 NMOD _ _
 17 in in IN IN B-PP 16 ADV _ _
 18 Georgia Georgia NNP NNP B-NP 17 PMOD _ _
 19 said say VBD VBD B-VP 5 ADV _ _
 20 that that IN IN B-SBAR 23 VMOD _ _
 21 the the DT DT B-NP 22 NMOD _ _
 22 weather weather NN NN I-NP 23 SBJ _ _
 23 would would MD MD B-VP 19 OBJ _ _
 24 be be VB VB I-VP 23 VC _ _
 25 nice nice JJ JJ B-ADJP 24 PRD _ _
 26 . . . . O 2 P _ _
Never before had ski racing, a sport dominated by monosyllabic mountain men, seen the likes of Alberto Tomba, the flamboyant Bolognese flatlander who at 21 captured two gold medals at the Calgary olympics.
 1 Never never RB RB B-ADVP 37 DEP _ _
 2 before before IN IN B-PP 37 ADV _ _
 3 had have VBN VBN B-NP 4 NMOD _ _
 4 ski ski NN NN I-NP 5 SBJ _ _
 5 racing race VBG VBG B-VP 2 PMOD _ _
 6 , , , , O 37 P _ _
 7 a a DT DT B-NP 8 NMOD _ _
 8 sport sport NN NN I-NP 37 DEP _ _
 9 dominated dominate VBN VBN B-VP 8 NMOD _ _
 10 by by IN IN B-PP 9 LGS _ _
 11 monosyllabic monosyllabic JJ JJ B-NP 13 NMOD _ _
 12 mountain mountain NN NN I-NP 13 NMOD _ _
 13 men man NNS NNS I-NP 10 PMOD _ _
 14 , , , , O 8 P _ _
 15 seen see VBN VBN B-VP 8 NMOD _ _
 16 the the DT DT B-NP 17 NMOD _ _
 17 likes like NNS NNS I-NP 15 OBJ _ _
 18 of of IN IN B-PP 17 NMOD _ _
 19 Alberto Alberto NNP NNP B-NP 20 NMOD _ _
 20 Tomba Tomba NNP NNP I-NP 18 PMOD _ _
 21 , , , , O 20 P _ _
 22 the the DT DT B-NP 25 NMOD _ _
 23 flamboyant flamboyant JJ JJ I-NP 25 NMOD _ _
 24 Bolognese Bolognese NNP NNP I-NP 25 NMOD _ _
 25 flatlander flatlander NNP NNP I-NP 20 NMOD _ _
 26 who who WP WP B-NP 0 ROOT _ _
 27 at at IN IN B-PP 0 ROOT _ _
 28 21 21 CD CD B-NP 32 NMOD _ _
 29 captured capture VBN VBN I-NP 32 NMOD _ _
 30 two two CD CD I-NP 32 NMOD _ _
 31 gold gold NN NN I-NP 32 NMOD _ _
 32 medals medal NNS NNS I-NP 27 PMOD _ _
 33 at at IN IN B-PP 32 ADV _ _
 34 the the DT DT B-NP 36 NMOD _ _
 35 Calgary Calgary NNP NNP I-NP 36 NMOD _ _
 36 olympics olympics NN NN I-NP 33 PMOD _ _
 37 . . . . O 0 ROOT _ _
Otto and Martha go to Italy, Spain and France.
 1 Otto Otto NNP NNP B-NP 4 SBJ _ _
 2 and and CC CC I-NP 1 CC _ _
 3 Martha Martha NNP NNP I-NP 1 COORD _ _
 4 go go VB VB B-VP 0 ROOT _ _
 5 to to TO TO B-PP 4 ADV _ _
 6 Italy Italy NNP NNP B-NP 5 PMOD _ _
 7 , , , , O 6 P _ _
 8 Spain Spain NNP NNP B-NP 6 COORD _ _
 9 and and CC CC O 6 CC _ _
 10 France France NNP NNP B-NP 6 COORD _ _
 11 . . . . O 4 P _ _
Otto, Peter and Martha go to Italy, Spain and France.
 1 Otto Otto NNP NNP B-NP 6 SBJ _ _
 2 , , , , O 1 P _ _
 3 Peter Peter NNP NNP B-NP 1 COORD _ _
 4 and and CC CC O 1 CC _ _
 5 Martha Martha NNP NNP B-NP 1 COORD _ _
 6 go go VB VB B-VP 0 ROOT _ _
 7 to to TO TO B-PP 6 ADV _ _
 8 Italy Italy NNP NNP B-NP 7 PMOD _ _
 9 , , , , O 8 P _ _
 10 Spain Spain NNP NNP B-NP 8 COORD _ _
 11 and and CC CC O 8 CC _ _
 12 France France NNP NNP B-NP 8 COORD _ _
 13 . . . . O 6 P _ _
Dear Otto, good old Peter and friendly Martha go to warm Italy, warmer Spain and cool France.
 1 Dear dear RB RB B-ADVP 10 ADV _ _
 2 Otto Otto NNP NNP B-NP 10 ADV _ _
 3 , , , , O 10 P _ _
 4 good good JJ JJ B-NP 10 SBJ _ _
 5 old old JJ JJ I-NP 10 VMOD _ _
 6 Peter Peter NNP NNP I-NP 10 SBJ _ _
 7 and and CC CC I-NP 6 CC _ _
 8 friendly friendly JJ JJ I-NP 9 NMOD _ _
 9 Martha Martha NNP NNP I-NP 6 COORD _ _
 10 go go VB VB B-VP 0 ROOT _ _
 11 to to TO TO I-VP 12 VMOD _ _
 12 warm warm VB VB I-VP 10 OBJ _ _
 13 Italy Italy NNP NNP B-NP 12 OBJ _ _
 14 , , , , O 12 P _ _
 15 warmer warmer JJR JJR B-NP 20 DEP _ _
 16 Spain Spain NNP NNP I-NP 20 DEP _ _
 17 and and CC CC O 16 CC _ _
 18 cool cool JJ JJ B-NP 19 NMOD _ _
 19 France France NNP NNP I-NP 16 COORD _ _
 20 . . . . O 10 P _ _
John and Martha's apple and pear were sweet.
 1 John John NNP NNP B-NP 4 NMOD _ _
 2 and and CC CC I-NP 1 CC _ _
 3 Martha's Martha's NNP NNP I-NP 1 COORD _ _
 4 apple apple NN NN I-NP 7 SBJ _ _
 5 and and CC CC I-NP 4 CC _ _
 6 pear pear NN NN I-NP 4 COORD _ _
 7 were were VBD VBD B-VP 0 ROOT _ _
 8 sweet sweet JJ JJ B-ADJP 7 PRD _ _
 9 . . . . O 7 P _ _
In the final days of the war, Hitler and his new wife, Eva Braun, committed suicide in his underground bunker in Berlin, as the city was overrun by the Red Army of the Soviet Union.
 1 In in IN IN B-PP 18 ADV _ _
 2 the the DT DT B-NP 4 NMOD _ _
 3 final final JJ JJ I-NP 4 NMOD _ _
 4 days day NNS NNS I-NP 1 PMOD _ _
 5 of of IN IN B-PP 4 NMOD _ _
 6 the the DT DT B-NP 7 NMOD _ _
 7 war war NN NN I-NP 5 PMOD _ _
 8 , , , , O 7 P _ _
 9 Hitler Hitler NNP NNP B-NP 7 COORD _ _
 10 and and CC CC O 7 CC _ _
 11 his his PRP$ PRP$ B-NP 13 NMOD _ _
 12 new new JJ JJ I-NP 13 NMOD _ _
 13 wife wife NN NN I-NP 7 COORD _ _
 14 , , , , O 13 P _ _
 15 Eva Eva NNP NNP B-NP 16 NMOD _ _
 16 Braun Braun NNP NNP I-NP 13 NMOD _ _
 17 , , , , O 13 P _ _
 18 committ committ VBD VBD B-VP 0 ROOT _ _
 19 suicide suicide NN NN I-VP 18 OBJ _ _
 20 in in IN IN B-PP 18 ADV _ _
 21 his his PRP$ PRP$ B-NP 23 NMOD _ _
 22 undergr undergr JJ JJ I-NP 23 NMOD _ _
 23 bunker bunker NN NN I-NP 20 PMOD _ _
 24 in in IN IN B-PP 23 ADV _ _
 25 Berlin Berlin NNP NNP B-NP 24 PMOD _ _
 26 , , , , O 18 P _ _
 27 as as IN IN B-SBAR 30 VMOD _ _
 28 the the DT DT B-NP 29 NMOD _ _
 29 city city NN NN I-NP 30 SBJ _ _
 30 was be VBD VBD B-VP 18 ADV _ _
 31 overrun overrun VBN VBN I-VP 30 VC _ _
 32 by by IN IN B-PP 31 LGS _ _
 33 the the DT DT B-NP 35 NMOD _ _
 34 Red Red NNP NNP I-NP 35 NMOD _ _
 35 Army Army NNP NNP I-NP 32 PMOD _ _
 36 of of IN IN B-PP 35 NMOD _ _
 37 the the DT DT B-NP 39 NMOD _ _
 38 Soviet Soviet NNP NNP I-NP 39 NMOD _ _
 39 Union Union NNP NNP I-NP 36 PMOD _ _
 40 . . . . O 18 P _ _
Well, malt is far behind freeling in the analysis depth and quality, as far as I can see. What do you think?
- I find it very hard to read the output. Is there a "graphical" output mode ? - Francis Tyers 13:41, 9 June 2009 (UTC)
I could not find any. Documentation focuses on experimenting with different algorithms rather than product usage. Original text of it: Currently, MaltParser only supports tab-separated data files, which means that a sentence in a data file in the CoNLL data format could look like this (and shows the file format above).
- I searched for CoNLL visualisation in Google and came up with this, perhaps it might work ? - Francis Tyers 13:50, 9 June 2009 (UTC)
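Until a proper visualiser is found, the CoNLL rows can at least be printed as an indented tree, which is easier to read than the flat table. The following is a minimal Python sketch, assuming the column layout seen in the output above (field 1 = token number, field 2 = word form, field 7 = number of the parent token, field 8 = relation label). The function names `parse_rows` and `render_tree` are made up for this example and are not part of MaltParser.

```python
from collections import defaultdict

# MaltParser output for "Otto and Martha go to Italy, Spain and France."
CONLL = """\
1 Otto Otto NNP NNP B-NP 4 SBJ _ _
2 and and CC CC I-NP 1 CC _ _
3 Martha Martha NNP NNP I-NP 1 COORD _ _
4 go go VB VB B-VP 0 ROOT _ _
5 to to TO TO B-PP 4 ADV _ _
6 Italy Italy NNP NNP B-NP 5 PMOD _ _
7 , , , , O 6 P _ _
8 Spain Spain NNP NNP B-NP 6 COORD _ _
9 and and CC CC O 6 CC _ _
10 France France NNP NNP B-NP 6 COORD _ _
11 . . . . O 4 P _ _
"""

def parse_rows(text):
    """Turn whitespace- or tab-separated CoNLL rows into small dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        rows.append({"id": int(fields[0]), "form": fields[1],
                     "head": int(fields[6]), "deprel": fields[7]})
    return rows

def render_tree(rows):
    """Return the dependency tree as indented text, starting at head 0."""
    children = defaultdict(list)
    for row in rows:
        children[row["head"]].append(row)

    lines = []

    def walk(head, level):
        # Children are visited in sentence order, one level deeper each step.
        for row in children[head]:
            lines.append("  " * level + f'{row["form"]} ({row["deprel"]})')
            walk(row["id"], level + 1)

    walk(0, 0)
    return "\n".join(lines)

tree = render_tree(parse_rows(CONLL))
print(tree)
```

The root "go" is printed at the left margin, and each dependent is indented two spaces further than its parent, so the coordination under "Italy" becomes visible at a glance. Word forms containing spaces are not handled by this simple split.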
Dep tree depth calculation
In the table, the seventh field (18,4,4,1,4,7,...) always shows the line number of the parent node.
 # if table looks like:
 #1 2 3 4 5 6 7 8 9 10
 #...
 #1 In in IN IN B-PP 18 ADV _ _
 #2 the the DT DT B-NP 4 NMOD _ _
 #3 final final JJ JJ I-NP 4 NMOD _ _
 #4 days day NNS NNS I-NP 1 PMOD _ _
 #5 of of IN IN B-PP 4 NMOD _ _
 #6 the the DT DT B-NP 7 NMOD _ _
 #7 war war NN NN I-NP 5 PMOD _ _
 #8 , , , , O 7 P _ _
 #9 Hitler Hitler NNP NNP B-NP 7 COORD _ _
 #10 and and CC CC O 7 CC _ _
 #11 his his PRP$ PRP$ B-NP 13 NMOD _ _
 #12 new new JJ JJ I-NP 13 NMOD _ _
 #13 wife wife NN NN I-NP 7 COORD _ _
 #14 , , , , O 13 P _ _
 #15 Eva Eva NNP NNP B-NP 16 NMOD _ _
 #16 Braun Braun NNP NNP I-NP 13 NMOD _ _
 #17 , , , , O 13 P _ _
 #18 committ committ VBD VBD B-VP 0 ROOT _ _
 #19 suicide suicide NN NN I-VP 18 OBJ _ _
 #20 in in IN IN B-PP 18 ADV _ _
 #21 his his PRP$ PRP$ B-NP 23 NMOD _ _
 #22 undergr undergr JJ JJ I-NP 23 NMOD _ _
 #23 bunker bunker NN NN I-NP 20 PMOD _ _
 #24 in in IN IN B-PP 23 ADV _ _
 #25 Berlin Berlin NNP NNP B-NP 24 PMOD _ _
 #26 , , , , O 18 P _ _
 #27 as as IN IN B-SBAR 30 VMOD _ _
 #28 the the DT DT B-NP 29 NMOD _ _
 #29 city city NN NN I-NP 30 SBJ _ _
 #30 was be VBD VBD B-VP 18 ADV _ _
 #31 overrun overrun VBN VBN I-VP 30 VC _ _
 #32 by by IN IN B-PP 31 LGS _ _
 #33 the the DT DT B-NP 35 NMOD _ _
 #34 Red Red NNP NNP I-NP 35 NMOD _ _
 #35 Army Army NNP NNP I-NP 32 PMOD _ _
 #
 # then depth (how far is 32 from 0) of 32 is 4. why?
 # 32 shows to 31
 # 31 shows to 30
 # 30 shows to 18
 # 18 shows to 0 (0 is always the last)
 #
 # depth of 33 is 6.
 # 33 35
 # 35 32
 # 32 31
 # 31 30
 # 30 18
 # 18 0
 #
 # and so on....
The so calculated depth is the y coordinate of the point
in the Hitler sentence the depths:
$VAR1 = \{ '32' => 4, '33' => 6, '21' => 4, '7' => 5, '26' => 2, '17' => 7, '2' => 4, '1' => 2, '18' => 1, '30' => 2, '16' => 7, '25' => 5, '27' => 3, '28' => 4, '40' => 2, '20' => 2, '14' => 7, '24' => 4, '10' => 6, '31' => 3, '35' => 5, '11' => 7, '22' => 4, '13' => 6, '23' => 3, '29' => 3, '6' => 6, '39' => 7, '36' => 6, '3' => 4, '9' => 6, '12' => 7, '15' => 8, '38' => 8, '8' => 6, '4' => 3, '34' => 6, '37' => 8, '19' => 2, '5' => 4 };
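The depth calculation described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the script that produced the dump above; the names `HEADS` and `depth` are made up for the example. `HEADS` holds only the seventh field (the parent number) of each row of the Hitler sentence.

```python
# Parent number (7th CoNLL field) for each token of the Hitler sentence.
HEADS = {
    1: 18, 2: 4, 3: 4, 4: 1, 5: 4, 6: 7, 7: 5, 8: 7, 9: 7, 10: 7,
    11: 13, 12: 13, 13: 7, 14: 13, 15: 16, 16: 13, 17: 13, 18: 0,
    19: 18, 20: 18, 21: 23, 22: 23, 23: 20, 24: 23, 25: 24, 26: 18,
    27: 30, 28: 29, 29: 30, 30: 18, 31: 30, 32: 31, 33: 35, 34: 35,
    35: 32, 36: 35, 37: 39, 38: 39, 39: 36, 40: 18,
}

def depth(node, heads):
    """Count the steps from a token up to the artificial root 0."""
    steps = 0
    seen = set()
    while node != 0:
        if node in seen:
            # A malformed table could contain a loop; stop instead of hanging.
            raise ValueError(f"cycle detected at token {node}")
        seen.add(node)
        node = heads[node]   # move to the parent
        steps += 1
    return steps

depths = {token: depth(token, HEADS) for token in HEADS}
print(depths[32], depths[33])
```

Following the chain by hand for token 32 (32, 31, 30, 18, 0) gives four steps, and for token 33 (33, 35, 32, 31, 30, 18, 0) six steps, the same values as in the dump above.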
Visualising Matxin/Freeling
I believe that the Matxin people have an XSL style sheet which will convert an analysis into an SVG. You could try asking them about it on their mailing list. - Francis Tyers 14:46, 17 June 2009 (UTC)
They were playing or they were arrested
 en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were playing during the day." | ./analyzer -f en.cfg
 claus/top/(playing play VBG -) [
   vb-be/aux/(were be VBD -)
   sn-chunk/ncsubj/(They they PRP -)
   sp-chunk/ncmod/(during during IN -) [
     sn-chunk/dobj/(day day NN -) [
       DT/det/(the the DT -)
     ]
   ]
   st-brk/ta/(. . Fp -)
 ]

 en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were arrested during the day." | ./analyzer -f en.cfg
 claus/top/(arrested arrest VBN -) [
   vb-be/aux/(were be VBD -)
   sn-chunk/ncsubj/(They they PRP -)
   sp-chunk/ncmod/(during during IN -) [
     sn-chunk/dobj/(day day NN -) [
       DT/det/(the the DT -)
     ]
   ]
   st-brk/ta/(. . Fp -)
 ]

 en@anonymous:~/tmp/download/forditas/freeling/FreeLing-2.1-beta1/src/main$ echo "They were old during the day." | ./analyzer -f en.cfg
 claus/top/(were be VBD -) [
   sn-chunk/ncsubj/(They they PRP -)
   n-chunk/dobj/(old old NN -)
   sp-chunk/ncmod/(during during IN -) [
     sn-chunk/dobj/(day day NN -) [
       DT/det/(the the DT -)
     ]
   ]
   st-brk/ta/(. . Fp -)
 ]
I think that gerund and passive constructs are generally misinterpreted by freeling. "were" is the verb, and that should be the root. What do you think? Muki987 20:37, 19 June 2009 (UTC)