
Difference between revisions of "User talk:Francis Tyers"


Revision as of 21:24, 10 April 2009

Francis, the MediaWiki engine has an import and export feature. If you tweak it somehow, we could edit the whole dictionary articles here in the wiki and simply export them to Apertium XML format!!! This would make the whole process unbelievably simpler. We could also use the template facilities of the wiki.

Hi Francis, thanks for the message. Worked on the table for Pivə. Will work more later. Great project. Good luck. --Mehrdad 19:11, 1 September 2007 (BST)
Really glad to see this project, and I hope I can contribute more in the future. I have to admit, though, that although I am a native speaker of Azerbaijani, I had no formal education in this language, so I may be wrong in some cases. Yes, I have an MSN account and will send you the ID via email. --Mehrdad 11:24, 4 September 2007 (BST)

Hi. I don't know about 'awesome' yet - I have a bug or two to work out ("Error: Unsupported transducer type for ."), some tag reordering to do, and a heck of a lot more pardefs to write (Polish morphology is... extensive :)

Actually, I'm interested in Polish-{English,Irish,Russian} and Irish-English. I have a lot of spare time :) Yes, I would be very interested in help with SVN, thank you.

IM... hmm... I have Google Chat, Tlen, and ICQ (if it still works). Any preference?

I have a few Polish-English wordlists that I've built up over the past few years of learning the language; I just need to gather them, sort them, and add morphological information. I'll certainly have a look at it. Thanks again. -- Jimregan 21:16, 6 October 2007 (BST)


Hi Francis, it was I who made the edits to the Apertium HOWTO. One thing: I changed the spelling to "realized" because that is how it is spelled in international English (US, Canada, etc.). Your spelling is a British variation, a French derivative (not German like the "z" spelling). The same goes for many other words. For instance, internationalization, not internationalisation (it is even indicated as a spelling error in this wiki editor, with red text, and so is "realise"). --Laseray 14:33, 17 April 2008 (BST)


  • Hi, besides TermOfis, our terminological database, we are building a lexicographical one. I haven't checked it recently, but there should be around 60 000 lemmatized forms in it by now. Do you think it could help you? --Fulup 17:31, 9 November 2008 (UTC)

Yes definitely! Do you know if it includes part-of-speech also?

no, it does not (I'll show you in December), but Omegawiki have some.

We have been working on extracting information from Jan Deloof's dictionary of Breton--Dutch, which he let us use under the GPL. Do you know of it? - Francis Tyers 17:33, 9 November 2008 (UTC)

yes, I get it here at home, but I haven't used it (I don't speak dutch). I think it should be ok to use anyway. --Fulup 17:40, 9 November 2008 (UTC)


Hi Fran. I removed the two "bread" sentences from the regression tests because they have never worked for me. I was going to show how the tests worked, so I wanted them to be on the "right" side of the pending/regression tests... which is why I moved them. I've always done svn up and make before testing (learned that the hard way), so I don't understand why they work for you but not for me. --Martha 16:41, 5 March 2009 (UTC)


Is there any difference between Apertium and Matxin in terms of the main "How Apertium works" diagram? If yes, where? If not, what is the difference between Matxin and Apertium (apart from character encoding (Matxin: ISO only) and the use of FreeLing (Matxin))?

My primary aim is

  • 1. English-Hungarian and German-Hungarian, -- Apertium
  • 2. English-German and German-English, -- Apertium
  • 3. Hungarian-English and Hungarian-German. -- Matxin

For 1, Apertium is the right tool; for 2, Apertium; for 3, Matxin. Right?

Muki987 12:37, 9 April 2009 (UTC)

I would reverse the order.
  • 1. Matxin -- We have 'deep' analysis for English, so we should use it.
  • 2. Matxin or Apertium -- Again, 'deep' analysis is available for English. But there are many other tools which do en-de so I would count this as reasonably low priority.
  • 3. Apertium -- We have POS tagging and morphological analysis for Hungarian, so we should take advantage of this. But there is no free parser available.
Matxin now supports Unicode, I have updated the page. The main difference between Apertium and Matxin is that the latter uses FreeLing to do chunking and dependency parsing and then does re-ordering based on that. Whereas Apertium is restricted to re-ordering fixed length patterns, Matxin has some degree of recursion. We are planning to extend Apertium this year to support recursive re-ordering, and any resources made now will be able to be re-used in the future. A brief breakdown about current resources would be good. e.g. English analysis (Apertium or Freeling), English generation (Apertium), English--Hungarian bilingual lexicon (?), Hungarian analysis (Hunmorph), Hungarian generation (?). - Francis Tyers 12:51, 9 April 2009 (UTC)
Yes, now I see, Spanish-Basque is in Matxin. I will start with Matxin.
I'll check whether the analysis of English in Matxin is good enough for Hungarian.
I have a quite good English-Hungarian lexicon, and I don't think converting it into Apertium XML format will cause any problems. I also know hunspell and the tools behind it quite well. I think that will help with Hungarian generation, and I still hope to get some support from the Hunmorph group with that.
Do you know of any usable de-en or en-de tool that you consider to be of Apertium's quality?
You forgot my first question, about a main diagram like "How Apertium works" for Matxin. Such a diagram is very helpful for a beginner. Muki987 13:21, 9 April 2009 (UTC)
If you provide some example sentences in English, I can send you back the results of the FreeLing analysis, in case you don't want to install FreeLing yourself.
Regarding the lexicon, if you send it to me I'd be happy to take a look at how difficult it would be to convert.
Yes, hunspell is good.
Free software tools for English--German, unfortunately not. There are many commercial tools though.
Regarding the diagram, it is difficult to express like that, as both Apertium and Matxin are typical "transfer" systems. The easiest way of expressing it is that Matxin works on trees, while Apertium works on chunks. To get an idea of the difference, take a look at these two diagrams: chunking and parsing. Apertium analysis is more similar to the first, while Matxin approaches the second. - Francis Tyers
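As a rough illustration of the lexicon conversion discussed above, here is a minimal Python sketch that turns word pairs into bilingual-dictionary entries. It assumes (hypothetically) that the lexicon is a list of source/target/part-of-speech triples and that a single tag such as "n" is enough for each side; a real lexicon would need proper tags and a full dictionary skeleton around the entries. The function name make_entry is made up for this example.

```python
import xml.etree.ElementTree as ET


def make_entry(source, target, pos):
    """Build one <e><p><l>..</l><r>..</r></p></e> element for a word pair.

    Spaces inside a multiword are written as <b/> elements, and the
    part-of-speech tag is appended as <s n="..."/> on both sides.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side_name, text in (("l", source), ("r", target)):
        side = ET.SubElement(pair, side_name)
        words = text.split()
        side.text = words[0]
        for word in words[1:]:
            blank = ET.SubElement(side, "b")
            blank.tail = word  # the word following the blank
        ET.SubElement(side, "s", {"n": pos})
    return entry


# Hypothetical sample data: (English, Hungarian, tag) triples.
lexicon = [
    ("dog", "kutya", "n"),
    ("house", "ház", "n"),
    ("ice cream", "fagylalt", "n"),
]

for source, target, pos in lexicon:
    print(ET.tostring(make_entry(source, target, pos), encoding="unicode"))
```

The hard part in practice is not the XML but deciding the right tags and paradigms for each word, which this sketch leaves out entirely.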
  • You can find my word collections on http://tkltrans.sf.net
  • I will install everything on my pc, so I'll generate examples myself.
  • I checked prompt, which is at present the best according to the test; it is in fact miserable (E-G, G-E).
  • I think the selection of the right word is unsolved, and even more so the finding and use of expressions like "no space left on" and the like.

Muki987 13:21, 10 April 2009 (UTC)


What about expressions? For example, "look after one's fences" is at present not handled at all:

   * Peter looked after Martha's fences
   * Peter miraba después de las vallas de Martha 

The expression is not recognized at all (it means that Peter acted in Martha's interest).

Is something planned for this? Are there working examples available? 20-30% of our speech is made up of expressions!!!! Muki987 13:38, 10 April 2009 (UTC)

We have two methods of handling expressions. The first is with multiword units in our dictionaries. Please try "He took away the rubbish". The second is with TMX files, which you probably know about; they contain translation segments. The example you have given would be a multiword unit. Probably "look after" → "cuidar", but I'll ask the maintainer of es-en when she gets back from holiday. We tend to gear our development towards translating "news text", where these kinds of expressions tend to be less frequent. So you'll have to excuse us if we don't have full coverage :) - Francis Tyers 20:20, 10 April 2009 (UTC)
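To make the multiword-unit idea above concrete, here is a toy Python sketch of longest-match lookup: at each position the longest sequence of words found in the lexicon is translated as one unit, and unknown words are passed through unchanged. This is only an illustration of the principle, not how Apertium itself is implemented; the function name segment, the max_len parameter and the tiny lexicon are all made up for the example, and morphology is ignored.

```python
def segment(tokens, lexicon, max_len=3):
    """Translate tokens left to right, preferring the longest lexicon match."""
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n])
            if n == 1 or key in lexicon:
                # Single unknown words fall through unchanged.
                output.append(lexicon.get(key, key))
                i += n
                break
    return output


# Hypothetical toy lexicon with one multiword unit.
lexicon = {
    "took away": "sacó",
    "the": "la",
    "rubbish": "basura",
}

print(segment("he took away the rubbish".split(), lexicon))
# → ['he', 'sacó', 'la', 'basura']
```

Without the "took away" entry, the same sentence would be translated word by word, which is what happens to expressions that are missing from the dictionary.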
  • He took away the rubbish -- is this an expression at all??
  • Sacó la basura -- word for word the same thing??
  • He took the minutes -- he wrote the protocol
  • Tomó los minutos -- no mention of protocol, I think wrong again
  • He took air -- he breathed
  • Tomó aire -- took air, bad again, should be breathed
  • He takes a backseat in this project -- he played a subordinate role
  • Toma un backseat en este proyecto -- no mention of a subordinate role, bad

I could not find any working examples yet. If you have one, please also explain the English one; my English is not so good. Thanks, Muki987 21:22, 10 April 2009 (UTC)