Difference between revisions of "Maltese and Hebrew/Final report"

From Apertium
Line 5: Line 5:


===Maltese===
===Maltese===

Writing the Maltese morphological analyser was the hardest task in the project and required most of the time. I'm also very pleased with the results we got.

We used the few grammar resources we had<ref>J. Aquilina (1994), Teach Yourself Maltese. [http://books.google.com/books?id=iCdjAAAAMAAJ]</ref><ref>A. Borg (1997), Maltese. [http://books.google.com/books?id=rsA5jUU_3g4C]</ref> to add closed-category terms and to learn about the morphological rules in general.

We then used Maltese frequency lists generated from the various corpora, and added the words gradually with the help of translation tools and dictionaries, and by guesstimating from context and usage. This was a headache but gave very good results; within about 2 weeks (during my exam period) we reached ~80% coverage of the Maltese corpora.


===Hebrew===
===Hebrew===


In comparison, writing he.dix and handling Hebrew generation was fairly easy. Other than my own Hebrew knowledge, this was mostly due to research I've done before GSoC started (for my application).
In comparison, writing he.dix and handling Hebrew generation was fairly easy. Other than my own Hebrew knowledge, this was mostly due to research I've done before GSoC started (for [[User:N0nick/Application|my application]]).


We have tweaked some code from the [http://hspell.ivrix.org.il/ hspell] Hebrew spellchecker project, to get most of the open-category terms.
We have tweaked some code from the [http://hspell.ivrix.org.il/ hspell] Hebrew spellchecker project, to get most of the open-category terms.
Line 39: Line 45:
==See Also==
==See Also==


==Footnotes==
<references/>


[[Category:Maltese and Hebrew|*]]
[[Category:Maltese and Hebrew|*]]

Revision as of 14:40, 25 August 2011

Description

Maltese

Writing the Maltese morphological analyser was the hardest task in the project and required most of the time. I'm also very pleased with the results we got.

We used the few grammar resources we had[1][2] to add closed-category terms and to learn about the morphological rules in general.

We then used Maltese frequency lists generated from the various corpora, and added the words gradually with the help of translation tools and dictionaries, and by guesstimating from context and usage. This was a headache but gave very good results; within about 2 weeks (during my exam period) we reached ~80% coverage of the Maltese corpora.
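As a rough illustration of the workflow above, the following Python sketch builds a frequency list from a tokenised corpus and computes naive coverage, taken here to mean the fraction of corpus tokens found in a set of known words. It is not the tooling used in the project: the function names, the whitespace tokenisation and the toy sentence are all assumptions made for the example, and a real run would query the morphological analyser rather than a plain word set.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def naive_coverage(tokens, known_words):
    """Fraction of corpus tokens that appear in the set of known words."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in known_words)
    return hits / len(tokens)


# Toy corpus (hypothetical); a real run would read the Wikipedia or news corpora.
tokens = "il-kelb u il-qattus u il-kelb".split()

# Hypothetical stand-in for the words the analyser already recognises.
known = {"il-kelb", "u"}

print(frequency_list(tokens))
print(naive_coverage(tokens, known))
```

Working down the frequency list from the top means each newly added word covers as many corpus tokens as possible, which would explain how coverage can climb quickly in the first weeks.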

Hebrew

In comparison, writing he.dix and handling Hebrew generation was fairly easy. Apart from my own knowledge of Hebrew, this was mostly thanks to research I did before GSoC started (for my application).

We have tweaked some code from the hspell Hebrew spellchecker project, to get most of the open-category terms. This way we easily got good enough coverage of nouns, verbs, adjectives, etc.

For closed-category terms, I added many of them at the beginning of the project, and then fixed what was needed as we went along with the bidix.

Bidix

Transfer rules

Statistics

Dictionaries
Coverage
  • Maltese Wikipedia ( , std. dev.: )
  • Maltese news sites ( , std. dev.: )
  • Maltese Scannell corpus ( , std. dev.: )
Rules
Error rate

Future work

Thanks

See Also

Footnotes

  1. J. Aquilina (1994), Teach Yourself Maltese. http://books.google.com/books?id=iCdjAAAAMAAJ
  2. A. Borg (1997), Maltese. http://books.google.com/books?id=rsA5jUU_3g4C