Difference between revisions of "User:N0nick/GSoC Journal"

From Apertium

Latest revision as of 09:06, 5 August 2011

Bonding Period week 1: 4/25 - 5/1

  • Got the development environment ready: apertium, lttoolbox, and the other tools and tests are all working properly.
  • Filled the Pending Tests page with some translations (based on the ones in the mt-en page).
  • Started working on a script to generate a Maltese monodix from external sources. Nothing to show yet.
  • Notified 2 TAU professors (both specializing in CL) about my project, both agreed to offer help if necessary.
  • Wrote to a contact related to the MaltiLex project, looking for a better contact (perhaps through my university's faculty).

Bonding Period week 2: 5/2 - 5/8

  • Picked up and started reading the Teach Yourself Maltese grammar book.
  • Wrote the framework for a script that generates fullform Maltese verb lists. [5]
    • We worked on splitting verbs into categories and optional subclasses, writing rules (based on stem affixes, roots and vowels) in a Python script for each class.
    • Found out how Wiktionary stores conjugation data for the verbs it has; very useful for creating new rule groups. [6] Finished converting these tables this week, except for [7] [8] (which are identical to strong.py apart from a transformation in the imperfect forms).
  • spectre contacted Prof. Adam Ussishkin and he provided us with Maltese verb lists that we need to look over. [9]
  • Contacted John J. Camilleri regarding his Maltese morphology slideshow, asking for data on verb conjugation.
  • Will add the verbs already added to the Hebrew dix and bidix files.
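
The per-class rule scripts described above could be organised roughly as follows. This is a minimal sketch, not the project's actual code: the class name `strong_i_e`, the tag names and the function names are assumptions made for illustration, and the rule covers only the perfect tense of one strong verb pattern (root k-t-b, kiteb).

```python
# Illustrative sketch only: one rule function per verb class, looked up by name.
# The class name "strong_i_e", the tag names and all function names are hypothetical.

PERSONS = ["p1.sg", "p2.sg", "p3.m.sg", "p3.f.sg", "p1.pl", "p2.pl", "p3.pl"]


def strong_i_e_perfect(root):
    """Perfect-tense full forms for a strong three-consonant verb with the
    vowels i-e, such as k-t-b (kiteb). Other vowel patterns need other rules."""
    c1, c2, c3 = root
    short = c1 + c2 + "i" + c3        # ktib-, used before consonant-initial suffixes
    full = c1 + "i" + c2 + "e" + c3   # kiteb, third person masculine singular
    base = c1 + "i" + c2 + c3         # kitb-, used before vowel-initial suffixes
    forms = [
        short + "t",   # first person singular
        short + "t",   # second person singular
        full,          # third person masculine singular
        base + "et",   # third person feminine singular
        short + "na",  # first person plural
        short + "tu",  # second person plural
        base + "u",    # third person plural
    ]
    return dict(zip(PERSONS, forms))


# Registry mapping a verb class to its rule function.
RULES = {"strong_i_e": strong_i_e_perfect}


def generate(root, verb_class):
    """Return a dict of tag -> full form for the given root and verb class."""
    return RULES[verb_class](root)


if __name__ == "__main__":
    # The root is a tuple so that multi-letter consonants can be one element.
    for tag, form in generate(("k", "t", "b"), "strong_i_e").items():
        print(tag, form)
```

Keeping each class in its own function makes it easy to add a new rule group when a new conjugation table turns up, without touching the classes that already work.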

Bonding Period week 3: 5/9 - 5/15

  • Continued studying Maltese from the grammar book.
  • Added verbs from John Camilleri's slideshow to the Maltese analyser program.

Bonding Period week 4: 5/16 - 5/22

  • Added closed categories to the Maltese analyser (pronouns, prepositions, conjunctions, determiners, numerals)
  • Added closed categories to the Hebrew dictionary
  • Added closed categories to the bidix

Week 1: 5/23 - 5/29

  • Fixed bugs in some of the environment tools I was using
  • Added missing Hebrew determiners and pronouns
  • Added missing closed categories to the bidix

Week 2: 5/30 - 6/5

  • Generate Hebrew verb speling file from hspell output
  • Format Hebrew verbs speling file as Apertium dix file
  • Research handling of attached/clitic pronouns on both Maltese & Hebrew
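
The step from a speling file to dix entries might look something like the sketch below. It assumes a four-field, semicolon-separated line layout (lemma; surface form; tags; part of speech) with dot-separated tags, and it emits one fully expanded entry per surface form rather than paradigms. The function name is hypothetical and the actual hspell-based pipeline may differ.

```python
from xml.sax.saxutils import escape, quoteattr


def speling_line_to_entry(line):
    """Turn one speling-style line into an expanded dix <e> element.

    Assumed layout: 'lemma; surface form; tags; part of speech',
    where tags and part of speech may each hold several dot-separated symbols.
    """
    lemma, surface, tags, pos = [field.strip() for field in line.split(";")]
    # Part-of-speech symbols come first, then the inflection tags.
    symbols = [s for s in pos.split(".") + tags.split(".") if s]
    symbol_xml = "".join("<s n=%s/>" % quoteattr(s) for s in symbols)
    return "<e lm=%s><p><l>%s</l><r>%s%s</r></p></e>" % (
        quoteattr(lemma),
        escape(surface),
        escape(lemma),
        symbol_xml,
    )


if __name__ == "__main__":
    print(speling_line_to_entry("house; houses; pl; n"))
```

This sketch does not handle multiword entries (spaces inside a form would need their own markup) and does not try to group forms into paradigms; it only shows the line-to-entry mapping.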

Week 3: 6/6 - 6/12

  • Fixed Hebrew noun paradigms (automatically generated from hspell)
  • Add existing verbs to bidix
  • Add missing determiners to Maltese file (as per Fran's email)

Week 4: 6/13 - 6/19

  • Studied for an exam and did not achieve much.

Week 5: 6/20 - 6/26

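  • Fully analysed all words in sample paragraph 1 (http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-mt-he/dev/paragraph1.txt)
  • Added all paragraph words to bidix, then updated Hebrew dix accordingly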
  • Added a list of Maltese proper nouns, adverbs and updated dix, bidix
  • Fixed bugs in hspell output for plural nouns, verbs

Week 6: 6/27 - 7/3

  • Worked on mt.dix to achieve better coverage of Maltese corpus
  • Added lists of adjectives generated from 'suspected' lists (according to suffixes, etc)
  • Used Wiktionary extracting script to load nouns, adjectives from Maltese items in English Wiktionary
  • Used a wider Maltese corpus received from Kevin Scannell
  • Added more Maltese nouns, adjectives and verbs from top of frequency list
  • Contacted Michael Spagnol re Maltese Broken Plurals (http://www.um.edu.mt/__data/assets/pdf_file/0006/123990/MayerEtAl-Broken_Plural.pdf) to receive a list of nouns with broken plural forms.
  • Fixed several noun paradigms
  • Searched for documentation on the kien/ikun verb, its differences and forms. Contacted Adam Ussishkin & John Camilleri for help with this.
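
Working from the top of a frequency list to improve coverage can be sketched as follows. This is a rough illustration under stated assumptions: a plain set of known word forms stands in for the real analyser, the tokeniser is a naive regular expression (it would split hyphenated or apostrophe-marked Maltese forms), and all function names are hypothetical.

```python
import re
from collections import Counter


def frequency_list(text):
    """Count lower-cased word tokens in a corpus string (naive tokenisation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def coverage(freqs, known):
    """Share of corpus tokens whose form is in the known-word set."""
    total = sum(freqs.values())
    covered = sum(count for word, count in freqs.items() if word in known)
    return covered / total if total else 0.0


def top_unknown(freqs, known, limit=10):
    """Most frequent forms that are not yet known, as (word, count) pairs."""
    unknown = [(w, n) for w, n in freqs.most_common() if w not in known]
    return unknown[:limit]


if __name__ == "__main__":
    corpus = "il kelb u il qattus u il kelb"   # toy text, not real corpus data
    known = {"il", "u"}                         # stand-in for the dictionary
    freqs = frequency_list(corpus)
    print(coverage(freqs, known))
    print(top_unknown(freqs, known))
```

Adding the words returned by `top_unknown` first gives the largest coverage gain per entry, which is the reason for working down the list from the top.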

Week 7: 7/4 - 7/10

  • Added more nouns, adjectives, adverbs from frequency list
  • Updated verbs.py interface, added option to set dictionary restriction
  • Added -x negative suffix for all verbs (in verbs.py)
  • Fixed all forms of kien/ikun (with help from Kevin & John)
  • Added verbs, verb classes from frequency list + corpus
  • Added all 630 verified broken plural nouns & adjectives from Tamara Schembri's thesis with sg, pl and gender=GD
  • Fixed existing words with gender we have in mt-en dictionary
  • Categorized top 1900 words from hitparade frequency list
  • Finally acquired the Maltese Descriptive Grammar book!

Week 8: 7/11 - 7/17

  • Added more verb paradigms and stems from grammar book
  • Wrote gen_stems.py for updating the stems file with ones handled by the new verbs script (temporary solution)
  • Added many Maltese nouns, adjectives from mt-en dictionary
  • Added '@' terms to bidix by frequency: closed-cats, nouns, adjectives, toponyms and some verbs

Week 9: 7/18 - 7/24

  • Added most determiners to bidix
  • Added all nouns that have only a masc. form to bidix
  • Fixed gender transfer for verbs (copied to pronouns)
  • Added most (~550) adjectives to bidix
  • Fixed some bad / wrong entries in mt.dix

Week 10: 7/25 - 7/31

  • Fixed bugs in our modification to hspell that outputs the Hebrew verb dix
  • Added most (~150) adverbs to bidix
  • Added most (~480) proper nouns to bidix
  • Added some determiners
  • Fixed some bad entries in the bidix