User:Shraier/reports

From Apertium

Community Bonding Period

During this period I familiarized myself with the Apertium system. I set up the new language pair and added it to the SVN repository. I also got in touch with other community members, and as far as I can see they are all very kind.

I also started on the coding-period work. Here is a short list of the changes and fixes I made:

Week 1

  • Apertium Wiki: User account and user page created
  • Slovenian monolingual dictionary clean up - Deleted paradigm <pardef n="/prpers__n"> (the old definition made no sense)
  • Slovenian monolingual dictionary clean up - Remade the personal and emphatic pronouns (n="/prpers__n")
  • Slovenian monolingual dictionary clean up - More checks; renamed n="k/arkoli__prn" to n="/karkoli__prn", because the paradigm contains "česarkoli", "čemurkoli" and "čimerkoli", which do not begin with the stem "k"
  • Slovenian monolingual dictionary clean up - More checks; removed duplicate entries from n="vajin/__prn"
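
To illustrate the rename from "k/arkoli__prn" to "/karkoli__prn": Apertium paradigm names conventionally take the form stem/ending__category, where the part before the slash is the stem shared by every form. The following is a minimal sketch that assumes the stem is simply the longest common prefix of the lemma and all its surface forms. The helpers `common_prefix` and `paradigm_name` are hypothetical and not part of any Apertium tool.

```python
def common_prefix(words):
    """Longest prefix shared by every word in `words`."""
    prefix = words[0]
    for word in words[1:]:
        # Shrink the candidate prefix until the current word starts with it.
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def paradigm_name(lemma, forms, category):
    """Build a 'stem/ending__category' name from a lemma and its surface forms."""
    stem = common_prefix([lemma, *forms])  # always a prefix of the lemma
    ending = lemma[len(stem):]
    return f"{stem}/{ending}__{category}"


print(paradigm_name("karkoli", ["česarkoli", "čemurkoli", "čimerkoli"], "prn"))
# /karkoli__prn
print(paradigm_name("barvilo", ["barvila", "barvilu", "barvilom"], "n"))
# barvil/o__n
```

Because "česarkoli" shares no prefix with "karkoli", the stem is empty and the whole lemma moves after the slash.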

Week 2

  • Wrote a script to sort paradigm entries (it still has some minor bugs)
  • Slovenian monolingual dictionary clean up - Took p2 from "vajin/__prn" and made a new paradigm "njen/__pr", used for lm="njen" and lm="njun"
  • Slovenian monolingual dictionary clean up - Cleaned up paradigm "njihov/__prn" and ran some checks
  • Slovenian monolingual dictionary clean up - Paradigm "tolikš/en__prn" (lemmas: drugašen, kolikšen, nekakšen, tolikšen)
  • Slovenian monolingual dictionary clean up - Paradigm "tolik/__prn" (lemmas: enak, kak, kolik, nekak, tolik)
  • Slovenian monolingual dictionary clean up - Assigned paradigm "tolikš/en__prn" to lemma "kakršen"
  • Slovenian monolingual dictionary clean up - Deleted lm="enako"; fixed paradigms "nekat/i__prn", "kater/i__prn", "k/do__prn" and "k/dor__prn"; deleted par="koli/__prn" and lm="koli" (the word does not exist); added par="koliko__prn"
  • Slovenian monolingual dictionary clean up - Deleted lm="mnogo", "onega", "precej" and "tainta"; deleted paradigms n="nekoliko/__prn", "nikogar/__prn", "precej/__prn" and "t/ainta__prn"
  • Slovenian monolingual dictionary clean up - Added "oba" as a determiner/pronoun
  • Slovenian monolingual dictionary clean up - Miscellaneous fixes
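
The sorting script itself is not included in this report. As a rough sketch of the idea, assuming entries are ordered first by their right-side tag sequence and then by the left-side ending, Python's `xml.etree.ElementTree` can reorder the `<e>` children of every `<pardef>`. The function names and the command-line usage are hypothetical. Note that ElementTree's default parser discards XML comments, so this sketch does not preserve them.

```python
import sys
import xml.etree.ElementTree as ET


def entry_key(entry):
    """Sort key for a paradigm <e>: right-side tag names first, then left-side text."""
    pair = entry.find("p")
    if pair is None:
        # Entries like <e><i>...</i><par n="..."/></e> have no <p>; use the <i> text.
        ident = entry.find("i")
        return ((), "".join(ident.itertext()) if ident is not None else "")
    left, right = pair.find("l"), pair.find("r")
    tags = tuple(s.get("n", "") for s in right.iter("s")) if right is not None else ()
    surface = "".join(left.itertext()) if left is not None else ""
    return (tags, surface)


def sort_pardefs(root):
    """Reorder the children of every <pardef> in place (stable sort)."""
    for pardef in list(root.iter("pardef")):
        pardef[:] = sorted(pardef, key=entry_key)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python sort_pardefs.py input.dix output.dix
    tree = ET.parse(sys.argv[1])
    sort_pardefs(tree.getroot())
    ET.indent(tree)  # moved elements keep old whitespace, so re-indent (Python 3.9+)
    tree.write(sys.argv[2], encoding="utf-8", xml_declaration=True)
```

Sorting by tags groups all forms of the same number and case together, which makes missing or duplicated cells easier to spot by eye.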

Week 3

  • Slovenian monolingual dictionary clean up - Deleted lm="najnajin", "tega", "toliko" and "vsakogar"; deleted par="naj/najin__prn", "tistega/__prn", "toliko/__prn" and "vsakogar/__prn"; more fixes
  • Slovenian monolingual dictionary clean up - More fixes
  • Slovenian monolingual dictionary clean up - Added paradigm "barvil/o__n" and fixed a group of about 240 nouns
  • Slovenian monolingual dictionary clean up - Added paradigm "akrobatik/a__n", more fixes

Week 4

  • Slovenian monolingual dictionary clean up - A lot of fixes and checks

Coding period

Here we are in the coding period.

Important notes:
- My mentor and I decided to adjust the week plan slightly: the first two weeks are now dedicated to "Correction of errors of the Slovenian monolingual morphology (manual)" and the third week to "Correction of the differences in source and target tag sets of the morphological dictionaries".
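
For the tag-set task, one rough starting point is to compare the sets of symbols (`<s n="..."/>`) that two dictionaries actually use, and to list symbols that are used but never declared with `<sdef>`. This is a minimal sketch with hypothetical function names and placeholder file names; it is not necessarily how the differences were resolved in this project.

```python
import xml.etree.ElementTree as ET


def used_tags(root):
    """Names of all <s n="..."/> symbols used anywhere in a dictionary tree."""
    return {s.get("n") for s in root.iter("s") if s.get("n")}


def declared_tags(root):
    """Names declared in the <sdefs> section via <sdef n="..."/>."""
    return {d.get("n") for d in root.iter("sdef") if d.get("n")}


def tag_differences(src_root, tgt_root):
    """Return (tags used only in the source, tags used only in the target), sorted."""
    src, tgt = used_tags(src_root), used_tags(tgt_root)
    return sorted(src - tgt), sorted(tgt - src)


def undeclared_tags(root):
    """Tags used in entries or paradigms but never declared with <sdef>."""
    return sorted(used_tags(root) - declared_tags(root))


# Hypothetical usage (file names are placeholders):
# src = ET.parse("source.dix").getroot()
# tgt = ET.parse("target.dix").getroot()
# print(tag_differences(src, tgt))
# print(undeclared_tags(src))
```

A tag that appears in only one list is a candidate for renaming or mapping. It may also be a legitimate language-specific distinction, so each one still needs a manual decision.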

Week 1

Work to be done: Correction of errors of the Slovenian monolingual morphology (manual)
Result:

  • Slovenian monolingual dictionary clean up - Nouns and proper nouns (~7280 lemmas and ~12500 lines of paradigms)
    • Every paradigm entry had to be checked, duplicates removed and missing entries added. I also remade the paradigm entries for proper names and split them by tag, so all proper names are now grouped into three classes: .ant (given names), .cog (surnames) and .top (place names).
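
The following is a minimal sketch of an automated duplicate check. It assumes that two `<e>` entries in the same `<pardef>` count as duplicates when their elements, attributes and whitespace-stripped text all match. This is not necessarily how the check was done here, and the function names are hypothetical. ElementTree's default parser also drops XML comments, so a file rewritten by this sketch loses them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def entry_signature(entry):
    """Hashable summary of an <e>: each node's tag, sorted attributes and stripped text.

    Tail text (text after a child element) is ignored in this sketch.
    """
    return tuple(
        (node.tag, tuple(sorted(node.attrib.items())), (node.text or "").strip())
        for node in entry.iter()
    )


def find_duplicates(root):
    """Map each <pardef> name to the number of surplus (duplicate) entries it contains."""
    report = {}
    for pardef in root.iter("pardef"):
        counts = Counter(entry_signature(e) for e in pardef.findall("e"))
        extra = sum(n - 1 for n in counts.values() if n > 1)
        if extra:
            report[pardef.get("n")] = extra
    return report


def remove_duplicates(root):
    """Keep only the first occurrence of each entry in every <pardef>."""
    for pardef in list(root.iter("pardef")):
        seen = set()
        for e in pardef.findall("e"):  # findall returns a list, so removal is safe
            sig = entry_signature(e)
            if sig in seen:
                pardef.remove(e)
            else:
                seen.add(sig)
```

Missing entries are harder to detect automatically, because the expected set of tag combinations depends on the part of speech. That part is still best checked against a reference paradigm by hand.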

Week 2