Difference between revisions of "User:Shraier/reports"

Revision as of 13:03, 14 June 2011

Community Bonding Period

In this period I devoted myself to getting to know the Apertium system. I set up the new language pair and added it to the SVN repository. I also got in touch with other community members, and as far as I can see they are all very kind.

I also started on the coding-period work early. Here is a short list of the changes and fixes I made:

Week 1

Apertium Wiki: User account and user page created
Slovenian monolingual dictionary clean up - Deleted paradigm: <pardef n="/prpers__n"> (its purpose was unclear)
Slovenian monolingual dictionary clean up - Remake of pronouns (n="/prpers__n") - Personal and emphatic
Slovenian monolingual dictionary clean up - More checks, edited n="k/arkoli__prn" to n="/karkoli__prn" (it contains "česarkoli", "čemurkoli" and "čimerkoli")
Slovenian monolingual dictionary clean up - More checks, cleaned duplicates in n="vajin/__prn"

Week 2

Made a script to sort paradigm entries (has some minor bugs)
Slovenian monolingual dictionary clean up - Took p2 from "vajin/__prn" and made a new paradigm "njen/__pr" - set to lm="njen" and "njun"
Slovenian monolingual dictionary clean up - Paradigm "njihov/__prn" cleanup + some checks
Slovenian monolingual dictionary clean up - Paradigm "tolikš/en__prn", lemmas: drugačen, kolikšen, nekakšen, tolikšen
Slovenian monolingual dictionary clean up - Paradigm "tolik/__prn", lemmas: enak, kak, kolik, nekak, tolik
Slovenian monolingual dictionary clean up - Added paradigm "tolikš/en__prn" to lemma="kakršen";
Slovenian monolingual dictionary clean up - Deleted lm="enako"; fixed paradigms "nekat/i__prn", "kater/i__prn", "k/do__prn", "k/dor__prn"; deleted par="koli/__prn" and lm="koli" (does not exist); added par="koliko__prn"
Slovenian monolingual dictionary clean up - Deleted lm="mnogo", "onega", "precej", "tainta"; Deleted par n="nekoliko/__prn", "nikogar/__prn", "precej/__prn", "t/ainta__prn"
Slovenian monolingual dictionary clean up - Added "oba" as determinative/pronoun
Slovenian monolingual dictionary clean up and fixes
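
The sorting script mentioned at the start of this week is not reproduced in this report. As a rough illustration only, a script of that kind could look like the sketch below. It assumes the usual .dix layout in which each <pardef> holds <e><p><l>...</l><r>...</r></p></e> entries; the paradigm name "example/__prn", the sample entries and the function names are hypothetical and are not taken from the actual dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, NOT copied from the real Slovenian dictionary.
SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="example/__prn">
      <e><p><l>om</l><r>a<s n="prn"/></r></p></e>
      <e><p><l>a</l><r>a<s n="prn"/></r></p></e>
      <e><p><l>e</l><r>a<s n="prn"/></r></p></e>
    </pardef>
  </pardefs>
</dictionary>"""


def entry_key(entry):
    """Sort key: surface suffix first, then the tag sequence on the right side."""
    left = entry.find("./p/l")
    right = entry.find("./p/r")
    left_text = "".join(left.itertext()) if left is not None else ""
    tags = tuple(s.get("n", "") for s in right.iter("s")) if right is not None else ()
    return (left_text, tags)


def sort_pardefs(root):
    """Reorder the children of every <pardef> in place, using entry_key."""
    for pardef in root.iter("pardef"):
        children = list(pardef)
        for child in children:
            pardef.remove(child)
        for child in sorted(children, key=entry_key):
            pardef.append(child)
    return root


if __name__ == "__main__":
    tree_root = sort_pardefs(ET.fromstring(SAMPLE))
    print(ET.tostring(tree_root, encoding="unicode"))
```

Sorting on the suffix and then on the tags keeps entries for the same surface form next to each other, which makes duplicates easier to spot by eye.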

Week 3

Slovenian monolingual dictionary clean up - Deleted lm="najnajin", "tega", "toliko", "vsakogar"; deleted par="naj/najin__prn", "tistega/__prn", "toliko/__prn", "vsakogar/__prn"; More fixes
Slovenian monolingual dictionary clean up - More fixes
Slovenian monolingual dictionary clean up - Added paradigm "barvil/o__n" and fixed a group of nouns (~240)
Slovenian monolingual dictionary clean up - Added paradigm "akrobatik/a__n", more fixes

Week 4

Slovenian monolingual dictionary clean up - A lot of fixes and checks

Coding period

Here we are in the coding period.

Important notes:
- My mentor and I decided to adjust the week plan slightly. The first two weeks are now intended for the "Correction of errors of the Slovenian monolingual morphology (manual)" and the third week for the "Correction of the differences in source and target tag sets of the morphological dictionaries".

Week 1

Work to be done: Correction of errors of the Slovenian monolingual morphology (manual)
Result:

Slovenian monolingual dictionary clean up - Nouns + Proper nouns (~7280 lemmas and ~12500 lines of paradigms)
- All the paradigm entries had to be checked, all duplicates had to be removed and all missing entries had to be added. I also had to remake and split paradigm entries for "Proper Names" with different tags (now we have all proper names grouped in 3 groups - .ant, .cog, .top)
- Around 1200 lemmas were removed because they were not properly tagged
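
The duplicate removal above was done manually. As a hedged sketch of how the same check could be automated, the snippet below drops repeated entries inside a single paradigm. It assumes that two entries count as duplicates when their left side, right side and tag sequence all match; the sample paradigm and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paradigm with one repeated entry (illustration only).
SAMPLE_PARDEF = """<pardef n="example/__n">
  <e><p><l>a</l><r>a<s n="n"/><s n="sg"/></r></p></e>
  <e><p><l>e</l><r>a<s n="n"/><s n="pl"/></r></p></e>
  <e><p><l>a</l><r>a<s n="n"/><s n="sg"/></r></p></e>
</pardef>"""


def signature(entry):
    """Identity of an entry: (left text, right text, tag names)."""
    left = entry.find("./p/l")
    right = entry.find("./p/r")
    left_text = "".join(left.itertext()) if left is not None else ""
    right_text = "".join(right.itertext()) if right is not None else ""
    tags = tuple(s.get("n") for s in right.iter("s")) if right is not None else ()
    return (left_text, right_text, tags)


def remove_duplicate_entries(pardef):
    """Delete repeated <e> entries in place and return how many were removed."""
    seen = set()
    removed = 0
    for entry in list(pardef.findall("e")):
        if entry.find("./p") is None:
            # Entries without a <p> pair are left alone; they need a manual look.
            continue
        sig = signature(entry)
        if sig in seen:
            pardef.remove(entry)
            removed += 1
        else:
            seen.add(sig)
    return removed


if __name__ == "__main__":
    pardef = ET.fromstring(SAMPLE_PARDEF)
    print(remove_duplicate_entries(pardef), "duplicate(s) removed")
```

A check like this only catches exact repeats. Missing entries and wrongly tagged lemmas still have to be reviewed by hand.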

Week 2

Work to be done: Correction of errors of the Slovenian monolingual morphology (manual)
Result:

Slovenian monolingual dictionary clean up - Interjections and Abbreviations checked, Adverbs completely remade + Part of Adjectives
- Adverbs: All lemmas had to be checked (~3000 lemmas). Until now the adverbs had only the default paradigm, so I grouped the lemmas that share the same paradigm entries and made new paradigms for them
- Duplicates/non-adverbs have been deleted, ~1200 lemmas
- Adjectives: Lemmas are now linked to paradigms by degree (one paradigm each for pst, comp, sup, ela), which in turn are linked to two paradigms containing all entries needed. lm4 -> par4 (adj.pst/comp/sup/ela) -> par2 (other tags)
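
The regrouping of adverbs described above (lemmas with identical paradigm entries share one paradigm) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the stem is taken to be the longest common prefix of a lemma and its forms, the tag strings "adv", "adv.pst" and "adv.comp" are placeholders, and the four sample lemmas are toy data that was not checked against the dictionary.

```python
import os
from collections import defaultdict

# Toy data: lemma -> list of (surface form, tag string). Illustration only.
SAMPLE = {
    "hitro": [("hitro", "adv.pst"), ("hitreje", "adv.comp")],
    "čisto": [("čisto", "adv.pst"), ("čisteje", "adv.comp")],
    "zdaj": [("zdaj", "adv")],
    "tukaj": [("tukaj", "adv")],
}


def group_by_paradigm(analyses):
    """Group lemmas whose (suffix, tags) sets are identical.

    Returns a dict mapping (lemma suffix, frozenset of (suffix, tags))
    to the list of lemmas that could share one paradigm.
    """
    groups = defaultdict(list)
    for lemma, forms in analyses.items():
        # Assumption: the stem is the longest prefix shared by lemma and forms.
        stem = os.path.commonprefix([lemma] + [form for form, _ in forms])
        suffixes = frozenset((form[len(stem):], tags) for form, tags in forms)
        groups[(lemma[len(stem):], suffixes)].append(lemma)
    return dict(groups)


if __name__ == "__main__":
    for (lemma_suffix, suffixes), lemmas in group_by_paradigm(SAMPLE).items():
        print(repr(lemma_suffix), sorted(suffixes), lemmas)
```

Each resulting group corresponds to one candidate paradigm. Forms built with a prefix would defeat the common-prefix assumption and would need separate handling.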

Week 3

Work to be done: Correction of errors of the Slovenian monolingual morphology (manual); Correction of the differences in source and target tag sets of the morphological dictionaries (possibly write rules for this part).
Result:

Slovenian monolingual dictionary clean up - All Adjectives completely remade
- Adjectives: All lemmas had to be checked - ~7000 lemmas; All duplicates have been removed and all paradigms remade (~9500 lines of paradigm entries)
- Source and target tag sets - Rules will be written after the bilingual dictionary is made

