Difference between revisions of "User:Shraier/reports"
Revision as of 13:03, 14 June 2011
Community Bonding Period
In this period I familiarized myself with the Apertium system. I established the new language pair and added it to the SVN repository. I also got in touch with other community members, who have all been very kind.
I also got a head start on the coding-period work. Here is a short list of the changes and fixes I made:
Week 1
- Apertium Wiki: User account and user page created
- Slovenian monolingual dictionary clean up - Deleted paradigm: <pardef n="/prpers__n"> //<-- WTF IS THIS
- Slovenian monolingual dictionary clean up - Remake of pronouns (n="/prpers__n") - Personal and emphatic
- Slovenian monolingual dictionary clean up - More checks, edited n="k/arkoli__prn" to n="/karkoli__prn" (it contains "česarkoli", "čemurkoli" and "čimerkoli")
- Slovenian monolingual dictionary clean up - More checks, cleaned duplicates in n="vajin/__prn"
Week 2
- Wrote a script to sort paradigm entries (it still has some minor bugs)
- Slovenian monolingual dictionary clean up - Took p2 from "vajin/__prn" and made a new paradigm "njen/__pr" - set to lm="njen" and "njun"
- Slovenian monolingual dictionary clean up - Paradigm "njihov/__prn" cleanup + some checks
- Slovenian monolingual dictionary clean up - Paradigm "tolikš/en__prn", lemmas: drugašen, kolikšen, nekakšen, tolikšen
- Slovenian monolingual dictionary clean up - Paradigm "tolik/__prn", lemmas: enak, kak, kolik, nekak, tolik
- Slovenian monolingual dictionary clean up - Added paradigm "tolikš/en__prn" to lemma="kakršen";
- Slovenian monolingual dictionary clean up - Deleted lm="enako"; fixed par="nekat/i__prn", "kater/i__prn", "k/do__prn", "k/dor__prn"; deleted par="koli/__prn" and lm="koli" (does not exist); added par="koliko__prn"
- Slovenian monolingual dictionary clean up - Deleted lm="mnogo", "onega", "precej", "tainta"; Deleted par n="nekoliko/__prn", "nikogar/__prn", "precej/__prn", "t/ainta__prn"
- Slovenian monolingual dictionary clean up - Added "oba" as determinative/pronoun
- Slovenian monolingual dictionary clean up and fixes
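A minimal sketch of how paradigm entries could be sorted, for illustration only. This is not the script mentioned in the list above. It assumes the usual monodix layout (`<pardef>` containing `<e><p><l>…</l><r>…</r></p></e>` entries) and sorts by surface ending, then by tag sequence. The sample paradigm content and its tag names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative paradigm only: the entries and tag names (n, nt, sg, ...)
# are assumptions, not copied from the real Slovenian dictionary.
SAMPLE_PARDEF = """<pardef n="barvil/o__n">
  <e><p><l>u</l><r>o<s n="n"/><s n="nt"/><s n="sg"/><s n="dat"/></r></p></e>
  <e><p><l>o</l><r>o<s n="n"/><s n="nt"/><s n="sg"/><s n="nom"/></r></p></e>
  <e><p><l>a</l><r>o<s n="n"/><s n="nt"/><s n="sg"/><s n="gen"/></r></p></e>
</pardef>"""


def entry_key(entry):
    """Sort key for an <e> element: surface ending, then the tag sequence."""
    left = entry.find("p/l")
    right = entry.find("p/r")
    surface = "".join(left.itertext()) if left is not None else ""
    tags = tuple(s.get("n", "") for s in right.iter("s")) if right is not None else ()
    return (surface, tags)


def sort_pardef(pardef):
    """Reorder the <e> children of a <pardef> element in place."""
    entries = sorted(pardef.findall("e"), key=entry_key)
    for entry in pardef.findall("e"):
        pardef.remove(entry)
    pardef.extend(entries)
    return pardef


if __name__ == "__main__":
    sorted_pardef = sort_pardef(ET.fromstring(SAMPLE_PARDEF))
    print(ET.tostring(sorted_pardef, encoding="unicode"))
```

Sorting on the surface ending first keeps identical endings next to each other, which makes duplicate or conflicting entries easier to spot by eye.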
Week 3
- Slovenian monolingual dictionary clean up - Deleted lm="najnajin", "tega", "toliko", "vsakogar"; deleted par="naj/najin__prn", "tistega/__prn", "toliko/__prn", "vsakogar/__prn"; More fixes
- Slovenian monolingual dictionary clean up - More fixes
- Slovenian monolingual dictionary clean up - Added paradigm "barvil/o__n" and fixed a group of nouns (~240)
- Slovenian monolingual dictionary clean up - Added paradigm "akrobatik/a__n", more fixes
Week 4
- Slovenian monolingual dictionary clean up - A lot of fixes and checks
Coding period
Here we are in the coding period.
Important notes:
- My mentor and I decided to adjust the "week plan" slightly. The first two weeks are now reserved for the "Correction of errors of the Slovenian monolingual morphology (manual)" and the third week for the "Correction of the differences in source and target tag sets of the morphological dictionaries".
Week 1
Work to be done: Correction of errors of the Slovenian monolingual morphology (manual)
Result:
- Slovenian monolingual dictionary clean up - Nouns + Proper nouns (~7280 lemmas and ~12500 lines of paradigms)
- All the paradigm entries had to be checked, all duplicates removed and all missing entries added. I also had to remake and split the paradigm entries for "Proper Names" by tag (all proper names are now grouped into 3 groups - .ant, .cog, .top)
- Around 1200 lemmas were removed because they were not properly tagged
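The duplicate check described above was done manually. As a rough sketch of how it could be automated, the snippet below treats two `<e>` entries as duplicates when they share the same lemma, stem and paradigm name. That definition of "duplicate" is an assumption, and the sample section is invented for illustration (the lemmas are simply derived from the paradigm names mentioned earlier).

```python
import xml.etree.ElementTree as ET

# Invented sample: one entry appears twice.
SAMPLE_SECTION = """<section id="main" type="standard">
  <e lm="akrobatika"><i>akrobatik</i><par n="akrobatik/a__n"/></e>
  <e lm="barvilo"><i>barvil</i><par n="barvil/o__n"/></e>
  <e lm="akrobatika"><i>akrobatik</i><par n="akrobatik/a__n"/></e>
</section>"""


def entry_signature(entry):
    """Identify an entry by lemma, stem text and paradigm name."""
    stem = entry.find("i")
    par = entry.find("par")
    stem_text = "".join(stem.itertext()) if stem is not None else ""
    par_name = par.get("n") if par is not None else None
    return (entry.get("lm"), stem_text, par_name)


def remove_duplicate_entries(section):
    """Drop repeated <e> entries in place, keeping the first; return the count removed."""
    seen = set()
    removed = 0
    for entry in section.findall("e"):
        signature = entry_signature(entry)
        if signature in seen:
            section.remove(entry)
            removed += 1
        else:
            seen.add(signature)
    return removed


if __name__ == "__main__":
    section = ET.fromstring(SAMPLE_SECTION)
    print("removed:", remove_duplicate_entries(section))
```

A check like this only catches exact repeats. Entries that are wrongly tagged or attached to the wrong paradigm still need manual review.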
Week 2
Work to be done: Correction of errors of the Slovenian monolingual morphology (manual)
Result:
- Slovenian monolingual dictionary clean up - Interjections and Abbreviations checked, Adverbs completely remade + Part of Adjectives
- Adverbs: All lemmas had to be checked - I had to group all lemmas that share the same paradigm entries and make new paradigms for them (previously there was only the default paradigm). ~3000 lemmas
- Duplicates/non-adverbs have been deleted, ~1200 lemmas
- Adjectives: Lemmas are now linked to paradigms (one paradigm per type - pst, comp, sup, ela), which are in turn linked to two paradigms containing all the entries needed. lm4 -> par4 (adj.pst/comp/sup/ela) -> par2 (other tags)
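To illustrate the chained layout (lemma -> degree paradigm -> shared paradigm with the remaining tags), here is a small sketch of how nested paradigm references expand into full forms. The paradigm names, tags and endings below are hypothetical and much smaller than the real dictionary; only the idea of one paradigm pointing at another is taken from the description above.

```python
# Hypothetical model: paradigm name -> list of (suffix, tags, nested paradigm or None).
PARADIGMS = {
    # Degree-level paradigm: adds the adjective tags, then defers to a shared one.
    "adj_pst": [("", ["adj", "pst"], "adj_forms")],
    # Shared paradigm holding the remaining tags (only two forms shown here).
    "adj_forms": [
        ("", ["m", "sg", "nom"], None),
        ("a", ["f", "sg", "nom"], None),
    ],
}


def expand(stem, paradigm, paradigms, tags=()):
    """Recursively expand a stem through a chain of paradigms into (surface, tags) pairs."""
    forms = []
    for suffix, entry_tags, nested in paradigms[paradigm]:
        surface = stem + suffix
        all_tags = tuple(tags) + tuple(entry_tags)
        if nested is None:
            forms.append((surface, all_tags))
        else:
            forms.extend(expand(surface, nested, paradigms, all_tags))
    return forms


if __name__ == "__main__":
    for surface, tags in expand("lep", "adj_pst", PARADIGMS):
        print(surface, ".".join(tags))
```

The benefit of chaining is that the long list of gender, number and case entries lives in one shared paradigm instead of being repeated for every degree.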
Week 3
Work to be done: Correction of errors of the Slovenian monolingual morphology (manual); Correction of the differences in source and target tag sets of the morphological dictionaries (possibly write rules for this part).
Result:
- Slovenian monolingual dictionary clean up - All Adjectives completely remade
- Adjectives: All lemmas had to be checked - ~7000 lemmas; All duplicates have been removed and all paradigms remade (~9500 lines of paradigm entries)
- Source and target tag sets - Rules will be written after the bilingual dictionary is made