Difference between revisions of "User:Ggregori"
Revision as of 16:11, 11 July 2011
About me
Name: Gabriel Gregori Manzano
Email/Google chat: Email me
IRC nick: ggregori
GSoC 2011
VM for the transfer module - Application
Github repository: [1]
TODO list
- Update the wiki page: only examples left.
Weekly reports
Community Bonding Period
Week 1 - (25/04 - 01/05): This week was mostly dedicated to researching and reviewing some topics (some of them suggested by my mentor).
- I have been reviewing NLP and Python using the book 'Natural Language Processing with Python'.
- I have been looking for a way to represent morphological labels in UCS/UTF and my mentor suggested using negative numbers as in Apertium internals. Anyway, I can worry about this later.
- Using UTF with Python: 'codecs' and 'unicodedata' can be some useful modules.
- Testing the 'lt-proc -b' option, whose output is going to be the input of my compiler.
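As a rough illustration of the two modules mentioned above (not code from the project), the sketch below uses 'unicodedata' to normalize and name characters and 'codecs' to decode UTF-8 that arrives in arbitrary byte chunks. The function names and the sample word are made up for the example.

```python
import codecs
import unicodedata


def describe(text):
    """Return (character, Unicode name) pairs for the NFC-normalized text."""
    normalized = unicodedata.normalize("NFC", text)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in normalized]


def decode_chunks(chunks):
    """Decode UTF-8 byte chunks that may split a multi-byte character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b"", final=True)


# "n" followed by a combining tilde becomes a single character under NFC.
pairs = describe("nin\u0303o")

# The two bytes of the accented letter are split across the chunks.
word = decode_chunks([b"ni\xc3", b"\xb1o"])
```

The incremental decoder is the relevant part for stream processing: a plain bytes.decode() call on each chunk would fail whenever a chunk boundary falls inside a multi-byte character.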
Week 2 - (02/05 - 08/05): This week I finished all the review/research needed, although I couldn't do everything I wanted because I had to travel.
- Finished the introductory book reviewing NLP and Python.
- Started designing and redefining the compiler's architecture following last year's work, and selected and tested some modules. Some of the changes or improvements:
- Use of pipes/command-line arguments for the input of the compiler (like the rest of Apertium).
- Configurable logging module for info and debugging purposes (module: logging).
- Refactoring some methods in the expatparser class (e.g. extracting common code of the callback method).
- Create some additional classes in order to add some flexibility (e.g. parent class parser with the common code).
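A minimal sketch of how the command-line and logging points above could fit together, using only the standard 'argparse' and 'logging' modules. The option names, the logger name "compiler" and both function names are assumptions made for this example, not the project's actual interface.

```python
import argparse
import logging


def parse_args(argv):
    """Parse hypothetical compiler options; input/output default to the pipes."""
    parser = argparse.ArgumentParser(description="Transfer rules compiler (sketch)")
    parser.add_argument("-i", "--input", default=None,
                        help="rules file to read (default: standard input)")
    parser.add_argument("-o", "--output", default=None,
                        help="file to write (default: standard output)")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug messages")
    return parser.parse_args(argv)


def configure_logging(debug):
    """Return a logger whose verbosity depends on the --debug flag."""
    logger = logging.getLogger("compiler")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger


args = parse_args(["--debug"])
logger = configure_logging(args.debug)
logger.debug("input=%s output=%s", args.input, args.output)
```

Leaving both paths as None by default keeps the tool usable in a pipeline, like the rest of Apertium, while still allowing explicit files.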
Week 3 - (09/05 - 15/05): This week I had to redo some work because of the Python 3 switch, so I didn't accomplish what I wanted. Anyway, there are two weeks of university classes remaining until I can focus exclusively on this project.
- Switched to Python 3, reasons:
- I hope to get better UTF-8 support among other things.
- Had to test if the modules I use were fully available/compatible in Python3.
- Had to read and research (again...) about str/bytes and std{in,out}.buffer and, in general, everything related to Unicode, UTF-8...
- Started implementing the very basics of the compiler’s architecture:
- Command-line arguments and help, input and output, logging...
- Another thing I realized this week is that a lot of last week's thinking about making a flexible prototype that is easy to modify in the future doesn’t really apply to Python. For example, my design involved creating interfaces/abstract classes in order to be able to easily change components, but that isn’t needed in Python. In conclusion: duck typing, although I will need my design in the C++ version.
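The str/bytes and duck-typing points above can be shown together in a small sketch. It is only an illustration: the function name and the upper-casing step are invented, and io.BytesIO objects stand in for sys.stdin.buffer and sys.stdout.buffer.

```python
import io


def copy_upper(in_bytes, out_bytes, encoding="utf-8"):
    """Read bytes, decode them to str, process the text and write bytes back."""
    text = in_bytes.read().decode(encoding)
    out_bytes.write(text.upper().encode(encoding))


# Any object with read()/write() works here (duck typing), so no interface
# or abstract class is required. In a real pipeline these two objects would
# be sys.stdin.buffer and sys.stdout.buffer.
source = io.BytesIO("a\u00f1o".encode("utf-8"))
sink = io.BytesIO()
copy_upper(source, sink)
result = sink.getvalue().decode("utf-8")
```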
Coding Period
Week 1 - (16/05 - 29/05): These last days have been impossible with university work; just this week I had about 4 class projects and 2 exams... On Tuesday next week I will finish everything and will be able to focus completely on my project.
Week 2 - (30/05 - 05/06): Finally I can focus completely on my project, and this week I have made a lot of progress on the compiler:
- Finished the structure of the project, now I am ready to start generating code from the transfer rules.
- Created the GitHub repository where I will submit my work (link is at the top).
- Implemented all the handling of the sections: def-cats, def-attrs, def-vars, def-lists and def-macros.
- Created some test macros with the desired output in pseudo-assembly.
- Implemented the generation of code for some elements: <not>, <equal>, <b>, <lit>, ...
- Improved some of the code, creating a SymbolTable, separating debugging output and actual output etc.
Week 3 - (06/06 - 12/06): This week I finished my compiler which is able to generate pseudo-assembly for every element of the transfer rules files.
- Added the ability to store some attributes like the number of children of an event, its parent etc.
- Created more tests for the macros' code generation and added new ones for rules and t2x/t3x files.
- Added some necessary instructions and changed some others to maintain coherence.
- Added code generation for all the remaining elements and their attributes: <when>, <test>, <var>, <let>, <lit-tag>, <clip>, <choose>, <otherwise>, <equal caseless=yes>, <and>, <or>, <in>, <list>, <get-case-from>, <concat>, <append>, <modify-case>, <case-of>, <begins-with>, <begins-with-list>, <ends-with>, <ends-with-list>, <contains-substring>, <rule>, <pattern>, <pattern-item>, <action>, <lu>, <mlu>, <tags>, <chunk>, <call-macro>, <with-param>, <interchunk>, <postchunk>, <lu-count>.
- Created error detection and reporting for the input transfer rules files.
Week 4 - (13/06 - 19/06): I've spent most of the week thinking about and designing the vm's architecture, and I started implementing it:
- Updated the vm-for-transfer wiki page with the current implemented instruction set.
- Created the initial architecture for the vm: dynamic instruction loader which converts instructions to a vm representation, and then an interpreter executes every instruction.
- Implemented some of it, for example the assemblyloader reads a file, converts some of its contents and fills the appropriate data structures.
Week 5 - (20/06 - 26/06): This week I have implemented almost all of the vm, with just a small but important detail remaining. Now the only thing left is the implementation of every instruction.
- Added more error checking to the compiler: check every call to a macro without doing a second full pass.
- Added proper handling of labels in the vm with backpatching for all the rules, macros and instructions needed (jmp, jz, jnz, addtrie and call).
- Implemented a simple system trie.
- Added a code-to-preload section to only add patterns to the trie once.
- Created the interpreter, which dynamically initializes a dictionary with opCode : processingMethod pairs.
- Added preprocessing and execution capabilities including the structure for the creation of all the instruction processing methods and the vm's main loop.
- Implemented a callstack to handle rules calling macros because we need to store the last PC and its code section.
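The ideas in this week's list (label backpatching, an opcode-to-method dictionary built dynamically, and a call stack) can be sketched in a toy vm. This is not the project's code: the assembly syntax, the "do_" naming convention and the semantics of push, out, jz, jmp, call and ret are all assumptions for the example, and the call stack stores only the PC rather than the PC plus its code section.

```python
def load(source):
    """Convert assembly text to [opcode, operand] pairs, backpatching labels."""
    code, labels, pending = [], {}, []
    for line in source.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(code)   # label -> address of next instruction
            continue
        opcode, _, operand = line.partition(" ")
        if opcode in ("jmp", "jz", "call"):
            pending.append(len(code))       # target may not be defined yet
        code.append([opcode, operand])
    for address in pending:                 # backpatch: label name -> address
        code[address][1] = labels[code[address][1]]
    return code


class Interpreter:
    def __init__(self, code):
        self.code, self.pc = code, 0
        self.stack, self.callstack, self.output = [], [], []
        # Build the opcode -> processing method table from the do_* methods.
        self.dispatch = {name[3:]: getattr(self, name)
                         for name in dir(self) if name.startswith("do_")}

    def run(self):
        """Main loop: fetch, advance the PC, then dispatch on the opcode."""
        while self.pc < len(self.code):
            opcode, operand = self.code[self.pc]
            self.pc += 1
            self.dispatch[opcode](operand)
        return self.output

    def do_push(self, operand):
        self.stack.append(int(operand))

    def do_out(self, operand):
        self.output.append(self.stack.pop())

    def do_jmp(self, operand):
        self.pc = operand

    def do_jz(self, operand):
        if self.stack.pop() == 0:
            self.pc = operand

    def do_call(self, operand):
        self.callstack.append(self.pc)      # remember where to return
        self.pc = operand

    def do_ret(self, operand):
        self.pc = self.callstack.pop()


PROGRAM = """
push 0
jz skip
push 1
out
skip:
call emit
jmp end
emit:
push 7
out
ret
end:
"""

output = Interpreter(load(PROGRAM)).run()
```

Here the forward references to skip, emit and end are resolved in a single pass over the list of pending instructions, so the loader never has to read the program twice.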
Week 6 - (27/06 - 03/07): This week's focus was on the reading of patterns, which turned out to be harder than I thought; thanks to Sergio I now know how to do it (or at least I think so!).
- Improved/corrected some code generation like the variables problem or the modify-case hell.
- Implemented instructions: and, or, not, cmp, cmpi, cmp-substr, cmpi-substr, push, append, jz, jnz, lu, mlu, begins-with, begins-with-ig, ends-with, ends-with-ig, modifycase.
- Implemented LRLM of patterns to select the rules to execute, although this still needs work.
- Implemented more features in the system trie, like the insertion of the '|' symbol for a pattern's options.
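A simplified sketch of left-to-right longest match (LRLM) over a trie, including '|' options inside a pattern item. It assumes each input token is already reduced to a single category name, which is far simpler than real lexical units; the class, the method names and the tie-breaking rule are all hypothetical.

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.rule = None      # rule number if some pattern ends at this node

    def insert(self, pattern, rule):
        """Insert a pattern; an item like 'adj|n' adds one branch per option."""
        nodes = [self]
        for item in pattern:
            next_nodes = []
            for node in nodes:
                for option in item.split("|"):
                    next_nodes.append(node.children.setdefault(option, Trie()))
            nodes = next_nodes
        for node in nodes:
            if node.rule is None:   # assumption: the first inserted rule wins
                node.rule = rule

    def longest_match(self, tokens, start):
        """Return (rule, length) of the longest pattern matching at start."""
        node, best = self, (None, 0)
        for offset, token in enumerate(tokens[start:], 1):
            node = node.children.get(token)
            if node is None:
                break
            if node.rule is not None:
                best = (node.rule, offset)
        return best


def select_rules(trie, tokens):
    """Scan left to right, always consuming the longest matching pattern."""
    position, selected = 0, []
    while position < len(tokens):
        rule, length = trie.longest_match(tokens, position)
        if rule is None:
            position += 1           # no pattern starts here; skip the token
        else:
            selected.append((rule, tokens[position:position + length]))
            position += length
    return selected


trie = Trie()
trie.insert(["det", "n"], 1)
trie.insert(["det", "adj|n", "n"], 2)
trie.insert(["n"], 3)
selected = select_rules(trie, ["det", "adj", "n", "n", "vb"])
```

In this run the three-token pattern of rule 2 is preferred over any shorter match at the first position, and the scan then resumes right after the matched span.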
Week 7 - (04/07 - 10/07): After a week of really hard work and long hours the vm is finished. There are still two things that I need to fix (blanks, and link-to) and I need to test it thoroughly too.
- Added more things to the compiler like the case attribute of a chunk.
- Implemented instructions: case-of, clipsl, cliptl, storesl, storetl, pushsb, pushbl, out, getCaseFrom, clip, storecl, lu-count.
- Implemented and added tests for proper handling of patterns in the trie:
- Patterns which start directly with tags (should accept any lemma, e.g. <n><pl> -> should accept student<n><pl>).
- Patterns which contain *, e.g. <n><*><sg><*><gen>.
- Patterns starting with a lemma should accept any case variation of that lemma.
- Implemented support for shallow transfer or advanced transfer.
- Added support for the interchunk and postchunk stages in the vm: parsing their input, implementing their specific instructions, specific tag values like “chcontent”, etc.
- Fixed some bugs in the vm, although it still needs more testing.
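The three pattern cases tested this week can be approximated with regular expressions, as in the sketch below. This is not how the trie does it; it only illustrates the expected behaviour. It assumes that <*> stands for zero or more arbitrary tags and that "any case variation" can be modelled as a case-insensitive lemma comparison. Both are simplifications, and the function names and sample lexical units are invented.

```python
import re

TAG = r"<[^<>]+>"


def pattern_to_regex(pattern):
    """Translate a pattern such as 'house<n>' or '<n><*><sg>' into a regex."""
    lemma, tags = re.match(r"([^<]*)((?:<[^<>]+>)*)$", pattern).groups()
    if lemma:
        parts = ["(?i:" + re.escape(lemma) + ")"]   # any case variation
    else:
        parts = ["[^<]*"]                           # no lemma: accept any lemma
    for tag in re.findall(TAG, tags):
        if tag == "<*>":
            parts.append("(?:" + TAG + ")*")        # assumption: zero or more tags
        else:
            parts.append(re.escape(tag))
    return re.compile("".join(parts))


def accepts(pattern, lexical_unit):
    """True if the whole lexical unit matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(lexical_unit) is not None


checks = [
    accepts("<n><pl>", "student<n><pl>"),
    accepts("<n><pl>", "student<n><sg>"),
    accepts("<n><*><sg><*><gen>", "house<n><f><sg><def><gen>"),
    accepts("house<n>", "House<n>"),
    accepts("house<n>", "horse<n>"),
]
```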
Week 8 - (11/07 - 17/07):