User:Asfrent/GSoC Log

GSoC Log

06.06.2014

  • wrote a new test suite under the new_tests folder.
  • added language pairs es-ro, ro-es, en-es, es-en.
  • three types of tests:
    • normal - compare the output of xfervm with that of the XML tree-walking transfer.
    • memory - tests using valgrind --tool=memcheck.
    • performance - tests using valgrind --tool=callgrind.
  • fixed valgrind error complaining about uninitialized raw pointers.
  • initial running time of the whole test suite is 8m28.204s (508s).

07.06.2014

  • fixed a memory leak in SystemTrie; most of the memcheck tests pass now.
  • fixed an invalid memory access in ChunkWord.
  • analysed the code for other bugs and discovered issues caused by rule ambiguity in the XML files.
  • sped up methods of VMWstringUtils. Tests run twice as fast.

10.06.2014

  • bug hunting all day.
  • discovered issues with <modify-case> in XML rules of es-ro.
  • implemented instruction line numbers for debugging purposes.
  • fixed the "numbers are considered uppercase" issue.
  • after a discussion with spectie on IRC we decided to move on to the next phase of the project, the implementation of a compressed trie data structure. The tests have to be redone; I will change the code so that it keeps the current behavior (output), buggy or not. The rationale behind this decision is that most of the bugs we analysed so far were due to wrong XML rules rather than code bugs.
  • implemented the new testing strategy. All tests pass except one memcheck test, which still fails because of wrong XML rules.
  • ran the es-ro es_1000 stage1 test under the callgrind tool to analyse performance and decide what to optimize next before merging the testing branch. The total running time under callgrind was 307 seconds.
  • ran the normal tests, took 4m12.871s (252 seconds).
  • replaced vectors with lists, used list::splice and std::move. Tests pass, and the full test suite takes 3m19.230s (199 seconds).
  • merged the testing branch into master. Remaining todos in the testing framework will be addressed in a separate branch.
  • created a new branch for SystemTrie optimizations, system-trie-opt.
  • it seems that introducing a function for converting strings (template<typename T> T stringTo(const string&)) lowers the running time by about 20 more seconds. However, two tests fail because the rules.xml file is wrong; test generation gives some warnings about it. I will raise this with spectie tomorrow. Changes undone.
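
The log gives only the signature of stringTo, not its body, so the following is a minimal sketch of what such a helper might look like, not the code that was actually tried. It assumes a generic fallback based on std::istringstream plus specialisations for int and double that call std::stoi and std::stod directly; the choice of specialisations and the error handling are assumptions made for illustration.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Generic fallback (assumed): parse any stream-extractable type.
// Constructing a stream per call is comparatively expensive.
template <typename T>
T stringTo(const std::string& s) {
    std::istringstream in(s);
    T value;
    if (!(in >> value)) {
        throw std::invalid_argument("stringTo: cannot convert '" + s + "'");
    }
    return value;
}

// Assumed specialisation: skip the stream machinery for int.
// std::stoi throws std::invalid_argument or std::out_of_range on failure.
template <>
inline int stringTo<int>(const std::string& s) {
    return std::stoi(s);
}

// Assumed specialisation: same idea for double.
template <>
inline double stringTo<double>(const std::string& s) {
    return std::stod(s);
}
```

A call site would then read stringTo<int>(text) or stringTo<double>(text). Whether a speed-up of the kind noted above comes from avoiding stream construction or from something else entirely is not stated in the log.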