Difference between revisions of "User:Irene/workplan"

From Apertium
 
(5 intermediate revisions by the same user not shown)
== Workplan ==
[https://github.com/irene-tang/separable-words on github]


{|class=wikitable
6/21: work on reordering module for Celtic language (Welsh) <br />
6/22: work on reordering module for <br />
6/23: work on reordering module for && prototype should be finished <br />
6/23: work on reordering module for & prototype should be finished <br />
||
6/19: Still trying to get the FST example to compile on my computer. Worked on the reordering module for English in C++. <br />
6/20: Spent a lot of time trying to fix hashing errors in the C++ prototype, then gave up on C++ and switched everything to Python, because I had been spending too much time working out the C++ API. Wrote documentation. Still trying to get [https://github.com/unittest-cpp/unittest-cpp unittest++] and the FST example to compile. <br />
6/21: Separated the prototype (now scripted in Python) into organized files.
6/21: Separated the prototype (now scripted in Python) into organized files. The reordering modules for English and Spanish are under control (there is still some tedious work to do on them, which I was not able to get to today). I am feeling much more confident about getting the prototype working by the evaluation deadline this week. I had listed a bunch of languages in my original proposal, but I couldn't find data on separable verbs except for German. I feel a little out of place fiddling with languages that I am not familiar with. However, I am not at all confident that any of this will be of use (other than the basic idea) when we try to integrate it into Apertium, because most of my code will probably be replaced with existing Lttoolbox functions. :(
6/23: video meeting.
6/22: I think I accomplished very little today... Wrote a small handful of tests (Python's unittest is great). Slightly enhanced the module: put delimiters (commas, periods) in the right place when transferring to the output file; combined the multiword (e.g. 'take<>#out' rather than 'take<> out<>'). I'm really losing motivation to keep enhancing the module because I know none of the prototype script (whether it's in Python or C) will be of much actual use. I'm looking forward to next week, after the first deadline is over, when I will be able to spend more time understanding the Lttoolbox API. I found it frustrating on days when I had not written many useful lines of code by the end of the day, but I tried to keep in mind that a significant portion of any programming project is reading and modifying existing code.
6/24: migrated to SourceForge; added a testing set.
||

|-
| || ||
||
|


|-
|-


| 5 || 6/26 - 7/2 ||
| 5 || 6/26 - 7/2 || hard-coded finite-state acceptor for 'take out'
||
6/26: trying to make the program backtrack when it gets to <ANY_CHAR> or <ANY_TAG> <br />
6/27: debugging & supporting any number of tags <br />
6/28: successfully reads and prints ^take<vblex><pres><tag1><tag2><tag3><tag4>$ ^the<det><tag1><tag2><tag3><tag4>$ ^thing<n><sg><tag><Tag>$ ^out<adv>$ <br />
6/29: working on being selective about what middle words are accepted <br />
7/1: the Python prototype for the acceptor is pretty much working; it just needs to be able to read from corpora that don't put every sentence on a new line, and to assign numbers to states in a more elegant fashion. <br />
|-


| 6 || 7/3 - 7/9 ||
| 6 || 7/3 - 7/9 || hard-coded finite-state transducer for 'take out'
||
7/3: tried to convert the Python script to C++ code; trying to use lttoolbox's FST class. <br />
7/4: still trying to convert to C++ and use lttoolbox <br />
7/5:
7/6:
7/7:
7/8:
7/9:
|-


| 7 || 7/10 - 7/16 ||
| 7 || 7/10 - 7/16 || xml format, working compiler and processor
|-


| 8 || 7/17 - 7/23 ||
| 8 || 7/17 - 7/23 || improving dictionary, compiler, and processor
|-


!'''Second evaluation''' !! 7/24 - 7/28 !! XML representation, finite-state implementation
!'''Second evaluation''' !! 7/24 - 7/28 !! finite-state implementation
|-


|-


| 10 || 7/31 - 8/6 || support for individual language pairs
| 10 || 7/31 - 8/6 || superblanks, integration, fstp
||
7/31: worked on superblanks, used fstp object <br />
8/1: improvements <br />
8/2: inserted superblanks between the # in e.g. 'take# out' and between words in counterexamples by amending the 'in' and 'out' strings; fixed an error where ^ was not printed at the end of reordering a separable multiword; updated the dictionary to improve the success rate <br />
|-


| 11 || 8/7 - 8/13 || (cont. support for individual language pairs)
||
testing and refining for beta testing languages: kaz/kir, deu, eng, fao-nor
|-


| 12 || 8/14 - 8/20 || (cont. support for individual language pairs)
||
support for +thing; lsx-comp appends <j/> to the end of every entry, before <e/>, which causes issues with paradigms; remove the feature. <br/>
|-


8/17:
| 13 || 8/21 - 8/27 || (cont. support for individual language pairs)
|-

Latest revision as of 18:45, 17 August 2017

Workplan

8/17:
Week | Dates | Goals | Progress/Notes | Evaluation
1 | 5/30 - 6/4 | some data, find a test corpus
2 | 6/5 - 6/11 | script to bootstrap separable multiwords from dictionaries, set up a testing framework, support for / preparing data for English separable verbs
3 | 6/12 - 6/18 | preparing data, setting up the prototype script, reading the Lttoolbox API specifications
4 | 6/19 - 6/25 |

6/19: separate out the language-dependent functions in the C++ prototype; work on reordering module for Romance languages (Spanish, Portuguese)
6/20: work on reordering module for Germanic language (Swedish)
6/21: work on reordering module for Celtic language (Welsh)
6/22: work on reordering module for
6/23: work on reordering module for & prototype should be finished

6/19: Still trying to get the FST example to compile on my computer. Worked on the reordering module for English in C++.
6/20: Spent a lot of time trying to fix hashing errors in the C++ prototype, then gave up on C++ and switched everything to Python, because I had been spending too much time working out the C++ API. Wrote documentation. Still trying to get unittest++ and the FST example to compile.
6/21: Separated the prototype (now scripted in Python) into organized files.
6/23: video meeting.
6/24: migrated to SourceForge; added a testing set.

First evaluation | 6/26 - 6/30 | testing framework set up + prototype system in Python
5 | 6/26 - 7/2 | hard-coded finite-state acceptor for 'take out'

6/26: trying to make the program backtrack when it gets to <ANY_CHAR> or <ANY_TAG>
6/27: debugging & supporting any number of tags
6/28: successfully reads and prints ^take<vblex><pres><tag1><tag2><tag3><tag4>$ ^the<det><tag1><tag2><tag3><tag4>$ ^thing<n><sg><tag><Tag>$ ^out<adv>$
6/29: working on being selective about what middle words are accepted
7/1: the Python prototype for the acceptor is pretty much working; it just needs to be able to read from corpora that don't put every sentence on a new line, and to assign numbers to states in a more elegant fashion.

6 | 7/3 - 7/9 | hard-coded finite-state transducer for 'take out'

7/3: tried to convert the Python script to C++ code; trying to use lttoolbox's FST class.
7/4: still trying to convert to C++ and use lttoolbox
7/5: 7/6: 7/7: 7/8: 7/9:

7 | 7/10 - 7/16 | XML format, working compiler and processor
8 | 7/17 - 7/23 | improving dictionary, compiler, and processor
Second evaluation | 7/24 - 7/28 | finite-state implementation
9 | 7/24 - 7/30 | integration with Apertium: fit the module between pre-transfer and lt-proc -b
10 | 7/31 - 8/6 | superblanks, integration, fstp

7/31: worked on superblanks, used fstp object
8/1: improvements
8/2: inserted superblanks between the # in e.g. 'take# out' and between words in counterexamples by amending the 'in' and 'out' strings; fixed an error where ^ was not printed at the end of reordering a separable multiword; updated the dictionary to improve the success rate

11 | 8/7 - 8/13 | (cont. support for individual language pairs)

testing and refining for beta testing languages: kaz/kir, deu, eng, fao-nor

12 | 8/14 - 8/20 | (cont. support for individual language pairs)

support for +thing; lsx-comp appends <j/> to the end of every entry, before <e/>, which causes issues with paradigms; remove the feature.

13 | 8/21 - 8/27 | (cont. support for individual language pairs)
Final evaluation | 8/29 - 9/5 | finite-state implementation in C++ with lttoolbox
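The week-5 notes describe a hard-coded acceptor for 'take out' that backtracks on <ANY_TAG>, accepts any number of tags, and is selective about middle words. The Python sketch below illustrates that idea. It is not the prototype's code. The names parse_stream, tags_match, accepts_take_out and MIDDLE_POS are hypothetical, ANY_TAG is assumed here to match zero or more tags, and the set of allowed middle parts of speech is only an illustrative guess.

```python
import re
from typing import List, Tuple

# One lexical unit in Apertium stream format, e.g. ^take<vblex><pres>$
LU_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")

ANY_TAG = "ANY_TAG"  # assumed wildcard semantics: matches zero or more tags
MIDDLE_POS = {"det", "adj", "n", "prn"}  # hypothetical allowed middle words

Unit = Tuple[str, List[str]]


def parse_stream(line: str) -> List[Unit]:
    """Split stream-format text into (lemma, [tags]) pairs."""
    return [(lemma, TAG_RE.findall(tags)) for lemma, tags in LU_RE.findall(line)]


def tags_match(pattern: List[str], tags: List[str]) -> bool:
    """Match a tag pattern against a tag list, backtracking over ANY_TAG."""
    if not pattern:
        return not tags
    head, rest = pattern[0], pattern[1:]
    if head == ANY_TAG:
        # Let the wildcard consume 0, 1, 2, ... tags until the rest matches.
        return any(tags_match(rest, tags[i:]) for i in range(len(tags) + 1))
    return bool(tags) and tags[0] == head and tags_match(rest, tags[1:])


def accepts_take_out(units: List[Unit]) -> bool:
    """Hard-coded acceptor: state 0 -take-> 1 -middle word-> 1 -out-> 2 (final)."""
    state = 0
    for lemma, tags in units:
        if state == 0 and lemma == "take" and tags_match(["vblex", ANY_TAG], tags):
            state = 1
        elif state == 1 and lemma == "out" and tags_match(["adv"], tags):
            state = 2
        elif state == 1 and tags and tags[0] in MIDDLE_POS:
            state = 1  # loop on an accepted middle word
        else:
            return False  # no transition from this state: reject
    return state == 2


if __name__ == "__main__":
    line = ("^take<vblex><pres><tag1><tag2><tag3><tag4>$ "
            "^the<det><tag1><tag2><tag3><tag4>$ ^thing<n><sg><tag><Tag>$ ^out<adv>$")
    print(accepts_take_out(parse_stream(line)))
```

The sketch accepts the 6/28 test line. It rejects ^take<vblex><pres>$ ^out<pr>$, because out<pr> is neither the particle nor an allowed middle word. Backtracking is confined to tags_match, where the wildcard tries every possible number of tags. The word-to-word transitions themselves stay deterministic.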
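The week-6 transducer and the 6/22 note about emitting the verb and particle as one multiword suggest a rewrite step like the Python sketch below. It assumes the input span has already been recognised as 'take ... out'. It also assumes the lttoolbox convention of writing the invariable part after # following the tags (e.g. ^take<vblex><pres># out$). The function names are hypothetical, and blanks and superblanks (see the 8/2 note) are deliberately not handled.

```python
import re
from typing import List, Tuple

# Stream-format lexical unit, e.g. ^thing<n><sg>$
LU_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")

Unit = Tuple[str, List[str]]


def parse_stream(line: str) -> List[Unit]:
    """Split stream-format text into (lemma, [tags]) pairs."""
    return [(lemma, TAG_RE.findall(tags)) for lemma, tags in LU_RE.findall(line)]


def format_unit(lemma: str, tags: List[str], queue: str = "") -> str:
    """Render a lexical unit; an optional queue is written after '#'."""
    tag_str = "".join(f"<{t}>" for t in tags)
    suffix = f"# {queue}" if queue else ""
    return f"^{lemma}{tag_str}{suffix}$"


def reorder_take_out(units: List[Unit]) -> str:
    """Rewrite 'take <middle words> out' as one multiword plus the middle words.

    Hypothetical helper: assumes `units` is a span already recognised as
    'take ... out'; blanks and superblanks between words are ignored.
    """
    (verb, verb_tags), *middle, (particle, _particle_tags) = units
    head = format_unit(verb, verb_tags, queue=particle)
    return " ".join([head] + [format_unit(lemma, tags) for lemma, tags in middle])


if __name__ == "__main__":
    span = parse_stream("^take<vblex><pres>$ ^the<det><def>$ ^thing<n><sg>$ ^out<adv>$")
    print(reorder_take_out(span))
```

Here the four input units become ^take<vblex><pres># out$ ^the<det><def>$ ^thing<n><sg>$. Keeping the particle in the verb's queue would let a later dictionary lookup see 'take out' as a single multiword.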