Difference between revisions of "User:Irene/workplan"

Latest revision as of 18:45, 17 August 2017

Workplan

8/17:
Week Dates Goals Progress/Notes Evaluation
1 5/30 - 6/4 some data, find test corpus
2 6/5 - 6/11 script to bootstrap separable multiwords from dictionaries, set up testing framework, support/preparing data for English separable verbs
3 6/12 - 6/18 preparing data, prototype script set up, read specifications of Lttoolbox API
4 6/19 - 6/25

6/19: separate out the language-dependent functions in the C++ prototype, work on reordering module for Romance languages (Spanish, Portuguese)
6/20: work on reordering module for Germanic language (Swedish)
6/21: work on reordering module for Celtic language (Welsh)
6/22: work on reordering module for
6/23: work on reordering module for & prototype should be finished

6/19: Still trying to get the FST example to compile on my computer. Worked on the reordering module for English in C++.
6/20: Spent a lot of time trying to fix hashing errors in the C++ prototype, then gave up on C++ and switched everything to Python because working out the C++ API was taking too much time. Wrote documentation. Still trying to get unittest++ and the FST example to compile.
6/21: Separated the prototype (now scripted in Python) into organized files.
6/23: Video meeting.
6/24: Migrated to SourceForge. Added testing set.

First evaluation 6/26 - 6/30 testing framework set up + prototype system in Python
5 6/26 - 7/2 hard-coded finite-state acceptor for 'take out'

6/26: trying to make the program backtrack when it gets to <ANY_CHAR> or <ANY_TAG>
6/27: debugging & supporting any number of tags
6/28: successfully reads and prints ^take<vblex><pres><tag1><tag2><tag3><tag4>$ ^the<det><tag1><tag2><tag3><tag4>$ ^thing<n><sg><tag><Tag>$ ^out<adv>$
6/29: working on being selective about what middle words are accepted
7/1: Python prototype for the acceptor is pretty much working; it still needs to read from corpora that don't put every sentence on a new line, and to assign numbers to states in a more elegant fashion.
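
The week 5 notes describe a hard-coded acceptor for 'take out' that reads lexical units with any number of tags and is selective about the words in the middle. A minimal sketch of that idea in Python follows. It is not the prototype's actual code: the names `parse_units`, `accepts`, and `MIDDLE_TAGS`, and the choice of which tags may appear between verb and particle, are assumptions made for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^lemma<tag1><tag2>...$
LU_PATTERN = re.compile(r"\^([^<$]+)((?:<[^>]+>)*)\$")
TAG_PATTERN = re.compile(r"<([^>]+)>")

# Hypothetical whitelist: first tags allowed between the verb and the particle.
MIDDLE_TAGS = {"det", "adj", "n", "prn"}


def parse_units(line):
    """Return a list of (lemma, [tags]) pairs found in a stream-format line."""
    return [(m.group(1), TAG_PATTERN.findall(m.group(2)))
            for m in LU_PATTERN.finditer(line)]


def accepts(units, verb="take", particle="out"):
    """Hard-coded acceptor for: verb<vblex>... (middle word)* particle<adv>."""
    state = 0  # 0 = expect the verb, 1 = inside the gap, 2 = accepted
    for lemma, tags in units:
        first = tags[0] if tags else None
        if state == 0:
            if lemma == verb and first == "vblex":
                state = 1  # any further tags on the verb are ignored
            else:
                return False
        elif state == 1:
            if lemma == particle and first == "adv":
                state = 2
            elif first not in MIDDLE_TAGS:
                return False  # reject words that may not sit in the gap
        else:
            return False  # nothing may follow the particle
    return state == 2
```

Under these assumptions, the 6/28 sample line (`^take<vblex>...$ ^the<det>...$ ^thing<n>...$ ^out<adv>$`) is accepted, while a sequence with an adverb in the gap or without a closing particle is rejected. Only the first tag of each unit is inspected, which is one simple way to support any number of trailing tags.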

6 7/3 - 7/9 hard-coded finite-state transducer for 'take out'

7/3: tried to convert the Python script to C++ code. Trying to use lttoolbox's FST class.
7/4: still trying to convert to C++ and use lttoolbox
7/5: 7/6: 7/7: 7/8: 7/9:

7 7/10 - 7/16 xml format, working compiler and processor
8 7/17 - 7/23 improving dictionary, compiler, and processor
Second evaluation 7/24 - 7/28 finite-state implementation
9 7/24 - 7/30 integration with Apertium: fit module between pre-transfer and lt-proc-b
10 7/31 - 8/6 superblanks, integration, fstp

7/31: worked on superblanks, used fstp object
8/1: improvements
8/2: inserted superblanks between the # in e.g. 'take# out' and between words in counterexamples by amending the 'in' and 'out' strings; fixed an error where ^ was not printing at the end of reordering a separable multiword; updated the dictionary to improve the success rate
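
To make the reordering step concrete, here is a rough Python sketch that merges a separated verb and particle into a single `take# out` unit while keeping every blank (including superblanks in square brackets) that appeared between units. The function name `reorder`, the output shape `^take# out<tags>$`, and the policy of gluing the particle's preceding blank onto the following one are assumptions for illustration; the module's real handling of superblanks, described in the 8/2 entry, may differ.

```python
import re

# Splits a line into lexical units (^...$) and the blanks between them.
# Assumption: blanks never contain an unescaped ^...$ sequence.
TOKEN = re.compile(r"(\^[^$]*\$)")


def reorder(line, verb="take", particle="out"):
    """Move `particle` next to `verb`, producing ^verb# particle<tags>$."""
    parts = TOKEN.split(line)
    blanks = parts[0::2]  # text between units; always len(units) + 1 items
    units = parts[1::2]   # the ^...$ lexical units

    i = next((k for k, u in enumerate(units)
              if u.startswith(f"^{verb}<vblex>")), None)
    if i is None:
        return line  # no verb: leave the line untouched
    j = next((k for k in range(i + 1, len(units))
              if units[k].startswith(f"^{particle}<adv>")), None)
    if j is None:
        return line  # no particle after the verb: leave the line untouched

    # Keep the verb's tags and extend its lemma with "# particle".
    merged = f"^{verb}# {particle}" + units[i][len(f"^{verb}"):]
    new_units = units[:i] + [merged] + units[i + 1:j] + units[j + 1:]
    # Drop no blank: join the blank before the particle with the one after it.
    new_blanks = blanks[:j] + [blanks[j] + blanks[j + 1]] + blanks[j + 2:]

    out = [new_blanks[0]]
    for unit, blank in zip(new_units, new_blanks[1:]):
        out.append(unit)
        out.append(blank)
    return "".join(out)
```

For example, `reorder("^take<vblex><pres>$ ^it<prn>$[<b>] ^out<adv>$[</b>]")` yields `^take# out<vblex><pres>$ ^it<prn>$[<b>] [</b>]` with this policy: the particle is absorbed into the verb unit and both superblanks survive in their original order.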

11 8/7 - 8/13 (cont. support for individual language pairs)

testing and refining for beta testing languages: kaz/kir, deu, eng, fao-nor

12 8/14 - 8/20 (cont. support for individual language pairs)

Support for +thing. lsx-comp appends <j/> to the end of every entry, before <e/>; this causes issues with paradigms, so remove the feature.

13 8/21 - 8/27 (cont. support for individual language pairs)
Final evaluation 8/29 - 9/5 finite-state implementation in C++ with lttoolbox