User:Kanmuri/Notes/Java Runtime Port/Interchunk vs Transfer

I created this page to hold my notes as I go along trying to figure this out.

I wouldn't count on them being entirely correct; in fact, they may be downright wrong in places, but hopefully not. ;) This is mainly just a place to organize my thoughts.

apertium_transfer.cc vs apertium_interchunk.cc

We are only really concerned with one function in these two files at the moment, main().

There are the obvious differences in the option parsing code, as interchunk doesn't have as many options as transfer does.

Transfer calls its read() function differently depending on the command-line options and parameters. Interchunk instead just assigns values to variables differently depending on the options and parameters, and then uses those to call its read() function.

Transfer and interchunk take different parameters for their read functions. Transfer takes a transfer file, a data file, and an fst file. Interchunk takes just a transfer file and a data file.

Transfer and interchunk then call their transfer() and interchunk() functions respectively.

apertium_transfer.cc vs ApertiumTransfer.java

There is no significant difference between these two. The basic premise is the same: parse the command-line options and parameters, then call the read() and transfer() functions.

transfer.cc vs interchunk.cc

read()

void Transfer::read(string const &transferfile, string const &datafile, string const &fstfile)

void Interchunk::read(string const &transferfile, string const &datafile)

The read() functions first call readTransfer() and readInterchunk() respectively. Both then try to open the specified data file. After that, Transfer::read() also calls readBil() to read the fst file. Interchunk::read() does not make that call, and readBil() does not exist in interchunk at all.
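To keep the two call sequences straight, here is a minimal sketch of the order of steps described above. The struct names, the openData() helper, and the steps log are all my own placeholders, not names from the Apertium sources; only read(), readTransfer(), readInterchunk() and readBil() come from the real code, and their bodies here are stubs.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Transfer: each step only records that it ran.
struct TransferSketch {
  std::vector<std::string> steps;

  void readTransfer(const std::string &) { steps.push_back("readTransfer"); }
  void openData(const std::string &) { steps.push_back("openData"); }  // placeholder name
  void readBil(const std::string &) { steps.push_back("readBil"); }

  void read(const std::string &transferfile, const std::string &datafile,
            const std::string &fstfile) {
    readTransfer(transferfile);
    openData(datafile);
    readBil(fstfile);  // only Transfer has this third step
  }
};

// Hypothetical stand-in for Interchunk: same shape, no fst file.
struct InterchunkSketch {
  std::vector<std::string> steps;

  void readInterchunk(const std::string &) { steps.push_back("readInterchunk"); }
  void openData(const std::string &) { steps.push_back("openData"); }  // placeholder name

  void read(const std::string &transferfile, const std::string &datafile) {
    readInterchunk(transferfile);
    openData(datafile);
  }
};
```

Calling read() on each and comparing the steps logs shows the only structural difference: the extra readBil() step on the transfer side.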

transfer() vs interchunk()

Both functions first check whether null flush is set and, if so, call a null flush "wrapper" function (*_wrapper_null_flush()).

Initializes the MorphoStream. Then enters an infinite (while(true)) loop.

The code is the same up to the if(tmpword.size() != 0) line.

After that, the two diverge. The interchunk code just outputs a '^', the contents of tmpword, and a '$', whereas the transfer code does considerably more work at this point.

They meet back up at tmpword.clear(), and continue the same until the tt_eof case in a switch(current.getType()) statement. In the else clause of an if-else statement there, interchunk adds a line, tmpblank.clear().

transfer_wrapper_null_flush() vs interchunk_wrapper_null_flush()

The only difference between the two is that one calls transfer(), and the other calls interchunk().

When flushing on nulls, the wrapper calls the respective function, which returns when it hits a null. The wrapper then re-outputs the null, flushes the output, and calls the function again, repeating until end of file.

One other significant thing is that it unsets the null_flush flag, so that we don't get the transfer/interchunk and the null flush functions infinitely calling each other. Instead it sets the internal_null_flush flag to indicate that null flush is actually enabled, but that we're already inside the wrapper.

applyWord()

Interchunk has an extra case for skipping over "the unmodifiable part of the chunk", according to the comments.
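My reading is that the unmodifiable part is everything from the opening '{' onward, so only the chunk name and tags before it take part in matching. A tiny sketch under that assumption follows; matchablePart is a made-up helper for illustration, not a function from interchunk.cc.

```cpp
#include <cstddef>
#include <string>

// Returns the part of a chunk that comes before the first unescaped '{'.
// Assumption: everything from '{' onward is the "unmodifiable part".
std::string matchablePart(const std::string &chunk) {
  std::string result;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    if (chunk[i] == '\\' && i + 1 < chunk.size()) {
      result += chunk[i];    // keep the backslash
      result += chunk[++i];  // and the escaped character, even if it is '{'
      continue;
    }
    if (chunk[i] == '{') {
      break;                 // stop at the start of the chunk contents
    }
    result += chunk[i];
  }
  return result;
}
```

For a chunk like SN<det>{^the<det>$} this keeps only SN<det>; a word without braces comes back unchanged.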

applyRule()

Looks like Transfer has extra code for the bilingual dictionary. Also, Interchunk uses InterchunkWord and Transfer uses TransferWord.

readToken()

Interchunk has several extra if statements that appear to be for jumping over chunk contents, plus an "inword" flag.
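To convince myself why such a flag would be needed, here is a guess at the idea in sketch form. A chunk contains inner '^' and '$' characters inside its braces, so a reader that splits on those alone would cut the chunk apart. Token, splitChunks, and the std::string input are all hypothetical simplifications and not the real readToken() logic.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Token {
  bool isWord;       // true for a ^...$ unit, false for the blank between units
  std::string text;
};

// Splits a chunked stream into blanks and whole chunks.
// '^' opens a word only when not already inside one, '$' closes it,
// and everything between '{' and '}' is copied without interpretation.
std::vector<Token> splitChunks(const std::string &input) {
  std::vector<Token> tokens;
  std::string content;
  bool inword = false;
  for (std::size_t i = 0; i < input.size(); ++i) {
    char c = input[i];
    if (c == '\\' && i + 1 < input.size()) {
      content += c;
      content += input[++i];            // escaped character, taken literally
    } else if (c == '^' && !inword) {
      tokens.push_back({false, content});  // blank before the word (may be empty)
      content.clear();
      inword = true;
    } else if (c == '$' && inword) {
      tokens.push_back({true, content});
      content.clear();
      inword = false;
    } else if (c == '{' && inword) {
      content += c;
      // Jump over the chunk contents up to the closing '}'.
      while (++i < input.size() && input[i] != '}') {
        if (input[i] == '\\' && i + 1 < input.size()) {
          content += input[i++];
        }
        content += input[i];
      }
      if (i < input.size()) {
        content += '}';
      }
    } else {
      content += c;
    }
  }
  if (!content.empty()) {
    tokens.push_back({false, content});  // trailing blank
  }
  return tokens;
}
```

Fed two chunks separated by a space, this yields an empty leading blank, the first chunk whole (inner words included), the space, and the second chunk.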