Difference between revisions of "Talk:Reordering superblanks"

From Apertium
Line 1: Line 1:
==Ensuring transfer rules output all regular superblanks==
+
==Should we let rules deal with superblanks in t2x/t3x?==
  +
To deal with t2x/chunk-reordering issues, the proposal is that "Transfer modules should ignore <code><nowiki><b pos="N"/></nowiki></code> elements, outputting non-inline blanks before rules". But after chunking, shouldn't it be fine to let the rule writer output b elements manually?
Transfer rules sometimes forget to include all (regular) superblanks from the input (see earlier [https://sourceforge.net/p/apertium/mailman/apertium-stuff/thread/20cf28cd0904300204v45f35e51i118f4d146f83748@mail.gmail.com/ discussion from 2009]). This can of course mess up HTML, and it is frustrating that the developer has to ensure all rules have the right number of <code><nowiki><b pos="N"/></nowiki></code>, e.g. for a three-lu pattern we need to output both <code><nowiki><b pos="1"/></nowiki></code> and <code><nowiki><b pos="2"/></nowiki></code>.
 
   
  +
But then the rule writer has to do manual blank handling again, which we should avoid.
This could be done mechanically by transfer at runtime instead of by the rule writer. Any rule will match a certain number of lu's, with one (super)blank between each lu (currently available in the b elements), and the action part will output a certain number of lu's.
 
  +
  +
However, we could mechanically output regular superblanks in between chunks.
  +
 
Any t2x rule will match a certain number of chunks, with one (super)blank between each chunk (currently available in the b elements), and the action part will output a certain number of chunks.
 
* For a 1-pattern rule, there can be no superblanks between patterns, so there are no superblanks to output. This is the simple case.
 
 
* For a 2-pattern rule, there is exactly one superblank between patterns. Now we have to run the rule, and look at the output before printing it.
 
Line 11: Line 15:
 
** Read the second chunk, print that chunk, print the second superblank
 
 
** Etc. until all chunks are read, print remaining superblanks.
 
 
This can be made backwards compatible with existing rule files, by simply ignoring any existing &lt;b&gt; elements that have the pos attribute.
 
 
However, this solution does '''not''' help with the blanks-in-chunks problem. The [[Reordering superblanks#Possible solution]], however, would.
 

Revision as of 09:10, 26 May 2014

Should we let rules deal with superblanks in t2x/t3x?

To deal with t2x/chunk-reordering issues, the proposal is that "Transfer modules should ignore <b pos="N"/> elements, outputting non-inline blanks before rules". But after chunking, shouldn't it be fine to let the rule writer output b elements manually?

But then the rule writer has to do manual blank handling again, which we should avoid.

However, we could mechanically output regular superblanks in between chunks.

Any t2x rule will match a certain number of chunks, with one (super)blank between each chunk (currently available in the b elements), and the action part will output a certain number of chunks.

  • For a 1-pattern rule, there can be no superblanks between patterns, so there are no superblanks to output. This is the simple case.
  • For a 2-pattern rule, there is exactly one superblank between patterns. Now we have to run the rule, and look at the output before printing it.
    • If output contains zero or one chunks, put the superblank after the output.
    • If output contains two or more chunks, put the superblank after the first chunk.
  • Generalising this, look at the output, and interleave chunks and superblanks, that is:
    • Read the first chunk, print that chunk, print the first superblank
    • Read the second chunk, print that chunk, print the second superblank
    • And so on until all chunks are read; then print any remaining superblanks.
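
The interleaving described in the list above could be sketched roughly as follows. This is only an illustration in Python, not how the transfer modules are actually implemented: the function name `interleave`, the representation of chunks and superblanks as plain strings, and the sample values are all assumptions made for the example. It also assumes that when a rule outputs more chunks than there are superblanks, the extra chunks are simply printed with no superblank between them, which the list above does not spell out.

```python
def interleave(chunks, superblanks):
    """Interleave a rule's output chunks with the input superblanks.

    chunks:      output chunks produced by the rule's action part (strings)
    superblanks: superblanks read between the matched input chunks (strings)
    Both representations are hypothetical, chosen for illustration only.
    """
    out = []
    for i, chunk in enumerate(chunks):
        out.append(chunk)
        # Print the i-th superblank right after the i-th chunk, if one is left.
        if i < len(superblanks):
            out.append(superblanks[i])
    # Fewer chunks than superblanks (including zero chunks):
    # print the remaining superblanks after the output.
    out.extend(superblanks[len(chunks):])
    return "".join(out)


# A 2-pattern rule has exactly one superblank between its patterns.
blank = ["[<i>]"]
print(interleave([], blank))              # zero chunks: superblank after the (empty) output
print(interleave(["^a$"], blank))         # one chunk: superblank after the output
print(interleave(["^a$", "^b$"], blank))  # two chunks: superblank after the first chunk
```

Because the placement depends only on how many chunks the rule outputs, no superblank is lost or duplicated whatever the rule does with the chunks themselves.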