As you can see, the merged result is quite optimized, and we would expect this to be much faster than using temporary framebuffers.
There are still some cases where we will need temporary framebuffers. For example, for large effect chains using complicated shaders, most shader compilers will choke on the merged result. For the games I've looked at, this has only been a problem with the following effect chain:
noisemask, lottes, blurhorizontal, separatechannel, separatechannel, hsladjust
The culprit here is Lottes' CRT shader, which is very large on its own.
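One way the fallback could be organized, as a rough sketch only: merge consecutive effects greedily, and start a new pass (with a temporary framebuffer in between) whenever the merged shader fails to compile. The names `planPasses` and `canCompileMerged` are hypothetical and exist only for this illustration; they are not part of the C2/C3 runtime. The stub compiler below simply pretends that `lottes` cannot be merged with anything, which is an assumption made for the example and not a measured result.

```typescript
// Hypothetical callback: returns true if the given effects, merged into a
// single shader, compile successfully on the current driver.
type CompileCheck = (effects: readonly string[]) => boolean;

// Greedily group consecutive effects into as few passes as possible.
// Each boundary between two groups would need one temporary framebuffer.
function planPasses(
  chain: readonly string[],
  canCompileMerged: CompileCheck
): string[][] {
  const passes: string[][] = [];
  let current: string[] = [];

  for (const effect of chain) {
    const candidate = [...current, effect];
    // A single effect is always accepted on its own (it worked before merging).
    if (current.length === 0 || canCompileMerged(candidate)) {
      current = candidate;
    } else {
      passes.push(current);
      current = [effect];
    }
  }
  if (current.length > 0) {
    passes.push(current);
  }
  return passes;
}

// Illustrative stub (an assumption): any merged shader containing "lottes"
// together with another effect is treated as too large to compile.
const stubCompiler: CompileCheck = (effects) =>
  !(effects.length > 1 && effects.indexOf("lottes") !== -1);

const chain = [
  "noisemask",
  "lottes",
  "blurhorizontal",
  "separatechannel",
  "separatechannel",
  "hsladjust",
];

const passes = planPasses(chain, stubCompiler);
const temporaryFramebuffers = Math.max(0, passes.length - 1);
```

With this stub, the six-effect chain collapses to three passes, so only two temporary framebuffers are needed instead of five. A real implementation would replace the stub with an actual compile attempt and would probably cache the result per effect chain.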
To be fair to the standard C2/C3 runtimes, this optimization would be nice to have in them, but I can see why it might not be feasible to implement. Some users may depend on very specific renderer behavior, so this could be a backwards-incompatible change. Also, dealing with large effect chains (like the one above) may not be straightforward.
InvaderXYZ
Please check your inbox, you should have received a reply!
Cryptwalker
Thanks a lot! I will take a look when time allows :)
NetOne
Nice! I feel Iconoclasts definitely deserves all the praise it has received.