So that particular case touches on a pretty obscure part of the engine. Another piece of WebGL performance advice is not to submit huge buffers in one go, but to submit them in chunks. This also helps keep memory usage down and reduces the latency before work is issued to the GPU. So the engine issues chunks of several thousand quads at a time. The quadissue case reaches extreme levels of sprite batching, so you are seeing lots of chunks.
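To illustrate the idea of chunked submission, here is a minimal sketch of splitting a large batch of quads into fixed-size chunks. This is not the engine's actual code: the `Chunk` type, the `splitIntoChunks` function and the chunk size passed in are all hypothetical names and values chosen for illustration, and the real WebGL upload and draw call for each chunk is left out.

```typescript
// Hypothetical illustration only; not the engine's real batching code.
// Describes one chunk of quads to be submitted to the GPU in a single call.
interface Chunk {
  firstQuad: number; // index of the first quad in this chunk
  quadCount: number; // number of quads in this chunk
}

// Split a batch of `totalQuads` quads into chunks of at most
// `maxQuadsPerChunk` quads each. The last chunk may be smaller.
function splitIntoChunks(totalQuads: number, maxQuadsPerChunk: number): Chunk[] {
  if (!Number.isInteger(totalQuads) || totalQuads < 0) {
    throw new RangeError("totalQuads must be a non-negative integer");
  }
  if (!Number.isInteger(maxQuadsPerChunk) || maxQuadsPerChunk <= 0) {
    throw new RangeError("maxQuadsPerChunk must be a positive integer");
  }

  const chunks: Chunk[] = [];
  for (let firstQuad = 0; firstQuad < totalQuads; firstQuad += maxQuadsPerChunk) {
    chunks.push({
      firstQuad,
      quadCount: Math.min(maxQuadsPerChunk, totalQuads - firstQuad),
    });
  }
  return chunks;
}
```

For example, `splitIntoChunks(6000, 2500)` gives three chunks of 2500, 2500 and 1000 quads. A renderer would then upload and draw each chunk in turn, so only one chunk's worth of vertex data needs to be buffered at a time.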
There is nothing to gain by improving this. It looks like it's submitting about 2500 sprites at a time, which means the draw call overhead is about 0.04% (1/2500) of the naive case of one call per sprite. If we increased this to, say, 5000, the overhead would only drop to 0.02%, a difference so tiny it is totally irrelevant, while increasing memory usage and latency. So like most engineering tasks there's a tradeoff here, and we've aimed at a good sweet spot.
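The arithmetic behind those percentages can be sketched as follows. The function names are made up for illustration, and the model is deliberately simple: it assumes overhead is proportional to the number of draw calls and ignores everything else.

```typescript
// Hypothetical helpers for the back-of-envelope numbers above.

// Draw call overhead as a percentage of the naive case of one call per
// sprite, assuming overhead scales with the number of draw calls.
function drawCallOverheadPercent(spritesPerCall: number): number {
  if (!(spritesPerCall > 0)) {
    throw new RangeError("spritesPerCall must be positive");
  }
  return 100 / spritesPerCall;
}

// Number of draw calls needed to submit `totalSprites` sprites when each
// call carries at most `spritesPerCall` sprites.
function drawCallsNeeded(totalSprites: number, spritesPerCall: number): number {
  if (!(spritesPerCall > 0) || totalSprites < 0) {
    throw new RangeError("invalid sprite counts");
  }
  return Math.ceil(totalSprites / spritesPerCall);
}
```

With these, `drawCallOverheadPercent(2500)` is 0.04 and `drawCallOverheadPercent(5000)` is 0.02. Put another way, for 100,000 sprites `drawCallsNeeded` gives 40 calls at 2500 per chunk and 20 calls at 5000, against 100,000 calls in the naive case, so doubling the chunk size saves only 20 calls.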
So you are in fact looking at the batching engine working in ideal circumstances, and accusing it of bad performance. You should not jump to conclusions about parts of the engine you don't understand.
GPU fillrate is the bottleneck that most people run into, so that is probably the limit in your game too.