Understanding Draw Calls.

  • In my own project, whenever I check the debugger I notice draw calls using the most CPU, more than all my game logic combined, even though I'm not using any blend modes, WebGL effects, particles, etc. I want to understand why it is so high and what I can do to reduce it.

    The only way I've found to reduce it is to reduce the number of sprites on screen, sometimes even merging graphics into one big sprite to reduce draw calls (though that uses more memory). And as draw calls go up, I can see the frame rate dropping.

    My question goes out to Ashley and the devs: is there anything more they can do on their end to optimize this further, and can they enlighten us a bit more about how it works and why it's using that much CPU? This seems to be a huge CPU hog even for simple games, especially for me as I'm designing for mobile.

  • If you can provide a minimal .capx that shows high draw call usage, I'd be happy to investigate optimising the engine. Without that the most I can do is speculate.

  • Ashley, no problem, I can send over the actual project. Where do I send it, as I don't want to share it publicly?

  • As always a minimal project if you can provide one is far more useful, otherwise you can email it to

  • Ashley, I sent you a project file by email.

    Draw calls is another matter really, and happens on the CPU side. It's probably best to split that topic off to a new thread. We have OpenGL ES 3 equivalent capabilities with WebGL 2 though, so if at any point draw calls prove to be a bottleneck, it's something we can potentially optimise in exactly the same way a native app would adjust their draw calls to be more efficient. Most 3D APIs, WebGL included, are specifically designed to allow as much drawing as possible with the fewest draw calls, to as far as possible eliminate the CPU overhead.

    I'm not a programmer, but I just feel that drawing, and draw calls in particular, are not very optimized currently. It's like it's drawing every single sprite individually, multiple times per frame, instead of drawing lots of things at once from a buffer.

    It feels like there's a lot of overhead currently, and a lot of room for improvement, especially when it comes to draw calls and rendering.

    "next I tested drawing with ANGLE_instanced_arrays, object positions are computed on CPU, written to a (double-buffered) dynamic vertex buffer, and then rendered with a single draw call, in Chrome on Windows with NVIDIA I can get 450k instances before the performance drops below 60fps (so 450k particle position updates per frame in JS, and no sweat!), performance in a native app isn't better here, my suspicion is that the vertex buffer update is the limiter here (500k instances means 8MByte of dynamic vertex data shuffled to the GPU each frame), on my OSX MBP I can go up to about 180k instances (again very likely vertex throughput limited). However in this case, the way the dynamic vertex buffer works is also important, it looks like vertex buffer orphaning is useless in WebGL (see discussion here: https://groups.google.com/forum/#!topic ... MNXSNRAg8M), so I switched to double-buffering"
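
As a rough check of the figures in that quote: 8 MByte for 500k instances works out to 16 bytes per instance. The sketch below assumes a hypothetical layout of four 32-bit floats per instance (the quoted post does not state the real layout) and shows only the CPU side of double buffering. The GPU upload and the `drawArraysInstancedANGLE` call are omitted, and the names `bytesPerFrame` and `createDoubleBuffer` are made up for illustration.

```javascript
// Hypothetical per-instance layout: 4 floats (e.g. x, y, scale, rotation).
// This is inferred from "500k instances means 8MByte", not stated in the quote.
const FLOATS_PER_INSTANCE = 4;
const BYTES_PER_INSTANCE = FLOATS_PER_INSTANCE * Float32Array.BYTES_PER_ELEMENT;

// Bytes of dynamic vertex data that must reach the GPU each frame.
function bytesPerFrame(instanceCount) {
  return instanceCount * BYTES_PER_INSTANCE;
}

// CPU-side double buffering: two arrays that alternate every frame, so the
// array being written is never the one handed over on the previous frame.
function createDoubleBuffer(maxInstances) {
  const buffers = [
    new Float32Array(maxInstances * FLOATS_PER_INSTANCE),
    new Float32Array(maxInstances * FLOATS_PER_INSTANCE),
  ];
  let frame = 0;
  return {
    // Returns the array to fill this frame, then flips to the other one.
    next() {
      const current = buffers[frame % 2];
      frame += 1;
      return current;
    },
  };
}

const doubleBuffer = createDoubleBuffer(1000);
const frameA = doubleBuffer.next();
const frameB = doubleBuffer.next();
const frameC = doubleBuffer.next(); // same array as frameA again

console.log(bytesPerFrame(500000)); // 8000000
```

Under that assumed layout, the quoted 450k instances would mean about 7.2 MByte of vertex data per frame.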

    Reading that quote, it seems some people are getting way more performance out of WebGL than we currently can in C2/C3, which I believe is due to overhead, maybe both from draw calls and from the way things are rendered. Is there any possibility there's something to this?

    I'm not an engine programmer, I'm a designer, but it just seems C2/C3 could perform a lot better than it currently does by minimizing overhead.

  • I also noticed that I was able to get a good performance boost by merging most of my assets into as few sprites as possible, adding the assets as different frames and animations so that they end up in the same spritesheet, since everything is rendered "per texture". If I wasn't doing that I wouldn't be getting as good performance as I currently am.

    So, my conclusion: use as few sprites as possible. Adding all assets to the same sprite will increase performance, since they will then be on the same "TEXTURE" (spritesheet), which results in fewer draw calls, less overhead, and less drawing per frame.

    I was looking through the whole WebGL section of c2runtime.js.

    Are we allowed to modify c2runtime.js? I would like to run some tests to see if I could make some improvements there.

  • Testing bunnymark vs. my Construct project's rendering: there are far fewer calls in bunnymark, and far more bang for the buck. Looking at the WebGL inspector, it renders things differently than C2 does. It seems to be using buffers.

    It seems the way C2 renders stuff has a LOT more overhead...

    http://www.goodboydigital.com/pixijs/bunnymark/

    That's the link to bunnymark if anyone wants to try it on their phone to test performance.

    I can have 1500 bunnies jumping around on a midrange (Nokia Lumia 830) before framerate goes below full 60fps.

    My Construct project is struggling on the same phone with 50 static objects on screen. No animations, nothing moving.

    Here's a screenshot from my game, in an area with very few objects. CPU is pretty high, mostly due to draw calls, and the framerate is getting low: about 50ish, with just a few static objects on the map.

    Here's a screenshot of bunnymark with lots of objects jumping around, at a similar framerate, 60fps.

    I'm pretty confident that the near-native performance Ashley claims is possible with WebGL, but not with the current implementation, as it's REALLY inefficient.

    Please take a look at this. It's not only me experiencing bad performance, and I think Construct can do better. It's just sloppy implementation and bad optimization.

    And I think this should be a first priority, as people are choosing other engines due to performance issues.

  • This thread should have enough solid proof now that the way C2 does the rendering is not very efficient at all, considering it's WebGL, and what it should be capable of.

    "If you can provide a minimal .capx that shows high draw call usage, I'd be happy to investigate optimising the engine. Without that the most I can do is speculate."

    So get on with it

    I'd be happy to play with the new superfast C2, C3, once the optimizations are in

  • You're worrying over nothing. There is nothing here to suggest any performance problems.

    The screenshot you posted shows it swapping texture between draw calls, which is normal. After you export - or in C3's preview mode, which uses in-editor spritesheeting - a great deal of those texture calls will disappear, as it combines most of the images on to just a few spritesheet textures. So that particular case is unique to preview mode in C2, and does not happen after export or in C3 at all.

    Based on the fact very few people have ever noticed a performance difference between preview and export in C2, I'd say the overhead doesn't matter anyway.
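
The effect of texture swaps on batching can be sketched with a toy model. This is not Construct's actual batching code: it assumes sprites are drawn in order and that a texture change is the only thing that splits a batch, and the function and file names are hypothetical.

```javascript
// Counts draw calls for sprites drawn in order, under the simplifying
// assumption that a batch is split only when the texture changes.
function countDrawCalls(sprites) {
  let calls = 0;
  let currentTexture = null;
  for (const sprite of sprites) {
    if (sprite.texture !== currentTexture) {
      calls += 1; // a new batch starts here
      currentTexture = sprite.texture;
    }
  }
  return calls;
}

// Hypothetical scene: six sprites alternating between two separate images.
const separateImages = [];
for (let i = 0; i < 6; i++) {
  separateImages.push({ texture: i % 2 === 0 ? "tree.png" : "rock.png" });
}

// The same six sprites after both images are packed onto one spritesheet.
const oneSpritesheet = separateImages.map(() => ({ texture: "sheet0.png" }));

const separateCalls = countDrawCalls(separateImages); // 6
const mergedCalls = countDrawCalls(oneSpritesheet);   // 1
console.log(separateCalls, mergedCalls);
```

In this model the worst case is an alternating draw order, where every sprite forces a swap. Once everything shares one texture, the whole run collapses into a single batch.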

  • I've noticed. That's why I came to the same conclusion and put many objects into one spritesheet so they exist on only one texture. This is why both Unity and Gamemaker have ways to intelligently merge sprites onto one texture. This is why I've requested an optimization feature for C3 (that got denied).

    Maybe only the more advanced users come to this conclusion and that's why you don't hear it often.

  • "You're worrying over nothing. There is nothing here to suggest any performance problems."

    Are you kidding me? Here's a new screenshot. The only thing I did was increase the number of sprites in the layout to about 1000. Take notice: IN THE LAYOUT, not on screen. None of them are moving, they're just static sprites, and the framerate dropped to 30fps.

    Draw calls also increase along with the number of sprites, because you're not using buffers!

    Of course, 100 draw calls is not very much for a small game on a powerful device, but people making large games and games for mobile ARE noticing the bad performance, because you're not even implementing best practices, the general things you should do.

    https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/WebGL_best_practices

    "Fewer, larger draw operations will improve performance. If you have 1000 sprites to paint, try to do it as a single drawArrays() or drawElements() call. You can draw degenerate (flat) triangles if you need to draw discontinuous objects as a single drawArrays() call."
    That's exactly what bunnymark is doing when I check the WebGL inspector.
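
The degenerate-triangle trick from that MDN advice can be sketched as follows. This is only an illustration of the general technique for `gl.TRIANGLE_STRIP`, not a description of how bunnymark or Construct build their geometry, and `buildStrip` with its `{x, y, w, h}` quad shape is a hypothetical helper.

```javascript
// Builds one TRIANGLE_STRIP vertex list (flat [x0, y0, x1, y1, ...]) for
// several separate quads. Between two quads, the last vertex of the previous
// quad and the first vertex of the next quad are repeated. The repeats create
// zero-area (degenerate) triangles that draw nothing, so disconnected quads
// can still be submitted with a single drawArrays() call.
function buildStrip(quads) {
  const verts = [];
  quads.forEach((q, i) => {
    const corners = [
      [q.x, q.y],             // top-left
      [q.x, q.y + q.h],       // bottom-left
      [q.x + q.w, q.y],       // top-right
      [q.x + q.w, q.y + q.h], // bottom-right
    ];
    if (i > 0) {
      // Stitch: repeat the previous last vertex and this quad's first vertex.
      const prevX = verts[verts.length - 2];
      const prevY = verts[verts.length - 1];
      verts.push(prevX, prevY, corners[0][0], corners[0][1]);
    }
    for (const [x, y] of corners) {
      verts.push(x, y);
    }
  });
  return new Float32Array(verts);
}

// Three disconnected 10x10 quads: 4 vertices each plus 2 stitching vertices
// per join, so 3 * 4 + 2 * 2 = 16 vertices (32 floats).
const strip = buildStrip([
  { x: 0, y: 0, w: 10, h: 10 },
  { x: 50, y: 0, w: 10, h: 10 },
  { x: 100, y: 40, w: 10, h: 10 },
]);
const vertexCount = strip.length / 2;
console.log(vertexCount); // 16
```

With a WebGL context, the resulting array would be uploaded once and drawn with `gl.drawArrays(gl.TRIANGLE_STRIP, 0, vertexCount)`.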
    
    Using a WebGL inspector I can clearly see you're not doing that! As I said, you may not notice it for small games on powerful devices, but you will notice it for LARGE games and mobile games.
    
    Please please please... Just try to look into at least using best practices, and use a single drawArrays() call. It's a known fact that WebGL overhead is an issue, and you're doing nothing to minimize it.
    
    Or is my only option to modify c2runtime.js myself to prove you wrong?
    
    I can easily say that just by that little tweak we would get a LOT better performance.
    
    If I'm wrong I'd be happy to send you a fine bottle of whiskey.
    If you're wrong, the only thing you have to lose is a little time, and you'd gain happier customers from a small tweak to how things are drawn.
  • Your screenshot shows FPS < 60 and CPU well under 100%, which is typically indicative of the GPU hardware being the bottleneck. So there's no evidence draw calls are the limitation there.

    "Fewer, larger draw operations will improve performance. If you have 1000 sprites to paint, try to do it as a single drawArrays() or drawElements() call. You can draw degenerate (flat) triangles if you need to draw discontinuous objects as a single drawArrays() call.
    That's exactly what bunnymark is doing when I check the WebGL inspector.

    "Using a WebGL inspector I can clearly see you're not doing that! As I said, you may not notice it for small games on powerful devices, but you will notice it for LARGE games and mobile games."

    The engine does already do that, with a sophisticated batching engine. But changing texture is one of the operations that has to split the batch. In C3, or after export, textures are merged in to spritesheets and the batching works better since there are fewer texture swaps.

    So we're already doing everything you've asked for.
  • To prove the point, try running the WebGL inspector on https://www.scirra.com/demos/c2/quadissueperf/. It can draw over 10,000 sprites in ~50 draw calls, most of which is just overhead to render the layout and text. In fact there's just one draw call that renders most of the sprites:

    37 drawElements(TRIANGLES, 11994, UNSIGNED_SHORT, 0)

    This is the batching engine working exactly as intended.

    This is a complex technical part of the engine, please don't jump to conclusions or make assumptions about what the engine is or isn't doing.
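
For readers unfamiliar with the inspector output above, the index count in a `drawElements(TRIANGLES, ...)` call can be turned into a sprite count. The sketch below assumes each sprite is one quad drawn as two indexed triangles (six indices, four vertices), which is a common layout but is not confirmed in this thread. The helper names are hypothetical.

```javascript
// Assumption: one sprite = one quad = two triangles = six indices.
const INDICES_PER_QUAD = 6;
const VERTICES_PER_QUAD = 4;

// Number of quads drawn by a drawElements(TRIANGLES, indexCount, ...) call.
function quadsInDrawCall(indexCount) {
  return indexCount / INDICES_PER_QUAD;
}

// Builds a 16-bit index buffer (matching UNSIGNED_SHORT) where quad q uses
// vertices 4q..4q+3, split into triangles (0, 1, 2) and (0, 2, 3).
function buildQuadIndices(quadCount) {
  const indices = new Uint16Array(quadCount * INDICES_PER_QUAD);
  for (let q = 0; q < quadCount; q++) {
    const v = q * VERTICES_PER_QUAD; // first vertex of this quad
    const i = q * INDICES_PER_QUAD;  // first index slot of this quad
    indices[i] = v;
    indices[i + 1] = v + 1;
    indices[i + 2] = v + 2;
    indices[i + 3] = v;
    indices[i + 4] = v + 2;
    indices[i + 5] = v + 3;
  }
  return indices;
}

const quadCount = quadsInDrawCall(11994);
const quadIndices = buildQuadIndices(quadCount);
console.log(quadCount, quadIndices.length); // 1999 11994
```

Under that assumption, the quoted call submits 1999 quads in one go. The highest vertex index it touches is 7995, well within the 16-bit range that `UNSIGNED_SHORT` allows.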

  • Your screenshot shows FPS < 60 and CPU well under 100%, which is typically indicative of the GPU hardware being the bottleneck. So there's no evidence draw calls are the limitation there.

    "Fewer, larger draw operations will improve performance. If you have 1000 sprites to paint, try to do it as a single drawArrays() or drawElements() call. You can draw degenerate (flat) triangles if you need to draw discontinuous objects as a single drawArrays() call.
    That's exactly what bunnymark is doing when I check the WebGL inspector.

    "Using a WebGL inspector I can clearly see you're not doing that! As I said, you may not notice it for small games on powerful devices, but you will notice it for LARGE games and mobile games."
    

    The engine does already do that, with a sophisticated batching engine. But changing texture is one of the operations that has to split the batch. In C3, or after export, textures are merged in to spritesheets and the batching works better since there are fewer texture swaps.

    So we're already doing everything you've asked for.

    No it doesn't! Use a WebGL inspector and check for yourself! The aim should be 1 draw per frame, that's it! By splitting the batch you're creating hundreds of draws, where you could be doing a single one with all the sprites in one go!

    Stepping through the C2 draws, I can see what you're explaining: some things are batched together, but it's drawing layer upon layer, 100 times per frame, where you SHOULD be drawing 1 time per frame as the bunnymark example is doing. All the sprites in one go!! The implementation is sloppy. It's doing it completely wrong, with loads of unnecessary overhead.

    There IS an overhead issue, and it scales directly with the number of sprites (draws), as you're rendering layer upon layer of "drawElements", where all of it could be drawn in one go.

    I'm getting lots of draws per frame, layer upon layer upon layer, and I can step through them one by one to see how it's layered.

    Bunnymark is using 1 draw per frame, as you SHOULD be aiming for, no matter how many bunnies on screen, it's always 1 draw per frame.

    I don't even know why I have to point out the obvious?

    Do I have your permission to modify c2runtime.js and do it the right way?

  • "No it doesn't! Use a WebGL inspector and check for yourself! The aim should be 1 draw per frame, that's it! By splitting the batch you're creating hundreds of draws, where you could be doing a single one with all the sprites in one go!"

    It already does. The entry I showed you draws all the sprites in one go. The other calls are to draw things like the text and the spinner in the corner.

    Sorry, but I don't think you actually understand how WebGL rendering works.
