How the Construct 2 WebGL renderer works

Official Construct Team Post
Ashley
  • 26 Mar, 2014

A post came up on the forum asking some technical questions about how Construct 2's WebGL renderer works. I wrote a brief overview in a reply, but thought I could break it out into a blog post and go into more detail.

Construct 2's WebGL renderer is a batched back-to-front renderer. Batching means it efficiently builds a queue of rendering commands, eliminating redundant calls (like setting the same texture twice in a row) and merging similar calls (like turning a "draw 5 sprites" command into a "draw 6 sprites" command if the same sprite is drawn again). Our renderer is very good at this, building on our experience of writing a similar batched renderer in our old native DirectX 9 engine for Construct Classic. Back-to-front rendering means it starts with the bottom object on the bottom layer and renders everything on-screen in ascending Z order. The last thing drawn therefore has the highest Z order, and displays on top of everything else.

I'm not sure why, but people occasionally wonder whether off-screen objects are rendered, and whether it would help to make them invisible. Off-screen objects are not rendered, and making them invisible does nothing except needlessly complicate your events. The renderer works its way up the Z order and simply skips anything that is not on-screen, so those objects have zero rendering impact: the GPU does not even find out about them until they come on-screen. GPUs also only process the on-screen part of an object that is partially off-screen. So literally nothing outside the window area is ever rendered by the GPU.
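To make the skipping step concrete, here is a minimal Python sketch of a back-to-front walk that only submits on-screen objects. It is an illustration, not Construct 2's actual code: the `Obj` class is hypothetical, and it assumes each object has a layer, a Z index and an axis-aligned bounding box (rotated objects would need their rotated bounds). Partially visible objects are still submitted, since clipping their off-screen part is left to the GPU.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical object record: layer, Z index and axis-aligned bounding box."""
    name: str
    layer: int  # lower layers are drawn first
    z: int      # Z index within the layer
    x: float    # left edge
    y: float    # top edge
    w: float
    h: float


def overlaps_viewport(o: Obj, vw: float, vh: float) -> bool:
    """True if the object's bounding box intersects the viewport (0, 0)-(vw, vh)."""
    return o.x < vw and o.x + o.w > 0 and o.y < vh and o.y + o.h > 0


def draw_order(objects: list[Obj], vw: float, vh: float) -> list[str]:
    """Walk objects bottom-to-top, skipping anything entirely off-screen."""
    ordered = sorted(objects, key=lambda o: (o.layer, o.z))
    return [o.name for o in ordered if overlaps_viewport(o, vw, vh)]


scene = [
    Obj("background", 0, 0, 0, 0, 640, 480),
    Obj("enemy_offscreen", 1, 0, 900, 100, 32, 32),
    Obj("player", 1, 1, 100, 100, 32, 32),
    Obj("half_visible", 1, 2, 620, 50, 40, 40),
]
print(draw_order(scene, 640, 480))  # ['background', 'player', 'half_visible']
```

The off-screen enemy never reaches the draw list at all, which is why hiding it would change nothing.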

Image memory

In WebGL mode, images are loaded into memory layout-by-layout. This means at the start of a layout, images are loaded into memory for all the objects placed in that layout. This includes all frames of all animations for all sprites appearing in the layout. Any images that were previously loaded but aren't needed on the current layout are also released. This helps keep memory usage down, since only the current layout needs to have its images in memory (as opposed to loading the entire game into memory on startup). An image that has been loaded into memory is called a "texture" (since that's what the graphics APIs we use call it).

As a result, we can ignore texture loading and unloading when talking about the batching system in the renderer. It can be assumed that any textures that will be needed are already in memory.
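The layout-by-layout loading described above boils down to two set differences. This is a minimal sketch under stated assumptions: the helper names and frame names are hypothetical, and images used by both the old and the new layout are assumed to simply stay loaded, since only images "not needed on the current layout" are released.

```python
def images_for_layout(placed: dict[str, list[str]]) -> set[str]:
    """All animation frames of every object type placed on a layout."""
    return {frame for frames in placed.values() for frame in frames}


def switch_layout(loaded: set[str], needed: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_load, to_release) when moving to a layout that needs `needed`."""
    to_load = needed - loaded     # images that must become textures now
    to_release = loaded - needed  # textures the new layout does not use
    return to_load, to_release


level1 = images_for_layout({"Player": ["player_idle", "player_run"],
                            "Bat": ["bat_1", "bat_2"]})
level2 = images_for_layout({"Player": ["player_idle", "player_run"],
                            "Ghost": ["ghost_1"]})
to_load, to_release = switch_layout(level1, level2)
print(sorted(to_load), sorted(to_release))  # ['ghost_1'] ['bat_1', 'bat_2']
```

Once the switch is done, every texture the layout can display is resident, which is what lets the batching discussion ignore loading entirely.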

Batch commands

The renderer starts at the bottom of the Z order and works upwards. Due to the way graphics cards work, the primitive rendering operations include things like:

  • Set texture
  • Set opacity
  • Set blend mode
  • Set shader effect
  • Draw N quads (a "quad" is just a rendered rectangle, like a single Sprite object)

There are a few other commands for other rendering features, but the five above are the main commands for rendering the actual game content.

The most interesting operation is "Draw N quads". This single command can draw hundreds or even thousands of objects in one go. It is the key to high-performance rendering, allowing both the CPU and GPU to work as efficiently as possible. However, note that "Draw N quads" must use the same texture, opacity, blend mode and shader effect for all N objects being rendered. If some of the objects have a different texture, opacity, blend mode or shader effect, the drawing must be broken up into multiple commands. The most common example is switching texture.
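The rule that one "Draw N quads" command needs identical state for all N objects amounts to counting runs of consecutive objects with equal state. In this simplified model, `QuadState` is a hypothetical stand-in for whatever per-object state the real renderer tracks; any change in texture, opacity, blend mode or shader effect starts a new run.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class QuadState(NamedTuple):
    texture: str
    opacity: int
    blend: str
    shader: Optional[str] = None  # None means no shader effect


def draw_commands_needed(states: list[QuadState]) -> int:
    """Each run of consecutive objects sharing identical state can be one
    'Draw N quads' command; groupby yields exactly one group per run."""
    return sum(1 for _state, _run in groupby(states))


a = QuadState("SpriteA", 100, "normal")
b = QuadState("SpriteB", 100, "normal")
print(draw_commands_needed([a] * 1000))                          # 1
print(draw_commands_needed([a, a._replace(opacity=50), a]))      # 3
```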

Batch optimisation

Each sprite object has a texture, an opacity setting, and a blend mode. (Remember that blend modes are different from WebGL shader effects: they are the simple rendering modes like "Normal", "Additive", "Destination out" and so on.) For now we'll ignore shader effects. To render correctly, each sprite must set the correct texture, opacity and blend mode, then render its quad. Suppose we have 3 "SpriteA" objects in consecutive Z order, so they are rendered one after the other, and they all use the same texture, opacity and blend mode. If we render this naively, the queue of rendering commands looks like this:

  1. Set texture to "SpriteA"
  2. Set opacity to 100
  3. Set blend mode to "normal"
  4. Draw 1 quad
  5. Set texture to "SpriteA"
  6. Set opacity to 100
  7. Set blend mode to "normal"
  8. Draw 1 quad
  9. Set texture to "SpriteA"
  10. Set opacity to 100
  11. Set blend mode to "normal"
  12. Draw 1 quad

There is a lot of redundancy there: the same state is set over and over again, and each command draws only one sprite. "Batching" means rendering calls are not executed immediately but are kept in a queue. As each rendering call is made, the batch is inspected, and if the call is redundant it is simply eliminated. Additionally, if another quad is rendered with the same settings, the batch can increment "Draw 1 quad" to "Draw 2 quads" instead of adding a separate "Draw" command. With these optimisations, the queue of rendering commands ends up looking like this instead:

  1. Set texture to "SpriteA"
  2. Set opacity to 100
  3. Set blend mode to "normal"
  4. Draw 3 quads

This queue is built up and then sent off to WebGL to be rendered at the end of the frame. There is far less work to send to the GPU, making it faster to render.
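The two optimisations above (dropping redundant state changes and extending the previous draw) can be sketched as a tiny batching queue. This is an illustrative Python model, not the engine's implementation: the `Batch` class and its command tuples are assumptions, a fresh `Batch` is assumed per frame, and its `commands` list stands in for what would be handed to WebGL at the end of the frame.

```python
class Batch:
    """Minimal batching queue: redundant state changes are dropped and
    consecutive quads with unchanged state extend the previous draw."""

    def __init__(self) -> None:
        self.commands: list[tuple] = []
        self.state = {"texture": None, "opacity": None, "blend mode": None}

    def quad(self, texture: str, opacity: int, blend: str) -> None:
        wanted = {"texture": texture, "opacity": opacity, "blend mode": blend}
        for name, value in wanted.items():
            if self.state[name] != value:  # only queue real changes
                self.state[name] = value
                self.commands.append((f"Set {name}", value))
        if self.commands and self.commands[-1][0] == "Draw":
            # Nothing changed since the last draw: draw one more quad instead.
            self.commands[-1] = ("Draw", self.commands[-1][1] + 1)
        else:
            self.commands.append(("Draw", 1))


batch = Batch()
for _ in range(3):
    batch.quad("SpriteA", 100, "normal")
print(batch.commands)
# [('Set texture', 'SpriteA'), ('Set opacity', 100), ('Set blend mode', 'normal'), ('Draw', 3)]
```

The three SpriteA quads produce the same four commands as the optimised list, where naive rendering would have queued twelve.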

Note that the renderer works best when state stays the same. If all objects in the game use "normal" blend, the "set blend mode" command will only ever appear once. If all objects use the same opacity, "set opacity" only appears once. And if the texture, opacity and blend mode all stay the same, it can draw an entire game with a single "Draw N quads" command! In practice, though, the batch will be broken up in several places as the texture changes for different objects. For example, consider three "SpriteA" objects appearing under three "SpriteB" objects, where SpriteA and SpriteB use different textures. The batch then looks like this:

  1. Set texture to "SpriteA"
  2. Set opacity to 100
  3. Set blend mode to "normal"
  4. Draw 3 quads
  5. Set texture to "SpriteB"
  6. Draw 3 quads

Since the texture changes, it is not possible to issue a single "Draw 6 quads" command. It gets more complicated if we interleave the Z order of the objects, so that they alternate between SpriteA and SpriteB:

  1. Set texture to "SpriteA"
  2. Set opacity to 100
  3. Set blend mode to "normal"
  4. Draw 1 quad
  5. Set texture to "SpriteB"
  6. Draw 1 quad
  7. Set texture to "SpriteA"
  8. Draw 1 quad
  9. Set texture to "SpriteB"
  10. Draw 1 quad
  11. Set texture to "SpriteA"
  12. Draw 1 quad
  13. Set texture to "SpriteB"
  14. Draw 1 quad
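A small command counter confirms the lengths of the two batches above: 6 commands with the objects grouped, 14 with them interleaved. The hypothetical `batch_length` helper assumes the same three pieces of state and the same elimination and merging rules described earlier.

```python
def batch_length(quads: list[tuple[str, int, str]]) -> int:
    """Number of batch commands for (texture, opacity, blend) quads drawn
    back-to-front, after redundant changes are removed and draws are merged."""
    count = 0
    current = (None, None, None)
    for quad in quads:
        changes = sum(1 for old, new in zip(current, quad) if old != new)
        if changes:
            count += changes + 1  # the 'Set' commands plus a new 'Draw'
            current = quad
        # With no changes, the quad just extends the previous 'Draw' command.
    return count


A = ("SpriteA", 100, "normal")
B = ("SpriteB", 100, "normal")
print(batch_length([A] * 3 + [B] * 3))  # 6  (grouped)
print(batch_length([A, B] * 3))         # 14 (interleaved)
```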

This is obviously less efficient. But don't worry about it too much: the WebGL renderer is highly optimised, and even batches like this are very efficient to run. Usually there are more important things to worry about for performance, and these are described in the Performance Tips article. And as you can see, the blend mode and opacity stayed the same, so no more of those commands were needed while rendering the six objects.

Optimising the batch

If you have already gone over everything in the Performance Tips article and are happy to spend more time micro-optimising your game instead of making it more fun, then you could think about optimising your game from the WebGL batch point of view. The key is to make sure that, wherever possible, objects with the same texture, opacity and blend mode appear consecutively in the Z order. A good way to do this is simply to organise everything into layers. In the previous example, if SpriteA and SpriteB are both on the same layer, they could easily end up in mixed Z order and produce the long batch. However, if SpriteA is only ever created on Layer 1 and SpriteB is only ever created on Layer 2, they will always be grouped together in Z order, and the renderer can keep the same texture set while rendering each layer. Of course, this may affect how your game looks; animations might mean the objects are showing different textures anyway; and the batch can still be split up by changing blend mode or opacity. So this is not always easy to achieve.
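The effect of layering can be checked with a small model that sorts objects the way the renderer walks them (by layer, then Z) and counts texture changes. The `(layer, z, texture)` tuples and the alternating creation pattern are assumptions chosen for illustration, not taken from a real project.

```python
def texture_switches(objects: list[tuple[int, int, str]]) -> int:
    """objects are (layer, z, texture). Count the 'Set texture' commands needed
    when they are drawn layer by layer, lowest Z first."""
    switches, current = 0, None
    for _layer, _z, texture in sorted(objects, key=lambda o: (o[0], o[1])):
        if texture != current:
            switches += 1
            current = texture
    return switches


kinds = ["SpriteA" if i % 2 == 0 else "SpriteB" for i in range(100)]
# Everything on one layer, created alternately: the Z order interleaves.
same_layer = [(0, i, kind) for i, kind in enumerate(kinds)]
# SpriteA only ever created on Layer 1, SpriteB only ever on Layer 2.
split_layers = [(1 if kind == "SpriteA" else 2, i, kind) for i, kind in enumerate(kinds)]
print(texture_switches(same_layer), texture_switches(split_layers))  # 100 2
```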

Similarly, putting objects with the same blend mode on the same layer allows the batch to reduce the number of "set blend mode" commands. The Space Blaster example puts all objects with "additive" blend (various explosions, lasers etc.) on the same layer. This means that layer can always be rendered with a single "set blend mode", avoiding the possibility of having a batch full of commands alternating back and forth between "additive" and "normal". For the same reason, you might want to avoid giving every object a different opacity (e.g. assigning a random opacity to lots of objects).
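The same counting applies to blend mode and opacity. In this hypothetical scene, loosely modelled on the Space Blaster arrangement described above, each ship is immediately followed in Z order by its additive-blend laser; the `Sprite` type and the numbers are illustrative, not measurements.

```python
from typing import NamedTuple


class Sprite(NamedTuple):
    layer: int
    z: int
    blend: str
    opacity: int


def set_commands(sprites: list[Sprite], prop: str) -> int:
    """How many 'Set <prop>' commands drawing `sprites` back-to-front needs."""
    values = [getattr(s, prop) for s in sorted(sprites, key=lambda s: (s.layer, s.z))]
    return sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1])


mixed = []
for i in range(50):
    mixed.append(Sprite(0, 2 * i, "normal", 100))        # ship
    mixed.append(Sprite(0, 2 * i + 1, "additive", 100))  # its laser
# Same objects, but every additive sprite moved to its own layer above.
layered = [s._replace(layer=1) if s.blend == "additive" else s for s in mixed]
# Giving every object a different opacity.
varied = [s._replace(opacity=i) for i, s in enumerate(layered)]

print(set_commands(mixed, "blend"), set_commands(layered, "blend"))        # 100 2
print(set_commands(layered, "opacity"), set_commands(varied, "opacity"))   # 1 100
```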

This is probably micro-optimisation territory, so it should be considered a last resort. Even long batches that are split up very frequently are still extremely fast to render, since the GPU still does the heavy lifting of actually putting pixels on the screen. However, if you are dealing with very large numbers of similar instances, you may be able to measure a performance improvement if you take into account how the batching works. In general though, my advice would be: ignore what the batching system does, since it will likely be very fast no matter what you throw at it.

WebGL shader effects

Shader effects are another part of renderer state. The batch can include commands like "Set 'Warp' shader effect", "Draw 1 quad" and then "Set no shader effect". Shader effects are dealt with separately because an object can have more than one. If you add two shader effects to a Sprite object, you force the batching engine to alternate between the two shaders for every individual object. Changing shader effects also involves much more overhead than changing something simpler like the opacity, since effects require a fairly complicated process of rendering to temporary off-screen surfaces and then copying the result to the screen, involving several more rendering commands. For this reason it's generally best not to use shader effects on multiple instances. Instead, prefer to add the shader effect to a whole layer, then put any objects you want to have that effect on that layer. That way the overhead of processing the shader only happens once.
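A rough way to see why layer effects are cheaper is to list the batch commands each approach produces. The model below is a deliberate simplification: it ignores the temporary off-screen surfaces and copies that a real effect pass involves, and `Tint` is a hypothetical effect name used alongside the article's `Warp` example.

```python
def effect_commands(instances: int, effects: list[str], on_layer: bool) -> list[tuple]:
    """Simplified batch commands for `instances` sprites using `effects`, either
    added to every instance or added once to the layer holding them."""
    commands: list[tuple] = []
    if on_layer:
        # Instances share state, so they merge into one draw; the layer is then
        # processed once per effect, drawn as a single quad.
        commands.append(("Draw", instances))
        for effect in effects:
            commands += [("Set shader effect", effect), ("Draw", 1)]
        commands.append(("Set shader effect", None))
    else:
        # Every instance goes through each of its effects in turn, so the
        # batch alternates between shaders for each individual object.
        for _ in range(instances):
            for effect in effects:
                commands += [("Set shader effect", effect), ("Draw", 1)]
            commands.append(("Set shader effect", None))
    return commands


print(len(effect_commands(100, ["Warp", "Tint"], on_layer=False)))  # 500
print(len(effect_commands(100, ["Warp", "Tint"], on_layer=True)))   # 6
```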

Export-time optimisations

During export, Construct 2 runs several optimisations on your project to reduce the download size and memory use. Of particular interest to the batching system is the automatic spritesheeting. During preview, each animation frame is a separate texture, so rendering a number of sprites showing different animation frames may break up the batch with lots of "set texture" commands. However after export, it's possible the entire animation is placed on a single spritesheet (or a few sheets), allowing those sprites to be rendered with a single (or just a few) "set texture" commands. This is one reason you may see performance improve after export.
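The effect of spritesheeting on the batch can be modelled as a lookup from animation frame to the texture that holds it. In this sketch the frame and sheet names are hypothetical: preview gives every frame its own texture, while export is assumed to pack all eight frames of an explosion animation onto one sheet.

```python
def texture_commands(frames_drawn: list[str], texture_of: dict[str, str]) -> int:
    """Count 'Set texture' commands for frames drawn back-to-front, given
    which texture holds each animation frame."""
    count, current = 0, None
    for frame in frames_drawn:
        texture = texture_of[frame]
        if texture != current:
            count += 1
            current = texture
    return count


frames = [f"explosion_{i}" for i in range(8)]
drawn = [frames[i % 8] for i in range(40)]         # 40 explosions, neighbours on different frames
preview = {f: f for f in frames}                    # preview: one texture per frame
exported = {f: "explosion_sheet" for f in frames}   # export: assumed single spritesheet
print(texture_commands(drawn, preview), texture_commands(drawn, exported))  # 40 1
```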

Aside: I'd thought about adding a "number of batch commands" debugger measurement, but I think it would just be misleading, since it could radically change after export where you wouldn't be able to re-measure it.

It's possible to render the entire game with one spritesheet, but as described in the linked article, various other factors come into play, such as hardware limits, download size and memory usage. It is also unlikely to make a big difference to performance, since switching texture is already a very fast operation, and the engine can already efficiently batch together objects displaying the same spritesheet.

Canvas2D rendering

Any well-written browser with hardware-accelerated canvas2d is likely to use a similar batching system internally, so the above batching principles probably still apply. However, canvas2d is usually slower because it requires many more calls into the browser (at least a few per object), and each JavaScript-to-native-and-back transition carries a CPU overhead. The WebGL renderer, on the other hand, can often do the same work with far fewer calls into the browser, particularly thanks to having a single command to "draw N quads". Our WebGL batching engine is also tightly integrated with the Construct 2 engine, taking advantage of pre-computed position data from the collision engine for maximum performance, whereas canvas2d has to perform some extra calculations in its rendering calls. Canvas2d also lacks the ability to use shader effects, of course, but be careful not to use so many effects in WebGL mode that you negate the performance benefit!
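A back-of-the-envelope comparison of calls into the browser shows why the "draw N quads" command matters. Both cost functions are assumptions, not measurements: canvas2d is taken to need three calls per object (for example setting `globalAlpha`, then calling `setTransform` and `drawImage`), and WebGL roughly one call per batch command plus a single upload of the frame's vertex data.

```python
def canvas2d_calls(objects: int, calls_per_object: int = 3) -> int:
    """Assumed: a few browser calls per object, e.g. setting globalAlpha,
    then calling setTransform and drawImage."""
    return objects * calls_per_object


def webgl_calls(batch_commands: int, vertex_uploads: int = 1) -> int:
    """Assumed: about one WebGL call per batch command plus uploading the
    frame's vertex data; quads inside one 'Draw N quads' add no calls."""
    return batch_commands + vertex_uploads


# 1000 sprites sharing texture, opacity and blend mode: the batch is
# set texture, set opacity, set blend mode, draw 1000 quads.
print(canvas2d_calls(1000), webgl_calls(4))  # 3000 5
```

Under these assumptions the canvas2d cost grows with the number of objects, while the WebGL cost grows only with the number of batch commands.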

Since performance is most critical on mobile, WebGL support is important for getting the best performance there. When we first added WebGL support, no mobile devices supported WebGL. Luckily times have changed, and support is now broad: Chrome for Android (and Crosswalk), BlackBerry 10, Tizen, Firefox OS and, very likely, Windows Phone 8.1 support WebGL. Let's hope iOS 8 adds support as well!

Conclusion

I hope this blog post was interesting and shed some light on some of the inner workings of the Construct 2 engine, as well as highlighting the work we do towards using native-grade technologies in a web-based engine. Again, though, this is probably not your #1 concern for performance, just an "interesting to know" part of the engine. If you're about to rush to your projects and obsess over whether things in consecutive Z order are similar, I'd encourage you to check the much more important advice in the Performance Tips manual entry first. Even after that, consider working on making your game more fun instead! But perhaps with a few tweaks, like organising objects onto separate layers, you might be able to squeeze out a little more performance.
