One of the most important optimisations in Construct 2 is the collision cell optimisation. In short, Construct 2 divides up the layout into cells and keeps objects up-to-date with the cells they are in. Then when testing collisions it can test against just the objects in the same cells, instead of across the whole layout. This keeps collision testing fast, avoiding wasted time passing over far-away objects. It means collision-testing performance stays roughly constant regardless of the layout size and number of objects, which is essential for ambitiously scaled games.
However the renderer still "brute-forces" all instances in the layout when drawing. If you have 10,000 objects, every frame it has to check 10,000 objects to see if they are in the viewport, and only draw the ones which are. Render cells, new in Construct 2 r191, aim to bring the same optimisation approach used with collisions to the renderer. However it's a much more challenging problem. Let's investigate why that is - and find out why render cells are disabled by default, requiring you to deliberately opt in.
Render cells vs. collision cells
There are two major differences between these cases. First of all, collision testing often involves far more work. If you test for collisions between A and B when there are 1000 instances of each, there could be up to 1,000,000 collision checks - a huge amount of work even if each check is very fast. Therefore a small amount of extra work per instance to reduce that vast workload (i.e. keeping collision cells up-to-date) easily makes a great improvement. However this multiplying effect simply does not exist with rendering. It's a simple linear check of N objects, and each "is in viewport" check is also very fast. Therefore a small amount of extra work per instance can negate any performance saving. As a result it is not always going to be faster in every case, and this is the main motivation for having it off by default.
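To make that concrete, here is a minimal sketch (with illustrative names, not the engine's actual code) of the brute-force culling pass: a cheap axis-aligned bounding box test run once per instance, every frame. Each individual check is only a few comparisons, which is why the linear pass is hard to beat.

```javascript
// Cheap AABB overlap test - four comparisons per instance.
function isInViewport(inst, view) {
  return inst.right >= view.left && inst.left <= view.right &&
         inst.bottom >= view.top && inst.top <= view.bottom;
}

// O(N): every instance is checked every frame, however far away it is.
function bruteForceCull(instances, view) {
  return instances.filter(inst => isInViewport(inst, view));
}
```

Because each check is so cheap, any per-instance bookkeeping added to avoid it has to be cheaper still - which is the crux of the trade-off described above.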
Secondly, render cells have a tough extra requirement that collision cells don't: Z order must be preserved. Construct 2 uses a back-to-front renderer, starting with the bottom of the layout in Z order and over-drawing other objects on top as it works towards the top of the Z order. Collision cells don't have any particular ordering requirement: objects can be collision-checked in any order, and each check simply returns a yes or no answer. Render cells must preserve the Z order of their objects so they can be rendered in the correct order - and on top of that, if multiple render cells are in the viewport, all the objects from all those cells must be combined into a single Z-ordered list for rendering. The obvious answer is to just sort that list before rendering every frame, but as we've seen even a small amount of extra work can negate the performance saving, so we need to do better.
All in all, a demanding problem with tough requirements. The best kind!
Maintaining render cells
First of all, like with collision cells, the layout is split up into viewport-sized cells. Each layer keeps its own separate grid of cells, so objects belong to render cells from their own layer, as opposed to one global grid like with collision cells. This is useful to allow better exploitation of the trade-offs of using render cells, which are covered later. Objects keep track of which of their layer's cells they are in. They can easily be in multiple cells, either by sitting across a cell border, or simply by being large enough.
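As an illustration, here's a sketch of how an instance's bounding box might be mapped to the range of grid cells it overlaps. The names and the CELL_SIZE constant are illustrative, not the engine's actual values; the engine uses viewport-sized cells.

```javascript
// Illustrative cell size - the engine actually uses viewport-sized cells.
const CELL_SIZE = 640;

// Map a bounding box to the inclusive range of cells it overlaps.
function cellRangeForBox(left, top, right, bottom) {
  return {
    x0: Math.floor(left / CELL_SIZE),
    y0: Math.floor(top / CELL_SIZE),
    x1: Math.floor(right / CELL_SIZE),   // inclusive - a box sitting on a
    y1: Math.floor(bottom / CELL_SIZE)   // border spans two cells in that axis
  };
}
```

Note how a box crossing a cell border naturally produces a range covering both cells, which is why an object can belong to several render cells at once.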
When objects are created, destroyed, moved, or change their Z order, their render cells must be updated to add, remove or re-order instances according to their new Z order. This is done as infrequently as possible to avoid unnecessary overhead with techniques like batching updates and lazy-sorting at the latest point possible. However because of this updating work, the optimisation works best when most objects are static, and don't move or change Z order.
Since each render cell maintains a Z-ordered list, if a cell is in the viewport it can pass this list straight to the renderer.
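The batching and lazy-sorting mentioned above can be sketched as follows. This is a hypothetical simplification, not the engine's actual implementation: the cell marks itself dirty on changes, and only sorts when its list is actually requested for rendering.

```javascript
// Sketch of a render cell that defers sorting to the latest possible
// point: changes just set a flag, and the sort runs once on demand.
class RenderCell {
  constructor() {
    this.instances = [];      // instances currently in this cell
    this.sortPending = false; // true if the list may be out of Z order
  }
  insert(inst) {
    this.instances.push(inst);
    this.sortPending = true;  // don't sort now - batch updates
  }
  markZChanged() {
    this.sortPending = true;  // re-sort lazily, later
  }
  getZOrderedList() {
    if (this.sortPending) {
      // Sort only when the renderer actually needs this cell.
      this.instances.sort((a, b) => a.zIndex - b.zIndex);
      this.sortPending = false;
    }
    return this.instances;
  }
}
```

A cell that is never in the viewport never pays for a sort, no matter how often its flag was set - which is exactly the behaviour wanted for large, mostly off-screen layouts.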
It's also worth mentioning that render cells do work with parallaxed layers. Collision cells have to be ignored when parallaxing layers, since testing collisions between instances on differently parallaxed layers means their collision cells no longer line up. However when rendering a layer, it only needs to be concerned with its own contents, and it can correctly take into account parallax when looking in render cells. This is definitely useful, since large static parallaxed background layers are often used for decorative effect.
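One way to picture this: before the cell lookup, the layer computes its visible area in its own coordinate space using its parallax factors, and that parallax-adjusted rectangle is what gets tested against the cell grid. The sketch below assumes the common convention that a layer scrolls at its parallax rate times the normal rate; names are illustrative.

```javascript
// Sketch: compute a layer's visible rectangle in layer coordinates,
// given the scroll position, viewport size and parallax factors (where
// 1.0 means scrolling at the normal rate). Illustrative, not engine code.
function layerViewport(scrollX, scrollY, viewW, viewH, px, py) {
  // The layer's view is centred on the parallax-scaled scroll position.
  const cx = scrollX * px, cy = scrollY * py;
  return {
    left: cx - viewW / 2, top: cy - viewH / 2,
    right: cx + viewW / 2, bottom: cy + viewH / 2
  };
}
```

The resulting rectangle can then be mapped to cell indices exactly as for an unparallaxed layer, so the same render-cell machinery applies.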
Visualising render cells
Here's an image from one of the stress tests used for testing render cells. It shows an entire 6400x6400 layout with viewport-sized areas drawn as a grid with 30,000 red and blue sprites randomly positioned all over it to simulate a large level. This helps visualise which render cells objects end up in. An example viewport is drawn in as a green rectangle. You can see how just the four cells around the viewport need to be considered for rendering, and all the other cells can be skipped.
Rendering from multiple cells
What if there are multiple render cells in the viewport? This is very common, since there only needs to be a border between two cells in the viewport for it to be the case, and this will happen in any scrolling game. The image above also demonstrates this, with four cells in the viewport.
Each cell can return a list of instances in Z order. If there are four render cells in the viewport, this produces a list of four sorted lists. The renderer needs a single sorted list to draw in the right order - it can't simply render each cell one-by-one, since objects in multiple cells (such as when they cross cell borders) won't be rendered in the right Z order relative to each other, or could even be drawn twice! This might make objects in one cell incorrectly appear above or below objects from other cells, as well as breaking effects like opacity and alpha channels where rendering twice produces a different result. So the lists must be merged.
Fortunately an interesting algorithm steps in at this point. Given a list of already-sorted lists, they can be merged into one final fully-sorted list without actually having to run a sort algorithm. While joining up the lists, it's possible to step through them simultaneously, picking from each list in an overall sorted order. This algorithm is implemented to quickly provide the renderer with the final Z-ordered list of instances in, or near, the viewport.
Interestingly this doesn't solve the problem of duplicates - some objects can still have multiple entries in the render list. However the list is sorted by Z order, so duplicates appear consecutively. This makes it easy for the renderer to skip them simply by ignoring any entries which are the same as the last, ensuring nothing is drawn twice.
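The merge and the duplicate-skipping together can be sketched like this. It's a simplified k-way merge (with illustrative names, not the engine's code): repeatedly pick the list whose next instance has the lowest Z index, and skip any entry identical to the one just emitted.

```javascript
// Merge several already-Z-sorted cell lists into one sorted render list,
// skipping consecutive duplicates (instances that span multiple cells).
function mergeRenderLists(lists) {
  const idx = lists.map(() => 0); // next unread position in each list
  const out = [];
  let last = null;
  for (;;) {
    // Find the list whose next instance has the lowest Z index.
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (idx[i] < lists[i].length &&
          (best === -1 ||
           lists[i][idx[i]].zIndex < lists[best][idx[best]].zIndex))
        best = i;
    }
    if (best === -1) break;     // all lists exhausted
    const inst = lists[best][idx[best]++];
    if (inst !== last)          // duplicates are consecutive: skip repeats
      out.push(inst);
    last = inst;
  }
  return out;
}
```

With the typical four cells in view this is a handful of comparisons per instance - far cheaper than re-sorting the whole combined list every frame.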
Render list caching
The sorting work is relatively cheap, but to obtain best performance nothing should be done if it doesn't need to. If the view is scrolling along, it will likely have the same render cells in the viewport for at least several frames before it reaches the next cell along. If nothing is changing in these cells, then the merging work will produce the same list every frame. Therefore the list from the previous frame is cached, and if nothing has changed, it skips the merging and just re-uses it. So even the fast merging without sorting can be skipped!
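A minimal sketch of that caching idea (hypothetical names, not the engine's actual code): key the cache on which cells are in view, and rebuild only when the cell set or any cell's contents changed.

```javascript
// Cache the merged render list: re-use last frame's list if neither the
// set of visible cells nor their contents changed since it was built.
class RenderListCache {
  constructor() {
    this.cachedList = null;
    this.cachedKey = "";
  }
  // cellKeys: identifiers of the cells in view; buildList: the merge step.
  get(cellKeys, anyCellChanged, buildList) {
    const key = cellKeys.join(",");
    if (!anyCellChanged && key === this.cachedKey && this.cachedList)
      return this.cachedList;      // nothing changed: skip the merge
    this.cachedList = buildList(); // re-merge only when needed
    this.cachedKey = key;
    return this.cachedList;
  }
}
```

While the view scrolls within the same group of cells and nothing moves, every frame after the first is a cache hit, so even the merge cost drops to nearly zero.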
Since some extra work is done to maintain render cells and avoid the "check all objects" scenario, it's also possible to actually end up slower if a particular project ends up needing more extra work than is saved by the optimisation. Here's an overview of the extra work for objects on layers which have render cells enabled:
- Every time an object moves, it must also update its render cells.
- Every time an object changes its Z order, its render cell will later need re-sorting.
- Rendering the objects currently in the viewport involves some extra merging work.
In exchange, the benefit is:
- The renderer can avoid even checking objects far away from the viewport. However each check is very fast, so there need to be a large number of far-away objects to make this outweigh the extra work involved.
When to use it?
Given the trade-offs above, the ideal case for render cells is when using large layers with lots of static objects. Remember that unlike collision cells, this can include parallaxed layers. For example an ambitious game may involve thousands of scenery objects strewn across a huge layout. Provided few of those scenery objects ever change, enabling render cells for the layer should save a lot of work checking whether or not thousands of far-away objects are in the viewport.
Remember render cells can be enabled or disabled for individual layers. So you can enable render cells for a "static" background layer with rarely-moving scenery objects, and leave them disabled for other layers where lots of action happens with moving objects.
When to avoid it?
Any layers with lots of changing objects - either creating, destroying, moving or changing Z order - will probably incur more extra work than is saved by the cheap viewport checks. Also single-screen games should definitely not use render cells, since there is never much off-screen that needs skipping! The game will only be burdened with pointless extra work. In these cases using render cells could actually reduce the framerate.
As ever, the key is to measure performance. If it's right for your game, you should be able to identify a measurable improvement to the framerate or CPU utilisation as measured by Construct 2. Don't turn it on assuming it will be faster - it could in fact be slower!
To help ensure good performance in large layouts, it's important that the engine does not need to do any work for each instance every frame. Render cells helps avoid per-instance work in drawing. As of r191 the engine still does per-instance work in these cases:
- If an object has any behaviors, the engine must "tick" the behavior for every instance every frame. This is entirely skipped if there are no behaviors.
- If a Sprite is animated, the engine must "tick" every instance every frame to ensure the animation advances. If there is only one animation with one frame, this is skipped.
- Events relating to objects must of course check conditions and run actions for every object, with the exception of collision checks (which take advantage of collision cells).
While work may be done in future to reduce that further, it's worth bearing in mind. For example, if the scenery objects in the previous example all had a behavior, the engine would have to tick thousands of instances' behaviors every frame - a substantial workload, probably enough to make render cells irrelevant. So for objects to be truly "static", they should have no behaviors, animations, or events relating to them.
Here's an example of the previous stress test running at normal scale, with a view scrolled to a placeholder player object, and render cells disabled. This means every frame it has to check all 30,000 objects to see if they are in the viewport. It's running in Chrome 39 on a Nexus 5 with Android 5.
On a high-end mobile device with not much going on, it only manages 35 FPS and is burning up most of the CPU time. (Note the CPU measurement Construct 2 makes is only a ballpark figure, since it's based on timer measurements.) There's so little spare CPU time that any other action is likely to bring down the framerate further. Let's turn on render cells and see how it compares:
We're back up to 60 FPS, and the estimated CPU time is way down. Not only is the framerate smooth again, there's also plenty of spare CPU time for other action to happen in the game without hitting the framerate.
Render cells are an advanced optimisation tool for certain classes of large-scale projects. It's not right for every game, but is a useful option when building vast layouts. Be careful to avoid using it in cases where it will actually reduce performance due to the extra work involved. However if you are aware of when the engine does per-instance work, with both collision cells and render cells it's possible for far-away objects to have truly zero overhead, allowing for vast level design with no performance impact.