Our first blog post on performance focused on the runtime engine. Whilst upgrading our code for the new runtime, we also made some key improvements to Construct's behaviors. Two in particular have been significantly rearchitected: Physics and Pathfinding. Here's what we've done.
One of the major improvements of WebAssembly over asm.js is that it supports memory growth with no performance overhead. With asm.js you had to choose between deoptimising performance to allow increasing memory use, or maximum performance with a fixed memory limit. We chose maximum performance with a fixed memory limit of 50 MB. A downside of this is that if Physics needs more than 50 MB of memory for a game, it will crash with an out-of-memory error because it cannot increase the memory size. Another issue is that if your game only needs a small amount of memory for Physics, it still has to allocate the full 50 MB. The new WebAssembly Physics engine starts with just a 16 MB memory allocation, which is 68% less than the asm.js version.
If the WebAssembly physics engine needs more memory, it simply allocates more memory and keeps going — and there's no performance overhead for doing that. So there's no memory limit for Physics any more! Your games can keep going as big as you like and you'll never hit a fixed memory limit.
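To make the difference between growable and fixed memory concrete, here is a minimal sketch using the standard `WebAssembly.Memory` JavaScript API. It is not Construct's actual code: the post does not show how the Physics module configures its memory, so the sizes simply mirror the 16 MB and 50 MB figures quoted above, and the fixed-size memory is only an analogy for the old asm.js heap. The helper `pagesForMiB` is a made-up name for this example.

```javascript
// Sketch only: WebAssembly memory is sized in pages of 64 KiB each.
const PAGE_BYTES = 64 * 1024;
const pagesForMiB = (mib) => (mib * 1024 * 1024) / PAGE_BYTES;

// Growable memory: starts at 16 MiB with no maximum specified.
const growable = new WebAssembly.Memory({ initial: pagesForMiB(16) });
const startBytes = growable.buffer.byteLength;

// grow() takes a number of pages and returns the previous size in pages.
// The old ArrayBuffer is detached, so always re-read .buffer afterwards.
const previousPages = growable.grow(pagesForMiB(16));
const grownBytes = growable.buffer.byteLength;

// Fixed-size memory standing in for the old 50 MB limit (an analogy only):
// initial and maximum are equal, so any attempt to grow fails.
const fixed = new WebAssembly.Memory({
  initial: pagesForMiB(50),
  maximum: pagesForMiB(50),
});
let fixedGrowFailed = false;
try {
  fixed.grow(1);
} catch (err) {
  fixedGrowFailed = err instanceof RangeError;
}

// The saving quoted above: (50 - 16) / 50.
const savingPercent = Math.round(((50 - 16) / 50) * 100);

console.log({ startBytes, grownBytes, previousPages, fixedGrowFailed, savingPercent });
```

In a compiled module the growth is normally triggered from inside the WebAssembly code when its allocator runs out of space, rather than by calling `grow()` from JavaScript as done here.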
Another benefit of WebAssembly is that the binary format is more compact, making the download size smaller. Most servers will send compressed resources, and after compressing both the asm.js and WebAssembly versions of the Box2D library, the WebAssembly version is still nearly 20% smaller.
This smaller binary format is also much quicker to parse, allowing a faster startup too. In fact, some browsers can compile WebAssembly faster than it downloads, meaning there is essentially zero loading time.
WebAssembly is already supported in all major browsers, so these improvements will come to all platforms! We have entirely removed the old asm.js Physics engine from the C3 runtime, so you are guaranteed high-performance, low-overhead Physics.
Faster, multi-threaded pathfinding
The other behavior we've particularly improved is the Pathfinding behavior. This uses the A* pathfinding algorithm to find a path across a layout while avoiding obstacles. It's great for making seemingly intelligent enemies that can find their way around the level without bumping into things. However, actually calculating the path is very CPU-intensive, especially when used with a small grid size, since smaller cells mean more cells to search.
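For readers unfamiliar with A*, the following is a generic textbook sketch of the algorithm on a small grid. It is not the Pathfinding behavior's implementation, which this post does not show. The function name `findPath`, the grid encoding (0 for free, 1 for obstacle), 4-directional movement and the Manhattan-distance heuristic are all assumptions chosen to keep the example short.

```javascript
// Minimal A* on a grid of cells: 0 = free, 1 = obstacle.
// Moves are 4-directional with cost 1; the heuristic is Manhattan distance.
function findPath(grid, start, goal) {
  const rows = grid.length, cols = grid[0].length;
  const key = (x, y) => y * cols + x;
  const heuristic = (x, y) => Math.abs(x - goal.x) + Math.abs(y - goal.y);

  const gScore = new Map([[key(start.x, start.y), 0]]);
  const cameFrom = new Map();
  const closed = new Set();
  // Open list as a plain array; a binary heap would be faster on big grids.
  const open = [{ x: start.x, y: start.y, f: heuristic(start.x, start.y) }];

  while (open.length > 0) {
    // Take the open node with the lowest f = g + h.
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      if (open[i].f < open[best].f) best = i;
    }
    const current = open.splice(best, 1)[0];
    const currentKey = key(current.x, current.y);
    if (closed.has(currentKey)) continue; // stale duplicate entry
    closed.add(currentKey);

    if (current.x === goal.x && current.y === goal.y) {
      // Walk the cameFrom links back to the start to rebuild the path.
      const path = [];
      let k = currentKey;
      while (k !== undefined) {
        path.unshift({ x: k % cols, y: Math.floor(k / cols) });
        k = cameFrom.get(k);
      }
      return path;
    }

    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = current.x + dx, ny = current.y + dy;
      if (nx < 0 || ny < 0 || nx >= cols || ny >= rows) continue;
      if (grid[ny][nx] === 1) continue; // obstacle
      const nKey = key(nx, ny);
      if (closed.has(nKey)) continue;
      const tentative = gScore.get(currentKey) + 1;
      if (tentative < (gScore.get(nKey) ?? Infinity)) {
        gScore.set(nKey, tentative);
        cameFrom.set(nKey, currentKey);
        open.push({ x: nx, y: ny, f: tentative + heuristic(nx, ny) });
      }
    }
  }
  return null; // no route exists
}

// A wall across the middle row forces a detour around the right-hand side.
const grid = [
  [0, 0, 0, 0],
  [1, 1, 1, 0],
  [0, 0, 0, 0],
];
const path = findPath(grid, { x: 0, y: 0 }, { x: 0, y: 2 });
console.log(path);
```

The cost grows with the number of cells that have to be explored, which is why halving the cell size (roughly four times as many cells over the same layout) makes each search noticeably more expensive.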
How quickly can the behavior calculate a path? In the new C3 runtime, individual paths can be calculated over 4x faster than in C2. This is so much faster that the delay is almost unnoticeable, whereas it took over half a second before.
We didn't stop there.
The Pathfinding behavior in the C2 runtime uses a Web Worker to run pathfinding calculations in parallel to the game. This means a 500ms delay to calculate a path doesn't jank the game: it continues running smoothly and triggers "On path found" when the result is ready. However, the C2 runtime only ever creates one Web Worker. This misses the opportunity to use multiple CPU cores. Even some mobile devices have 8 CPU cores. On such systems, creating 8 Web Workers for pathfinding calculations would allow 8 paths to be calculated in parallel. With one Web Worker, all the paths must be queued up and calculated one after the other, delaying the completion of later paths.
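The queueing effect can be illustrated with a small, idealised model. The function `completionTimes` below is hypothetical and is not part of either runtime: it assumes every path costs the same 500ms, that each worker gets a CPU core to itself, and that dispatching has no overhead.

```javascript
// Returns the time (ms) at which each job finishes when `jobCount` equal-cost
// jobs are shared between `workerCount` workers, each taking the next queued job.
function completionTimes(jobCount, jobMs, workerCount) {
  const workerFreeAt = new Array(workerCount).fill(0);
  const finished = [];
  for (let job = 0; job < jobCount; job++) {
    // Hand the job to whichever worker becomes free first.
    let w = 0;
    for (let i = 1; i < workerCount; i++) {
      if (workerFreeAt[i] < workerFreeAt[w]) w = i;
    }
    workerFreeAt[w] += jobMs;
    finished.push(workerFreeAt[w]);
  }
  return finished;
}

const oneWorker = completionTimes(8, 500, 1);
const eightWorkers = completionTimes(8, 500, 8);
console.log(oneWorker);    // [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
console.log(eightWorkers); // [500, 500, 500, 500, 500, 500, 500, 500]
```

In this model the eighth path arrives after 4 seconds with a single worker, but after half a second with eight. Real results will be less tidy, since path costs vary and cores are shared with the game itself.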
For the Construct 3 runtime, we created a whole new multi-core dispatching framework to allow plugins and behaviors to easily post tasks to multiple Web Workers. The runtime can create one Web Worker per CPU core, allowing maximum throughput. As a result, the Pathfinding behavior can now run multiple pathfinding calculations in parallel, which essentially multiplies the throughput by the number of CPU cores.
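The general shape of such a dispatcher can be sketched as a small worker pool. This is not Construct's framework, whose API is not described in this post: `WorkerPool`, `postTask` and `createFakeWorker` are hypothetical names, error handling is omitted, and a stand-in worker object is used so the sketch also runs outside a browser.

```javascript
// Hypothetical pool. `createWorker` must return an object with the Web Worker
// interface: a postMessage() method plus an assignable onmessage handler.
class WorkerPool {
  constructor(createWorker, size) {
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) this.idle.push(createWorker());
  }

  // Queue a task; the promise resolves with whatever the worker posts back.
  postTask(data) {
    return new Promise((resolve) => {
      this.queue.push({ data, resolve });
      this._dispatch();
    });
  }

  // Pair idle workers with queued tasks until one of the two runs out.
  _dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      worker.onmessage = (event) => {
        this.idle.push(worker);   // the worker is free again
        task.resolve(event.data);
        this._dispatch();         // start the next queued task, if any
      };
      worker.postMessage(task.data);
    }
  }
}

// Stand-in for a real Worker so the sketch runs outside a browser:
// it "calculates" by echoing the request id back asynchronously.
function createFakeWorker() {
  const worker = {
    onmessage: null,
    postMessage(data) {
      setTimeout(() => worker.onmessage({ data: { id: data.id, done: true } }), 0);
    },
  };
  return worker;
}

const pool = new WorkerPool(createFakeWorker, 4);
const results = Promise.all([1, 2, 3, 4, 5, 6].map((id) => pool.postTask({ id })));
results.then((r) => console.log(r.map((x) => x.id))); // [1, 2, 3, 4, 5, 6]
```

In a browser, the same pool could be created with real workers, for example `new WorkerPool(() => new Worker("pathfind-worker.js"), navigator.hardwareConcurrency || 2)`, where the script name is made up for illustration and `navigator.hardwareConcurrency` reports the number of logical cores.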
To test the improvement, we took the previous test, created an extra 50 slightly spread-out instances, and asked them all to calculate a path at the same time. The result is stunning. Testing on a quad-core system, the C3 runtime can finish the work 32x faster than the C2 runtime!
Just look how quickly results come in with the C3 runtime:
...compared to the C2 runtime:
Along with a fundamentally rearchitected runtime to boost overall performance, we've also done considerable work to redesign the Physics and Pathfinding behaviors for maximum performance. Physics is smaller, faster, uses less memory, and has no fixed memory limit. Individual pathfinding jobs are considerably faster, and combined with multi-threaded processing, this adds up to incredible throughput. Games involving large groups of objects all pathfinding around a complex level can now handle the pathfinding calculations far more efficiently, which is enough of a difference to raise the bar on the games you can design.
Physics in the Construct 3 runtime should now be as good as native — period. Multi-threaded code is notoriously difficult to get right in native programming languages, but we have built an easy-to-use Web Worker dispatching engine that any plugin or behavior can take advantage of, and the gains for pathfinding over the C2 runtime are enormous. This is just a small slice of the work we've done to exploit the latest web technologies to make the Construct 3 runtime even better.
Missed a previous post? Here's the blog series so far:
- Announcing the Construct 3 runtime
- New text features in the Construct 3 runtime