Expressions in the C2 runtime
In the old C2 runtime, expressions were represented as an Abstract Syntax Tree (AST). This is basically a tree representation of expressions. For example
myVar + 1 will be represented as an "add" node with "myVar" and "1" as the children, which can be drawn as a diagram like this:
The expression can then be calculated by starting at the top (root) node, and doing the sum. The "add" node will calculate its left value (retrieving
myVar), calculate its right value (retrieving
1), and then add them together.
More complex equations build up a bigger tree with more nodes. For example
myVar * 2 + 1 can be represented like this:
This time when the "add" node calculates its left value, it first calculates the "multiply" node, which retrieves
2, multiplies them together, and then passes the result back to the "add" node to add
1. Note how this respects the order of operations: the addition is done last, since it's at the top. Ultimately any kind of equation can be represented this way.
For many projects, this approach works fine. It's not until you start doing complex calculations with lots of loops or function calls, or handling thousands of instances all calculating expressions individually, where the overhead of this approach starts to become significant.
The initial C3 runtime approach
We first released an alpha of the C3 runtime around a month ago in r95. In that release the C3 runtime used much the same approach as the C2 runtime - it was more or less a direct port of the C2 code.
- There are no dynamic function calls. The values on either side are accessed directly, which is faster.
+ operator actually does all sorts of things, depending on if there are numbers, strings or even objects being added, often with odd results. It's only guaranteed to do numerical addition if both sides of the addition are numbers. Construct's
+ operator only does numerical addition (the
& operator does string concatenation). Further, Construct does have dynamic typing in a few places, where it's not known in advance if you'll be adding numbers or something else. If you try to add things that aren't numbers, Construct treats
a + b as just returning
+ when both sides are definitely numbers - but in that case they then work identically.
Another significant difference is that an expression like
Sprite.X doesn't actually just return one value. If you run an action affecting 100 instances of Sprite, then an expression including
Sprite.X will be evaluated 100 times, each time returning a different Sprite's X co-ordinate. It's more like a function being called on multiple different instances. So while
The AST approach is a good place to start. It matches how the editor handles expressions, making it straightforward, and it is also easy to customise it to work how we want.
A wide range of optimisations had already been implemented for the C3 runtime, including across the entire event engine. However expression evaluation itself so far only had a minor improvement to remove the "result" objects. If the rest of the engine speeds up more than expression evaluation, then it makes expression evaluation slower relative to the rest of the engine. In other words, it becomes a more significant bottleneck.
Consequently we'd seen expression evaluation ranking as relatively more time-consuming in performance benchmarks since the release of the C3 runtime - even if it was running faster overall. For example the "bunnymark" test ends up bottlenecked on how fast it can calculate expressions for the positions of thousands of instances. Some of our event-based performance benchmarks from our original C3 runtime announcement also spend relatively more time evaluating expressions, such as the calculation to find prime numbers in the primefind benchmark.
In the Chowdren forum thread a "fake 3D" project also turned up that was bottlenecked on expression evaluation while rotating the camera. It was quite inefficiently designed, repeating the same complex calculation many times only to produce a Y offset, but it's interesting to see this kind of thing - it shows it's a bottleneck "in the wild" in real projects, and the project provides a good independently-made sample to benchmark any optimisations with. To give you an idea of the complexity, it calculates positions according to X =
pOBJ_Car.X + (Self.SliceFrame + (pOBJ_Car.ZScale * ZScale)) * cos(LayoutAngle - 90) and Y =
pOBJ_Car.Y + ((Self.SliceFrame * pOBJ_Car.ZScale) * ZScale) * sin(LayoutAngle - 90) repeatedly for hundreds of instances.
Efficient and compatible operators
First let's look at how operators like addition are compiled efficiently and 100% compatible to the old approach. The obvious solution is to use functions instead of operators. The function can then implement addition exactly like Construct does. For example
+ operator instead?
It turns out - yes, in many cases we can! Construct already has a way of evaluating expressions only looking at the type. It uses this to validate parameters, such as to ensure you put a number (or something that could be a number) for a number parameter like an X co-ordinate. However this same approach can be used to look at the expression and say: yes, both sides are definitely a number. (In this case we're assuming myVar is a number event variable.)
So in these cases - which is most of the time - when Construct can prove both sides of
+ are a number, it compiles directly to
If Construct can't prove both sides are a number, which is relatively rare, it falls back to the
Compiling system expressions
System expressions basically compile down to function calls. For example the
dt expression can compile to
getDt(). To make sure the expression method is run correctly, we bind the expression method that returns dt to the system plugin, so we don't need to do something like
system.getDt(). It provides a single standalone function to directly call and get the expression result. We even use the same approach for "single global" plugins like Function, where there is guaranteed to only ever be one instance of the plugin that expressions are ever run on. This makes sure expressions like
Function.Param() are compiled to a single function call that directly runs Function plugin code on the right instance, entirely bypassing any event code to identify the picked instance.
Interestingly, expression parameters can simply be independently compiled. For example the expression
getDistance(0, 0, getMyX(), getMyY()). Previously there was code for this to gather parameters in to an array and pass them to the expression; now this too is entirely bypassed and parameters are added directly to the call.
Math.sqrt. So rather than call a system expression function that calls
sqrt(myVar) is compiled to
Compiling object expressions
Object expressions like
Sprite.X are probably the trickiest part. The same approach is used for behavior and instance variable expressions, but for simplicity we'll only cover object expressions.
Evaluating an object expression involves knowing which instances are picked by the event, and which is the current instance the expression should be calculating for. This involves too many runtime details to split off in to a simple function like
Sprite.X + 1:
On startup the runtime calls
getExpression(runtime), and it returns a new function. Call that, and it evaluates the actual expression. It's like creating a specialised expression function on startup. The runtime can also repeatedly call it for different instances so it works correctly when actions run on multiple instances. It's still faster: again the call is static rather than dynamic, any parameters can be directly inserted, and the rest of the expression evaluates without any tree overhead.
Another interesting trick is the actual call to
spriteXNode.Get() can be tweaked depending on the circumstances. For example retrieving a family instance variable involves a small amount of extra code to look up the variable. Construct can compile different calls for each case, like
One more bonus is you might use an expression like
A real example
Let's take the "fake 3D" expression that calculates an X co-ordinate from earlier. This is what it looks like as a Construct expression:
pOBJ_Car.X + (Self.SliceFrame + (pOBJ_Car.ZScale * ZScale)) * cos(LayoutAngle - 90)
pOBJ_CarX.Get() + (selfSliceFrame.Get() + (pOBJ_CarZScale.Get() * ZScale.Get())) * Math.cos(C3.toRadians(getLayoutAngle() - 90))
- There is a single function call per object/variable expression
cos system expression was compiled directly to a
- The parameter to
So what kind of performance improvement does this bring? Quite a lot!
Next let's re-evaluate the primefind benchmark that we used in our original C3 runtime announcement blog. This is quite intensive with calculations to identify prime numbers. How many iterations can it run in 10s?
Compiling expressions improves performance by 34% again for this test. The improvement over the C2 runtime increases from 2.3x faster to over 3x faster than the C2 runtime.
Next let's re-evaluate another test from the same blog post - using the Function object to evaluate the 30th fibonacci number with a deliberately intensive algorithm. This test is timing-based, so quicker results with lower numbers are better.
Compiling expressions improves performance by 18% for this test. The improvement over the C2 runtime increases from 2.5x faster to 3x faster than the C2 runtime.
Let's go back to that fake 3D project that was slow while rotating the camera. It's good to check projects that users have made since it's a more realistic example and isn't something we came up with ourselves. We measured the framerate while rotating the camera with 3000 instances on-screen.
Compiling expressions increases the framerate by 55% - the best result yet! The C3 runtime was already 29% faster, but this boosts it to 2x faster than the C2 runtime. Since the project was inefficiently designed, it's likely it can be brought up to 60 FPS by optimising the events themselves, especially with compiled expressions.