Then we can replace the engine "Run action" function with
func, and we're back to directly calling the action method! There is not even a step in between to evaluate the parameters. So engine code can still be completely bypassed even when parameters are used, as long as they're constant.
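A minimal sketch of what this looks like in JavaScript (the plugin class and method names here are made up for illustration, not the runtime's actual internals): binding a parameterless action method to its instance once at startup means running the action is just a direct call.

```javascript
// Hypothetical plugin with a parameterless action (illustrative names only).
class CounterPlugin {
    constructor() { this.count = 0; }
    Increment() { this.count++; }   // the action method
}

const inst = new CounterPlugin();

// At startup, bind the action method to its instance once...
const func = inst.Increment.bind(inst);

// ...then running the action is just func() — no dispatch step,
// no parameter evaluation, no engine code in between.
func();
func();
```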
Interestingly, actions like Add 1 to Variable1 count as having both parameters constant, because Variable1 always refers to the same variable. So the engine can still resolve that parameter on startup and bind it into an action function.
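For example, here's a sketch of how both constant parameters of Add 1 to Variable1 could be evaluated once and bound in at startup (the variable storage and function here are assumptions for illustration, not the runtime's actual internals):

```javascript
// Illustrative global variable storage and action implementation.
const globals = { Variable1: 0 };

function addToVariable(name, value) {
    globals[name] += value;
}

// Both parameters are constant — the variable reference and the value —
// so they can be evaluated once at startup and bound in:
const func = addToVariable.bind(null, "Variable1", 1);

// Running "Add 1 to Variable1" is now a single direct call.
func();
```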
The end result is that running a system or single-global condition or action with no parameters, or up to 3 constant parameters, has virtually no engine overhead.
Deduplicating bound functions
Whenever the engine binds a function, it remembers it in a cache along with any parameters it was bound with. Then if the same function is bound again with the same parameters, the engine can return the existing bound function from the cache. This avoids creating duplicate functions that all do the same thing, reducing memory usage.
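A rough sketch of such a cache (the real runtime's data structures aren't published; this version keys on the method and a string built from its constant arguments purely for illustration):

```javascript
// Map of method -> Map of args key -> previously created bound function.
const bindCache = new Map();

function cachedBind(method, thisArg, ...args) {
    let perMethod = bindCache.get(method);
    if (!perMethod) {
        perMethod = new Map();
        bindCache.set(method, perMethod);
    }
    // Illustrative key: assumes the constant args stringify uniquely, and
    // that each method only ever binds to one instance (true for System and
    // single-global plugins), so thisArg isn't part of the key.
    const key = args.join("\u0001");
    let bound = perMethod.get(key);
    if (!bound) {
        bound = method.bind(thisArg, ...args);
        perMethod.set(key, bound);
    }
    return bound;
}
```

Binding the same method with the same constant parameters a second time then returns the identical function object instead of allocating a new one.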
Since the benefits are limited to System and single-global plugins, there are fewer cases where it brings a measurable benefit. Still, if we re-run some of the performance tests from the original blog post where this makes a difference, we can see what kind of improvement it brings. I've got four sets of results for these measurements: the original C2 runtime, the original C3 runtime as of r95 (labelled "C3"), the C3 runtime with the expression compiler as of r101.2 (labelled "C3+"), and the latest C3 runtime with this function binding improvement as of r102 (labelled "C3++").
First up, let's re-measure how many Repeat loop iterations can be run every tick while still hitting 30 FPS. This test almost solely measures the engine overhead, so it will show the improvement clearly.
Thanks to the reduced overhead, this boosts loop performance by +32%. The expression compiler hardly helped with this test, but the new function binding optimisation helps a lot. This brings the C3 runtime to a total of nearly 4x faster than the C2 runtime.
Next up, let's re-measure the primefind test, which measures the number of iterations it can run in 10 seconds.
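For reference, the kind of work this benchmark represents is a tight trial-division loop along these lines (a sketch of the workload in plain JavaScript, not the actual test's event sheet):

```javascript
// Trial-division primality test: one tight inner loop per candidate.
function isPrime(n) {
    if (n < 2) return false;
    for (let i = 2; i * i <= n; i++) {
        if (n % i === 0) return false;
    }
    return true;
}

// Count primes below a limit — one isPrime check per outer loop iteration,
// so engine overhead per iteration directly limits the iteration count.
function countPrimes(limit) {
    let count = 0;
    for (let n = 2; n < limit; n++) {
        if (isPrime(n)) count++;
    }
    return count;
}
```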
This is a great test for showing the improvement of each round of optimisation so far. The reduced engine overhead boosts intensive loop and function performance by +22%, bringing the total improvement to 3.3x faster than the C2 runtime.
Next up, let's re-measure the function call overhead when naively calculating the 30th fibonacci number.
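The equivalent workload in plain JavaScript looks like this — nearly all the cost is call overhead rather than arithmetic, which is what makes it a good measure of per-call engine overhead:

```javascript
// Naive doubly-recursive Fibonacci: fib(30) makes over a million calls,
// so the benchmark mostly measures how cheap each function call is.
function fib(n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

fib(30);  // 832040
```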
This test also clearly shows each round of optimisation. The reduced engine overhead boosts intensive function performance by +21%, bringing the total improvement to 3.9x faster than the C2 runtime.
Other tests like bunnymark don't show much of an improvement, because they don't intensively use System or single-global plugin events, so the reduced overhead makes little difference for them. It's mostly loops and functions that benefit from this change.