I just want to caution everyone about the same old issues when people talk about performance. It's a way more subtle and complex topic than most people really understand.
Issue 1 is that Construct's CPU/GPU measurements are only timer-based, and all modern processors have advanced power management. This means it's possible to make a benchmark and slowly increase the amount of work, and then at some point the CPU/GPU measurement suddenly drops a lot. You might think "OMG! Adding more work made it faster, WTF?" but all that happened is you created enough work for the processor to step up from its slow, low-power mode to its fast, high-power mode. We warn about this:
CPU measurements can be unreliable, especially when the system is largely idle. Most modern devices deliberately slow down the CPU if not fully loaded in order to save power. This means work takes longer to get done, and these measurements will misleadingly return a higher measurement, since it's based on timing how long the work takes. It will generally only be reliable in the device's maximum performance mode, i.e. under full load.
But most people still seem to ignore this and design benchmarks that run at less than full load, which means the results might be nonsense. So most of the results in this thread may well be nonsense on that basis alone.
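To put rough numbers on that effect, here is a minimal sketch of a timer-based reading. It is a simplified model and not Construct's actual measurement code: the function `timerBasedCpuPercent`, the two clock speeds and the cycle counts are all made-up assumptions for illustration, and real power management changes clock speed gradually rather than flipping between two fixed values.

```javascript
// Hypothetical model of a timer-based CPU reading (NOT Construct's real code).
// workCycles: CPU cycles the frame's logic needs
// clockHz:    clock speed currently chosen by power management
// frameMs:    frame budget in milliseconds (about 16.67 ms at 60 FPS)
function timerBasedCpuPercent(workCycles, clockHz, frameMs = 1000 / 60) {
  // Time the work takes at this clock speed, in milliseconds.
  const workMs = (workCycles / clockHz) * 1000;
  // The reading is just "how much of the frame was spent working".
  return (workMs / frameMs) * 100;
}

// Assumed clock speeds, for illustration only.
const LOW_POWER_HZ = 1e9;  // 1 GHz
const HIGH_POWER_HZ = 4e9; // 4 GHz

// Light load: the processor stays in its low-power mode.
const lightLoad = timerBasedCpuPercent(5e6, LOW_POWER_HZ);

// 50% more work, which (in this model) is enough to trigger high-power mode.
const heavierLoad = timerBasedCpuPercent(7.5e6, HIGH_POWER_HZ);

console.log("light load reading:   " + lightLoad.toFixed(1) + "%");
console.log("heavier load reading: " + heavierLoad.toFixed(1) + "%");
```

In this model the heavier workload reports a much lower percentage than the lighter one, purely because the clock speed changed. Nothing about the work itself got cheaper.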
Issue 2 is that both Construct and JavaScript are so exceptionally well-optimised that, counter-intuitively, it can make changes to the workload seem much worse than they really are. Even Construct's event blocks, with the overhead of the event system, are so fast that one benchmark we did a few years ago showed Construct events being 5x faster than GameMaker Language in VM mode, and still nearly as fast as GML when compiled to C++!
To illustrate how a faster engine produces counter-intuitive-seeming results, consider workload A which takes 1 CPU cycle, and workload B which takes 2 CPU cycles. If you make a benchmark, workload B is half the speed! Some people then start giving advice like "never do B, it is slow". However, imagine working in a slower engine where workload A takes 10 CPU cycles, and workload B takes 11 CPU cycles. B still costs exactly 1 cycle more than A, but now it only looks 10% slower. People might give advice like "both A and B are fine, there's not much difference". But the absolute difference is the same.
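The arithmetic can be written out as a small sketch. The helper `compareWorkloads` is a hypothetical name used only for this illustration, and the cycle counts are the made-up figures from the example, not measurements.

```javascript
// Compare two workloads by absolute and relative cost difference.
// cyclesA / cyclesB are the illustrative cycle counts from the example above.
function compareWorkloads(cyclesA, cyclesB) {
  const absoluteDiff = cyclesB - cyclesA;
  return {
    absoluteDiff,                                  // extra cycles B costs over A
    percentLonger: (absoluteDiff * 100) / cyclesA  // how much longer B takes, in %
  };
}

const fastEngine = compareWorkloads(1, 2);   // absoluteDiff: 1, percentLonger: 100
const slowEngine = compareWorkloads(10, 11); // absoluteDiff: 1, percentLonger: 10

console.log("fast engine:", fastEngine);
console.log("slow engine:", slowEngine);
```

Both engines show the same 1-cycle absolute difference. Only the percentage changes, and it looks worse in the faster engine simply because the baseline is smaller.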
In other words, if an engine is already insanely fast, very small changes to the workload show up as disproportionately large changes in performance. However in most cases it just doesn't matter. In the example I gave earlier, the slowest workload B in the fast engine (2 cycles) is still 5x faster than the fastest workload A in the slower engine (10 cycles). Something taking 2 CPU cycles instead of 1 is unlikely to ever affect real-world performance of an actual project, even though you can make a benchmark showing a large percentage difference. Making that benchmark and then saying "never do B" may well be giving bad advice, causing people to do contrived, inconvenient things to their projects that are entirely unnecessary. This is really just another way to re-state that optimisation is usually a waste of time.
Lastly, the event system has been fine-tuned for maximum performance for over 10 years, and it's got a lot of sophisticated optimizations. Whether or not these matter and how they apply depends on what your project does. For example the 'Is overlapping' condition can use the collision cells optimization, which means if you have something like 10,000 sprites spread across a large layout and test overlap with a single sprite, it only checks nearby instances - let's say just 100 instances (which might sound like a lot but is just 1% of the total). However if you put a different condition first, it has to disable that optimization. So putting some other condition first that only picks a small number of instances may in fact, perhaps counter-intuitively, be much slower, as that first condition has to check all 10,000 sprites. It may also have the opposite result, and be faster that way, depending on how much work the condition is versus the very small amount of work of identifying just the nearby instances, as well as how many instances are involved, and where those instances are. So you may in fact be able to measure that collision cells are slower in some contrived benchmark. That does not mean you should change the best advice of "put collision checks first", because collision cells are usually an important and highly effective optimization.
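For anyone unfamiliar with the general idea behind collision cells, here is a minimal sketch of a spatial grid. This is a generic illustration and not Construct's actual implementation: `CELL_SIZE`, `buildGrid` and `nearbyCandidates` are hypothetical names, the cell size is an arbitrary assumption, and instances are treated as single points rather than bounding boxes that can span several cells.

```javascript
// Generic spatial grid sketch (NOT Construct's real collision cells code).
const CELL_SIZE = 256; // assumed cell size in pixels

function cellKey(cx, cy) {
  return cx + "," + cy;
}

// Bucket every instance by the grid cell containing its position.
function buildGrid(instances) {
  const grid = new Map();
  for (const inst of instances) {
    const key = cellKey(Math.floor(inst.x / CELL_SIZE), Math.floor(inst.y / CELL_SIZE));
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(inst);
  }
  return grid;
}

// Return only instances in the target's cell and its 8 neighbouring cells.
function nearbyCandidates(grid, target) {
  const cx = Math.floor(target.x / CELL_SIZE);
  const cy = Math.floor(target.y / CELL_SIZE);
  const result = [];
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      const cell = grid.get(cellKey(cx + dx, cy + dy));
      if (cell) result.push(...cell);
    }
  }
  return result;
}

// 10,000 point-like instances spread over a large layout (100 x 100 lattice).
const instances = [];
for (let i = 0; i < 100; i++) {
  for (let j = 0; j < 100; j++) {
    instances.push({ x: i * 100, y: j * 100 });
  }
}

const grid = buildGrid(instances);
const player = { x: 5000, y: 5000 };
const candidates = nearbyCandidates(grid, player);

console.log("total instances:   " + instances.length);
console.log("candidates tested: " + candidates.length);
```

With these made-up numbers, only a few dozen of the 10,000 instances need a real overlap test. A condition that runs first and has to look at every instance gives up that shortcut.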
I'd also note running events once with N instances picked is almost always more efficient than running events N times with one instance picked, because the latter repeats the overhead of the event engine. So for maximum performance, avoid "for each" unless you really need it. Again, a contrived benchmark may be able to measure the opposite. That should not change the general advice.
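As a rough cost model of why that is, assume each run of an event has some fixed overhead plus a per-instance cost. The function `eventCost` and the numbers 5 and 1 below are invented purely for illustration; they are not measurements of Construct's event engine.

```javascript
// Hypothetical cost model: every run of an event pays a fixed overhead,
// plus a cost for each instance it processes. Units are arbitrary.
function eventCost(runs, instancesPerRun, overheadPerRun, costPerInstance) {
  return runs * (overheadPerRun + instancesPerRun * costPerInstance);
}

const N = 1000;
const OVERHEAD = 5;     // made-up fixed overhead per run
const PER_INSTANCE = 1; // made-up cost per instance

// Run the event once with all N instances picked.
const runOnce = eventCost(1, N, OVERHEAD, PER_INSTANCE); // 1005

// Run the event N times with one instance picked each time ("for each").
const runForEach = eventCost(N, 1, OVERHEAD, PER_INSTANCE); // 6000

console.log("once with N picked:   " + runOnce);
console.log("N times with 1 picked: " + runForEach);
```

The per-instance work is identical in both cases. The difference comes entirely from paying the fixed overhead N times instead of once.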
So really as usual my performance advice is:
- Ignore performance results unless you have a real-world performance problem in your project
- If you have a performance issue, rely on performance measurements in your actual real-world project, and avoid making contrived benchmarks
- Follow our official performance advice
- I'd say 95% of the benchmarks users make do not correctly take into account processor power management, so my general advice would be to ignore user-made benchmarks. If you really want to use benchmarks, only pay attention to properly designed ones that max out the processor, and ignore all others as probably misleading.
- Remember that even a properly-made benchmark that appears to show a significant result may still be entirely irrelevant to any real-world projects.