Checking for visibility before collision = bad for CPU, don't try to optimize it?

  • Even after looking at the runtime's source in C2, I'm having trouble coming up with consistent rules. C2 and C3 run similarly, but C3 is faster.

    Anyway, by default the picking system duplicates object lists a lot: at the top of every event block, with every iteration of a loop, and probably with some other special events. That is slow with a high number of instances.

    But there are two major optimizations being used:

    1. When all the instances are picked, copying just sets a flag, so nothing actually needs to be copied, which is faster. That is why having the "for each" as the first condition is faster in 3 and 4.

    2. When exporting, it keeps track of whether any of the following sub-events modify the SOL. If they don't, there's no reason to copy the SOL after that point. That's why 2 is fast. So effectively the loop needs to be the last condition in a block, and the sub-events need to do no picking, to take advantage of that.

    The short version, which is easier to remember: you want the loop to be either the first or the last condition in an event block. You can have non-picking events after the loop, but they need to be in a sub-event.

    The high CPU usage in 1 is the worst case, with no optimized code paths. And oddly enough, number two performs just as badly if the "system compare" is directly below the loop instead of in a sub-event.
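    To make the above a bit more concrete, here is a minimal TypeScript sketch of the idea - purely illustrative, with names of my own choosing, not Construct's actual runtime code. It models a selected object list (SOL) that would normally be duplicated at the top of each event block or loop iteration, unless the "everything is picked" flag lets the copy be skipped (optimization 1 above).

    ```ts
    // Illustrative model only - not the real runtime.
    interface Instance { x: number; visible: boolean; }

    class SOL {
        all: Instance[];        // every instance of the object type
        picked: Instance[];     // the currently picked subset
        allPicked: boolean;     // optimization 1: "everything is picked" flag

        constructor(all: Instance[]) {
            this.all = all;
            this.picked = all;
            this.allPicked = true;
        }

        // Entering an event block or a loop iteration normally snapshots the list.
        copyForBlock(): Instance[] {
            if (this.allPicked)
                return this.all;     // cheap: nothing actually copied
            return [...this.picked]; // worst case: duplicate the picked list
        }

        // A picking condition narrows the list and clears the flag,
        // forcing real copies from that point on.
        filter(pred: (inst: Instance) => boolean): void {
            this.picked = this.picked.filter(pred);
            this.allPicked = this.picked.length === this.all.length;
        }
    }

    // Optimization 2 above would be the exporter noticing that no later
    // sub-event modifies the SOL and skipping copyForBlock() entirely.
    ```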

  • > And oddly enough, number two performs just as badly if the "system compare" is directly below the loop instead of in a sub-event.

    You are right, wow...

  • It's ridiculous that we have to do such intense testing to figure out which way to write code in C3 when, visually, every version that dop2000 wrote made sense.

    I agree there can be some small changes in resource usage, but from 10% to 60% is completely insane.

    Now you can argue "well, there are a lot of objects", but even if a project has 90% fewer objects and more code (complex UI, combat, quests, NPCs, etc.), writing it in a "bad" (???) way can lead to, say, a 1% CPU increase for no other reason than checking "Sprite instance variable X = something" versus a System compare of "Sprite.X = something".

    And then we have projects that look like this:

    So imagine how that small 1% can stack up with the amount of code we had to write.

    We enjoy the easy entry into C3, but mastering it is a whole new level. And while we can test every small thing we need to implement, it kind of starts to waste our time when ridiculous bugs like these (10%-60%) appear.

    I remember something similar when a Tween that was disabled was still causing havoc. fedca reported it and it got fixed. I can only hope for something similar here, but then it raises the question: what other hidden CPU hogs are there in C3?

    Would be interesting to hear Ashley's opinion on this.

  • Checking for visibility before collision wastes CPU power. Collision detection already handles visibility efficiently, so extra checks slow performance. Make sure collision systems do their job by not over-optimizing.

  • > Checking for visibility before collision wastes CPU power. Collision detection already handles visibility efficiently, so extra checks slow performance. Make sure collision systems do their job by not over-optimizing.

    Collision detection has NOTHING to do with visibility. What are you talking about?

  • It's not really about collisions or visibility, it's about picking. Replace "is visible" with any other condition, say, "is animation playing" - and the result will be the same.

  • I think I've found the Never-Ending Fountain of Knowledge. The more combinations you try which achieve the same thing, the more random results you get.

    Also, a real-world example: attempting to use these optimizations resulted in a 5% CPU decrease for my tooltip system! (Not 5% overall, but in the CPU profile.)

  • Very insightful read, learned a lot! But I also don't feel 100% in tune with it yet.

    I suppose in some ways it makes sense: a "Sprite instance variable" check may still be doing the typical picking work, even when it's inside a For Each loop and therefore only one instance is picked at a time.

    Whereas "Evaluate expression" doesn't do any picking, so using it inside a For Each loop takes away the picking work, whilst still cycling through each sprite to get the correct variable for the expression.

    If this is true, perhaps C3 could be designed to take into account whether a For Each loop is active for an object type and skip the redundant picking to save some CPU... although I wonder if that kind of check could slightly worsen performance overall for every For Each loop, which wouldn't be ideal. Perhaps it's better to fully understand the exact mechanisms and work with them - but after a decade, I never thought Evaluate expression would turn out to be this useful.
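    A rough sketch of how I picture that difference (hypothetical TypeScript, just to illustrate the idea - none of this is real runtime code):

    ```ts
    // 1000 dummy sprites with an instance variable "n".
    const sprites = Array.from({ length: 1000 }, (_, i) => ({
        n: i % 2 === 0 ? 0 : 10,
        opacity: 100,
    }));

    // "For Each + Sprite instance variable": the condition still behaves like a
    // picking condition, so each iteration does list work even though only one
    // instance is picked at a time.
    for (const s of sprites) {
        const picked = [s].filter(inst => inst.n > 5); // per-iteration picking/copy
        for (const p of picked) p.opacity = 55;
    }

    // "For Each + Evaluate expression": the loop already hands us the instance,
    // so the check is just a value read, with no list copying at all.
    for (const s of sprites) {
        if (s.n > 5) s.opacity = 55;
    }
    ```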

    I seek to eke out as much performance as possible in certain areas of my project, where the more performance there is, the more a system can be utilised (e.g. an event-based particle system, UI). Indeed, a small 5% CPU gain could seem pointless in real-world scenarios, but it is a major difference for specific systems a dev might be trying to design.

    EDIT: Measured, but this may be incorrect - see my follow-up below. BTW, fun fact: if you set opacity to a decimal number, I recall it eating far more CPU. It makes sense in some ways - C3 has to convert it to an int - but I recall it was a measurable difference in CPU. Worth noting: if you make your own opacity fading system, it's better to wrap your opacity result with int() instead.

  • > BTW, fun fact: if you set opacity to a decimal number, I recall it eating far more CPU. It makes sense in some ways - C3 has to convert it to an int - but I recall it was a measurable difference in CPU. Worth noting: if you make your own opacity fading system, it's better to wrap your opacity result with int() instead.

    Very interesting, thanks for sharing!

  • > BTW, fun fact: if you set opacity to a decimal number, I recall it eating far more CPU. It makes sense in some ways - C3 has to convert it to an int - but I recall it was a measurable difference in CPU. Worth noting: if you make your own opacity fading system, it's better to wrap your opacity result with int() instead.

    > Very interesting, thanks for sharing!

    You're too quick to reply!! - I wrote my post, then went to measure just to fact-check and report back here, and I can't repro. I am certain I encountered this about a month or two ago, and kept flicking back between "int()" and not, seeing a noticeable difference, hence why I posted this with confidence. I measured:

    1000 objects, setting opacity to a decimal or not, seems to give the same CPU either way. For Each or not... visible or not... all the same result.

    ... Either I'm going crazy, or it's a project-specific setting (WebGPU/Worker/something), but hey, it's worth keeping in mind if you ever wonder why your huge opacity system seems CPU-hungry; somehow this stuck in my mind after a lot of experimenting/measuring a while ago. I'll update if I figure out what the exact scenario was in my main project.

  • Imagine having someone who really understands how things work under the hood, to explain it…

    But I guess the answer would be: "Don’t worry about that."

  • > Imagine having someone who really understands how things work under the hood, to explain it…

    > But I guess the answer would be: "Don’t worry about that."

    Lol, it can be that way occasionally, but I find it goes that way when it's random fluff and filler. I find that Ashley often posts insightful stuff - often I'm Googling a random specific action and encounter all sorts of insights throughout old forum posts. Sometimes it's interesting to Google something vague, like "Construct 3 Wait" or "Construct 3 Tween Performance"; you find lots of valuable information.

    My slight fear of a reply would be "measure" or "would this affect real world scenarios", but I feel the difference here is:

    Measuring IS helping, but it's opening a lot of "why" questions - why "this" order of events (especially with For Each sometimes being more performant when placed higher up). So long as we know why we're choosing to use "Evaluate expression" and such, even if it's an odd quirk, we can adapt our thinking to work with C3 and become powerhouses in optimisation.

    "Real world": I like to hope this thread is showing it, particularly with Je Fawk's case, and hopefully my own too. I spent a lot of time eking out performance where it's needed (and I avoid micro-optimisations like "should I use a Bool or a Number for my one-off tiny return function" or something). I still hope to find new methods to optimise, and indeed this thread has taught me a lot.

    "For Each": whilst I understand a lot of people tend to avoid it, you sort of find that you need it for specific cases, and thus end up using it a lot at the end of a list of conditions, or just before the end - such as an event block that evaluates a bunch of stuff for a Sprite, where there comes a point where you must pick that sprite's child in a hierarchy, so you must use For Each. That's totally fine, since all the For Each is doing is picking a child (rather than expensively evaluating the entire event block).

    Thanks to this thread, I will be hunting down those For Each blocks, seeing if I do indeed have some sub-events with a quick "Sprite instance variable = 5", changing them to "Evaluate expression", measuring, and seeing how much improvement is gained.

  • Another tip - again, measure, but this one I know is true; I measured it before writing this post and have provided a c3p to share.

    You can get a more performant For Each by using... return functions! (Again, MEASUREEEEE for your case!)

    Explanation

    Much like how you can do "Sprite n > 5, set Sprite opacity" and it applies to each picked Sprite without needing a For Each, you can utilise this with return functions, as they too run for each Sprite without needing a For Each.

    I measured:

    1000 sprites, 500 have n = 0, 500 have n = 10

    Two test keyboard keys to hold:

    A Key = Sprite n > 5 , For Each , Pick Child , Set child opacity to 55

    S Key = Sprite n > 5 , Set Sprite dummyVar to Functions.DummyFunction(Sprite.UID)

    DummyFunction: Pick Sprite by UID , Pick Sprite Child , Set child opacity to 55

    Now, this seems wild - the "dummy return function" version seems way busier: it has to run a function and pick the sprite via UID...

    ... But the results?:

    For Each - CPU 13% (Unlocked Frames: 900fps)

    Dummy Return - CPU 8% (Unlocked Frames: 1800fps)

    MAGIC.

    C3P below:

    drive.google.com/file/d/17CTtsNiO59uh5h8pjgY_4kTNPj6ooRQI/view

    (I would have included some clearer checks, like a "children updated" counter, but I wanted it to be raw performance; I quickly checked the debugger to confirm values were being updated only for the last 500 children, and indeed it seems correct.)

    (Note: be careful before rushing to implement this everywhere. Whilst I am using it in places and haven't had issues, I have a fear of "reaching the call stack" limit, which you may have seen when you run a function too many times in one tick - perhaps this gets reached if over-using functions per tick, although I am not sure.)
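    If it helps to see why this can work, here is a rough TypeScript analogy of the S-key setup - illustrative only; dummyFunction and the sprite fields are just stand-ins for the example above, not the actual runtime. The point is that the action's expression is evaluated once per picked Sprite, so a function call placed inside it also runs once per picked Sprite, with no explicit For Each.

    ```ts
    type Sprite = { uid: number; n: number; dummyVar: number; childOpacity: number };

    // 1000 sprites: 500 with n = 0, 500 with n = 10 (as in the test above).
    const sprites: Sprite[] = Array.from({ length: 1000 }, (_, i) => ({
        uid: i,
        n: i < 500 ? 0 : 10,
        dummyVar: 0,
        childOpacity: 100,
    }));

    // Stand-in for Functions.DummyFunction(Sprite.UID).
    function dummyFunction(uid: number): number {
        const s = sprites.find(sp => sp.uid === uid)!; // "Pick Sprite by UID"
        s.childOpacity = 55;                           // "Set child opacity to 55"
        return 0;                                      // dummy return value
    }

    // "Sprite n > 5 -> Set Sprite dummyVar to Functions.DummyFunction(Sprite.UID)"
    // The action runs for every picked Sprite, so the function piggybacks on it
    // instead of needing a For Each of its own.
    sprites
        .filter(s => s.n > 5)
        .forEach(s => { s.dummyVar = dummyFunction(s.uid); });
    ```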

  • Whaaaattt????

  • > My slight fear of a reply would be "measure" or "would this affect real world scenarios", but I feel the difference here is:

    I don't think this is an unusual or fake case. The issue came up while optimizing a project I've been working on for months. Since I can't share that project publicly, I made a smaller example. The numbers may not match a typical project, but they help isolate and show the problem - a problem I'm facing right now.

    I think most people will understand the need for abstraction here.
