Checking for visibility before collision = bad for CPU, don't try to optimize it?

  • Ashley, could you please explain this anomaly, which R0J0hound pointed out?

    Logically, both events are identical, yet there's a huge difference in performance, which I can also observe in Windows Task Manager and Resource Monitor.

    Is this a bug?

  • I'm pretty sure it's mentioned somewhere in the docs that one should try to avoid sub-events unless necessary (probably because there's got to be some kind of overhead) but in this case the opposite appears to be true.

    My guess:

    The slow version runs a for each loop, and then compares and sets each Sprite individually.

    In the fast version, the loop runs but doesn't do anything. The Sprites are then "collected" to bring the SOL into the sub-event, and the comparison and setting opacity is done as a single, optimized batch operation.

  • I arrived here from the Discord discussion.

  • > ...in the case of "has tags", in one case you just compare a string which is a very simple and quick operation for a CPU. On the other hand "has tags" has to split the given string by spaces to extract individual tags, and then verify that all the provided individual tags are in the set of tags for the given instance. It's probably at least 10x as much work as just comparing a string. It's not that it's slow...It's just that you've used a feature which necessarily includes more complex steps. So if you make a benchmark that absolutely hammers that specific feature, you will probably see something like a 10x difference.
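    A rough TypeScript model of the two checks described in the quote. This is a sketch, not Construct's source: `compareTypeVar` and `hasTags` are made-up names, and the splitting behaviour follows the description above rather than the engine's actual code.

    ```typescript
    // Hypothetical model of the two conditions; not Construct's implementation.

    // Instance-variable style check: a single string comparison.
    function compareTypeVar(instanceValue: string, wanted: string): boolean {
      return instanceValue === wanted;
    }

    // "Has tags" style check, as described: split the parameter on spaces,
    // then require every requested tag to be in the instance's tag set.
    function hasTags(instanceTags: Set<string>, wanted: string): boolean {
      const requested = wanted.split(" ").filter((t) => t.length > 0);
      return requested.every((t) => instanceTags.has(t));
    }

    const tags = new Set<string>(["enemy", "flying"]);
    console.log(compareTypeVar("enemy", "enemy"));
    console.log(hasTags(tags, "enemy flying"));
    console.log(hasTags(tags, "enemy boss"));
    ```

    Even with a single tag, the second function allocates an array and walks it, which is the kind of extra work a benchmark that hammers one feature ends up magnifying.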

    This is a good example of understanding "under the hood" a bit more so that we can make decisions on what to use/how to structure things.

    Typical usage of any structure is going to perform great, but with loops or many instances, it can be worth exploring a different approach or learning as many optimisation tips as possible (many small optimisations in a loop add up).

    I too went to use tags as a way of identifying many instances within loops, took measurements, and opted not to (not really understanding why it gave higher CPU than other methods). With your insights, I can see that had I written a bug report about tag performance, it would have cost both my time and Scirra's time going through the bug report process, only to end with "this is by design".

    Without these insights, I assumed tags were like a small dictionary hidden within each instance, and that "Has tags" automated a dictionary "has key" loop and picked objects. I didn't know that it always splits the string when using "Has tags". (I've found that heavy loops with "tokenat()" can be costly, so I try to use "Array Split String" outside of a loop when possible.)
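    The "split once outside the loop" habit, sketched in TypeScript. `tokenAt` is a hypothetical stand-in for Construct's `tokenat(src, index, separator)` expression; it assumes (not confirmed here) that each call re-splits the whole string and returns an empty string for an out-of-range index.

    ```typescript
    // Hypothetical stand-in for tokenat(): assumed to re-split `src` on every call.
    function tokenAt(src: string, index: number, sep: string): string {
      const parts = src.split(sep);
      return index >= 0 && index < parts.length ? parts[index] : "";
    }

    // Loop calling tokenAt: one full split of the string per iteration.
    function tokensInLoop(src: string, sep: string, count: number): string[] {
      const out: string[] = [];
      for (let i = 0; i < count; i++) {
        out.push(tokenAt(src, i, sep));
      }
      return out;
    }

    // Hoisted version: split once before the loop, then just index the array.
    function tokensSplitOnce(src: string, sep: string, count: number): string[] {
      const parts = src.split(sep);
      const out: string[] = [];
      for (let i = 0; i < count; i++) {
        out.push(i < parts.length ? parts[i] : "");
      }
      return out;
    }
    ```

    With n tokens the first version splits the whole string n times (roughly quadratic work overall), while the second splits it once; both return the same tokens.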

    I suppose this means it's best not to put "Has tags" in a heavy loop. If you want to do different things to many instances with various tags, keep it at the base level with a handful of events ("Has tags X", "Has tags Y"), and each event can have a For Each after that condition if required (often it isn't).
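    A sketch of why the base-level layout helps, using a made-up counter to show how often a tag parameter gets parsed. `TaggedInstance`, `parseTags`, `loopFirst` and `tagFirst` are hypothetical names, and treating each "Has tags" evaluation as one parse is the assumption carried over from the quoted explanation.

    ```typescript
    interface TaggedInstance { tags: Set<string>; opacity: number; }

    let parseCount = 0; // how many times a tag parameter has been split

    function parseTags(param: string): string[] {
      parseCount++;
      return param.split(" ").filter((t) => t.length > 0);
    }

    function matches(inst: TaggedInstance, wanted: string[]): boolean {
      return wanted.every((t) => inst.tags.has(t));
    }

    // Loop first: every instance re-parses every tag parameter.
    function loopFirst(instances: TaggedInstance[]): void {
      for (const inst of instances) {
        if (matches(inst, parseTags("enemy"))) inst.opacity = 50;
        if (matches(inst, parseTags("ally"))) inst.opacity = 100;
      }
    }

    // Tag first: parse each parameter once, then filter the instances.
    function tagFirst(instances: TaggedInstance[]): void {
      const enemy = parseTags("enemy");
      for (const inst of instances) if (matches(inst, enemy)) inst.opacity = 50;
      const ally = parseTags("ally");
      for (const inst of instances) if (matches(inst, ally)) inst.opacity = 100;
    }
    ```

    With 10 instances and two tag checks, `loopFirst` parses 20 times and `tagFirst` parses twice, and both leave every instance with the same opacity.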

  • Really insightful discussion!

    It’s amazing to see how many different ways there are to achieve the same result in Construct.

    I also really liked this old blog post, Common mis-used events and gotchas.

    Ashley: So for maximum performance, avoid "for each" unless you really need it.

    Tips like this are super valuable. I think people really need something like a Construct Cookbook or CheatSheet that shows the best practices for doing things the right way.

    Because there are so many different ways to achieve the same goal, simply taking the time to think about whether to use A or B is already very meaningful.

    Just like Jase00 said, “This is a good example of understanding ‘under the hood’ a bit more so that we can make decisions on what to use and how to structure things.”

  • The CPU profiler is great for seeing usage side by side, even though its numbers might be off.

    When my game stutters on my M1 Mac, that's when it needs optimizing, because if it has problems there, it will be worse on weaker computers.

    It's hard to look at this without thinking I'm testing it wrong, though:

    However, Ashley, you made some good points and provided interesting insight. Thanks!

    > Ashley: So for maximum performance, avoid "for each" unless you really need it.

    > Tips like this are super valuable. I think people really need something like a Construct Cookbook or CheatSheet that shows the best practices for doing things the right way.

    It's a good tip, for code that can indeed go without it. I first test without it, and if it doesn't work even after rearranging things, I sadly have to resort to For Each.

    I bet I could avoid it more often if I had more time to rearrange things, but I need to balance finishing the project against making it better and not having it published for some time.

    > ...in the case of "has tags", in one case you just compare a string which is a very simple and quick operation for a CPU. On the other hand "has tags" has to split the given string by spaces to extract individual tags, and then verify that all the provided individual tags are in the set of tags for the given instance. It's probably at least 10x as much work as just comparing a string. It's not that it's slow...It's just that you've used a feature which necessarily includes more complex steps. So if you make a benchmark that absolutely hammers that specific feature, you will probably see something like a 10x difference.

    I don't understand why it splits when there's only 1 tag and no spaces. It feels like a "does it have any whitespace?" check would make it not so bad in this scenario, but to be fair, I'd still use the instance variable check.
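    The whitespace check suggested above could look like this. It's a hypothetical sketch (`parseTagParam` is a made-up name) and it isn't known whether Construct does anything similar. The check itself still scans the string once, so the saving is mainly skipping the split call; whether that is measurable would need a benchmark.

    ```typescript
    function parseTagParam(param: string): string[] {
      // Fast path: no whitespace means at most one tag, so no split is needed.
      if (!/\s/.test(param)) {
        return param.length > 0 ? [param] : [];
      }
      // Slow path: split on runs of whitespace and drop empty pieces.
      return param.split(/\s+/).filter((t) => t.length > 0);
    }
    ```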

    > Just like Jase00 said, “This is a good example of understanding ‘under the hood’ a bit more so that we can make decisions on what to use and how to structure things.”

    It's what I've been saying all along: every dev wants to understand more of what's happening under the hood, so that we can avoid wasting time guess-testing.

  • The community wants to see more features added, not the removal of features! 🙏

    After the hierarchy feature, which is amazing, I stopped using any other features, due to bugs or low performance, or because it was stuff I already do with my own systems.

    So more features, hmmm I don't know about that, depends who you ask.

  • > So more features, hmm I don't know about that, depends who you ask.

    Depends on the feature :)

  • Even though I'm a big fan of C3, I really hate all the performance quirks it has. And you can mostly only find them through extensive testing and measurement (which, with all the random fluctuations and hardware dependence, can be very time-consuming and lead to nothing). Well, there are also quite a few bugs.

    Ashley often says that people care too much about optimization, along the lines of "in 99% of cases your game will perform great anyway, don't worry lol".

    But in C3 you can ruin your game's optimization with so many random small things, and it's explained nowhere. The docs are really bad at that. There are so many hidden things that are never explained, and they matter A LOT, both for how your game works and how it performs.

    Also, events have so much overhead just by default. The only reason I was able to make my games perform well on mobile was using JS (because JS's "for...of" is literally 1000+ times faster than "For Each" in events; it's insane) and also using render debugging tools like SpectorJS. Without that, it's just a complete guessing game and so many wasted hours.
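    For context, the script-side pattern being compared is just a plain loop that does the check and the change together. `SpriteLike` is a hypothetical stand-in so the sketch runs on its own; in a real project the list would come from Construct's scripting API (something like iterating an object type's instances). The 1000x figure above is the poster's observation, not something this sketch demonstrates.

    ```typescript
    // Hypothetical stand-in for a sprite instance: only the fields used here.
    interface SpriteLike {
      isVisible: boolean;
      opacity: number;
    }

    // One pass: check visibility and set opacity in the same loop,
    // returning how many instances were changed.
    function setOpacityIfVisible(sprites: SpriteLike[], opacity: number): number {
      let changed = 0;
      for (const s of sprites) {
        if (s.isVisible) {
          s.opacity = opacity;
          changed++;
        }
      }
      return changed;
    }
    ```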

    For example, just changing the 9-patch "Edges" and "Fill" settings to "Tile" leads to a CRAZY performance cost. With 100 9-patch objects, the number of draw operations increased from 23 to... 1213. That's a 5173% increase just from changing a few settings that don't seem like they'd matter at all. Is it mentioned anywhere? Of course not.

  • It's not feasible to document performance characteristics. Performance can vary across browsers, browser versions, operating system, operating system version, hardware, graphics driver versions, Construct versions, the project settings (e.g. WebGL vs WebGPU) and the specifics of what your project does. Anything we write may well only be true in certain specific combinations of hardware, software, settings, and project logic, and even then, what we wrote may become incorrect at any time as all those things change over time.

    As I described before, perhaps counter-intuitively, having an insanely fast engine makes relative differences seem worse than they really are. I think in many cases you're just seeing that effect. Ironically if Construct was much slower, changes would not seem to have much performance difference, and then you wouldn't have "performance quirks" - but everything would be worse overall. This is also compounded by the fact that, as I've described, it's extremely difficult to come up with fair and accurate benchmarks that are applicable to real-world projects.
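    To put numbers on the "relative differences" point, here's a small sketch with hypothetical timings (not measurements): something 10x slower in relative terms can still be a tiny share of a 60 FPS frame, whose budget is 1000 / 60 ≈ 16.7 ms.

    ```typescript
    // Time available per frame at a given frame rate, in milliseconds.
    function frameBudgetMs(fps: number): number {
      return 1000 / fps;
    }

    // Fraction of one frame's budget used by a cost of `costMs`.
    function shareOfFrame(costMs: number, fps: number): number {
      return costMs / frameBudgetMs(fps);
    }

    // Hypothetical per-frame costs for a fast way and a 10x slower way of doing something.
    const fastMs = 0.01;
    const slowMs = 0.1;

    console.log(`relative: ${(slowMs / fastMs).toFixed(0)}x slower`);
    console.log(`fast: ${(shareOfFrame(fastMs, 60) * 100).toFixed(2)}% of a 60 FPS frame`);
    console.log(`slow: ${(shareOfFrame(slowMs, 60) * 100).toFixed(2)}% of a 60 FPS frame`);
    ```

    The ratio is 10x, but in absolute terms the cost goes from about 0.06% to 0.6% of the frame.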

    I do believe it's true that for 99% of projects, these types of things just don't matter. Even taking your example with 9-patch: perhaps some settings mean it has to swap textures for each patch, versus rendering it all with the same texture. Changing texture may cause more draw operations (and may not, especially with WebGPU multitexturing). But so what? Going from 100 draw operations to 1000 might mean you are going from 0.5% utilisation to 5% utilisation of the overall system capability (which I must note that Construct's measurements won't accurately reflect due to the power management issue I mentioned) and so you still have absolutely loads of headroom left, and so you're wasting your time worrying about it. As I said the framerate is the ultimate measurement. If it still hits 60 FPS, it's fine.
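    A simplified model of the texture-swap explanation: assume consecutive quads sharing a texture go into one draw call, and each texture change starts a new one. This is an assumption for illustration only; real renderers break batches for other reasons too, and (as mentioned) WebGPU multitexturing can avoid some of these breaks. The texture names are made up.

    ```typescript
    // Count draw calls under the simple rule "a texture change starts a new batch".
    function countDrawCalls(quadTextures: string[]): number {
      let calls = 0;
      let current: string | null = null;
      for (const tex of quadTextures) {
        if (tex !== current) {
          calls++;
          current = tex;
        }
      }
      return calls;
    }

    // 100 hypothetical 9-patch objects, 9 quads each.
    const shared: string[] = [];
    const perPatch: string[] = [];
    for (let obj = 0; obj < 100; obj++) {
      for (let patch = 0; patch < 9; patch++) {
        shared.push("atlas");           // every patch drawn from one texture
        perPatch.push(`patch${patch}`); // each patch drawn from its own texture
      }
    }
    console.log(countDrawCalls(shared));   // 1
    console.log(countDrawCalls(perPatch)); // 900
    ```

    Under this model the count scales with objects × patches as soon as textures alternate, which is the general shape of jump being discussed; the real numbers depend on how the engine actually draws tiled patches.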

  • I keep thinking and reflecting on all this stuff (as you'd imagine from my walls-of-text).

    It can also seem like Scirra are being closed-minded or something, but I think there's more to it.

    Everything in this topic is not relevant to most projects.

    It is mostly relevant to those who are making an ambitious project, or their project includes an ambitious system.

    An ambitious system could be desiring 500 active enemies, each behaving independently with path finding and various variables to "think" about their behaviour. A new user will hit performance issues in so many ways where other experienced users would have an easier time achieving this.

    But if you're making a smaller game, or even a platformer with huge levels with many interactive objects, but no extreme systems, then usually this is achievable for new users, simply by doing the "intuitive" thing and throwing a layout together (and C3 works its magic with collision cells and such).

    Is it the optimal way to go? Maybe, maybe not. Does it matter? That's Scirra's point, and it's totally fair for general games. If you have an ambitious game or want to target weak devices, then both require the will to learn and measure.

    If someone is new to Construct, Tech, or GameDev, then they will struggle to do ambitious ideas, until they learn and read and experiment.

    Take "picking" - Experienced people know the ins/outs of this, whereas newcomers, even those that used a programming language before, find "picking" to be alien/unintuitive/confusing.

    Whereas for me, having used Construct for nearly 2 decades, picking is something I have 0 issues with, even with the "quirks"; no "quirk" in this area has ever stopped me or slowed me down, and we have many tools to manage picking. But when I was less experienced, you bet I was frustrated. Looking back, that frustration feels unfair, as the tools and documentation were there for me to learn from. It's easy for me to get stuck in my own way of thinking.

    Some of this feels true of nerdError's post:

    > Even though I'm a big fan of C3, I really hate all the performance quirks it has. And you can mostly only find them through extensive testing and measurement (which, with all the random fluctuations and hardware dependence, can be very time-consuming and lead to nothing). Well, there are also quite a few bugs.

    It feels like this implies there are a lot of performance quirks, but I disagree here.

    And "extensive testing/measurements": it's not like measurements take hours. Start a new project, throw in a sprite that moves (e.g. with the Sine behaviour) so the FPS isn't stuck at 0, and add your measurement tests (if you need to test many objects, use a quick "For" loop to spawn them). Then test.
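    The same "isolate it and measure" idea applies to script code. Below is a minimal timing helper, a sketch rather than a rigorous benchmark: it does a couple of warm-up runs, then reports the median of several timed rounds, which is less sensitive to one-off spikes than a single run. `timeMedianMs` is a made-up name; `performance.now()` is the standard high-resolution timer in browsers (and recent Node versions). As Ashley notes, power management and hardware differences can still skew results, so the FPS of the real project remains the final check.

    ```typescript
    // Run `fn` a few times to warm up, then time `rounds` runs and return the median in ms.
    function timeMedianMs(fn: () => void, rounds = 7, warmup = 2): number {
      for (let i = 0; i < warmup; i++) fn();
      const samples: number[] = [];
      for (let i = 0; i < rounds; i++) {
        const start = performance.now();
        fn();
        samples.push(performance.now() - start);
      }
      samples.sort((a, b) => a - b);
      return samples[Math.floor(samples.length / 2)];
    }

    // Example: compare two ways of summing 1000 hypothetical object values.
    const values: number[] = [];
    for (let i = 0; i < 1000; i++) values.push(i);

    let sink = 0; // accumulate results so the measured work is actually used
    const forLoopMs = timeMedianMs(() => {
      let total = 0;
      for (let i = 0; i < values.length; i++) total += values[i];
      sink += total;
    });
    const reduceMs = timeMedianMs(() => {
      sink += values.reduce((acc, v) => acc + v, 0);
    });
    console.log(forLoopMs.toFixed(4), reduceMs.toFixed(4));
    ```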

    One would only test if they've thrown something into their project and noticed lag/FPS drops/CPU spikes. Or, if someone has a system that works fine but they yearn for more.

    I do agree there's confusion about "how" to measure, but there's info here on some preferable methods. I also think some measurements will give obvious results regardless of CPU load and CPU speed. Granular measurements are trickier, such as "I get X% CPU with 1000 objects; can I get a 5% CPU gain from another method with 1000 objects?" But that kind of test is pointless: if you get your 5% CPU boost, but in-game there are commonly only 250 objects with rare moments of 1000, then the 250-object moments may have gained less than 1% CPU.

    > But in C3 you can ruin your game's optimization with so many random small things, and it's explained nowhere. The docs are really bad at that. There are so many hidden things that are never explained, and they matter A LOT, both for how your game works and how it performs.

    With experience, it doesn't feel like "so many random small things". A lot of it is logical imo, and I'm sure most agree. E.g. putting "For Each" as the top-level condition. Admittedly, I used to do this in my projects, but I wasn't thinking logically; I just followed the rule of "don't use them much", so I thought a single top-level "For Each" was ideal, but this was wrong. IIRC it is explained in the documentation for For Each.

    But I do somewhat agree that a few random small things affect performance for an ambitious project or system; the act of just using "Evaluate expression" within a "For Each" to gain some CPU is wild. But I say "somewhat agree" because it makes logical sense: "Sprite > Compare Variable" would do picking under the hood, even though there's only 1 Sprite to pick. It's a tiny, insignificant thing that C3 does, but it could add up when designing an ambitious system with 500 enemies with sub-events/conditions below the "For Each".
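    A hypothetical model of that difference inside a per-instance loop: a picking-style condition produces a new filtered list (even for one instance), while an "Evaluate expression"-style condition just computes a boolean. The names are made up, and the list-building is an assumption about what picking involves, not a description of Construct's internals.

    ```typescript
    interface Enemy { hp: number; }

    // Picking-style condition (assumed): filter the currently picked list into a new list.
    function pickHpBelow(picked: Enemy[], limit: number): Enemy[] {
      return picked.filter((e) => e.hp < limit);
    }

    // Evaluate-style condition: a plain boolean, no new list.
    function evalHpBelow(e: Enemy, limit: number): boolean {
      return e.hp < limit;
    }

    // Inside a "for each", only one instance is picked at a time, so the picking
    // version builds extra arrays per instance to arrive at the same answer.
    function countLowHp(enemies: Enemy[], limit: number, usePicking: boolean): number {
      let count = 0;
      for (const e of enemies) {
        const ok = usePicking ? pickHpBelow([e], limit).length > 0 : evalHpBelow(e, limit);
        if (ok) count++;
      }
      return count;
    }
    ```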

    > Also, events have so much overhead just by default.

    It's known this is true, but again, it matters for ambitious things, not general projects. The great thing about this topic is finding ways to get the performance out of event blocks.

    > Without that, it's just a complete guessing game and so many wasted hours

    Well, it's not a guessing game: measure it! I have also wasted hours in the past, especially by trying to "measure" within the main project rather than in a blank new project.

    > For example, just changing the 9-patch "Edges" and "Fill" settings to "Tile" leads to a CRAZY performance cost. With 100 9-patch objects, the number of draw operations increased from 23 to... 1213. That's a 5173% increase just from changing a few settings that don't seem like they'd matter at all. Is it mentioned anywhere? Of course not.

    I think this is fair to bring up, but not an overall fair example. For one, the 9-patch setting changes are brand new, from about a month ago. If there are performance issues, report them, and if it's by design or impossible to resolve, a documentation tweak often follows.

    Second, Ashley already highlighted this so I won't repeat it, but it's about thinking through the logic behind something: a 9-patch setting change is likely going to do something to the 9 pieces of texture it contains (unlike common actions such as Set Size).

  • > If it still hits 60 FPS, it's fine.

    I think we should have higher standards than that. Most phones nowadays run at 120 fps. I know the average joe probably can't tell the difference, but I can :)
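    The arithmetic behind that point: the time available per frame is 1000 ms divided by the frame rate, so targeting 120 FPS halves the per-frame budget compared with 60 FPS.

    $$
    t_{\text{frame}} = \frac{1000\ \text{ms}}{f}, \qquad t_{60} = \frac{1000}{60} \approx 16.7\ \text{ms}, \qquad t_{120} = \frac{1000}{120} \approx 8.3\ \text{ms}
    $$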

  • > So more features, hmmm I don't know about that, depends who you ask.

    > Depends on the feature

    There are hundreds of features the community wants to see and is waiting patiently for:

    github.com/Scirra/Construct-feature-requests/issues

  • > Even taking your example with 9-patch: perhaps some settings mean it has to swap textures for each patch, versus rendering it all with the same texture. Changing texture may cause more draw operations (and may not, especially with WebGPU multitexturing). But so what? Going from 100 draw operations to 1000 might mean you are going from 0.5% utilisation to 5% utilisation of the overall system capability

    Sorry for being too aggressive. All the hours I spent on this just make me a little emotional.

    Those 1000 draw operations can really matter on low-end devices, especially mobile. I had both a low-end phone and a low-end laptop to test on, and decreasing the number of draw operations from 1500+ to ~500 is exactly what made my games hit a comfortable FPS on both of them, because of GPU bottlenecks. And again, I wasn't building some crazy ambitious game; it was just a clicker game with somewhat good visuals. Construct 3 is now mainly used to make web and mobile games, where low-end devices are basically the main target. So all of this is definitely important.

    And in terms of documentation, I understand all the complications. My only wish is that C3's internals were explained better, like how the renderer works, how it optimizes things and so on, even just in broad terms. Even if some of it becomes outdated, it's still a hint, and better than nothing at all.

    Maybe all of this is obvious to people who work with graphics, but for people like me who don't, it's not. Again, I was only able to roughly understand the basics thanks to spending hours analyzing Spector.JS reports. And yes, it made my projects much more performant.

    For me, even just looking at the draw operation count helped a lot. It has worked out as a much more reliable performance indicator than CPU usage (and especially the non-existent GPU usage). I understand it's not that simple, but it's good enough.

    It would already be an epic upgrade to the debugger to have more information about the render process, memory usage and so on.

  • > I think this is fair to bring up, but not an overall fair example. For one, the 9-patch setting changes are brand new, from about a month ago. If there are performance issues, report them, and if it's by design or impossible to resolve, a documentation tweak often follows.

    The things I mentioned have been there since the 9-patch plugin was created, and it has always worked like this. So no, it's not related to the recent updates, and it's totally fair in my OP.
