Picking performance.

  • I'm following Ashley on twitter and especially the posts regarding the new C3 runtime, and from what I've heard it seems very promising with lots of performance improvements.

    One thing that always been bothering me a bit, especially dealing with large instance count is the cost of picking. Sometimes just picking objects can be a pretty CPU intensive task if you're dealing with large layouts and and a large amount of instances, so i created this test project test different methods of picking, and how they perform.

    https://www.dropbox.com/s/96e7y0njoqyqd ... e.c3p?dl=1

    They all basically do the same thing. Picking objects within a radius and setting the opacity. and deselecting if they are outside the radius and setting the opacity back again.

    This test demonstrates that there are good and bad ways to pick objects depending one what you're doing, and how many you need to pick.

    Picking with LOS behaviour in this case was by far the fastest in most cases (I guess it's because loops in javascript are generally faster than events) So I'm just curious if the C3 runtime has any performance improvements when it comes to basic picking and loops using events.

    Just out of curiosity.... Would be fun If Ashley could run the project using C3 runtime, and compare it with C2, to see if there's any improvements when it comes to picking and loops. <img src="{SMILIES_PATH}/icon_e_smile.gif" alt=":)" title="Smile">

    Keep up the good work. Looking forward to try out the new runtime!

  • Benchmarking is hard. You're not actually measuring picking performance in an isolated way.

    Firstly the test is set up a little weirdly - I'm not sure why you have that whole thing with the 'picked' flag going on, you don't need it and it means it's possible for some instances to be processed twice in one tick, which I think means it's not a fair test. The most obvious setup is every tick, reset the state for all instances, then in an event below that, have one single test that changes the state of picked instances.

    Secondly I profiled the first case; the main bottleneck is the distanceTo calculation (which involves a square root), so that one's basically a math benchmark.

    Thirdly, for distance comparisons like "distanceTo(a, b, c, d) < 10", it's actually quite a special case performance-wise: it comes down to a comparison like sqrt(dx*dx + dy*dy) < 10. Then if you square both sides you can test dx*dx + dy*dy < 100, which eliminates a costly square root calculation. In some cases the engine does this internally, but if you write distanceTo() in an expression it won't. So then you're comparing doing a square root to not doing a square root.

    Fourthly, LOS works completely differently: it's mainly designed to use a collision-detection algorithm. It also takes advantage of collision cells to reduce the test candidates, which none of the other options do.

    Basically this is mainly a math benchmark. It's not much faster in the C3 runtime, because math doesn't suddenly get faster there. Actually, I think it's a good sign that the engine is well optimised enough that something like whether or not you include a square root makes a measurable difference.

  • Ashley

    Thanks for the reply and the detailed explanation.

    The reason i used the "picked" flag was that it could reduce the CPU time quite a bit, before running the distance comparison. If I remove it, it would check distance to instances that already have the opacity set to 20. With the picked flag I filter out those, so they don't have to be checked again. And when you're deselecting, you don't have to check every single instance in the layout. Only those that have their opacity set to 20... You can try disable the "picked" boolean in the first example and suddenly the whole distance checking operation uses almost twice as much CPU.

    So by using booleans I can easily filter out how many that needs to be distance checked.

    Using the first test with pick by comparison.

    With the boolean I'm getting around 25% CPU usage, for the distance calculation.

    Without the boolean I'm getting about 43% CPU usage , for the distance calculation.

    Because I filtered out how many candidates that needs to be checked.

    I'm trying to figure out ways to do do things as close to as effective as the LOS behaviour as possible only using events.

    If the only reason LOS is so much faster is because the Collision cells, is that something that can be accessed by events in the future? Maybe a system condition or something similar? In the test project though, "Use Collision cells" is disabled for the LOS behaviour, and didn't make any difference if I enabled it or not. So that means math in behaviours are faster than events, or loops in behaviours are faster or maybe both?

    How would one benchmark that?

    I often tend to use very few behaviours/plugins as they are not as flexible as events. So that's why I'm curious if there is any measurable difference in loops and picking between the C2 and C3 runtime, and how one would go about to benchmark that?

    Last year I payed a developer make a simple behaviour for me (As I don't know how to code) that only does one thing. Checking Bounding Box Overlap.

    function intersectRect(r1, r2) {
      return !(r2.left > r1.right || 
               r2.right < r1.left || 
               r2.top > r1.bottom ||
               r2.bottom < r1.top);
    }[/code:33yvx5ul]
    
    I had to pay someone to do that, as it was not feasible to do that with events. It was just too heavy and CPU hungry doing the same thing with events.
    So that's why I'm curious. If events can ever get up to par with behaviours, performance wise.
  • Ashley

    Here's an example c3p file with bundled addon.

    https://www.dropbox.com/s/slu77hbw2wjvg ... p.c3p?dl=1

    Events and behaviour basically do the same thing. Just checking BBox Overlap. Behaviour is soooo much faster than doing it with events. That's why I was wondering why this is the case, and if there is any improvements in the runtime that are making events faster as well?

    <img src="{SMILIES_PATH}/icon_e_smile.gif" alt=":)" title="Smile"> Just curious because I would rather not wanna hire a plugin dev every time i need to do something simple as that just for performance improvements.

    I love working with events. Just wished they were a bit faster.

  • Interesting! I was getting a 5x performance increase for behaviour vs events

  • ,

    I read about square root being CPU costly before. After your comment I decided to compare (in C2):

    When I disable either event #3 or #4 and run the project in debug mode, there is almost no difference between two methods.

    With distance expression I'm getting 21-22 fps, and with squared values about 22-23 fps. CPU load is the same, around 97-100%.

    Is the difference really that small or am I doing something wrong?

  • ,

    I read about square root being CPU costly before. After your comment I decided to compare (in C2):

    When I disable either event #3 or #4 and run the project in debug mode, there is almost no difference between two methods.

    With distance expression I'm getting 21-22 fps, and with squared values about 22-23 fps. CPU load is the same, around 97-100%.

    Is the difference really that small or am I doing something wrong?

    I'm pretty sure math is not the problem. I think loops in behaviours and such have a lot less overhead, that's why they are performing better. Every time you pick something you have to loop through every single instance, that's why I always start with booleans. Only 1 bit of information.

    It would be interesting if someone could benchmark "is on screen" and a behavior doing the same thing. If i knew how to write plugins i would do it. There's no maths. Just checking XY positions of every instance in layout.

    Under the hood I'm not exactly sure how the Events work, I just know that they are less efficient than behaviours doing the same thing, and that's especially noticeable when you have to do it on many instances. It can be a bit problematic to optimize math. :p I doubt 5+5+5+5 is faster than 5*4 for example...

    They only thing I can control is reducing the amount of instances to "check", or do maths on, but in some cases there's no way around it. You have to check all, and that's when behaviours outperform events a lot.

  • The ^ operator calls Math.pow() which might be as slow as Math.sqrt(), I'm not sure. For best performance I just use value*value.

    If you profile a snippet of JavaScript vs. doing that in events, then yeah, events have a performance overhead. As with any higher level abstraction it tends to come with a performance cost, just like how Python tends to be slower than C. That's pretty much unavoidable, it's just how these things work. Most of the time though, it doesn't matter. It's only if you're doing something thousands of times every tick that you'll be able to measure a difference, and even then, will it actually affect a real-world game? Probably not - so I'd guess that writing a behavior to make a bunch of specific tests is not necessary. Also most of the usual suspects for performance are already highly optimised as internal code, such as collision detection and pathfinding.

    Exposing collision cells to events is an interesting idea - I think we could do something like "pick instances in rectangle area" without having to iterate all instances.

  • I tried replacing ^ with value*value, didn't notice any change in fps.

    And when I'm using actual sprite coordinates instead of random values, surprisingly distance() expression is faster!

    distance(Sprite.X, Sprite.Y, Sprite2.X, Sprite2.Y)<500

    is about 3-4 FPS faster than

    (Sprite.X-Sprite2.X)*(Sprite.X-Sprite2.X)+(Sprite.y-Sprite2.Y)*(Sprite.y-Sprite2.Y)<250000

    Anyway, I will stop worrying about using distance() in my games

  • I think we could do something like "pick instances in rectangle area" without having to iterate all instances.

    Rather than use a dummy object?

    Why not go one more and do a run time definable dummy polygon?

    Whats that, make it be like instances with indices so you can make multiples rather than use dummies all together?

    Yeah, sure go ahead. Sounds like a lot of work, but since you insist.

  • newt

    Stop trolling please, it's a bit unnecessary for you to make such sarcastic remarks towards improving c3

  • How would you reference a collision cell?

  • Try Construct 3

    Develop games in your browser. Powerful, performant & highly capable.

    Try Now Construct 3 users don't see these ads
  • Ashley not a bad idea. BUT it would also be cool if there were more behaviours like LOS, and the one I made check bounding box overlap, "Is on screen" and other commonly used conditions to reduce candidates when picking. since behaviours are that more faster it would be good if the behaviour library grew a bit with more useful stuff that you don't really wanna do with high level events.

  • Or maybe... how about a math behaviour or plugin, or something that's more flexible? Where you can pass multiple expressions to do faster per instance checks, than doing it with events?

    Events are great but a bit slow when you want to do big batches and check a LOT of instances. So quick tools to narrow down selection is good. That's why I pretty much always use LOS behaviour for my distance checks, At least the ones that have to run every tick on large amount of instances. Once the LOS behaviour has narrowed down my selection to the few objects that I need, I then run regular events on those.

    Shortlisting candidates needs to be quick. (This is where behaviours are very useful) So you don't need to run slow events on all the instances.

  • In the BBox example I linked earlier.

    Doing something like this will make the same operation A LOT faster, as the behaviour LOS can shortlist just a few nearby instances very efficiently THEN check for BBox overlap.. It's just as fast as my own behaviour that checks BBOX. Without the LOS running before the BBox event check CPU usage is about 40%. With the LOS behaviour shortlisting first (with a range set to 70px) the CPU usage is about 5% for the same BBox event checks. That really shows the power and importance of efficient shortlisting.

    Here's the link again to the BBOX test project updated with the LOS condition before the BBox events.

    https://www.dropbox.com/s/slu77hbw2wjvg ... p.c3p?dl=1

Jump to:
Active Users
There are 1 visitors browsing this topic (0 users and 1 guests)