Performance r130 vs last beta

  • r130 - r153

    101445 - 65900 (about 1.5 times difference)

    36865 - 24475

    635 - 531

    Why is there such a performance regression?

    I now understand why the game I converted from C2 to C3 ran even a little worse than it did on C2.

  • This ~40% regression in the quadissueperf example seems to come from r152.

  • True. And I just used r152-153 for my game.

    Ashley, what do you think?

  • I can reproduce a performance difference in quadissueperf:

    r148: 205k

    r149: 205k

    r150: 172k

    This points to a change in r150, not r152. I filed an issue to investigate.

    I cannot explain any performance difference in fillrateperf at all. That test is completely different and is solely bottlenecked on the GPU hardware's memory bandwidth. So it shouldn't change much no matter what we do to the engine. Somehow I could measure an improvement in r150 with fillrateperf:

    r149: 2480

    r150: 2792

    This still doesn't make any sense, so my best guess is that test is actually fairly high variance and so the results should not be considered accurate. Maybe it depends on the hardware temperature and power state or something.

  • It's r152. Example: quadissueperf

    Earlier I ran it 5 times:

    r151 72-84k

    r152 47-53k

    Now I ran it again:

    r149 - 85k

    r150 - 78k

    r151 - 81k

    r152 - 52k, 51k, 49k
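
Since run-to-run variance was raised earlier in the thread, one quick sanity check is whether the score ranges from repeated runs overlap at all. The sketch below is only an illustration using the r151 and r152 ranges quoted above; the helper name `ranges_overlap` is made up for this example and is not part of any tool.

```python
def ranges_overlap(a, b):
    """Return True if the closed ranges a=(lo, hi) and b=(lo, hi) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


# quadissueperf ranges over 5 runs, as reported in the post above.
r151 = (72_000, 84_000)
r152 = (47_000, 53_000)

# If the ranges do not overlap, the gap is unlikely to be run-to-run noise alone.
print(ranges_overlap(r151, r152))  # False
```

Non-overlapping ranges do not prove the cause, but they suggest the drop between r151 and r152 on this machine is larger than the spread between runs.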

  • On my old dual core laptop... quadiss

    r153 - 46455

    r152 - 47695

    r151 - 67627

    r150 - 65437

    r149 - 70155

    r148 - 73726

    ...

    r130 - 57343

    I also tested the fill rate and it was ~90 regardless of version.

  • Just testing bbox on an old Xeon indicates two possible points of regression:

    a small hit at 150, then a bigger hit at 152.

    130 = 64529

    147 = 65289

    149 = 65537

    150 = 55340

    151 = 57345

    152 = 37358

    153 = 38661
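
To make the two regression points easier to see, the bbox scores above can be turned into percentage changes between consecutive tested releases. This is just a small sketch over the numbers in the post; the function name `percent_changes` is illustrative.

```python
# bbox scores copied from the post above (old Xeon), keyed by release number.
scores = {
    130: 64529,
    147: 65289,
    149: 65537,
    150: 55340,
    151: 57345,
    152: 37358,
    153: 38661,
}


def percent_changes(scores):
    """Return {release: % change versus the previously tested release}."""
    releases = sorted(scores)
    return {
        cur: 100.0 * (scores[cur] - scores[prev]) / scores[prev]
        for prev, cur in zip(releases, releases[1:])
    }


changes = percent_changes(scores)
for release, change in changes.items():
    print(f"r{release}: {change:+.1f}%")

# The two releases with the largest drops, worst first.
worst_two = sorted(changes, key=changes.get)[:2]
print(worst_two)  # [152, 150]
```

On these numbers the drop at r150 is roughly 16% and the drop at r152 is roughly 35%, which matches the "small hit, then bigger hit" description.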

  • So I looked into this, and it turned out to be a really interesting/difficult issue. Part of the problem is the engine is so well optimised! Benchmarks like quadissueperf are so efficient that they are not bottlenecked on executing instructions - instead they are limited only by memory bandwidth. Therefore they cannot be optimised - or made slower - by changing the code: only the memory layout will affect it.

    In r150 we simply added a small new class for the instance's script interface for the scripting feature. Despite being small it used a bit more memory per instance, therefore requiring more memory bandwidth, therefore reducing the benchmark score. In r152 we made the class a little bit bigger, which made the problem worse. We've already seen the same thing happen in the past if you add a behavior or family to the object in the performance test - it takes a bit more memory to store that family or behavior, and the benchmark score drops a little as a result. It's really hard to avoid this.

    The good news is there are things we can do: mainly just making sure features that are not used do not allocate memory. Simply lazy-loading the script interface seems to solve this, since it means it doesn't use any memory at all unless you actually access it from script. We can also go further and make sure things like instance variables, behaviors and effects do not allocate memory unless they are actually used.

    So this should now be resolved in the next release - and hopefully we can go a little further in the next release or two after that, as there are more internal details that can be tweaked the same way.
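
The lazy-loading idea described above can be sketched roughly as follows. This is not the engine's actual code (which is not shown in this thread); the class and attribute names `Instance`, `ScriptInterface` and `script_interface` are hypothetical, and Python is used only for illustration.

```python
class ScriptInterface:
    """Hypothetical stand-in for the per-instance script interface."""

    def __init__(self, instance):
        self.instance = instance


class Instance:
    """Hypothetical engine instance; names are illustrative only."""

    __slots__ = ("x", "y", "_script_interface")

    def __init__(self, x, y):
        self.x = x
        self.y = y
        # No interface object is allocated until it is first requested.
        self._script_interface = None

    @property
    def script_interface(self):
        # Create the interface on first access, then reuse the same object.
        if self._script_interface is None:
            self._script_interface = ScriptInterface(self)
        return self._script_interface


instances = [Instance(i, i) for i in range(1000)]

# Only one instance is ever touched from "script", so only one interface exists.
instances[0].script_interface
allocated = sum(1 for inst in instances if inst._script_interface is not None)
print(allocated)  # 1
```

In this sketch each instance still carries one empty reference slot, but the interface object itself is only created for instances that actually use it, which is the memory-saving effect the post describes.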

  • Excellent! And when you're all done could you please box up your CPU and ship it to Iowa? My 8 core 4ghz machine with 32 gigs of ram still only managed 129k in r149. I want your toys.

  • > Excellent! And when you're all done could you please box up your CPU and ship it to Iowa? My 8 core 4ghz machine with 32 gigs of ram still only managed 129k in r149. I want your toys.

    Heh, my office desktop has an i7-6700K 4 GHz with RAM running at 3 GHz, and I can get 344k with the latest release 😉 (that's on quadissueperf)

  • > So this should now be resolved in the next release - and hopefully we can go a little further in the next release or two after that, as there are more internal details that can be tweaked the same way.

    Thanks

  • > Excellent! And when you're all done could you please box up your CPU and ship it to Iowa? My 8 core 4ghz machine with 32 gigs of ram still only managed 129k in r149. I want your toys.

    > Heh, my office desktop has an i7-6700K 4 GHz with RAM running at 3 GHz, and I can get 344k with the latest release 😉 (that's on quadissueperf)

    :P

  • Now

    110k for quad

    40k for bbox

    Nice

  • On the laptop

    153 = 38661

    154 = 49828

  • Dooood!

    Sticking with bbox

    149 = 65537

    153 = 38661

    154 = 72477 (and that was with a load of other stuff running in the background)

    So performance is even better than before!

    Free beer for q3olegka for spotting and raising it. I must admit that I thought things felt a bit sluggish recently, but I figured it was me overdoing it in my project.

    and props to Ashley for knowing where to hit it with that hammer...
