Branching inside a loop and performance

Get answers to all your basic programming questions. No Ogre questions, please!
User avatar
Mikachu
Gnoll
Posts: 603
Joined: Thu Jul 28, 2005 4:11 pm
Location: Nice, France
x 35

Branching inside a loop and performance

Post by Mikachu »

Hi,

I know there has been some serious discussions recently about Ogre 2.0 and how having conditions in critical loops is very bad for performance because of branch misprediction.

My question is : does a condition that always produce the same result in the loop incur a performance penalty?

For example :
is this code

Code: Select all

void doTheLoop(bool condition)
{
  for (int i=0;i<1000000;++i)
  {
     // condition is never modified inside the loop
     if (condition)
      doSomething(i);
    else doSomethingElse(i);
  }
}
much slower than this one ?

Code: Select all

void doTheLoop(bool condition)
{
if (condition)
{
  for (int i=0;i<1000000;++i)
      doSomething(i);
}
else
{
  for (int i=0;i<1000000;++i)
    doSomethingElse(i);
}
}
Same question with this code :

Code: Select all

void doTheLoop(bool condition)
{
  for (int i=0;i<1000000;++i)
  {
    // doSomethingOrSomethingElseDependingOnCondition is an inline function
      doSomethingOrSomethingElseDependingOnCondition(i, condition);
  }
}
OgreProcedural - Procedural Geometry for Ogre3D
User avatar
Zonder
Ogre Magi
Posts: 1172
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 76

Re: Branching inside a loop and performance

Post by Zonder »

I would write the code as in the second sample anyway to remove the comparison as that's just how I write code. As I try to avoid redundant code in a loop.

But I think you won't get a hit, if it always evaluates the same I am sure I read that in the other threads.
There are 10 types of people in the world: Those who understand binary, and those who don't...
CABAListic
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 2903
Joined: Thu Jan 18, 2007 2:48 pm
x 58

Re: Branching inside a loop and performance

Post by CABAListic »

The question is whether the compiler can actually tell that the condition is never modified within the loop. And unless the condition involves only constant expressions, it probably can't - even if the compiler could be certain that the variables involved in the condition are not changed within the loop directly, there is still the theoretical possibility of another thread changing it. So in my opinion, for the sake of correctness the compiler can't make any assumptions and will therefore probably not be able to optimise the inner condition (in most cases).

So, if the condition is not formed of constants and you really do worry about the performance, then move the condition outside the loop.
User avatar
Zonder
Ogre Magi
Posts: 1172
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 76

Re: Branching inside a loop and performance

Post by Zonder »

CABAListic wrote:The question is whether the compiler can actually tell that the condition is never modified within the loop. And unless the condition involves only constant expressions, it probably can't - even if the compiler could be certain that the variables involved in the condition are not changed within the loop directly, there is still the theoretical possibility of another thread changing it. So in my opinion, for the sake of correctness the compiler can't make any assumptions and will therefore probably not be able to optimise the inner condition (in most cases).

So, if the condition is not formed of constants and you really do worry about the performance, then move the condition outside the loop.
I though he was more referring more to when the condition is compiled into the loop and the processor branch prediction kicks in instead.
There are 10 types of people in the world: Those who understand binary, and those who don't...
CABAListic
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 2903
Joined: Thu Jan 18, 2007 2:48 pm
x 58

Re: Branching inside a loop and performance

Post by CABAListic »

Oh, you're right. I guess I missed the point, kind of :)
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: Branching inside a loop and performance

Post by Klaim »

Actually, it's a processor problem: ARM processors are known to have no prediction unit so you have to make sure the most probable branch is the if(true) one.
On most x86 chips branch prediction is very advanced so what you need to do is to make sure variations of branch choice between each cycle are not happening often. If they are, and you have a lot of cycles, then sorting might help.
You might combine the two mindset to benefit from no-prediction cpus.

Also note that ifs are not the worse that can happen in such kind of loop. I was worried about this kind of stuff long time ago but now I see that you have to focus on rewriting this kind of loop only if it's really a bottleneck and the code already runs fine, otherwise there is too variables to understand to make the right decisions.

For example, such kind of loop can be modified by the compiler, but also by the processor. So frankly, it's very very hard to know what branch organization is best for any case.
User avatar
Mikachu
Gnoll
Posts: 603
Joined: Thu Jul 28, 2005 4:11 pm
Location: Nice, France
x 35

Re: Branching inside a loop and performance

Post by Mikachu »

I did a small test locally with (I) and (II) and ran it a few times : sometimes (II) was 5% or 10% faster, sometimes both ran at the same speed, so it's hard to reach a clear conclusion with this test alone (besides that, at most, it doesn't make a big difference on a x86).
OgreProcedural - Procedural Geometry for Ogre3D
User avatar
tod
Troll
Posts: 1394
Joined: Wed Aug 02, 2006 9:41 am
Location: Bucharest
x 94

Re: Branching inside a loop and performance

Post by tod »

Mikachu wrote:I did a small test locally with (I) and (II) and ran it a few times : sometimes (II) was 5% or 10% faster, sometimes both ran at the same speed, so it's hard to reach a clear conclusion with this test alone (besides that, at most, it doesn't make a big difference on a x86).
It's hard to determine the difference in performance this way, for such a simple example. What you should do is simply put a breakpoint at the start of the function and look at the assembly. It may be the same.
Valentin Perrelle
Halfling
Posts: 54
Joined: Thu Sep 15, 2011 4:14 pm
x 2

Re: Branching inside a loop and performance

Post by Valentin Perrelle »

As it have been already answered, there is two questions :
  1. Does the compiler know how to optimize that. Most modern compilers knows how to extract invariant expressions from a loop, without any problem when these are expressions of local variables. Thus the condition may be computed once, stored in a register, and the generated code for the loops would only have one compare and one jump instruction, which are not in a critical path and are very unlikely to slow down your loop. (don't profile an empty loop, as a real loop would have code that a pipelined processor will execute in parallel) As for the extraction of the test to put it outside the loop, it should be possible for a compiler to do it, but I don't know if some production compiler implement this transformation.
  2. Does a test in a loop imply an execution time penalty. We are a talking about missprediction, which cost much more than the few cycles needed to actually execute the code related to the test. Because misspredictions cost a lot, there are complex prediction mechanisms in x86 processors. Those are capable of detecting regular patterns. For instance, in the following code:

    Code: Select all

    for (int i = 0 ; i < 10 ; i ++) {
      if (i % 2) {
        // Do something
      }
      else {
        // Do something else
      }
    }
    (which can be rewritten to avoid the test) the test is true every two iterations. This yields the pattern "FALSE TRUE FALSE TRUE FALSE ..." which (even 10 years old) x86 processor can recognize. Thus, he will be able to predict which branch to execute, thus, for the execution of this code, he may only misspredict the two first times. Of course, if the condition of a conditional structure is constant, he will predict so and avoid any missprediction, except maybe for the first prediction.
It's very rare that one need to optimize at this level. When it is the case though, you'll probably only focus on two things:
  • Trying to avoid unpredictable jumps/calls. (This includes virtual calls) I'm talking about unpredictable jumps, not predictable ones like in your example. There are a lot of techniques to do that.
  • Trying to avoid cache misses. Here again, this is a whole subject.
These are the two main problems. There are a lot of work to deal with this problem automatically, some of them have been implemented in production compilers. However, it often requires a higher level reorganization of the code, changes in data structures and even in code architecture, and can't simply be done by the compiler. (Especially when this require inter-module transformations) That's why it may be important to identify some potential bottlenecks early. In your example, though, you know that if you needed to switch from solution 1 to solution 2, only a little amount of work would be needed, and thus there is no need to optimize before the profiling of your application says that this loop is a bottleneck.
User avatar
syedhs
Silver Sponsor
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Re: Branching inside a loop and performance

Post by syedhs »

Simple solution: Just refactor the code so that for block is inside the if-then block because by then, you ensure that the code runs as fast as possible in all conditions (CPU, platform,compiler).
A willow deeply scarred, somebody's broken heart
And a washed-out dream
They follow the pattern of the wind, ya' see
Cause they got no place to be
That's why I'm starting with me
Valentin Perrelle
Halfling
Posts: 54
Joined: Thu Sep 15, 2011 4:14 pm
x 2

Re: Branching inside a loop and performance

Post by Valentin Perrelle »

syedhs wrote:Simple solution: Just refactor the code so that for block is inside the if-then block because by then, you ensure that the code runs as fast as possible in all conditions (CPU, platform,compiler).
When the code is more complicated than the example, switching to the second solution may involve code duplication, which is never a good thing.
User avatar
Mikachu
Gnoll
Posts: 603
Joined: Thu Jul 28, 2005 4:11 pm
Location: Nice, France
x 35

Re: Branching inside a loop and performance

Post by Mikachu »

Valentin Perrelle wrote:
syedhs wrote:Simple solution: Just refactor the code so that for block is inside the if-then block because by then, you ensure that the code runs as fast as possible in all conditions (CPU, platform,compiler).
When the code is more complicated than the example, switching to the second solution may involve code duplication, which is never a good thing.
You're right, I oversimplified the example, there could be several conditions, in which case doing the if-then outside the for-loop leads to combinatorial explosion.
OgreProcedural - Procedural Geometry for Ogre3D