Surprisingly different performance of simple C# program

Below is a simple program that with a small change, makes a significant performance impact and I don't understand why.

What the program does is not really relevant, but it calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall. What I noticed as I was changing the code around was a quite large variance in performance.

The rows in question are the commented ones which are mathematically equivalent. Using the slow version makes the entire program take roughly twice as long as using the fast version.

int iterations = 0;



for (int i = 4; i < 9; i++)

{

    Stopwatch s = Stopwatch.StartNew();



    double ms = 1.0;

    double mL = Math.Pow(100.0, i);

    double uL = 1.0;

    double us = 0.0;

    double msmLInv = 1d / (ms + mL);



    long collisions = 0;

    while (!(uL < 0 && us <= 0 && uL <= us))

    {

        Debug.Assert(++iterations > 0);

        ++collisions;



        double vs = (2 * mL * uL + us * (ms - mL)) * msmLInv;



        //double vL = (2 * ms * us - uL * (ms - mL)) * msmLInv; //fast

        double vL = uL + (us - vs) / mL; //slow





        Debug.Assert(Math.Abs(((2 * ms * us - uL * (ms - mL)) * msmLInv) - (uL + (us - vs) / mL)) < 0.001d); //checks equality between fast and slow

        if (vs > 0)

        {

            ++collisions;

            vs = -vs;

        }



        us = vs;

        uL = vL;

    }



    s.Stop();





    Debug.Assert(collisions.ToString() == "314159265359".Substring(0, i + 1)); //check the correctness

    Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

}



Debug.Assert(iterations == 174531180); //check that we dont skip loops



Console.Write("Waiting...");

Console.ReadKey();

My intuition says that because the fast version has 7 operations compared to 4 operations of the slow one, the slow one should be faster, but it is not.

I disassembled the program using .NET Reflector which shows that they are mostly equal, as expected, except for the part shown below. The code before and after an identical

//slow

ldloc.s uL

ldloc.2 

ldloc.s us

ldloc.s vs

sub 

mul 

ldloc.3 

div 

add

//fast

ldc.r8 2

ldloc.2 

mul 

ldloc.s us

mul 

ldloc.s uL

ldloc.2 

ldloc.3 

sub 

mul 

sub 

ldloc.2 

ldloc.3 

add 

div

This also shows that more code is executing with the fast version which also would lead me to expect it to be slower.

The only guess I have right now is that the slow version causes more cache misses, but I don't know how to measure that (a guide would be welcome). Other than that I am at a loss.

EDIT 1.
As per the request of @EricLippert here is the disassembly from the JIT for the inner while loop where the difference is.

EDIT 2.
Solved how to break in the release program and updated the disassembly so now there seems to be some difference. I got these results by running the release version, stopping the program in the same function with a ReadKey, attaching the debugger, making the program continue execution, breaking on the next row, going into disassembly window (ctrl+alt+d)

EDIT 3.
Change the code to an updated example base on all the suggestions.

//slow

    78: 

    79:                     vs = (2 * mL * uL + us * (ms - mL)) / (ms + mL);

00C10530  call        CA9AD013  

00C10535  fdiv        st,st(3)  

00C10537  faddp       st(2),st  

    80: 

    81:                     //double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

    82:                     double vL = uL + ms * (us - vs) / mL; //slow

00C10539  fldz  

00C1053B  fcomip      st,st(1)  

00C1053D  jp          00C10549  

00C1053F  jae         00C10549  

00C10541  add         ebx,1  

00C10544  adc         edi,0  

00C10547  fchs  

00C10549  fld         st(1)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

00C1054B  fldz  

00C1054D  fcomip      st,st(3)  

00C1054F  fstp        st(2)  

00C10551  jp          00C10508  

00C10553  jbe         00C10508  

00C10555  fldz  

00C10557  fcomip      st,st(1)  

00C10559  jp          00C10508  

00C1055B  jb          00C10508  

00C1055D  fxch        st(1)  

00C1055F  fcomi       st,st(1)  

00C10561  jnp         00C10567  

00C10563  fxch        st(1)  

00C10565  jmp         00C10508  

00C10567  jbe         00C1056D  

00C10569  fxch        st(1)  

00C1056B  jmp         00C10508  

00C1056D  fstp        st(1)  

00C1056F  fstp        st(0)  

00C10571  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

00C10573  mov         ecx,esi  

00C10575  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

00C1057A  mov         ecx,725B0994h  

00C1057F  call        00B930C8  

00C10584  mov         edx,eax  

00C10586  mov         eax,dword ptr [ebp-14h]  

00C10589  mov         dword ptr [edx+4],eax  

00C1058C  mov         dword ptr [ebp-34h],edx  

00C1058F  mov         ecx,725F3778h  

00C10594  call        00B930C8  

00C10599  mov         dword ptr [ebp-38h],eax  

00C1059C  mov         ecx,725F2C10h  

00C105A1  call        00B930C8  

00C105A6  mov         dword ptr [ebp-3Ch],eax  

00C105A9  mov         ecx,esi  

00C105AB  call        71835820  

00C105B0  push        edx  

00C105B1  push        eax  

00C105B2  push        0  

00C105B4  push        2710h  

00C105B9  call        736071A0  

00C105BE  mov         dword ptr [ebp-48h],eax  

00C105C1  mov         dword ptr [ebp-44h],edx  

00C105C4  fild        qword ptr [ebp-48h]  

00C105C7  fstp        dword ptr [ebp-40h]  

00C105CA  fld         dword ptr [ebp-40h]  

00C105CD  fdiv        dword ptr ds:[0C10678h]  

00C105D3  mov         eax,dword ptr [ebp-38h]  

00C105D6  fstp        dword ptr [eax+4]  

00C105D9  mov         edx,dword ptr [ebp-38h]  

00C105DC  mov         eax,dword ptr [ebp-3Ch]  

00C105DF  mov         dword ptr [eax+4],ebx  

00C105E2  mov         dword ptr [eax+8],edi  

00C105E5  mov         esi,dword ptr [ebp-3Ch]  

00C105E8  lea         edi,[ebp-30h]  

00C105EB  xorps       xmm0,xmm0  

00C105EE  movq        mmword ptr [edi],xmm0  

00C105F2  movq        mmword ptr [edi+8],xmm0  

00C105F7  push        edx  

00C105F8  push        esi  

00C105F9  lea         ecx,[ebp-30h]  

00C105FC  mov         edx,dword ptr [ebp-34h]  

00C105FF  call        724A2ED4  

00C10604  lea         eax,[ebp-30h]  

00C10607  push        dword ptr [eax+0Ch]  

00C1060A  push        dword ptr [eax+8]  

00C1060D  push        dword ptr [eax+4]  

00C10610  push        dword ptr [eax]  

00C10612  mov         edx,dword ptr ds:[3832310h]  

00C10618  xor         ecx,ecx  

00C1061A  call        72497A00  

00C1061F  mov         ecx,eax  

00C10621  call        72571934  

    61:             for (int i = 4; i < 9; i++)

00C10626  inc         dword ptr [ebp-14h]  

00C10629  cmp         dword ptr [ebp-14h],9  

00C1062D  jl          00C10496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

00C10633  mov         ecx,dword ptr [ebp-10h]  

00C10636  call        72C583FC  

   100:             Console.Write("Waiting...");

00C1063B  mov         ecx,dword ptr ds:[3832314h]  

00C10641  call        724C67F0  

00C10646  lea         ecx,[ebp-20h]  

00C10649  xor         edx,edx  

00C1064B  call        72C57984  

00C10650  lea         esp,[ebp-0Ch]  

00C10653  pop         ebx  

00C10654  pop         esi  

00C10655  pop         edi  

00C10656  pop         ebp  

00C10657  ret

//fast

 80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0550  or          al,83h  

    80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0552  ret  

02FD0553  add         dword ptr [ebx-3626FF29h],eax  

02FD0559  fchs  

02FD055B  fxch        st(1)  

02FD055D  fld         st(0)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

02FD055F  fldz  

02FD0561  fcomip      st,st(2)  

02FD0563  fstp        st(1)  

02FD0565  jnp         02FD056B  

02FD0567  fxch        st(1)  

02FD0569  jmp         02FD050B  

02FD056B  ja          02FD0571  

02FD056D  fxch        st(1)  

02FD056F  jmp         02FD050B  

02FD0571  fldz  

02FD0573  fcomip      st,st(2)  

02FD0575  jnp         02FD057B  

02FD0577  fxch        st(1)  

02FD0579  jmp         02FD050B  

02FD057B  jae         02FD0581  

02FD057D  fxch        st(1)  

02FD057F  jmp         02FD050B  

02FD0581  fcomi       st,st(1)  

02FD0583  jnp         02FD0589  

02FD0585  fxch        st(1)  

02FD0587  jmp         02FD050B  

02FD0589  jbe         02FD0592  

02FD058B  fxch        st(1)  

02FD058D  jmp         02FD050B  

02FD0592  fstp        st(1)  

02FD0594  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

02FD0596  mov         ecx,esi  

02FD0598  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

02FD059D  mov         ecx,725B0994h  

02FD05A2  call        013830C8  

02FD05A7  mov         edx,eax  

02FD05A9  mov         eax,dword ptr [ebp-14h]  

02FD05AC  mov         dword ptr [edx+4],eax  

02FD05AF  mov         dword ptr [ebp-3Ch],edx  

02FD05B2  mov         ecx,725F3778h  

02FD05B7  call        013830C8  

02FD05BC  mov         dword ptr [ebp-40h],eax  

02FD05BF  mov         ecx,725F2C10h  

02FD05C4  call        013830C8  

02FD05C9  mov         dword ptr [ebp-44h],eax  

02FD05CC  mov         ecx,esi  

02FD05CE  call        71835820  

02FD05D3  push        edx  

02FD05D4  push        eax  

02FD05D5  push        0  

02FD05D7  push        2710h  

02FD05DC  call        736071A0  

02FD05E1  mov         dword ptr [ebp-50h],eax  

02FD05E4  mov         dword ptr [ebp-4Ch],edx  

02FD05E7  fild        qword ptr [ebp-50h]  

02FD05EA  fstp        dword ptr [ebp-48h]  

02FD05ED  fld         dword ptr [ebp-48h]  

02FD05F0  fdiv        dword ptr ds:[2FD06A8h]  

02FD05F6  mov         eax,dword ptr [ebp-40h]  

02FD05F9  fstp        dword ptr [eax+4]  

02FD05FC  mov         edx,dword ptr [ebp-40h]  

02FD05FF  mov         eax,dword ptr [ebp-44h]  

02FD0602  mov         dword ptr [eax+4],ebx  

02FD0605  mov         dword ptr [eax+8],edi  

02FD0608  mov         esi,dword ptr [ebp-44h]  

02FD060B  lea         edi,[ebp-38h]  

02FD060E  xorps       xmm0,xmm0  

02FD0611  movq        mmword ptr [edi],xmm0  

02FD0615  movq        mmword ptr [edi+8],xmm0  

02FD061A  push        edx  

02FD061B  push        esi  

02FD061C  lea         ecx,[ebp-38h]  

02FD061F  mov         edx,dword ptr [ebp-3Ch]  

02FD0622  call        724A2ED4  

02FD0627  lea         eax,[ebp-38h]  

02FD062A  push        dword ptr [eax+0Ch]  

02FD062D  push        dword ptr [eax+8]  

02FD0630  push        dword ptr [eax+4]  

02FD0633  push        dword ptr [eax]  

02FD0635  mov         edx,dword ptr ds:[4142310h]  

02FD063B  xor         ecx,ecx  

02FD063D  call        72497A00  

02FD0642  mov         ecx,eax  

02FD0644  call        72571934  

    61:             for (int i = 4; i < 9; i++)

02FD0649  inc         dword ptr [ebp-14h]  

02FD064C  cmp         dword ptr [ebp-14h],9  

02FD0650  jl          02FD0496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

02FD0656  mov         ecx,dword ptr [ebp-10h]  

02FD0659  call        72C583FC  

   100:             Console.Write("Waiting...");

02FD065E  mov         ecx,dword ptr ds:[4142314h]  

02FD0664  call        724C67F0  

02FD0669  lea         ecx,[ebp-28h]  

02FD066C  xor         edx,edx  

02FD066E  call        72C57984  

02FD0673  lea         esp,[ebp-0Ch]  

02FD0676  pop         ebx  

02FD0677  pop         esi  

02FD0678  pop         edi  

02FD0679  pop         ebp  

02FD067A  ret

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

3

I didn't check this in detail, but it all looks somewhat complicated, also the condition in the loop header. Are you entirely sure that there isn't a minor logical error causing the inner loop to run a different amount of times? You could insert a dumb counter variable in the inner loop an count the number of runs before/after just to be safe.

– DanDan
Jan 14 at 20:36

1

calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall - youtube.com/watch?v=HEfHFsfGXjs?

– GSerg
Jan 14 at 20:41

3

This is a very odd result! I cannot see any way that it can be cache misses. A better guess would be register scheduling in the FP registers. Are you doing your measurements in a release build and not in the debugger? Measuring performance in the debugger can cause register scheduling to be deoptimized.

– Eric Lippert
Jan 14 at 21:06

1

My best guess is register scheduling by the jitter then; it would be interesting to look at the generated code for both versions and see how they differ.

– Eric Lippert
Jan 14 at 21:19

2

It is a little tricky to get the generated code. Usually what I do in these situations is start the program outside of the debugger, but put a "readline" or some such thing that makes it pause just as it is about to finish; it has to be after all the code has jitted. Then while the program is paused, attach a debugger and force execution to step back into the method, and then look at the disassembly in the debugger. That way you definitely see the method as it was jitted without the debugger attached.

– Eric Lippert
Jan 14 at 21:25

|
show 5 more comments

Below is a simple program that with a small change, makes a significant performance impact and I don't understand why.

The rows in question are the commented ones which are mathematically equivalent. Using the slow version makes the entire program take roughly twice as long as using the fast version.

int iterations = 0;



for (int i = 4; i < 9; i++)

{

    Stopwatch s = Stopwatch.StartNew();



    double ms = 1.0;

    double mL = Math.Pow(100.0, i);

    double uL = 1.0;

    double us = 0.0;

    double msmLInv = 1d / (ms + mL);



    long collisions = 0;

    while (!(uL < 0 && us <= 0 && uL <= us))

    {

        Debug.Assert(++iterations > 0);

        ++collisions;



        double vs = (2 * mL * uL + us * (ms - mL)) * msmLInv;



        //double vL = (2 * ms * us - uL * (ms - mL)) * msmLInv; //fast

        double vL = uL + (us - vs) / mL; //slow





        Debug.Assert(Math.Abs(((2 * ms * us - uL * (ms - mL)) * msmLInv) - (uL + (us - vs) / mL)) < 0.001d); //checks equality between fast and slow

        if (vs > 0)

        {

            ++collisions;

            vs = -vs;

        }



        us = vs;

        uL = vL;

    }



    s.Stop();





    Debug.Assert(collisions.ToString() == "314159265359".Substring(0, i + 1)); //check the correctness

    Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

}



Debug.Assert(iterations == 174531180); //check that we dont skip loops



Console.Write("Waiting...");

Console.ReadKey();

My intuition says that because the fast version has 7 operations compared to 4 operations of the slow one, the slow one should be faster, but it is not.

I disassembled the program using .NET Reflector which shows that they are mostly equal, as expected, except for the part shown below. The code before and after an identical

//slow

ldloc.s uL

ldloc.2 

ldloc.s us

ldloc.s vs

sub 

mul 

ldloc.3 

div 

add

//fast

ldc.r8 2

ldloc.2 

mul 

ldloc.s us

mul 

ldloc.s uL

ldloc.2 

ldloc.3 

sub 

mul 

sub 

ldloc.2 

ldloc.3 

add 

div

This also shows that more code is executing with the fast version which also would lead me to expect it to be slower.

The only guess I have right now is that the slow version causes more cache misses, but I don't know how to measure that (a guide would be welcome). Other than that I am at a loss.

EDIT 1.
As per the request of @EricLippert here is the disassembly from the JIT for the inner while loop where the difference is.

EDIT 3.
Change the code to an updated example base on all the suggestions.

//slow

    78: 

    79:                     vs = (2 * mL * uL + us * (ms - mL)) / (ms + mL);

00C10530  call        CA9AD013  

00C10535  fdiv        st,st(3)  

00C10537  faddp       st(2),st  

    80: 

    81:                     //double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

    82:                     double vL = uL + ms * (us - vs) / mL; //slow

00C10539  fldz  

00C1053B  fcomip      st,st(1)  

00C1053D  jp          00C10549  

00C1053F  jae         00C10549  

00C10541  add         ebx,1  

00C10544  adc         edi,0  

00C10547  fchs  

00C10549  fld         st(1)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

00C1054B  fldz  

00C1054D  fcomip      st,st(3)  

00C1054F  fstp        st(2)  

00C10551  jp          00C10508  

00C10553  jbe         00C10508  

00C10555  fldz  

00C10557  fcomip      st,st(1)  

00C10559  jp          00C10508  

00C1055B  jb          00C10508  

00C1055D  fxch        st(1)  

00C1055F  fcomi       st,st(1)  

00C10561  jnp         00C10567  

00C10563  fxch        st(1)  

00C10565  jmp         00C10508  

00C10567  jbe         00C1056D  

00C10569  fxch        st(1)  

00C1056B  jmp         00C10508  

00C1056D  fstp        st(1)  

00C1056F  fstp        st(0)  

00C10571  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

00C10573  mov         ecx,esi  

00C10575  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

00C1057A  mov         ecx,725B0994h  

00C1057F  call        00B930C8  

00C10584  mov         edx,eax  

00C10586  mov         eax,dword ptr [ebp-14h]  

00C10589  mov         dword ptr [edx+4],eax  

00C1058C  mov         dword ptr [ebp-34h],edx  

00C1058F  mov         ecx,725F3778h  

00C10594  call        00B930C8  

00C10599  mov         dword ptr [ebp-38h],eax  

00C1059C  mov         ecx,725F2C10h  

00C105A1  call        00B930C8  

00C105A6  mov         dword ptr [ebp-3Ch],eax  

00C105A9  mov         ecx,esi  

00C105AB  call        71835820  

00C105B0  push        edx  

00C105B1  push        eax  

00C105B2  push        0  

00C105B4  push        2710h  

00C105B9  call        736071A0  

00C105BE  mov         dword ptr [ebp-48h],eax  

00C105C1  mov         dword ptr [ebp-44h],edx  

00C105C4  fild        qword ptr [ebp-48h]  

00C105C7  fstp        dword ptr [ebp-40h]  

00C105CA  fld         dword ptr [ebp-40h]  

00C105CD  fdiv        dword ptr ds:[0C10678h]  

00C105D3  mov         eax,dword ptr [ebp-38h]  

00C105D6  fstp        dword ptr [eax+4]  

00C105D9  mov         edx,dword ptr [ebp-38h]  

00C105DC  mov         eax,dword ptr [ebp-3Ch]  

00C105DF  mov         dword ptr [eax+4],ebx  

00C105E2  mov         dword ptr [eax+8],edi  

00C105E5  mov         esi,dword ptr [ebp-3Ch]  

00C105E8  lea         edi,[ebp-30h]  

00C105EB  xorps       xmm0,xmm0  

00C105EE  movq        mmword ptr [edi],xmm0  

00C105F2  movq        mmword ptr [edi+8],xmm0  

00C105F7  push        edx  

00C105F8  push        esi  

00C105F9  lea         ecx,[ebp-30h]  

00C105FC  mov         edx,dword ptr [ebp-34h]  

00C105FF  call        724A2ED4  

00C10604  lea         eax,[ebp-30h]  

00C10607  push        dword ptr [eax+0Ch]  

00C1060A  push        dword ptr [eax+8]  

00C1060D  push        dword ptr [eax+4]  

00C10610  push        dword ptr [eax]  

00C10612  mov         edx,dword ptr ds:[3832310h]  

00C10618  xor         ecx,ecx  

00C1061A  call        72497A00  

00C1061F  mov         ecx,eax  

00C10621  call        72571934  

    61:             for (int i = 4; i < 9; i++)

00C10626  inc         dword ptr [ebp-14h]  

00C10629  cmp         dword ptr [ebp-14h],9  

00C1062D  jl          00C10496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

00C10633  mov         ecx,dword ptr [ebp-10h]  

00C10636  call        72C583FC  

   100:             Console.Write("Waiting...");

00C1063B  mov         ecx,dword ptr ds:[3832314h]  

00C10641  call        724C67F0  

00C10646  lea         ecx,[ebp-20h]  

00C10649  xor         edx,edx  

00C1064B  call        72C57984  

00C10650  lea         esp,[ebp-0Ch]  

00C10653  pop         ebx  

00C10654  pop         esi  

00C10655  pop         edi  

00C10656  pop         ebp  

00C10657  ret

//fast

 80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0550  or          al,83h  

    80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0552  ret  

02FD0553  add         dword ptr [ebx-3626FF29h],eax  

02FD0559  fchs  

02FD055B  fxch        st(1)  

02FD055D  fld         st(0)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

02FD055F  fldz  

02FD0561  fcomip      st,st(2)  

02FD0563  fstp        st(1)  

02FD0565  jnp         02FD056B  

02FD0567  fxch        st(1)  

02FD0569  jmp         02FD050B  

02FD056B  ja          02FD0571  

02FD056D  fxch        st(1)  

02FD056F  jmp         02FD050B  

02FD0571  fldz  

02FD0573  fcomip      st,st(2)  

02FD0575  jnp         02FD057B  

02FD0577  fxch        st(1)  

02FD0579  jmp         02FD050B  

02FD057B  jae         02FD0581  

02FD057D  fxch        st(1)  

02FD057F  jmp         02FD050B  

02FD0581  fcomi       st,st(1)  

02FD0583  jnp         02FD0589  

02FD0585  fxch        st(1)  

02FD0587  jmp         02FD050B  

02FD0589  jbe         02FD0592  

02FD058B  fxch        st(1)  

02FD058D  jmp         02FD050B  

02FD0592  fstp        st(1)  

02FD0594  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

02FD0596  mov         ecx,esi  

02FD0598  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

02FD059D  mov         ecx,725B0994h  

02FD05A2  call        013830C8  

02FD05A7  mov         edx,eax  

02FD05A9  mov         eax,dword ptr [ebp-14h]  

02FD05AC  mov         dword ptr [edx+4],eax  

02FD05AF  mov         dword ptr [ebp-3Ch],edx  

02FD05B2  mov         ecx,725F3778h  

02FD05B7  call        013830C8  

02FD05BC  mov         dword ptr [ebp-40h],eax  

02FD05BF  mov         ecx,725F2C10h  

02FD05C4  call        013830C8  

02FD05C9  mov         dword ptr [ebp-44h],eax  

02FD05CC  mov         ecx,esi  

02FD05CE  call        71835820  

02FD05D3  push        edx  

02FD05D4  push        eax  

02FD05D5  push        0  

02FD05D7  push        2710h  

02FD05DC  call        736071A0  

02FD05E1  mov         dword ptr [ebp-50h],eax  

02FD05E4  mov         dword ptr [ebp-4Ch],edx  

02FD05E7  fild        qword ptr [ebp-50h]  

02FD05EA  fstp        dword ptr [ebp-48h]  

02FD05ED  fld         dword ptr [ebp-48h]  

02FD05F0  fdiv        dword ptr ds:[2FD06A8h]  

02FD05F6  mov         eax,dword ptr [ebp-40h]  

02FD05F9  fstp        dword ptr [eax+4]  

02FD05FC  mov         edx,dword ptr [ebp-40h]  

02FD05FF  mov         eax,dword ptr [ebp-44h]  

02FD0602  mov         dword ptr [eax+4],ebx  

02FD0605  mov         dword ptr [eax+8],edi  

02FD0608  mov         esi,dword ptr [ebp-44h]  

02FD060B  lea         edi,[ebp-38h]  

02FD060E  xorps       xmm0,xmm0  

02FD0611  movq        mmword ptr [edi],xmm0  

02FD0615  movq        mmword ptr [edi+8],xmm0  

02FD061A  push        edx  

02FD061B  push        esi  

02FD061C  lea         ecx,[ebp-38h]  

02FD061F  mov         edx,dword ptr [ebp-3Ch]  

02FD0622  call        724A2ED4  

02FD0627  lea         eax,[ebp-38h]  

02FD062A  push        dword ptr [eax+0Ch]  

02FD062D  push        dword ptr [eax+8]  

02FD0630  push        dword ptr [eax+4]  

02FD0633  push        dword ptr [eax]  

02FD0635  mov         edx,dword ptr ds:[4142310h]  

02FD063B  xor         ecx,ecx  

02FD063D  call        72497A00  

02FD0642  mov         ecx,eax  

02FD0644  call        72571934  

    61:             for (int i = 4; i < 9; i++)

02FD0649  inc         dword ptr [ebp-14h]  

02FD064C  cmp         dword ptr [ebp-14h],9  

02FD0650  jl          02FD0496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

02FD0656  mov         ecx,dword ptr [ebp-10h]  

02FD0659  call        72C583FC  

   100:             Console.Write("Waiting...");

02FD065E  mov         ecx,dword ptr ds:[4142314h]  

02FD0664  call        724C67F0  

02FD0669  lea         ecx,[ebp-28h]  

02FD066C  xor         edx,edx  

02FD066E  call        72C57984  

02FD0673  lea         esp,[ebp-0Ch]  

02FD0676  pop         ebx  

02FD0677  pop         esi  

02FD0678  pop         edi  

02FD0679  pop         ebp  

02FD067A  ret

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

3

I didn't check this in detail, but it all looks somewhat complicated, also the condition in the loop header. Are you entirely sure that there isn't a minor logical error causing the inner loop to run a different amount of times? You could insert a dumb counter variable in the inner loop an count the number of runs before/after just to be safe.

– DanDan
Jan 14 at 20:36

1

calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall - youtube.com/watch?v=HEfHFsfGXjs?

– GSerg
Jan 14 at 20:41

3

This is a very odd result! I cannot see any way that it can be cache misses. A better guess would be register scheduling in the FP registers. Are you doing your measurements in a release build and not in the debugger? Measuring performance in the debugger can cause register scheduling to be deoptimized.

– Eric Lippert
Jan 14 at 21:06

1

My best guess is register scheduling by the jitter then; it would be interesting to look at the generated code for both versions and see how they differ.

– Eric Lippert
Jan 14 at 21:19

2

It is a little tricky to get the generated code. Usually what I do in these situations is start the program outside of the debugger, but put a "readline" or some such thing that makes it pause just as it is about to finish; it has to be after all the code has jitted. Then while the program is paused, attach a debugger and force execution to step back into the method, and then look at the disassembly in the debugger. That way you definitely see the method as it was jitted without the debugger attached.

– Eric Lippert
Jan 14 at 21:25

|
show 5 more comments

Below is a simple program that with a small change, makes a significant performance impact and I don't understand why.

The rows in question are the commented ones which are mathematically equivalent. Using the slow version makes the entire program take roughly twice as long as using the fast version.

int iterations = 0;



for (int i = 4; i < 9; i++)

{

    Stopwatch s = Stopwatch.StartNew();



    double ms = 1.0;

    double mL = Math.Pow(100.0, i);

    double uL = 1.0;

    double us = 0.0;

    double msmLInv = 1d / (ms + mL);



    long collisions = 0;

    while (!(uL < 0 && us <= 0 && uL <= us))

    {

        Debug.Assert(++iterations > 0);

        ++collisions;



        double vs = (2 * mL * uL + us * (ms - mL)) * msmLInv;



        //double vL = (2 * ms * us - uL * (ms - mL)) * msmLInv; //fast

        double vL = uL + (us - vs) / mL; //slow





        Debug.Assert(Math.Abs(((2 * ms * us - uL * (ms - mL)) * msmLInv) - (uL + (us - vs) / mL)) < 0.001d); //checks equality between fast and slow

        if (vs > 0)

        {

            ++collisions;

            vs = -vs;

        }



        us = vs;

        uL = vL;

    }



    s.Stop();





    Debug.Assert(collisions.ToString() == "314159265359".Substring(0, i + 1)); //check the correctness

    Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

}



Debug.Assert(iterations == 174531180); //check that we dont skip loops



Console.Write("Waiting...");

Console.ReadKey();

My intuition says that because the fast version has 7 operations compared to 4 operations of the slow one, the slow one should be faster, but it is not.

I disassembled the program using .NET Reflector which shows that they are mostly equal, as expected, except for the part shown below. The code before and after an identical

//slow

ldloc.s uL

ldloc.2 

ldloc.s us

ldloc.s vs

sub 

mul 

ldloc.3 

div 

add

//fast

ldc.r8 2

ldloc.2 

mul 

ldloc.s us

mul 

ldloc.s uL

ldloc.2 

ldloc.3 

sub 

mul 

sub 

ldloc.2 

ldloc.3 

add 

div

This also shows that more code is executing with the fast version which also would lead me to expect it to be slower.

The only guess I have right now is that the slow version causes more cache misses, but I don't know how to measure that (a guide would be welcome). Other than that I am at a loss.

EDIT 1.
As per the request of @EricLippert here is the disassembly from the JIT for the inner while loop where the difference is.

EDIT 3.
Change the code to an updated example base on all the suggestions.

//slow

    78: 

    79:                     vs = (2 * mL * uL + us * (ms - mL)) / (ms + mL);

00C10530  call        CA9AD013  

00C10535  fdiv        st,st(3)  

00C10537  faddp       st(2),st  

    80: 

    81:                     //double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

    82:                     double vL = uL + ms * (us - vs) / mL; //slow

00C10539  fldz  

00C1053B  fcomip      st,st(1)  

00C1053D  jp          00C10549  

00C1053F  jae         00C10549  

00C10541  add         ebx,1  

00C10544  adc         edi,0  

00C10547  fchs  

00C10549  fld         st(1)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

00C1054B  fldz  

00C1054D  fcomip      st,st(3)  

00C1054F  fstp        st(2)  

00C10551  jp          00C10508  

00C10553  jbe         00C10508  

00C10555  fldz  

00C10557  fcomip      st,st(1)  

00C10559  jp          00C10508  

00C1055B  jb          00C10508  

00C1055D  fxch        st(1)  

00C1055F  fcomi       st,st(1)  

00C10561  jnp         00C10567  

00C10563  fxch        st(1)  

00C10565  jmp         00C10508  

00C10567  jbe         00C1056D  

00C10569  fxch        st(1)  

00C1056B  jmp         00C10508  

00C1056D  fstp        st(1)  

00C1056F  fstp        st(0)  

00C10571  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

00C10573  mov         ecx,esi  

00C10575  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

00C1057A  mov         ecx,725B0994h  

00C1057F  call        00B930C8  

00C10584  mov         edx,eax  

00C10586  mov         eax,dword ptr [ebp-14h]  

00C10589  mov         dword ptr [edx+4],eax  

00C1058C  mov         dword ptr [ebp-34h],edx  

00C1058F  mov         ecx,725F3778h  

00C10594  call        00B930C8  

00C10599  mov         dword ptr [ebp-38h],eax  

00C1059C  mov         ecx,725F2C10h  

00C105A1  call        00B930C8  

00C105A6  mov         dword ptr [ebp-3Ch],eax  

00C105A9  mov         ecx,esi  

00C105AB  call        71835820  

00C105B0  push        edx  

00C105B1  push        eax  

00C105B2  push        0  

00C105B4  push        2710h  

00C105B9  call        736071A0  

00C105BE  mov         dword ptr [ebp-48h],eax  

00C105C1  mov         dword ptr [ebp-44h],edx  

00C105C4  fild        qword ptr [ebp-48h]  

00C105C7  fstp        dword ptr [ebp-40h]  

00C105CA  fld         dword ptr [ebp-40h]  

00C105CD  fdiv        dword ptr ds:[0C10678h]  

00C105D3  mov         eax,dword ptr [ebp-38h]  

00C105D6  fstp        dword ptr [eax+4]  

00C105D9  mov         edx,dword ptr [ebp-38h]  

00C105DC  mov         eax,dword ptr [ebp-3Ch]  

00C105DF  mov         dword ptr [eax+4],ebx  

00C105E2  mov         dword ptr [eax+8],edi  

00C105E5  mov         esi,dword ptr [ebp-3Ch]  

00C105E8  lea         edi,[ebp-30h]  

00C105EB  xorps       xmm0,xmm0  

00C105EE  movq        mmword ptr [edi],xmm0  

00C105F2  movq        mmword ptr [edi+8],xmm0  

00C105F7  push        edx  

00C105F8  push        esi  

00C105F9  lea         ecx,[ebp-30h]  

00C105FC  mov         edx,dword ptr [ebp-34h]  

00C105FF  call        724A2ED4  

00C10604  lea         eax,[ebp-30h]  

00C10607  push        dword ptr [eax+0Ch]  

00C1060A  push        dword ptr [eax+8]  

00C1060D  push        dword ptr [eax+4]  

00C10610  push        dword ptr [eax]  

00C10612  mov         edx,dword ptr ds:[3832310h]  

00C10618  xor         ecx,ecx  

00C1061A  call        72497A00  

00C1061F  mov         ecx,eax  

00C10621  call        72571934  

    61:             for (int i = 4; i < 9; i++)

00C10626  inc         dword ptr [ebp-14h]  

00C10629  cmp         dword ptr [ebp-14h],9  

00C1062D  jl          00C10496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

00C10633  mov         ecx,dword ptr [ebp-10h]  

00C10636  call        72C583FC  

   100:             Console.Write("Waiting...");

00C1063B  mov         ecx,dword ptr ds:[3832314h]  

00C10641  call        724C67F0  

00C10646  lea         ecx,[ebp-20h]  

00C10649  xor         edx,edx  

00C1064B  call        72C57984  

00C10650  lea         esp,[ebp-0Ch]  

00C10653  pop         ebx  

00C10654  pop         esi  

00C10655  pop         edi  

00C10656  pop         ebp  

00C10657  ret

//fast

 80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0550  or          al,83h  

    80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0552  ret  

02FD0553  add         dword ptr [ebx-3626FF29h],eax  

02FD0559  fchs  

02FD055B  fxch        st(1)  

02FD055D  fld         st(0)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

02FD055F  fldz  

02FD0561  fcomip      st,st(2)  

02FD0563  fstp        st(1)  

02FD0565  jnp         02FD056B  

02FD0567  fxch        st(1)  

02FD0569  jmp         02FD050B  

02FD056B  ja          02FD0571  

02FD056D  fxch        st(1)  

02FD056F  jmp         02FD050B  

02FD0571  fldz  

02FD0573  fcomip      st,st(2)  

02FD0575  jnp         02FD057B  

02FD0577  fxch        st(1)  

02FD0579  jmp         02FD050B  

02FD057B  jae         02FD0581  

02FD057D  fxch        st(1)  

02FD057F  jmp         02FD050B  

02FD0581  fcomi       st,st(1)  

02FD0583  jnp         02FD0589  

02FD0585  fxch        st(1)  

02FD0587  jmp         02FD050B  

02FD0589  jbe         02FD0592  

02FD058B  fxch        st(1)  

02FD058D  jmp         02FD050B  

02FD0592  fstp        st(1)  

02FD0594  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

02FD0596  mov         ecx,esi  

02FD0598  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

02FD059D  mov         ecx,725B0994h  

02FD05A2  call        013830C8  

02FD05A7  mov         edx,eax  

02FD05A9  mov         eax,dword ptr [ebp-14h]  

02FD05AC  mov         dword ptr [edx+4],eax  

02FD05AF  mov         dword ptr [ebp-3Ch],edx  

02FD05B2  mov         ecx,725F3778h  

02FD05B7  call        013830C8  

02FD05BC  mov         dword ptr [ebp-40h],eax  

02FD05BF  mov         ecx,725F2C10h  

02FD05C4  call        013830C8  

02FD05C9  mov         dword ptr [ebp-44h],eax  

02FD05CC  mov         ecx,esi  

02FD05CE  call        71835820  

02FD05D3  push        edx  

02FD05D4  push        eax  

02FD05D5  push        0  

02FD05D7  push        2710h  

02FD05DC  call        736071A0  

02FD05E1  mov         dword ptr [ebp-50h],eax  

02FD05E4  mov         dword ptr [ebp-4Ch],edx  

02FD05E7  fild        qword ptr [ebp-50h]  

02FD05EA  fstp        dword ptr [ebp-48h]  

02FD05ED  fld         dword ptr [ebp-48h]  

02FD05F0  fdiv        dword ptr ds:[2FD06A8h]  

02FD05F6  mov         eax,dword ptr [ebp-40h]  

02FD05F9  fstp        dword ptr [eax+4]  

02FD05FC  mov         edx,dword ptr [ebp-40h]  

02FD05FF  mov         eax,dword ptr [ebp-44h]  

02FD0602  mov         dword ptr [eax+4],ebx  

02FD0605  mov         dword ptr [eax+8],edi  

02FD0608  mov         esi,dword ptr [ebp-44h]  

02FD060B  lea         edi,[ebp-38h]  

02FD060E  xorps       xmm0,xmm0  

02FD0611  movq        mmword ptr [edi],xmm0  

02FD0615  movq        mmword ptr [edi+8],xmm0  

02FD061A  push        edx  

02FD061B  push        esi  

02FD061C  lea         ecx,[ebp-38h]  

02FD061F  mov         edx,dword ptr [ebp-3Ch]  

02FD0622  call        724A2ED4  

02FD0627  lea         eax,[ebp-38h]  

02FD062A  push        dword ptr [eax+0Ch]  

02FD062D  push        dword ptr [eax+8]  

02FD0630  push        dword ptr [eax+4]  

02FD0633  push        dword ptr [eax]  

02FD0635  mov         edx,dword ptr ds:[4142310h]  

02FD063B  xor         ecx,ecx  

02FD063D  call        72497A00  

02FD0642  mov         ecx,eax  

02FD0644  call        72571934  

    61:             for (int i = 4; i < 9; i++)

02FD0649  inc         dword ptr [ebp-14h]  

02FD064C  cmp         dword ptr [ebp-14h],9  

02FD0650  jl          02FD0496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

02FD0656  mov         ecx,dword ptr [ebp-10h]  

02FD0659  call        72C583FC  

   100:             Console.Write("Waiting...");

02FD065E  mov         ecx,dword ptr ds:[4142314h]  

02FD0664  call        724C67F0  

02FD0669  lea         ecx,[ebp-28h]  

02FD066C  xor         edx,edx  

02FD066E  call        72C57984  

02FD0673  lea         esp,[ebp-0Ch]  

02FD0676  pop         ebx  

02FD0677  pop         esi  

02FD0678  pop         edi  

02FD0679  pop         ebp  

02FD067A  ret

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

Below is a simple program that with a small change, makes a significant performance impact and I don't understand why.

The rows in question are the commented ones which are mathematically equivalent. Using the slow version makes the entire program take roughly twice as long as using the fast version.

int iterations = 0;



for (int i = 4; i < 9; i++)

{

    Stopwatch s = Stopwatch.StartNew();



    double ms = 1.0;

    double mL = Math.Pow(100.0, i);

    double uL = 1.0;

    double us = 0.0;

    double msmLInv = 1d / (ms + mL);



    long collisions = 0;

    while (!(uL < 0 && us <= 0 && uL <= us))

    {

        Debug.Assert(++iterations > 0);

        ++collisions;



        double vs = (2 * mL * uL + us * (ms - mL)) * msmLInv;



        //double vL = (2 * ms * us - uL * (ms - mL)) * msmLInv; //fast

        double vL = uL + (us - vs) / mL; //slow





        Debug.Assert(Math.Abs(((2 * ms * us - uL * (ms - mL)) * msmLInv) - (uL + (us - vs) / mL)) < 0.001d); //checks equality between fast and slow

        if (vs > 0)

        {

            ++collisions;

            vs = -vs;

        }



        us = vs;

        uL = vL;

    }



    s.Stop();





    Debug.Assert(collisions.ToString() == "314159265359".Substring(0, i + 1)); //check the correctness

    Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

}



Debug.Assert(iterations == 174531180); //check that we dont skip loops



Console.Write("Waiting...");

Console.ReadKey();

My intuition says that because the fast version has 7 operations compared to 4 operations of the slow one, the slow one should be faster, but it is not.

I disassembled the program using .NET Reflector which shows that they are mostly equal, as expected, except for the part shown below. The code before and after an identical

//slow

ldloc.s uL

ldloc.2 

ldloc.s us

ldloc.s vs

sub 

mul 

ldloc.3 

div 

add

//fast

ldc.r8 2

ldloc.2 

mul 

ldloc.s us

mul 

ldloc.s uL

ldloc.2 

ldloc.3 

sub 

mul 

sub 

ldloc.2 

ldloc.3 

add 

div

This also shows that more code is executing with the fast version which also would lead me to expect it to be slower.

The only guess I have right now is that the slow version causes more cache misses, but I don't know how to measure that (a guide would be welcome). Other than that I am at a loss.

EDIT 1.
As per the request of @EricLippert here is the disassembly from the JIT for the inner while loop where the difference is.

EDIT 3.
Change the code to an updated example base on all the suggestions.

//slow

    78: 

    79:                     vs = (2 * mL * uL + us * (ms - mL)) / (ms + mL);

00C10530  call        CA9AD013  

00C10535  fdiv        st,st(3)  

00C10537  faddp       st(2),st  

    80: 

    81:                     //double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

    82:                     double vL = uL + ms * (us - vs) / mL; //slow

00C10539  fldz  

00C1053B  fcomip      st,st(1)  

00C1053D  jp          00C10549  

00C1053F  jae         00C10549  

00C10541  add         ebx,1  

00C10544  adc         edi,0  

00C10547  fchs  

00C10549  fld         st(1)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

00C1054B  fldz  

00C1054D  fcomip      st,st(3)  

00C1054F  fstp        st(2)  

00C10551  jp          00C10508  

00C10553  jbe         00C10508  

00C10555  fldz  

00C10557  fcomip      st,st(1)  

00C10559  jp          00C10508  

00C1055B  jb          00C10508  

00C1055D  fxch        st(1)  

00C1055F  fcomi       st,st(1)  

00C10561  jnp         00C10567  

00C10563  fxch        st(1)  

00C10565  jmp         00C10508  

00C10567  jbe         00C1056D  

00C10569  fxch        st(1)  

00C1056B  jmp         00C10508  

00C1056D  fstp        st(1)  

00C1056F  fstp        st(0)  

00C10571  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

00C10573  mov         ecx,esi  

00C10575  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

00C1057A  mov         ecx,725B0994h  

00C1057F  call        00B930C8  

00C10584  mov         edx,eax  

00C10586  mov         eax,dword ptr [ebp-14h]  

00C10589  mov         dword ptr [edx+4],eax  

00C1058C  mov         dword ptr [ebp-34h],edx  

00C1058F  mov         ecx,725F3778h  

00C10594  call        00B930C8  

00C10599  mov         dword ptr [ebp-38h],eax  

00C1059C  mov         ecx,725F2C10h  

00C105A1  call        00B930C8  

00C105A6  mov         dword ptr [ebp-3Ch],eax  

00C105A9  mov         ecx,esi  

00C105AB  call        71835820  

00C105B0  push        edx  

00C105B1  push        eax  

00C105B2  push        0  

00C105B4  push        2710h  

00C105B9  call        736071A0  

00C105BE  mov         dword ptr [ebp-48h],eax  

00C105C1  mov         dword ptr [ebp-44h],edx  

00C105C4  fild        qword ptr [ebp-48h]  

00C105C7  fstp        dword ptr [ebp-40h]  

00C105CA  fld         dword ptr [ebp-40h]  

00C105CD  fdiv        dword ptr ds:[0C10678h]  

00C105D3  mov         eax,dword ptr [ebp-38h]  

00C105D6  fstp        dword ptr [eax+4]  

00C105D9  mov         edx,dword ptr [ebp-38h]  

00C105DC  mov         eax,dword ptr [ebp-3Ch]  

00C105DF  mov         dword ptr [eax+4],ebx  

00C105E2  mov         dword ptr [eax+8],edi  

00C105E5  mov         esi,dword ptr [ebp-3Ch]  

00C105E8  lea         edi,[ebp-30h]  

00C105EB  xorps       xmm0,xmm0  

00C105EE  movq        mmword ptr [edi],xmm0  

00C105F2  movq        mmword ptr [edi+8],xmm0  

00C105F7  push        edx  

00C105F8  push        esi  

00C105F9  lea         ecx,[ebp-30h]  

00C105FC  mov         edx,dword ptr [ebp-34h]  

00C105FF  call        724A2ED4  

00C10604  lea         eax,[ebp-30h]  

00C10607  push        dword ptr [eax+0Ch]  

00C1060A  push        dword ptr [eax+8]  

00C1060D  push        dword ptr [eax+4]  

00C10610  push        dword ptr [eax]  

00C10612  mov         edx,dword ptr ds:[3832310h]  

00C10618  xor         ecx,ecx  

00C1061A  call        72497A00  

00C1061F  mov         ecx,eax  

00C10621  call        72571934  

    61:             for (int i = 4; i < 9; i++)

00C10626  inc         dword ptr [ebp-14h]  

00C10629  cmp         dword ptr [ebp-14h],9  

00C1062D  jl          00C10496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

00C10633  mov         ecx,dword ptr [ebp-10h]  

00C10636  call        72C583FC  

   100:             Console.Write("Waiting...");

00C1063B  mov         ecx,dword ptr ds:[3832314h]  

00C10641  call        724C67F0  

00C10646  lea         ecx,[ebp-20h]  

00C10649  xor         edx,edx  

00C1064B  call        72C57984  

00C10650  lea         esp,[ebp-0Ch]  

00C10653  pop         ebx  

00C10654  pop         esi  

00C10655  pop         edi  

00C10656  pop         ebp  

00C10657  ret

//fast

 80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0550  or          al,83h  

    80: 

    81:                     double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

02FD0552  ret  

02FD0553  add         dword ptr [ebx-3626FF29h],eax  

02FD0559  fchs  

02FD055B  fxch        st(1)  

02FD055D  fld         st(0)  

    73: 

    74:                 while (!(uL < 0 && us <= 0 && uL <= us))

02FD055F  fldz  

02FD0561  fcomip      st,st(2)  

02FD0563  fstp        st(1)  

02FD0565  jnp         02FD056B  

02FD0567  fxch        st(1)  

02FD0569  jmp         02FD050B  

02FD056B  ja          02FD0571  

02FD056D  fxch        st(1)  

02FD056F  jmp         02FD050B  

02FD0571  fldz  

02FD0573  fcomip      st,st(2)  

02FD0575  jnp         02FD057B  

02FD0577  fxch        st(1)  

02FD0579  jmp         02FD050B  

02FD057B  jae         02FD0581  

02FD057D  fxch        st(1)  

02FD057F  jmp         02FD050B  

02FD0581  fcomi       st,st(1)  

02FD0583  jnp         02FD0589  

02FD0585  fxch        st(1)  

02FD0587  jmp         02FD050B  

02FD0589  jbe         02FD0592  

02FD058B  fxch        st(1)  

02FD058D  jmp         02FD050B  

02FD0592  fstp        st(1)  

02FD0594  fstp        st(0)  

    92:                 }

    93: 

    94:                 s.Stop();

02FD0596  mov         ecx,esi  

02FD0598  call        71880260  

    95: 

    96:                 Console.WriteLine($"i: {i}, T: {s.ElapsedMilliseconds / 1000f}, PI: {collisions}");

02FD059D  mov         ecx,725B0994h  

02FD05A2  call        013830C8  

02FD05A7  mov         edx,eax  

02FD05A9  mov         eax,dword ptr [ebp-14h]  

02FD05AC  mov         dword ptr [edx+4],eax  

02FD05AF  mov         dword ptr [ebp-3Ch],edx  

02FD05B2  mov         ecx,725F3778h  

02FD05B7  call        013830C8  

02FD05BC  mov         dword ptr [ebp-40h],eax  

02FD05BF  mov         ecx,725F2C10h  

02FD05C4  call        013830C8  

02FD05C9  mov         dword ptr [ebp-44h],eax  

02FD05CC  mov         ecx,esi  

02FD05CE  call        71835820  

02FD05D3  push        edx  

02FD05D4  push        eax  

02FD05D5  push        0  

02FD05D7  push        2710h  

02FD05DC  call        736071A0  

02FD05E1  mov         dword ptr [ebp-50h],eax  

02FD05E4  mov         dword ptr [ebp-4Ch],edx  

02FD05E7  fild        qword ptr [ebp-50h]  

02FD05EA  fstp        dword ptr [ebp-48h]  

02FD05ED  fld         dword ptr [ebp-48h]  

02FD05F0  fdiv        dword ptr ds:[2FD06A8h]  

02FD05F6  mov         eax,dword ptr [ebp-40h]  

02FD05F9  fstp        dword ptr [eax+4]  

02FD05FC  mov         edx,dword ptr [ebp-40h]  

02FD05FF  mov         eax,dword ptr [ebp-44h]  

02FD0602  mov         dword ptr [eax+4],ebx  

02FD0605  mov         dword ptr [eax+8],edi  

02FD0608  mov         esi,dword ptr [ebp-44h]  

02FD060B  lea         edi,[ebp-38h]  

02FD060E  xorps       xmm0,xmm0  

02FD0611  movq        mmword ptr [edi],xmm0  

02FD0615  movq        mmword ptr [edi+8],xmm0  

02FD061A  push        edx  

02FD061B  push        esi  

02FD061C  lea         ecx,[ebp-38h]  

02FD061F  mov         edx,dword ptr [ebp-3Ch]  

02FD0622  call        724A2ED4  

02FD0627  lea         eax,[ebp-38h]  

02FD062A  push        dword ptr [eax+0Ch]  

02FD062D  push        dword ptr [eax+8]  

02FD0630  push        dword ptr [eax+4]  

02FD0633  push        dword ptr [eax]  

02FD0635  mov         edx,dword ptr ds:[4142310h]  

02FD063B  xor         ecx,ecx  

02FD063D  call        72497A00  

02FD0642  mov         ecx,eax  

02FD0644  call        72571934  

    61:             for (int i = 4; i < 9; i++)

02FD0649  inc         dword ptr [ebp-14h]  

02FD064C  cmp         dword ptr [ebp-14h],9  

02FD0650  jl          02FD0496  

    97:             }

    98: 

    99:             Console.WriteLine(loops);

02FD0656  mov         ecx,dword ptr [ebp-10h]  

02FD0659  call        72C583FC  

   100:             Console.Write("Waiting...");

02FD065E  mov         ecx,dword ptr ds:[4142314h]  

02FD0664  call        724C67F0  

02FD0669  lea         ecx,[ebp-28h]  

02FD066C  xor         edx,edx  

02FD066E  call        72C57984  

02FD0673  lea         esp,[ebp-0Ch]  

02FD0676  pop         ebx  

02FD0677  pop         esi  

02FD0678  pop         edi  

02FD0679  pop         ebp  

02FD067A  ret

c# performance

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

edited Jan 19 at 11:51

AndreyAkinshin

9,6782378133

asked Jan 14 at 20:29

Imapler

1,079821

asked Jan 14 at 20:29

Imapler

1,079821

asked Jan 14 at 20:29

Imapler

1,079821

3

I didn't check this in detail, but it all looks somewhat complicated, also the condition in the loop header. Are you entirely sure that there isn't a minor logical error causing the inner loop to run a different amount of times? You could insert a dumb counter variable in the inner loop an count the number of runs before/after just to be safe.

– DanDan
Jan 14 at 20:36

1

calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall - youtube.com/watch?v=HEfHFsfGXjs?

– GSerg
Jan 14 at 20:41

3

This is a very odd result! I cannot see any way that it can be cache misses. A better guess would be register scheduling in the FP registers. Are you doing your measurements in a release build and not in the debugger? Measuring performance in the debugger can cause register scheduling to be deoptimized.

– Eric Lippert
Jan 14 at 21:06

1

My best guess is register scheduling by the jitter then; it would be interesting to look at the generated code for both versions and see how they differ.

– Eric Lippert
Jan 14 at 21:19

2

It is a little tricky to get the generated code. Usually what I do in these situations is start the program outside of the debugger, but put a "readline" or some such thing that makes it pause just as it is about to finish; it has to be after all the code has jitted. Then while the program is paused, attach a debugger and force execution to step back into the method, and then look at the disassembly in the debugger. That way you definitely see the method as it was jitted without the debugger attached.

– Eric Lippert
Jan 14 at 21:25

|
show 5 more comments

3

I didn't check this in detail, but it all looks somewhat complicated, also the condition in the loop header. Are you entirely sure that there isn't a minor logical error causing the inner loop to run a different amount of times? You could insert a dumb counter variable in the inner loop an count the number of runs before/after just to be safe.

– DanDan
Jan 14 at 20:36

1

calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall - youtube.com/watch?v=HEfHFsfGXjs?

– GSerg
Jan 14 at 20:41

3

This is a very odd result! I cannot see any way that it can be cache misses. A better guess would be register scheduling in the FP registers. Are you doing your measurements in a release build and not in the debugger? Measuring performance in the debugger can cause register scheduling to be deoptimized.

– Eric Lippert
Jan 14 at 21:06

1

My best guess is register scheduling by the jitter then; it would be interesting to look at the generated code for both versions and see how they differ.

– Eric Lippert
Jan 14 at 21:19

2

It is a little tricky to get the generated code. Usually what I do in these situations is start the program outside of the debugger, but put a "readline" or some such thing that makes it pause just as it is about to finish; it has to be after all the code has jitted. Then while the program is paused, attach a debugger and force execution to step back into the method, and then look at the disassembly in the debugger. That way you definitely see the method as it was jitted without the debugger attached.

– Eric Lippert
Jan 14 at 21:25

I didn't check this in detail, but it all looks somewhat complicated, also the condition in the loop header. Are you entirely sure that there isn't a minor logical error causing the inner loop to run a different amount of times? You could insert a dumb counter variable in the inner loop an count the number of runs before/after just to be safe.

– DanDan
Jan 14 at 20:36

calculates PI in a very convoluted way by counting collisions between two object of different mass and a wall - youtube.com/watch?v=HEfHFsfGXjs?

– GSerg
Jan 14 at 20:41

This is a very odd result! I cannot see any way that it can be cache misses. A better guess would be register scheduling in the FP registers. Are you doing your measurements in a release build and not in the debugger? Measuring performance in the debugger can cause register scheduling to be deoptimized.

– Eric Lippert
Jan 14 at 21:06

My best guess is register scheduling by the jitter then; it would be interesting to look at the generated code for both versions and see how they differ.

– Eric Lippert
Jan 14 at 21:19

It is a little tricky to get the generated code. Usually what I do in these situations is start the program outside of the debugger, but put a "readline" or some such thing that makes it pause just as it is about to finish; it has to be after all the code has jitted. Then while the program is paused, attach a debugger and force execution to step back into the method, and then look at the disassembly in the debugger. That way you definitely see the method as it was jitted without the debugger attached.

– Eric Lippert
Jan 14 at 21:25

|
show 5 more comments

2 Answers
2

active

oldest

votes

I think the reason is CPU instruction pipelining. your slow equation depends on vs, that means vs must be calculated first, then vl is calculated.

but in your fast equation, more instructions can be pipelined as vs and vl can be calculated at same time because they don't depend on each other.

Please don't confuse this with multi threading. Instruction pipelining is some thing implemented at very low hardware level and tries to exploit as many CPU modules as possible at the same time to achieve maximum instruction throughput.

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

1

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

|
show 8 more comments

You calculations are not equal

double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

double vL = uL + ms * (us - vs) / mL; //slow

Example: I miss vs in the fast version

I expect your while loop doing more iterations because of this?

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

2

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

1

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54188731%2fsurprisingly-different-performance-of-simple-c-sharp-program%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I think the reason is CPU instruction pipelining. your slow equation depends on vs, that means vs must be calculated first, then vl is calculated.

but in your fast equation, more instructions can be pipelined as vs and vl can be calculated at same time because they don't depend on each other.

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

1

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

|
show 8 more comments

I think the reason is CPU instruction pipelining. your slow equation depends on vs, that means vs must be calculated first, then vl is calculated.

but in your fast equation, more instructions can be pipelined as vs and vl can be calculated at same time because they don't depend on each other.

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

1

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

|
show 8 more comments

I think the reason is CPU instruction pipelining. your slow equation depends on vs, that means vs must be calculated first, then vl is calculated.

but in your fast equation, more instructions can be pipelined as vs and vl can be calculated at same time because they don't depend on each other.

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

I think the reason is CPU instruction pipelining. your slow equation depends on vs, that means vs must be calculated first, then vl is calculated.

but in your fast equation, more instructions can be pipelined as vs and vl can be calculated at same time because they don't depend on each other.

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

edited Jan 14 at 23:29

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

answered Jan 14 at 22:03

M.kazem Akhgary

12k53372

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

1

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

|
show 8 more comments

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

1

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

Interesting idea, i hadnt thought about pipelining and i agree that there will be more divisions in the slow version which are expesive but it seems unlikely to make that large difference. I did a few simple test by double o = 1d/us and o=1d/vs seperatly and then used them in the if statement to prevent to from being removed by optimization. This did not make any significant difference on the runtime though. Do you have any other ideas how to test that?

– Imapler
Jan 14 at 22:24

instruction pipelining is something implemented at hardware level. so in order to test that you should run this code on a processor without pipelining feature and compare slow and fast versions on that computer.

– M.kazem Akhgary
Jan 14 at 22:30

@Imapler for division i did not notice one thing important. your fast code has literally no division inside the loop.

– M.kazem Akhgary
Jan 14 at 22:33

@Imapler ok, division is not the reason. both equation divisions are optimized away outside of the loop. i guess only thing that remains is instruction pipelining.

– M.kazem Akhgary
Jan 14 at 22:35

Good catch however the compiler does not move it outside the loop. I precalculated the 1d/(ms+mL)and how it takes 2/3 of the time.

– Imapler
Jan 14 at 22:37

|
show 8 more comments

You calculations are not equal

double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

double vL = uL + ms * (us - vs) / mL; //slow

Example: I miss vs in the fast version

I expect your while loop doing more iterations because of this?

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

2

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

1

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

add a comment |

You calculations are not equal

double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

double vL = uL + ms * (us - vs) / mL; //slow

Example: I miss vs in the fast version

I expect your while loop doing more iterations because of this?

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

2

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

1

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

add a comment |

You calculations are not equal

double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

double vL = uL + ms * (us - vs) / mL; //slow

Example: I miss vs in the fast version

I expect your while loop doing more iterations because of this?

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

You calculations are not equal

double vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

double vL = uL + ms * (us - vs) / mL; //slow

Example: I miss vs in the fast version

I expect your while loop doing more iterations because of this?

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

edited Jan 14 at 20:50

AdrianHHH

11.3k124067

answered Jan 14 at 20:42

Aldert

1,3681212

answered Jan 14 at 20:42

Aldert

1,3681212

answered Jan 14 at 20:42

Aldert

1,3681212

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

2

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

1

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

add a comment |

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

2

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

1

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

Good thought, i ran the experiment and both variants run 174531180 iterations so no difference there. The equations looks quite different but they are derived using the conservation of momentum and energy. The difference is that the slow one uses the result from the first calculations while the fast recalculates everything.

– Imapler
Jan 14 at 21:03

And if you do both computations and compare the two values, they are always within a very small fraction of each other. They appear to be equivalent computations.

– Eric Lippert
Jan 14 at 21:10

@EricLippert Math.Abs(vL1 - vL2) > 0.001f is never true

– Imapler
Jan 14 at 21:14

They are mathematically equal, you can prove that just by algebra operations.

– Antonín Lejsek
Jan 15 at 1:45

I double checked with equation solver fast and slow are not equal: vL = uL + (2 * ms * (us - uL)) / (ms + mL); //slow vL = (2 * ms * us - uL * (ms - mL)) / (ms + mL); //fast

– Aldert
Jan 17 at 19:44

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtyktl