Why is “2 * (i * i)” faster than “2 * i * i”?











up vote
15
down vote

favorite
2












The following Java program takes on average between 0.50s and 0.55s to run:



public static void main(String args) {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * (i * i);
}
System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
System.out.println("n = " + n);
}


If I replace 2 * (i * i) with 2 * i * i, it takes between 0.60 and 0.65s to run. How come?



Edit: I ran each version of the program 15 times, alternating between the two. Here are the results:



2 * (i * i): 0.5183738, 0.5298337, 0.5308647, 0.5133458, 0.5003011, 0.5366181, 0.515149, 0.5237389, 0.5249942, 0.5641624, 0.538412, 0.5466744, 0.531159, 0.5048032, 0.5232789



2 * i * i: 0.6246434, 0.6049722, 0.6603363, 0.6243328, 0.6541802, 0.6312638, 0.6241105, 0.627815, 0.6114252, 0.6781033, 0.6393969, 0.6608845, 0.6201077, 0.6511559, 0.6544526



The fastest run of 2 * i * i took longer than the slowest run of 2 * (i * i). If they were both as efficient, the probability of this happening would be less than 1/2^15 = 0.00305%.










share|improve this question




















  • 5




    How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
    – Joakim Danielson
    1 hour ago








  • 2




    I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
    – Stefan
    1 hour ago






  • 4




    look at the bytecode using a dissassmber.
    – OldProgrammer
    1 hour ago






  • 3




    Also please see: stackoverflow.com/questions/504103/…
    – lexicore
    1 hour ago








  • 3




    @nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
    – Jorn Vernee
    42 mins ago















up vote
15
down vote

favorite
2












The following Java program takes on average between 0.50s and 0.55s to run:



public static void main(String args) {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * (i * i);
}
System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
System.out.println("n = " + n);
}


If I replace 2 * (i * i) with 2 * i * i, it takes between 0.60 and 0.65s to run. How come?



Edit: I ran each version of the program 15 times, alternating between the two. Here are the results:



2 * (i * i): 0.5183738, 0.5298337, 0.5308647, 0.5133458, 0.5003011, 0.5366181, 0.515149, 0.5237389, 0.5249942, 0.5641624, 0.538412, 0.5466744, 0.531159, 0.5048032, 0.5232789



2 * i * i: 0.6246434, 0.6049722, 0.6603363, 0.6243328, 0.6541802, 0.6312638, 0.6241105, 0.627815, 0.6114252, 0.6781033, 0.6393969, 0.6608845, 0.6201077, 0.6511559, 0.6544526



The fastest run of 2 * i * i took longer than the slowest run of 2 * (i * i). If they were both as efficient, the probability of this happening would be less than 1/2^15 = 0.00305%.










share|improve this question




















  • 5




    How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
    – Joakim Danielson
    1 hour ago








  • 2




    I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
    – Stefan
    1 hour ago






  • 4




    look at the bytecode using a dissassmber.
    – OldProgrammer
    1 hour ago






  • 3




    Also please see: stackoverflow.com/questions/504103/…
    – lexicore
    1 hour ago








  • 3




    @nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
    – Jorn Vernee
    42 mins ago













up vote
15
down vote

favorite
2









up vote
15
down vote

favorite
2






2





The following Java program takes on average between 0.50s and 0.55s to run:



public static void main(String args) {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * (i * i);
}
System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
System.out.println("n = " + n);
}


If I replace 2 * (i * i) with 2 * i * i, it takes between 0.60 and 0.65s to run. How come?



Edit: I ran each version of the program 15 times, alternating between the two. Here are the results:



2 * (i * i): 0.5183738, 0.5298337, 0.5308647, 0.5133458, 0.5003011, 0.5366181, 0.515149, 0.5237389, 0.5249942, 0.5641624, 0.538412, 0.5466744, 0.531159, 0.5048032, 0.5232789



2 * i * i: 0.6246434, 0.6049722, 0.6603363, 0.6243328, 0.6541802, 0.6312638, 0.6241105, 0.627815, 0.6114252, 0.6781033, 0.6393969, 0.6608845, 0.6201077, 0.6511559, 0.6544526



The fastest run of 2 * i * i took longer than the slowest run of 2 * (i * i). If they were both as efficient, the probability of this happening would be less than 1/2^15 = 0.00305%.










share|improve this question















The following Java program takes on average between 0.50s and 0.55s to run:



public static void main(String args) {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * (i * i);
}
System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
System.out.println("n = " + n);
}


If I replace 2 * (i * i) with 2 * i * i, it takes between 0.60 and 0.65s to run. How come?



Edit: I ran each version of the program 15 times, alternating between the two. Here are the results:



2 * (i * i): 0.5183738, 0.5298337, 0.5308647, 0.5133458, 0.5003011, 0.5366181, 0.515149, 0.5237389, 0.5249942, 0.5641624, 0.538412, 0.5466744, 0.531159, 0.5048032, 0.5232789



2 * i * i: 0.6246434, 0.6049722, 0.6603363, 0.6243328, 0.6541802, 0.6312638, 0.6241105, 0.627815, 0.6114252, 0.6781033, 0.6393969, 0.6608845, 0.6201077, 0.6511559, 0.6544526



The fastest run of 2 * i * i took longer than the slowest run of 2 * (i * i). If they were both as efficient, the probability of this happening would be less than 1/2^15 = 0.00305%.







java optimization






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 1 hour ago

























asked 1 hour ago









Stefan

1175




1175








  • 5




    How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
    – Joakim Danielson
    1 hour ago








  • 2




    I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
    – Stefan
    1 hour ago






  • 4




    look at the bytecode using a dissassmber.
    – OldProgrammer
    1 hour ago






  • 3




    Also please see: stackoverflow.com/questions/504103/…
    – lexicore
    1 hour ago








  • 3




    @nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
    – Jorn Vernee
    42 mins ago














  • 5




    How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
    – Joakim Danielson
    1 hour ago








  • 2




    I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
    – Stefan
    1 hour ago






  • 4




    look at the bytecode using a dissassmber.
    – OldProgrammer
    1 hour ago






  • 3




    Also please see: stackoverflow.com/questions/504103/…
    – lexicore
    1 hour ago








  • 3




    @nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
    – Jorn Vernee
    42 mins ago








5




5




How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
– Joakim Danielson
1 hour ago






How sure are you that your measurements are done correctly ? How many times did you run each version and how static was your test environment?
– Joakim Danielson
1 hour ago






2




2




I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
– Stefan
1 hour ago




I ran each version around 10 times, and the 2 * i * i version always takes slightly longer.
– Stefan
1 hour ago




4




4




look at the bytecode using a dissassmber.
– OldProgrammer
1 hour ago




look at the bytecode using a dissassmber.
– OldProgrammer
1 hour ago




3




3




Also please see: stackoverflow.com/questions/504103/…
– lexicore
1 hour ago






Also please see: stackoverflow.com/questions/504103/…
– lexicore
1 hour ago






3




3




@nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
– Jorn Vernee
42 mins ago




@nullpointer To find out for real why one is faster than the other, we'd have to get the disassembly or Ideal graphs for those methods. The assembler is very annoying to try and figure out, so I'm trying to get an OpenJDK debug build which can output nice graphs.
– Jorn Vernee
42 mins ago












5 Answers
5






active

oldest

votes

















up vote
4
down vote













When the multiplication is 2 * (i * i), the JVM is able to factor out the multiplication by 2 from the loop, resulting in this equivalent but more efficient code:



int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += i * i;
}
n *= 2;


but when the multiplication is (2 * i) * i, the JVM doesn't optimize it since the multiplication by a constant is no longer right before the addition.



Here are a few reasons why I think this is the case:




  • Adding an if (n == 0) n = 1 statement at the start of the loop results in both versions being as efficient, since factoring out the multiplication no longer guarantees that the result will be the same

  • The optimized version (by factoring out the multiplication by 2) is exactly as fast as the 2 * (i * i) version


Here is the test code that I used to draw these conclusions:



public static void main(String args) {
long fastVersion = 0;
long slowVersion = 0;
long optimizedVersion = 0;
long modifiedFastVersion = 0;
long modifiedSlowVersion = 0;

for (int i = 0; i < 10; i++) {
fastVersion += fastVersion();
slowVersion += slowVersion();
optimizedVersion += optimizedVersion();
modifiedFastVersion += modifiedFastVersion();
modifiedSlowVersion += modifiedSlowVersion();
}

System.out.println("Fast version: " + (double) fastVersion / 1000000000 + " s");
System.out.println("Slow version: " + (double) slowVersion / 1000000000 + " s");
System.out.println("Optimized version: " + (double) optimizedVersion / 1000000000 + " s");
System.out.println("Modified fast version: " + (double) modifiedFastVersion / 1000000000 + " s");
System.out.println("Modified slow version: " + (double) modifiedSlowVersion / 1000000000 + " s");
}

private static long fastVersion() {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * (i * i);
}
return System.nanoTime() - startTime;
}

private static long slowVersion() {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += 2 * i * i;
}
return System.nanoTime() - startTime;
}

private static long optimizedVersion() {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
n += i * i;
}
n *= 2;
return System.nanoTime() - startTime;
}

private static long modifiedFastVersion() {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
if (n == 0) n = 1;
n += 2 * (i * i);
}
return System.nanoTime() - startTime;
}

private static long modifiedSlowVersion() {
long startTime = System.nanoTime();
int n = 0;
for (int i = 0; i < 1000000000; i++) {
if (n == 0) n = 1;
n += 2 * i * i;
}
return System.nanoTime() - startTime;
}


And here are the results:



Fast version: 5.7274411 s
Slow version: 7.6190804 s
Optimized version: 5.1348007 s
Modified fast version: 7.1492705 s
Modified slow version: 7.2952668 s





share|improve this answer




























    up vote
    1
    down vote













    ByteCodes: https://cs.nyu.edu/courses/fall00/V22.0201-001/jvm2.html

    ByteCodes Viewer: https://github.com/Konloch/bytecode-viewer



    On my JDK (Win10 64 1.8.0_65-b17) I can reproduce and explain:



    public static void main(String args) {
    int repeat = 10;
    long A = 0;
    long B = 0;
    for (int i = 0; i < repeat; i++) {
    A += test();
    B += testB();
    }

    System.out.println(A / repeat + " ms");
    System.out.println(B / repeat + " ms");
    }


    private static long test() {
    int n = 0;
    for (int i = 0; i < 1000; i++) {
    n += multi(i);
    }
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 1000000000; i++) {
    n += multi(i);
    }
    long ms = (System.currentTimeMillis() - startTime);
    System.out.println(ms + " ms A " + n);
    return ms;
    }


    private static long testB() {
    int n = 0;
    for (int i = 0; i < 1000; i++) {
    n += multiB(i);
    }
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 1000000000; i++) {
    n += multiB(i);
    }
    long ms = (System.currentTimeMillis() - startTime);
    System.out.println(ms + " ms B " + n);
    return ms;
    }

    private static int multiB(int i) {
    return 2 * (i * i);
    }

    private static int multi(int i) {
    return 2 * i * i;
    }


    Output:



    ...
    405 ms A 785527736
    327 ms B 785527736
    404 ms A 785527736
    329 ms B 785527736
    404 ms A 785527736
    328 ms B 785527736
    404 ms A 785527736
    328 ms B 785527736
    410 ms
    333 ms


    So why?
    The Byte code is this:



     private static multiB(int arg0) { // 2 * (i * i)
    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

    L1 {
    iconst_2
    iload0
    iload0
    imul
    imul
    ireturn
    }
    L2 {
    }
    }

    private static multi(int arg0) { // 2 * i * i
    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

    L1 {
    iconst_2
    iload0
    imul
    iload0
    imul
    ireturn
    }
    L2 {
    }
    }


    The difference being:

    With brackets (2 * (i * i)):




    • push const stack

    • push local on stack

    • push local on stack

    • multiply top of stack

    • multiply top of stack


    Without brackets (2 * i * i):




    • push const stack

    • push local on stack

    • multiply top of stack

    • push local on stack

    • multiply top of stack


    Loading all on stack and then working back down is faster than switching between putting on stack and operating on it.






    share|improve this answer






























      up vote
      1
      down vote













      I tried a JMH using the default archetype: I also added optimized version based Runemoro' explanation .



      @State(Scope.Benchmark)
      @Warmup(iterations = 2)
      @Fork(1)
      @Measurement(iterations = 10)
      @OutputTimeUnit(TimeUnit.NANOSECONDS)
      //@BenchmarkMode({ Mode.All })
      @BenchmarkMode(Mode.AverageTime)
      public class MyBenchmark {
      @Param({ "100", "1000", "1000000000" })
      private int size;

      @Benchmark
      public int two_square_i() {
      int n = 0;
      for (int i = 0; i < size; i++) {
      n += 2 * (i * i);
      }
      return n;
      }

      @Benchmark
      public int square_i_two() {
      int n = 0;
      for (int i = 0; i < size; i++) {
      n += i * i;
      }
      return 2*n;
      }

      @Benchmark
      public int two_i_() {
      int n = 0;
      for (int i = 0; i < size; i++) {
      n += 2 * i * i;
      }
      return n;
      }
      }


      The result are here:



      Benchmark                           (size)  Mode  Samples          Score   Score error  Units
      o.s.MyBenchmark.square_i_two 100 avgt 10 58,062 1,410 ns/op
      o.s.MyBenchmark.square_i_two 1000 avgt 10 547,393 12,851 ns/op
      o.s.MyBenchmark.square_i_two 1000000000 avgt 10 540343681,267 16795210,324 ns/op
      o.s.MyBenchmark.two_i_ 100 avgt 10 87,491 2,004 ns/op
      o.s.MyBenchmark.two_i_ 1000 avgt 10 1015,388 30,313 ns/op
      o.s.MyBenchmark.two_i_ 1000000000 avgt 10 967100076,600 24929570,556 ns/op
      o.s.MyBenchmark.two_square_i 100 avgt 10 70,715 2,107 ns/op
      o.s.MyBenchmark.two_square_i 1000 avgt 10 686,977 24,613 ns/op
      o.s.MyBenchmark.two_square_i 1000000000 avgt 10 652736811,450 27015580,488 ns/op


      On my PC (Core i7 860, doing nothing much apart reading on my smartphone):





      • n += i*i then n*2 is first


      • 2 * (i * i) is second.


      The JVM is clearly not optimizing the same way than a human does (based on Runemoro answer).



      Now then, reading bytecode: javap -c -v ./target/classes/org/sample/MyBenchmark.class




      • Differences between 2*(i*i) (left) and 2*i*i (right) here: https://www.diffchecker.com/cvSFppWI

      • Differences between 2*(i*i) and the optimized version here: https://www.diffchecker.com/I1XFu5dP


      I am not expert on bytecode but we iload_2 before we imul: that's probably where you get the difference: I can suppose that the JVM optimize reading i twice (i is already here, there is no need to load it again) whilst in the 2*i*i it can't.





      share




























        up vote
        -1
        down vote













        I can't reproduce your claim. This java fiddle mostly gets the "opposite" result:



        enter image description here






        share|improve this answer



















        • 1




          Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
          – Krease
          1 hour ago






        • 3




          If you can't reproduce the results, comment and vote as such. This is not an answer.
          – Sotirios Delimanolis
          1 hour ago






        • 1




          Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
          – Krease
          1 hour ago






        • 1




          switch them in the main and your result should switch... this is not an answer
          – DSchmidt
          50 mins ago






        • 2




          OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
          – Caius Jard
          49 mins ago


















        up vote
        -2
        down vote













        I got similar results:



        2 * (i * i): 0.458765943 s, n=119860736
        2 * i * i: 0.580255126 s, n=119860736


        I got the SAME results if both loops were in the same program, or each was in a separate .java file/.class, executed on a separate run.



        Finally, here is a javap -c -v <.java> decompile of each:



             3: ldc           #3                  // String 2 * (i * i):
        5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
        8: invokestatic #5 // Method java/lang/System.nanoTime:()J
        8: invokestatic #5 // Method java/lang/System.nanoTime:()J
        11: lstore_1
        12: iconst_0
        13: istore_3
        14: iconst_0
        15: istore 4
        17: iload 4
        19: ldc #6 // int 1000000000
        21: if_icmpge 40
        24: iload_3
        25: iconst_2
        26: iload 4
        28: iload 4
        30: imul
        31: imul
        32: iadd
        33: istore_3
        34: iinc 4, 1
        37: goto 17


        vs.



             3: ldc           #3                  // String 2 * i * i:
        5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
        8: invokestatic #5 // Method java/lang/System.nanoTime:()J
        11: lstore_1
        12: iconst_0
        13: istore_3
        14: iconst_0
        15: istore 4
        17: iload 4
        19: ldc #6 // int 1000000000
        21: if_icmpge 40
        24: iload_3
        25: iconst_2
        26: iload 4
        28: imul
        29: iload 4
        31: imul
        32: iadd
        33: istore_3
        34: iinc 4, 1
        37: goto 17


        FYI -



        java -version
        java version "1.8.0_121"
        Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
        Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)





        share|improve this answer

















        • 1




          A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
          – nullpointer
          1 hour ago












        • @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
          – paulsm4
          57 mins ago












        • That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
          – Krease
          55 mins ago










        • With JIT, the bytecode doesn't matter much. Show the JIT code.
          – rustyx
          13 mins ago











        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














         

        draft saved


        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53452713%2fwhy-is-2-i-i-faster-than-2-i-i%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        5 Answers
        5






        active

        oldest

        votes








        5 Answers
        5






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        4
        down vote













        When the multiplication is 2 * (i * i), the JVM is able to factor out the multiplication by 2 from the loop, resulting in this equivalent but more efficient code:



        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        n += i * i;
        }
        n *= 2;


        but when the multiplication is (2 * i) * i, the JVM doesn't optimize it since the multiplication by a constant is no longer right before the addition.



        Here are a few reasons why I think this is the case:




        • Adding an if (n == 0) n = 1 statement at the start of the loop results in both versions being as efficient, since factoring out the multiplication no longer guarantees that the result will be the same

        • The optimized version (by factoring out the multiplication by 2) is exactly as fast as the 2 * (i * i) version


        Here is the test code that I used to draw these conclusions:



        public static void main(String args) {
        long fastVersion = 0;
        long slowVersion = 0;
        long optimizedVersion = 0;
        long modifiedFastVersion = 0;
        long modifiedSlowVersion = 0;

        for (int i = 0; i < 10; i++) {
        fastVersion += fastVersion();
        slowVersion += slowVersion();
        optimizedVersion += optimizedVersion();
        modifiedFastVersion += modifiedFastVersion();
        modifiedSlowVersion += modifiedSlowVersion();
        }

        System.out.println("Fast version: " + (double) fastVersion / 1000000000 + " s");
        System.out.println("Slow version: " + (double) slowVersion / 1000000000 + " s");
        System.out.println("Optimized version: " + (double) optimizedVersion / 1000000000 + " s");
        System.out.println("Modified fast version: " + (double) modifiedFastVersion / 1000000000 + " s");
        System.out.println("Modified slow version: " + (double) modifiedSlowVersion / 1000000000 + " s");
        }

        private static long fastVersion() {
        long startTime = System.nanoTime();
        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        n += 2 * (i * i);
        }
        return System.nanoTime() - startTime;
        }

        private static long slowVersion() {
        long startTime = System.nanoTime();
        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        n += 2 * i * i;
        }
        return System.nanoTime() - startTime;
        }

        private static long optimizedVersion() {
        long startTime = System.nanoTime();
        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        n += i * i;
        }
        n *= 2;
        return System.nanoTime() - startTime;
        }

        private static long modifiedFastVersion() {
        long startTime = System.nanoTime();
        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        if (n == 0) n = 1;
        n += 2 * (i * i);
        }
        return System.nanoTime() - startTime;
        }

        private static long modifiedSlowVersion() {
        long startTime = System.nanoTime();
        int n = 0;
        for (int i = 0; i < 1000000000; i++) {
        if (n == 0) n = 1;
        n += 2 * i * i;
        }
        return System.nanoTime() - startTime;
        }


        And here are the results:



        Fast version: 5.7274411 s
        Slow version: 7.6190804 s
        Optimized version: 5.1348007 s
        Modified fast version: 7.1492705 s
        Modified slow version: 7.2952668 s





        share|improve this answer

























          up vote
          4
          down vote













          When the multiplication is 2 * (i * i), the JVM is able to factor out the multiplication by 2 from the loop, resulting in this equivalent but more efficient code:



          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          n += i * i;
          }
          n *= 2;


          but when the multiplication is (2 * i) * i, the JVM doesn't optimize it since the multiplication by a constant is no longer right before the addition.



          Here are a few reasons why I think this is the case:




          • Adding an if (n == 0) n = 1 statement at the start of the loop results in both versions being as efficient, since factoring out the multiplication no longer guarantees that the result will be the same

          • The optimized version (by factoring out the multiplication by 2) is exactly as fast as the 2 * (i * i) version


          Here is the test code that I used to draw these conclusions:



          public static void main(String args) {
          long fastVersion = 0;
          long slowVersion = 0;
          long optimizedVersion = 0;
          long modifiedFastVersion = 0;
          long modifiedSlowVersion = 0;

          for (int i = 0; i < 10; i++) {
          fastVersion += fastVersion();
          slowVersion += slowVersion();
          optimizedVersion += optimizedVersion();
          modifiedFastVersion += modifiedFastVersion();
          modifiedSlowVersion += modifiedSlowVersion();
          }

          System.out.println("Fast version: " + (double) fastVersion / 1000000000 + " s");
          System.out.println("Slow version: " + (double) slowVersion / 1000000000 + " s");
          System.out.println("Optimized version: " + (double) optimizedVersion / 1000000000 + " s");
          System.out.println("Modified fast version: " + (double) modifiedFastVersion / 1000000000 + " s");
          System.out.println("Modified slow version: " + (double) modifiedSlowVersion / 1000000000 + " s");
          }

          private static long fastVersion() {
          long startTime = System.nanoTime();
          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          n += 2 * (i * i);
          }
          return System.nanoTime() - startTime;
          }

          private static long slowVersion() {
          long startTime = System.nanoTime();
          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          n += 2 * i * i;
          }
          return System.nanoTime() - startTime;
          }

          private static long optimizedVersion() {
          long startTime = System.nanoTime();
          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          n += i * i;
          }
          n *= 2;
          return System.nanoTime() - startTime;
          }

          private static long modifiedFastVersion() {
          long startTime = System.nanoTime();
          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          if (n == 0) n = 1;
          n += 2 * (i * i);
          }
          return System.nanoTime() - startTime;
          }

          private static long modifiedSlowVersion() {
          long startTime = System.nanoTime();
          int n = 0;
          for (int i = 0; i < 1000000000; i++) {
          if (n == 0) n = 1;
          n += 2 * i * i;
          }
          return System.nanoTime() - startTime;
          }


          And here are the results:



          Fast version: 5.7274411 s
          Slow version: 7.6190804 s
          Optimized version: 5.1348007 s
          Modified fast version: 7.1492705 s
          Modified slow version: 7.2952668 s





          share|improve this answer























            up vote
            4
            down vote










            up vote
            4
            down vote









            When the multiplication is 2 * (i * i), the JVM is able to factor out the multiplication by 2 from the loop, resulting in this equivalent but more efficient code:



            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += i * i;
            }
            n *= 2;


            but when the multiplication is (2 * i) * i, the JVM doesn't optimize it since the multiplication by a constant is no longer right before the addition.



            Here are a few reasons why I think this is the case:




            • Adding an if (n == 0) n = 1 statement at the start of the loop results in both versions being as efficient, since factoring out the multiplication no longer guarantees that the result will be the same

            • The optimized version (by factoring out the multiplication by 2) is exactly as fast as the 2 * (i * i) version


            Here is the test code that I used to draw these conclusions:



            public static void main(String args) {
            long fastVersion = 0;
            long slowVersion = 0;
            long optimizedVersion = 0;
            long modifiedFastVersion = 0;
            long modifiedSlowVersion = 0;

            for (int i = 0; i < 10; i++) {
            fastVersion += fastVersion();
            slowVersion += slowVersion();
            optimizedVersion += optimizedVersion();
            modifiedFastVersion += modifiedFastVersion();
            modifiedSlowVersion += modifiedSlowVersion();
            }

            System.out.println("Fast version: " + (double) fastVersion / 1000000000 + " s");
            System.out.println("Slow version: " + (double) slowVersion / 1000000000 + " s");
            System.out.println("Optimized version: " + (double) optimizedVersion / 1000000000 + " s");
            System.out.println("Modified fast version: " + (double) modifiedFastVersion / 1000000000 + " s");
            System.out.println("Modified slow version: " + (double) modifiedSlowVersion / 1000000000 + " s");
            }

            private static long fastVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += 2 * (i * i);
            }
            return System.nanoTime() - startTime;
            }

            private static long slowVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += 2 * i * i;
            }
            return System.nanoTime() - startTime;
            }

            private static long optimizedVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += i * i;
            }
            n *= 2;
            return System.nanoTime() - startTime;
            }

            private static long modifiedFastVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            if (n == 0) n = 1;
            n += 2 * (i * i);
            }
            return System.nanoTime() - startTime;
            }

            private static long modifiedSlowVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            if (n == 0) n = 1;
            n += 2 * i * i;
            }
            return System.nanoTime() - startTime;
            }


            And here are the results:



            Fast version: 5.7274411 s
            Slow version: 7.6190804 s
            Optimized version: 5.1348007 s
            Modified fast version: 7.1492705 s
            Modified slow version: 7.2952668 s





            share|improve this answer












            When the multiplication is 2 * (i * i), the JVM is able to factor out the multiplication by 2 from the loop, resulting in this equivalent but more efficient code:



            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += i * i;
            }
            n *= 2;


            but when the multiplication is (2 * i) * i, the JVM doesn't optimize it since the multiplication by a constant is no longer right before the addition.



            Here are a few reasons why I think this is the case:




            • Adding an if (n == 0) n = 1 statement at the start of the loop results in both versions being as efficient, since factoring out the multiplication no longer guarantees that the result will be the same

            • The optimized version (by factoring out the multiplication by 2) is exactly as fast as the 2 * (i * i) version


            Here is the test code that I used to draw these conclusions:



            public static void main(String args) {
            long fastVersion = 0;
            long slowVersion = 0;
            long optimizedVersion = 0;
            long modifiedFastVersion = 0;
            long modifiedSlowVersion = 0;

            for (int i = 0; i < 10; i++) {
            fastVersion += fastVersion();
            slowVersion += slowVersion();
            optimizedVersion += optimizedVersion();
            modifiedFastVersion += modifiedFastVersion();
            modifiedSlowVersion += modifiedSlowVersion();
            }

            System.out.println("Fast version: " + (double) fastVersion / 1000000000 + " s");
            System.out.println("Slow version: " + (double) slowVersion / 1000000000 + " s");
            System.out.println("Optimized version: " + (double) optimizedVersion / 1000000000 + " s");
            System.out.println("Modified fast version: " + (double) modifiedFastVersion / 1000000000 + " s");
            System.out.println("Modified slow version: " + (double) modifiedSlowVersion / 1000000000 + " s");
            }

            private static long fastVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += 2 * (i * i);
            }
            return System.nanoTime() - startTime;
            }

            private static long slowVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += 2 * i * i;
            }
            return System.nanoTime() - startTime;
            }

            private static long optimizedVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            n += i * i;
            }
            n *= 2;
            return System.nanoTime() - startTime;
            }

            private static long modifiedFastVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            if (n == 0) n = 1;
            n += 2 * (i * i);
            }
            return System.nanoTime() - startTime;
            }

            private static long modifiedSlowVersion() {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
            if (n == 0) n = 1;
            n += 2 * i * i;
            }
            return System.nanoTime() - startTime;
            }


            And here are the results:



            Fast version: 5.7274411 s
            Slow version: 7.6190804 s
            Optimized version: 5.1348007 s
            Modified fast version: 7.1492705 s
            Modified slow version: 7.2952668 s






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 27 mins ago









            Runemoro

            1,35011135




            1,35011135
























                up vote
                1
                down vote













                ByteCodes: https://cs.nyu.edu/courses/fall00/V22.0201-001/jvm2.html

                ByteCodes Viewer: https://github.com/Konloch/bytecode-viewer



                On my JDK (Win10 64 1.8.0_65-b17) I can reproduce and explain:



                public static void main(String args) {
                int repeat = 10;
                long A = 0;
                long B = 0;
                for (int i = 0; i < repeat; i++) {
                A += test();
                B += testB();
                }

                System.out.println(A / repeat + " ms");
                System.out.println(B / repeat + " ms");
                }


                private static long test() {
                int n = 0;
                for (int i = 0; i < 1000; i++) {
                n += multi(i);
                }
                long startTime = System.currentTimeMillis();
                for (int i = 0; i < 1000000000; i++) {
                n += multi(i);
                }
                long ms = (System.currentTimeMillis() - startTime);
                System.out.println(ms + " ms A " + n);
                return ms;
                }


                private static long testB() {
                int n = 0;
                for (int i = 0; i < 1000; i++) {
                n += multiB(i);
                }
                long startTime = System.currentTimeMillis();
                for (int i = 0; i < 1000000000; i++) {
                n += multiB(i);
                }
                long ms = (System.currentTimeMillis() - startTime);
                System.out.println(ms + " ms B " + n);
                return ms;
                }

                private static int multiB(int i) {
                return 2 * (i * i);
                }

                private static int multi(int i) {
                return 2 * i * i;
                }


                Output:



                ...
                405 ms A 785527736
                327 ms B 785527736
                404 ms A 785527736
                329 ms B 785527736
                404 ms A 785527736
                328 ms B 785527736
                404 ms A 785527736
                328 ms B 785527736
                410 ms
                333 ms


                So why?
                The Byte code is this:



                 private static multiB(int arg0) { // 2 * (i * i)
                <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                L1 {
                iconst_2
                iload0
                iload0
                imul
                imul
                ireturn
                }
                L2 {
                }
                }

                private static multi(int arg0) { // 2 * i * i
                <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                L1 {
                iconst_2
                iload0
                imul
                iload0
                imul
                ireturn
                }
                L2 {
                }
                }


                The difference being:

                With brackets (2 * (i * i)):




                • push const stack

                • push local on stack

                • push local on stack

                • multiply top of stack

                • multiply top of stack


                Without brackets (2 * i * i):




                • push const stack

                • push local on stack

                • multiply top of stack

                • push local on stack

                • multiply top of stack


                Loading all on stack and then working back down is faster than switching between putting on stack and operating on it.






                share|improve this answer



























                  up vote
                  1
                  down vote













                  ByteCodes: https://cs.nyu.edu/courses/fall00/V22.0201-001/jvm2.html

                  ByteCodes Viewer: https://github.com/Konloch/bytecode-viewer



                  On my JDK (Win10 64 1.8.0_65-b17) I can reproduce and explain:



                  public static void main(String args) {
                  int repeat = 10;
                  long A = 0;
                  long B = 0;
                  for (int i = 0; i < repeat; i++) {
                  A += test();
                  B += testB();
                  }

                  System.out.println(A / repeat + " ms");
                  System.out.println(B / repeat + " ms");
                  }


                  private static long test() {
                  int n = 0;
                  for (int i = 0; i < 1000; i++) {
                  n += multi(i);
                  }
                  long startTime = System.currentTimeMillis();
                  for (int i = 0; i < 1000000000; i++) {
                  n += multi(i);
                  }
                  long ms = (System.currentTimeMillis() - startTime);
                  System.out.println(ms + " ms A " + n);
                  return ms;
                  }


                  private static long testB() {
                  int n = 0;
                  for (int i = 0; i < 1000; i++) {
                  n += multiB(i);
                  }
                  long startTime = System.currentTimeMillis();
                  for (int i = 0; i < 1000000000; i++) {
                  n += multiB(i);
                  }
                  long ms = (System.currentTimeMillis() - startTime);
                  System.out.println(ms + " ms B " + n);
                  return ms;
                  }

                  private static int multiB(int i) {
                  return 2 * (i * i);
                  }

                  private static int multi(int i) {
                  return 2 * i * i;
                  }


                  Output:



                  ...
                  405 ms A 785527736
                  327 ms B 785527736
                  404 ms A 785527736
                  329 ms B 785527736
                  404 ms A 785527736
                  328 ms B 785527736
                  404 ms A 785527736
                  328 ms B 785527736
                  410 ms
                  333 ms


                  So why?
                  The Byte code is this:



                   private static multiB(int arg0) { // 2 * (i * i)
                  <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                  L1 {
                  iconst_2
                  iload0
                  iload0
                  imul
                  imul
                  ireturn
                  }
                  L2 {
                  }
                  }

                  private static multi(int arg0) { // 2 * i * i
                  <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                  L1 {
                  iconst_2
                  iload0
                  imul
                  iload0
                  imul
                  ireturn
                  }
                  L2 {
                  }
                  }


                  The difference being:

                  With brackets (2 * (i * i)):




                  • push const stack

                  • push local on stack

                  • push local on stack

                  • multiply top of stack

                  • multiply top of stack


                  Without brackets (2 * i * i):




                  • push const stack

                  • push local on stack

                  • multiply top of stack

                  • push local on stack

                  • multiply top of stack


                  Loading all on stack and then working back down is faster than switching between putting on stack and operating on it.






                  share|improve this answer

























                    up vote
                    1
                    down vote










                    up vote
                    1
                    down vote









                    ByteCodes: https://cs.nyu.edu/courses/fall00/V22.0201-001/jvm2.html

                    ByteCodes Viewer: https://github.com/Konloch/bytecode-viewer



                    On my JDK (Win10 64 1.8.0_65-b17) I can reproduce and explain:



                    public static void main(String args) {
                    int repeat = 10;
                    long A = 0;
                    long B = 0;
                    for (int i = 0; i < repeat; i++) {
                    A += test();
                    B += testB();
                    }

                    System.out.println(A / repeat + " ms");
                    System.out.println(B / repeat + " ms");
                    }


                    private static long test() {
                    int n = 0;
                    for (int i = 0; i < 1000; i++) {
                    n += multi(i);
                    }
                    long startTime = System.currentTimeMillis();
                    for (int i = 0; i < 1000000000; i++) {
                    n += multi(i);
                    }
                    long ms = (System.currentTimeMillis() - startTime);
                    System.out.println(ms + " ms A " + n);
                    return ms;
                    }


                    private static long testB() {
                    int n = 0;
                    for (int i = 0; i < 1000; i++) {
                    n += multiB(i);
                    }
                    long startTime = System.currentTimeMillis();
                    for (int i = 0; i < 1000000000; i++) {
                    n += multiB(i);
                    }
                    long ms = (System.currentTimeMillis() - startTime);
                    System.out.println(ms + " ms B " + n);
                    return ms;
                    }

                    private static int multiB(int i) {
                    return 2 * (i * i);
                    }

                    private static int multi(int i) {
                    return 2 * i * i;
                    }


                    Output:



                    ...
                    405 ms A 785527736
                    327 ms B 785527736
                    404 ms A 785527736
                    329 ms B 785527736
                    404 ms A 785527736
                    328 ms B 785527736
                    404 ms A 785527736
                    328 ms B 785527736
                    410 ms
                    333 ms


                    So why?
                    The Byte code is this:



                     private static multiB(int arg0) { // 2 * (i * i)
                    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                    L1 {
                    iconst_2
                    iload0
                    iload0
                    imul
                    imul
                    ireturn
                    }
                    L2 {
                    }
                    }

                    private static multi(int arg0) { // 2 * i * i
                    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                    L1 {
                    iconst_2
                    iload0
                    imul
                    iload0
                    imul
                    ireturn
                    }
                    L2 {
                    }
                    }


                    The difference being:

                    With brackets (2 * (i * i)):




                    • push const stack

                    • push local on stack

                    • push local on stack

                    • multiply top of stack

                    • multiply top of stack


                    Without brackets (2 * i * i):




                    • push const stack

                    • push local on stack

                    • multiply top of stack

                    • push local on stack

                    • multiply top of stack


                    Loading all on stack and then working back down is faster than switching between putting on stack and operating on it.






                    share|improve this answer














                    ByteCodes: https://cs.nyu.edu/courses/fall00/V22.0201-001/jvm2.html

                    ByteCodes Viewer: https://github.com/Konloch/bytecode-viewer



                    On my JDK (Win10 64 1.8.0_65-b17) I can reproduce and explain:



                    public static void main(String args) {
                    int repeat = 10;
                    long A = 0;
                    long B = 0;
                    for (int i = 0; i < repeat; i++) {
                    A += test();
                    B += testB();
                    }

                    System.out.println(A / repeat + " ms");
                    System.out.println(B / repeat + " ms");
                    }


                    private static long test() {
                    int n = 0;
                    for (int i = 0; i < 1000; i++) {
                    n += multi(i);
                    }
                    long startTime = System.currentTimeMillis();
                    for (int i = 0; i < 1000000000; i++) {
                    n += multi(i);
                    }
                    long ms = (System.currentTimeMillis() - startTime);
                    System.out.println(ms + " ms A " + n);
                    return ms;
                    }


                    private static long testB() {
                    int n = 0;
                    for (int i = 0; i < 1000; i++) {
                    n += multiB(i);
                    }
                    long startTime = System.currentTimeMillis();
                    for (int i = 0; i < 1000000000; i++) {
                    n += multiB(i);
                    }
                    long ms = (System.currentTimeMillis() - startTime);
                    System.out.println(ms + " ms B " + n);
                    return ms;
                    }

                    private static int multiB(int i) {
                    return 2 * (i * i);
                    }

                    private static int multi(int i) {
                    return 2 * i * i;
                    }


                    Output:



                    ...
                    405 ms A 785527736
                    327 ms B 785527736
                    404 ms A 785527736
                    329 ms B 785527736
                    404 ms A 785527736
                    328 ms B 785527736
                    404 ms A 785527736
                    328 ms B 785527736
                    410 ms
                    333 ms


                    So why?
                    The Byte code is this:



                     private static multiB(int arg0) { // 2 * (i * i)
                    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                    L1 {
                    iconst_2
                    iload0
                    iload0
                    imul
                    imul
                    ireturn
                    }
                    L2 {
                    }
                    }

                    private static multi(int arg0) { // 2 * i * i
                    <localVar:index=0 , name=i , desc=I, sig=null, start=L1, end=L2>

                    L1 {
                    iconst_2
                    iload0
                    imul
                    iload0
                    imul
                    ireturn
                    }
                    L2 {
                    }
                    }


                    The difference being:

                    With brackets (2 * (i * i)):




                    • push const stack

                    • push local on stack

                    • push local on stack

                    • multiply top of stack

                    • multiply top of stack


                    Without brackets (2 * i * i):




                    • push const stack

                    • push local on stack

                    • multiply top of stack

                    • push local on stack

                    • multiply top of stack


                    Loading all on stack and then working back down is faster than switching between putting on stack and operating on it.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited 42 mins ago

























                    answered 53 mins ago









                    DSchmidt

                    18019




                    18019






















                        up vote
                        1
                        down vote













                        I tried a JMH using the default archetype: I also added optimized version based Runemoro' explanation .



                        @State(Scope.Benchmark)
                        @Warmup(iterations = 2)
                        @Fork(1)
                        @Measurement(iterations = 10)
                        @OutputTimeUnit(TimeUnit.NANOSECONDS)
                        //@BenchmarkMode({ Mode.All })
                        @BenchmarkMode(Mode.AverageTime)
                        public class MyBenchmark {
                        @Param({ "100", "1000", "1000000000" })
                        private int size;

                        @Benchmark
                        public int two_square_i() {
                        int n = 0;
                        for (int i = 0; i < size; i++) {
                        n += 2 * (i * i);
                        }
                        return n;
                        }

                        @Benchmark
                        public int square_i_two() {
                        int n = 0;
                        for (int i = 0; i < size; i++) {
                        n += i * i;
                        }
                        return 2*n;
                        }

                        @Benchmark
                        public int two_i_() {
                        int n = 0;
                        for (int i = 0; i < size; i++) {
                        n += 2 * i * i;
                        }
                        return n;
                        }
                        }


                        The result are here:



                        Benchmark                           (size)  Mode  Samples          Score   Score error  Units
                        o.s.MyBenchmark.square_i_two 100 avgt 10 58,062 1,410 ns/op
                        o.s.MyBenchmark.square_i_two 1000 avgt 10 547,393 12,851 ns/op
                        o.s.MyBenchmark.square_i_two 1000000000 avgt 10 540343681,267 16795210,324 ns/op
                        o.s.MyBenchmark.two_i_ 100 avgt 10 87,491 2,004 ns/op
                        o.s.MyBenchmark.two_i_ 1000 avgt 10 1015,388 30,313 ns/op
                        o.s.MyBenchmark.two_i_ 1000000000 avgt 10 967100076,600 24929570,556 ns/op
                        o.s.MyBenchmark.two_square_i 100 avgt 10 70,715 2,107 ns/op
                        o.s.MyBenchmark.two_square_i 1000 avgt 10 686,977 24,613 ns/op
                        o.s.MyBenchmark.two_square_i 1000000000 avgt 10 652736811,450 27015580,488 ns/op


                        On my PC (Core i7 860, doing nothing much apart reading on my smartphone):





                        • n += i*i then n*2 is first


                        • 2 * (i * i) is second.


                        The JVM is clearly not optimizing the same way than a human does (based on Runemoro answer).



                        Now then, reading bytecode: javap -c -v ./target/classes/org/sample/MyBenchmark.class




                        • Differences between 2*(i*i) (left) and 2*i*i (right) here: https://www.diffchecker.com/cvSFppWI

                        • Differences between 2*(i*i) and the optimized version here: https://www.diffchecker.com/I1XFu5dP


                        I am not expert on bytecode but we iload_2 before we imul: that's probably where you get the difference: I can suppose that the JVM optimize reading i twice (i is already here, there is no need to load it again) whilst in the 2*i*i it can't.





                        share

























                          up vote
                          1
                          down vote













                          I tried a JMH using the default archetype: I also added optimized version based Runemoro' explanation .



                          @State(Scope.Benchmark)
                          @Warmup(iterations = 2)
                          @Fork(1)
                          @Measurement(iterations = 10)
                          @OutputTimeUnit(TimeUnit.NANOSECONDS)
                          //@BenchmarkMode({ Mode.All })
                          @BenchmarkMode(Mode.AverageTime)
                          public class MyBenchmark {
                          @Param({ "100", "1000", "1000000000" })
                          private int size;

                          @Benchmark
                          public int two_square_i() {
                          int n = 0;
                          for (int i = 0; i < size; i++) {
                          n += 2 * (i * i);
                          }
                          return n;
                          }

                          @Benchmark
                          public int square_i_two() {
                          int n = 0;
                          for (int i = 0; i < size; i++) {
                          n += i * i;
                          }
                          return 2*n;
                          }

                          @Benchmark
                          public int two_i_() {
                          int n = 0;
                          for (int i = 0; i < size; i++) {
                          n += 2 * i * i;
                          }
                          return n;
                          }
                          }


                          The result are here:



                          Benchmark                           (size)  Mode  Samples          Score   Score error  Units
                          o.s.MyBenchmark.square_i_two 100 avgt 10 58,062 1,410 ns/op
                          o.s.MyBenchmark.square_i_two 1000 avgt 10 547,393 12,851 ns/op
                          o.s.MyBenchmark.square_i_two 1000000000 avgt 10 540343681,267 16795210,324 ns/op
                          o.s.MyBenchmark.two_i_ 100 avgt 10 87,491 2,004 ns/op
                          o.s.MyBenchmark.two_i_ 1000 avgt 10 1015,388 30,313 ns/op
                          o.s.MyBenchmark.two_i_ 1000000000 avgt 10 967100076,600 24929570,556 ns/op
                          o.s.MyBenchmark.two_square_i 100 avgt 10 70,715 2,107 ns/op
                          o.s.MyBenchmark.two_square_i 1000 avgt 10 686,977 24,613 ns/op
                          o.s.MyBenchmark.two_square_i 1000000000 avgt 10 652736811,450 27015580,488 ns/op


                          On my PC (Core i7 860, doing nothing much apart reading on my smartphone):





                          • n += i*i then n*2 is first


                          • 2 * (i * i) is second.


                          The JVM is clearly not optimizing the same way than a human does (based on Runemoro answer).



                          Now then, reading bytecode: javap -c -v ./target/classes/org/sample/MyBenchmark.class




                          • Differences between 2*(i*i) (left) and 2*i*i (right) here: https://www.diffchecker.com/cvSFppWI

                          • Differences between 2*(i*i) and the optimized version here: https://www.diffchecker.com/I1XFu5dP


                          I am not expert on bytecode but we iload_2 before we imul: that's probably where you get the difference: I can suppose that the JVM optimize reading i twice (i is already here, there is no need to load it again) whilst in the 2*i*i it can't.





                          share























                            up vote
                            1
                            down vote










                            up vote
                            1
                            down vote









                            I tried a JMH using the default archetype: I also added optimized version based Runemoro' explanation .



                            @State(Scope.Benchmark)
                            @Warmup(iterations = 2)
                            @Fork(1)
                            @Measurement(iterations = 10)
                            @OutputTimeUnit(TimeUnit.NANOSECONDS)
                            //@BenchmarkMode({ Mode.All })
                            @BenchmarkMode(Mode.AverageTime)
                            public class MyBenchmark {
                            @Param({ "100", "1000", "1000000000" })
                            private int size;

                            @Benchmark
                            public int two_square_i() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += 2 * (i * i);
                            }
                            return n;
                            }

                            @Benchmark
                            public int square_i_two() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += i * i;
                            }
                            return 2*n;
                            }

                            @Benchmark
                            public int two_i_() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += 2 * i * i;
                            }
                            return n;
                            }
                            }


                            The result are here:



                            Benchmark                           (size)  Mode  Samples          Score   Score error  Units
                            o.s.MyBenchmark.square_i_two 100 avgt 10 58,062 1,410 ns/op
                            o.s.MyBenchmark.square_i_two 1000 avgt 10 547,393 12,851 ns/op
                            o.s.MyBenchmark.square_i_two 1000000000 avgt 10 540343681,267 16795210,324 ns/op
                            o.s.MyBenchmark.two_i_ 100 avgt 10 87,491 2,004 ns/op
                            o.s.MyBenchmark.two_i_ 1000 avgt 10 1015,388 30,313 ns/op
                            o.s.MyBenchmark.two_i_ 1000000000 avgt 10 967100076,600 24929570,556 ns/op
                            o.s.MyBenchmark.two_square_i 100 avgt 10 70,715 2,107 ns/op
                            o.s.MyBenchmark.two_square_i 1000 avgt 10 686,977 24,613 ns/op
                            o.s.MyBenchmark.two_square_i 1000000000 avgt 10 652736811,450 27015580,488 ns/op


                            On my PC (Core i7 860, doing nothing much apart reading on my smartphone):





                            • n += i*i then n*2 is first


                            • 2 * (i * i) is second.


                            The JVM is clearly not optimizing the same way than a human does (based on Runemoro answer).



                            Now then, reading bytecode: javap -c -v ./target/classes/org/sample/MyBenchmark.class




                            • Differences between 2*(i*i) (left) and 2*i*i (right) here: https://www.diffchecker.com/cvSFppWI

                            • Differences between 2*(i*i) and the optimized version here: https://www.diffchecker.com/I1XFu5dP


                            I am not expert on bytecode but we iload_2 before we imul: that's probably where you get the difference: I can suppose that the JVM optimize reading i twice (i is already here, there is no need to load it again) whilst in the 2*i*i it can't.





                            share












                            I tried a JMH using the default archetype: I also added optimized version based Runemoro' explanation .



                            @State(Scope.Benchmark)
                            @Warmup(iterations = 2)
                            @Fork(1)
                            @Measurement(iterations = 10)
                            @OutputTimeUnit(TimeUnit.NANOSECONDS)
                            //@BenchmarkMode({ Mode.All })
                            @BenchmarkMode(Mode.AverageTime)
                            public class MyBenchmark {
                            @Param({ "100", "1000", "1000000000" })
                            private int size;

                            @Benchmark
                            public int two_square_i() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += 2 * (i * i);
                            }
                            return n;
                            }

                            @Benchmark
                            public int square_i_two() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += i * i;
                            }
                            return 2*n;
                            }

                            @Benchmark
                            public int two_i_() {
                            int n = 0;
                            for (int i = 0; i < size; i++) {
                            n += 2 * i * i;
                            }
                            return n;
                            }
                            }


                            The result are here:



                            Benchmark                           (size)  Mode  Samples          Score   Score error  Units
                            o.s.MyBenchmark.square_i_two 100 avgt 10 58,062 1,410 ns/op
                            o.s.MyBenchmark.square_i_two 1000 avgt 10 547,393 12,851 ns/op
                            o.s.MyBenchmark.square_i_two 1000000000 avgt 10 540343681,267 16795210,324 ns/op
                            o.s.MyBenchmark.two_i_ 100 avgt 10 87,491 2,004 ns/op
                            o.s.MyBenchmark.two_i_ 1000 avgt 10 1015,388 30,313 ns/op
                            o.s.MyBenchmark.two_i_ 1000000000 avgt 10 967100076,600 24929570,556 ns/op
                            o.s.MyBenchmark.two_square_i 100 avgt 10 70,715 2,107 ns/op
                            o.s.MyBenchmark.two_square_i 1000 avgt 10 686,977 24,613 ns/op
                            o.s.MyBenchmark.two_square_i 1000000000 avgt 10 652736811,450 27015580,488 ns/op


                            On my PC (Core i7 860, doing nothing much apart reading on my smartphone):





                            • n += i*i then n*2 is first


                            • 2 * (i * i) is second.


                            The JVM is clearly not optimizing the same way than a human does (based on Runemoro answer).



                            Now then, reading bytecode: javap -c -v ./target/classes/org/sample/MyBenchmark.class




                            • Differences between 2*(i*i) (left) and 2*i*i (right) here: https://www.diffchecker.com/cvSFppWI

                            • Differences between 2*(i*i) and the optimized version here: https://www.diffchecker.com/I1XFu5dP


                            I am not expert on bytecode but we iload_2 before we imul: that's probably where you get the difference: I can suppose that the JVM optimize reading i twice (i is already here, there is no need to load it again) whilst in the 2*i*i it can't.






                            share











                            share


                            share










                            answered 1 min ago









                            NoDataFound

                            5,3161739




                            5,3161739






















                                up vote
                                -1
                                down vote













                                I can't reproduce your claim. This java fiddle mostly gets the "opposite" result:



                                enter image description here






                                share|improve this answer



















                                • 1




                                  Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                  – Krease
                                  1 hour ago






                                • 3




                                  If you can't reproduce the results, comment and vote as such. This is not an answer.
                                  – Sotirios Delimanolis
                                  1 hour ago






                                • 1




                                  Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                  – Krease
                                  1 hour ago






                                • 1




                                  switch them in the main and your result should switch... this is not an answer
                                  – DSchmidt
                                  50 mins ago






                                • 2




                                  OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                  – Caius Jard
                                  49 mins ago















                                up vote
                                -1
                                down vote













                                I can't reproduce your claim. This java fiddle mostly gets the "opposite" result:



                                enter image description here






                                share|improve this answer



















                                • 1




                                  Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                  – Krease
                                  1 hour ago






                                • 3




                                  If you can't reproduce the results, comment and vote as such. This is not an answer.
                                  – Sotirios Delimanolis
                                  1 hour ago






                                • 1




                                  Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                  – Krease
                                  1 hour ago






                                • 1




                                  switch them in the main and your result should switch... this is not an answer
                                  – DSchmidt
                                  50 mins ago






                                • 2




                                  OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                  – Caius Jard
                                  49 mins ago













                                up vote
                                -1
                                down vote










                                up vote
                                -1
                                down vote









                                I can't reproduce your claim. This java fiddle mostly gets the "opposite" result:



                                enter image description here






                                share|improve this answer














                                I can't reproduce your claim. This java fiddle mostly gets the "opposite" result:



                                enter image description here







                                share|improve this answer














                                share|improve this answer



                                share|improve this answer








                                edited 46 mins ago

























                                answered 1 hour ago









                                Caius Jard

                                7,83111135




                                7,83111135








                                • 1




                                  Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                  – Krease
                                  1 hour ago






                                • 3




                                  If you can't reproduce the results, comment and vote as such. This is not an answer.
                                  – Sotirios Delimanolis
                                  1 hour ago






                                • 1




                                  Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                  – Krease
                                  1 hour ago






                                • 1




                                  switch them in the main and your result should switch... this is not an answer
                                  – DSchmidt
                                  50 mins ago






                                • 2




                                  OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                  – Caius Jard
                                  49 mins ago














                                • 1




                                  Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                  – Krease
                                  1 hour ago






                                • 3




                                  If you can't reproduce the results, comment and vote as such. This is not an answer.
                                  – Sotirios Delimanolis
                                  1 hour ago






                                • 1




                                  Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                  – Krease
                                  1 hour ago






                                • 1




                                  switch them in the main and your result should switch... this is not an answer
                                  – DSchmidt
                                  50 mins ago






                                • 2




                                  OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                  – Caius Jard
                                  49 mins ago








                                1




                                1




                                Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                – Krease
                                1 hour ago




                                Note - I wouldn't trust running a profile through an online service that is sharing CPU time with all the other requests going through it - so much more margin for error this way.
                                – Krease
                                1 hour ago




                                3




                                3




                                If you can't reproduce the results, comment and vote as such. This is not an answer.
                                – Sotirios Delimanolis
                                1 hour ago




                                If you can't reproduce the results, comment and vote as such. This is not an answer.
                                – Sotirios Delimanolis
                                1 hour ago




                                1




                                1




                                Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                – Krease
                                1 hour ago




                                Not completely resistant, but definitely much more resistant, since my own machine isn't hosting a webapp allowing people to run arbitrary programs on it
                                – Krease
                                1 hour ago




                                1




                                1




                                switch them in the main and your result should switch... this is not an answer
                                – DSchmidt
                                50 mins ago




                                switch them in the main and your result should switch... this is not an answer
                                – DSchmidt
                                50 mins ago




                                2




                                2




                                OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                – Caius Jard
                                49 mins ago




                                OK, it's not an answer - can't post an image in a comment, and that fiddle can't save a url to put in a comment.. I think this question isn't a question, as much as this answer isn't an answer
                                – Caius Jard
                                49 mins ago










                                up vote
                                -2
                                down vote













                                I got similar results:



                                2 * (i * i): 0.458765943 s, n=119860736
                                2 * i * i: 0.580255126 s, n=119860736


                                I got the SAME results if both loops were in the same program, or each was in a separate .java file/.class, executed on a separate run.



                                Finally, here is a javap -c -v <.java> decompile of each:



                                     3: ldc           #3                  // String 2 * (i * i):
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: iload 4
                                30: imul
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                vs.



                                     3: ldc           #3                  // String 2 * i * i:
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: imul
                                29: iload 4
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                FYI -



                                java -version
                                java version "1.8.0_121"
                                Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
                                Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)





                                share|improve this answer

















                                • 1




                                  A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                  – nullpointer
                                  1 hour ago












                                • @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                  – paulsm4
                                  57 mins ago












                                • That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                  – Krease
                                  55 mins ago










                                • With JIT, the bytecode doesn't matter much. Show the JIT code.
                                  – rustyx
                                  13 mins ago















                                up vote
                                -2
                                down vote













                                I got similar results:



                                2 * (i * i): 0.458765943 s, n=119860736
                                2 * i * i: 0.580255126 s, n=119860736


                                I got the SAME results if both loops were in the same program, or each was in a separate .java file/.class, executed on a separate run.



                                Finally, here is a javap -c -v <.java> decompile of each:



                                     3: ldc           #3                  // String 2 * (i * i):
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: iload 4
                                30: imul
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                vs.



                                     3: ldc           #3                  // String 2 * i * i:
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: imul
                                29: iload 4
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                FYI -



                                java -version
                                java version "1.8.0_121"
                                Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
                                Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)





                                share|improve this answer

















                                • 1




                                  A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                  – nullpointer
                                  1 hour ago












                                • @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                  – paulsm4
                                  57 mins ago












                                • That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                  – Krease
                                  55 mins ago










                                • With JIT, the bytecode doesn't matter much. Show the JIT code.
                                  – rustyx
                                  13 mins ago













                                up vote
                                -2
                                down vote










                                up vote
                                -2
                                down vote









                                I got similar results:



                                2 * (i * i): 0.458765943 s, n=119860736
                                2 * i * i: 0.580255126 s, n=119860736


                                I got the SAME results if both loops were in the same program, or each was in a separate .java file/.class, executed on a separate run.



                                Finally, here is a javap -c -v <.java> decompile of each:



                                     3: ldc           #3                  // String 2 * (i * i):
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: iload 4
                                30: imul
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                vs.



                                     3: ldc           #3                  // String 2 * i * i:
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: imul
                                29: iload 4
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                FYI -



                                java -version
                                java version "1.8.0_121"
                                Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
                                Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)





                                share|improve this answer












                                I got similar results:



                                2 * (i * i): 0.458765943 s, n=119860736
                                2 * i * i: 0.580255126 s, n=119860736


                                I got the SAME results if both loops were in the same program, or each was in a separate .java file/.class, executed on a separate run.



                                Finally, here is a javap -c -v <.java> decompile of each:



                                     3: ldc           #3                  // String 2 * (i * i):
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: iload 4
                                30: imul
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                vs.



                                     3: ldc           #3                  // String 2 * i * i:
                                5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
                                8: invokestatic #5 // Method java/lang/System.nanoTime:()J
                                11: lstore_1
                                12: iconst_0
                                13: istore_3
                                14: iconst_0
                                15: istore 4
                                17: iload 4
                                19: ldc #6 // int 1000000000
                                21: if_icmpge 40
                                24: iload_3
                                25: iconst_2
                                26: iload 4
                                28: imul
                                29: iload 4
                                31: imul
                                32: iadd
                                33: istore_3
                                34: iinc 4, 1
                                37: goto 17


                                FYI -



                                java -version
                                java version "1.8.0_121"
                                Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
                                Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)






                                share|improve this answer












                                share|improve this answer



                                share|improve this answer










                                answered 1 hour ago









                                paulsm4

                                75.6k898122




                                75.6k898122








                                • 1




                                  A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                  – nullpointer
                                  1 hour ago












                                • @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                  – paulsm4
                                  57 mins ago












                                • That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                  – Krease
                                  55 mins ago










                                • With JIT, the bytecode doesn't matter much. Show the JIT code.
                                  – rustyx
                                  13 mins ago














                                • 1




                                  A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                  – nullpointer
                                  1 hour ago












                                • @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                  – paulsm4
                                  57 mins ago












                                • That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                  – Krease
                                  55 mins ago










                                • With JIT, the bytecode doesn't matter much. Show the JIT code.
                                  – rustyx
                                  13 mins ago








                                1




                                1




                                A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                – nullpointer
                                1 hour ago






                                A better answer and maybe you can vote to undelete - stackoverflow.com/a/53452836/1746118 ... Side note - I am not the downvoter anyway.
                                – nullpointer
                                1 hour ago














                                @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                – paulsm4
                                57 mins ago






                                @nullpointer - I agree. I'd definitely vote to undelete, if I could. I'd also like to "double upvote" stefan for giving a quantitative definition of "significant"
                                – paulsm4
                                57 mins ago














                                That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                – Krease
                                55 mins ago




                                That one was self-deleted since it measured the wrong thing - see that author's comment on the question above
                                – Krease
                                55 mins ago












                                With JIT, the bytecode doesn't matter much. Show the JIT code.
                                – rustyx
                                13 mins ago




                                With JIT, the bytecode doesn't matter much. Show the JIT code.
                                – rustyx
                                13 mins ago


















                                 

                                draft saved


                                draft discarded



















































                                 


                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53452713%2fwhy-is-2-i-i-faster-than-2-i-i%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                Mouse cursor on multiple screens with different PPI

                                Agildo Ribeiro

                                Sometime when accessing a menu: “Ubuntu 16.04 has experienced an internal error”