How To Run Simple Micro-Benchmarks in Java

Image Link

There was a time when I used to love running micro-benchmarks. I enjoyed finding out how much time it took to concatenate a million strings using the '+' operator vs. using a StringBuffer, or was the enhanced for loop slower than a regular for loop while iterating collections? All sorts of things. The numbers and ratios I derived from these tests made me happy and since I taught programming in my free time, I enjoyed sharing that information with my students.

Running micro-benchmarks can be fun, but you have to do them right, otherwise the results could come out heavily skewed. Sometimes the results could be skewed even if you do all the right things, and many people warn against running any kind of micro-benchmarks, but I don't agree with them. In most cases the results do give some idea of which approach of tackling a problem, or just making code faster would be better.

We'll take a very simple example -- of iterating across a collection -- and show how one approach is clearly slower than the other.

for(int i=0; i<list.size(); i++) {

// do something with the list element

}

Code Example 1

The code shown above in Code Example 1 is a very simple for loop that iterates across all the elements of a List. Do you notice something unusual in there? The list.size() method is inside the for loop causing it to be invoked in every iteration. Intuitively, a faster approach to write the same code is to remove the size check and put it before the for loop as shown below in Code Example 2.

int size = list.size();

for(int i=0; i<size; i++) {

// do something with the list element

}

Code Example 1

Keep in mind that this article is about running micro-benchmarks and not about the correct way to iterate across Lists. Using an Iterator or an enhanced for loop is the recommended way to iterate across collections. Primitive for loops are certainly not recommended for this task, and yet I have chosen them because the code is simple to understand and it makes a great example for showing how to run a micro-benchmark.

Take a quick look at the code, most of it is self explanatory, but I will explain it briefly along with the important parts. Pay special attention to the "warm up" part of the code.

import java.util.ArrayList;

import java.util.List;

public class PerformantIteration {

private static int theSum = 0;

public static void main(String[] args) {

System. out.println("Starting micro-benchmark test" );

// Add 50K elements to the ArrayList

List<Integer> nums = new ArrayList<Integer>();

for(int i=0; i<50000; i++) {

nums.add(i);

}

System. out.println("Warming up ..." );

//warm up... make sure all JIT compiling is done before the actual benchmarking begins

for(int i=0; i<10; i++) {

iterateWithConstantSize(nums);

iterateWithDynamicSize(nums);

}

//start benchmarks

System. out.println("Starting the actual test" );

long constantSizeBenchmark = iterateWithConstantSize(nums);

long dynamicSizeBenchmark = iterateWithDynamicSize(nums);

System. out.println("Test completed... printing results" );

System. out.println("constantSizeBenchmark : " + constantSizeBenchmark);

System. out.println("dynamicSizeBenchmark : " + dynamicSizeBenchmark);

System. out.println("dynamicSizeBenchmark/constantSizeBenchmark : " +

((double)dynamicSizeBenchmark/(double)constantSizeBenchmark));

}

private static long iterateWithDynamicSize(List<Integer> nums) {

int sum=0;

long start = System.nanoTime();

for(int i=0; i<nums.size(); i++) {

// appear to do something useful

sum += nums.get(i);

}

long end = System.nanoTime();

setSum(sum);

return end-start;

}

private static long iterateWithConstantSize(List<Integer> nums) {

int count = nums.size();

int sum=0;

long start = System.nanoTime();

for(int i=0; i<count; i++) {

// appear to do something useful

sum += nums.get(i);

}

long end = System.nanoTime();

setSum(sum);

return end-start;

}

// invocations to this method simply exist to fool the VM into

// thinking that we are doing something useful in the loop

private static void setSum(int sum) {

theSum = sum++;

}

Code Example 3

As I had said earlier, the code is fairly simple. We create an ArrayList and add 50K integers to it. We are going to iterate the list using both the methods we described at the beginning of this article. But just iterating and keeping time will not give us accurate results. This is because classes need to be loaded the first time we create them, so the first attempt to benchmark anything will always appear slower because of the time it takes to load classes. Besides this, the JVM's JIT Compiler performs a lot of optimizations as the code runs, which could cause the first few runs to be a bit slower than subsequent ones. And yet another factor is the working of the Garbage Collector, which can significantly slow down the program when it runs.

To ensure our results don't get skewed because of these factors, it is highly recommended to have a few warm up runs before running the actual test. The warm up runs will ensure that all required classes are loaded and all JIT optimizations are also done. In the code shown above the warm up code invokes both the test methods 10 times to ensure that the actual test run occurs without any interference.

Adding warm up code is definitely better than not doing so, but it still feels a bit like shooting in the dark. How do we know that all the actual test code really ran without any interference? It's hard to place any kind of confidence in the test results without such a guarantee. Even though it is hard to offer such a guarantee, we can print the inner workings of the JVM and manually verify if the test code was able to run without any interference. Using the -XX:+PrintCompilation and -verbose:gc flags will make the JVM print out all the information we need.

What we see below is the result from running the code in Code Example 3. Examining the log tells us that all class loading was done before the actual tests ran, and that the tests ran without any interference - or at least without any detectable interference. I'd once again like to stress that just because we did not detect anything, does not guarantee a clean run. We cannot use these results for a scientific analysis, but in my experience we can use them to guide our code with at least some amount of confidence.

    152   1       java.lang.String::charAt (33 bytes)
    160   2       java.lang.String::indexOf (151 bytes)
    165   3Starting microbenchmark on iterating over collections with a call to size() in each iteration       java.lang.String::hashCode (60 bytes)
    171   4       sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (490 bytes)
    183   5
       java.lang.String::lastIndexOf (156 bytes)
    197   6       java.io.UnixFileSystem::normalize (75 bytes)
    200   7       java.lang.Object::<init> (1 bytes)
    205   8       java.lang.Number::<init> (5 bytes)
    206   9       java.lang.Integer::<init> (10 bytes)
    211 10       java.util.ArrayList::add (29 bytes)
    211 11       java.util.ArrayList::ensureCapacity (58 bytes)
    217 12       java.lang.Integer::valueOf (35 bytes)
    221   1%      performance.api.PerformantIteration::main @ 21 (173 bytes)
Warming up ...
    252 13       java.util.ArrayList::get (11 bytes)
    252 14       java.util.ArrayList::rangeCheck (22 bytes)
    253 15       java.util.ArrayList::elementData (7 bytes)
    260   2%      performance.api.PerformantIteration::iterateWithConstantSize @ 19 (59 bytes)
    268   3%      performance.api.PerformantIteration::iterateWithDynamicSize @ 12 (57 bytes)
    272 16       performance.api.PerformantIteration::iterateWithConstantSize (59 bytes)
    278 17       performance.api.PerformantIteration::iterateWithDynamicSize (57 bytes)
Starting the actual test
Test completed... printing results
constantSizeBenchmark : 301688
dynamicSizeBenchmark : 782602
dynamicSizeBenchmark/constantSizeBenchmark : 2.5940773249184588

And finally, as you can see calling the size() method on every iteration causes the code to be almost 2.5 times slower. The results could vary heavily depending on the JDK/JVM and the actual machine/architecture. This test was run with JDK 1.6 on a Intel i5, 2.5 GHz machine with 8 GB RAM.