Why does Java switch on contiguous ints appear to run faster with added cases?
Asked 07 September, 2021

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying double variables by 10 raised to arbitrary non-negative int exponents. One fast way (edit: but not the fastest possible, see Update 2 below) to get the multiplied value is to switch on the exponent:

// ParseException is java.text.ParseException (checked, so it must be declared)
double multiplyByPowerOfTen(final double d, final int exponent) throws ParseException {
   switch (exponent) {
      case 0:
         return d;
      case 1:
         return d*10;
      case 2:
         return d*100;
      // ... same pattern
      case 9:
         return d*1000000000;
      case 10:
         return d*10000000000L;
      // ... same pattern with long literals
      case 18:
         return d*1000000000000000000L;
      default:
         // any exponent outside 0..18
         throw new ParseException("Unhandled power of ten " + exponent, 0);
   }
}

The commented ellipses above indicate that the case int constants continue incrementing by 1, so there are really 19 cases in the above code snippet. Since I wasn't sure whether I would actually need all the powers of 10 in case statements 10 thru 18, I ran some microbenchmarks comparing the time to complete 10 million operations with this switch statement versus a switch with only cases 0 thru 9 (with the exponent limited to 9 or less to avoid breaking the pared-down switch). I got the rather surprising (to me, at least!) result that the longer switch with more case statements actually ran faster.

On a lark, I tried adding even more cases which just returned dummy values, and found that I could get the switch to run even faster with around 22-27 declared cases (even though those dummy cases are never actually hit while the code is running). (Again, cases were added in a contiguous fashion by incrementing the prior case constant by 1.) These execution time differences are not very significant: for a random exponent between 0 and 10, the dummy padded switch statement finishes 10 million executions in 1.49 secs versus 1.54 secs for the unpadded version, for a grand total savings of 5ns per execution. So, not the kind of thing that makes obsessing over padding out a switch statement worth the effort from an optimization standpoint. But I still just find it curious and counter-intuitive that a switch doesn't become slower (or perhaps at best maintain constant O(1) time) to execute as more cases are added to it.

[Figure: switch benchmarking results]

These are the results I obtained from running with various limits on the randomly-generated exponent values. I didn't include the results all the way down to 1 for the exponent limit, but the general shape of the curve remains the same, with a ridge around the 12-17 case mark, and a valley between 18-28. All tests were run in JUnitBenchmarks using shared containers for the random values to ensure identical testing inputs. I also ran the tests both in order from longest switch statement to shortest, and vice-versa, to try and eliminate the possibility of ordering-related test problems. I've put my testing code up on a github repo if anyone wants to try to reproduce these results.
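For readers who want a feel for the setup without opening the repo, here is a rough sketch of the shared-input idea using plain System.nanoTime. It is not the JUnitBenchmarks harness used for the numbers above: the class name SwitchTimingSketch, the helpers randomExponents and timeRun, the fixed seed, and the four-case stand-in switch are all made up for illustration, and a single warm-up pass is much cruder than what a real benchmarking library does.

```java
import java.util.Random;

final class SwitchTimingSketch {
    // Pre-generate the exponents once so every switch variant sees identical inputs.
    // The fixed seed is an assumption; it only makes runs repeatable.
    static int[] randomExponents(int count, int limitExclusive, long seed) {
        Random rng = new Random(seed);
        int[] exponents = new int[count];
        for (int i = 0; i < count; i++) {
            exponents[i] = rng.nextInt(limitExclusive);
        }
        return exponents;
    }

    // Stand-in for the method under test (hypothetical pared-down switch, cases 0..3).
    static double multiplyByPowerOfTen(double d, int exponent) {
        switch (exponent) {
            case 0: return d;
            case 1: return d * 10;
            case 2: return d * 100;
            case 3: return d * 1000;
            default:
                throw new IllegalArgumentException("Unhandled power of ten " + exponent);
        }
    }

    // Returns elapsed nanoseconds. The running sum is written to sink[0] so the
    // JIT cannot treat the calls as dead code.
    static long timeRun(int[] exponents, double[] sink) {
        double sum = 0;
        long start = System.nanoTime();
        for (int e : exponents) {
            sum += multiplyByPowerOfTen(1.5, e);
        }
        long elapsed = System.nanoTime() - start;
        sink[0] = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        int[] exponents = randomExponents(10_000_000, 4, 42L);
        double[] sink = new double[1];
        timeRun(exponents, sink);               // warm-up pass, timing ignored
        long nanos = timeRun(exponents, sink);  // measured pass
        System.out.printf("%.2f ns/op (checksum %.1f)%n",
                (double) nanos / exponents.length, sink[0]);
    }
}
```

Swapping in a different multiplyByPowerOfTen body while reusing the same exponents array is the point of the shared container: any timing difference then comes from the switch, not from the inputs.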

So, what's going on here? Some vagaries of my architecture or micro-benchmark construction? Or is the Java switch really a little faster to execute in the 18 to 28 case range than it is from 11 up to 17?

github test repo "switch-experiment"

UPDATE: I cleaned up the benchmarking library quite a bit and added a text file in /results with some output across a wider range of possible exponent values. I also added an option in the testing code not to throw an Exception from default, but this doesn't appear to affect the results.

UPDATE 2: Found some pretty good discussion of this issue from back in 2009 on the xkcd forum here: http://forums.xkcd.com/viewtopic.php?f=11&t=33524. The OP's discussion of using Arrays.binarySearch() gave me the idea for a simple array-based implementation of the exponentiation pattern above. There's no need for the binary search since I know what the entries in the array are. It appears to run about 3 times faster than using switch, obviously at the expense of some of the control flow that switch affords. That code has been added to the github repo also.
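The array-based idea could look roughly like the sketch below. This is not copied from the repo: the class name PowersOfTen, the table name POWERS, and the use of IllegalArgumentException for out-of-range exponents are assumptions made for illustration.

```java
final class PowersOfTen {
    // Precomputed 10^0 .. 10^18. All of these fit in a long (10^18 < Long.MAX_VALUE).
    private static final long[] POWERS = new long[19];
    static {
        POWERS[0] = 1L;
        for (int i = 1; i < POWERS.length; i++) {
            POWERS[i] = POWERS[i - 1] * 10L;
        }
    }

    private PowersOfTen() {}

    // Direct index lookup: the exponent is the array index, so no search is needed.
    static double multiplyByPowerOfTen(final double d, final int exponent) {
        if (exponent < 0 || exponent >= POWERS.length) {
            // Assumed error handling; the switch version threw ParseException instead.
            throw new IllegalArgumentException("Unhandled power of ten " + exponent);
        }
        return d * POWERS[exponent];
    }
}
```

The explicit range check takes the place of the default branch in the switch. Without it, an out-of-range exponent would surface as an ArrayIndexOutOfBoundsException rather than a message naming the bad power.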

5 Answers