System Benchmarks

That benchmark is nonsense. Benchmarks invite this reaction because they fall into a few common traps: under-reported context and a lack of detail in the results. The typical benchmark report doesn't reveal the benchmark's goal, the full details of the hardware and software used, how the results were edited (if at all), how to reproduce the results, detailed measurements of the system's performance during the test, or an interpretation and explanation of the results.

The Benchmark Goal

When you say "benchmark," what do you mean, exactly? What was your purpose in running the benchmark? Some common purposes include:
  • Validating that your hardware is configured properly
  • Comparing systems against each other
  • Capacity planning
  • Checking for regressions
  • Trying to reproduce a problem so you can verify whether it's solved
  • Trying to make the system behave badly; stress-testing it to see where its bottleneck is
Your intended purpose makes all the difference in how you should run the benchmark and report the results, and readers of the report need to know what it was.

Hardware and Software

What hardware and software, exactly, did you use in the benchmark? For hardware, it is important to know details such as the following:
  • The CPUs: the exact vendor, model, clock speed, number of sockets and cores, and cache sizes
  • The memory: the total size and speed of the installed memory
  • The network: the general characteristics, such as 1GigE or 10GigE, and any salient factors such as network topology
  • The storage: for hard drives, the interconnect, number, capacity, and rotational speed; for solid-state drives, the interconnect, vendor, model, and capacity; for RAID controllers, the interconnect, vendor, model, and cache configuration
It is also important to disclose the details of the benchmark and operating system software, including versions:
  • The version of the benchmark software
  • The version of the software you are testing, including any details such as the compiler with which it was built
  • The operating system distribution, kernel version, system library version, filesystem, and (in Linux) device queue scheduler
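
A short script can capture many of these facts automatically, so they end up in the report verbatim rather than from memory. The following is a minimal sketch, assuming a Linux host where /proc/cpuinfo and /proc/meminfo are available; the fields it gathers and the output format are illustrations, not a complete inventory.

    # Sketch: gather basic hardware and software facts for a benchmark report.
    # Assumes Linux; /proc/cpuinfo and /proc/meminfo are standard there.
    import platform

    def first_match(path, prefix):
        """Return the value of the first 'prefix: value' line in a /proc-style file."""
        with open(path) as f:
            for line in f:
                if line.startswith(prefix):
                    return line.split(":", 1)[1].strip()
        return "unknown"

    report = {
        "cpu_model": first_match("/proc/cpuinfo", "model name"),
        "cpu_count": sum(1 for line in open("/proc/cpuinfo")
                         if line.startswith("processor")),   # logical CPUs
        "mem_total": first_match("/proc/meminfo", "MemTotal"),
        "kernel":    platform.release(),
        "arch":      platform.machine(),
    }

    for key in sorted(report):
        print("%-10s %s" % (key, report[key]))

Details a script cannot see, such as a RAID controller's cache configuration or the compiler used to build the software under test, still have to be recorded by hand.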

Benchmark Parameters

You should report the configuration used to run the benchmark:
  • The software's configuration; the exact configuration file is ideal
  • The command-line parameters used to execute the benchmark tool
  • The preparation or warm-up for the benchmark
  • The duration of the benchmark
  • The data set against which the benchmark executed, and how it was generated
It is a good idea to summarize these points briefly, especially for items such as the concurrency, data set size, and duration of the benchmark.
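
A low-effort way to keep these parameters with the results is to write them to a small machine-readable file at the start of the run. The sketch below is only an illustration; the sysbench-style command line, the parameter values, and the file names are hypothetical placeholders for whatever your own tool and workload use.

    # Sketch: record the benchmark's parameters alongside its results.
    # All values below are hypothetical examples, not recommendations.
    import json
    import time

    params = {
        "tool":         "sysbench 0.4.12",
        "command_line": "sysbench --test=oltp --num-threads=16 --max-time=3600 run",
        "concurrency":  16,
        "duration_sec": 3600,
        "warmup_sec":   600,
        "data_set":     "sbtest, 10M rows, generated with 'sysbench prepare'",
        "config_file":  "my.cnf",        # attach the exact file to the report
        "started_at":   time.strftime("%Y-%m-%d %H:%M:%S"),
    }

    with open("benchmark-params.json", "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)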

Editing the Results

If you modify the results in any way, such as discarding outliers, you need to say so and explain what you did. Ideally, readers should have access to the raw, unedited results.
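
If you do discard outliers, it is easy to show your work: keep the raw measurements, state the rule you applied, and report both sets of numbers. Here is a minimal sketch with made-up throughput samples and an arbitrary trim rule, chosen only to illustrate the reporting, not as a recommended way to clean data.

    # Sketch: discard outliers transparently, keeping the raw data in the report.
    # The samples and the trim rule are invented purely for illustration.
    raw = [1980, 2015, 1990, 2050, 310, 2005, 1995, 2040, 1975, 2020]  # transactions/sec

    ordered = sorted(raw)
    cut = max(1, len(ordered) // 20)      # drop the lowest and highest ~5% of samples
    trimmed = ordered[cut:-cut]

    print("raw samples: ", raw)
    print("raw mean:     %.1f" % (sum(raw) / float(len(raw))))
    print("trimmed mean: %.1f (dropped %d low and %d high samples)"
          % (sum(trimmed) / float(len(trimmed)), cut, cut))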

Detailed Reporting on System Performance

This is perhaps the most crucial aspect of a well-reported benchmark. Many benchmarks fall into the trap of simply reporting the system's throughput as an average over the entire run. But averages obscure vitally important details. Consider the following benchmark plot:
The problem with this plot is that each bar represents an average throughput measurement for some benchmark run, so you cannot see the variations in performance within that run. As a result, you do not know how good the system's performance really was during the run. Consistent, predictable performance matters much more than raw throughput. As Facebook engineer Mark Callaghan puts it:

"Great average performance combined with high variance is a great way to waste a lot of time debugging problems in production."

An improvement over the above chart would be to report a percentile, such as the 95th or 99th percentile, which reveals how well the system performs most of the time instead of hiding all of the details in an average.
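
To make the difference concrete, here is a small sketch that computes the average and the 95th and 99th percentile response times from an invented set of latencies in which a few requests are very slow; the data, the distribution, and the simple nearest-rank percentile function are all illustrative assumptions, and in a real report you would feed in the benchmark tool's own measurements. The average gets dragged upward by the slow requests, while the 95th percentile still shows how the system behaves most of the time and the 99th exposes the tail.

    # Sketch: report percentiles, not just the average.
    # The latency samples are invented: mostly fast, with a few very slow requests.
    import random

    random.seed(42)
    latencies = [random.gauss(5, 1) for _ in range(970)] + \
                [random.uniform(200, 500) for _ in range(30)]   # milliseconds

    def percentile(samples, pct):
        """Nearest-rank percentile: the value below which pct percent of samples fall."""
        ordered = sorted(samples)
        index = int(round((pct / 100.0) * (len(ordered) - 1)))
        return ordered[index]

    print("average: %7.1f ms" % (sum(latencies) / len(latencies)))
    print("95th:    %7.1f ms" % percentile(latencies, 95))
    print("99th:    %7.1f ms" % percentile(latencies, 99))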

It's even better to show more detail. There are two especially helpful ways to do this. The first is a time-series plot of throughput, which shows how consistent the system's performance is over time:
Which system would you rather run? Which system do you think provides a higher average throughput?

The second helpful way to reveal more detail about the system's performance is to plot the individual throughput measurements taken at a fixed interval, such as every ten seconds, instead of a single bar or line. This is a jitter graph:
With this chart you can see how consistent and stable the results are.
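
Both kinds of chart are easy to produce once you have per-interval throughput readings. The sketch below assumes one reading every ten seconds (the numbers are invented, with an occasional dip added so the variance is visible) and uses matplotlib only as an example; any plotting tool works. It draws the time-series line on top and the jitter-style scatter of individual samples below.

    # Sketch: plot per-interval throughput as a time series and as a jitter graph.
    # The samples are invented; real ones come from the benchmark tool's periodic output.
    import random
    import matplotlib.pyplot as plt

    random.seed(1)
    interval = 10                                     # seconds between readings
    samples = [2000 + random.gauss(0, 50) - (400 if i % 30 == 0 else 0)
               for i in range(360)]                   # one hour of 10-second readings
    times = [i * interval for i in range(len(samples))]

    fig, (ts, jitter) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)

    ts.plot(times, samples, linewidth=0.8)
    ts.set_ylabel("throughput (tps)")
    ts.set_title("Throughput over time")

    jitter.scatter(times, samples, s=4)
    jitter.set_xlabel("seconds elapsed")
    jitter.set_ylabel("throughput (tps)")
    jitter.set_title("Jitter: individual 10-second samples")

    fig.tight_layout()
    fig.savefig("throughput.png")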

Interpreting The Results

The final important detail is to interpret the results for the reader. Explain what the results show and why they are the way they are. Are there any anomalies? What do they reveal, and how do you explain them? If you do not yet know how to explain the results, it is acceptable to state that more research is needed and to follow up later. Publishing results without an explanation is okay, as long as the intention is either to understand the results later or to provide an opportunity for others to do so.

Notes

The benchmark plots on this page were taken from http://www.mysqlperformanceblog.com/2012/02/22/benchmarks-of-intel-320-ssd-600gb/ and http://blog.montyprogram.com/benchmarking-mariadb-5-3-4/ and slightly edited or cropped in some cases to clarify them for purposes of illustration.