Accurate and efficient software microbenchmarks

Daniel Lemire
professor, Data Science Research Center
Université du Québec (TÉLUQ)
Montreal 🇨🇦

blog: https://lemire.me
twitter: @lemire
GitHub: https://github.com/lemire/

Background

Where is the code?

All code for this talk is online (reproducible!!!)

https://github.com/lemire/talks/tree/master/2023/performance/code

How fast is your disk?

PCIe 4 drives: 5 GB/s reading speed (sequential)

PCIe 5 drives: 10 GB/s reading speed (sequential)

CPU Frequencies are stagnating

| architecture   | availability | max. frequency |
|----------------|--------------|----------------|
| Intel Skylake  | 2015         | 4.5 GHz        |
| Intel Ice Lake | 2019         | 4.1 GHz        |

Fact

Single-core processes are often CPU bound

Solution?

Optimize the software.

Incremental optimization, how do you know that you are on the right track?

Hypothesis

This software change (commit) improves our performance.

Simple

Measure time elapsed before, time elapsed after.
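As a minimal sketch (with a dummy `work` function standing in for the code under test), the before/after measurement might look like this:

#include <chrono>
#include <cstdio>

// Dummy stand-in for the code under test.
void work() {
  volatile double x = 0;
  for (int i = 0; i < 1000000; i++) { x = x + i; }
}

int main() {
  auto before = std::chrono::steady_clock::now();
  work();
  auto after = std::chrono::steady_clock::now();
  printf("elapsed: %f s\n", std::chrono::duration<double>(after - before).count());
  return 0;
}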

Complex system

Software systems are complex systems: changes can have unexpected consequences.

JIT

JIT compilation makes timings unstable: see *Virtual Machine Warmup Blows Hot and Cold* (Barrett et al., OOPSLA 2017).

System calls

System calls (especially I/O) may dominate the running time; we assume that they remain constant. The same goes for multicore and multi-system processes.

Data access

Data-structure layout changes can trigger expensive loads; we assume that data access patterns remain constant.

Tiny functions

Uncertainty principle: measuring affects the execution, so tiny functions cannot be measured safely in isolation.
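A common workaround, sketched here under the assumption that the call is cheap enough to repeat millions of times, is to batch many calls and amortize the measurement overhead (`tiny_function` is a hypothetical stand-in):

#include <chrono>
#include <cstddef>
#include <cstdio>

// Hypothetical tiny function under test.
inline int tiny_function(int x) { return x * 3 + 1; }

int main() {
  const size_t repetitions = 10000000;
  volatile int sink = 0; // keeps the compiler from discarding the loop
  auto start = std::chrono::steady_clock::now();
  for (size_t i = 0; i < repetitions; i++) { sink = tiny_function((int)i); }
  auto finish = std::chrono::steady_clock::now();
  double total = std::chrono::duration<double>(finish - start).count();
  printf("%.2f ns per call\n", 1e9 * total / (double)repetitions);
  return 0;
}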

Take statically compiled code

Transcoding UTF-16 to UTF-8 of an 80kB Arabic string using the simdutf library (NEON kernel).

Use the average?

Let $\mu$ be the true value and let $\epsilon$ be the noise distribution (variance $\sigma^2$).

We seek $\mu$.

Repeated measures increase accuracy

Measures are $\mu + \epsilon_1, \mu + \epsilon_2, \ldots, \mu + \epsilon_N$.

Sum is $N\mu + \sum_{i=1}^{N} \epsilon_i$. Variance is $N\sigma^2$.

Average is $\mu + \frac{1}{N}\sum_{i=1}^{N} \epsilon_i$. Variance is $\sigma^2/N$. Standard deviation of $\sigma/\sqrt{N}$.

Simulation

import numpy as np

mu, sigma = 10000, 5000
# standard deviation of the average of N normal measures, estimated from 30 runs
for N in range(20, 2000 + 1):
    s = [sum(np.random.default_rng().normal(mu, sigma, N)) / N for i in range(30)]
    print(N, np.std(s))

Actual measurements

// returns the average time per call, over `iterations` calls
double transcode(const std::string& source, size_t iterations);


...

  // for each iteration count, repeat the measurement 30 times and
  // print the standard deviation of the 30 averages
  for (size_t i = iterations_start; i <= iterations_end; i += step) {
    std::vector<double> averages;
    for (size_t j = 0; j < 30; j++) { averages.push_back(transcode(source, i)); }
    std::cout << i << "\t" << compute_std_dev(averages) << std::endl;
  }
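A sketch of what such a helper might look like, assuming `std::chrono` timing and a hypothetical `do_transcode` standing in for the simdutf kernel:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in: replace with the actual simdutf transcoding call.
void do_transcode(const std::string& source) { (void)source; }

// returns the average time in seconds per call, over `iterations` calls
double transcode(const std::string& source, size_t iterations) {
  auto start = std::chrono::steady_clock::now();
  for (size_t i = 0; i < iterations; i++) {
    do_transcode(source);
  }
  auto finish = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(finish - start).count() / iterations;
}

int main() {
  std::string source(80 * 1024, 'a'); // stand-in for the 80 kB input
  printf("%e s per call\n", transcode(source, 1000));
  return 0;
}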

Sigma events

  • 1-sigma is 32%
  • 2-sigma is 5%
  • 3-sigma is 0.3% (once every 300 trials)
  • 4-sigma is 0.00669% (once every 15,000 trials)
  • 5-sigma is 5.9e-05% (once every 1,700,000 trials)
  • 6-sigma is 2e-07% (once every 500,000,000 trials)
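These percentages are the two-sided tail probabilities of the normal distribution; a quick sanity check using `std::erfc`:

#include <cmath>
#include <cstdio>

int main() {
  // two-sided probability of exceeding k sigmas under a normal distribution
  for (int k = 1; k <= 6; k++) {
    double p = std::erfc(k / std::sqrt(2.0));
    printf("%d-sigma: %g%% (once every %.0f trials)\n", k, 100 * p, 1 / p);
  }
  return 0;
}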


Measuring sigma events

Take 300 measures after warmup, and measure the worst relative deviation.

$ for i in {1..10}; do sudo ./sigma_test; done
4.56151
4.904
7.43446
5.73425
9.89544
12.975
3.92584
3.14633
4.91766
5.3699
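My reading of the experiment (a sketch, not the actual `sigma_test` source): compute the largest deviation from the mean, in units of the standard deviation, over the 300 measures. Under normality, values far beyond 3 sigmas should be rare; the runs above routinely exceed it.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// worst deviation from the mean, in units of the standard deviation
double worst_sigma(const std::vector<double>& measures) {
  double mean = 0;
  for (double m : measures) { mean += m; }
  mean /= (double)measures.size();
  double variance = 0;
  for (double m : measures) { variance += (m - mean) * (m - mean); }
  double stddev = std::sqrt(variance / (double)measures.size());
  double worst = 0;
  for (double m : measures) { worst = std::max(worst, std::fabs(m - mean) / stddev); }
  return worst;
}

int main() {
  // with truly normal data, exceeding 3 sigmas among 300 measures is rare
  std::mt19937 gen(1234);
  std::normal_distribution<double> dist(10000, 5000);
  std::vector<double> measures(300);
  for (double& m : measures) { m = dist(gen); }
  printf("worst deviation: %.2f sigmas\n", worst_sigma(measures));
  return 0;
}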

What if we dealt with log-normal distributions?

import numpy as np

# same experiment with heavy-tailed (log-normal) noise
for N in range(20, 2000 + 1):
    s = [sum(np.random.default_rng().lognormal(1, 4, N)) / N for i in range(30)]
    print(N, np.std(s))

What if we measured the minimum?

Relative standard deviation ($\sigma/\mu$)

| N     | average | minimum |
|-------|---------|---------|
| 200   | 3.44%   | 1.38%   |
| 2000  | 2.66%   | 1.19%   |
| 10000 | 2.95%   | 1.27%   |
  • The minimum is easier to measure to 1% accuracy.
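A sketch of the minimum-based estimator, with a hypothetical `measure_once` helper standing in for the timed run:

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in: times one run of the code under test, in seconds.
double measure_once() {
  auto start = std::chrono::steady_clock::now();
  volatile double x = 0;
  for (int i = 0; i < 100000; i++) { x = x + i; } // dummy workload
  auto finish = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(finish - start).count();
}

// Timing noise is one-sided (interruptions only inflate the time),
// so the minimum over many runs is a stable estimate of the true cost.
double measure_min(size_t runs) {
  double best = measure_once();
  for (size_t i = 1; i < runs; i++) { best = std::min(best, measure_once()); }
  return best;
}

int main() {
  printf("min over 30 runs: %e s\n", measure_min(30));
  return 0;
}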

CPU performance counters

Processors have zero-overhead counters recording instructions retired, actual cycles, and so forth.

No need to freeze the CPU frequency: you can measure it.
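On Linux, the counters are exposed through the `perf_event_open` interface; a minimal sketch counting retired instructions for the current thread (error handling mostly elided):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
  perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS; // retired instructions
  attr.disabled = 1;
  attr.exclude_kernel = 1; // user-space instructions only
  // pid = 0, cpu = -1: measure the calling thread on any CPU
  int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  if (fd == -1) { perror("perf_event_open"); return 1; }
  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  volatile long long x = 0;
  for (int i = 0; i < 1000000; i++) { x = x + i; } // code under test (dummy)
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  long long count = 0;
  if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) { return 1; }
  printf("instructions retired: %lld\n", count);
  close(fd);
  return 0;
}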

Limitations

  • You can only measure a few things at once (2 or 4 metrics, not 25)
  • Requires privileged access (e.g., root)

Counters in the cloud

  • x64: Requires at least a full CPU
  • ARM Graviton: generally available, but only a limited number of counters (e.g., 2)

Instruction counts are accurate

Using performance counters

Generally, fewer instructions mean faster code

  • Some instructions are more expensive than others (e.g., division).
  • Data dependency can make instruction counts less relevant.
  • Branching can artificially lower instruction count.

If you are adding speculative branching, make sure your test input is large.

while (howmany != 0) {
    val = random();
    if ((val & 1) == 1) { // val is an odd integer
      out[index] = val;
      index += 1;
    }
    howmany--;
}

2000 'random' elements, AMD Rome

| trial | mispredicted branches |
|-------|-----------------------|
| 1     | 50%                   |
| 2     | 18%                   |
| 3     | 6%                    |
| 4     | 2%                    |
| 5     | 1%                    |
| 6     | 0.3%                  |
| 7     | 0.15%                 |
| 8     | 0.15%                 |
| 9     | 0.1%                  |

Across trials, the branch predictor learns the fixed 'random' sequence: the same benchmark gets faster the longer it runs on the same input.

Take away 1

  • Computational microbenchmarks can have log-normal distributions.
  • Consider measuring the minimum instead of the average.

Take away 2

  • Benchmarking often is good.
  • Long-running benchmarks are not necessarily more accurate.
  • Prefer cheap, well-designed benchmarks.

Links

![center](simdjsonlogo.png)

---