Measures are independent random variables $X_1, X_2, \ldots, X_N$, each with mean $\mu$ and standard deviation $\sigma$.
Sum is $\sum_{i=1}^{N} X_i$, with standard deviation $\sigma \sqrt{N}$.
Average is $\frac{1}{N}\sum_{i=1}^{N} X_i$, with standard deviation $\sigma / \sqrt{N}$.
import numpy as np
mu, sigma = 10000, 5000
rng = np.random.default_rng()
for N in range(20, 2000 + 1):
    # standard deviation of 30 averages, each taken over N normal measures
    s = [sum(rng.normal(mu, sigma, N)) / N for _ in range(30)]
    print(N, np.std(s))

// returns the average
double transcode(const std::string& source, size_t iterations);
...
// standard deviation of 30 averages for each iteration count
for (size_t i = iterations_start; i <= iterations_end; i += step) {
    std::vector<double> averages;
    for (size_t j = 0; j < 30; j++) { averages.push_back(transcode(source, i)); }
    std::cout << i << "\t" << compute_std_dev(averages) << std::endl;
}
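
The helper compute_std_dev is not shown; a plausible sketch, assuming it simply returns the sample standard deviation of the 30 averages:

#include <cmath>
#include <numeric>
#include <vector>

// Sample standard deviation of the collected averages (hypothetical helper,
// assuming this is what compute_std_dev stands for above).
double compute_std_dev(const std::vector<double>& values) {
    double mean = std::accumulate(values.begin(), values.end(), 0.0) / values.size();
    double sum_sq = 0.0;
    for (double v : values) { sum_sq += (v - mean) * (v - mean); }
    return std::sqrt(sum_sq / (values.size() - 1));
}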


Take 300 measures after a warm-up, and report the worst relative deviation; a sketch of this computation follows the sample run below.
$ for i in {1..10}; do sudo ./sigma_test; done
4.56151
4.904
7.43446
5.73425
9.89544
12.975
3.92584
3.14633
4.91766
5.3699
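
The sigma_test program itself is not shown. One way the worst relative deviation might be computed, assuming each of the 300 measures is a wall-clock timing of the function under test and the deviation is taken relative to the mean of those measures (run_benchmark and the warm-up count are placeholders):

#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: time run_benchmark() 300 times after a warm-up and return
// the worst relative deviation (in percent) from the mean of the measures.
template <class F>
double worst_relative_deviation(F run_benchmark, size_t warmup = 10,
                                size_t measures = 300) {
    for (size_t i = 0; i < warmup; i++) { run_benchmark(); }  // warm-up, discarded
    std::vector<double> times;
    for (size_t i = 0; i < measures; i++) {
        auto start = std::chrono::steady_clock::now();
        run_benchmark();
        auto end = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double>(end - start).count());
    }
    double mean = 0.0;
    for (double t : times) { mean += t; }
    mean /= times.size();
    double worst = 0.0;
    for (double t : times) { worst = std::max(worst, std::fabs(t - mean) / mean); }
    return 100.0 * worst;
}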

import numpy as np
rng = np.random.default_rng()
for N in range(20, 2000 + 1):
    # heavy-tailed (log-normal) measures: the averages converge slowly
    s = [sum(rng.lognormal(1, 4, N)) / N for _ in range(30)]
    print(N, np.std(s))

Relative standard deviation (standard deviation divided by the mean) of the average and of the minimum of N measures:

| N | average | minimum |
|---|---|---|
| 200 | 3.44% | 1.38% |
| 2000 | 2.66% | 1.19% |
| 10000 | 2.95% | 1.27% |

The minimum is markedly more stable than the average.
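
A sketch of how such a comparison can be computed from repeated runs (hypothetical helpers; the exact experiment behind the numbers above is not shown): given several repetitions of N measures each, compare how stable the per-repetition average is versus the per-repetition minimum.

#include <algorithm>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Relative standard deviation (standard deviation divided by the mean).
static double relative_std_dev(const std::vector<double>& v) {
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    double sq = 0.0;
    for (double x : v) { sq += (x - mean) * (x - mean); }
    return std::sqrt(sq / (v.size() - 1)) / mean;
}

// For each repetition (a vector of N measures), take the average and the minimum,
// then report the relative standard deviation of each summary, in percent.
static std::pair<double, double> average_vs_minimum(
        const std::vector<std::vector<double>>& repetitions) {
    std::vector<double> averages, minima;
    for (const auto& measures : repetitions) {
        averages.push_back(std::accumulate(measures.begin(), measures.end(), 0.0)
                           / measures.size());
        minima.push_back(*std::min_element(measures.begin(), measures.end()));
    }
    return {100.0 * relative_std_dev(averages), 100.0 * relative_std_dev(minima)};
}
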
Processors have zero-overhead counters recording instructions retired, actual cycles, and so forth.
There is no need to freeze the CPU frequency: you can measure it.
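
On Linux, one way to read such counters is the perf_event_open system call. A minimal sketch (Linux-specific; the busy loop is just a stand-in for the code under test, and unprivileged access may require a permissive perf_event_paranoid setting) that counts cycles around a region of code and derives the effective frequency from the elapsed wall-clock time:

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Linux-only sketch: count actual CPU cycles over a region of code and derive the
// effective frequency from the elapsed wall-clock time.
int main() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { std::perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    auto start = std::chrono::steady_clock::now();
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    volatile double x = 1.0;
    for (int i = 0; i < 100000000; i++) { x = x * 1.0000001; }  // stand-in workload
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    auto end = std::chrono::steady_clock::now();

    uint64_t cycles = 0;
    if (read(fd, &cycles, sizeof(cycles)) != (ssize_t)sizeof(cycles)) { std::perror("read"); }
    double seconds = std::chrono::duration<double>(end - start).count();
    std::printf("cycles = %llu, effective frequency = %.3f GHz\n",
                (unsigned long long)cycles, cycles / seconds / 1e9);
    close(fd);
    return 0;
}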

If your benchmark involves data-dependent branches (speculative execution), make sure your test input is large: over a small input reused across trials, the branch predictor can simply learn the outcomes.
size_t index = 0;
while (howmany != 0) {
    uint64_t val = random();       // pseudo-random input value
    if ((val & 1) == 1) {          // odd integer: a hard-to-predict, data-dependent branch
        out[index] = val;
        index += 1;
    }
    howmany--;
}
Fraction of mispredicted branches per trial, over the same 2000 'random' elements (AMD Rome):

| trial | mispredicted branches |
|---|---|
| 1 | 50% |
| 2 | 18% |
| 3 | 6% |
| 4 | 2% |
| 5 | 1% |
| 6 | 0.3% |
| 7 | 0.15% |
| 8 | 0.15% |
| 9 | 0.1% |
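
The per-trial numbers can be gathered with the same zero-overhead counters. A sketch using perf_event_open with PERF_COUNT_HW_BRANCH_MISSES (Linux-specific; it reports raw miss counts per trial rather than the percentages above, and the fixed seed and trial count are arbitrary):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>
#include <vector>

// Open a hardware counter for the calling thread (same approach as above).
static int open_counter(uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main() {
    std::mt19937_64 rng(1234);                 // fixed seed: the same input every trial
    std::vector<uint64_t> input(2000);
    for (auto& v : input) { v = rng(); }
    std::vector<uint64_t> out(input.size());
    int fd = open_counter(PERF_COUNT_HW_BRANCH_MISSES);
    if (fd < 0) { std::perror("perf_event_open"); return 1; }
    for (int trial = 1; trial <= 9; trial++) {
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        size_t index = 0;
        for (uint64_t val : input) {
            if ((val & 1) == 1) { out[index++] = val; }  // data-dependent branch
        }
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t misses = 0;
        if (read(fd, &misses, sizeof(misses)) != (ssize_t)sizeof(misses)) { std::perror("read"); }
        std::printf("trial %d: %llu mispredicted branches (%zu odd values kept)\n",
                    trial, (unsigned long long)misses, index);
    }
    close(fd);
    return 0;
}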

---