Continually measure
When writing code, it's tempting to get complex. Especially when you are concerned about performance. But, you should write it simply first. Measure its performance. With that data, then write your improvements. Using your performance measurements as guides.
My failure to measure
I was writing a statistic gathering service Java. The statistics were very simple counts of items seen overtime. The constraints of the system were:
- high throughput
- Thread safe
Java has a great class for thread-safe incremental statistics LongAdder
. It's fantastic for fast writes but LongAdder#reset
is not thread-safe. I needed to be able to grab the latest full count and then reset. In comes ReadWriteLock
! ReadWriteLock#readLock
can be used for all the increment actions and then ReadWriteLock#writeLock
for grabbing the latest total. The resulting service ended up looking like this:
public static class Accumulator {
private final LongAdder statsAccumulator = new LongAdder();
private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock(true);
public Accumulator inc() {
readWriteLock.readLock().lock();
try {
this.statsAccumulator.increment();
return this;
} finally {
readWriteLock.readLock().unlock();
}
}
public InferenceStats currentStatsAndReset() {
readWriteLock.writeLock().lock();
try {
Stats stats = currentStats(Instant.now());
this.statsAccumulator.reset();
return stats;
} finally {
readWriteLock.writeLock().unlock();
}
}
public InferenceStats currentStats(Instant timeStamp) {
return new Stats(statsAccumulator.longValue(), timeStamp);
}
}
I thought I had the perfect high throughput, thread-safe, statistics gathering class. I mean, it doesn't ever block the writes unless we grab the currentStatsAndReset
. The common hot-path of inc()
is normally not blocking.
Perfect.
But I didn't do one thing. I never measured performance of this implementation against a dead-simple synchronized
version. 🤦
Start simple, then measure
Here is the simple version:
public static class Accumulator {
private long statsAccumulator = 0L;
public synchronized Accumulator inc() {
this.statsAccumulator++;
return this;
}
public synchronized InferenceStats currentStatsAndReset() {
Stats stats = currentStats(Instant.now());
this.statsAccumulator = 0L;
return stats;
}
public InferenceStats currentStats(Instant timeStamp) {
return new Stats(statsAccumulator, timeStamp);
}
}
It doesn't use any of those classes designed for low contentioning locking. Just a plain 'ol synchronized
methods. Surely, all that use of synchronized
would increase contention on write.
Another developer on my team called me out on this. He was curious to see if my version was truly faster. I knew I was right, so I wrote a JMH benchmark to prove him wrong.
The results were not on my side:
Benchmark Mode Cnt Score Error Units
MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_1 avgt 20 5957.399 ± 112.892 us/op
MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_128 avgt 20 7480921.908 ± 255364.820 us/op
MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_1 avgt 20 421.662 ± 2.616 us/op
MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_128 avgt 20 792910.927 ± 52219.577 us/op
My complex version (rwAccumulator
) was almost 10x SLOWER. The simple, fully synchronized
version (syncAccumulator
) kicked my butt. Both with 1, and 128 separate threads!
Measure, Measure, Measure
My lessons were humbling reminders.
- Always write the simple way first.
- If you think it should be faster, change it and measure it.
- Measure, measure, measure
Top comments (1)
JMH Benchmark for the curious