Measure twice, write twice

#beginners #programming #developers #java

Continually measure

When writing code, it's tempting to get complex, especially when you are concerned about performance. But you should write it simply first and measure its performance. Then, with that data in hand, write your improvements, using your measurements as a guide.

My failure to measure

I was writing a statistics-gathering service in Java. The statistics were very simple counts of items seen over time. The system's constraints were:

High throughput
Thread-safe

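Java has a great class for thread-safe incremental statistics: LongAdder. It's fantastic for fast writes, but LongAdder#reset is not thread-safe; it is only reliable when no other threads are updating concurrently. I needed to be able to grab the latest full count and then reset. In comes ReadWriteLock! Because LongAdder already handles concurrent increments safely, ReadWriteLock#readLock can guard all the increment actions (many threads can hold it at once), and ReadWriteLock#writeLock can guard grabbing the latest total and resetting (it excludes every incrementer). The resulting service ended up looking like this: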

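import java.time.Instant;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Nested inside an enclosing class; Stats is a simple (count, timestamp) value type defined elsewhere.
public static class Accumulator {
    // LongAdder spreads increments across internal cells to reduce contention.
    private final LongAdder statsAccumulator = new LongAdder();
    // Fair mode (true): threads contend for the lock in approximately arrival order.
    private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock(true);

    public Accumulator inc() {
        // Many incrementers may hold the shared read lock at once;
        // LongAdder itself makes their concurrent increments safe.
        readWriteLock.readLock().lock();
        try {
            this.statsAccumulator.increment();
            return this;
        } finally {
            readWriteLock.readLock().unlock();
        }
    }

    public Stats currentStatsAndReset() {
        // The exclusive write lock blocks all increments while we read and reset.
        readWriteLock.writeLock().lock();
        try {
            Stats stats = currentStats(Instant.now());
            this.statsAccumulator.reset();
            return stats;
        } finally {
            readWriteLock.writeLock().unlock();
        }
    }

    public Stats currentStats(Instant timeStamp) {
        return new Stats(statsAccumulator.longValue(), timeStamp);
    }
}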
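To see the read/write-lock design in action, here is a self-contained driver. It is a sketch under assumptions: `Stats` is a hypothetical record of (count, timestamp) standing in for the real class, and `RwAccumulatorDemo` and `totalAcrossResets` are illustrative names. Several writer threads increment while the main thread repeatedly snapshots and resets. Because every increment holds the read lock and every snapshot-plus-reset holds the write lock, each increment should land in exactly one snapshot.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwAccumulatorDemo {
    // Hypothetical minimal value type standing in for the post's Stats class.
    record Stats(long count, Instant timestamp) {}

    static class Accumulator {
        private final LongAdder statsAccumulator = new LongAdder();
        private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock(true);

        Accumulator inc() {
            readWriteLock.readLock().lock(); // shared: many incrementers at once
            try {
                statsAccumulator.increment();
                return this;
            } finally {
                readWriteLock.readLock().unlock();
            }
        }

        Stats currentStatsAndReset() {
            readWriteLock.writeLock().lock(); // exclusive: no increments in flight
            try {
                Stats stats = new Stats(statsAccumulator.longValue(), Instant.now());
                statsAccumulator.reset();
                return stats;
            } finally {
                readWriteLock.writeLock().unlock();
            }
        }
    }

    // Runs `threads` writers doing `incsPerThread` increments each while the
    // calling thread repeatedly snapshots and resets; returns the sum of all snapshots.
    static long totalAcrossResets(int threads, int incsPerThread) {
        Accumulator acc = new Accumulator();
        List<Thread> writers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < incsPerThread; i++) acc.inc();
            });
            writers.add(w);
            w.start();
        }
        long total = 0;
        // Drain snapshots while the writers are still running.
        while (writers.stream().anyMatch(Thread::isAlive)) {
            total += acc.currentStatsAndReset().count();
        }
        for (Thread w : writers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining writers", e);
            }
        }
        // Pick up anything counted after the last in-loop snapshot.
        total += acc.currentStatsAndReset().count();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalAcrossResets(4, 100_000)); // prints 400000
    }
}
```

Note that this only checks correctness (no count is lost across resets); it says nothing about speed.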

I thought I had the perfect high-throughput, thread-safe, statistics-gathering class. I mean, it never blocks writes unless someone calls currentStatsAndReset. The common hot path, inc(), normally doesn't block.

Perfect.

But I didn't do one thing: I never measured the performance of this implementation against a dead-simple synchronized version. 🤦

Start simple, then measure

Here is the simple version:

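import java.time.Instant;

// Nested inside an enclosing class; Stats is the same (count, timestamp) value type.
public static class Accumulator {
    // A plain long guarded by this object's monitor.
    private long statsAccumulator = 0L;

    public synchronized Accumulator inc() {
        this.statsAccumulator++;
        return this;
    }

    public synchronized Stats currentStatsAndReset() {
        Stats stats = currentStats(Instant.now());
        this.statsAccumulator = 0L;
        return stats;
    }

    // synchronized so external callers see an up-to-date, untorn long value.
    public synchronized Stats currentStats(Instant timeStamp) {
        return new Stats(statsAccumulator, timeStamp);
    }
}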

It doesn't use any of those classes designed for low-contention locking, just plain ol' synchronized methods. Surely all that synchronization would increase contention on writes.

Another developer on my team called me out on this. He was curious to see if my version was truly faster. I knew I was right, so I wrote a JMH benchmark to prove him wrong.
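JMH is the right tool for this comparison. For quick, dependency-free experimenting, though, the sketch below times both designs with `System.nanoTime`. `Counter`, `RwCounter`, `SyncCounter`, and `run` are hypothetical names, and the counters drop the timestamp for brevity. With no warmup, forking, or repeated trials, its numbers are only indicative and can be skewed by JIT compilation; it does at least confirm that both versions count correctly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrudeAccumulatorTiming {
    // Hypothetical shared interface so both versions can be driven the same way.
    interface Counter {
        void inc();
        long sumAndReset();
    }

    // LongAdder guarded by a fair ReentrantReadWriteLock, like the rwAccumulator design.
    static final class RwCounter implements Counter {
        private final LongAdder adder = new LongAdder();
        private final ReadWriteLock lock = new ReentrantReadWriteLock(true);

        public void inc() {
            lock.readLock().lock();
            try { adder.increment(); } finally { lock.readLock().unlock(); }
        }

        public long sumAndReset() {
            lock.writeLock().lock();
            try {
                long sum = adder.longValue();
                adder.reset();
                return sum;
            } finally { lock.writeLock().unlock(); }
        }
    }

    // Plain synchronized long, like the syncAccumulator design.
    static final class SyncCounter implements Counter {
        private long count = 0L;

        public synchronized void inc() { count++; }

        public synchronized long sumAndReset() {
            long sum = count;
            count = 0L;
            return sum;
        }
    }

    // Starts `threads` threads that each call inc() `incsPerThread` times,
    // waits for all of them, and returns {elapsedNanos, finalCount}.
    static long[] run(Counter counter, int threads, int incsPerThread) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < incsPerThread; i++) counter.inc();
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long elapsed = System.nanoTime() - start;
        return new long[] {elapsed, counter.sumAndReset()};
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 8}) {
            long[] rw = run(new RwCounter(), threads, 1_000_000);
            long[] sync = run(new SyncCounter(), threads, 1_000_000);
            System.out.printf("threads=%d rw=%.1f ms sync=%.1f ms%n",
                    threads, rw[0] / 1e6, sync[0] / 1e6);
        }
    }
}
```

Treat any numbers it prints as a hint about where to look, then confirm with a proper JMH benchmark.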

The results were not on my side:

    Benchmark                                                   Mode  Cnt        Score        Error  Units
    MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_1      avgt   20     5957.399 ±    112.892  us/op
    MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_128    avgt   20  7480921.908 ± 255364.820  us/op
    MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_1    avgt   20      421.662 ±      2.616  us/op
    MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_128  avgt   20   792910.927 ±  52219.577  us/op

My complex version (rwAccumulator) was roughly 9x to 14x SLOWER. The simple, fully synchronized version (syncAccumulator) kicked my butt with both 1 and 128 threads!
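Dividing the average time per operation of rwAccumulator by that of syncAccumulator, using the scores from the table, gives the slowdown at each thread count:

```latex
\frac{5957.399\ \mu\text{s/op}}{421.662\ \mu\text{s/op}} \approx 14.1 \quad (\text{1 thread}),
\qquad
\frac{7480921.908\ \mu\text{s/op}}{792910.927\ \mu\text{s/op}} \approx 9.4 \quad (\text{128 threads})
```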