Aemie Jariwala

Posted on Apr 13

Illuminating Unnoticed Efficiency Pitfalls with AtomicLongMap

#java #softwareengineering #google #testing

Benchmark testing for AtomicLongMap

Why am I doing this benchmark test?

It's important to verify if a particular change for a package, in this case Google Guava performs in an optimized manner within a multi-threaded environment and doesn't trigger a thread contention state. Well, I came across one such change in the implementation for AtomicLongMap which suffers from thread contention with the new releases.

Which method within AtomicLongMap is suffering from thread contention?

The underlying method - addAndGet(K key, int delta)

When was a new implementation introduced for `addAndGet` method in Google Guava?

Google Guava - v21.0

Why was this change introduced?

According to the release notes:

AtomicLongMap: added a number of methods such as accumulateAndGet(K, LongBinaryOperator) that take advantage of new Java functional types.

How did I notice it?

I bumped my java package to use GoogleGuava - v32.0.0 from GoogleGuava - v20.0 and my package blew up on load-testing. It took me a bit of digging through profiling and monitoring and was able to pinpoint that the trigger for the issue is none other than AtomicLongMap! This sparked the curiousity into understanding why did a new release start suffering from thread contention and that's the reason for the benchmark testing.

Enough story time, let's dig into the implementation difference

via Giphy

Google Guava 20.0 Implementation Flow [Psuedo]

Link to code: AtomicLongMap.java

addAndGet(K key, int delta) ---> atomic = map.putIfAbsent(key, delta) 
---> if  atomic.get() == 0 : 
       map.replace(atomic.get(), atomic, delta) 
       return delta
---> else: 
        val = atomic.get()
        atomic.compareAndSet(val, val + delta)
        return val + delta

Google Guava 32.0.0 Implementation Flow [Psuedo]

Note: Upon back-tracking, the below implementation flow was introduced in Google Guava v21.0 so this degradation is existing since v21.0 : https://github.com/google/guava/wiki/Release21 however the benchmark testing is carried out using Google Guava v32.0.0

Link to code: AtomicLongMap.java

addAndGet(K key, int delta) ---> accumulateAndGet(key, delta, Long::sum as aFunc) ---> 
updateAndGet(key, o -> aFunc.applyAsLong(o, delta) as uFunc) --->
map.compute(key, (k, x) -> uFunc.applyAsLong(x == null ? 0 : x.longValue()))

What's the possible difference in their behavior within a multi-threaded environment?

It is use of the specific underlying method of ConcurrentMap that is being utilised and the impact in their performance is majorly driven by how well it behaves in highly contended environment.

In case of Google Guava 20.0, putIfAbsent is utilised and when using putIfAbsent, the operation succeeds only if the key is not already present in the map. If the key is present, the operation does not perform any update and returns the existing value associated with the key. This behavior helps reduce contention because it avoids the need for complex coordination between threads when updating values. This is followed by using compareAndSet which is another atomic operation which may offer better performance in a multi-threaded environment compared to the compute method, especially when you're dealing with atomic updates on a single shared variable. This is because compareAndSet operates directly on a single atomic variable and doesn't involve the overhead of interacting with a concurrent data structure like a map.

In case of Google Guava 32.0.0, compute method is utilized and using compute method
with an accumulator function involves a more complex operation. It requires reading the current value associated with the key, applying the accumulatorfunction to compute a new value based on the old value (if any) and the input,
and then updating the map with the new value. In highly contended scenarios, multiple threads attempting to perform such operations concurrently may lead to increased contention and synchronization overhead, potentially affecting performance.

LET'S GET INTO TESTING !!

via Giphy

Note: Setup of the testing is present in the End of the blog

Case 1 : NumThreads = 2

Who won the battle? Google Guava v20.0
What's the difference in performance? 78.6%

Version = 20.0

Trial	Time to Execute (s)
1	14
2	14
3	14
4	14
5	14
Average	14

Version = 32.0.0

Trial	Time to Execute (s)
1	25
2	25
3	25
4	25
5	25
Average	25

Case 2 : NumThreads = 10

Who won the battle? Google Guava v20.0
What's the difference in performance? 73.2%

Version = 20.0

Trial	Time to Execute (s)
1	73
2	73
3	73
4	74
5	73
Average	73.2

Version = 32.0.0

Trial	Time to Execute (s)
1	126
2	126
3	127
4	127
5	128
Average	126.8

Case 3 : NumThreads = 32

Who won the battle? Google Guava v20.0
What's the difference in performance? 72.5%

Version = 20.0

Trial	Time to Execute (s)
1	236
2	235
3	234
4	234
5	235
Average	234.8

Version = 32.0.0

Trial	Time to Execute (s)
1	403
2	405
3	401
4	410
5	406
Average	405

Case 4 : NumThreads = 64

Who won the battle? Google Guava v20.0
What's the difference in performance? 70.7%

Version = 20.0

Trial	Time to Execute (s)
1	474
2	474
3	473
4	473
5	472
Average	473.2

Version = 32.0.0

Trial	Time to Execute (s)
1	804
2	811
3	809
4	809
5	805
Average	807.6

Conclusion

via Giphy

Google Guava v20.0 has a simplified implementation which performs 70% better than the later releases for AtomicLongMap.java. Will love to hear other people's opinion on the same and why do you think the new implementation of AtomicLongMap is suffering from thread contention!

Setup of the test

It is a simple benchmark test to test the performance of the 2 GoogleGuava releases addAndGet() method defined within the AtomicLongMap class in a multi-threaded environment

Main.java

import java.time.Clock;
import main.java.com.benchmark.Benchmark;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static Clock clock = Clock.systemDefaultZone();
    private static Benchmark benchmark = new Benchmark();
    private static final long NUM_CALLS = 1000000000;
    private static final int NUM_THREADS = 1;

    public static void main(String[] args) {
        long current = clock.millis();
        System.out.println("Current time : " + current);

        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        IncrementTask task = new IncrementTask(benchmark, NUM_CALLS);

        for (int threadPool = 0; threadPool < NUM_THREADS * 2; ++threadPool) {
            executor.submit(task);
        }

        executor.shutdown();

        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        long completed = clock.millis();
        long difference = completed - current;
        System.out.println("Completed time: " + completed);
        System.out.println("Time difference for Google Guava: " + difference/1000 + " seconds.");
    }
}

class IncrementTask implements Runnable {

    private Benchmark benchmark;
    private long numCalls;

    public IncrementTask(Benchmark benchmark, long numCalls) {
        this.benchmark = benchmark;
        this.numCalls = numCalls;
    }

    @Override
    public void run() {
        long threadId = Thread.currentThread().getId();
        String threadName = Thread.currentThread().getName();
        System.out.println("Task executed by Thread " + threadId + " (" + threadName + ")");

        synchronized (benchmark) {
            for (int trial = 0; trial < numCalls ; ++trial) {
                benchmark.increment(1);
            }
        }
    }
}

Benchmark.java

public class Benchmark {

    private final AtomicLongMap<String> counters =  AtomicLongMap.create();
    private final String KEY = "Key";

    public void increment(int delta) {
        counters.addAndGet(KEY, delta);
    }
}

DEV Community

Illuminating Unnoticed Efficiency Pitfalls with AtomicLongMap

Benchmark testing for AtomicLongMap

Why am I doing this benchmark test?

Which method within AtomicLongMap is suffering from thread contention?

When was a new implementation introduced for `addAndGet` method in Google Guava?

Why was this change introduced?

How did I notice it?

Enough story time, let's dig into the implementation difference

Google Guava 20.0 Implementation Flow [Psuedo]

Google Guava 32.0.0 Implementation Flow [Psuedo]

What's the possible difference in their behavior within a multi-threaded environment?

LET'S GET INTO TESTING !!

Case 1 : NumThreads = 2

Case 2 : NumThreads = 10

Case 3 : NumThreads = 32

Case 4 : NumThreads = 64

Conclusion

Setup of the test

Top comments (0)

Read next

Google Workspace Security Part 1: Common Security Settings and 2SV

Learn any programming language Like a Expert

What Is Bearer Tokens for REST APIs and How to Debug It With Code & Tools

Java scenario based interview questions

Benchmark testing for AtomicLongMap

Why am I doing this benchmark test?

Which method within AtomicLongMap is suffering from thread contention?

When was a new implementation introduced for addAndGet method in Google Guava?

Why was this change introduced?

How did I notice it?

Enough story time, let's dig into the implementation difference

Google Guava 20.0 Implementation Flow [Psuedo]

Google Guava 32.0.0 Implementation Flow [Psuedo]

What's the possible difference in their behavior within a multi-threaded environment?

LET'S GET INTO TESTING !!

Case 1 : NumThreads = 2

Case 2 : NumThreads = 10

Case 3 : NumThreads = 32

Case 4 : NumThreads = 64

Conclusion

Setup of the test

Read next

Google Workspace Security Part 1: Common Security Settings and 2SV

Learn any programming language Like a Expert

What Is Bearer Tokens for REST APIs and How to Debug It With Code & Tools

Java scenario based interview questions

When was a new implementation introduced for `addAndGet` method in Google Guava?