N + 1 problem is a common issue in many enterprise projects. The worst is that you don't notice it until the amount of data becomes huge. Unfortunately, the code might reach the stage when dealing with N + 1 problem becomes an unbearable task.
In this article, I'm telling you:
- How to track N + 1 problem automatically?
- How to write a test to check that the query count does not exceed the expected value?
The tech stack consists of Java, Spring Boot, Spring Data JPA, and PostgreSQL. You can check out the repository with code examples by this link.
There are no restrictions to apply Spring Boot or Hibernate specifically. If you interact with
javax.sql.DataSource
in your codebase, then the solution will help you. Even if you don't use Spring at all.
An example of the N + 1 problem
Supposing we are working on the application that manages zoos. In that case, there are two core entities: Zoo
and Animal
. Look at the code snippet below:
@Entity
@Table(name = "zoo")
public class Zoo {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
private String name;
@OneToMany(mappedBy = "zoo", cascade = PERSIST)
private List<Animal> animals = new ArrayList<>();
}
@Entity
@Table(name = "animal")
public class Animal {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
@ManyToOne(fetch = LAZY)
@JoinColumn(name = "zoo_id")
private Zoo zoo;
private String name;
}
Now we want to retrieve all existing zoos with their animals. Look at the code ZooService
code below.
@Service
@RequiredArgsConstructor
public class ZooService {
private final ZooRepository zooRepository;
@Transactional(readOnly = true)
public List<ZooResponse> findAllZoos() {
final var zoos = zooRepository.findAll();
return zoos.stream()
.map(ZooResponse::new)
.toList();
}
}
Also, we want to check that everything works smoothly. So, here is a simple integration test:
@DataJpaTest
@AutoConfigureTestDatabase(replace = NONE)
@Transactional(propagation = NOT_SUPPORTED)
@Testcontainers
@Import(ZooService.class)
class ZooServiceTest {
@Container
static final PostgreSQLContainer<?> POSTGRES = new PostgreSQLContainer<>("postgres:13");
@DynamicPropertySource
static void setProperties(DynamicPropertyRegistry registry) {
registry.add("spring.datasource.url", POSTGRES::getJdbcUrl);
registry.add("spring.datasource.username", POSTGRES::getUsername);
registry.add("spring.datasource.password", POSTGRES::getPassword);
}
@Autowired
private ZooService zooService;
@Autowired
private ZooRepository zooRepository;
@Test
void shouldReturnAllZoos() {
/* data initialization... */
zooRepository.saveAll(List.of(zoo1, zoo2));
final var allZoos = assertQueryCount(
() -> zooService.findAllZoos(),
ofSelects(1)
);
/* assertions... */
assertThat(
...
);
}
}
I skipped the data initialization and assertions parts for the sake of simplicity. They are not important for the article's topic. Anyway, you can check out the whole test suite by this link.
I have a particular piece about testing data layer in Spring Boot application with Testcontainers. If you're unfamiliar with the topic, you should definitely look through it.
The test passes successfully. However, if you log SQL statements, you'll notice something that may concern you. Look at the output below:
-- selecting all zoos
select z1_0.id,z1_0.name from zoo z1_0
-- selecting animals for the first zoo
select a1_0.zoo_id,a1_0.id,a1_0.name from animal a1_0 where a1_0.zoo_id=?
-- selecting animals for the second zoo
select a1_0.zoo_id,a1_0.id,a1_0.name from animal a1_0 where a1_0.zoo_id=?
As you can see, we have a separate select
query for each present Zoo
. The total number of queries equals the number of selected zoos + 1. Therefore, this is the N + 1 problem.
This may cause crucial performance penalties. Especially on a large scale of data.
Tracking the N + 1 problem automatically
Of course, you can run tests, look through the logs, and count queries by yourself to determine viable performance issues. Anyway, this is both tedious and inefficient. Thankfully, there is a better approach.
There is a cool library called datasource-proxy. It provides a convenient API to wrap javax.sql.DataSource
interface with a proxy containing specific logic. For example, we can register callbacks invoked before and after query execution. What's interesting is that the library also contains out-of-the-box solution to count executed queries. We're going to alter it a bit to serve our needs.
Query Count Service
Firstly, add the library to the dependencies:
implementation "net.ttddyy:datasource-proxy:1.8"
Now create the QueryCountService
. It’s the singleton that holds the current count of executed queries and allows you to clean it. Look at the code snippet below.
@UtilityClass
public class QueryCountService {
static final SingleQueryCountHolder QUERY_COUNT_HOLDER = new SingleQueryCountHolder();
public static void clear() {
final var map = QUERY_COUNT_HOLDER.getQueryCountMap();
map.putIfAbsent(keyName(map), new QueryCount());
}
public static QueryCount get() {
final var map = QUERY_COUNT_HOLDER.getQueryCountMap();
return ofNullable(map.get(keyName(map))).orElseThrow();
}
private static String keyName(Map<String, QueryCount> map) {
if (map.size() == 1) {
return map.entrySet()
.stream()
.findFirst()
.orElseThrow()
.getKey();
}
throw new IllegalArgumentException("Query counts map should consists of one key: " + map);
}
}
In that case, we make an assumption there is a single
DataSource
in our application. That’s why thekeyName
function throws an exception otherwise. However, the code won’t differ much with multiple data sources usage.
The SingleQueryCountHolder
stores all QueryCount
objects in a regular ConcurrentHashMap
.
On the contrary,
ThreadQueryCountHolder
stores the values inThreadLocal
object. ButSingleQueryCountHolder
is enough for our case.
The API provides two methods. The get
method returns the current amount of executed queries whilst the clear
one sets the count to zero.
BeanPostProccessor and DataSource proxy
Now we need to register the QueryCountService
to make it collect the data from the DataSource
. In that case, the BeanPostProcessor interface comes in handy. Look at the code example below.
@TestComponent
public class DatasourceProxyBeanPostProcessor implements BeanPostProcessor {
@Override
public Object postProcessAfterInitialization(Object bean, String beanName) {
if (bean instanceof DataSource dataSource) {
return ProxyDataSourceBuilder.create(dataSource)
.countQuery(QUERY_COUNT_HOLDER)
.build();
}
return bean;
}
}
I mark the class with
@TestComponent
annotation and put it tosrc/test
directory because I don't need to count queries outside of the test scope.
As you can see, the idea is trivial. If a bean is DataSource
, then wrap it with ProxyDataSourceBuilder
and put the QUERY_COUNT_HOLDER
value as the QueryCountStrategy
.
Finally, we want to assert the amount of executed queries for the specific method. Look at the code snippet with custom assertions below:
Custom assertions
@UtilityClass
public class QueryCountAssertions {
@SneakyThrows
public static <T> T assertQueryCount(Supplier<T> supplier, Expectation expectation) {
QueryCountService.clear();
final var result = supplier.get();
final var queryCount = QueryCountService.get();
assertAll(
() -> {
if (expectation.selects >= 0) {
assertEquals(expectation.selects, queryCount.getSelect(), "Unexpected selects count");
}
},
() -> {
if (expectation.inserts >= 0) {
assertEquals(expectation.inserts, queryCount.getInsert(), "Unexpected inserts count");
}
},
() -> {
if (expectation.deletes >= 0) {
assertEquals(expectation.deletes, queryCount.getDelete(), "Unexpected deletes count");
}
},
() -> {
if (expectation.updates >= 0) {
assertEquals(expectation.updates, queryCount.getUpdate(), "Unexpected updates count");
}
}
);
return result;
}
}
The algorithm is straightforward:
- Set the current queries count to zero.
- Executed the provided lambda.
- Assert the query count to the given
Expectation
object. - If everything passes successfully, return the result of execution.
Also, you’ve noticed an additional condition. If the provided type of count is less than zero, skip the assertion. It’s convenient, when you don’t care about other queries count.
The Expectation
class is just a regular data structure. Look at its declaration below:
@With
@AllArgsConstructor
@NoArgsConstructor
public static class Expectation {
private int selects = -1;
private int inserts = -1;
private int deletes = -1;
private int updates = -1;
public static Expectation ofSelects(int selects) {
return new Expectation().withSelects(selects);
}
public static Expectation ofInserts(int inserts) {
return new Expectation().withInserts(inserts);
}
public static Expectation ofDeletes(int deletes) {
return new Expectation().withDeletes(deletes);
}
public static Expectation ofUpdates(int updates) {
return new Expectation().withUpdates(updates);
}
}
The final example
Let’s see how it works. At first, I add query assertions in the previous case with N + 1 problem. Look at the code block below:
final var allZoos = assertQueryCount(
() -> zooService.findAllZoos(),
ofSelects(1)
);
Don't forget to import
DatasourceProxyBeanPostProcessor
as a Spring bean in your tests.
If we rerun the test, we'll get the output below.
Multiple Failures (1 failure)
org.opentest4j.AssertionFailedError: Unexpected selects count ==> expected: <1> but was: <3>
Expected :1
Actual :3
So, the assertion does work. We managed to track the N + 1 problem automatically. Time to replace the regular selection with JOIN FETCH
. Look at the code snippet below.
public interface ZooRepository extends JpaRepository<Zoo, Long> {
@Query("FROM Zoo z LEFT JOIN FETCH z.animals")
List<Zoo> findAllWithAnimalsJoined();
}
@Service
@RequiredArgsConstructor
public class ZooService {
private final ZooRepository zooRepository;
@Transactional(readOnly = true)
public List<ZooResponse> findAllZoos() {
final var zoos = zooRepository.findAllWithAnimalsJoined();
return zoos.stream()
.map(ZooResponse::new)
.toList();
}
}
Let's run the test again and check out the result:
Meaning that the assertion tracks N + 1 problems correctly. Besides, it passes successfully, if the amount of queries equals to the expected one. Great!
Conclusion
As a matter of fact, it is possible to prevent N + 1 problems with regular tests. I think that’s a great opportunity to put guards for those code parts that are crucial to performance perspective.
That’s all I wanted to tell you about dealing with N + 1 problem in the automatic way. If you have any questions or suggestions, leave your comments down below. Also, if you like this piece, share it with your friends and colleagues. Perhaps they’ll find it beneficial too. Thanks for reading!
Top comments (2)
Great one again!!
So, why cascade.persist and not cascade.all?
Also, Now people are kind of against Lombok whats your stand?
@mahadev_k thank you!
Cascade.ALL
also addsCascade.REMOVE
that should be avoided.I have an opposite opinion. I think that you should always use Lombok. Consider
@EqualsAndHashCode
annotation. You can click hot key in IDEA and generate those methods. Suppose you add a new field to the class. That means you also have to updateequals
andhashCode
accordingly. Therefore, one change requires updating code in two places. That can also be the reason of merge conflicts, if two people changing the content of the same class. As a matter of fact, you can look at other JVM languages. For example, Scala and Kotlin don't need Lombok because there is thedata class
pattern. So, the problem is solved on the language level. As I see it, Java does need to Lombok to deal with this design pitfalls.