Germano Schneider

Posted on Feb 7 • Edited on Feb 16

Chunk vs Tasklet, which one should I use?

#springbatch #springboot

Introduction

When an application has to consume a large amount of data, we commonly look for good alternatives to process it efficiently. In Java development, Spring Batch is a lightweight and resilient option for running daily scheduled jobs that process huge amounts of data. It also lets us split that work into small, medium, or even big batches to optimize the application's memory consumption.

In Spring Batch, a job is made up of one or more steps, and each step can be defined in one of two ways: as a Chunk or as a Tasklet. I will explain both in detail.

Chunk

Imagine a scenario where your application needs to read a thousand records from a table of integer numbers and then sort the digits of each number in ascending order. For example, if the returned number is 54321, it should become 12345. Finally, the processed data should be updated in the database.

Given this scenario, we can create a step that processes the records in chunks. With a chunk size of 10, the thousand records are handled in a hundred separate chunks of ten records each, so the application only keeps a small portion of the data in memory at any time.

@Bean
Step sortNumbersInNaturalOrderStep(JobRepository jobRepository, PlatformTransactionManager platformTransactionManager) {

    return new StepBuilder("sortNumbersInNaturalOrderStep", jobRepository)
        // Read, process, and write 10 items per chunk; each chunk is committed in one transaction
        .<Map<Integer, Integer>, Map<Integer, Integer>>chunk(10, platformTransactionManager)
        .reader(cursorItemReader())
        .processor(new SortNumberProcessor())
        .writer(new NumberWriter(dataSource))
        .build();
}

Here we have a step called sortNumbersInNaturalOrderStep. Its chunk receives and returns a map of integers, where the key is the record ID and the value is the number. We also set the chunk size (10), which is how many items are processed and committed together in each transaction, and the transaction manager used for those commits.
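To make the chunk mechanics more concrete, here is a minimal plain-Java sketch of what a chunk-oriented step does with its reader, processor, and writer. It is not Spring Batch code: the `ChunkLoopSketch` class and its `runChunkedStep` method are hypothetical stand-ins, and the sketch assumes a processor may return null to drop an item, which mirrors how a Spring Batch ItemProcessor filters items. With 1,000 items and a chunk size of 10, the writer is called 100 times.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of a chunk-oriented step; this is not Spring Batch code.
public class ChunkLoopSketch {

    /**
     * Reads items one at a time, processes each one, and hands every chunk of up to
     * chunkSize processed items to the writer. Returns how many chunks were written.
     */
    public static <I, O> int runChunkedStep(Iterator<I> reader,
                                            Function<I, O> processor,
                                            Consumer<List<O>> writer,
                                            int chunkSize) {
        int chunksWritten = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize);
            int read = 0;
            // Read and process up to chunkSize items before writing anything.
            while (read < chunkSize && reader.hasNext()) {
                O processed = processor.apply(reader.next());
                read++;
                // A null result drops the item, like an ItemProcessor that returns null.
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            // In a real step, this chunk's work is committed in one transaction after the write.
            writer.accept(chunk);
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            ids.add(i);
        }
        List<Integer> chunkSizes = new ArrayList<>();
        int chunks = runChunkedStep(ids.iterator(), id -> id,
                (List<Integer> chunk) -> chunkSizes.add(chunk.size()), 10);
        // prints "100 chunks, first chunk size 10"
        System.out.println(chunks + " chunks, first chunk size " + chunkSizes.get(0));
    }
}
```

In the real step, Spring Batch drives this loop for you and wraps each chunk in a transaction managed by the PlatformTransactionManager passed to chunk(...).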

When we work with the chunk concept, we configure a reader, an optional processor, and a writer. To make it clearer, I will explain how each one works.

1. Reader

A reader reads the data from a specific source, such as a JSON file or a database. In our example, we use a reader implementation provided by Spring Batch called JdbcCursorItemReader. It reads the data from a JDBC data source, which must be provided. In this example, we are getting all data from the numbers table in an embedded H2 database.

@Bean
JdbcCursorItemReader<Map<Integer, Integer>> cursorItemReader() {

    JdbcCursorItemReader<Map<Integer, Integer>> reader = new JdbcCursorItemReader<>();

    reader.setSql("SELECT * FROM numbers");
    reader.setDataSource(dataSource);
    reader.setRowMapper(new DatabaseRowMapper());

    return reader;
}

The reader runs a SELECT statement that fetches all the rows of the numbers table. Note that we also set a DatabaseRowMapper, which maps each row received from the database into a map of integers.

@Override
public Map<Integer, Integer> mapRow(ResultSet rs, int rowNum) throws SQLException {

      Map<Integer, Integer> map = new HashMap<>();

      map.put(rs.getInt("id"), rs.getInt("number"));

      return map;
}

The mapRow method, defined by Spring's RowMapper interface, returns a map with the ID and number of each row from the numbers table.

2. Processor

The processor is optional: we could simply read the data and then write it without configuring one. In this example, however, we set a processor that sorts the digits of each number received from the reader.

@Override
public Map<Integer, Integer> process(Map<Integer, Integer> item) throws Exception {

    final Map<Integer, Integer> newMap = new HashMap<>();

    for (Map.Entry<Integer, Integer> mapValue : item.entrySet()) {

        Integer key = mapValue.getKey();
        Integer value = mapValue.getValue();

        // Split the number into single-digit strings, e.g. "54321" -> ["5", "4", "3", "2", "1"]
        List<String> numbers = new ArrayList<>(Arrays.stream(value.toString()
                .split(""))
                .toList());

        // Sort the digits in ascending order
        numbers.sort(Comparator.naturalOrder());

        // Join the digits back together and parse them as an integer
        Integer newValue = Integer.parseInt(String.join("", numbers));

        newMap.put(key, newValue);
    }

    return newMap;
}

For each map, we capture the ID and the number, sort the number's digits, and return a new map with the sorted value stored under the same ID.
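The digit-sorting rule can also be written without splitting the number into a list of strings. The sketch below is a hypothetical standalone helper, not the processor from the original project: it sorts the characters of the number directly with Arrays.sort and assumes the numbers table holds non-negative values. Like the processor above, it drops leading zeros when parsing the result back, so 5410 becomes 145.

```java
import java.util.Arrays;

// Hypothetical standalone version of the digit-sorting rule (not from the original project).
public class DigitSorter {

    /** Sorts the decimal digits of a non-negative number in ascending order. */
    public static int sortDigits(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative numbers are not supported: " + value);
        }
        char[] digits = Integer.toString(value).toCharArray();
        // The characters '0'..'9' are contiguous, so sorting chars sorts the digits numerically.
        Arrays.sort(digits);
        // Leading zeros disappear when the sorted string is parsed back into an int.
        return Integer.parseInt(new String(digits));
    }

    public static void main(String[] args) {
        System.out.println(sortDigits(54321)); // prints 12345
        System.out.println(sortDigits(5410));  // prints 145
    }
}
```

Sorting ascending never produces a larger number than the input, so the parsed result always fits in an int.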

3. Writer

After the data is processed, it goes to the writer, which is responsible for updating the number in the database for the given ID. The processed items are grouped in a Chunk, so we iterate over them and update each captured value.

@Override
public void write(Chunk<? extends Map<Integer, Integer>> chunk) throws Exception {

    // Each item is a map of ID -> sorted number produced by the processor
    for (Map<Integer, Integer> value : chunk.getItems()) {

        for (Map.Entry<Integer, Integer> mapValue : value.entrySet()) {
            jdbcTemplate.update("UPDATE numbers SET number = (?) WHERE id = (?)", mapValue.getValue(), mapValue.getKey());
        }
    }
}
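The writer above sends one UPDATE per item. Because every item of a chunk uses the same statement, the parameters could instead be collected and sent in a single batched call; Spring's JdbcTemplate offers batchUpdate(String sql, List<Object[]> batchArgs) for this. The toBatchArgs helper below is a hypothetical plain-Java sketch of building that argument list, with each row ordered to match the placeholders (number first, then id).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns the maps of one chunk into parameter rows for
// "UPDATE numbers SET number = ? WHERE id = ?".
public class BatchArgsSketch {

    public static List<Object[]> toBatchArgs(List<? extends Map<Integer, Integer>> items) {
        List<Object[]> batchArgs = new ArrayList<>();
        for (Map<Integer, Integer> item : items) {
            for (Map.Entry<Integer, Integer> entry : item.entrySet()) {
                // Parameter order must match the placeholders: number first, then id.
                batchArgs.add(new Object[] {entry.getValue(), entry.getKey()});
            }
        }
        return batchArgs;
    }

    public static void main(String[] args) {
        List<Map<Integer, Integer>> chunk = List.of(Map.of(1, 12345), Map.of(2, 145));
        for (Object[] row : toBatchArgs(chunk)) {
            System.out.println("number=" + row[0] + ", id=" + row[1]);
        }
    }
}
```

Inside write, it could then be used as jdbcTemplate.batchUpdate("UPDATE numbers SET number = ? WHERE id = ?", toBatchArgs(chunk.getItems())), assuming the same jdbcTemplate field as in the original writer.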

Tasklet

Unlike a chunk-oriented step, a tasklet step is responsible for executing a single task on its own, with no need to process the data in multiple chunks. For example, we can use a tasklet if we need to remove duplicated records from the database.
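A tasklet signals when it is done through the RepeatStatus returned by its execute method: RepeatStatus.FINISHED ends the step, while RepeatStatus.CONTINUABLE asks the framework to call execute again. The sketch below models that contract in plain Java; the Status enum, the SimpleTasklet interface, and the runTaskletStep method are hypothetical stand-ins, not Spring Batch types.

```java
// Plain-Java model of how a tasklet step calls execute; these are not Spring Batch types.
public class TaskletLoopSketch {

    // Hypothetical stand-ins for RepeatStatus.CONTINUABLE and RepeatStatus.FINISHED.
    public enum Status { CONTINUABLE, FINISHED }

    // Hypothetical stand-in for the Tasklet interface.
    public interface SimpleTasklet {
        Status execute();
    }

    /** Calls the tasklet until it returns FINISHED and reports how many calls were made. */
    public static int runTaskletStep(SimpleTasklet tasklet) {
        int calls = 0;
        Status status;
        do {
            // In Spring Batch, each call to execute runs in its own transaction.
            status = tasklet.execute();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        // A one-shot task, such as a single cleanup statement: one call is enough.
        System.out.println(runTaskletStep(() -> Status.FINISHED)); // prints 1

        // A task split into three rounds before it reports FINISHED.
        int[] remainingRounds = {3};
        System.out.println(runTaskletStep(
                () -> --remainingRounds[0] > 0 ? Status.CONTINUABLE : Status.FINISHED)); // prints 3
    }
}
```

A one-shot task such as a cleanup statement simply returns FINISHED on the first call.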

@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {

      // Delete every row whose ID is greater than the lowest ID holding the same number
      String sql = "DELETE FROM numbers t1 WHERE t1.id > (SELECT MIN(t2.id) FROM numbers t2 WHERE t1.number = t2.number)";

      jdbcTemplate.execute(sql);

      return RepeatStatus.FINISHED;
}

In the example above, a single DELETE statement removes all duplicated data from the database, keeping the row with the lowest ID for each number. So, all we need to do is run this step after the previous one.
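The DELETE statement's rule can be checked in memory: a row is removed when its ID is greater than the smallest ID that holds the same number. The idsToDelete helper below is a hypothetical plain-Java model of that rule over an ID-to-number map, useful for reasoning about which rows the tasklet would remove; it is not how the tasklet itself runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of:
// DELETE FROM numbers t1 WHERE t1.id > (SELECT MIN(t2.id) FROM numbers t2 WHERE t1.number = t2.number)
public class DuplicateRule {

    /** Returns the IDs the DELETE statement would remove, in ascending order. */
    public static List<Integer> idsToDelete(Map<Integer, Integer> numbersById) {
        // Smallest ID seen for each number (the MIN(t2.id) subquery).
        Map<Integer, Integer> minIdByNumber = new HashMap<>();
        for (Map.Entry<Integer, Integer> row : numbersById.entrySet()) {
            minIdByNumber.merge(row.getValue(), row.getKey(), Integer::min);
        }
        List<Integer> toDelete = new ArrayList<>();
        // TreeMap iterates IDs in ascending order, so the result is sorted.
        for (Map.Entry<Integer, Integer> row : new TreeMap<>(numbersById).entrySet()) {
            if (row.getKey() > minIdByNumber.get(row.getValue())) {
                toDelete.add(row.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> rows = Map.of(1, 12345, 2, 145, 3, 12345, 4, 12345);
        System.out.println(idsToDelete(rows)); // prints [3, 4]
    }
}
```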

Conclusion

It's not so difficult, is it? The purpose here was to share a little bit about the main concepts of Chunk and Tasklet. Please note that scheduling the job's execution is up to you: Spring Batch is not a scheduler itself, so it is usually combined with common schedulers such as Spring's scheduling support, Control-M, and so on.
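As an illustration of the scheduling side, with Spring's scheduling support a daily trigger is typically expressed as a cron expression such as @Scheduled(cron = "0 0 2 * * *") (every day at 02:00) on a method that launches the job. If you prefer plain Java, the hypothetical helper below computes how long to wait until the next daily run time using java.time. The 02:00 run time is an assumption for the example, and LocalDateTime ignores time zones and daylight-saving changes.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper for a plain-Java daily trigger (not part of Spring Batch).
public class DailyRunDelay {

    /** Time to wait from 'now' until the next occurrence of 'runAt' (today or tomorrow). */
    public static Duration untilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        // If today's slot has already passed (or is right now), run tomorrow instead.
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2024, 1, 15, 23, 30);
        System.out.println(untilNextRun(now, LocalTime.of(2, 0))); // prints PT2H30M
    }
}
```

The resulting Duration could be passed as the initial delay to ScheduledExecutorService.schedule(task, delay.toMillis(), TimeUnit.MILLISECONDS).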

Do you still have questions or concerns about Chunk and Tasklet? Take a look at the official Spring Batch documentation to complement your knowledge, or leave a comment here, and I will try to help you.

To see the whole code used in this article, please take a look at my GitHub:

https://github.com/GermanoSchneider/spring-batch-overview

Thank you, see you next time.

References

https://docs.spring.io/spring-batch/reference/index.html
