Approach comparison
Apache Kafka is a powerful tool for building scalable real-time streaming applications, and a common use case is joining data from multiple Kafka topics. Depending on the needs of your application, there are several ways to approach this problem. In this article, we'll explore three approaches:
- N consumers write to 1 table (reader-oriented) using Kafka consumers.
- N consumers write to N tables (writer-oriented) using Kafka consumers.
- N consumers write to 1 table using Kafka Streams.

Each of these strategies has its own trade-offs in complexity, scalability, and performance. For a deep dive into each alternative, see https://dev.to/joseboretto/how-to-join-multiple-kafka-topics-overview-19j8.
Feature request
What do we need to satisfy our users' needs?
- Allow our sellers to see and filter their catalog on https://www.manomano.fr/
- Allow our sellers to see their catalog stats on https://www.manomano.fr/
System load
The microservice has to manage the following load.
Kafka Throughput (Writer)
| Topic | Per Day (Million) | Requests/s | Partitions |
| --- | --- | --- | --- |
| inboundoffer | 27.3 | 316.2 | 6 |
| offer | 24.62 | 284.9 | 6 |
| product | 13.13 | 152 | 12 |
| availablestock | 3.2 | 37 | 6 |
| xcart_products | 1.26 | 14 | 4 |
| colibri_product_generic | 0.13 | 1 | 4 |
Database size (Reader)
N Consumers Write to N Tables (Writer-Oriented)
| Table | Rows (Million) | Size (GB) | Avg Size (Bytes) |
| --- | --- | --- | --- |
| inboundoffer | 96 | 189 | 1968 |
| offer | 107 | 43 | 2488 |
| product | 126 | 90 | 1400 |
| availablestock | 92 | 10.2 | 108 |
| xcart_products | 118 | 32 | 271 |
| colibri_product_generic | 92 | 9.5 | 103 |
N Consumers Write to 1 Table Using Kafka Streams
| Table | Rows (Million) | Size (GB) | Avg Size (Bytes) |
| --- | --- | --- | --- |
| main | 119 | 147 | 1235 |
Platform infrastructure
Database
Provider: Amazon RDS - https://aws.amazon.com/rds/
Engine: PostgreSQL
Size: db.r6g.8xlarge
Config: 2 readers and 1 writer
| Model | Core Count | vCPU | Memory (GiB) | Storage | Dedicated EBS Bandwidth (Mbps) | Networking Performance (Gbps) |
| --- | --- | --- | --- | --- | --- | --- |
| db.r6g.large | - | 2 | 16 | EBS-Only | Up to 4,750 | Up to 10 |
| db.r6g.xlarge | - | 4 | 32 | EBS-Only | Up to 4,750 | Up to 10 |
| db.r6g.2xlarge | - | 8 | 64 | EBS-Only | Up to 4,750 | Up to 10 |
| db.r6g.4xlarge | - | 16 | 128 | EBS-Only | 4,750 | Up to 10 |
| db.r6g.8xlarge | - | 32 | 256 | EBS-Only | 9,000 | 12 |
| db.r6g.12xlarge | - | 48 | 384 | EBS-Only | 13,500 | 20 |
| db.r6g.16xlarge | - | 64 | 512 | EBS-Only | 19,000 | 25 |
Kafka cluster
Provider: Amazon MSK - https://aws.amazon.com/msk/
Version: 3.6
Brokers: 6
Size: kafka.m5.4xlarge
| Broker Instance | vCPU | Memory (GiB) |
| --- | --- | --- |
| kafka.t3.small | 2 | 2 |
| kafka.m5.large | 2 | 8 |
| kafka.m5.xlarge | 4 | 16 |
| kafka.m5.2xlarge | 8 | 32 |
| kafka.m5.4xlarge | 16 | 64 |
| kafka.m5.8xlarge | 32 | 128 |
| kafka.m5.12xlarge | 48 | 192 |
| kafka.m5.16xlarge | 64 | 256 |
| kafka.m5.24xlarge | 96 | 384 |
Case 2: N Consumers Write to N Tables (Writer-Oriented) with Kafka Consumer
In this method, each consumer reads from its Kafka topic and writes to its own dedicated table in the database. This is a "writer-oriented" approach, where each consumer is responsible for maintaining its isolated dataset.
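To make the shape of this approach concrete, here is a minimal, stdlib-only sketch of the writer-oriented pattern. It is not the production code: plain `Map`s stand in for the PostgreSQL tables, `consume` stands in for a per-topic Kafka consumer's upsert, and `readProduct` mimics the read path that must join all per-topic tables by key. All class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "N consumers write to N tables": each consumer only ever
// touches its own table, and the read side pays the cost of joining.
public class WriterOrientedSketch {

    // One map per topic, mirroring the six dedicated database tables.
    static final Map<String, Map<String, String>> tables = new HashMap<>();

    // Per-topic consumer: upsert the payload into this topic's own table.
    static void consume(String topic, String key, String payload) {
        tables.computeIfAbsent(topic, t -> new HashMap<>()).put(key, payload);
    }

    // Read side: the equivalent of a multi-table SQL join by product key.
    static Map<String, String> readProduct(String key) {
        Map<String, String> row = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : tables.entrySet()) {
            String v = e.getValue().get(key);
            if (v != null) row.put(e.getKey(), v);
        }
        return row;
    }

    public static void main(String[] args) {
        for (String topic : List.of("offer", "product", "availablestock")) {
            consume(topic, "sku-1", topic + "-data");
        }
        // The read joins the three tables written above.
        System.out.println(readProduct("sku-1"));
    }
}
```

The write path is trivially parallel and isolated per topic, which is why write complexity stays low; the cost shows up later, in the 6-table joins of the reader benchmarks below.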
Hardware
Provider: Amazon Elastic Kubernetes Service (EKS)
Pods: 6
Memory: 8GiB (Used: 50%)
CPU: 2000m (Used: 100%)
Software
Language: Java 21
Framework:
- Spring Boot 3.2.2 (https://mvnrepository.com/artifact/org.springframework.boot/spring-boot/3.2.2)
- Reactor Core 3.6.4 (https://mvnrepository.com/artifact/io.projectreactor/reactor-core/3.6.4)
Performance
Writer
| Resource Name | Requests | Avg Latency | p50 Latency | p75 Latency | p90 Latency | p95 Latency | Requests/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| inboundOffer | 30.7M | 94.2 μs | 79.3 μs | 95.5 μs | 126 μs | 162 μs | 355 req/s |
| offer | 26.1M | 190 μs | 157 μs | 224 μs | 283 μs | 310 μs | 302 req/s |
| product | 10.8M | 431 μs | 380 μs | 450 μs | 595 μs | 717 μs | 125 req/s |
| availableStock | 2.58M | 182 μs | 178 μs | 214 μs | 258 μs | 283 μs | 29.9 req/s |
| xcart_products | 1.27M | 95.2 μs | 62.9 μs | 124 μs | 192 μs | 221 μs | 14.6 req/s |
| colibri_product_generic | 129k | 74.1 μs | 54.7 μs | 76.9 μs | 132 μs | 170 μs | 1.5 req/s |
Reader
| Resource Name | Requests | Avg Latency | p50 Latency | p75 Latency | p90 Latency | p95 Latency | Requests/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GET products v1 | 100.0k | 1.01 s | 77.4 ms | 413 ms | 925 ms | 1.89 s | 0.2 req/s |
| GET stats v1 | 27.0k | 5.76 s | 371 ms | 2.17 s | 10.4 s | 26.8 s | < 0.1 req/s |
Database Considerations:
- This architecture requires joining 6 tables on the read path.
- All indexes are optimized for search by filters.
- No partitioned tables.
Case 3: Kafka Streams Writes to Another Kafka Topic, Then a Consumer Writes to 1 Table
In this enhanced version of the Kafka Streams approach, Kafka Streams instances read from multiple topics, process the data (joins, aggregations, filtering, etc.), and then write the processed output to a new Kafka topic. A separate Kafka consumer (or consumers) can then read from this topic and handle the writes to the database.
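The following stdlib-only sketch shows the shape of this pipeline, not the actual Kafka Streams topology: `process` stands in for the streams instance folding each input topic's update into one denormalized record (the role a state store plays in Kafka Streams), and `sink` stands in for the downstream consumer upserting that record into the single `main` table. All names here are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "join on the stream, write once": the join cost is paid at
// write time, so the read path touches a single pre-joined record.
public class StreamsShapeSketch {

    // Stand-in for the Kafka Streams state store keyed by product.
    static final Map<String, Map<String, String>> store = new HashMap<>();
    // Stand-in for the single "main" database table.
    static final Map<String, Map<String, String>> mainTable = new HashMap<>();

    // "Streams processor": merge this topic's update into the joined
    // record and return it, as if forwarding it to the main topic.
    static Map<String, String> process(String topic, String key, String payload) {
        Map<String, String> joined = store.computeIfAbsent(key, k -> new HashMap<>());
        joined.put(topic, payload);
        return joined;
    }

    // "Main-topic consumer": a plain upsert, with no joins left to do.
    static void sink(String key, Map<String, String> record) {
        mainTable.put(key, new HashMap<>(record));
    }

    public static void main(String[] args) {
        sink("sku-1", process("offer", "sku-1", "offer-data"));
        sink("sku-1", process("product", "sku-1", "product-data"));
        // A single lookup now returns the fully joined row.
        System.out.println(mainTable.get("sku-1"));
    }
}
```

This is why the reader benchmarks below improve so sharply: every read is a single-table lookup, at the price of extra state (internal topics and state-store disk) on the streams side.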
Hardware
Provider: Amazon Elastic Kubernetes Service (EKS)
Pods: 12
Memory: 8GiB (Used: 100%)
CPU: 4000m (Used: 20%)
Disk: 100Gi (Used: 10%)
Kafka internal topics: 26 topics with a total size of 500 GB (not counting the replication factor)
Software
Language: Java 21
Framework:
- Spring Boot 3.2.2 (https://mvnrepository.com/artifact/org.springframework.boot/spring-boot/3.2.2)
- Kafka Streams 3.6.2 (https://mvnrepository.com/artifact/org.apache.kafka/kafka-streams/3.6.2)
Performance
Kafka Streams writer (from the input topics to the main topic)
| Resource Name | Requests | Avg Latency | p50 Latency | p75 Latency | p90 Latency | p95 Latency | Requests/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| inboundOffer | 61.0M | 21.5 μs | 15.1 μs | 21.9 μs | 44.0 μs | 62.9 μs | 706 req/s |
| offer | 51.8M | 59.1 μs | 31.3 μs | 76.9 μs | 134 μs | 183 μs | 599 req/s |
| product | 21.9M | 305 μs | 92.6 μs | 392 μs | 740 μs | 1.07 ms | 254 req/s |
| availableStock | 5.17M | 50.6 μs | 34.9 μs | 67.9 μs | 103 μs | 126 μs | 59.9 req/s |
| xcart_products | 2.47M | 33.7 μs | 17.4 μs | 38.9 μs | 89.8 μs | 122 μs | 28.6 req/s |
| colibri_product_generic | 259k | 26.9 μs | 13.3 μs | 32.8 μs | 60.9 μs | 84.4 μs | 3.0 req/s |
Kafka consumer writer (from the main topic to the database)
| Resource Name | Requests | Avg Latency | p50 Latency | p75 Latency | p90 Latency | p95 Latency | Requests/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Main topic | 17.2M | 17.9 μs | 12.5 μs | 23.7 μs | 32.3 μs | 36.0 μs | 199 req/s |
Reader
| Resource Name | Requests | Avg Latency | p50 Latency | p75 Latency | p90 Latency | p95 Latency | Requests/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GET products v2 | 49.7k | 211 ms | 44.3 ms | 114 ms | 365 ms | 756 ms | < 0.1 req/s |
| GET stats v2 | 9.12k | 135 ms | 34.6 ms | 114 ms | 308 ms | 599 ms | < 0.1 req/s |
Database Considerations:
- Queries involve only 1 table (no joins).
- Partitioned table with all the necessary indexes for filtering.
Performance Comparison
| Metric | N Consumers Write to N Tables (Writer-Oriented) | Kafka Streams |
| --- | --- | --- |
| **Write throughput (requests/s)** | | |
| inboundOffer | 355 req/s | 706 req/s |
| offer | 302 req/s | 599 req/s |
| product | 125 req/s | 254 req/s |
| availableStock | 29.9 req/s | 59.9 req/s |
| xcart_products | 14.6 req/s | 28.6 req/s |
| colibri_product_generic | 1.5 req/s | 3.0 req/s |
| **p50 write latency** | | |
| inboundOffer | 79.3 μs | 15.1 μs |
| offer | 157 μs | 31.3 μs |
| product | 380 μs | 92.6 μs |
| availableStock | 178 μs | 34.9 μs |
| xcart_products | 62.9 μs | 17.4 μs |
| colibri_product_generic | 54.7 μs | 13.3 μs |
| **GET products** | | |
| p50 read latency | 77.4 ms | 44.3 ms |
| p90 read latency | 925 ms | 365 ms |
| **GET stats** | | |
| p50 read latency | 371 ms | 34.6 ms |
| p90 read latency | 10.4 s | 308 ms |
| **Infrastructure** | | |
| CPU usage | 100% of 2000m (6 pods) | 20% of 4000m (12 pods) |
| Memory usage | 50% of 8 GiB (6 pods) | 100% of 8 GiB (12 pods) |
| Disk usage | 0 GiB | 10% of 100 GiB (12 pods) |
| Database write complexity | Writes to 6 tables; reads require joins | Writes to 1 table; no joins required for reads |
Conclusion
- Write latency (p50): Kafka Streams achieves much lower p50 write latency across all topics, indicating faster processing for the majority of requests.
- Read latency (p50): Kafka Streams shows a clear advantage on reads: 44.3 ms vs. 77.4 ms for GET products (roughly a 2x improvement) and 34.6 ms vs. 371 ms for GET stats (roughly a 10x improvement) compared to the N-consumers approach. The improvement comes from writing to a single table and avoiding multi-table joins.
- Resource efficiency: Kafka Streams trades memory and disk for CPU. It uses all of its 8 GiB of memory and 10% of its 100 GiB disk (state stores and internal topics), but only 20% of its CPU versus 100% in the writer-oriented setup.
Credits
To all the developers involved in this 3-year journey.
- https://www.linkedin.com/in/miltonivanterreno/
- https://www.linkedin.com/in/emiliano-pochettino-b4a29079/
- https://www.linkedin.com/in/juan-arroyes/
- https://www.linkedin.com/in/mariellyssoto/
- https://www.linkedin.com/in/pablojosecarneroramos/
- https://www.linkedin.com/in/abdelilah-choukri-0828925b/
- https://www.linkedin.com/in/tedescodario/
- https://www.linkedin.com/in/cgarciavillard/
- https://www.linkedin.com/in/viferpar/