DEV Community

Cover image for How to join multiple Kafka topics - Benchmark
Jose Boretto
Jose Boretto

Posted on • Edited on

How to join multiple Kafka topics - Benchmark

Approach comparison 

Apache Kafka is a powerful tool for building scalable real-time streaming applications, and a common use case is joining data from multiple Kafka topics. Depending on the needs of your application, there are several ways to approach this problem. In this article, we'll explore three approaches:

  1. N consumers write to 1 table (reader-oriented) using Kafka consumers.

  2. N consumers write to N tables (writer-oriented) using Kafka consumers.

  3. N consumers write to 1 table using Kafka Streams.

Each of these strategies has its own trade-offs in terms of complexity, scalability, and performance. If you want to deep dive in each alternative, please check the following post: https://dev.to/joseboretto/how-to-join-multiple-kafka-topics-overview-19j8 .

Feature request 

What do we need to satisfy our user's needs?

Image description

System load

The microservice has to manage the following load.

Kafka Throughput (Writer)

Topic Per Day (Million) Requests/s Partitions
inboundoffer 27.3 316.2 6
offer 24.62 284.9 6
product 13.13 152 12
availablestock 3.2 37 6
xcart_products 1.26 14 4
colibri_product_generic 0.13 1 4

Database size (Reader)

N Consumers Write to N Tables (Writer-Oriented)

Table Rows (Million) Size (GB) Avg Size (Bytes)
inboundoffer 96 189 1968
offer 107 43 2488
product 126 90 1400
availablestock 92 10.2 108
xcart_products 118 32 271
colibri_product_generic 92 9.5 103

N consumers write to 1 table using Kafka Streams.

Table Rows (Million) Size (GB) Avg Size (Bytes)
main 119 147 1235

Platform infrastructure

Database

Model Core Count vCPU Memory (GiB) Storage Dedicated EBS Bandwidth (Mbps) Networking Performance (Gbps)
db.r6g.large - 2 16 EBS-Only up to 4,750 Up to 10
db.r6g.xlarge - 4 32 EBS-Only up to 4,750 Up to 10
db.r6g.2xlarge - 8 64 EBS-Only up to 4,750 Up to 10
db.r6g.4xlarge - 16 128 EBS-Only 4,750 Up to 10
db.r6g.8xlarge - 32 256 EBS-Only 9,000 12
db.r6g.12xlarge - 48 384 EBS-Only 13,500 20
db.r6g.16xlarge - 64 512 EBS-Only 19,000 25

Kafka cluster

Broker Instance vCPU Memory (GiB)
kafka.t3.small 2 2
kafka.m5.large 2 8
kafka.m5.xlarge 4 16
kafka.m5.2xlarge 8 32
kafka.m5.4xlarge 16 64
kafka.m5.8xlarge 32 128
kafka.m5.12xlarge 48 192
kafka.m5.16xlarge 64 256
kafka.m5.24xlarge 96 384

Case 2: N Consumers Write to N Tables (Writer-Oriented) with Kafka Consumer

In this method, each consumer reads from its Kafka topic and writes to its own dedicated table in the database. This is a "writer-oriented" approach, where each consumer is responsible for maintaining its isolated dataset.

Image description

Hardware

  • Provider: Amazon Elastic Kubernetes Service (EKS)

  • Pods: 6

  • Memory: 8GiB (Used: 50%)

  • CPU: 2000m (Used: 100%)

Software

Performance

Writer

Resource Name Requests Avg Latency p50 Latency p75 Latency p90 Latency p95 Latency Requests/s
inboundOffer 30.7M 94.2 μs 79.3 μs 95.5 μs 126 μs 162 μs 355 req/s
offer 26.1M 190 μs 157 μs 224 μs 283 μs 310 μs 302 req/s
product 10.8M 431 μs 380 μs 450 μs 595 μs 717 μs 125 req/s
availableStock 2.58M 182 μs 178 μs 214 μs 258 μs 283 μs 29.9 req/s
xcart_products 1.27M 95.2 μs 62.9 μs 124 μs 192 μs 221 μs 14.6 req/s
colibri_product_generic 129k 74.1 μs 54.7 μs 76.9 μs 132 μs 170 μs 1.5 req/s

Reader

Resource Name Requests Avg Latency p50 Latency p75 Latency p90 Latency p95 Latency Requests/s
GET products v1 100.0k 1.01 s 77.4 ms 413 ms 925 ms 1.89 s 0.2 req/s
GET stats v1 27.0k 5.76 s 371 ms 2.17 s 10.4 s 26.8 s < 0.1 req/s

Database Considerations:

  • This architecture requires joining 6 tables.

  • All indexes are optimized for search by filters.

  • No partitioned tables.

Case 3: Kafka Streams Write to Another Kafka Topic, then a Consumer Writes to 1 table

In this enhanced version of the Kafka Streams approach, Kafka Streams instances read from multiple topics, process the data (joins, aggregations, filtering, etc.), and then write the processed output to a new Kafka topic. A separate Kafka consumer (or consumers) can then read from this topic and handle the writes to the database.

Image description

Hardware

  • Provider: Amazon Elastic Kubernetes Service (EKS)

  • Pods: 12

  • Memory: 8GiB (Used: 100%)

  • CPU: 4000m (Used: 20%)

  • Disk: 100Gi (Used: 10%)

  • Kafka internal topics: 26 topics with total size of 500G (without taking into account the replication factor) 

Software

Performance

Kafka stream writer (From input topic to main topic)

Resource Name Requests Avg Latency p50 Latency p75 Latency p90 Latency p95 Latency Requests/s
inboundOffer 61.0M 21.5 μs 15.1 μs 21.9 μs 44.0 μs 62.9 μs 706 req/s
offer 51.8M 59.1 μs 31.3 μs 76.9 μs 134 μs 183 μs 599 req/s
product 21.9M 305 μs 92.6 μs 392 μs 740 μs 1.07 ms 254 req/s
availableStock 5.17M 50.6 μs 34.9 μs 67.9 μs 103 μs 126 μs 59.9 req/s
xcart_products 2.47M 33.7 μs 17.4 μs 38.9 μs 89.8 μs 122 μs 28.6 req/s
colibri_product_generic 259k 26.9 μs 13.3 μs 32.8 μs 60.9 μs 84.4 μs 3.0 req/s

Kafka writer (From main topic to database)

Resource Name Requests Avg Latency p50 Latency p75 Latency p90 Latency p95 Latency Requests/s
Main topic 17.2M 17.9 μs 12.5 μs 23.7 μs 32.3 μs 36.0 μs 199 req/s

Reader

Resource Name Requests Avg Latency p50 Latency p75 Latency p90 Latency p95 Latency Requests/s
GET products v2 49.7k 211 ms 44.3 ms 114 ms 365 ms 756 ms < 0.1 req/s
GET stats v2 9.12k 135 ms 34.6 ms 114 ms 308 ms 599 ms < 0.1 req/s

Database Considerations:

  • Queries involve only 1 table (no joins).

  • Partitioned table with all necessary indexes for filtering.

Performance Comparison

Metric N Consumers Write to N Tables (Writer-Oriented) Kafka Streams
Write Throughput (requests/s)
InboundOffer 355 req/s 706 req/s
Offer 302 req/s 599 req/s
Product 125 req/s 254 req/s
AvailableStock 29.9 req/s 59.9 req/s
Xcart_Products 14.6 req/s 28.6 req/s
Colibri_Product_Generic 1.5 req/s 3.0 req/s
p50 Write Latency
InboundOffer 79.3 μs 15.1 μs
Offer 157 μs 31.3 μs
Product 380 μs 92.6 μs
AvailableStock 178 μs 34.9 μs
Xcart_Products 62.9 μs 17.4 μs
Colibri_Product_Generic 54.7 μs 13.3 μs
GET Products
p50 Read Latency 77.4 ms 44.3 ms 
p90 Read Latency 925 ms 365 ms
GET Stats
p50 Read Latency 371 ms 34.6 ms
p90 Read Latency 10.4 s 308 ms
Infrastructure
CPU Usage 100% CPU usage (on 2000m CPU, 6 pods) 20% CPU usage (on 4000m CPU, 12 pods)
Memory Usage 50% of 8GiB (6 pods) 100% of 8GiB (12 pods)
Disk Usage 0 GiB 10% of 100GiB (12 pods)
Database Write Complexity Writes to 6 tables, requires joining for reads Writes to 1 table, no joins required for reads

Conclusion

  • Write Latency (p50): Kafka Streams achieves much lower p50 write latency across all topics, indicating faster processing of the majority of requests.

  • Read Latency (p50): Kafka Streams shows a clear advantage in read latency, with a p50 latency of 44.3 ms compared to 77.4 ms in GET Products (2x improvement) and p50 latency of 34.6 ms compared to 371 ms in GET Stats (10x improvement) for the N Consumers approach. This improvement is due to writing to a single table, avoiding complex joins.

  • Resource Efficiency: Kafka Streams uses more memory and disk space. 

Credits

To all the developers involved in this 3 year journey.

Top comments (0)