When working with Kafka, you produce data and consume it. This post will explain how to configure a Kafka consumer. Unlike other message bus systems, Kafka allows for different consumer "modes", depending on your use case. For instance:
- You want an event to be consumed by a single consumer
- You want an event to be consumed by more than one consumer (broadcast)
Exploring Kafka Consumer Modes
Quick recap: Kafka organizes data in topics and partitions. A topic is broken down into one or more partitions. When setting up a consumer, you might do it like this:
```rust
let mut c = {
    let mut cb = Consumer::from_hosts(cfg.brokers)
        .with_group(cfg.group)
        .with_fallback_offset(cfg.fallback_offset)
        .with_fetch_max_wait_time(Duration::from_secs(1))
        .with_fetch_min_bytes(1_000)
        .with_fetch_max_bytes_per_partition(100_000)
        .with_retry_max_bytes_limit(1_000_000)
        .with_offset_storage(cfg.offset_storage)
        .with_client_id("kafka-rust-console-consumer".into());
    for topic in cfg.topics {
        cb = cb.with_topic(topic);
    }
    cb.create().unwrap()
};
```
(source: https://github.com/kafka-rust/kafka-rust/blob/master/examples/console-consumer.rs#L27-L40)
Among other settings, the configuration above contains:
- a list of brokers
- a group id
- one or more topics
We want to spend some time talking about the group id. Its value influences the consumer's behavior (depending on the number of partitions for the configured topic).
All Consumers are in the Same Consumer Group
If the configured topic has a single partition and there is only one consumer, that consumer works off all the data. Once the topic is split into more partitions, the same consumer reads from all of them. Our code stays the same; Kafka handles the assignment behind the scenes.
Things get more interesting once another consumer joins. If the topic has two partitions, each consumer is responsible for one partition, as long as both are in the same consumer group. Each consumer has its own "work stream", meaning it does not interfere with the other consumer's work. Each message gets processed by only one consumer.
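As a rough sketch of what the group coordinator does, here is a plain-Rust simulation of spreading partitions over the members of one group (this is not the kafka-rust API; the consumer names are made up):

```rust
// Kafka's group coordinator assigns each partition to exactly one
// consumer in the group. We mimic that with a simple round-robin
// assignment to build intuition.
fn assign_partitions(partitions: usize, consumers: &[&str]) -> Vec<(usize, String)> {
    (0..partitions)
        .map(|p| (p, consumers[p % consumers.len()].to_string()))
        .collect()
}

fn main() {
    // Topic with two partitions, two consumers in the same group:
    // each consumer ends up owning exactly one partition.
    for (partition, consumer) in assign_partitions(2, &["consumer-a", "consumer-b"]) {
        println!("partition {partition} -> {consumer}");
    }
}
```

Real assignment strategies (range, round-robin, sticky) are more involved, but the invariant is the same: within one group, a partition has exactly one owner at a time.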
Consumers in Different Consumer Groups
Sometimes you need to connect multiple consumers to the same topic and partition. For instance, a topic `payment_processed` contains a stream of successfully processed payments in an online store. When a payment is processed successfully, two things need to happen:
- Kick off the shipping process
- Inform the user of successful payment and the next steps
We need more than one consumer to consume each message in such cases. To make sure both consumers can work off of the same partition, we assign them different group IDs.
Both of these groups can have more than one consumer. In that case, only one member of each group processes a given message.
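Conceptually, each group keeps its own read position, so every group sees the full stream. A minimal plain-Rust sketch of that idea (not the kafka-rust API; the struct and group names are illustrative):

```rust
use std::collections::HashMap;

// Each consumer *group* tracks its own offset into the topic.
// Advancing one group's offset does not affect any other group,
// which is what makes the cross-group broadcast work.
struct Topic {
    messages: Vec<String>,
}

impl Topic {
    // Poll the next message for a group, starting each new group at offset 0.
    fn poll(&self, offsets: &mut HashMap<String, usize>, group: &str) -> Option<&String> {
        let offset = offsets.entry(group.to_string()).or_insert(0);
        let msg = self.messages.get(*offset)?;
        *offset += 1;
        Some(msg)
    }
}

fn main() {
    let topic = Topic {
        messages: vec!["payment-1".into(), "payment-2".into()],
    };
    let mut offsets = HashMap::new();

    // Both groups independently receive payment-1:
    let shipping = topic.poll(&mut offsets, "shipping").unwrap();
    let notifications = topic.poll(&mut offsets, "notifications").unwrap();
    println!("shipping saw {shipping}, notifications saw {notifications}");
}
```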
How does a Consumer track its Progress?
Systems like RabbitMQ delete a payload from the queue once a subscriber has acknowledged it. Kafka, on the other hand, does not delete data from a topic when it is consumed (data is only removed by retention policies). Topic data persists so that several consumers can work with the same data.
How does a consumer keep track of its work? Every consumer maintains an offset, the position of the last message it has read. After processing a message, the consumer commits the offset back to Kafka, which stores offsets in a separate topic called `__consumer_offsets`. While the broker stores this information, it does not act on it; keeping track of progress remains the consumer's responsibility.
The advantage of persisting offsets: If the consumer crashes, it can start working off messages where it left off. Also, if it turns out the consumer's behavior was faulty, you can "rewind" and start processing events again from the beginning. With this approach, Kafka can quickly scale up consumers per topic.
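Since offsets live outside the message log, "rewinding" amounts to resetting a group's stored offset; the log itself is untouched. A plain-Rust sketch of that idea (the group name is illustrative):

```rust
use std::collections::HashMap;

// Committed offsets are just numbers stored per consumer group;
// the message log itself is never modified by consuming.
fn commit(offsets: &mut HashMap<String, usize>, group: &str, offset: usize) {
    offsets.insert(group.to_string(), offset);
}

// "Rewind" a group to the beginning of the log to reprocess everything.
fn rewind(offsets: &mut HashMap<String, usize>, group: &str) {
    offsets.insert(group.to_string(), 0);
}

fn main() {
    let log = ["evt-0", "evt-1", "evt-2"];
    let mut offsets = HashMap::new();

    // Consume the whole log, committing after each message.
    for (i, _event) in log.iter().enumerate() {
        commit(&mut offsets, "billing", i + 1);
    }
    assert_eq!(offsets["billing"], 3);

    // The handler turns out to be buggy: rewind and reprocess from offset 0.
    rewind(&mut offsets, "billing");
    println!("billing resumes from offset {}", offsets["billing"]);
}
```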
Summary
Kafka Partitions and Consumers work hand in hand. Depending on the problem we're trying to solve, we must divide consumers into (separate) groups.
The group determines which consumers get to consume a message:
- All consumers are in the same group: each message is processed by only one consumer
- Consumers belong to different groups: each message is processed by one consumer per group