DEV Community

Streaming Audio: A Confluent podcast about Apache Kafka®

Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic

What are useful practices for migrating a system to Apache Kafka® and Confluent Cloud, and why use Confluent to modernize your architecture?

Dima Kalashnikov (Technical Lead, Picnic Technologies) is part of a small analytics platform team at Picnic, an online-only, European grocery store that processes around 45 million customer events and five million internal events daily. An underlying goal at Picnic is to try and make decisions as data-driven as possible, so Dima's team collects events on all aspects of the company—from new stock arriving at the warehouse, to customer behavior on their websites, to statistics related to delivery trucks. Data is sent to internal systems and to a data warehouse.

Picnic recently migrated from their existing solution to Confluent Cloud for several reasons:

  • Ecosystem and community: Picnic liked the tooling present in the Kafka ecosystem. Since being a small team means they aren't able to devote extra time to building boilerplate-type code such as connectors for their data sources or functionality for extensive monitoring capabilities. Picnic also has analysts that use SQL so appreciated the processing capabilities of ksqlDB. Finally, they found that help isn't hard to locate if one gets stuck.
  • Monitoring: They wanted better monitoring; specifically they found it challenging to measure for SLAs with their former system as they couldn't easily detect the positions of consumers in their streams.
  • Scaling and data retention times: Picnic is growing so they needed to scale horizontally without having to worry about manual reassignment. They also hit a wall with their previous streaming solution with respect to the length of time they could save data, which is a serious issue for a company that makes data-first decisions. 
  • Cloud: Another factor of being a small team is that they don't have resources for extensive maintenance of their tooling.

Dima's team was extremely careful and took their time with the migration. They ran a pilot system simultaneously with the old system, in order to make sure it could achieve their fundamental performance goals: complete stability, zero data loss, and no performance degradation. They also wanted to check it for costs.

The pilot was successful and they actually have a second, IoT pilot in the works that uses Confluent Cloud and Debezium to track the robotics data emanating from their automatic fulfillment center. And it's a lot of data, Dima mentions that the robots in the center generate data sets as large as their customer events streams. 

EPISODE LINKS

Episode source