Headless BI with streaming data

Demand to make data operational and real-time is ever-growing. Both consumers and enterprises expect applications to react to changes in real time, and be proactive with intelligent notifications and alerts.

How can Cube help?

At Cube, our primary sources of data have been cloud data warehouses with batched data—but since our first days, we have wanted to work with streaming data too. Our ambition has been to create a seamless experience for data engineers to build applications on top of a single, headless data access layer, regardless of whether the data is a stream, a batch, or a union of both.

Today, we’re happy to announce a major step in building streaming support into Cube, and to share our plans for what’s next.

We are hosting an online event to reveal these features. Register now to join us on October 13.

The challenge

Historically, it has been hard to work with truly real-time data. Specifically, it’s been challenging to combine historical and real-time data. For example, in use cases such as notifications or recommendations, we can’t rely only on last-minute data; we need to look back to run analysis, and then act in real time, based on both the most recent data and historical trends.

There’s also the concern of needing to master unique languages for specific data sources; in the past, it often was necessary to build one-off integrations for each streaming application to accommodate real-time streams.

The opportunity

Streaming data enables a variety of use cases, including real-time, in-product analytics; automation; personalization; and alerting. Fortunately, recent innovations have made it easier for us to accommodate real-time streams.

In the last few years, streaming SQL technologies such as ksqlDB, Materialize, and Apache Flink have progressed significantly. These technologies enable us to process streaming data and run analysis with SQL, without needing to learn a new language or build a separate integration for each data source.

ksqlDB support

We’re excited to announce that it’s now possible to use Cube to build data modeling, caching, and access control layers on top of streaming SQL, just as you can with data from cloud data warehouses.

Cube can now connect to streaming SQL engines and expose streaming data via our REST, GraphQL, and SQL APIs to your downstream applications. Our REST API already supports WebSockets and provides a seamless experience for developers and data engineers to build real-time data apps. Our data modeling, caching, access control, and querying APIs are the same whether you work with streaming or batch data.
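To make the "same querying API" point concrete, here is a minimal sketch of how a client might build a request for the REST API's load endpoint. The host, the Orders cube, and its member names are hypothetical placeholders, and no request is actually sent; the point is only that the query shape does not change depending on whether the cube is backed by a warehouse or a stream.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def build_load_url(base_url: str, query: dict) -> str:
    """Return a GET URL for the REST load endpoint with the query JSON-encoded."""
    params = urlencode({"query": json.dumps(query)})
    return f"{base_url}/cubejs-api/v1/load?{params}"


# Hypothetical cube and member names, used for illustration only.
query = {
    "measures": ["Orders.count"],
    "timeDimensions": [
        {
            "dimension": "Orders.createdAt",
            "granularity": "hour",
            "dateRange": "today",
        }
    ],
}

# Assumed local deployment URL; replace with your own.
url = build_load_url("http://localhost:4000", query)
print(url)
```

In a real application the request would also carry an Authorization header with your API token, and the response would be consumed exactly as it is for batch-backed cubes.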

Want to see it in action? Check out our live ksqlDB demo application featuring a real-time dashboard built with Kafka, ksqlDB, and Cube.

That alone opens many opportunities for building on top of streaming data, but we’re also committed to addressing the complex problem of merging batch and streaming data, so that we can provide a single interface for unioned data.

Lambda architecture support

We’re also introducing lambda pre-aggregations. These follow the lambda architecture design to create a union of streaming and batch data. Cube pre-aggregates batch data from the cloud data warehouse into one pre-aggregation, data from streaming SQL engines into another, then unites these pre-aggregations during query processing—so we can return merged data.
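As a rough illustration of the idea (not Cube's actual implementation), the union step can be sketched as merging two rollups keyed by time bucket. The function name, the sample numbers, and the cutoff rule (batch rows are trusted up to the point the batch rollup was last built, streaming rows fill in everything after) are assumptions made for this example.

```python
from datetime import datetime


def merge_lambda(batch_rows, stream_rows, batch_built_until):
    """Union a batch rollup and a streaming rollup, both keyed by time bucket.

    Buckets before `batch_built_until` come from the batch rollup;
    buckets at or after it come from the streaming rollup.
    """
    merged = {ts: v for ts, v in batch_rows.items() if ts < batch_built_until}
    for ts, v in stream_rows.items():
        if ts >= batch_built_until:
            merged[ts] = v
    return dict(sorted(merged.items()))


# Hypothetical hourly order counts.
batch = {
    datetime(2022, 10, 13, 10): 120,
    datetime(2022, 10, 13, 11): 95,
}
stream = {
    datetime(2022, 10, 13, 11): 97,  # overlaps with batch; batch value is kept
    datetime(2022, 10, 13, 12): 40,  # only known to the stream so far
}

result = merge_lambda(batch, stream, batch_built_until=datetime(2022, 10, 13, 12))
for bucket, count in result.items():
    print(bucket.isoformat(), count)
```

The consumer sees one continuous series; which side supplied each bucket is decided in one place rather than in every application.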

For developers or data consumers, the interface remains the same: REST, GraphQL, or SQL queries. Our lambda architecture support makes it possible to consume data as a single dataset regardless of whether it is streaming, batch, or combined data.

Cube hides the complexity of this architecture within our data modeling layer, and therefore significantly simplifies the data consumption process. In addition, our access control and security layers are applied using the same rules as you’ve configured them for batched data.

Coming next

Naturally, our work isn’t done yet. The features above will be generally available soon, and they are available today to Cube users who request access. Soon, you’ll also have access to:

Materialize streaming pre-aggregations
Flink SQL support
Spark Streaming support

If you’re interested in early access to these features, to help influence their development and craft the future of streaming headless BI, simply get in touch.

Everything you need to build powerful real-time applications

We’re excited to launch our Kafka and ksqlDB integration and lambda pre-aggregations today because these bring the power and simplicity of our headless BI architecture to new types of data applications.

These innovations are available to everyone, for free: request ksqlDB preview access, connect a real-time data source, then (please!) give us your feedback.

We look forward to powering the next generation of data applications—we can’t wait to see what you build.