Confluent
Streams and Tables: Two Sides of the Same Coin
Stream Processing

Streams and Tables: Two Sides of the Same Coin

Matthias J. SaxGuozhang Wang

We are happy to announce that our paper Streams and Tables: Two Sides of the Same Coin is published and available for free download. The paper was presented at the Twelfth International Workshop on Real-Time Business Intelligence and Analytics (BIRTE) held in conjunction with the 44th International Conference on Very Large Data Bases (VLDB) in Rio de Janeiro, Brazil, in August of this year.

The BIRTE workshop attracted many participants and hosted a keynote, research, industry and demo session as well as a panel discussion about data stream processing.

Paper summary

The paper is a joint work between Confluent and Humboldt-Universität zu Berlin that describes the Dual Streaming Model, which is the foundation of Kafka Streams’ and KSQL’s stream processing semantics:

In this paper, we introduce the Dual Streaming Model to reason about physical and logical order in data stream processing. This model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams. As such, it provides a natural way to cope with inconsistencies between the physical and logical order of streaming data in a continuous manner, without explicit buffering and reordering. We further discuss the trade-offs and challenges faced when implementing this model in terms of correctness, latency, and processing cost. A case study based on Apache Kafka illustrates the effectiveness of our model in the light of real-world requirements.
Original Source

The Dual Streaming Model builds on the so-called stream-table duality, which allows you to unify data streams and relational tables into a holistic data processing model. Thus, data streams and continuously updating tables are the two core abstractions in the model. Additionally, the Dual Streaming Model decouples the handling of data that arrives later (i.e., out-of-order) from latency concerns and opens up a design space between processing cost, accepted latency and result completeness for the user that no other model offers.

Figure 1. Design Space

Figure 1. Design space

The wide adoption and growth of Kafka Streams and KSQL among enterprises shows that the Dual Streaming Model solves real-world problems across all types of industries. As a result, we are elated to share our paper for free so you can become the stream processing expert in your company and take the business to the next level.

Happy reading! 🙂

Next steps

Subscribe to the Confluent Blog

Subscribe

More Articles Like This

Available Now: Stream Processing Cookbook Featuring KSQL Recipes
Joanna Schloss

KSQL Recipes Available Now in the Stream Processing Cookbook

Joanna Schloss . .

For those of you who are hungry for more stream processing, we are pleased to share the recent release of Confluent’s Stream Processing Cookbook, which features short and tasteful KSQL ...

Troubleshooting KSQL – Part 2: What's Happening Under the Covers?
Robin Moffatt

Troubleshooting KSQL – Part 2: What’s Happening Under the Covers?

Robin Moffatt . .

Previously in part 1, we saw how to troubleshoot one of the most common issues that people have with KSQL—queries that don’t return data even though data is expected. Here, ...

Troubleshooting KSQL – Part 1: Why Isn't My Query Returning Data?
Robin Moffatt

Troubleshooting KSQL – Part 1: Why Isn’t My KSQL Query Returning Data?

Robin Moffatt . .

KSQL is the powerful SQL streaming engine for Apache Kafka®. Using SQL statements, you can build powerful stream processing applications. In this article, we’ll see how to troubleshoot some of ...

Leave a Reply

Your email address will not be published. Required fields are marked *

Try Confluent Platform

Download Now

We use cookies to understand how you use our site and to improve your experience. Click here to learn more or change your cookie settings. By continuing to browse, you agree to our use of cookies.