
Conquering All Your Stream Processing Needs with Kafka and Spark

On-demand recording

Kafka Summit 2016 | Systems Track

Apache Spark, and specifically Spark Streaming, is becoming one of the most widely used stream processing systems for Kafka. At its heart, Spark is an extremely fast, general-purpose distributed data processing platform. This allows all kinds of data processing to be unified in a single framework: streaming, SQL, and machine learning. For Kafka users, this means they can use Spark to run batch jobs, streaming pipelines, and interactive queries on Kafka data. In this talk, I am going to give a brief overview of the Spark framework and explain how its different components can be used to process data from Kafka. Specifically, I am going to cover the following:

  • Real-time processing of Kafka streams with Spark Streaming
  • Batch and interactive querying of Kafka data with Spark and Spark SQL
  • Schema-aware streaming ETL from Kafka with Streaming DataFrames
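To make the first two points concrete, here is a minimal pure-Python sketch of the micro-batch idea. It is not Spark code and is not taken from the talk. In this model, each streaming batch covers a per-partition range of Kafka offsets, running from where the previous batch stopped to the current end of the log. The same processing function can also be run over every stored offset, which gives a batch or interactive query. The in-memory `Topic`, `OffsetRange`, `plan_ranges`, `read_ranges`, `run_micro_batch`, `run_batch_query`, and `word_count` are hypothetical names chosen for illustration. A real deployment would rely on Spark's own Kafka integration rather than this toy model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

# Hypothetical in-memory stand-in for a Kafka topic: partition id -> append-only log of records.
Topic = Dict[int, List[str]]
R = TypeVar("R")


@dataclass(frozen=True)
class OffsetRange:
    """Half-open slice [from_offset, until_offset) of one partition's log."""
    partition: int
    from_offset: int
    until_offset: int


def plan_ranges(topic: Topic, consumed: Dict[int, int]) -> List[OffsetRange]:
    """For every partition, cover everything from the last consumed offset to the current end of the log."""
    return [OffsetRange(p, consumed.get(p, 0), len(log)) for p, log in sorted(topic.items())]


def read_ranges(topic: Topic, ranges: List[OffsetRange]) -> List[str]:
    """Materialize the records selected by the offset ranges, partition by partition."""
    records: List[str] = []
    for r in ranges:
        records.extend(topic[r.partition][r.from_offset:r.until_offset])
    return records


def word_count(records: List[str]) -> Dict[str, int]:
    """Example processing logic, shared by the streaming and batch paths."""
    counts: Dict[str, int] = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def run_micro_batch(topic: Topic, consumed: Dict[int, int],
                    process: Callable[[List[str]], R]) -> R:
    """Streaming path: process only the records that arrived since the previous micro-batch."""
    ranges = plan_ranges(topic, consumed)
    result = process(read_ranges(topic, ranges))
    # Advance offsets only after processing returns, so if `process` raises,
    # the next call re-reads exactly the same ranges.
    for r in ranges:
        consumed[r.partition] = r.until_offset
    return result


def run_batch_query(topic: Topic, process: Callable[[List[str]], R]) -> R:
    """Batch/interactive path: the same logic over every offset currently in the topic."""
    return process(read_ranges(topic, plan_ranges(topic, {})))


if __name__ == "__main__":
    topic: Topic = {0: ["a b", "b c"], 1: ["a"]}
    consumed: Dict[int, int] = {}
    print(run_micro_batch(topic, consumed, word_count))  # {'a': 2, 'b': 2, 'c': 1}
    topic[0].append("c d")                                # new data arrives on partition 0
    print(run_micro_batch(topic, consumed, word_count))  # {'c': 1, 'd': 1}
    print(run_batch_query(topic, word_count))            # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```

Keeping the processing function separate from the choice of offsets is what lets one piece of logic serve both the streaming path and the batch path in this sketch. It does not model Spark's actual scheduling, checkpointing, or fault-tolerance details.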


Tathagata Das, Software Engineer, Databricks