Wednesday, January 4, 2017

Spark Structured Streaming Supports Kafka Since November 2016

As I noted in my May 14, 2016 blog post, Spark Structured Streaming was announced with much fanfare (along with Spark 2.0) at Spark Summit 2016. It brings the ability to stream a data source into a DataFrame and query it with SQL in real time. Notably absent at the time, however, was support for Kafka.

Diagram from databricks.com

Yes, Spark 2.1, released last week, now supports Kafka in Spark Structured Streaming. But so does Spark 2.0.2, quietly released on November 14, 2016.
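
For anyone who wants to try it, here is a minimal PySpark sketch of what reading a Kafka topic as a streaming DataFrame can look like. It is an illustration under assumptions, not a tested recipe: the broker address localhost:9092, the topic name events, and the helper names kafka_source_options, read_kafka_stream, and run_console_demo are all made up for this example. It also assumes the Kafka source package is on the classpath, e.g. by submitting with --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.2 (adjust the Scala and Spark versions to match your build).

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the option map for the Kafka source (hypothetical helper)."""
    return {
        # Comma-separated host:port list of Kafka brokers.
        "kafka.bootstrap.servers": bootstrap_servers,
        # Name of the topic (or comma-separated topics) to subscribe to.
        "subscribe": topic,
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with the Kafka key and value as strings."""
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # The Kafka source delivers key and value as binary, so cast them
    # to strings before querying them.
    return raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )


def run_console_demo():
    """Print incoming Kafka records to the console (assumed local setup)."""
    # Imported here so the helpers above can be loaded without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("KafkaStructuredStreamingSketch")
        .getOrCreate()
    )
    records = read_kafka_stream(spark, "localhost:9092", "events")
    query = (
        records.writeStream
        .format("console")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()
```

Calling run_console_demo() from a script launched with spark-submit should start the query and keep printing new records until it is stopped. The DataFrame returned by read_kafka_stream can also be registered as a temporary view and queried with SQL like any other DataFrame.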

So we no longer "have to wait for it," as I blogged last May.