Technical Tidbit of the Day: Structured Streaming for Lambda Architecture in Spark But Have To Wait For It

Saturday, May 14, 2016

Image credits: lambda-architecture.net and Michael Armbrust's Spark Summit East presentation.

Some have the misconception that Lambda Architecture just means you have separate paths for batch and realtime. They miss a key part of Lambda Architecture: the ability to query a unified view of both batch and realtime.
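To make the "unified view" concrete, here is a minimal sketch in plain Python, not tied to Spark or any other framework. The event log, the cutoff value, and all function names are hypothetical and exist only for illustration. The batch view covers everything up to the last batch run, the realtime view covers whatever arrived afterward, and a query is answered from both at once.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs; all names are illustrative.
EVENTS = [
    (1, "home"), (2, "docs"), (3, "home"),  # already absorbed by the last batch run
    (4, "home"), (5, "blog"),               # arrived after the batch cutoff
]
BATCH_CUTOFF = 3  # assumption: the last batch run covered timestamps <= 3


def batch_view(events, cutoff):
    """Batch path: recompute counts from the history up to the cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


def realtime_view(events, cutoff):
    """Realtime path: count only the events the batch path has not seen yet."""
    return Counter(page for ts, page in events if ts > cutoff)


def unified_count(page, batch, realtime):
    """Unified view: answer one query from both views together."""
    return batch[page] + realtime[page]


batch = batch_view(EVENTS, BATCH_CUTOFF)
realtime = realtime_view(EVENTS, BATCH_CUTOFF)
print(unified_count("home", batch, realtime))  # 3
```

Having only the two separate paths would correspond to stopping at `batch` and `realtime`; the part people tend to miss is `unified_count`.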

Structured Streaming, also known as Streaming DataFrames, will provide a critical piece: the ability to stream directly into a DataFrame, which can then be queried with SQL.

To provide the unified view, it will probably be possible to join such a Streaming DataFrame containing the realtime data with an ORC-backed DataFrame containing the historical data. However, as of today (May 14, 2016), the only two data sources available to populate a Streaming DataFrame are memory and file. Notably absent are streaming sources such as Apache Kafka, and last week Michael Armbrust indicated that support for non-file data sources might come after Spark 2.0. And then this week Reynold Xin advised:

"stay tuned to this blog for more details on Structured Streaming in Spark 2.0, including details on what is possible in this release and what is on the roadmap for the near future"
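Until the streaming sources arrive, the shape of the unified SQL query can at least be sketched outside Spark. The example below is not Spark code: it uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere, and the table names, column names, and numbers are all made up for illustration. It also combines the two tables with UNION ALL rather than a join, which is only one of several ways the historical and realtime data could be brought together.

```python
import sqlite3

# SQLite stands in for Spark SQL here; the tables and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE historical (page TEXT, hits INTEGER);  -- stand-in for the ORC-backed data
    CREATE TABLE realtime   (page TEXT, hits INTEGER);  -- stand-in for the streamed data
    CREATE VIEW unified AS
        SELECT page, hits FROM historical
        UNION ALL
        SELECT page, hits FROM realtime;
""")
conn.executemany("INSERT INTO historical VALUES (?, ?)", [("home", 120), ("docs", 45)])
conn.executemany("INSERT INTO realtime VALUES (?, ?)", [("home", 3), ("blog", 1)])

# One SQL query sees both the historical and the realtime rows.
rows = conn.execute(
    "SELECT page, SUM(hits) FROM unified GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('blog', 1), ('docs', 45), ('home', 123)]
```

In Spark the two inputs would be DataFrames rather than SQLite tables, but the query a user writes against the unified view would presumably look much the same.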

There are still key additions in Spark 2.0: full SQL support including subqueries, and yet another 10x performance improvement due to "Tungsten 2.0" (on top of the 2x-10x improvement Tungsten brought over Spark 1.4, 1.5, and 1.6). Currently, Druid is still the reigning champ when it comes to Lambda in a Box. But Spark will likely take that crown before the end of this year.
