Monday, March 21, 2016

DataFrame/DataSet swap places in Spark 2.0



In Spark 1.6, the developers behind Spark created DataSets by copying and pasting the code from DataFrames (and then added genericization and type safety). But in Spark 2.0, the tables are turned. Last week, Reynold Xin resolved SPARK-13880 "Rename DataFrame.scala as DataSet.scala. So what happens to DataFrames in Spark 2.0? Reduced to a single line of code:

type DataFrame = Dataset[Row]

So whereas it could be said in Spark 1.6 that DataSets are a derivation of DataFrames, it is specifically the case in Spark 2.0 that DataFrames are a derivation of DataSets.

No comments: