In Spark 1.6, the developers behind Spark created Datasets by copying and pasting the code from DataFrames (and then adding generics and type safety). But in Spark 2.0, the tables have turned. Last week, Reynold Xin resolved SPARK-13880, "Rename DataFrame.scala as Dataset.scala". So what happens to DataFrames in Spark 2.0? They are reduced to a single line of code:
type DataFrame = Dataset[Row]
So whereas in Spark 1.6 Datasets could be said to derive from DataFrames, in Spark 2.0 the reverse is literally true: a DataFrame is nothing more than a Dataset whose element type is Row.
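To see what the alias means in practice, here is a minimal sketch, assuming Spark 2.x is on the classpath and a local master is acceptable. The `Person` case class, the sample rows, and the `countRows` helper are made up for illustration. It shows that a DataFrame can be passed anywhere a Dataset[Row] is expected with no conversion, and that the same data can be viewed as a typed Dataset.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DataFrameAliasDemo {
  // Accepts a Dataset[Row]; thanks to the alias, any DataFrame qualifies as-is.
  def countRows(ds: Dataset[Row]): Long = ds.count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameAliasDemo")
      .master("local[*]") // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df: DataFrame = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // Compile-time evidence that the two names denote the same type.
    implicitly[DataFrame =:= Dataset[Row]]

    // Plain assignment, no cast or conversion call needed.
    val rows: Dataset[Row] = df
    println(countRows(df))

    // The same data viewed through a typed Dataset.
    val people: Dataset[Person] = df.as[Person]
    people.filter(_.age > 26).show()

    spark.stop()
  }
}
```

The `implicitly` line is there only to make the compiler confirm the equivalence; it does nothing at runtime.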