So last month's Spark APIs are obsolete already?
Spark 1.6 came out in January. Its "dataset" API obsoleted the experimental "dataframe" API from last spring, which obsoleted the "RDD" API from before that.
Now Matei is saying that Spark 2.0 will change those APIs. Again?
FFS: give us some stability alongside the features. Say what you like about MapReduce, but old code still works, as does its (somewhat clunky) HDFS API. As for SQL, queries sitting around from 1998 still work (if you pay Oracle enough). But Spark just won't stabilise its APIs at all, chucking out new versions so fast that you are two versions behind before your project is ever ready. Which is turning out to be one of the things holding us back from production: the version we are trying to use is obsolete and unsupported by the time our code is ready. We're adopting a policy of "play with it in the notebook, but not for production". Which is a shame, as the underlying system works well.
Maybe Spark 2.0 will stop breaking code that's three months old. We'll have to wait until the summer, for Spark 2.1, to see if they're trying to do that.