Spark man Zaharia on 2.0 and why it's 'not that important' to upstage MapReduce

Anonymous Curd

Re: so last month's spark API's are obsolete already?

No, Dataset and DataFrame are distinct APIs, and neither has obsoleted the RDD API.

RDD is the low-level, nuts-and-bolts API. It works directly on distributed collections of arbitrary objects, for when the higher-level APIs just don't do what you want.
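To make that concrete, a minimal RDD sketch (the app name and local master are my own choices, assuming Spark 1.6/2.x on a local machine):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    // Local-mode context purely for illustration.
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-demo").setMaster("local[*]"))

    // An RDD of plain JVM objects: any Scala closure works,
    // but Spark cannot see inside it to optimise anything.
    val words   = sc.parallelize(Seq("spark", "rdd", "api"))
    val lengths = words.map(w => (w, w.length))
    lengths.collect().foreach(println)

    sc.stop()
  }
}
```

The point of the example: you operate on objects with ordinary lambdas, and the cost is that Spark treats those lambdas as opaque.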

DataFrames do what it says on the tin. The trade-off is that your data must be tabular, and you lose some interaction patterns (e.g. lambdas over typed objects) and compile-time type safety, but in return you get easy performance gains for common query patterns (the optimiser can rewrite your queries transparently) and easy data handling in many use cases.
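A DataFrame sketch of the same idea (again with hypothetical names and a local master, assuming Spark 2.x where `SparkSession` is the entry point):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DfDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("df-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tabular, untyped rows: columns are referenced by name.
    val df = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")

    // Spark sees the whole expression and can optimise it,
    // but a typo in the column name only fails at runtime.
    df.filter(col("age") > 30).show()

    spark.stop()
  }
}
```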

Datasets are the compromise between the two. You get the type safety and low-level object-oriented handling of RDDs with the expressiveness and optimisation of DataFrames.
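The compromise looks like this in a sketch (same caveats: my own names, local master, Spark 2.x):

```scala
import org.apache.spark.sql.SparkSession

// A typed record for the Dataset.
case class Person(name: String, age: Int)

object DsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ds-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val ds = Seq(Person("alice", 34), Person("bob", 29)).toDS()

    // Typed lambda: misspelling .age is a compile error,
    // yet the encoder still lets Spark optimise storage and execution.
    ds.filter(_.age > 30).show()

    spark.stop()
  }
}
```

Note the filter is an ordinary Scala predicate over `Person`, RDD-style, while the data stays in Spark's optimised columnar representation, DataFrame-style.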

And yes, 2.0 will change those APIs, because Spark uses semantic versioning: breaking changes are exactly what a new major version signals. In return Datasets, which are supremely useful, will stop being experimental.


Biting the hand that feeds IT © 1998–2022