Re: @AC 'Facebook: Information in Hive not readily accessible'
Yes, the information in Hive is readily accessible. However, the queries will suck up quite a bit of resources doing full table scans.
You seem to think this lack of resources should be the user's problem. It's not. If Facebook cannot comply with the legal requirements of GDPR then it's very much their problem. At the very least they'll need to start working towards an architecture that does allow them to comply (because, let's face it, they're not going to stop collecting that data in the first place).
Whoever peddled this story is hoping that there aren't people reading it who actually know Hadoop or FB's internals.
If you read the article, you'll see it addresses the GDPR-related aspects of the difficulty in gaining access to the data in several places, including this:
Moreover, he pointed out that if the request is excessive, it is only because the amount of data collected and sent to Facebook is too large for one of the biggest companies in the world to retrieve.
"Which seems to be a breach of [GDPR's requirement for] data minimisation rather than my fault as a data subject requesting this data," he observed.
If Facebook are collecting so much data that it's almost impossible for them to fulfil an access request for it, then that raises questions about whether they're actually collecting only the bare minimum required to provide their service.
They've also rendered themselves unable to fulfil a legal requirement, so of course there will be an investigation. Rightly or wrongly, the internals of Hadoop are largely irrelevant to the law: if your architecture means you can't comply, the view will likely be that you should use a technology that _does_ allow you to comply.