If Theodore Sturgeon was right...
...when he said 90% of sci-fi is crap because 90% of everything is crap, then 90% of the petabytes stored in the "cloud" are crap too.
Hype alert! Hype alert! Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought-leader bullshit. What the heck is this big data thing? EMC says it's to do with handling data at the petabyte scale, where things like …
Not sure on the value of de-dupe in this space. I could understand wanting to de-duplicate data in the ETL layer, although a lot of products there are DB-based rather than file-based. However, since the aim is a 'single source of truth' via third normal form, where's the value of de-dupe in the DW? Assuming the reporting tier is ROLAP (as part of that single source of truth everyone's striving for), there's very little data there apart from cube dimensions.
I suppose you might want limited MOLAP for performance reasons, then de-dupe that, but that ought to happen at the DB level, surely?
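To pin down what I mean by de-dupe in the ETL layer, here's a minimal Python sketch (the column names and sample rows are invented for illustration): collapsing duplicate source rows onto a business key before the load. Real ETL products do this DB-side, of course.

```python
# Minimal sketch: row-level de-duplication in an ETL staging step.
# All names (customer_id, updated_at) and sample rows are made up.

def dedupe_rows(rows, key="customer_id", ts="updated_at"):
    """Keep only the most recent row per business key."""
    latest = {}
    for row in rows:
        k = row[key]
        # ISO-8601 timestamps compare correctly as plain strings.
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return list(latest.values())

staged = [
    {"customer_id": "c1", "updated_at": "2011-03-01", "city": "London"},
    {"customer_id": "c1", "updated_at": "2011-04-01", "city": "Leeds"},
    {"customer_id": "c2", "updated_at": "2011-02-15", "city": "York"},
]
print(dedupe_rows(staged))  # one row per customer, latest version wins
```

Once the rows are unique on the business key like that, the 3NF warehouse itself has nothing left to de-dupe, which is my point.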
Great article as usual, Chris, and thanks for the mention.
As you pointed out, RainStor de-dupes structured data without sacrificing the original form. We preserve the immutable structure of the data while magically de-duplicating the values, so that the footprint physically shrinks 40:1 or more.
We are all about taking petabytes of data and reducing them to terabytes, thereby allowing limitless amounts of data to be stored at the lowest possible cost.
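To make the value de-dupe idea concrete, here is a rough Python sketch of the general principle (field-level dictionary encoding, with made-up sample rows; an illustration only, not our actual engine): each distinct value is stored exactly once and rows become lists of integer references, so repeated values across millions of rows add almost nothing to the footprint, while the row structure remains fully recoverable. At 40:1, a petabyte shrinks to roughly 25 terabytes.

```python
# Rough sketch of value-level de-duplication via dictionary encoding.
# Not RainStor's actual engine -- just the general idea: store each
# distinct field value once, represent rows as integer references.

def encode(rows):
    dictionary = {}          # value -> integer id
    values = []              # id -> value (each stored exactly once)
    encoded = []
    for row in rows:
        ids = []
        for v in row:
            if v not in dictionary:
                dictionary[v] = len(values)
                values.append(v)
            ids.append(dictionary[v])
        encoded.append(ids)
    return values, encoded

def decode(values, encoded):
    return [[values[i] for i in ids] for ids in encoded]

rows = [
    ["alice", "london", "active"],
    ["bob",   "london", "active"],
    ["carol", "london", "active"],
]
values, encoded = encode(rows)
assert decode(values, encoded) == rows  # structure preserved exactly
print(len(values), "distinct values for", sum(len(r) for r in rows), "fields")
```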
Many thanks again for the article and the reference to RainStor.
Ramon Chen
VP Product Management
www.rainstor.com