
'Big Data': Not New, Just Cheaper to Do
'Big data' (formerly, and sometimes still, known as database marketing on very large databases, or [enterprise] data warehousing combined with statistical analysis; no, wait, that's data mining; I mean analytics)... sigh! Every time a new buzzword comes along, it's usually because some vendor wants to put a new spin on its frankly old products and ideas to distinguish them from the crowd.
Not so long ago, because of the cost of storing and managing large volumes of data, we used to be very selective about which data we tried to collect. The persistently poor quality of much of the data collected in organisations makes that selectivity more important than ever. Models built on such an unstable mountain of data will be very unreliable indeed.
But now that data storage costs have fallen (predictably, following the same kind of exponential trend as Moore's Law), companies want to recklessly capture as much data on their customers as they can. I also believe this has something to do with what I call the Microsoft Market Effect: if Microsoft seriously enters a product market, that product becomes a commodity (software licence costs fall and everyone wants one on their desktop).
As for keeping all data forever, this ignores the very real rules of data protection legislation, which require that personal data be kept no longer than is necessary for the purpose for which it was collected (even in the US, Safe Harbour agreements often mean that large companies need to be cognizant of EU and other international laws). Also, as the business evolves, very old data stops being representative of the current business. Retention requirements and archiving policy need to be evaluated carefully on a (business) case-by-case basis, not simply because the storage and database vendors tell you that you can store as much as you want, or at least more than your competitors ('mine is bigger than yours' springs to mind).
The real risk in all this is that as the fad grows, it exacerbates the shortage of truly experienced technologists who have the gravitas to tell their decision makers, "No! This is how we do it the right way, so that you spend less and get meaningful results sooner." If decision makers still insist on making automated decisions based on faulty data and even worse models, they risk turning loyal customers into sworn enemies, and the downward spiral begins. And oh, how I have already seen this beginning. Caveat emptor.