What a load of bollocks
As said above, it is hardly surprising that the CEO of a data-mining company would trumpet the utter indispensability of his company's tools. Fine, it's a good PR piece.
But let's look at the reality of things, hmm?
"Unstructured data must also be thought of in its textual form of Word documents, emails, social media messages and other as yet undefined data shapes." - sorry, social media has a noise-to-data ratio far too high for any sort of data mining to be useful. Yes, you will probably find tweets that say your company is good, and others saying the reverse. You will never be able to map that to a customer having bought something from you unless said customer specifically signs up on your site to tell you that his Twitter account is FancyPants35748 and the name on his credit card is Jonathan Smith. Sure, some twits will tell you, but most will probably be a bit reluctant to acknowledge that their online persona is GorgeousJunk69.
Just a few paragraphs later, the article quotes "The fact is that context will always rank as ace high".
So let's just write off social media now. You'll never get the context in a 140-char tweet.
Next point: “Those who are still relying on human interpretation will be trying to stay afloat on the unstructured data tsunami with one hand tied behind their back,” dixit Andrew Anderson, CEO of Celaton. What is he saying? That humans cannot be trusted to manage data in a timely fashion and that we must hand over our analysis procedures to computers.
Yeah, sure. Because we know how to teach computers to distinguish between "programmer", "Oracle developer" and "business analyst". Yeah, let's hand it over to computers, that'll work a lot better. Just like it works fine in Australia, for processing payments. Tell me, if we can still find major companies capable of botching a comparatively simple job like paying salaries, how can we expect to get relevant information out of a tsunami of unstructured data?
“Having a system in place that can understand a candidate’s CV without the need for human intervention is crucial.” Indeed. Too bad we don't have a reliable system that can do that without human intervention.
"correlating point-of-sale transactions with social feeds can provide great insight into how a consumer felt about the company and the product" - yes, except you don't know that the person posting is actually a consumer of your product, and not some hacker or troll pulling your data leg.
"This estimates that the digital universe of western Europe will grow from 538 exabytes to 5.0 zettabytes between 2012 and 2020" - yup, and 99% of that will be of absolutely no interest to anyone after a week.
"We know that a huge amount of unstructured data is spam" - finally something I can agree with. And you want me to waste my time and money building a system that is going to analyse my spam mail to tell me I'm getting spam? Get lost.
The reality of data analysis is this: sort your data first. The bigger the volume of data, the more stringent your data retention criteria must be. The only data worth analysing is the data relevant to your company; the rest is a waste of resources. This article tries to make me believe that I must become the NSA, gathering as much data as I can in order to hoard it and analyse it endlessly. I say bullshit. Recovering every tweet where my company is named is not going to give me a proper image of my company. Looking at my sales figures will.