Reply to post: Accuracy != Robustness

Fujitsu sidesteps data scientists with a move toward tuned machine learning

Anonymous Coward

Accuracy != Robustness

Having worked for 15 years or so in analytics (which has somehow now become data science), it's been great to see the recent expansion of available tools, particularly open source projects such as Spark. While I welcome all of them, this particular initiative seems to be missing a number of very important points that generally constitute the time-consuming part of an analyst's work. First, a single accuracy measure is often not terribly useful - especially for rare events. [Simple example: if something only happens 1% of the time, then a model that simply says it will never happen will have 99% accuracy. Clearly that's not a very useful model.] Generally it's more useful to look at precision vs. recall, which tend to trade off against each other and are both influenced by the cut-off rules applied in the model, so you are looking for the optimum trade-off between the two. There are measures that combine the two into a single value that can be compared across different models (the F-score family, for example), but the choice of measure is normally subjective and based on business needs.
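To make the rare-event point concrete, here is a rough sketch in plain Python. The data set, the scores and the three cut-offs are all made up for illustration (1,000 cases with a 1% event rate); nothing here comes from the article or from any Fujitsu tooling. It compares the "never happens" model with a hypothetical scoring model at different cut-offs.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return (precision, recall, accuracy); a 0/0 ratio is reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical data: 10 events in 1,000 cases (a 1% event rate).
y_true = [1] * 10 + [0] * 990

# The "it never happens" model: 99% accurate, but it finds nothing.
never = [0] * len(y_true)
prec_never, rec_never, acc_never = evaluate(y_true, never)

# Hypothetical model scores: the 10 events first, then the 990 non-events.
scores = ([0.9, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.3, 0.2, 0.1]
          + [0.65] * 20 + [0.35] * 30 + [0.05] * 940)

# Moving the cut-off trades precision against recall.
by_cutoff = {}
for cutoff in (0.75, 0.5, 0.25):
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    by_cutoff[cutoff] = evaluate(y_true, y_pred)

print(f"never model: accuracy={acc_never:.3f} recall={rec_never:.3f}")
for cutoff, (p, r, a) in by_cutoff.items():
    print(f"cut-off {cutoff}: precision={p:.3f} recall={r:.3f} "
          f"accuracy={a:.3f} F1={f_beta(p, r):.3f} F2={f_beta(p, r, beta=2):.3f}")
```

Note that every cut-off here gives a lower accuracy than, or the same ballpark as, the useless model, and that F1 and F2 can rank the cut-offs differently. Which one you pick is exactly the business judgement call.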

Secondly (and especially important for large-scale machine learning problems with very large numbers of features), you also need to understand the amount of over-fitting in your model - i.e. how likely is it that the model performance seen in the train and test data sets will be seen in new data. This can be handled through cross-validation techniques (which I don't see mentioned in the article), but again it often comes down to a subjective evaluation of the trade-off between "accuracy" and "robustness". It is these subjective evaluations that generally take the most time, rather than waiting for computers to crunch through data.
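For anyone who hasn't met cross-validation, a minimal sketch of k-fold in plain Python follows. Everything in it is an assumption for illustration: the synthetic one-feature data with 20% label noise, the choice of a 1-nearest-neighbour classifier (picked because it memorises its training data and so over-fits visibly), and the helper names. A real project would more likely use a library routine.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn_predict(train, x):
    """Predict the label of the training row whose feature is closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, rows):
    """Share of rows whose label the 1-NN model (fitted on train) gets right."""
    hits = sum(1 for x, y in rows if one_nn_predict(train, x) == y)
    return hits / len(rows)


def cross_validate(data, k=5, seed=0):
    """Return one (train_accuracy, held_out_accuracy) pair per fold."""
    results = []
    for fold in k_fold_indices(len(data), k, seed):
        held = set(fold)
        train = [row for i, row in enumerate(data) if i not in held]
        held_out = [data[i] for i in fold]
        results.append((accuracy(train, train), accuracy(train, held_out)))
    return results


# Synthetic data: the label is 1 when x > 0, then flipped 20% of the time.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.gauss(0, 1)
    y = int(x > 0)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

results = cross_validate(data, k=5)
for fold_no, (train_acc, held_out_acc) in enumerate(results, start=1):
    print(f"fold {fold_no}: train={train_acc:.3f} held-out={held_out_acc:.3f}")

mean_train = sum(t for t, _ in results) / len(results)
mean_held_out = sum(h for _, h in results) / len(results)
print(f"mean train={mean_train:.3f} mean held-out={mean_held_out:.3f}")
```

The gap between the training figure and the held-out figure is the "robustness" part: a model that looks perfect on the data it was fitted to tells you very little about how it will do on new data, and how big a gap you can live with is still a judgement call.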

In short - I would take these claims with a massive pinch of salt in terms of their impact on real-life analytics (oh, okay ... data science) applications. If you have analysts spending a week to create a single model (even if they are testing different algorithms), then you have a problem with your analysts, not your machine learning tools. And, of course, creating a good predictive model doesn't actually do anything for your business on its own - you generally need to fit it into a robust decision management framework, which is typically much more complicated.

/rant

Apologies for going on about this, but as happy as I am to read more about machine learning in El Reg, I'd love for you to apply your usual level of skepticism. Happy to provide awkward questions for you to ask if you'd like!

Biting the hand that feeds IT © 1998–2021