Turns out humans are leading AI systems astray because we can't agree on labeling

Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows. Data is a vital resource in teaching machines how to complete specific tasks, whether that's identifying different species of plants or automatically generating captions. Most neural …

  1. Michael H.F. Wilkinson

    I don't like saying I told you so, but ...

    I frequently rail against claims made in AI papers, pointing out that if the ground truth contains a percentage of errors, any AI system trained on it is likely to end up with a similar actual error rate. I have seen people claim an increase in performance from 97.6% to 98.1% (error bars not included) on data sets where there are two ground truths, drawn up by two medics, which are at odds with each other. In our own earlier work, we managed to get a sort of Pareto optimum of 92.5 ± 0.6% on both ground truths, but were in places penalised for finding blood vessels the doctors had missed. It turns out that, somehow, ground truth 1 has been elevated to The Ground Truth, and the other demoted to "a human observer". And now AIs are better than the poor "human observer" simply because they have been taught to copy all the mistakes the other human has made.

    If ImageNet contains up to 6% labelling error, I will continue to take all claims of 99%-or-better performance with a considerable pinch of salt. Furthermore, if error bars are not included, how can they claim to be better than an earlier method when the differences are sub-1%?
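
    To put numbers on that, here is a minimal, purely illustrative simulation, assuming binary labels and independent 6% annotation noise (nothing below comes from the study or the comment; it just makes the arithmetic visible):

      import numpy as np

      rng = np.random.default_rng(0)
      n = 10_000                              # test-set size

      truth = rng.integers(0, 2, n)           # the real (unknowable) labels
      noise = rng.random(n) < 0.06            # 6% annotation errors
      gt = np.where(noise, 1 - truth, truth)  # the "ground truth" we score against

      # Model A is right about reality 98% of the time, ignoring annotator quirks
      a_wrong = rng.random(n) < 0.02
      model_a = np.where(a_wrong, 1 - truth, truth)

      # Model B has learned to reproduce the annotator, mistakes included
      b_wrong = rng.random(n) < 0.02
      model_b = np.where(b_wrong, 1 - gt, gt)

      for name, pred in [("A (matches reality)", model_a),
                         ("B (copies annotator)", model_b)]:
          acc = (pred == gt).mean()            # accuracy against the noisy labels
          se = (acc * (1 - acc) / n) ** 0.5    # binomial standard error
          print(f"model {name}: {acc:.1%} ± {1.96 * se:.1%} vs noisy GT, "
                f"{(pred == truth).mean():.1%} vs reality")

    Scored against the noisy labels, the mistake-copier comes out about six points "better" despite being the worse model against reality, and the 95% error bar on a 10,000-image test set is roughly ±0.5%, which is exactly why sub-1% differences reported without error bars mean so little.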

    I am not saying deep learning and CNNs are useless; it is just that sloppy science does them a disservice.

  2. theblackhand

    Simples...

    We need another standard...

    And an AI to create the standard, of course.

  3. Sandstone
    Devil

    Evil Human

    When some friends of ours had their first child and he was learning to talk, I would point to random objects and say, "This is an aardvark."

    1. Eclectic Man Silver badge

      Re: Evil Human

      In the Dorling Kindersley 'ABC' book for toddlers, each letter of the (Roman) alphabet has several pictures. A is for Apple, Ant and Avocado, but I didn't spot an aardvark.

      However, I applaud your teaching the little one that adults are inveterate liars and can be naughty too. That lesson should serve him well in the coming post-apocalypse dystopia, however it is caused.

  4. Anonymous Coward
    Anonymous Coward

    compounding errors

    It always irritates me that when the "I am not a robot" captcha asks you to highlight all motorcycles, the user also has to highlight scooters or else the result will not be accepted.

    1. DeathSquid
      Stop

      One weird old trick to deal with captchas...

      Look near the top of your browser window. There's a back button. Press it, and the captcha disappears.

    2. Henry Wertz 1 Gold badge

      Re: compounding errors

      I do what it asks -- I will not mark scooters as motorcycles since they aren't; I will not mark SUVs as cars since an SUV is not a car. Go ahead and claim I'm wrong, I know I'm right.

      The other one it seems to have problems with is marking the traffic lights: I'll mark off the traffic lights and it'll claim I'm wrong. (I assume some people are probably marking off the whole pole and everything, not just the lights? I don't even know.)

      For some reason, some people don't seem to know what a crosswalk is either, given that I can mark the crosswalks exactly and still have it fail.

      1. Ken Moorhouse Silver badge

        Re: For some reason, some people don't seem to know what a crosswalk is

        I think it has already been mentioned, but there is no such thing in the UK, so YMMV with how us Brits will answer.

        "The other one that it seems to have problems with are marking the traffic lights, "

        When I read the question and it says to click on those that *contain* a traffic light, I'm afraid I will click on all the images containing the pole, because to my mind the traffic light cannot exist without its supporting pole. Google's devious reaction would probably be to show a traffic light fixed to brickwork with a white line painted on the ground (looking like a pole - the resolution of the images is so awful it is guesswork at the best of times), just to be awkward.

        Thinking about it, Google could easily profile people by their answers. Show pictures of Armstrong on the Moon, for example, and ask users to tick the images showing the Moon. Conspiracy theorists could be segregated by their refusal to select those images.

  5. a_yank_lurker Silver badge

    Languages

    Human languages are not exactly precise in many ways. Anyone who has ever done any translation knows the imprecision in the source makes an accurate translation difficult at times. Also, context is very important to how something should be interpreted. This nuance makes Artificial Idiocy, not Artificial Intelligence, the likely outcome of these systems.

    I would like to know how these idiot systems would handle the phrase 'While you've a lucifer to light your fag' from the WWI song 'Pack Up Your Troubles in Your Old Kit-Bag'.

  6. Anonymous Coward
    Anonymous Coward

    A very small casserole.

    A bucket of baseballs should be labelled as a bucket of baseballs.

    God, the Renaissance was something that just happened to other people for you, wasn't it, Baldrick.

  7. Henry Wertz 1 Gold badge

    Reason for errors

    So, I think there are two main reasons for these errors:

    1) People (both in the article and the forum) have discussed the structural problem: if you have to give each image a single tag, a bucket of balls is not accurately tagged (a sketch after this list shows the multi-label alternative).

    2) The other issue -- who tagged these things? I bet when these were tagged, you either had someone on minimum wage going through thousands of images; some Amazon Mechanical Turk type thing where they're getting something like 1 cent an image (which might make it even less accurate, since they'd then prefer to tag as fast as possible and probably still not make minimum wage); or student interns (paid or not) being asked to tag piles of images. I don't have a suggestion for a better way of doing it, but a) I'm guessing most people would do this as quickly as possible rather than as accurately as possible, and b) even if the person doing it was going for accuracy, after a thousand images or so, how many people will still be paying full attention to what they're doing?
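
    On the first point, a minimal sketch of how a multi-label scheme sidesteps the forced single-tag choice (the image ID, tag names, and helper are made up for illustration):

      from dataclasses import dataclass, field

      @dataclass
      class ImageAnnotation:
          image_id: str
          labels: set = field(default_factory=set)  # many tags, not just one

      # Forced single-tag scheme: the annotator must pick "bucket" OR "baseball"
      single_tag = {"img_0042": "bucket"}           # the baseballs are lost

      # Multi-label scheme: the bucket of baseballs keeps both labels
      multi = ImageAnnotation("img_0042", {"bucket", "baseball"})

      def has_tag(ann, query):
          """True if the image carries the queried tag at all."""
          return query in ann.labels

      print(has_tag(multi, "baseball"))  # True; the single-tag scheme loses this

    Whether the big benchmark sets could be re-annotated this way is another question, but it at least removes the need to pick one "correct" tag for an image that genuinely shows two things.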

  8. Fruit and Nutcase Silver badge
  9. amanfromMars 1 Silver badge

    Thank your lucky stars such is so far as light years from the truth

    Turns out humans are leading AI systems astray because we can't agree on labeling

    Oh please, failing to follow leading AI systems is the all too apparent default astray human condition/endemic systemic weakness and exploitable vulnerability resulting in a vast catalogue of problems and conflicts presenting in mayhem and madness.

    However/Nevertheless ......

    At some point soon, through virtual intervention and systems collapses aka program and project hacks, will civilisation be transformed, and that is the stock it would be rank foolish not to be investing in. Such is an inevitability and foregone conclusion held by many but still only practically known to a relative few.

    Such is the current present state of virtually adept Great Games Play. I Kid U Not.

    Forewarned usually affords one the opportunity and pleasure to be forearmed. Good luck with finding the suitable weapons to wield and to yield in defence and attack against such as all of that.

    ........ is certainly infinitely better than the punitive alternatives headlined for consideration and possible activation they be in reply to ...... Hedge Fund CIO: "At Some Point, Through Inflation, War Or Confiscation, The System Will Restart"

    Why is it that so many tend to congregate to try repeating past failed methodologies expecting them to produce something different and new? It is surely illogical and may even be an indicator of a weakness supporting one's flights of crazy fancy into a personal hell and private madness?

  10. wjake

    Terrible example

    "What would happen if a self-driving car is trained on a dataset with frequent label errors that mislabel a three-way intersection as a four-way intersection?"

    What a crap example.

    The self-driving car should have accurate map information, so it won't have to rely on recognizing what type of intersection is coming up.
