Please check your data: A self-driving car dataset failed to label hundreds of pedestrians, thousands of vehicles

It's a long weekend in the US, though sadly not in Blighty. So, for those of you starting your week, here's some bite-sized machine-learning news, beyond what we've recently covered, if that's your jam. Check your training data: A popular dataset for training self-driving vehicles, including an open-source autonomous car …

  1. macjules
    Paris Hilton

    Google, YouTube, Twitter, and Facebook have sent Clearview cease-and-desist letters

    Yes, only Google, YouTube, Twitter, and Facebook should be scraping their users' images and selling them for profit.

    1. John Brown (no body) Silver badge

      Re: Google, YouTube, Twitter, and Facebook have sent Clearview cease-and-desist letters

      The Ts&Cs their users agreed to say that, so yeah, they can.

      1. EnviableOne

        Re: Google, YouTube, Twitter, and Facebook have sent Clearview cease-and-desist letters

        Yeah, but who actually read those until after their parents had uploaded half their baby photos?

  2. Anonymous Coward
    Anonymous Coward

    The first rule of computing has always been "Garbage In Garbage Out".

    Systems that have been "trained" need to be tested even more thoroughly than systems that have been explicitly coded because they can only be "black box" tested.

    1. katrinab Silver badge
      Meh

      The training data is the code, and the thing that processes it is the compiler that converts it from tagged jpeg or whatever into a language the computer understands. You need to check both the training data and the "compiler".

    2. Anonymous Coward
      Anonymous Coward

      Careful what you wish for....

      Maybe the pedestrian data was accurate and any mistakes will be fixed during the "data validation" phase?

      TL;DR: funeral homes will soon be a growth business

  3. Pete 2 Silver badge

    absence of proof or proof of absence?

    > Thousands of vehicles, hundreds of pedestrians, and dozens of cyclists were not labelled.

    It seems to me that there are two sorts of mistake. There is labelling something incorrectly, such as labelling an elephant as a bicycle. And there is a different sort of error which is not labelling something at all.

    In the first case, mislabelling obviously leads to errors in the system that is trained on the data. But what about the second?

    If something is not labelled, then it doesn't form part of the training. It is as if the unlabelled object doesn't appear in the image. ML systems are not trained to identify every single thing in an image: every leaf on a tree, every piece of litter. So things that are not labelled will simply be ignored: not forming any sort of association.

    So the question is: does it matter if some images don't have some things labelled?

    1. Black Betty

      Re: absence of proof or proof of absence?

      Yes it matters. Particularly with [semi]autonomous machine learning. Such a system misclassifying an object as "looks like, but isn't" puts that object squarely in the path of an autonomous vehicle seeking to avoid a collision with another object clearly classified as "must avoid".

      We generally don't know exactly how a neural net builds its recognition matrix, so it's conceivable that it might pick up on some minor/irrelevant detail common to untagged objects in its training data and end up putting all such objects into its ignore bin.

      The real fun is with edge cases such as the occasional unicyclist, trike or recumbent bike. A system trained on data containing untagged objects offers the real and dangerous possibility of a "bin" for objects that don't neatly fit into any of its trained classifications.

      1. katrinab Silver badge
        Unhappy

        Re: absence of proof or proof of absence?

        Like Über's system - pedestrian at recognised crossing point: danger, avoid, slam on the brakes

        pedestrian elsewhere, is it a piece of litter? an animal? no idea, keep going

    2. Neil Barnes Silver badge

      Re: absence of proof or proof of absence?

      Um, yes... if it's something that should be seen, like, say, a pedestrian or another vehicle: the purpose of that training set is (was?) to allow a categoriser to identify things and see how closely it matches the training set.

      Would you like to be the one in three pedestrians that the system doesn't know it can't see because it was never told about its error?

      1. Pete 2 Silver badge

        Re: absence of proof or proof of absence?

        > the purpose of that training set is (was?) to allow a categoriser to identify things

        Yes, we agree. And as long as those same things appear in other parts of the training set, it doesn't matter if they are missed out in one or two single frames. The AI will still "learn" them from the other frames where they ARE labelled.

        The article neither says nor suggests that entire classes of object (all traffic lights or all prams, for example) are systematically omitted. Missing from the entire set. Just that the occasional thing that is labelled in most frames has not been labelled in a few.

        1. FrogsAndChips Silver badge

          Re: absence of proof or proof of absence?

          The dataset needs to contain a wide range of objects if we expect it to recognize them in all circumstances. What if it only contained images of pedestrians wearing black coats, or with long hair? It may never recognize one with a green coat or a bald person. So the absence of labelling is an issue in that it can deprive you of significant diversity in the dataset.

    3. Anonymous Coward
      Anonymous Coward

      Re: absence of proof or proof of absence?

      I don't know about labelling an elephant as a bicycle. If this was being tested in the USA, we could have 'trained' AIs that think the country is full of elephants though....

    4. o p

      Re: absence of proof or proof of absence?

      > If something is not labelled, then it doesn't form part of the training

      Wrong. It is part of it. The training says: look: There is no pedestrian here. There is nothing.
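
The point above can be illustrated with a minimal sketch, assuming a typical object-detector training setup in which candidate boxes are matched to labelled boxes by overlap (IoU) and anything unmatched becomes a "background" target. The function names, the 0.5 threshold, and the box coordinates are hypothetical, chosen only for illustration; they are not taken from the dataset in the article or from any particular library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_targets(
    candidates: List[Box],
    labels: List[Tuple[Box, str]],
    threshold: float = 0.5,  # assumed matching threshold
) -> List[str]:
    """Give each candidate box the class of its best-overlapping label,
    or "background" if no label overlaps it enough."""
    targets = []
    for cand in candidates:
        best_name, best_score = "background", threshold
        for box, name in labels:
            score = iou(cand, box)
            if score >= best_score:
                best_name, best_score = name, score
        targets.append(best_name)
    return targets


# Two pedestrians are visible in the frame (hypothetical coordinates).
candidates = [(10, 10, 50, 100), (200, 20, 240, 110)]

complete_labels = [
    ((10, 10, 50, 100), "pedestrian"),
    ((200, 20, 240, 110), "pedestrian"),
]
# Same frame, but the annotator missed the second pedestrian.
incomplete_labels = complete_labels[:1]

with_all_labels = assign_targets(candidates, complete_labels)
with_missing_label = assign_targets(candidates, incomplete_labels)

print(with_all_labels)
print(with_missing_label)
```

Under these assumptions, the second pedestrian is not skipped when its label is missing: it is handed to the training loss as an explicit "background" example, so a model that correctly detects it would be penalised for doing so.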

  4. Pascal Monett Silver badge
    Flame

    Patent trolls

    I am sick of hearing of this scum bothering people that are actually trying to make a product.

    You have a patent ? Good for you. Are you making anything with it ? No ? Then shut up and fuck off.

    Patent law should be changed to include an article that states that a patent is valid only if the patent holder can justify that product is being made and sold using that patent. Doesn't need to be the holder, who can grant usage to whoever he likes, but product must be made or you have no right to complain.

    That would clear out quite a few portfolios that are in zombie mode right now.

    1. dajames

      Re: Patent trolls

      > Patent law should be changed to include an article that states that a patent is valid only if the patent holder can justify that product is being made and sold using that patent. Doesn't need to be the holder, who can grant usage to whoever he likes, but product must be made or you have no right to complain.

      Made ... or at least in active development ...

      ... but what do you do about the inventor who has a brilliant idea but can't afford to develop it and is looking for a partner or sponsor to help with the development?

      I agree that it's not helpful to patent something and just sit on the patent and use it to prevent anyone else from developing any product that it might be thought to cover. That's not in the spirit of the original idea of patents, which was to ensure that ideas were published, but to allow the original inventor some opportunity to profit from an invention before opening it up to all and sundry.

  5. Rich 11

    Obituary: ARPA-E

    The DOE’s Advanced Research Projects Agency-Energy (ARPA-E) will be particularly hard hit: not only does the proposed budget effectively eliminate the agency, it must pay back $311m to the treasury.

    Even though the agency was proposed and first funded under Bush it didn't open its doors until Obama had just been sworn in. That fatal association, plus the fact that it insists on looking at new energy technology rather than just saying "Coal good, coal good, oil good, coal good", means that Trump has to shut it down. The only surprise is that it took him so long.

  6. SVV

    Self driving car image training

    Whenever I've encountered a "check you're not a bot" test on logging into a website in the past few months, I've always been asked to select the images that contained a pedestrian crossing or a bus, or some other street based scene. If this is considered a good AI defeating test, then it doesn't say much for the prospects of this use of the technology.

    1. Tom 38

      Re: Self driving car image training

      reCAPTCHA is a Google project for using humans to train AIs. They used it to improve their OCR for Google Books, then street signs for Google Maps Street View, and now they use it to train Waymo for autonomous driving.

      1. FrogsAndChips Silver badge

        Re: Self driving car image training

        Ob XKCD

    2. Medical Cynic

      Re: Self driving car image training

      And they're all from the US of A, so some of the bits of traffic lights or whatever that are seen in the corner of an adjacent square may be hard to recognise this side of the pond. Most UK traffic lights are on a pole; many in the US hang from wires. Where does the 'traffic light' stop and the network of supporting wire begin? Pedestrian crossings have some differences, too.

      1. FrogsAndChips Silver badge

        Re: Self driving car image training

        You need to identify the lights (i.e. the bulbs), not whatever supports them.

  7. whileI'mhere

    "The mysterious patent owner, Voice Tech Corp, turned out to a brand new company in Texas, USA, and it's address was someone’s bungalow,"

    There seems to be one missing word and one erroneous apostrophe in the extract above.

    1. KarMann Silver badge

      To be or not to be, thats the question.

      Apostrophe donated to the article.

    2. Anonymous Coward
      Joke

      > There seems to be one missing word and one erroneous apostrophe in the extract above.

      Dear WhileI'mHere,

      Thank-you for bringing this to my attention. I am the lawful and legitimate owner of the genuinely innovative and original invention known as "someones bungalow". This attempt to subvert my patent by adding an apostrophe is clearly just that and I have instructed my attorneys to sue for infringement.

      I also own the patent for "subverting patents by insertion of extraneous apostrophes" and I have instructed my attorneys to sue for this second infringement.

      Please do not reply as I might have to sue you.

      1. Kernel

        "I am the lawful and legitimate owner of the genuinely innovative and original invention known as "someones bungalow". "

        While we all, I am sure, appreciate your humour, just for the record the extraneous apostrophe referred to is in "it's", not "someone's".

      2. The Nazz

        Prior art

        ha ha 2=2=5, nice try. But prior art.

        My Dad used to visit a bungalow, and as he popped inside, we'd be left waiting in the car for what seemed like hours (probably 20 minutes). Eventually one of us asked who was in the bungalow. "Ah, just someone I know," he'd say. Always seemed to set off his asthma too.

  8. Anonymous Coward
    Anonymous Coward

    Trump is living on a different planet

    to the rest of us. Sadly our universes have a junction that surrounds him. He only sees his version of reality.

    Spending billions on something that is already pretty well sorted yet calling time on the US Military Newspaper, Stars and Stripes at a cost of $15.5M per year shows just how out of touch he is. Never mind, he'll be off spending another $5M next weekend to go and have a round of golf.

  9. batfink

    “5G research and development (R&D)"

    Surely in a proper capitalist country, The Market should be relied upon to make this investment. This government funding smacks of (shudder) Socialism to me.

    If other companies can't keep up with the market leader, then they deserve to wither on the vine.

  10. John Brown (no body) Silver badge
    Facepalm

    "not operated on public streets for several years"

    So, it HAS operated on the public streets using this "educational data" not intended for live usage. Oh dear.

  11. Mike 137 Silver badge

    "Training an autonomous car on such an incomplete dataset could potentially be dangerous"

    Could actually be bloody lethal, as could using an educational training data set for (even experimental) production. The sloppy thinking that drives web development has clearly migrated to the life-critical development domain. In the words of Private Fraser "we're doomed".
