Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows. Data is a vital resource in teaching machines how to complete specific tasks, whether that's identifying different species of plants or automatically generating captions. Most neural …

COMMENTS

1. Friday 2nd April 2021 17:10 GMT DS999
  
  Re: And of course people are bloody minded ...
  
  As I just confessed above before seeing this post lol!
  
  1 0 Reply
  1. Friday 2nd April 2021 19:12 GMT jake
    
    Re: And of course people are bloody minded ...
    
    :-)
    
    It's Friday. Have a beer.
    
    0 0 Reply
2. Monday 5th April 2021 10:12 GMT Tom 7
  
  Re: And of course people are bloody minded ...
  
  Asimov had this covered - his computers took this into account. Given left pondians still have trouble with UK sarc I guess he was a bit ahead of his time or perhaps we need a new Intelligence definition.
  
  0 1 Reply
Friday 2nd April 2021 00:29 GMT ecofeco

I think I see the problem

This work is often outsourced work to services like Amazon Mechanical Turk, where workers are paid the square root of sod all to sift through the data piece by piece,

You get what you pay for. GIGO!

20 0 Reply
1. Friday 2nd April 2021 04:09 GMT doublelayer
  
  Re: I think I see the problem
  
  Yes. Mechanical Turk pays the participants so little that they have a lot of data pollution problems. Also, they have the problem that the people doing the work are either bored people who will give up in fifteen minutes or people who really don't have better ways to get money, so you can't expect consistency or strict attention to detail.
  
  It's along the lines of all those studies they do at universities where students are paid to participate in research with an amount of money which could be used in a vending machine in 1995. Especially the economics studies which effectively boil down to "Would you take this action if we cut your meaningless money to even more meaningless money?". I participated in a few research programs while studying, but always because I was bored and didn't mind wasting a few minutes. I did actual work to earn money.
  
  10 0 Reply
  1. Friday 2nd April 2021 11:18 GMT Terry 6
    
    Re: I think I see the problem
    
    Also, they have the problem that the people doing the work are either bored people who will give up in fifteen minutes or people who really don't have better ways to get money, so you can't expect consistency or strict attention to detail.
    
    Long before the computer age I had a job with a mail order company. Our work was filing little slips of paper - coupons or something - with people's name and address they'd sent in. These had to be packed really tightly into stiff plastic wallets in alphabetical order. It was tedious and painful, the plastic would cut into your fingers and forcing the paper apart to insert the new slip was really difficult. None of us had been there long. And it was obvious why. The wallets were a mess. Sections would be in order, then they'd be random. As a new bunch of underlings were recruited, got bored and were fired. As we were.
    
    The crap pay and the mean fisted decision to force as many slips as possible into the envelopes as tightly as possible meant that the filing system was close to useless.
    
    The spirit of those days (early 80s? - the Woolworth's fire happened across the road while I was there) apparently lives on. And since there is a phrase "Spoiling the ship for a ha'p'orth of tar" it seems to go back a long way before that too.
    
    10 0 Reply
  2. Monday 5th April 2021 10:16 GMT Tom 7
    
    Re: I think I see the problem
    
    Economic studies in universities are generally there to confirm a certain view of economics and only that view is correct. It's a view that won't pay for the truth.
    
    1 4 Reply
    1. Monday 5th April 2021 17:29 GMT doublelayer
      
      Re: I think I see the problem
      
      If that's the case, they're not very good at their job. Most of the economics studies involving meaningless money contradict many other theories. Behavioral economics really likes these. Whether that's because the previous theories were wrong (probably), because people act different when they actually care about incentives (probably) or because the researcher is deliberately messing with the results (probably not), they don't tend to be blatantly confirmatory.
      
      1 0 Reply
2. Friday 2nd April 2021 09:17 GMT H in The Hague
  
  Re: I think I see the problem
  
  "You get what you pay for. GIGO!"
  
  Yup.
  
  There's currently a project commissioning human translators to translate a fairly large amount of text, for use as input for a Machine Translation system. Not a bad idea, but there are some issues.
  
  Offering 1/2 to 1/3 of the going rate probably doesn't help attracting competent translators. Asking the translators to provide two alternative translations isn't a bad idea. However, that fails when you have to translate the Dutch 'Het gras is groen' into English, as 'The grass is green' is essentially the only sensible translation. An alternative translation would simply add misleading input. Finally, all the sentences to be translated for this project are completely unconnected from each other. That's a major issue, as anyone who's done more than a day or two of translating will tell you that 'context is everything'.
  
  15 0 Reply
Friday 2nd April 2021 01:47 GMT Blofeld's Cat

So ...

Everything is fine until your self-driving car spots someone carrying a basket of balls across a five-way intersection ...

29 0 Reply
1. Friday 2nd April 2021 02:25 GMT jake
  
  Re: So ...
  
  ... and has to pick between running over them, or the crocodile in the next lane, doing a fair approximation of a lightbulb.
  
  29 0 Reply
Friday 2nd April 2021 07:37 GMT CrackedNoggin

"One picture is worth 1000 words" - not just ONE WORD dummies!

FAIL

6 0 Reply
Friday 2nd April 2021 08:14 GMT T. F. M. Reader

Does anyone know...

...what happens to a British-trained AI-driven car when it crosses the Channel? And what happens when it crosses back a month later?

Just curious because it's not always trivial for humans.

7 0 Reply
1. Friday 2nd April 2021 08:34 GMT sad_loser
  
  Re: Does anyone know...
  
  GPS means it is location-aware, so just switches to using the horn instead of indicating. Problem solved.
  
  24 0 Reply
  1. Friday 2nd April 2021 19:14 GMT jake
    
    Re: Does anyone know...
    
    Would it think it was tomorrow, moving East of Greenwich?
    
    (I've seen worse programming errors ... )
    
    2 0 Reply
    1. Saturday 3rd April 2021 16:16 GMT Primus Secundus Tertius
      
      Re: Does anyone know...
      
      I once saw an aeroplane system 'leap' from Cologne to Leipzig. A bit embarrassing for an RAF plane to suddenly be over East Germany.
      
      2 0 Reply
  2. Friday 2nd April 2021 23:00 GMT katrinab
    
    Re: Does anyone know...
    
    But would the British-trained car have microphones to hear the horn blasts emitted by others, and know how to interpret them?
    
    1 0 Reply
    1. Sunday 4th April 2021 09:22 GMT Rich 11
      
      Re: Does anyone know...
      
      And would it have two fingers to stick up in response?
      
      3 0 Reply
2. Friday 2nd April 2021 22:59 GMT katrinab
  
  Re: Does anyone know...
  
  Or even if it crosses the border into Scotland?
  
  In Scotland you sometimes see a (70) speed limit sign in situations where you would see a ( / ) sign in England. But ( / ) signs do exist in Scotland.
  
  1 0 Reply
  1. Saturday 3rd April 2021 15:20 GMT Dave559
    
    Re: Does anyone know...
    
    Different Road Traffic Acts in England than in Scotland. As far as I know, "Special Roads" in Scotland (namely, motorways and some others under similar special legislation (with restrictions on what sort of vehicles can use them) such as the Edinburgh city bypass) have to show the "70" sign instead of the "national speed limit applies" sign. Exactly why this bit of signage hyper-precision exists, I don't know, but the folks at SABRE and the like probably do…
    
    2 0 Reply
Friday 2nd April 2021 09:15 GMT Gomez Adams

I am gob-smacked that labellers are allowed to know what other labellers are identifying objects as while doing their job! :o

4 0 Reply
1. Friday 2nd April 2021 17:28 GMT Doctor Syntax
  
  Maybe they don't. Maybe they just click on something until it's accepted.
  
  3 0 Reply
2. Friday 2nd April 2021 19:16 GMT jake
  
  Why can't people ...
  
  ... stop putting labels on things and just accept them as they are?
  
  7 1 Reply
  1. Saturday 3rd April 2021 21:32 GMT Terry 6
    
    Re: Why can't people ...
    
    Because the labels are how we a) communicate shared concepts and b) store them for recollection.
    
    Think compression algorithms
    
    0 1 Reply
Friday 2nd April 2021 09:27 GMT analyzer

AI in everything is the problem

Current AI is just pattern recognition on steroids, there is still no real intelligence in any AI system. The single word tagging of pictures has been known to be suboptimal for a long time now and yet it is still done. If there was any of the I in AI then multiple word tags should not be an issue as it should be capable of recognition of multiple items in one picture.

Additionally, and I can't find the fine article that was in this exalted place, you can carry a tag around with you and be misidentified as the tag because the AI 'learns' the tag as well as the picture.

Certainly artificial, definitely not yet intelligent.

8 0 Reply
1. Friday 2nd April 2021 16:37 GMT John Brown (no body)
  
  Re: AI in everything is the problem
  
  I think you might need an apple iPad
  
  2 0 Reply
2. Friday 2nd April 2021 21:15 GMT veti
  
  Re: AI in everything is the problem
  
  Well, if only we could agree on a (non-circular) definition of "intelligence", that would be a step in the right direction. Until then, I don't see how we can hope to get anywhere.
  
  Notice how no-one talks about the Turing test any more? That's because it was passed, and so everyone promptly decided "oh no, that's not intelligence after all". As long as we're allowed to keep moving the goalposts like that, they're not going to make it.
  
  2 1 Reply
  1. Saturday 3rd April 2021 16:45 GMT Ken Hagan
    
    Re: AI in everything is the problem
    
    "Notice how no-one talks about the Turing test any more? That's because it was passed, ..."
    
    It was? When? How? Who? I've never seen any computer system get anywhere close. Come to that, I've seen *people* fail it (usually working in institutions as customer-facing staff and following rule sets, to be fair). Or did they re-define Turing test, so that I'm not allowed to ask questions that any 5-year-old could answer but that fall outside the computer's domain of expertise? (For example, does this picture show baseballs or a bucket? Obviously a 5-year-old could answer that, but apparently an AI researcher can't.)
    
    5 0 Reply
    1. Sunday 4th April 2021 05:12 GMT doublelayer
      
      Re: AI in everything is the problem
      
      It's unclear. People like to hold Turing test challenges, and programs in those challenges have been ruled human before. That might not be a great basis to declare the test passed, but it is what the test specifies. It also depends a lot on what we want them to do. The original Turing test didn't include sending images to the other party, therefore not requiring the AIs to see. Also, a program trying to pass the Turing test, because it sends back text, has the ability to say "both" where the programs here which are just identifying things have to pick a single one.
      
      In my opinion, the Turing test is a rough test that is likely to have too much uncertainty to prove intelligence. There were probably a lot of people who liked to talk about it back when it seemed impossible, but now they're pointing out deficiencies in the concept. The only problem is that a lot of people attack anything termed AI without even trying to define what they think AI is, or provide an unrealistic explanation which means something is only AI if it acts entirely human and attained sapience itself without ever being programmed.
      
      2 0 Reply
3. Saturday 3rd April 2021 23:20 GMT C.Carr
  
  Re: AI in everything is the problem
  
  To state the obvious, the labels need to consist in more than their complex relationships with other mere labels. The actual things need to be represented by sensory data, and the AI system needs to be located, discretely, in 3D space --- that is, if we want a system to actually *know* what things are.
  
  0 0 Reply
Friday 2nd April 2021 09:36 GMT Ken Moorhouse

Reminds me of...

Google bombing.

3 0 Reply
Friday 2nd April 2021 09:54 GMT steelpillow

Simples

All we need is an AI that translates between AI data labelling languages.

1 0 Reply
Friday 2nd April 2021 10:28 GMT iron

I do hope that somewhere in ImageNet there is a picture of a banana that has been labelled "female aardvark."

10 0 Reply
1. Friday 2nd April 2021 12:23 GMT Mark192
  
  My local Tesco occasionally has the bananas mislabelled as "small, off-duty Czechoslovakian traffic wardens".
  
  Serves them right for leaving the label maker out ;)
  
  19 0 Reply
  1. Friday 2nd April 2021 19:39 GMT jake
    
    The Petaluma (California) Whole Foods sometimes has ...
    
    ... lychee labeled as Hedgehog Eggs. Down the street and around the corner, the local Lola Market has occasionally had their Spiked Chayote (Sechium edule) relabeled as Porcupine Eggs. Given their proximity, I suspect the joker is the same person at both stores. No, it's not me.
    
    3 0 Reply
    1. Friday 2nd April 2021 19:56 GMT Ken Moorhouse
      
      Re: Hedgehog Eggs
      
      In the UK we had Hedgehog Crisps which caused the Advertising Standards Authority some headaches.
      
      4 0 Reply
      1. Sunday 4th April 2021 00:58 GMT keith_w
        
        Re: Hedgehog Eggs
        
        Are they related to Spring Surprise Chocolates?
        
        1 0 Reply
  2. Friday 2nd April 2021 19:48 GMT Ken Moorhouse
    
    Re: "small, off-duty Czechoslovakian traffic wardens".
    
    I learn something new every day from this site - much of which has nothing to do with technology.
    
    ===
    
    Warhol would have run rings around AI with his placement of everyday objects as art. Bananas reminds me of Nico's distinctive vocals...
    
    https://www.youtube.com/watch?v=AkDJcUCyjCU
    
    0 0 Reply
  3. Friday 2nd April 2021 21:16 GMT Eclectic Man
    
    Small off duty ...
    
    Mark192: bananas mislabelled as "small, off-duty Czechoslovakian traffic wardens".
    
    That is ridiculous, don't they know that the Czech republic and Slovakia split apart in 1993?
    
    https://kafkadesk.org/2018/10/30/why-did-czechoslovakia-break-up/
    
    "On January 1, 1993, Czechoslovakia split into two independent states, the Czech Republic and Slovakia, in what is now known as the “Velvet divorce” (in a reference to the Velvet revolution) due to its peaceful and negotiated nature."
    
    0 0 Reply
    1. Saturday 3rd April 2021 04:03 GMT StuartMcL
      
      Re: Small off duty ...
      
      https://www.youtube.com/watch?v=oB-NnVpvQ78
      
      0 0 Reply
    2. Saturday 3rd April 2021 15:31 GMT Dave559
      
      Re: Small off duty ...
      
      "And while Slovak nationalism sentiment strived for more autonomy, Czech nationalism embraced Czechoslovakism, mainly due to their privileged position within the federation."
      
      -- https://kafkadesk.org/2018/10/30/why-did-czechoslovakia-break-up/
      
      As the article itself alludes to, I'm sure we can all think of certain other multi-ethnic countries where similar feelings apply…
      
      0 0 Reply
2. Saturday 3rd April 2021 09:30 GMT Anonymous Coward
  
  It'd be wrong though. Clearly it's the Bolivian navy on manoeuvres in the South Pacific.
  
  1 0 Reply
Friday 2nd April 2021 10:38 GMT Anonymous Coward

Bias is endemic to society, right down to the very roots of the evolution of the languages we use. Cultural biases of thousands of years are embedded in every culture on the planet, and it results in biases in the data sets, even if they are "correctly" labelled. Prejudices in society that lead to fewer members of a minority being successful in particular fields lead the machines to conclude that people of such background are *unlikely* to be qualified, rather than that they are under-represented.

It is a thorny issue, and one that isn't easily solved, because it begins to tread on concepts of morality and justice in society, and the one thing we can all agree on is that no one agrees on those subjects.

8 1 Reply
1. Friday 2nd April 2021 14:47 GMT EarthDog
  
  No one mentioned it but how are idioms handled? They are probably the hardest things to translate.
  
  3 0 Reply
  1. Friday 2nd April 2021 14:55 GMT doublelayer
    
    With automatic translation software, it either just looks at the words and tries to do them literally or it finds an idiom in a big dictionary. When I did some translation, I would always replace them with factual statements. It had less flavor, but at least I knew the reader would understand it without taking the risk that the closest idiom I could come up with was regional. Then again, I was not a professional translator, just a person who spoke multiple languages and had friends who didn't.
    
    3 0 Reply
2. Friday 2nd April 2021 15:34 GMT stiine
  
  I have to disagree, just to prove your point. Please take no offence, or take offence, I don't care.
  
  0 0 Reply
Friday 2nd April 2021 12:50 GMT sketharaman

GIGO

In short, AI suffers from GIGO. zzzzz.

1 0 Reply
Friday 2nd April 2021 14:38 GMT EarthDog

GIGO

Garbage In Garbage Out. A saying as old as computing. When I was spending a large amount of time merging data from other sources into the databases of a company I was working for I made sure my juniors and the SMEs spent a bit of effort vetting those data. All our data had to be defensible in court. It is the cavalier attitude to data and assumption all data are perfect which caused me to avoid those areas. There are no standards of quality though they wouldn't be hard to develop.

4 0 Reply
1. Friday 2nd April 2021 16:41 GMT John Brown (no body)
  
  Re: GIGO
  
  "Garbage In Garbage Out. A saying as old as computing."
  
  Try saying that to someone in the bulk recycling business :-)
  
  3 0 Reply
Friday 2nd April 2021 14:43 GMT Skiron

So why isn't there an AI program somewhere that can actually label the photograph itself? 'Cos AI isn't AI really, it's what the programmers tell it to do. As stated above, GIGO and it always will be until a 'machine' is actually cognitive.

2 0 Reply