Wish there was a benchmark for ML safety? Allow us to AILuminate you...

MLCommons, an industry-led AI consortium, on Wednesday introduced AILuminate – a benchmark for assessing the safety of large language models in products. Speaking at an event streamed from the Computer History Museum in San Jose, Peter Mattson, founder and president of MLCommons, likened the situation with AI software to the …

  1. Mike 137 Silver badge

    All very well but ...

    A worthy attempt (if only at the symptomatic level), but benchmarks would seem somewhat moot in the face of some basic failings of principle from which the current AI paradigm suffers. The stochasticity of results and the effective impossibility of verifying how they were arrived at are fundamental barriers to trust (and indeed to a great extent barriers to improvement).
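    (The stochasticity the commenter mentions comes from temperature sampling at the model's output layer. A toy Python sketch, with made-up token names and logits purely for illustration, shows why two identical prompts need not produce identical answers:)

    ```python
    import math
    import random

    def softmax(logits, temperature=1.0):
        """Turn raw model scores into a probability distribution.
        Higher temperature flattens it, increasing randomness."""
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_token(tokens, logits, temperature=1.0, rng=random):
        """Draw one token according to the softmax distribution."""
        probs = softmax(logits, temperature)
        return rng.choices(tokens, weights=probs, k=1)[0]

    # Hypothetical next-token scores for some prompt
    tokens = ["safe", "unsafe", "unknown"]
    logits = [2.0, 1.5, 0.5]

    # At temperature 1.0, repeated draws routinely differ, so the
    # same prompt can yield different answers on different runs.
    samples = {sample_token(tokens, logits) for _ in range(200)}
    print(samples)
    ```

    (Only at temperature near zero does sampling collapse to the single highest-scoring token; at the settings typical of deployed chatbots, the output is genuinely a draw from a distribution, which is the commenter's point about trust and reproducibility.)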

    1. Anonymous Coward
      Anonymous Coward

      Re: All very well but ...

      "https://www.bcs.org/articles-opinion-and-research/does-current-ai-represent-a-dead-end/"

      Is that your first-year daughter's A-Level paper that you have linked? My God, it is written by a Prof!

      We have been saying this for 8 months now: that snapshot LLMs are a dead end. Why has it taken your daughter that long, and she still doesn't get it?

      "The stochasticity of results...". How else could it be with language, which is the very essence of the stochastic?

      "and the effective impossibility of verifying how they were arrived at "

      No. o1-mini preview had a very detailed Chain of Thought. It wasn't difficult to work out from that what it was doing.

      Your paragraph of text is pomp. Dinosaur pomp. You know it, and you can't do anything about it now.

  2. Mentat74
    Terminator

    A.I. can never be trusted...

    No matter the amount of 'benchmarking'...

    1. heyrick Silver badge

      Re: A.I. can never be trusted...

      If it's not possible to unwind exactly what happened to cause a decision to be reached, and to precisely tweak the inferences made from its learning (basic stuff that is supposed to be why children go to school), then its behaviour and output can be viewed as interesting and maybe even amusing curiosities, but not something that can be trusted - especially if people (like the makers) are going to be dumb enough to get it to manage mission-critical things.

      There's this little concept called accountability, and right now AI is more or less "dunno, but computer says NO". Not something that should be in any way arriving at decisions that can affect people's lives.

    2. Adam Foxton

      Re: A.I. can never be trusted...

      It doesn't need to be.

      The group deploying it should be. And they should be held accountable if it errs.

      If an Apple AI tells a user an all-fruit diet fights cancer, or a Tesla AI is in charge of a vehicle that crashes, Apple or Tesla's senior executives should be legally responsible as they okayed its deployment.

      If an individual utilises it, it should be their legal responsibility.

      Nothing else will work. The first jailed C-suite executive will quickly ensure that AI is used responsibly by everyone else, while not stifling legitimate development.

  3. Martin Howe
    Joke

    The only safety measure for AI is rm -fr "${AI_SOFTWARE_ROOT_DIRECTORY}"

  4. amanfromMars 1 Silver badge

    Is it any more difficult than Dan Dare like rocket science?

    "To get here [a highly reliable, low risk service] for AI, we need standard AI safety benchmarks." .... Peter Mattson, founder and president of MLCommons

    Any advance on guaranteed failsafe steps and solutions? That’s where all the SMARTR money will be going. .......SMARTR Mentoring Analysis Reporting TitanICQ Research or, as things progress into the more interesting and discerning of novel and noble fields, as they most certainly have and will always do ..... SMARTR Mentoring Analysis Researching TitanICQ Reports and/or their Reporters.

    Who Dares Win Wins Virtual Team Terrain in AI Territory.

    1. heyrick Silver badge

      Re: Is it any more difficult than Dan Dare like rocket science?

      With Llama [1], there's no SMARTR money. Just a bunch of people getting rich off the backs of others throwing cash around like it's going out of fashion in the hopes that they can bank it big time when this whole AI thing gets real.

      But, alas, this is surely going to play out like the DotCom boom...

      1 - I actually wrote "LLMs" and my autocorrect was like "nope, you're talking about llamas, yes you are, shut up and talk about llamas and not your glorious AI overlords".

      1. amanfromMars 1 Silver badge

        Re: Is it any more difficult than Dan Dare like rocket science?

        I have to agree with you, heyrick, there’s more than just a great many getting themselves involved in something incredibly tricky and sticky and able to easily be extremely dangerous [as in life threatening] without the faintest clue about what they are doing and what it is best to do next in order to survive and prosper.

        1. Anonymous Coward
          Anonymous Coward

          Re: Is it any more difficult than Dan Dare like rocket science?

          And I too must agree with you. Tricky and sticky indeed.

          aman, the grind was worth it for this one you baited.

  5. Doctor Syntax Silver badge

    It's all very hand-wavy as to what "safe" means, let alone how to measure it. Who's to say that if the model doesn't advise eating stones or gluing topping onto pizza, it won't, with a different prompt, advise substituting crushed glass for salt? It's not feasible to test every prompt and vet every answer.

    1. Anonymous Coward
      Anonymous Coward

      model doesn't advise eating stones

      I've had stone soup in Hong Kong.

  6. Anonymous Coward
    Anonymous Coward

    No hope

    ""If you look at aviation, for instance, you can look all the way back to the sketchbooks of Leonardo da Vinci – great ideas that never quite worked," he said. "And then you see the breakthroughs that make them possible, like the Wright brothers at Kitty Hawk."

    looks like it is over.

    That comparison, in such a concise encyclopedic way, is enough to tell you that this Muppet has either made the parallel himself or someone has simplified a too-complex model for him. He sounds like middle management who got lucky because of the boys down at the Lodge. Like most of them, really.

    This might get some traction, but it is 7 years too late. Should have built this in at the beginning, but no sane person expected some twat like Altman to boost his energy shares and throw a squillion GPUs at this current architecture. Nobody would be so dumb. The energy costs alone make this future-leader gamble make Uber and the like look like chicken feed.

    They are throwing almost everything at it. And the money isn't really going to people like in the web days; it is going mostly on the hardware. These bubbles are needed to pay for the infrastructure. Without the web bubble, we would have half of what we have today. The AI bubble is different to all that came before, because this bubble eats itself and there are no winners and losers. We are just... just.
