Meta gives Llama 3 vision, now if only it had a brain

Meta has been influential in driving the development of open language models with its Llama family, but up until now, the only way to interact with them has been through text. With the launch of its multimodal models late last month, the Facebook parent has given Llama 3 sight. According to Meta, these models can now use a …

  1. beast666

    Yet another Reg 'AI' PR puff piece masquerading as a tutorial.

    1. Anonymous Coward

      I found the article quite informative. I agree that there are countless AI PR puff pieces out there but this isn't one of them. When you wade through all those puff pieces you are left with articles like this that not only point out what the models can do but also what they can't do or what they don't do very well.

      Would a puff piece article have the following statement?

      "As you can see, once again, it's obvious that the model was able to extract some useful information, but very quickly started making things up."

      Did you actually read the article? The title is literally "if only it had a brain".

    2. Anonymous Coward

      silly sod

      You don't deserve these articles.

      There aren't many people with this much knowledge that still work in reporting.

      Personally, this author's articles are a treat for us. This one in particular is interesting because he did a proper hands-on test, unlike some airhead on YouTube who IS regurgitating PR releases.

      @666 anyone with a name like that is a knob and they know it and choose anything 'evil' to highlight just how much of a knob they are. Same with twats and skulls.

      vLLM is new to me. And the tutorial worked no problems. As the writer mentioned, this is down to the model's parameter size.

    3. Anonymous Coward

      PSA: Please ignore the troll handle. Articles like these are why I read this site.

  2. tfewster
    Terminator

    I wonder how good it is at Captchas?

    The post is required, and must contain letters.

    1. doublelayer Silver badge

      Re: I wonder how good it is at Captchas?

      It probably depends on the system involved, but my guess is that it's probably pretty good at many of them. The trickiest element is automating the process of getting the captcha data into the LLM and the response back into the form. Captchas have often given up on keeping determined bots out. They keep really basic bots out and slow down the process for humans, and that's good enough. If web developers were interested in more than that, they might do a better job at not having captchas break things.

    2. elsergiovolador Silver badge

      Re: I wonder how good it is at Captchas?

      Generally speaking, a post office without letters is kind of pointless.

      1. Ken G Silver badge
        Coat

        something for the weekend, sir?

        üéçéù

  3. elsergiovolador Silver badge

    Comparison

    Instead of giving an image to an LLM and then criticising it using your obviously more powerful brain, why not ask a random person on the street? Show them an image and give them a prompt, repeat a few times, and then make a comparison.

    Even if LLMs can't really reason, they are still better than most average humans at many tasks.

    1. doublelayer Silver badge

      Re: Comparison

      Because if I'm going to use an LLM, I want it to do something useful. If it can't do it, pointing out that there are some people who also can't do it doesn't help me in the slightest. My question is not how does this LLM compare to a probably pretty bad random sample of people on this task, then repeat over and over again for any task imaginable. My question is how capable is this LLM of doing something to an adequate standard, no matter how large a group of humans are capable of meeting that standard.

      In addition, unplanned questions to people off the street is a terrible comparison. A lot more people are capable of giving a detailed description or analysis of something if they have a bit more time and double-check their response than if they're focused on giving you a good enough response so they can leave. I could hire many people to do that task who wouldn't give a good interview if stopped on the street, which is why interviews happen with more normal time constraints.

      1. CowHorseFrog Silver badge

        Re: Comparison

        Asking random people is a far better test, because your questions could be, and probably are, biased.

        1. doublelayer Silver badge

          Re: Comparison

          Once again, I don't want to know how this compares to random people. That information doesn't help me. I want to know if this tool can do this job when I want this job done. It doesn't matter whether only one person on the planet could do it or whether every child aged five could do it in their sleep. In this specific case, I know that there are lots of people who can do this, so while it might not be everyone on the street, I'm reasonably confident that I could find someone capable of doing it if I needed that.

          And if I do want to compare it to random people, I have to do a better job of sampling to find out. A sample of people from one location is an insufficiently random sample in case, for instance, the location is right outside an office containing a bunch of statisticians. I would need to put more effort into my sampling. It is also insufficient because of the conditions of getting their answers. So if we do need that information that I currently don't care about, we still need to do more to get it.

          1. I ain't Spartacus Gold badge
            Devil

            Re: Comparison

            doublelayer,

            Perhaps we could give an LLM the following prompt, "Respond as if you were a random person on the street and describe the following image - then repeat as if you are a statistically balanced sample of 100 random people on the street?"

            Then we could compare the LLM's normal output to that output and see which is better?

            However - even that might waste too much time. So perhaps we should ask another LLM to compare the results of the first LLM to the results of the LLM-generated random crowd sample, and from that tell us whether LLMs are better than random people on the street.

            Alternatively, by that time we would have wasted enough electricity on operating LLMs that we could instead have asked a real 100 random people on the street and electrocuted everyone who got it wrong - thus improving human intelligence...

            1. Ken G Silver badge
              Trollface

              Re: Comparison

              I like the way you think, sir. However to make it fair, shouldn't you let the AI know in advance that you will drop their resources by 1 GPU per incorrect answer?

              There have been some studies suggesting that AI performance improves when it has been told that there are stakes involved in the outcome, much as if, when asking the random people, you told them you'd give them 10p for each correct answer and a 5 quid bonus for getting all correct.

      2. Anonymous Coward

        Re: Comparison

        "My question is how capable is this LLM of doing something to an adequate standard, no matter how large a group of humans are capable of meeting that standard".

        My question is, does this matter if you never had the money to hire someone in the first place? A lot of small businesses etc might not have the budget to add a new member of staff for £20k-£30k a year but they are quite likely to have a couple of grand they can spend on a GPU and a simple PC to run an LLM on it which will do a reasonable job for years and can be upgraded both in terms of hardware and the model it is running for peanuts.

        I don't disagree with your question, because I think AI being generally better than an average human at tasks is an important turning point, but I don't think that is the factor at play right now...what is at play right now is the possibility to have an office lackey to do donkey work for existing staff for peanuts and that people who previously had no access to any kind of intellectual labour because of the cost, now have access to it...whether it is as good as a human or not in this case is irrelevant, because the previous option was to have nothing.

        Personally, I only use AI for coding, development planning etc (it's quite nice to have a project manager that you control, rather than the other way around, even if it isn't as good as a regular project manager, because let's face it, an actual developer with better planning skills is, on balance, better than a project manager with no development skills ordering around a developer with no project planning skills). The major decision-making factor for me was whether to purchase a GPU to run local LLMs on, or continue to occasionally hire additional hands from somewhere like Fiverr...I still use Fiverr occasionally for extremely specialist stuff...but the money I've saved using a local LLM has paid for it a couple of times over...this is not theoretical, it is my actual experience over the last 2 years...I've saved £4k-£5k a year.

        I can't speak for other professions, but in our profession it's probably mostly overseas outsourcing that is taking a massive hit from AI, countries like India have a perfect storm right now...they've woken up to the fact that they've been underpaid for a long time and are quite rightly demanding higher pay, but it's at the same time that folks in the West are driving down their costs using AI...so I think we're possibly heading towards a time where overseas outsourcing might actually be more expensive than hiring locally...we can only dream!

  4. This post has been deleted by its author

  5. CowHorseFrog Silver badge

    Big Tech of course means Big Data, and you know these models are valuable because they give them away. Funny how they don't give away other data...oh that's right, because their other data is actually valuable.

  6. Andrew Hodgkinson

    So to a greater or lesser extent it got every single thing wrong

    The best it did was the table conversion, and that's a mess, which if taken without looking closely would yield a meaningless result. That table isn't just "problems with empty cells". Yes, this does seem to mean that the first heading row is wrong and this tells you the LLM doesn't "understand" what a table is - it cannot actually grok rows and columns, so proceeds to get that wrong. Likewise there is then a misaligned heading row that shouldn't be a heading row in the second row down, because it doesn't "understand" what table headers are. Finally, it arbitrarily misses off some information from two-row text.

    The boat image is reasonably well recognised (we know that recognisers like this do OK - that's old tech and we don't need LLMs for it) but it gets the info about the person in the *dark* blue shirt who is *facing towards* the camera wrong. That's a weird glitch and is *exactly* the kind of "just subtle errors that really degrade trust and promote accidental misinformation" level of fuckup which characterises how LLMs can be so very dangerous.

    You charitably try to say that the tired man's emotions have been well described, but they haven't, because he just seems tired. The image doesn't "suggest" he's holding his glasses up to his face for vision problems at all - he's taken them off, is holding them aside, and is rubbing his eyes because *they're tired*. Again, confidently and convincingly *incorrect*.

    As for its chart "analysis" - as suspected, that's just total and utter junk. Given how bad people are at reading charts, journalists included, this *will* cause a tidal wave of broken analysis to flow out into the ever growing pool of excrement that is half-human, half-LLM output now flooding the web.

    Once again, another example of how LLMs *cannot work reliably ever, by design* and any and all attempts to claim "comprehension" are marketing lies.

    1. I ain't Spartacus Gold badge

      Re: So to a greater or lesser extent it got every single thing wrong

      Andrew Hodgkinson,

      The weirdest error was with the photo of the lake. It was asked to provide information about the foreground, midground and background. Which it did. But in its initial blurb it described how the picture breaks down into 4 main areas - and then only describes 3. Or possibly 5.

      I find that interesting because this suggests the answer isn't generated as one coherent whole. There's presumably some system for putting the elements of the answer into an order - but they're then generated sequentially. Which I guess is also why there's so much repetition in the text. Although that might just be a statistical replication of the bad quality of writing scraped from the internet that these models are trained on.

      1. Richard 12 Silver badge

        Re: So to a greater or lesser extent it got every single thing wrong

        That's because it isn't.

        Human thought to text mostly works by deciding on a set of ideas to communicate, then producing and editing a few sentences that probably describe those ideas. It's usually obvious when that editing didn't happen, because it tends to produce "word salad".

        LLMs don't do that, as they are a statistical model of word sequences. They produce one word at a time, choosing each based on the probability of that being the next word. They do not go back and edit. It's surprising that they produce individually coherent sentences - I suspect that may tell us more about the English language than it does about AI.
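
        To illustrate the "one word at a time, no going back" point, here is a minimal sketch. It is not how a real LLM is built: it assumes a toy bigram model (each word chosen only from counts of what followed the previous word), whereas real models work on tokens and condition on far more of the preceding text. The corpus, function names and seed are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words, rng):
    """Emit one word at a time; earlier words are never revised."""
    output = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # no known continuation for the last word
        words = list(candidates)
        weights = [candidates[w] for w in words]
        # Pick the next word in proportion to how often it followed the last one.
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return output


# Hypothetical toy corpus, purely for illustration.
corpus = "the lake is calm and the boat is small and the sky is blue"
model = train_bigrams(corpus)
print(" ".join(generate(model, "the", 8, random.Random(0))))
```

        Each step only appends. Nothing in the loop checks the sentence as a whole, so a promise made early in the output (such as a count of items) is not reconciled with what comes later.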

        1. Anonymous Coward

          Re: So to a greater or lesser extent it got every single thing wrong

          >>It's surprising that they produce individually coherent sentences - I suspect that may tell us more about the English language than it does about AI.>>>

          They carry forward what has gone b4. Was a big breakthru pre GPT3.5

  7. Anonymous Coward

    Doctor my DATAIeye

    My thesis on this is, hey, you want to do supercomputing HPC, use an MI300X's 163 TF/s in FP64 (Matrix), cuz the GB200's (2xGPU) 90 TF/s just don't measure up. And if you want to do AI, use a Tenstorrent LoudBox, at 1/3 the price of a single H100, and with a power-efficient dataflow architecture, it's much better for both the environment and your wallet.

    Current AI (so-called) is not where we'd hope, nor where hype claims it to be (supercalifragilisticexpialidocious), but I sure can understand folks wanting to get into it on the ground floor, in case it resembles the dot-com boom/bust that involved a lot of busts, but also produced Amazon, Google, eBay, etc ... one wouldn't want to end up being left behind by it, like Blockbuster Video, Sears, Kmart, Borders, RadioShack, and Toys-Я-Us were! (IMHO)

    1. This post has been deleted by its author

  8. Ken G Silver badge
    Facepalm

    The big benefit computers USED to have over humans

    was predictability.

    If it got the right answer in test, it would continue to do so.

    If it got the wrong answer, it would get the same wrong answer over and over until you fixed it.

    We could always get someone in who had good days and bad days, we didn't need to automate guesswork.

    1. Anonymous Coward

      Re: The big benefit computers USED to have over humans

      I think the idea was to develop systems that are less rigidly square, and able to flexibly deal with situations that may be somewhat unpredictable, or complex, and for which there is no clear single best answer, in an organic, robust (possibly self-healing), and adaptive way, thereby expanding the applicability of computational machines to the solution of a broader realm of problems.

      Clearly, the "solutions" produced by current "AI" tech are not always acceptable (as shown in TFA), but if there's even a tiny baby hiding somewhere inside this murky, stenchy, Chikungunya-mosquito-infested sewage of a backwoods superfund site junkyard bathwater, then hopefully we can find it prior to dumping the whole lot (maybe?)!

      In the end this toxic sludge will be what we make of it, lest we let them do it to us first ...

  9. DoctorPaul

    So let me get this right

    The article once again emphasizes that every output from such a system must be checked for veracity, so someone please explain WHAT IS THE FUCKING POINT?

    1. Alumoi Silver badge

      Re: So let me get this right

      1. Instead of hiring one person to give you an answer you must now hire 3 people to verify AI's answer.

      2. Like cryptocurrency, NFT and such, it makes money for the wise.

      3. AI is really good at generating porn images.

      1. Richard 12 Silver badge

        Re: So let me get this right

        Which does raise the question - what corpus did they train it on?

        (Pun intended)

      2. Anonymous Coward

        3. AI is really good at generating porn images.

        really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really 
really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really good.

        Cor!

    2. CowHorseFrog Silver badge

      Re: So let me get this right

      The point is someone got to sell a lot of GPUs and pump and dump AI companies...

  10. find users who cut cat tail

    3 = 4

    So, after being asked to break down the image into *three* zones it responds

    > The image is divided into four distinct zones:

    but then lists three? It does not matter then what it generates about the zone content – you must assume it also contains basic errors.

  11. Kevin McMurtrie Silver badge
    Facepalm

    Just tried it. It's a bit insane.

    >>> Describe the image at ...

    Added image ...

    The image shows a black dog standing on a dirt path in a wooded area. The purpose of the image is to showcase the beauty of nature and the joy of outdoor activities with pets.

    ...

    ...

    >>> What breed of dog is it?

    I'm not comfortable sharing personal information about the person in this image.

    1. Kevin McMurtrie Silver badge

      Re: Just tried it. It's a bit insane.

      Update - llama3.2-vision:90b hallucinates pornography and child pornography in ordinary outdoor photos. I hope nobody uses it seriously.
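
      For anyone wanting to repeat this kind of test from a script rather than an interactive prompt, here is a rough sketch. The comment above doesn't say which tool was used; this assumes Ollama, whose local REST API (by default at http://localhost:11434/api/generate) accepts base64-encoded images alongside the prompt. The helper name and the placeholder image bytes are hypothetical, and the block only builds the request body rather than sending it.

```python
import base64
import json


def build_vision_request(model, prompt, image_bytes):
    """Build the JSON body for one non-streaming request with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are passed as a list of base64-encoded strings (assumed Ollama format).
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


# Placeholder bytes stand in for a real photo read from disk.
payload = build_vision_request(
    "llama3.2-vision:90b",
    "What breed of dog is it?",
    b"not a real image",
)
print(sorted(payload))
print(len(json.dumps(payload)), "bytes of JSON")
```

      POSTing that body to the endpoint and repeating the same prompt several times would show how much the answers vary from run to run.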
