
PrismML debuts energy-sipping 1-bit LLM in bid to free AI from the cloud

PrismML, an AI venture out of Caltech, has released a 1-bit large language model that outperforms weightier models, with the expectation that it will improve AI efficiency and viability on mobile devices, among other applications. The model, dubbed Bonsai 8B, manages to be small and fast, with modest power demands and …

  1. munnoch Silver badge

    1 bit?

    I'm waiting for the 0 bit model. It'll outperform all the others significantly in terms of resource usage...

    1. Paul Herber Silver badge

      Re: 1 bit?

      Is that signed or unsigned?

      1. David M

        Re: 1 bit?

        > Is that signed or unsigned?

        The article answers this: the weights are +1/-1 with a shared scale (exponent), though it doesn't say how many weights share the same scale, or how big the scale factor is. But one could imagine something like a 32-bit value containing 24 1-bit weights (±1) together with a shared eight-bit exponent. Of course that would limit the available values - each weight would have to be ±2ⁿ, and n would have to be the same for the whole group, but I guess that's good enough for this kind of model.
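        David M's hypothetical layout (which is just a guess, not PrismML's documented format) can be sketched in a few lines: 24 sign bits plus a shared 8-bit exponent packed into one 32-bit word, so every weight in the group is ±2ⁿ for the same n.

```python
# Sketch of the hypothetical layout above (not PrismML's actual format):
# 24 sign bits (bit i set => weight +1, clear => -1) plus one shared
# 8-bit signed exponent in the top byte of a 32-bit word.

def pack_group(signs, exponent):
    """signs: 24 bools (True => +1); exponent: signed int in -128..127."""
    assert len(signs) == 24
    word = 0
    for i, s in enumerate(signs):
        if s:
            word |= 1 << i
    word |= (exponent & 0xFF) << 24   # shared exponent in the top byte
    return word

def unpack_group(word):
    exponent = (word >> 24) & 0xFF
    if exponent >= 128:               # sign-extend the 8-bit exponent
        exponent -= 256
    scale = 2.0 ** exponent           # every weight is +/- 2^exponent
    return [scale if (word >> i) & 1 else -scale for i in range(24)]
```

        As the comment notes, the cost of this density is granularity: all 24 weights share one power-of-two scale.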

        1. Anonymous Coward
          Anonymous Coward

          Re: 1 bit?

          Whooosh!

          >>> I'm waiting for the 0 bit model

          >> Is that signed or unsigned?

          1. NoneSuch Silver badge
            Devil

            Re: 1 bit?

            1-bit LLM.

            'Computer says no...'

        2. Anonymous Coward
          Anonymous Coward

          Re: 1 bit?

          Yeah, the gory details are in the "white paper [PDF]" (TFA link), where their Q1_0_g128 1-bit format is described as storing "one sign bit per weight and one shared FP16 scale for each group of 128 weights". They also note that "1-bit Bonsai 8B is built from [Alibaba] Qwen3-8B".
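          For the curious, here's roughly what dequantizing one such group could look like. The paper's description fixes the contents (one sign bit per weight, one FP16 scale per 128 weights) but the exact byte layout below is my guess, not taken from the paper:

```python
# Hypothetical on-disk layout for one Q1_0_g128 group: 16 bytes of sign
# bits (128 weights) followed by one little-endian FP16 scale = 18 bytes,
# i.e. 1.125 bits per weight. Layout is illustrative, not from the paper.
import struct

GROUP = 128  # weights per group

def dequantize_group(blob):
    """blob: 18 bytes = 16 bytes of sign bits + one FP16 scale."""
    sign_bytes = blob[:16]
    scale = struct.unpack('<e', blob[16:18])[0]   # 'e' = half precision
    out = []
    for byte in sign_bytes:
        for bit in range(8):
            out.append(scale if (byte >> bit) & 1 else -scale)
    return out  # 128 floats, each +scale or -scale
```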

          Interestingly, the "1-bit Hardware" section of the "model, dubbed Bonsai 8B" (TFA link) notes that the reported performance "gains come primarily from the reduced memory footprint of 1-bit models, not yet from fully exploiting the 1-bit structure of the weights during inference", and, wrt future hardware, "1-bit weights make it possible to perform inference with little or no multiplication, replacing much of the computation with simple additions" -- which should be a great thing in this specialized space.
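          The "little or no multiplication" point is easy to see in a toy dot product: with ±1 weights you only add or subtract each activation, with a single multiply by the shared scale per group (illustrative only, not the paper's kernel):

```python
# Toy illustration: a dot product against +/-1 weights needs no
# per-weight multiplies -- just sign-controlled adds, then one
# multiply by the group's shared scale at the end.

def dot_1bit(activations, signs, scale):
    """signs: True => weight +1, False => weight -1; one shared scale."""
    acc = 0.0
    for a, s in zip(activations, signs):
        acc += a if s else -a        # add/subtract only
    return scale * acc               # one multiply for the whole group
```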

          Their high density regime slims down hefty models of languagerie without effective weight loss, which is nice. But they obviously don't answer the age-old question of what kind of actually useful stuff (rather than fashionable hoopla) these (now fat-free) talkative portly models are good for, especially if they don't sport the latest in antigravity cleavage-enhancing harnesses and doomsday YOLO claws. I mean, couldn't the procedural skills cantilevered by such girdles be just as well showcased without a corpulent underlying model in the first place (whether rotund or skim)?! And if not, why not?

          1. bombastic bob Silver badge
            Devil

            Re: 1 bit?

            time to get out my slide rule and make sure the "shared factor" is relatively logarithmic to expand the range [if not already]. Power of 2 as an exponent might also be a good addition, i.e. one additional byte for 2^±127.

            A 16 or 32-bit to 8-bit "log" lookup could be pretty fast. I assume they're not doing something like this already...

            /me has done something *like* this before in a microcontroller experimental project...
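            For anyone who hasn't played with the trick bob is describing: a log lookup turns multiplies into table lookups plus adds, at the cost of precision. A toy version of a 16-bit-to-8-bit log table (purely illustrative, nothing from the article):

```python
# Toy log-lookup multiply: precompute an 8-bit fixed-point log2 for
# every 16-bit input (value = log2(x) * 8), then multiply two numbers
# by adding their table entries and exponentiating back. Approximate.
import math

LOG_TABLE = [0] + [round(math.log2(x) * 8) for x in range(1, 1 << 16)]

def approx_mul(a, b):
    """Approximate product of two positive 16-bit ints via log add."""
    log_sum = LOG_TABLE[a] + LOG_TABLE[b]   # log2(a) + log2(b) = log2(a*b)
    return round(2 ** (log_sum / 8))
```

            Exact powers of two come out exact; everything else lands within a few percent, which is in the same spirit as the "good enough for this kind of model" point above.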

      2. PM.

        Re: 1 bit?

        Yes

    2. O'Reg Inalsin Silver badge

      Sorry - that's already been patented

      Empty Mind 無心 The experience of an instantaneous severing of thought that occurs in the course of a thoroughgoing pursuit of a Buddhist meditative exercise.

      But then, so was "what is the sound of one bit flipping", but here we are.

  2. brep51r
    Linux

    everyone wins when slop machines get cheaper

    this is the logical end point isn’t it? soon enough, improvements at the cutting edge of the slopularity will matter less for the average slop monkey, and super-optimised, edge-device models like these will receive filtered-down improvements from previous generations. plus, on-device offers inherent efficiency and privacy advantages. that ought to spell trouble for the funding case for hyperscalers.

    i can imagine AI bros insistently sticking to bleeding edge SaaS LLMs but the moment on device becomes good enough for everyone, the big firms will have lost the market.

    1. Jeroen Braamhaar
      Thumb Up

      Re: everyone wins when slop machines get cheaper

      You can draw the parallels to manufacturing:

      Early manufacturing had huge machines with low efficiency and limited usability.

      Over time, machines got better, more versatile ... and then the CNC revolution happened.

      Right now you can already get CNC machines at the home level (micro mill/lathe), to say nothing of how 3D printing has revolutionized DIY.

      It's likely that eventually this will happen to LLMs as well.

      Will it be a better world? I have no clue. I can't (and won't) judge -- but it'll be interesting to watch.

    2. bombastic bob Silver badge
      Big Brother

      Re: everyone wins when slop machines get cheaper

      I just want local-only speech recognition and natural language processing so everything I say isn't run through some cloud server...

      1. brep51r

        Re: everyone wins when slop machines get cheaper

        unless i’m quite mistaken about this, we are already at the point where this is possible using newer generation special purpose distilled models. there’s a raft of apps that just wrap a gui and some workflow integrations around local speech-to-text.

  3. An_Old_Dog Silver badge

    LLM Performance Measures

    All the specs I've seen about LLMs to date rate their performance in tokens generated per second.

    These are not tied to "correct and useful output".

    Have better LLM benchmarks been invented yet?

    1. Korev Silver badge
      Flame

      Re: LLM Performance Measures

      Still better than capacity being measured in GW, i.e. how much damage to the planet their datacentre is doing

  4. Bitsminer

    Benchmarks

    We define intelligence density as the negative of the log of ....

    Passenger: Will I arrive alive at my destination?

    Airline: We have the least fatalities per million miles flown...

    1. This post has been deleted by its author

  5. T. F. M. Reader

    Apples, oranges, avocado....

    I asked myself what the ... "intelligence density" was when I first encountered it in the article. I decided to read on. "[A] metric that shows its models in a good light" provided a good answer.

    [Speaking of metrics, tokens per second seems to belong to a disjoint category, especially as pricing tends to be measured in dollars per token or similar nowadays.]

    I am happy to note that the model will fit into the RAM of my oldest still working computer (and it is old). I'd give it a spin, but will it give me a 1-bit answer (yes/no) where appropriate?

  6. T. F. M. Reader

    Intelligence density

    Is it intelligent or dense? A 1-bit answer will do.

  7. Groo The Wanderer - A Canuck Silver badge

    I'll have to see if I can find their paper anywhere; this is an interesting idea, especially if it performs as well as they claim. In particular, I'm wondering whether it would make it feasible to run one of the currently-huge LLMs that take up around 100GB of VRAM sanely on a 12GB consumer card. If they could accomplish that, it would yank the rug out from under the big model companies and stop the likes of OpenAI, Microsoft, and others from hoovering up all your data, models, and architectural approaches to your project, by letting you do everything locally. That leak of Anthropic code should be a real eye opener to everyone as to just how insanely greedy those "businesses" are.

    1. bombastic bob Silver badge
      Alien

      you cannot blame them for wanting to earn back what it costs to develop the tech...

      However the PC revolution is a history of a pendulum swing between 'heavy client / light server' and 'heavy server / light client'. TODAY the AI is "in the cloud". When it's "on the LAN" or "on the PC/phone" we will have those fully autonomous robots and devices that understand natural language [even with accents] that you see in sci-fi. C3PO could be your next appliance.

      1. Groo The Wanderer - A Canuck Silver badge

        LLMs are useful tools but they are not and never will be "intelligent." It's a dead end approach to AGI - they're only even talking AGI as part of the ever popular "pump the stock" game for those who don't actually understand the technology vs. what is actually going to be required.
