Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card. In a recent post, the Estonian GPU cloud startup demonstrated how a single Nvidia RTX 3090, …

  1. Gene Cash Silver badge

    GTX 1060 6GB

    I didn't know benchmark software could snicker. I guess we do have AI after all...

  2. Michael Hoffmann Silver badge
    Meh

    Old?

    <gently pets his 3090 that still does anything a 4080 can do now>

    There, there, the article writer is just a big meanie. You're not old; you can still keep up with the kids just fine! I'll still put all the settings on Max/Ultra and you'll chug along in 4K at around 60fps.

    1. katrinab Silver badge
      Windows

      Re: Old?

      On the CPU side of things, an i7-3770 is still a very capable chip, and that is about 12 years old now.

      1. druck Silver badge

        Re: Old?

        I had an i7-4770 desktop before they started making us use laptops; best box I ever had, could run all day flat out, no throttling, no fans going mad.

    2. Anonymous Coward
      Anonymous Coward

      Re: Old?

      We have a TITAN Xp running our LLM in-house. Not a powerhouse by any means, but with 12GB of VRAM it works just fine.

  3. mark l 2 Silver badge

    Didn't someone demonstrate a few months ago that you can even get an LLM running on the APU inside a Ryzen chip if you give it enough VRAM?

  4. HuBo Silver badge
    Go

    Angle on protracting the Arc?

    The Arc A770, which some folks seem to have at hand for much-enjoyed Hands On-type AI shenanigans (tutorials), looks to have perf roughly comparable to the 3090 (say 39 vs 36 TFLOPS in FP16) ... and maybe a slightly lower price point. Inquiring minds might relish seeing an upcoming "tuto"/PoC where Llama 3.1-8B is run at 1-10 concurrent requests on this Arc (and compared to the Estonian plot for RTX 3090 world domination) imho.
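
    (For anyone who fancies reproducing that plot at home, here's a rough sketch of how 1-10 concurrent requests could be thrown at vLLM's OpenAI-compatible server. The URL, model id and prompt are assumptions for illustration, not Backprop's actual harness:)

      # Rough sketch, not Backprop's harness: fire N concurrent completions at a
      # vLLM OpenAI-compatible endpoint and report aggregate tokens/sec.
      # URL, model id and prompt below are assumptions for illustration.
      import asyncio
      import time

      import httpx

      URL = "http://localhost:8000/v1/completions"    # vLLM's default serve port
      MODEL = "meta-llama/Llama-3.1-8B-Instruct"      # assumed model id

      async def one_request(client: httpx.AsyncClient) -> int:
          resp = await client.post(URL, timeout=120.0, json={
              "model": MODEL,
              "prompt": "Explain continuous batching in one paragraph.",
              "max_tokens": 128,
          })
          resp.raise_for_status()
          # The OpenAI-style response reports how many tokens were generated.
          return resp.json()["usage"]["completion_tokens"]

      async def bench(concurrency: int) -> None:
          async with httpx.AsyncClient() as client:
              start = time.perf_counter()
              tokens = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
              elapsed = time.perf_counter() - start
              print(f"{concurrency:>2} concurrent: {sum(tokens) / elapsed:.1f} tok/s aggregate")

      if __name__ == "__main__":
          for n in (1, 5, 10):
              asyncio.run(bench(n))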

    1. HuBo Silver badge
      Holmes

      Re: Angle on protracting the Arc?

      I guess part of the secret sauce here may be vLLM's use of continuous batching (up to 23x throughput improvement) ... a technique that likely inspired Nvidia's H100 in-flight batching (if I read that right).
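
      (A toy illustration of the idea, with made-up names rather than vLLM's internals: sequences leave the batch the moment they finish, so queued requests can join mid-stream instead of waiting for the slowest member of a fixed batch:)

        # Toy model of continuous batching; names are illustrative, not vLLM's internals.
        from collections import deque

        def decode_steps(requests, max_batch=4):
            """Count decode steps when finished sequences free their slot immediately."""
            queue = deque(requests)   # (request_id, tokens_to_generate)
            active = {}               # request_id -> tokens still to generate
            steps = 0
            while queue or active:
                # Admit queued requests into any free batch slots.
                while queue and len(active) < max_batch:
                    rid, need = queue.popleft()
                    active[rid] = need
                # One decode step: every active sequence emits one token.
                for rid in list(active):
                    active[rid] -= 1
                    if active[rid] == 0:
                        del active[rid]   # slot freed mid-stream
                steps += 1
            return steps

        jobs = [("a", 3), ("b", 10), ("c", 2), ("d", 5), ("e", 4)]
        print(decode_steps(jobs))   # 10 steps; a static batch of the same jobs needs 14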

  5. John Smith 19 Gold badge
    Coat

    Soooo.

    Size the app to keep it all in main memory.

    As good a piece of advice today as it was when designing for a Cray 1.

    And let's just take a moment to consider that 142 terra FLOPS (albeit at 16-bit precision) is not even SoA in 2024.

    My instinct is that the real trick is devising tools that can analyse numeric source code and map it onto the most precision-preserving representations.

    For example, pi is approximated by 22/7 (relative error 4x10^-4), but the lesser-known 355/113 (relative error 8x10^-8) is roughly 5,000x more accurate for a couple of extra digits top and bottom. Which as a layman I think is pretty impressive, but the real trick is an algorithm that can be applied to any calculation to find those approximations.
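
    (For the curious, that algorithm exists and is classic: the continued fraction expansion, whose convergents are precisely these best rational approximations. A minimal sketch using the standard textbook recurrence, nothing from the article:)

      # Best rational approximations via continued fraction convergents, using the
      # textbook recurrence h_n = a_n*h_{n-1} + h_{n-2}, k_n = a_n*k_{n-1} + k_{n-2}.
      from fractions import Fraction
      from math import pi

      def convergents(x: float, n: int):
          """Yield the first n continued-fraction convergents of x
          (assumes x stays irrational enough for n expansion steps)."""
          a = int(x)
          h0, h1 = 1, a        # numerator recurrence seeds
          k0, k1 = 0, 1        # denominator recurrence seeds
          frac = x - a
          yield Fraction(h1, k1)
          for _ in range(n - 1):
              frac = 1.0 / frac
              a = int(frac)
              frac -= a
              h0, h1 = h1, a * h1 + h0
              k0, k1 = k1, a * k1 + k0
              yield Fraction(h1, k1)

      for c in convergents(pi, 5):
          print(c, f"relative error {abs(pi - c) / pi:.1e}")
      # 3, then 22/7 (4.0e-04), 333/106 (2.6e-05), 355/113 (8.5e-08), 103993/33102 ...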

    I'll leave it there.

    1. Michael Hoffmann Silver badge
      Coat

      Re: Soooo.

      "142 terra FLOPS :

      Yes, but some of us prefer our computers a little less... earthy.

      1. David 132 Silver badge
        Coat

        Re: Soooo.

        For my computing, I use a timing signal from the moon. It works for me, but call it lunar-tick if you will…

        1. HuBo Silver badge
          Alien

          Re: Soooo.

          Them Lunar-ticks are crazy accurate for this, just a mad bin and loon away from the obsessive-compulsive Mars-Eniac standard, both major improvements over the torturously thin Neptune tood-le, pegged through Uranus with super-positry roulette timing (or so I'm told ... not an expert).

          It's key tech to benchmark computational astronaut jobs ... amazing that these flops work at all ... can't wait for the coming of age of the Zitty-scale (after the earthy terra-, yummy pita-, and 6-sided hexa-scales)!
