Just how deep is Nvidia's CUDA moat really?

Nvidia is facing its stiffest competition in years with new accelerators from Intel and AMD that challenge its best chips on memory capacity, performance, and price. However, it's not enough just to build a competitive part: you also have to have software that can harness all those FLOPS – something Nvidia has spent the better …

  1. Yet Another Anonymous coward Silver badge

    purportedly up to 95 percent

    And as we all know, the first 95% of the port takes 95% of the effort, while the last 5% takes 95% of the effort

  2. mostly average
    Mushroom

    Intel Arc...

    ...Sucks. I have an A750 with "8GB VRAM." I put that in quotes because you can't initialize an array larger than 4GB using ipex or oneAPI. So technically it has 8GB, you just can't use it for compute. Intel, aware of the bug, has stated they have no intention to fix it. Intel Arc is useless for language models. It's almost a passable gaming card, except they refuse to support VR. I'm never buying Intel again. My next computer will have an AMD processor and Nvidia graphics. Icon because buyer's remorse.
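
    For illustration, here is a minimal sketch of the kind of allocation that reportedly trips the limit, assuming the intel_extension_for_pytorch (ipex) XPU backend; exact behaviour will vary with driver and ipex versions:

    import torch
    import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
    # ~6 GiB of float32: well inside the advertised 8 GB of VRAM,
    # but a single allocation over 4 GB is where the complaint above bites.
    n = 6 * 1024**3 // 4
    x = torch.empty(n, dtype=torch.float32, device="xpu")
    print(torch.xpu.get_device_name(0), f"{x.numel() * 4 / 1024**3:.1f} GiB allocated")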

  3. Jonathan Richards 1 Silver badge

    It's a small point...

    ... but since when did we write C++ as C-plus-plus?

    1. Anonymous Coward
      Anonymous Coward

      Re: It's a small point...

      Maybe a literary sociolect of the webomorphic persuasion, as seen in https://visualstudio.microsoft.com/vs/features/cplusplus/, while some concisely prefer CPP (eg. https://www.w3schools.com/cpp/), and yet others lean towards unreadable hexadecimal percentages: https://en.wikipedia.org/wiki/C%2B%2B. A belt and suspenders principle might suggest: C+Plus%2B+Plus%2B! </ahem!>

  4. Anonymous Coward
    Anonymous Coward

    Quite unexpected

    I guess it was particularly hard to foresee the raytracing of this CUDA castle moat by king Nvidia, seeing how he spent most of his time playing video games. And yet, there it is, in all its nearly unassailable medieval glory, mostly safe from catapults, battering rams, ballistas, siege towers, and other telescoping software!

    But this period might not bear long-term remembrance as history lest it be associated also with such social movements as grand fanaticism, inquisition, and related embrace of an equivocal orthodoxy, along with specialized torture methods for the heretic rebel fighters: surprise, fear, ruthless efficiency, and torture by pillow and comfy chair. Didn't expect that?

    Well, nobody expects the Spanish Inquisition either! ;)

    1. MyffyW Silver badge

      Re: Quite unexpected

      I'm not sure what I expected in the comments section, but it wasn't that.

    2. HuBo Silver badge
      Gimp

      Re: Quite unexpected

      Speaking of moats, just saw this quite strikingly cool one (albeit somewhat in ruins) in "Meurtres à Château-Thierry" on French TV. It's actually at the Château de Fère en Tardenois, but not too far from Château-Thierry proper. The site features an awesome covered bridge (as seen on the photo). French Champagne is made to the East of this (eg. Epernay).

  5. John Smith 19 Gold badge
    Unhappy

    Amanfrommars competitor?

    Sounds like it.

    But seriously, if a hardware mfg wants to compete with another hardware mfg, isn't it up to them to make the transition path as smooth as possible? The lower the level that happens at, the wider the market.

    Logically every command could be mapped with macros provided their hardware can somehow carry out the exact same task.

    There are two challenges. The external challenge is to make developers' code "just work", so it's no more of a chore to run on your hardware than on Nvidia's. The internal challenge is to maintain a commitment to improving the port until it gives both 100% compatibility (or at least nothing but a few minor tweaks, the kind a simple script could make) and at least 100% of the speed of a native Nvidia port.

    Time will tell which of its competitors executes this well enough to look to developers like a serious competitor.

    And who knows, maybe prices might start coming down?

    1. MyffyW Silver badge

      Re: Amanfrommars competitor?

      The price of pick-axes and spades will stay high as long as the insane gold rush of LLMs remains in full fever.

      Personally, I used to like amanfrommars' ramblings when I assumed they came from his own strange mind and perhaps a hand-crafted algorithm or two. Now that any old generative engine can spew it out, it's much less entertaining (but I was tickled by the idea of "torture by comfy chair and pillow").

  6. Pascal Monett Silver badge
    Windows

    "people want to write at higher levels of abstraction"

    Well duh, ya think?

    Who wants to write code like

    ASR A

    BCC ASC

    LDA A ACIA+1

    AND A #$7F

    Not me. I prefer writing

    Function getASC(data as String) as Integer

    Dim Char as String

    Char = left(data,1)

    getASC = ASC(Char)

    End Function

    When I look at Assembly code, I have no effin' clue what it is supposed to be doing, whereas even a non-developer can take a guess at what my preferred code is doing.

    1. CowHorseFrog Silver badge

      Re: "people want to write at higher levels of abstraction"

      Some of those 6502 mnemonics are invalid. Not sure why you are inserting those extra "A"s...

      It's LDA not LDA A, there's no point to the second A, as LDA already names the destination.

      Same for AND, they always imply the A when an immediate value is iused.

      1. Lord Elpuss Silver badge

        Re: "people want to write at higher levels of abstraction"

        Used, not iused.

        If you're going to pick apart somebody else's post to point out an irrelevant detail, you could at least learn to write yours correctly.

        1. CowHorseFrog Silver badge

          Re: "people want to write at higher levels of abstraction"

          Oh yes, I'm wrong because the lord of the universe says so... not because you give a sample with corrections.

          1. Lord Elpuss Silver badge

            Re: "people want to write at higher levels of abstraction"

            You’re still wrong. As usual.

            How’s your ratio of up/downvotes? Yep that’s what I thought.

  7. TheLLMLad

    Ultimately CUDA is under assault from three vectors:

    First, enthusiasts and hobbyists who are unable to afford Nvidia hardware with sufficient amounts of VRAM to do anything interesting.

    Second, the manufacturers themselves.

    Third, and perhaps more importantly, all of the hyperscalers are investing heavily in their own accelerators. None of them can tolerate the current status quo; most can't even secure enough Nvidia chips to do what they want.

    I think the long-term prognosis for Nvidia remains dim, not in a "Nvidia will fail" way but in an "it'll go back to being an AMD-sized company and not an Amazon-sized one" way. The moment right now feels a bit like the mid-2010s, when it appeared that Intel had a monopoly on x86 and nobody else could challenge it (eventually AMD would come to save the day, as it were, but Arm also started being taken seriously as a result of that time), but perhaps even more volatile. Nvidia does have a lot of the fab time booked and HBM bought, which will make competition somewhat difficult, but you can already see Google and now Anthropic (via AWS Inferentia) migrating towards custom accelerators and away from GPUs.

    And of course the GPU architecture isn't really purely optimized for matmul; there's a lot to be said for, say, Cerebras' wafer-scale approach (which also handily doesn't rely on HBM).

    1. O'Reg Inalsin

      Isn't Cerebras aiming primarily for the so-called "language output inference" (better named "statistical inference of language output") niche? Similar to another "language output inference" niche company, Groq? So they are not exactly competing directly with Nvidia, because Nvidia hardware can be used for both "training" and "inference".

  8. Draco
    Mushroom

    K-T Boundary?

    The CUDA "moat" is a classic example of the first-mover advantage in an ecosystem.

    It's not just about technical parity but also developer convenience. HIPIFY, SYCL, and ROCm offer alternatives, but often require manual intervention, have compatibility issues, or lag in performance. NVIDIA's strength lies in its relatively unified ecosystem.

    Custom accelerators, however, may challenge Nvidia's moat from outside rather than within.

    Displacing a dominant player is always hard, whether in computing or evolution. But disruption often emerges from an unseen niche that overturns what seemed unshakable. Either that or the equivalent of a monster asteroid strike that shakes things up a bit.

    1. Yet Another Anonymous coward Silver badge

      Re: K-T Boundary?

      I think NVIDIA's moat was more "going all in" than just first mover.

      There was a while when the then-new OpenCL had a lot of interest. Certainly in academic circles it was a lot more popular than CUDA; there was a lot about the language design that was nicer.

      But nobody really committed to the HW. Everyone had an "OpenCL implementation", but they were all slightly non-standard, didn't keep up with new versions, and had terrible tooling. Some <cough>AMD<cough> had terrible unfixed bugs in core routines like FFT, but nobody cared because the market was games or laptops; OpenCL was just a checkbox to meet some purchasing requirement.

      NVIDIA on the other hand bet everything on GPU compute, years before AI, not just HW but the tooling and support around CUDA.

  9. Anonymous Coward
    Anonymous Coward

    Antitrust?

    Especially in the US or the EU, their respective legislative bodies could insist that by x date *all* GPUs operate to an open standard, or at least offer tools to completely convert an open standard to their lower-level code without induced performance penalties.

    1. doublelayer Silver badge

      Re: Antitrust?

      They could try, but it likely wouldn't work. There are already open standards. Someone mentioned OpenCL. That's still around, there's your open standard, and the chips already support it. Every manufacturer can claim compliance on that. It's not their fault that people are not writing for it. Having an open standard doesn't do anything if people choose to write for the closed versions. If you try to forbid the closed versions, then you'll get a lot of complaints from all the people whose code you've just disallowed if they don't just ignore you.
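
      A quick way to see that the open standard is already shipped is simply to enumerate what the machine exposes. A minimal sketch, assuming the pyopencl bindings and at least one vendor's OpenCL driver installed:

      import pyopencl as cl  # Python bindings for the OpenCL standard
      # Every vendor driver registers itself as a "platform"; each platform
      # lists the devices (GPUs, CPUs, accelerators) it can drive.
      for platform in cl.get_platforms():
          for device in platform.get_devices():
              print(platform.name, "->", device.name,
                    f"{device.global_mem_size / 1024**3:.1f} GiB")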

  10. Mostly Simian

    Anyone who says they have no moat...

    ...hasn't tried to get TensorFlow/PyTorch/spaCy/whatever to work GPU-accelerated on Nvidia vs basically anything else. Just the other day I went on AMD's website to find out how to get PyTorch to run GPU-accelerated on AMD. They wanted me to downgrade PyTorch, use their specific branch, patch a bunch of other stuff, boot with a special kernel param, etc. When I did that, it still didn't build. After hitting it with a stick a few times it built, but it didn't actually run GPU-accelerated. This is PyTorch, which is the most important library for ML.

    The exact same thing on Nvidia? Just builds. The latest PyTorch, the standard upstream of everything. You don't have to do anything special: it just builds and runs GPU-accelerated straight away.
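
    By way of illustration, a minimal sanity check, assuming a stock CUDA or ROCm build of PyTorch (the ROCm build deliberately reuses the torch.cuda namespace, so the same script covers both stacks):

    import torch
    # A CUDA wheel sets torch.version.cuda; a ROCm wheel sets torch.version.hip.
    print("CUDA runtime:", torch.version.cuda, "| HIP runtime:", torch.version.hip)
    if torch.cuda.is_available():  # True on Nvidia and on a working ROCm stack
        x = torch.randn(4096, 4096, device="cuda")
        print("GPU matmul OK on", torch.cuda.get_device_name(0), (x @ x).shape)
    else:
        print("No usable GPU backend found")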

    If AMD or Intel or anyone else want to actually get serious, they will stop putting out press releases and do the work that's necessary to get their shit to actually build and work with the two frameworks that actually matter. Until then I'll believe in the Nvidia moat.

    1. John Smith 19 Gold badge
      Unhappy

      "stop putting out press releases and do the work"

      Yes.

      That's about it.

      Time will tell who does this and who just keeps talking about doing it.

  11. rwill2
    Go

    What about Apple M2 vs NVIDIA / AMD

    The M2 chip from Apple boasts superior efficiency in power and thermal management, making it highly suitable for LLMs and general AI applications, with plenty of dev library support. Although Apple's market share in the AI hardware landscape is small compared to NVIDIA's dominant position (maybe less so on servers, though AWS does offer them), it is progressively encroaching on AMD, and plenty of devs use Macs over x64/x86 anyway!
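
    For what it's worth, a minimal sketch of driving the Apple GPU from PyTorch via the Metal Performance Shaders (MPS) backend, assuming a recent PyTorch build; broader library support is a separate question:

    import torch
    # PyTorch exposes Apple Silicon (M1/M2/...) GPUs through the "mps" device.
    if torch.backends.mps.is_available():
        x = torch.randn(2048, 2048, device="mps")
        y = x @ x  # the matmul runs on the M-series GPU
        print("MPS backend active:", y.device)
    else:
        print("MPS backend not available; running on CPU instead")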

    1. John Smith 19 Gold badge
      Coat

      " What about Apple M2"

      That's a techie answer, not a business answer.

      Nvidia gives you vendor lock-in with a supplier.

      M2 gives you vendor lock-in with a competitor.

      And pretty soon you're not their competitor anymore, you're their acquisition.

      People have seen the "Enfold, extend, extinguish" game played plenty of times already.

  12. Dave81

    I don’t get it

    Why don’t these vendors just build a CUDA compiler or PTX assembler for their own GPUs? Is there some sort of IP issue blocking their freedom to do this?

  13. Chz

    Grudging admiration

    As aggravating as Nvidia can be at times, I do have a grudging admiration for their ability to execute their plans over the past 10 years or so. Sort of like Intel was 15 years ago. AMD and Intel can't get into the market not because their products are particularly inferior (though you could argue lack of CUDA makes them so), but because Nvidia hasn't put a foot wrong in a very long time in IT years. Even with a *better* product, when the market leader has 90% of the market it's not enough. They have to make a mistake. And I think one thing that Nvidia has, and the other two don't, that helps them not bugger it up is that they actually know where they're going. CUDA's dominance is the result of a near-20-year campaign to put it where it is. I don't see anything out of AMD that indicates that kind of vision on the GPU front.

    I did mention Intel before to point out that it's not inevitable that Nvidia continues to eat all the pies. Hubris is a thing in IT, and Intel was certainly guilty of it. They still dominate the market of course, but they've shown their weak side and AMD continues to nibble away at their market share. And this is exactly the sort of boost that the ARM-based companies need to get their foot into the lucrative consumer desktop/laptop world. Maybe finally become a general-purpose option in the server room, too. One wrong step out of Nvidia when AMD or Intel are having a good year could cause a seismic shift in the GPU market. Let's see what 2025 brings us.

  14. Anonymous Coward
    Anonymous Coward

    chipStar helps with the source-level HIP/CUDA compatibility problem

    Good article, but it misses chipStar (https://github.com/CHIP-SPV/chipStar), a compilation flow/runtime that can compile HIP and CUDA sources to an open-standard, API-based heterogeneous software stack (OpenCL and SPIR-V). It doesn't support IL/binaries, though, which also means no reverse engineering is required for the compatibility layer. Its usage has increased lately for making various legacy HIP code bases open-standards compatible. The development is currently led by Argonne National Laboratory: https://www.alcf.anl.gov/events/chipstar-hip-implementation-aurora.

  15. CowHorseFrog Silver badge

    I wonder how many bonuses were given to the CUDA design and implementation team?

  16. John Smith 19 Gold badge
    Coat

    If OpenCL is *the* cross platform standard why is it not used?

    I'm guessing because it was started by developers, and all the mfgs would have preferred that developers adopt their (proprietary) choices instead.

    Congratulations HW mfgs. You got your wish.

    Except for most of you it wasn't your proprietary standard that the market chose, was it?

    I'd like to see them attack Nvidia's market dominance through OpenCL but that depends on how big a job it would be to shift the existing code bases to it, and of course how much of that job can be reliably* automated.

    The other direction is to deliver 100% compatibility at the lowest level, so they can inherit the toolchain above that level, and of course maintain that compatibility.

    2025 could be quite interesting.

    *Unreliable automation is a complete f**king waste of time. You're not just re-writing your stuff, you're checking the automation's attempt to rewrite your stuff. This does not save time, unless you cross your fingers and hope your test suite will find all the errors and give you enough info to drill down to them and fix them. If anyone does this (and it works) please let me know. I've never seen it made to work properly, but there's always a first time.

    1. doublelayer Silver badge

      Re: If OpenCL is *the* cross platform standard why is it not used?

      OpenCL had various restrictions, each of them having some downward effect on its usage. However, the reason it didn't end up becoming the most popular was neither that someone deliberately killed it nor that it had gaping technical flaws. We can debate how large each of the effects was, but I think we also need to question why we assume it was likely to be the victor.

      By being a standard, OpenCL tended to run on more things, but not as efficiently on each one. Optimizing for a specific part meant you could run faster, and for something as compute-intensive as training ML models, that was a key benefit. Any other standard is likely to have the same downside. We can try to make a standard that's close to the hardware which would reduce that difference, but it's always likely to exist. That's not necessarily a problem; although compiled languages tended to run slower than hand-coded assembly*, people still chose them for the portability or writeability advantages.

      Your alternative suggestion, a standard at the lowest level, is going to have other restrictions. At that point, you're no longer going to prefer one manufacturer over another, but you also entirely eliminate any ability for a manufacturer to improve anything. If they can't change the instructions their chips understand, they can't do something as simple as allowing you to work on a larger piece of data, thus accomplishing in one instruction what would have taken multiple. Other optimizations would be prohibited. Nvidia and AMD have spent lots of money building those improvements, testing them, making hardware that can do them, and building them into compilers. By requiring a standard like the one you've described, you're effectively telling every user that they will no longer get any of that work in exchange for being able to buy AMD parts. It would be similar to mandating that every CPU manufactured must be an AMD64 one, with alternative approaches like ARM and RISC-V disallowed, but anyone who wanted could try to build an AMD64 one. Do you think users are going to be pleased with that?

      * Nowadays, compilers are so good at optimizing that they often produce better code than someone working in assembly. This doesn't change the point. People can still improve on the compiler's output, and in some cases they can do it quite easily because they understand the semantics of their program's contents whereas the compiler's optimizations are general ones. It is still possible to exceed a compiler's efficiency. You just have to really want to do so and be willing to put in the substantial effort.

      1. John Smith 19 Gold badge
        Coat

        "but you also entirely eliminate any ability for a manufacturer to improve anything."

        I think you're mistaken.

        It would be the other mfgs that would be doing this work. If they added a feature, then the patch would recognise it. It potentially leads to a "virtuous circle" of mfgs leap-frogging each other with tweaks to their instruction sets that would be recognized as making apps on their hardware run faster.

        While this would mean developers needing to update their tools more frequently, it could lead to a more level playing field: mfgs competing on performance, which I think is what we all want.

        1. doublelayer Silver badge

          Re: "but you also entirely eliminate any ability for a manufacturer to improve anything."

          I don't understand how that's supposed to happen. When you say "If they added a feature then the patch would recognise it", what is "the patch"? They can change their implementation in microcode or something like it, thereby improving their implementation of the allowed instructions. There is only so much you can do to improve that. They wouldn't be allowed to have new instructions because those new instructions aren't part of the standard, so if someone compiled for those new instructions, it would only work on that manufacturer's parts. What improvements, other than microcode, do you think they'd be allowed to do with a mandated standard? While we'd see some improvements with microcode, ruthless focus on microcode alone is how we got several serious CPU security bugs, because Intel and AMD wanted more performance out of the same ISA every year and kept looking for more and more hacks to get it until one of those hacks had some nasty side-effects. Even then, they were still adding some things to the ISA which sped up some classes of program, which wouldn't be allowed if they have to stick to a standard.

          Nvidia's moat is not a good thing, but attempting to eliminate it by requiring Nvidia and AMD to build the same chip isn't helping. There are a few negative effects. It seems we're disagreeing about the ability to improve on a mandatory standard, which I assume is due to a miscommunication somewhere. We also have the problem that, if any improvement one makes is immediately available to its competitor, there is less benefit to the manufacturer for improving, so why bother to do so? I think the best way to fight against Nvidia's moat is what AMD is currently doing: making it easier to port things from CUDA to run on theirs as well. They're lagging, not because Nvidia did anything wrong, but because Nvidia built something that AMD didn't bother with and people wanted that thing. AMD can catch up, but they have to go to some effort to do it. I don't think we are helping the users by trying to fill in that moat for AMD. There are some times where that kind of regulation is justified, but I don't think we're there, and I think attempting it nonetheless will be unwelcome to users and less effective than you expect.

  17. Azuric

    Semianalysis would highly disagree, and in its in-depth analysis of AMD...

  18. Rfinner

    Is CUDA still the real Nvidia moat?

    I agree that CUDA was the moat back in the days we called these things GPUs. In those days we felt processing graphics and videos and smaller-scale forms of AI was a lot of data and processing. Now with LLMs, the scale of processing is a new ball game. Nvidia's new moat has everything to do with scale that often saturates Ethernet and melts components: new high-speed memory chips, and joining multiple processors on the same board or even the same silicon. Liquid cooling, DPUs, InfiniBand, and who knows how many trade secrets, patents, and strategic partnerships are needed to get the job done right. Imagine the patents owned by the creators of the robots and automated precision manufacturing required to assemble these things, and the work of sourcing reliable suppliers. Don't forget the engineers who actually have the skills and experience at building something that resembles a supercomputer, and other aspects of HPC. As an example, many of us here are programmers, but few of us are kernel programmers. I think CUDA is still part of the moat, but keep in mind they are building a high-performance fighter plane and not just an engine.
