Where does Microsoft's NPU obsession leave Nvidia's AI PC ambitions?

Nvidia is the uncontested champion of AI infrastructure — at least in the datacenter. In the emerging field of AI PCs, things aren't so clear cut. In early 2024, it became plain that, for better or worse, the future of Windows would be imbued with AI-augmented features and experiences. Headline features included live captions …

  1. harrys Bronze badge

    hey clayton u dork....

    Dont u know..... lifes a zero sum game, u got kids?

    ""For stuff that simply doesn't fit on a PC, you run those on GPUs in the cloud, where you have effectively unlimited performance," Clayton added."

    1. m4r35n357 Silver badge

      Re: hey clayton u dork....

      Get scalped on the useless hardware, then again in the cloud.

      Get in line, suckers!

    2. Korev Silver badge
      Terminator

      Re: hey clayton u dork....

      ""For stuff that simply doesn't fit on a PC, you run those on GPUs in the cloud, where you have effectively unlimited performance," Clayton added."

      Just chuck US$200 a month at OpenAI...

  2. Richard 12 Silver badge

    So what's it do at INT8?

    If the spec is INT8, it doesn't matter if it can do more at FP4.

    It doesn't matter if I can cut your grass really quickly, if you want the hedge trimming, you won't hire me.

    Waving the lawnmower in the air is rather inefficient. And somewhat dangerous.

    Although all this AI running around with giant scissors is also somewhat unwise, no matter what it's doing.

    1. Anonymous Coward
      Anonymous Coward

      Re: So what's it do at INT8?

      For an RTX 5090 GPU, about 3,400 TOPS at INT8 according to Tom's Hardware - https://www.tomshardware.com/pc-components/gpus/nvidia-announces-rtx-50-series-at-up-to-usd1-999

      So roughly x85 an integrated NPU
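      For the arithmetic behind that multiplier, here's a quick back-of-the-envelope check (a trivial sketch; it just divides the quoted 3,400 TOPS figure by the 40 TOPS Copilot+ NPU baseline mentioned elsewhere in the thread):

      ```python
      # Quoted figures only, not benchmark results.
      rtx_5090_int8_tops = 3400  # per the Tom's Hardware link above
      copilot_npu_tops = 40      # Microsoft's Copilot+ NPU requirement

      print(f"~{rtx_5090_int8_tops / copilot_npu_tops:.0f}x an integrated NPU")  # ~85x
      ```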

      1. Korev Silver badge
        Coat

        Re: So what's it do at INT8?

        > So roughly x85 an integrated NPU

        X86 surely...

  3. klh

    > We don't expect it'll be long before beefier NPUs make their way into desktop silicon.

    Please let the bubble burst before this waste of silicon infects desktop chips. If anything, NPUs should be add-on cards, that's kind of the whole point of a desktop.

    1. Anonymous Coward
      Anonymous Coward

      Didn't you get the memo?

      Nobody is supposed to have upgradable hardware anymore. You're supposed to landfill the whole thing every two years so a billionaire can have bigger numbers than some other billionaire.

  4. Anonymous Coward
    Anonymous Coward

    Isn't it just a driver thing?

    Can't they implement the DirectML API in the driver and have the GPU do the NPU stuff, but better?

    1. cyberdemon Silver badge

      Re: Isn't it just a driver thing?

      Err sure, but why would nvidia bother writing a driver for a competing framework?

      At the moment, serious AI bollocks uses CUDA, and nvidia like it that way

      If DirectML were to become a thing, then Microsoft, Intel and AMD would be happy, but not nvidia

  5. hoola Silver badge

    What is the point?

    I simply struggle to understand what all this hardware is going to do and the value it is going to add.

    The only beneficiaries are the chip, memory, SSD & PC suppliers. It is yet another bundle of built-in obsolescence and forced upgrades.

    1. m4r35n357 Silver badge

      Re: What is the point?

      Struggling is a waste of effort.

      Might as well put our feet up and watch an unusually honest demonstration of the _true_ meaning of "supply and demand".

      1. abufrejoval

        Re: What is the point?

        Cooling!

        It's called dark silicon: it's needed to keep temperatures within manageable limits by leaving cells only partially filled with active transistors, or entirely empty next to noisy neighbours.

        Except now they're giving that wasteland a fancy name and selling it at a premium.

    2. AnAnonymousCanuck

      Re: What is the point?

      As to need

      My main house computer is a 10-year-old fanless Intel i3 running Kalliope, a voice assistant. This attaches to all my media/email/web as well as a Home Assistant server for all the IoT hardware. All 100% open source.

      Both my TTS (text-to-speech) and STT (guess:) are now offering ONNX or tflite enhanced models. The accuracy is double that of the old matching engine. However, it takes 2 seconds for a response vs 0.4 for the old engine, which makes it currently unusable: I am CPU bound. :(

      Furthermore, I am dependent on one of the big corporates for my speech recognition; there have not really been functional local solutions. The ability to run LLMs and pattern recognition processes locally is vital, as they are my only processes dependent on the Cloud. Not only that, I currently have a very restricted list of words/phrases for orders. The ability to have an LLM handle verbal input will make pattern matching much easier. Speech output also improves immensely.
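      For a sense of how one might measure that CPU-bound latency before buying new hardware, here is a rough, hypothetical timing sketch; it assumes onnxruntime is installed and uses a placeholder model name and input shape rather than any particular STT engine:

      ```python
      import time
      import numpy as np
      import onnxruntime as ort

      # Load a local speech model on the CPU only, to see what the old i3 manages.
      # "stt.onnx" and the 16 kHz / 5-second dummy waveform are illustrative placeholders.
      session = ort.InferenceSession("stt.onnx", providers=["CPUExecutionProvider"])
      inp = session.get_inputs()[0]
      audio = np.zeros((1, 16000 * 5), dtype=np.float32)  # 5 s of silence at 16 kHz

      session.run(None, {inp.name: audio})  # warm-up run
      runs = 10
      t0 = time.perf_counter()
      for _ in range(runs):
          session.run(None, {inp.name: audio})
      print(f"mean latency: {(time.perf_counter() - t0) / runs:.2f} s")
      ```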

      I am looking at renovating one of my towers and have been researching what the motherboard should look like. The first iteration looks like a mass-produced NPU solution; then, when Nvidia prices collapse, I'll get a top-notch card. I have been following the "Build Your Own AI" series here on The Reg, and this article fills in some of the mid-level hardware options.

      YMMV

      AAC

  6. williamyf Bronze badge

    DirectML uses the GPU already, and WinML can use GPU, CPU or NPU.

    As per Microsoft's documentation, DirectML can use any DirectX 12 video card for AI acceleration: «DirectML is a low-level hardware abstraction layer that enables you to run machine learning workloads on any DirectX 12 compatible GPU.»

    https://learn.microsoft.com/en-us/windows/ai/directml/dml

    DirectML is the low-level API. The ONNX Runtime and the WinML API are the high-level APIs/abstractions and can use DirectML as their backend. As a matter of fact, WinML can run on the CPU, GPU or NPU; the developer can specify where, or can let the OS decide...

    «What does WinML run on by default?

    If you don't specify a device to run on with LearningModelDeviceKind, or if you use LearningModelDeviceKind.Default, the system will decide which device will evaluate the model. This is usually the CPU. To make WinML run on the GPU, specify one of the following values when creating the LearningModelDevice:

    LearningModelDeviceKind.DirectX

    LearningModelDeviceKind.DirectXHighPerformance

    LearningModelDeviceKind.DirectXMinPower

    »

    https://learn.microsoft.com/en-us/windows/ai/windows-ml/faq#what-does-winml-run-on-by-default-
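    To make the "developer can specify where" point concrete, here is a minimal sketch using the ONNX Runtime with DirectML as its backend, which is one of the paths described above. It assumes the onnxruntime-directml package is installed, and "model.onnx" plus the float32 dummy input are illustrative placeholders:

    ```python
    import numpy as np
    import onnxruntime as ort

    # Provider order expresses preference: try the DirectML execution provider
    # (any DirectX 12 GPU), then fall back to the CPU.
    session = ort.InferenceSession(
        "model.onnx",
        providers=["DmlExecutionProvider", "CPUExecutionProvider"],
    )
    print("Running on:", session.get_providers())  # shows which backend was picked

    # Feed a dummy input matching the model's first declared input (assumed float32).
    inp = session.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # resolve dynamic dims
    outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
    print("Output shapes:", [o.shape for o in outputs])
    ```

    WinML exposes the same kind of choice through LearningModelDeviceKind, as quoted in the FAQ excerpt above.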

    Running the ML models on an NPU has a couple of drawbacks:

    1.) You are thermally/power constrained (i.e. watts and thermal headroom spent on the NPU are watts and headroom not available to the CPU).

    2.) The ML memory traffic will contend with the CPU memory traffic and the iGPU memory traffic, causing memory-bandwidth bottlenecks.

    Having said that, there are advantages to running the models on an NPU that uses the machine's main memory controller:

    1.) On most desktops and some laptops, memory is expandable (DIMM, SO-DIMM and CAMM2). It's been ~20 years since I last saw a GPU with upgradeable memory. Therefore, if your AI model(s) require more memory down the line, you can expand. The alternative is to overspec your GPU at the beginning (paying for something you will not use for the first few years) or to replace the (costly) GPU card in your desktop.

    2.) The mechanisms for swapping from main memory to block storage are standardized and have been well understood on the CPU side for decades. On the GPU side, not so much; to give one example, DirectStorage only dates back to 2022. This results in more overhead when swapping the AI model(s) to and from GPU memory.

    3.) When the AI models are not in use, the extra memory benefits more activities if it sits on the CPU side than if it sits on the GPU (unless the machine is used for GPU-intensive activities and almost nothing else).

    4.) If your laptop has neither an MXM card nor a replaceable GPU module (à la Framework 16), what do you do when you need more VRAM for the AI models? Yes, replace the whole laptop.

    Hence, Microsoft's insistence on an NPU.

  7. mark l 2 Silver badge

    I feel like Copilot AI PCs will be like every company's obsession with pushing 3D TVs in the early 2010s: people bought one, watched a few 3D movies, then realised it was just a gimmick, got bored, and went back to using it as a regular TV.

    The same will probably happen with these Copilot PCs: people will get a new computer and try out a few of the AI features to generate a picture in MS Paint or create a document from a prompt. They'll then realise it's a bit of a gimmick and go back to using it as a normal computer, and the NPU will sit there hardly getting any use day to day.

    And eventually manufacturers will stop including NPUs on the CPU die to save money, and Microsoft will quietly retire the Copilot PC branding.

    1. cyberdemon Silver badge
      Big Brother

      Don't worry

      The NPU will be busy building models of your screen activity for Recall

  8. Anonymous Coward
    Anonymous Coward

    The main difference is that customers might want Nvidia's offering, while nobody wants Microsoft's latest pig slop.

  9. DS999 Silver badge

    Microsoft doesn't want to count GPU/CPU cycles

    Because if they did a whole lot of existing PCs would meet the 40 TOPS threshold and wouldn't justify the purchase of a new PC with the "AI PC" label slapped on it.

  10. rwill2

    Will Microsoft support Copilot on M2 Macs?

    That would be funny on top of Copilot support for Windows GPUs, but yes, I think Nvidia is right to back developers over AI PC features no one wants (and which could be done server-side).

    1. williamyf Bronze badge

      Re: Will Microsoft support Copilot on M2 Macs?

      Will Microsoft support Copilot on M2 Macs?

      TL;DR: YES

      Since all versions of Windows (including Win11 on Arm) can run virtualized (and this is an official use case), it is up to the VMM to present a virtual NPU (just as VMMs present virtual TPMs nowadays), using whatever resources the underlying hardware has.

      In the specific case of Windows on an M2 Mac (or Mx, for that matter), both Apple and Microsoft have officially stated that the way to run Win11 on Arm Macs is via virtualization. And ALL Arm Macs have a Neural Engine, which is, for all intents and purposes, an NPU. As long as the virtual NPU is performant enough and recognized by Win11, there should be no problem, even if, on underpowered machines, the VMM is, behind the scenes, pooling performance from the CPU, GPU and NPU to achieve said performance...

    2. DS999 Silver badge

      No they won't

      Apple's NPU, at least what they've shipped in the last couple revs, is 38 TOPS, so it would just miss the cut. I kind of laughed when I saw their arbitrary cutoff when they first announced it and wondered if that was setting up some subtle dig they could make at Macs not being a real "AI PC".

  11. IGotOut Silver badge

    I, just like 99.99999999% of people....

    ...just don't give a shit about AI other than, "Will I lose my job?"

    Other than the metaverse, I've never seen such collective indifference from not just ordinary people, but also the tech community.

    The only people who give a shit are those getting even more obscenely wealthy, the journalists who lap up, without question, the bullshit those assholes spew out, and the public that just believes the slop those spineless journalists pimp out.

  12. Evaluator

    The most important part of building an AI PC with a Microsoft Copilot+ product is to overwrite the flash with Ubuntu.

  13. DrXym

    My laptop has an NPU

    And an annoying Copilot button eating up space on the keyboard. Allegedly this gives me benefits, but I fail to see what they are even supposed to be. Even Microsoft's Copilot+ website struggles to justify why anyone should care, citing features like video brightening and audio transcription that a CPU or GPU could handle, assuming someone wanted either of those things. So I've hidden/disabled Copilot as much as possible in the OS. It seems like a solution in search of a problem, and probably has all kinds of hideous data scraping going on with it too.

  14. Anonymous Coward
    Anonymous Coward

    Isn't the answer simple? Looking ahead, an NPU is a relatively cheap, relatively power-efficient piece of tech built specifically for AI... whereas a GPU is not purpose-built for AI, just brute-forces the task, and is therefore not very efficient.

    Add-in NPU cards for desktops/laptops are quite expensive right now, but I suspect that with scale the prices will come down rapidly and high-performance NPU solutions will become more common than high-end GPUs... I can't foresee GPU prices coming down any time soon.
