Hey Clayton, u dork....
Don't u know... life's a zero-sum game. U got kids?
""For stuff that simply doesn't fit on a PC, you run those on GPUs in the cloud, where you have effectively unlimited performance," Clayton added."
Nvidia is the uncontested champion of AI infrastructure — at least in the datacenter. In the emerging field of AI PCs, things aren't so clear cut. In early 2024, it became plain that, for better or worse, the future of Windows would be imbued with AI-augmented features and experiences. Headline features included live captions …
""For stuff that simply doesn't fit on a PC, you run those on GPUs in the cloud, where you have effectively unlimited performance," Clayton added."
Just chuck US$200 a month at OpenAI...
If the spec is INT8, it doesn't matter if it can do more at FP4.
It doesn't matter if I can cut your grass really quickly; if you want the hedge trimming, you won't hire me.
Waving the lawnmower in the air is rather inefficient. And somewhat dangerous.
Although all this AI running around with giant scissors is also somewhat unwise, no matter what it's doing.
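To put numbers on that, here's a toy Python sketch (my own illustration, nothing from the article) that rescales a vendor TOPS claim to the precision a spec demands, under the rough assumption that throughput doubles each time the data type halves in width; real silicon rarely scales this neatly:

    # Normalise a TOPS figure quoted at one precision to another,
    # assuming throughput scales inversely with data-type width.
    WIDTH_BITS = {"fp16": 16, "int8": 8, "fp8": 8, "fp4": 4, "int4": 4}

    def tops_at(claimed_tops: float, claimed_dtype: str, target_dtype: str) -> float:
        return claimed_tops * WIDTH_BITS[claimed_dtype] / WIDTH_BITS[target_dtype]

    # A hypothetical part advertising 48 TOPS at FP4 comes out at only
    # 24 TOPS against an INT8 spec -- short of a 40 TOPS INT8 bar.
    print(tops_at(48, "fp4", "int8"))  # 24.0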
Cooling!
It's called dark silicon, and it's required to keep temperatures within manageable limits by leaving cells only partially filled with active transistors, or entirely empty next to noisy neighbours.
Except now they're giving that wasteland a fancy name and selling it at a premium.
As to need
My main house computer is a 10-year-old fanless Intel i3 running Kalliope, a voice assistant. It attaches to all my media/email/web as well as a Home Assistant server for all the IoT hardware. All 100% open source.
Both my TTS (text-to-speech) and STT (guess :) engines are now offering ONNX- or tflite-enhanced models. The accuracy is double that of the old matching engine. However, it takes 2 seconds for a response versus 0.4 for the old engine, which makes it currently unusable. I am CPU bound. :( Furthermore, I am dependent on one of the big corporates for my speech recognition; there have not really been functional local solutions. The ability to run LLMs and pattern-recognition processes locally is vital: they are my only processes dependent on the cloud. Not only that, I currently have a very restricted list of words/phrases for orders. The ability to have an LLM handle verbal input will make pattern matching much easier. Speech output also improves immensely.
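For what it's worth, here's a minimal sketch of how I check that kind of CPU-bound latency with onnxruntime; the model filename, the float32 dtype and the dummy input are hypothetical stand-ins, not Kalliope's actual pipeline:

    import time
    import numpy as np
    import onnxruntime as ort

    # "stt_model.onnx" is a placeholder for whatever STT model you run.
    sess = ort.InferenceSession("stt_model.onnx",
                                providers=["CPUExecutionProvider"])

    inp = sess.get_inputs()[0]
    # Replace dynamic dimensions with 1 and feed a dummy tensor.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dummy = np.random.rand(*shape).astype(np.float32)

    start = time.perf_counter()
    sess.run(None, {inp.name: dummy})
    print(f"inference took {time.perf_counter() - start:.2f}s")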
I am looking at renovating one of my towers and have been researching what the motherboard should look like. The first iteration looks like a mass-produced NPU solution; then, when Nvidia prices collapse, get a top-notch card. I have been following the "Build Your Own AI" series here on The Reg, and this article fills in some of the mid-level hardware options.
YMMV
AAC
As per Microsoft's documentation, DirectML can use any DirectX 12 video card for AI acceleration: «DirectML is a low-level hardware abstraction layer that enables you to run machine learning workloads on any DirectX 12 compatible GPU.»
https://learn.microsoft.com/en-us/windows/ai/directml/dml
DirectML is the low-level API. The ONNX Runtime and the WinML API are the high-level APIs/abstractions and can use DirectML as their backend. As a matter of fact, WinML can run on the CPU, GPU or NPU; the developer can specify where, or can let the OS decide...
«What does WinML run on by default?
If you don't specify a device to run on with LearningModelDeviceKind, or if you use LearningModelDeviceKind.Default, the system will decide which device will evaluate the model. This is usually the CPU. To make WinML run on the GPU, specify one of the following values when creating the LearningModelDevice:
LearningModelDeviceKind.DirectX
LearningModelDeviceKind.DirectXHighPerformance
LearningModelDeviceKind.DirectXMinPower
»
https://learn.microsoft.com/en-us/windows/ai/windows-ml/faq#what-does-winml-run-on-by-default-
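To make the layering concrete, here's a small sketch against ONNX Runtime's DirectML build (onnxruntime-directml; an assumption on my part that this is the stack in play, and "model.onnx" is just a placeholder). Ordering the execution providers mirrors the LearningModelDeviceKind choice above: ask for DirectML first, fall back to the CPU:

    import onnxruntime as ort

    # Prefer the DirectML backend if this build exposes it; else CPU.
    preferred = ["DmlExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in preferred if p in ort.get_available_providers()]

    sess = ort.InferenceSession("model.onnx", providers=providers)
    print("running on:", sess.get_providers()[0])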
Running the MLs on an NPU has a couple of drawbacks:
1.) You are thermally/power constrained (i.e. more watts/°C for the NPU means fewer watts/°C for the CPU).
2.) The ML memory traffic will contend with the CPU memory traffic and the iGPU memory traffic, causing memory-bandwidth bottlenecks.
Having said that, there are advantages to running the models on an NPU that uses the machine's own memory controller:
1.) On most desktops and some laptops, memory is expandable (DIMM, SO-DIMM and CAMM2). It's been ~20 years since I last saw a GPU with upgradeable memory. Therefore, if your AI model(s) require more memory down the line, you can expand (see the back-of-the-envelope sketch below). The alternative is to overspec your GPU at the beginning (thereby paying for something you will not use for the first few years) or to replace the (costly) GPU card of your desktop.
2.) The mechanisms to swap from main memory to block storage have been standardized and well understood on the CPU side for decades. On the GPU side, not so much; to give one example, DirectStorage only dates back to 2022. This results in more overhead when swapping the AI model(s) to and from GPU memory.
3.) When the AI models are not in use, the extra memory will benefit more activities if it hangs off the CPU than if it sits on the GPU (unless the machine is used for GPU-intensive activities and almost nothing else).
4.) If your laptop has neither an MXM card nor a replaceable GPU (à la Framework 16), what do you do when you need more VRAM for the AI models? Yes, replace the whole laptop.
Hence, Microsoft's insistence on an NPU.
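A back-of-the-envelope sketch for point 1 above, i.e. why expandable memory matters: the weights-only footprint of a model is roughly parameter count times bytes per parameter (activations and KV cache come on top). The figures below are illustrative, not benchmarks:

    # Weights-only memory footprint at various precisions.
    BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

    def weights_gb(params_billions: float, dtype: str) -> float:
        return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 2**30

    # A 7B-parameter model: ~13 GB at FP16, ~6.5 GB at INT8, ~3.3 GB at
    # INT4 -- an upgradeable DIMM slot absorbs that growth; soldered
    # VRAM does not.
    for dt in ("fp16", "int8", "int4"):
        print(f"7B @ {dt}: {weights_gb(7, dt):.1f} GB")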
I feel like the Copilot AI PCs will be like how every company was obsessed with pushing 3D TVs in the early 2010s: people bought one, watched a few 3D movies, but then realised it was just a gimmick, got bored, and went back to using it as a regular TV.
So the same will probably happen with these Copilot PCs: people will get a new computer and try out a few of the AI features to generate a picture in MS Paint or create a document from a prompt. They'll then realise that it's a bit of a gimmick and go back to using it as a normal computer, and the NPU will remain there, hardly getting any use day to day.
And eventually manufacturers will stop including NPUs on the CPU die to save money, and Microsoft will quietly retire the Copilot PC branding.
Will Microsoft support Copilot on M2 Macs?
TL;DR: YES
Since all Windows versions (including Win11 on ARM) can run virtualized (and this is an official use case), it's up to the VMM to present a virtual NPU (just as VMMs present virtual TPMs nowadays), using whatever resources the underlying hardware has.
In the specific case of Windows on the Mac M2 (or Mx, for that matter), both Apple and Microsoft have officially stated that the way to run Win11 on ARM Macs is via virtualization. And ALL ARM Macs have a Neural Engine, which is, for all intents and purposes, an NPU. As long as the virtual NPU is performant enough and recognized by Win11, there should be no problem, even if, on underpowered machines, the VMM is, behind the scenes, pooling performance from the CPU, GPU and NPU to achieve said performance...
Apple's NPU, at least what they've shipped in the last couple revs, is 38 TOPS, so it would just miss the cut. I kind of laughed when I saw their arbitrary cutoff when they first announced it and wondered if that was setting up some subtle dig they could make at Macs not being a real "AI PC".
...just don't give a shit about AI other than, "Will I lose my job?"
Other than the metaverse, I've never seen such collective indifference from not just ordinary people, but also the tech community.
The only people that give a shit are those getting even more obscenely wealthy; the journalists that lap up, without question, the bullshit those assholes spew out; and the public that just believes the slop these spineless journalists pimp out.
And an annoying co-pilot button eating up space on the keyboard. Allegedly this gives me benefits but I fail to see what they are even supposed to be. Even Microsoft's Copilot+ website struggles to justify why anyone should care, citing features like brightening video and audio transcription that a CPU or GPU could do, assuming someone wanted either of those things. So I've hidden / disabled Copilot as much as possible in the OS. It seems like a solution in search of a problem, and probably has all kinds of hideous data scraping going on with it too.
Isn't the answer simple? Looking ahead, an NPU is a relatively cheap and relatively power-efficient piece of tech built specifically for AI, whereas a GPU is not purpose-built hardware for AI; it just brute-forces the task and is therefore not very efficient.
Add-in NPU cards for desktops/laptops are quite expensive right now, but I suspect that with scale the prices will come down rapidly and high-performance NPU solutions will become more common than high-end GPUs... I can't foresee GPU prices coming down any time soon.