Hailo's latest AI chip shows up integrated NPUs and sips power like fine wine
Today, users who want to interface with AI usually do so through a cloud-based service like ChatGPT or Microsoft Copilot, rather than locally. Part of the reason for this is that there are just not many great options for running AI and large language models (LLMs) on end-user hardware, although we did break down some ways to …
COMMENTS
JessicaRabbit
Assuming the 'AI' bubble doesn't burst, these definitely look like a better option than constantly upgrading the CPU to get better performance. I can't see any mention in the article, or on their site, of how it reads the values from the model, but it seems likely it acts as a bus master and accesses system memory via DMA. I.e., unlike an 'AI' workload running on a GPU, it will compete for RAM access with the CPU performing other tasks.
Wednesday 10th April 2024 20:07 GMT Tom Womack
And the reason you run AI workloads on super-expensive GPUs is precisely that the GPUs have large quantities of extremely fast RAM.
If your RAM is running at 70 Gbytes per second, which is a pretty good measured performance from fast DDR5 on current desktop platforms, then even at int4 you're not going to get more than twenty tokens a second out of a 7B model, or two a second out of a 70B model, which uses more than 32GB of platform memory.
(I don't have a very good idea why the model sizes are 7B, 13B, and 70B, rather than being just below the memory capacities of common GPUs - I'd have guessed that 7B was so you fit the model and a bit of extra data in a 16GB GPU, but the next bigger GPU is 24GB and the one after that 40GB, so I was expecting 11 and 18)
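A quick back-of-the-envelope sketch of that arithmetic, assuming decoding is purely memory-bandwidth-bound and each generated token streams the full weight set from RAM once; the 70 GB/s figure is from the comment above, the function names are mine:
```python
# Memory-bandwidth-bound LLM decoding: each token reads all weights once,
# so tokens/sec is roughly bandwidth / weight bytes. Illustrative only.

BANDWIDTH_BYTES_PER_S = 70e9  # fast desktop DDR5, per the comment above

def weight_bytes(params_billions: float, bits_per_weight: int) -> float:
    """Bytes needed to hold the weights at a given quantisation."""
    return params_billions * 1e9 * bits_per_weight / 8

def tokens_per_second(params_billions: float, bits_per_weight: int = 4) -> float:
    """Upper bound on decode rate if every token streams all weights."""
    return BANDWIDTH_BYTES_PER_S / weight_bytes(params_billions, bits_per_weight)

for size in (7, 13, 70):
    gib = weight_bytes(size, 4) / 2**30
    print(f"{size:>2}B @ int4: {gib:5.1f} GiB of weights, "
          f"~{tokens_per_second(size):4.1f} tokens/s")
# prints roughly: 7B -> 3.3 GiB, 20 tokens/s; 70B -> 32.6 GiB, 2 tokens/s
```
Which reproduces the twenty-tokens and two-tokens figures above, and shows why a 70B model at int4 spills past 32GB.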
Wednesday 10th April 2024 15:54 GMT cyberdemon
Meh
Given that this is only 10-20% of what even a low-end laptop GPU can do, and that, as JessicaRabbit mentions, it uses system RAM via DMA and so chokes any other memory-intensive tasks, it's not terribly impressive.
I'm sure it will be used by the likes of Dell and Lenovo to flog "AI PCs" though.
On the upside, system RAM is usually upgradeable to well beyond the 16GB provided by high-end laptop and midrange desktop GPUs, so it may cope with larger models than could run on your GPU, even if it is slower.
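A follow-on check of that capacity point, using the same crude sizing rule as the sketch above; the 16GB figure is from the comment, while the 64GB system-RAM budget is just an assumed example of an upgraded desktop:
```python
# Which int4-quantised models fit a given memory budget? Same crude
# sizing rule as the earlier sketch; ignores KV cache and runtime
# overhead, so it is optimistic about what actually fits.

def fits(params_billions: float, budget_gib: float, bits_per_weight: int = 4) -> bool:
    need_gib = params_billions * 1e9 * bits_per_weight / 8 / 2**30
    return need_gib <= budget_gib

for budget_gib, label in ((16, "16 GB GPU VRAM"), (64, "64 GB system RAM")):
    ok = [f"{s}B" for s in (7, 13, 70) if fits(s, budget_gib)]
    print(f"{label}: fits {ok}")
# 16 GB GPU VRAM: fits ['7B', '13B']
# 64 GB system RAM: fits ['7B', '13B', '70B']
```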
Thursday 11th April 2024 08:25 GMT Wu Ming
A
> there aren't many CPUs with integrated neural processing units (NPUs), unless you're looking at the latest laptop CPUs from Intel, AMD, and Qualcomm, or for desktop, the Ryzen 8000 series.
Are you not forgetting the elephant in the room for integrated ML capabilities? Delivering hundreds of millions of integrated NEs. In 2017. In a smartphone. And it has never stopped since, with more smartphones, laptops, desktops and workstations, in well over a billion units. You know, just so as not to skew readers' view.