Buying a PC for local AI? These are the specs that actually matter
Ready to dive in and play with AI locally on your machine? Whether you want to see what all the fuss is about with all the new open models popping up, or you're interested in exploring how AI can be integrated into your apps or business before making a major commitment, this guide will help you get started. With all the …
COMMENTS
-
Sunday 25th August 2024 16:59 GMT Henry Wertz 1
can't stress vram enough
Can't stress VRAM enough. Messing about with this stuff, more RAM bandwidth and TOPS might let things run faster, but not having enough VRAM keeps models from running at all (short of just running them on the CPU instead).
I've found my GTX 1650 to be rather ineffective, since many models need more than 4GB of VRAM no matter how you slice it. You can run heavily quantised models, but the text ones quantised that far get stupid and hallucinate... well, more than normal. Image gen is right out.
'Luckily', since the GTX 1650 doesn't have Tensor cores or whatever else makes models run extra fast, and my Coffee Lake CPU is reasonable, the GPU is 'only' about 10x the performance of just letting the CPU do it. Just playing about, I don't care if a text response takes 5 or 10 seconds, or an image takes about 90 seconds instead of about 10 on the GPU (with the one set of settings that made that model 'small' enough to run on the GPU at all).
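As a rough illustration of that VRAM squeeze, here is a minimal Python sketch (assuming PyTorch with CUDA support is installed; the 20% overhead factor for KV cache and activations is a guess, not a measurement) that checks how much VRAM the card reports and estimates whether a model's weights would fit at a given quantisation:

import torch

def weights_gb(params_billions, bits_per_weight):
    # parameter count * bits per weight, converted to gigabytes
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

vram_gb = 0.0
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9

for label, params, bits in [("7B @ 4-bit", 7, 4), ("7B @ 8-bit", 7, 8), ("13B @ 4-bit", 13, 4)]:
    needed = weights_gb(params, bits) * 1.2  # assumed ~20% headroom for cache/activations
    verdict = "should fit" if needed <= vram_gb else "CPU fallback"
    print(f"{label}: ~{needed:.1f} GB vs {vram_gb:.1f} GB VRAM -> {verdict}")

On a 4GB card even a 7B model at 4-bit is borderline once overhead is counted, which matches the experience above.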
-
-
Monday 26th August 2024 10:55 GMT Adair
Re: But ...
But... seriously, I suppose it's just like any tool. In the right hands, and used for the purpose for which it is intended, great things (relatively) can be done. For the rest of us, probably not so much.
I can see that 'AI' (whatever that misnomer actually means), properly tuned and applied in very specific and controlled circumstances can be used to great effect, e.g. medical analysis, etc. But, at the same time we're talking about a system that in uncontrolled situations effectively feeds on excrement, and we are surprised when shit is what it generates.
'Garbage in; garbage out' - same as it ever was.
-
Tuesday 27th August 2024 03:41 GMT Michael Wojcik
Re: But ...
In the right hands, and used for the purpose for which it is intended, great things (relatively) can be done.
I'm still waiting for an example. And I follow a fair bit of the research, as well as commentary from people like Zvi Mowshowitz. Many things which are impressive, in the context of machine-learning research, natural language processing, and so on, have indeed been done. Things which are in some sense "great", outside the context of research? I can't think of any.
In fact, I have argued that LLMs and diffusion models are, at least in their common uses, in the long run counterproductive, as competitive cognitive technologies. They encourage shallow thinking and discourage understanding.
There are tools which are not, in fact, particularly useful. People invent all sorts of things. Some of them are misses.
-
-
Sunday 25th August 2024 22:42 GMT Fazal Majid
New Geekbench AI benchmarks
An M1 or M2 Mac Studio has far more unified RAM available to its GPU/NPU for running large models than even the $36,000 Nvidia H100. Unfortunately, the Apple Silicon GPU is nowhere near as fast as Nvidia's.
Primate Labs, makers of Geekbench, have an ML/AI benchmark tool. The results are finally available in the general Geekbench browser (but still not searchable):
https://browser.geekbench.com/ai/v1
-
-
Monday 26th August 2024 01:38 GMT HereIAmJH
Re: New Geekbench AI benchmarks
At a glance, my first thought was: why didn't they make it a portable app (AppImage, etc.) so installation isn't required? I could see putting it on a thumb drive to benchmark a variety of machines, but other than after hardware upgrades, how often would you benchmark the same machine?
An interesting note: with ONNX I can benchmark both of my GPUs, while OpenVINO only sees the Intel UHD and not the Nvidia. I don't know enough about AI yet to know whether these numbers are telling me anything useful, but on the basis that bigger numbers are better, even my old 1660 Ti beat the crap out of an i7. So repurposing an old server isn't going to make a good AI workload box without investing in a decent GPU. Maybe I'll have to look for a gaming desktop around Black Friday.
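For anyone wanting to check the same thing outside the benchmark, a minimal sketch (assuming the onnxruntime and openvino Python packages are installed) that lists what each framework can see. Note that ONNX Runtime reports the providers compiled into the installed build, so a CUDA provider only shows up with the GPU-enabled package:

import onnxruntime as ort
import openvino as ov

# Providers built into the installed onnxruntime package,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] for the GPU build
print("ONNX Runtime providers:", ort.get_available_providers())

# Devices OpenVINO can actually enumerate; 'GPU' here means the Intel iGPU,
# and Nvidia cards will not appear
core = ov.Core()
print("OpenVINO devices:", core.available_devices)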
-
-
Monday 26th August 2024 05:50 GMT Kevin McMurtrie
If the "70 billion parameter" LLM is llama3.1:70b, it runs fine on 128GB of DDR5 RAM without a GPU. Not fast, but fine. It can reply faster than you can search the Internet.
I managed to get llama3.1:405b running with 128GB plus a sacrificial Gen5 NVMe stick for swap. It takes two days to complete a response, so it's not at all usable. DDR5 motherboards for "desktop" computers require unbuffered memory that currently maxes out at 192GB. Apple's M2 chips hit the same 192GB limit even if you have the wealth for more. Only a "server" motherboard taking buffered DDR5 can reach the 256GB needed. The two types of DDR5 are not interchangeable, of course.
All of this compute power doesn't even cover training LLMs or pre-loading them with a lot of context. That $$$$$$ is not in my range even if it was a hobby.
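The back-of-envelope arithmetic behind those limits, as a small Python sketch (weights only; KV cache and runtime overhead are ignored, so real use needs more than this):

UNBUFFERED_DDR5_CAP_GB = 192  # current ceiling for desktop (unbuffered) DDR5

def weights_gb(params_billions, bits_per_weight):
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (70, 405):
    for bits in (16, 8, 4):
        gb = weights_gb(params, bits)
        verdict = "fits in desktop RAM" if gb < UNBUFFERED_DDR5_CAP_GB else "needs server RAM or swap"
        print(f"{params}B @ {bits}-bit: ~{gb:.0f} GB -> {verdict}")

A 70B model at 4-bit is roughly 35GB of weights, hence it running in 128GB; a 405B model at 4-bit is roughly 200GB, which is exactly why it spills past the 192GB unbuffered ceiling onto swap.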
-
Monday 26th August 2024 11:59 GMT neurochrome
Hobby-level hardware for ML tyre-kicking
Image gen: maximise the amount of VRAM, even if it's slower chippery. The best value/compatibility I could find was Nvidia's 3060 with 12GB; it doesn't need to be the Ti version either. Along with that, system RAM is important - it doesn't need to be super fast, but the more the better.
For Stable Diffusion SDXL using the Forge webui on Win10 with 64GB and the 3060/12GB, I found system RAM usage hovering in the low 30s (GB). Flux1.dev is also fine, but system RAM usage is regularly into the mid 40s. SDXL training of TIs and LoRAs is doable, but full model fine-tuning is not. It looks like Flux1.dev LoRA training *might* be possible - I see a lot of effort on GitHub to fit the training into 12GB graphics cards.
Laptop with an Nvidia 2060/6GB + 16GB system RAM: SDXL image gen with Forge is doable (around 1 min for a 1MP image). Scaling up to larger images does become a bit drawn out, and I had to add a fan base under the machine. Flux1.dev was painfully slow, with plenty of crashes.
For LLMs, the same laptop works OK with Kobold and a variety of 7B models at a usable speed (a bit less than reading speed), but it does max the laptop out. Interesting to try out, but not terribly practical.
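For comparison with the Forge webui route, a minimal sketch of the same memory-saving idea using the Hugging Face diffusers library instead (assumes the torch, diffusers, transformers and accelerate packages plus a CUDA card; the prompt is just a placeholder):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,   # fp16 halves weight memory versus fp32
    variant="fp16",
    use_safetensors=True,
)
# Moves submodules to the GPU only while they are needed,
# which is what keeps 6-12GB cards workable
pipe.enable_model_cpu_offload()

image = pipe("a watercolour sketch of a datacentre at dusk",
             num_inference_steps=30).images[0]
image.save("sdxl_test.png")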
-
Monday 26th August 2024 18:21 GMT Bitsminer
This space is evolving quickly...
The CPU / GPU demands will drop for a while as the software gets faster and models improve in quality while staying at about the same size.
Then it will all change as "40 TOPS" minimum becomes "60" then "90"....
By coincidence, I had to buy a new laptop last year: 8 cores + SMT. 32GB of RAM was only a few hundred more, 1TB of storage was only a hundred more, and faster video was only a hundred more.
It turns out it runs ollama or llamafile quite readily, but only on the CPU. It's a laptop, so the thermal regulator cuts in pretty quickly and the cores slow down to 1.5GHz or less. The 4-bit models work pretty well in spite of this.
My advice is to buy a very new CPU; the old ones don't have the AVX-512 variants, and that makes a big difference. If you do try to recycle old hardware, you will need to fit in a (supported) 16GB video card.
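A quick way to check whether a CPU advertises those extensions before recycling it, as a Linux-only sketch (it just reads /proc/cpuinfo; the flag names are the usual kernel spellings):

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

for wanted in ("avx2", "avx512f", "avx512bw", "avx512_vnni"):
    print(f"{wanted}: {'yes' if wanted in flags else 'no'}")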
-
-
Tuesday 27th August 2024 03:53 GMT StargateSg7
Right now, I'm running an old AMD S9170 GPU card rated at 2.6 TeraFLOPS at 64 bits, which when converted to 16-bit integers gives me 10.5 tera-compare operations per second of throughput. YES! I mean tera-COMPARE operations, NOT convolution or numeric processing operations! I am talking about BITWISE COMPARE operations!
One thing I have done for the current modality of statistics-based and CNN-based (i.e. typical 3x3 or 5x5 convolution kernel) LLMs and Stable Diffusion models is to convert those models over to bitwise operations, which is much less taxing on a GPU. By using bitwise compares (i.e. a nested if-then-else), which lets me use only nine compare operations, two range-clipping/rounding operations and one copy or move operation to evaluate a token, I only need to do twelve 16-bit operations per token. That gives me a performance of about 850 MILLION 16-bit UNICODE character tokens evaluated per second, which is GREAT for most natural-language-oriented operations. If you convert that to words and sentences, it means I can get an average of 20 million words searched, indexed, sorted and re-ordered for final output per second, which is a heck of an improvement on a GPU that normally only does 13 tokens (words) per second when using a full convolution operation or statistical analysis process for all inputs and outputs.
That is an equivalent LLM output of around 60,000 pages of fully evaluated words that would/could match all the requests in a typical 2000-word input prompt/end-user LLM query.
Convert that same 60,000 pages of LLM output to Stable Diffusion-style image processing and it means I could output three frames of 8192 by 4320 pixels at 64 bits per RGBA pixel every second (i.e. HDR, aka High Dynamic Range, colour DCI-8K resolution video!) --- AND --- if I go down to a mere DCI-4K resolution at 32 bits per RGBA pixel, I could get 24 frames per second of output in real time, which is the typical Hollywood production film frame rate, at 4096 by 2160 pixel video!
Simply taking advantage of bitwise compare operations on ANY of the major GPUs means you can get ENORMOUS performance increases in LLM and Stable Diffusion output!
V
-
Tuesday 27th August 2024 08:51 GMT harrys
The more "boring" ML stuff happening on non LLM type data is where the real stuff is happening and gonna change peoples everyday lives again
Why again .... ML all started with going from paper ledgers to spreadsheets/databases yonks ago and jeez.... did that change the world or what !
Though i suppose u could argue that the abacus was a ML device too :)
The more things change the more they stay the same, allbeit, speeded up, faster and faster and faster to ad nauseum