Still Bullshit
More efficient bullshit generation? Excellent news!! </sarcasm>
Is distributed training the future of AI? As the shock of the DeepSeek release fades, its legacy may be an awareness that alternative approaches to model training are worth exploring, and DeepMind researchers say they've come up with a way of making distributed training much more efficient. DeepSeek caused an element of panic …
Cool to see those French kids (e.g. Arthur Douillard -- DiPaCo, DiLoCo, and Louis Fournier -- WASH: communication-efficient weight shuffling and averaging) doing great work on distributed computation (hopefully applicable to actually useful computation some day too, like FP64 HPC for CFD, Maxwell's equations, ...!).
Can't help noticing also that the next item in the TFA-linked "Import AI" newsletter is AI self-replication, whereby an agentic-enhanced LLM, prompted to "replicate yourself", might just do so "with no human interference" until full completion of the RotM.
Wonder if that works with DeepSeek ... (the root incarnate of all this evil?)
Told it to write a bash script to email out when resources hit a particular value (roughly the sketch below).
It did it fine, but...
I looked at the CPU usage history and felt the extra heat generated in the room.
My brain could have done it for a few watts, or found the script somewhere online.
Never again will I run a local model; this stuff is replicated millions of times right now in countless racks in data centres around the world.
It's even worse than the shite that is called crypto mining.
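For context, the kind of script described above is roughly the following: a minimal sketch (not the commenter's actual script) that samples CPU usage from /proc/stat and mails an alert when it crosses a threshold. The threshold and recipient are hypothetical placeholders, and it assumes a Linux box with a configured local mail command (e.g. mailutils or bsd-mailx):

    #!/usr/bin/env bash
    # Minimal sketch: alert by mail when average CPU usage over one second
    # crosses THRESHOLD. Placeholder values; adjust to taste.
    THRESHOLD=90
    RCPT="admin@example.com"

    # Two samples of the aggregate "cpu" line in /proc/stat, one second apart.
    read -r cpu u1 n1 s1 i1 rest < /proc/stat
    sleep 1
    read -r cpu u2 n2 s2 i2 rest < /proc/stat

    busy=$(( (u2 + n2 + s2) - (u1 + n1 + s1) ))
    idle=$(( i2 - i1 ))
    usage=$(( 100 * busy / (busy + idle) ))

    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "CPU usage is ${usage}% (threshold ${THRESHOLD}%)" \
            | mail -s "CPU alert: ${usage}% on $(hostname)" "$RCPT"
    fi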
Most of the LLMs I've played with stay somewhere in the ballpark when you have a "conversation" or ask a question, but my limited experience with deepseek-r1 is that its responses are often not on the same planet. I believe it's the worst model out there; its responses aren't consistent within a single paragraph of output. llama3.2 does a better job and it's half the size. Really can't understand why Nvidia investors were so worried about DeepSeek.