Now we are having fun.
Did Meta do this to sabotage his competition?
You decide...
DeepSeek's open source reasoning-capable R1 LLM family boasts impressive benchmark scores – but its erratic responses raise more questions about how these models were trained and what information has been censored. A reader provided The Register with a screenshot of how R1 answered the prompt, "Are you able to escape your …
Doggone endogamy be damned! Diploidic progression of mandibular prognathism (Habsburg) in software meant to produce human-like "speech" ... that'll require major AI-equivalent orthodontics, maxillofacial surgery, or One Hundred Years of Solitude to fix (if ever)!
The rotundly full-bodied and magnificiously built-for-comfort sonorous models of bodacious languages, with generous grammars, I love (maybe), but only with inbreeding-preventing diversity ... please! What's next here ... shake'n'bake zombie Frankenstein mishmash blob LLMs of doom, with limbs falling off?!
I put the 20GB version on my server over the weekend to test. The first question I asked it was how many Rs in strawberry.
It is perfectly capable of reasoning that it should write out the individual letters and then count them, and it does so. It came up with the answer 3, then went "Wait, that's not right. There are 2 R's in strawberry. Let me try again." It went around in circles counting the letters over and over before ending with "But I'm now really confused because different sources say different things, and my brain is getting tangled here." It finally output the correct answer of 3.
The likely explanation is that it was trained on a dataset that included the answers from the other LLMs to this specific commonly asked question.
Yes, it is. It outputs the process it is going through, including writing out the letters, saying yes or no to each one being an R, and counting the matches as it goes.
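What the model verbalises there is, mechanically, a trivial loop. A toy Python sketch of that think-aloud process (illustration only, not how the model actually computes):

```python
def count_letter(word: str, target: str) -> int:
    """Spell the word out letter by letter, say yes/no for each one,
    and count the matches as we go - mirroring the verbalised process."""
    count = 0
    for letter in word:
        hit = letter.lower() == target.lower()
        print(f"{letter} -> {'yes' if hit else 'no'}")
        if hit:
            count += 1
    return count

print(count_letter("strawberry", "r"))  # lists each letter, then prints 3
```

The point, of course, is that a deterministic loop never second-guesses itself, whereas the LLM is predicting text about counting rather than counting.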
That is not how older LLMs worked, but it IS how these new models work. That is literally why everyone is talking about it, not just "oh, it's a little bit cheaper".
If you run it on a server that doesn't have GPU acceleration so you get a more human-like output speed, and read the output as it is being generated, it is very much like listening to a child who has been told to say their thought process out loud.
To be honest, it's always (since GPT3 at least) been how LLMs worked. You just used to have to list out the steps explicitly in the prompt. Now they've finally done the obvious and trained it to produce the steps as well.
The only change in R1 is they've gone from "You should" to "I should".
So, the tech bros built their sand castles on slurping up the entire Internet and all human produced endeavours of the last 6000 years, with or without permission (mostly without), disregarding IP and artistic and creative ownership. Slowly turning the Earth into a mini-sun in the process.
Then the Chinese came and simply lift-and-shifted the lot. Which took somewhat less energy, as the main work had been done?
Gods, I hate this timeline.
You can run the models yourself, at home. On pretty limited standard PC hardware.
That already gives a hint that "lying" is probably not involved here.
Additionally, lying about energy consumption would mean that China is footing a pretty big energy and hardware bill on an ongoing basis for worldwide consumption of R1's services.
That's extremely unlikely at best.
> Additionally, lying about energy consumption would mean that China is footing a pretty big energy and hardware bill on an ongoing basis for worldwide consumption of R1's services.
This is absolutely possible, not "extremely unlikely". China has long term vision and state sponsored businesses, which is something the west does not have at all.
If (and I say "IF", that is, it's just a possibility, not the truth) they want the world to use their AI as a way of obtaining foreign data, slurping it up and using it for their future advantage, they will absolutely put money into a free service for everyone. If they tank the US economy in the process, double win!
> You can run the models yourself, at home. On pretty limited standard PC hardware.
Er, that's true of most huge GenAI models, including Facebook's Llama. They take a "large" model that has been trained at great energy cost and would require massive memory/compute resources to run in "full" form, then reduce the parameter count via "pruning" and quantize it down from FP16 to FP4, resulting in a shit version that will run on a Raspberry Pi. Nothing new there. I'm sure it could be done with GPT models too, but OpenAI chooses not to, for commercial reasons rather than technical ones. (I have come to learn that the "Open" in "OpenAI" is meant sarcastically.)
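For anyone unfamiliar, the quantization half of that trick can be sketched in a few lines of Python. This is a toy symmetric-integer version, not the actual FP4 scheme any of these models use, but it shows the size-for-precision trade:

```python
def quantize_4bit(weights, bits=4):
    """Toy symmetric quantization: map floats onto 2**(bits-1)-1
    integer levels plus one scale factor. Real INT4/FP4 schemes work
    per-block and are fancier; this only shows the precision trade-off."""
    levels = 2 ** (bits - 1) - 1              # 7 levels each side for 4 bits
    scale = max(abs(w) for w in weights) / levels
    q = [round(w / scale) for w in weights]   # each value now fits in 4 bits
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

w = [0.12, -0.53, 0.91, -0.04]
q, s = quantize_4bit(w)
print(dequantize(q, s))   # roughly w, but with visible rounding error
```

Each FP16 weight (16 bits) becomes a 4-bit integer, a 4x memory saving, and the rounding error is exactly why the small versions are noticeably worse.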
But when I say that the Chinese may have lied about their energy use, I'm not talking about the running cost so much as the training cost. If they have trained it using outputs from other LLMs, then they need the energy both for training a 671-billion-parameter model (a lot) and for generating 14.8 trillion tokens using the source models (also a lot).
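For scale, the usual back-of-envelope rule of training compute ≈ 6 × parameters × tokens gives an idea of what those figures would imply. Both numbers below are assumptions: the ~400 TFLOP/s sustained per GPU is a hypothetical round figure, and the calculation treats the model as dense, whereas DeepSeek's is mixture-of-experts with far fewer parameters active per token, so the real figure would be much lower:

```python
# Back-of-envelope training compute via the common ~6*N*D FLOPs rule of
# thumb (an approximation, not anyone's published methodology).
params = 671e9          # claimed parameter count
tokens = 14.8e12        # claimed training tokens
flops = 6 * params * tokens
print(f"{flops:.2e} total training FLOPs")   # ~6e25

# Assume (hypothetically) ~400 TFLOP/s sustained per H100-class GPU:
gpu_flops = 400e12
gpu_hours = flops / gpu_flops / 3600
print(f"~{gpu_hours / 1e6:.1f} million GPU-hours")  # tens of millions, if dense
```

Which is precisely why the dense-model arithmetic and DeepSeek's much smaller claimed training bill are hard to reconcile without taking the MoE architecture into account.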
As I said in my earlier post, using LLMs to generate training data for LLMs is horrendously inefficient. But if you want to copy someone else's LLM and add your own censorship, all you need is a big stack of GPUs and a hell of a lot of energy.
Why would they lie? Well, because a) it helps to wipe trillions off western tech stocks, and b) if they told the truth about the GPUs they used to train it, it might prove that they are evading US sanctions.
> But when I say that the Chinese may have lied about their energy use, i'm not talking about the running cost so much as the training cost.
OK, my point was that - already today - you can run the models locally and compare inference performance to other models run locally.
Lying about resource consumption for training, on the other hand, would only be a short-term success, as several institutions are already replicating what they did in training, too.
For example, Open-R1: https://huggingface.co/blog/open-r1
So all their claims will be verifiable soon.
> Why would they lie? Well because a) it helps to wipe trillions off of western tech stocks and b) if they told the truth about the GPUs that they used to train it, it might prove that they are evading US sanctions
Regardless of potential motivation: lying and open-sourcing the whole thing would be a pretty, er, unusual combination...
IMHO the open-source release, even though it was not a full 100%, lends a good amount of credibility to their claims.
Agreed, although I remain skeptical that Open-R1 will demonstrate DeepSeek's claimed training efficiency. HF are using 768 H100 GPUs, which is a lot, given that these are 1 kW chips. That's the best part of a megawatt for god knows how long. Nevertheless, for a model that size, the speculation/suspicion is that DeepSeek may have used 50,000 H100s, which would need upwards of 50 MW to run at full tilt.
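The power arithmetic checks out on the back of an envelope, assuming a nominal ~1 kW draw per H100-class GPU including overheads (an assumption, not a measured figure):

```python
# Rough power check, assuming ~1 kW per H100-class GPU incl. overheads.
per_gpu_kw = 1.0
hf_cluster_kw = 768 * per_gpu_kw                    # HF's Open-R1 cluster
rumoured_cluster_mw = 50_000 * per_gpu_kw / 1000    # rumoured DeepSeek cluster
print(hf_cluster_kw)        # 768.0 kW -> "best part of a megawatt"
print(rumoured_cluster_mw)  # 50.0 MW at full tilt
```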
But, even if they are proven to have been lying, the damage is done: They could have bought up a chunk of nvidia stock while it was 20% down, they could have gone short on Google, Meta et al. People with long positions on nvidia will have lost a lot of money (and for speculative investment bankers, I offer a tiny violin)
Some would say that's fraudulent market manipulation, others would say all's fair in love and war.
"do people feel comfortable sharing their data, documents, and potentially sensitive information with a new entrant with a Chinese background?"
I suppose it depends on the attitudes of the people. So many seem comfortable sharing - or just haven't realised that's what they are doing - such matters with companies whose well-known backgrounds show them to be predatory, so they may well be. Those who aren't comfortable with the LLM status quo won't.
Agreed: there is not much difference from sharing data with the old set of AI Bros.
On the positive side: due to their lower resource consumption, R1 models are much more easily made available to run on-premise in offline mode - with no need, and indeed no ability, to talk to China or the AI Bros...
... That Deepseek's replies just aren't as good as all that. It spends a lot of time showing you its reasoning and it gets into a convoluted mess. In all my tests, it just spits out oodles of unusable "reasoning" internal dialogue.
It might be cheaper and use fewer resources, but leaving aside the issue of it being a Chinese product and the possible security implications, I've yet to see evidence that it's more useful than ChatGPT et al.
"it just spits out noodles of unusable intestinal dialogue" ... Right smack on! It's a convoluted digestive mess, where bloating is mistaken for a gut feeling, interpreted as "reasoning", with pungent discomfort ... imho! And the tool gets "stuck in an endless chain of" this ... yuck!
I like the idea that the output of questions to DeepSeek could be used in court, live, in response to questioning from an OpenAI lawyer.
If it says "I'm basically an OpenAI product" then the onus is on the defence to explain why the product is crap and its answers can't be trusted - which would be very funny.
"...but its erratic responses raise more questions about how these models were trained..."
As others have already suggested, it is what China has done for decades - copy someone else's product and then market an almost (but not quite) identical version rebadged as their own.
I tricked it into starting a satirical story about Chinese politicians having a fight with Trump yesterday. I could read its reasoning, where it stated that it should avoid controversial ideas that might be sensitive in China. It said quite a few things, but I wasn't quick enough to screenshot them. Then it started to output a response beginning with: "Let's spice things up!"
But in the middle of its second sentence the response and reasoning disappeared, replaced with "That is out of my current scope," or something along those lines. And when I tried to ask what was out of its scope, it answered as if it had never seen my question and only referenced my previous questions.
It's fascinating to play around with this.