Cheat codes for LLM performance: An introduction to speculative decoding
When it comes to AI inferencing, the faster you can generate a response, the better – and over the past few weeks, we've seen a number of announcements from chip upstarts claiming mind-bogglingly high numbers. Most recently, Cerebras claimed it had achieved an inference milestone, generating 969 tokens/sec in Meta's 405 …
COMMENTS
This post has been deleted by its author
This post has been deleted by its author
This post has been deleted by its author
Monday 16th December 2024 15:10 GMT nobody who matters
Re: > readership doesn't give a shit
,......"It would be nice to see less "LLM sucks" drivel in the comments".......>
It would be nice to see some general usage case for LLM where they don't suck tbh.
Maybe one day they will get there and the 'drivel' as you call it will be replaced by posts expressing awe and wonder at the brilliance of it all, but at present the technology has a very long way to go and is certainly a long.long way from being reliable enough for general usage. 'Generative AI' is at present a bit like Teslas' supposed 'Full-Self-Driving' mode (ie. it isn't what it says it is!).
This post has been deleted by its author
Monday 16th December 2024 10:59 GMT Dan 55
You have to distinguish between:
- ChatGPT/Gemini/Copilot, etc... occupying whole datacentres slurping the entire internet, throwing it at the wall and seeing what sticks, and still being unable to count the Rs in strawberry
- self-hosted LLMs, which, if you feed them the right selected training materials, are probably the future for this technology
So I find this series of articles on self-hosting LLMs an interesting read.
Monday 16th December 2024 11:25 GMT m4r35n357
I find your first sentence puzzling. Surely this is as it _should_ be, and translating docs to code should be a deterministic process. We all know that this technology is going to be most "useful" to those idiot company bosses who see documentation as an avoidable cost (in other words, pretty much all of them!).
Monday 16th December 2024 11:54 GMT Dan 55
I wasn't thinking about translating documentation to code or vice versa, which would probably not end well at all, but rather about allowing employees to ask questions about the products and services a business sells, so having correct, up-to-date documentation is valuable in itself. Imagine asking a question about a UI feature and being told what frontend and backend code is used, the data formats and stored procedures, how data flows from one system to another, and chapter and verse in the original documentation so it can all be checked, and so on.
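For what it's worth, a toy Python sketch of that "chapter and verse" idea - the documentation snippets, references and keyword scoring below are invented for illustration, and in a real setup the retrieved chunk plus the question would be handed to a (self-hosted) LLM to phrase the answer:

```python
# Toy illustration (not from the article): answer a question from internal
# docs and always return the "chapter and verse" reference alongside it.
# The DOCS entries and keyword-overlap scoring are invented placeholders.
DOCS = [
    {"ref": "UI Guide, section 4.2",
     "text": "The quote form posts to the pricing backend via /api/quote."},
    {"ref": "Data Dictionary, section 1.1",
     "text": "Customer records are stored as JSON and synced nightly to the warehouse."},
]

def retrieve(question: str) -> dict:
    """Crude keyword-overlap retrieval; swap in a proper retriever for real use."""
    words = set(question.lower().split())
    return max(DOCS, key=lambda d: len(words & set(d["text"].lower().split())))

hit = retrieve("Which backend does the quote form post to?")
print(f"{hit['text']} [source: {hit['ref']}]")
```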
Monday 16th December 2024 12:35 GMT m4r35n357
Sounds like a "nice to have", but that is exactly the sort of procedure that is likely to go wrong IMO, or more likely just not be used. How much "training" do you think would be necessary to get that process anywhere near right? What if code or products (or anything else) change - yes, retrain again, then patch up mistakes, etc. etc. ad infinitum.
I really do think the "translation" route is the right one - the docs and code are right there and can be directly compared! Why add layers of expensive and error-prone fuzzing?
Monday 16th December 2024 18:38 GMT HuBo
Thesis -- m4r35n357: "why not code up a proper expert system"
Antithesis -- Dan 55: "have an LLM which gives reasonable answers"
Synthesis -- I think you're both right, together, that ES and LLMs have to be combined for this (as they were in IBM's Watson Jeopardy champ). LLMs are great at NLP but crap at logic. ES are great at logic but crap at NLP. Put them together to make systems that work well in both fields.
Now, yes, Meta's CoConuT attempts to introduce backtracking (central to Prolog) into its LLaMas, but I doubt it'll be as effective as going full hog-wild predicate calculus first-order logic through an ES (or Prolog) for the parts of the reasoning process that require logic (i.e. the actual reasoning part) -- imho (but it'll be great for the language bit).
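For illustration only, a minimal Python sketch of that division of labour - the product, the rule table, and the extract_query() stand-in (which is where an actual LLM would do the natural-language part) are all invented:

```python
# Toy hybrid (invented example): the "LLM" side only maps free text to a
# structured query; a hand-written rule table does the logic and the lookup.
RULES = {
    ("pet_insurance", "max_age"): "Pets must be under 9 years old at sign-up.",
    ("pet_insurance", "excess"): "The standard excess is 99 per claim.",
}

def extract_query(question: str) -> tuple:
    """Stand-in for the NLP step an LLM would handle: text -> (product, attribute)."""
    q = question.lower()
    product = "pet_insurance" if "pet" in q or "dog" in q else "unknown"
    attribute = "max_age" if "age" in q or "old" in q else "excess"
    return (product, attribute)

def answer(question: str) -> str:
    key = extract_query(question)   # language part (LLM territory)
    fact = RULES.get(key)           # logic/lookup part (ES territory)
    return fact or "No rule covers that; escalate to a human."

print(answer("How old can my dog be when I take out a policy?"))
```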
Monday 16th December 2024 00:02 GMT Gene Cash
"AI inferencing"
It's not inferencing, it's simulating a string of words that looks close to what an answer to your query might look like, based on the training set.
There is no thought process or reasoning behind it, and no inferring of anything.
As humans, we can't help but be impressed by what appear to be right answers, and we assume (consciously or unconsciously) that these answers were arrived at by similar processes to the ones that we use in our own brains. But this is nowhere near the case, and it makes us all easy to fool.
Stop peddling this snake oil.
You're better than this, El Reg.
This post has been deleted by its author
Tuesday 17th December 2024 01:18 GMT O'Reg Inalsin
Re: "AI inferencing"
I definitely agree that LLM operation is not on the "same scale" as the human mind. Snake oil salesmen are using "learn", "think", and "infer" to shake down and rob the rest of society. Social media and other aspects of online-life left society bleeding and SmartyPants-AIs are the screw worms burrowing into those wounds for the final kill. For that reason, I had to give you a thumbs up.
However, Armageddon aside, the meaning of "inference" as used in "statistical inference" is not a new concept:
"From the point of view of statistical inference, the most crucial problems are those which arise in arguing from samples and their statistics back to populations and their parameters. The problems which arise here are no longer wholly deductive; conclusions cannot, in general, be drawn with certainty. Statements can be made, however, subject to risks of error of being wrong, where the error is precisely expressed in terms of probability and sampling theory." [Samuel Stanley Wilks, The Theory of Statistical Inference, 1937]
That 87-year-old definition of "statistical inference" could be used to describe AI/LLM outputs, although "where the error is precisely expressed ..." should probably be changed to "where the error is approximated in terms of probability and sampling theory", because "error" is itself an amorphous quantity that is decided by humans on an ad hoc basis.
Wednesday 18th December 2024 00:39 GMT Justthefacts
Re: "AI inferencing"
What’s this “query” nonsense? If you’re using an LLM as a natural language substitute for websearch…..well, in that very limited sense, it’s probably not a good websearch tool. You’ll be wanting a search engine for that - a tool whose entire job it is to match your query to an existing body of knowledge, and simply direct you to the exact text as it was originally written without making any changes that could insert errors.
In other news, they’ve been advertising this iPad thing, but it’s really not as good at knocking in nails as the hammer I already have.
If however you have a really challenging Maths problem, PhD level plus, out of your field, and don’t have somebody in the top 1% of Maths PhDs to hand……try this:
https://www.youtube.com/live/hkTpMmkVAok
Wednesday 18th December 2024 06:57 GMT HuBo
Re: "AI inferencing"
Maybe ... but remember, the William Lowell Putnam Mathematical Competition is a math contest for college students (not PhDs), and both questions and answers were posted online prior to o1 Pro "taking the test". Plus, the questions were well written (near machine-like), without red herrings, making it easier to pattern-match answer fragments to them and then glue those together. And Kyle admits to not understanding a bunch of the math involved (iirc), and therefore to being unable to actually grade those responses, except for the "final answers" (which are not necessarily reached through rigorous math by o1).
Testing o1 Pro on math, seriously, is something that more than one group of actual independent scientists might want to do, to check on accuracy and repeatability (with answers not posted online, and not otherwise available to the rotund language model), and with the introduction of mild variations (e.g. red herrings) to test for robustness and sensitivity. IMHO.
Monday 16th December 2024 19:33 GMT HuBo
Re: but....
Yeah ... the paper linked under "suggests" (id under "discussed") in TFA has all the gory details of how this method, inspired by speculative execution in CPUs, and applicable (at least) to autoregressive models (like Transformers, aka GPTs), works. It's all down to the interplay between speculative sampling and speculative decoding that results in the speedup given by Theorem 3.8:
S = (1 − α^(γ+1)) / ((1 − α)(γc + 1))
with c hardware-related and close to 0, α some intrinsic property of the model and task (a value between 0 and 1), and γ the number of tokens the small (speculative) draft model proposes per step, which the target model then verifies in a single parallel pass with little increase in walltime (γ = 2, 3, 5, 7, 10 in Tables 1 & 2). Table 1 shows S = 6.9X with c = 0, α = 0.9, γ = 10, and in experiments they got up to S = 3.4X (Table 2).
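To make that concrete, here is a small Python sketch - the three-word vocabulary and its distributions are made up, but the accept/reject rule is the speculative sampling step from the paper, and expected_speedup() is just Theorem 3.8 as quoted above:

```python
# Toy sketch: the draft model proposes a token from q, the target model's p
# either accepts it or resamples from the residual, so the output is
# distributed exactly according to p.
import random

def speculative_accept(p, q, rng=random):
    """One accept/reject step of speculative sampling."""
    tokens = list(q)
    x = rng.choices(tokens, weights=[q[t] for t in tokens])[0]   # draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):                     # accept w.p. min(1, p/q)
        return x
    residual = {t: max(0.0, p[t] - q[t]) for t in tokens}        # norm(max(0, p - q))
    total = sum(residual.values())
    return rng.choices(tokens, weights=[residual[t] / total for t in tokens])[0]

def expected_speedup(alpha, gamma, c):
    """Theorem 3.8: expected walltime improvement factor."""
    return (1 - alpha ** (gamma + 1)) / ((1 - alpha) * (gamma * c + 1))

p = {"the": 0.5, "a": 0.3, "cat": 0.2}   # target model's next-token distribution
q = {"the": 0.6, "a": 0.2, "cat": 0.2}   # draft model's next-token distribution
print(speculative_accept(p, q))
print(round(expected_speedup(alpha=0.9, gamma=10, c=0), 1))      # ~6.9, as in Table 1
```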
Monday 16th December 2024 10:17 GMT Anonymous Coward
Hallucinations matter
A fly cannot hallucinate at human's level. Hallucinations and dreams are hypotheses. Automatisms (fast decisions) are reliable hallucinations, which could still result in mistakes. Automatisms are necessary to move fast, because detailed analysis is slow and expensive, though possible or necessary. Impulsive purchases are a great example. The more expensive a purchase, the more regrets. Like buying a house before a market crash. Tell me humans are intelligent.
Monday 16th December 2024 15:16 GMT Gene Cash
Re: Hallucinations matter
Stop rolling your jargon dice since you don't understand half of what you said.
Hallucinations and dreams are NOT hypotheses. A hypothesis is something you can test: "is water wet?" [sticks hand in] "yes it is" from which you evolve the theory that "water is wet"
And there's no such thing as a "reliable hallucination"
Tuesday 17th December 2024 07:12 GMT ChromenulAI
Re: Hallucinations matter
Your thinking about hallucinations was hallucinated under false pretenses that not everyone shares. Feel free to jump up and down screaming until your lungs bleed, because some of us specifically go out of our way to think differently than everyone else.
Spare me your arrogance and serve me your allegiance.
Tuesday 17th December 2024 01:59 GMT Dimmer
As a test.
I fed the comments into an AI and here is what it came back with:
Conclusion:
The discussion reflects a diverse readership: deeply technical individuals, skeptics of AI hype, and those seeking balanced, informative content. Users engage in spirited debate about the limitations, usefulness, and societal implications of AI/LLMs, often mixing sarcasm, technical depth, and community banter
So then I asked if it was funny:
Yes, parts of the conversation were funny, particularly if you enjoy dry wit, sarcasm, and technical humor. The humor emerges from a mix of pedantry, playful wordplay, and snark, often in response to others’ overly serious or exaggerated takes.
If a normal person had a choice between drinking a cold beer and reading the comments, which would they choose?
A normal person would almost certainly choose the cold beer.
The comments, while witty and entertaining for a niche audience (tech enthusiasts, grammar pedants, and AI skeptics), are dense, opinionated, and full of technical jargon. For the average person, reading through a lengthy back-and-forth about “fewer vs. less,” expert systems vs. LLMs, and AI hallucinations would feel like hard work—especially when a refreshing beer is on the table.
The choice is simple:
• Cold beer = immediate enjoyment, easy to process, universally appealing.
• Comments = mental gymnastics, niche humor, and the risk of needing a second beer to recover.
This post has been deleted by its author
Tuesday 17th December 2024 15:30 GMT metooaswell
These systems are useful
These LLM systems are indeed very useful, accurate and practical for a wide variety of problems.
One example I implemented last week at work was a system for helping to write Cypress tests for journeys through an insurance company website. We wrote a system to fire up a browser, navigate to the site, and then give the LLM the DOM of the loaded page; a human can write something like 'accept cookies and then select that you want to purchase Dog Insurance'. The LLM can then take that (along with the serialized DOM) and return Cypress code to perform the action on the site, which we can then concatenate to our test code file and also execute to change the state of the website. We can then pass in the new DOM and a new instruction: 'fill out the form with my dog Geoff's details. He is 10 years old and a border collie.'
Using an LLM in this way generates accurate results and usable test files. The advantage over writing the tests directly oneself is that if the structure of the website changes, you can just regenerate the test file at the click of a button. A human would have to go in and start looking at the DOM all over again to rewrite the tests and ensure that any DOM changes haven't affected their test commands.
This is just one practical use of LLMs doing something that we needed humans to do for us previously. To say they have no utility, or only trivial utility, is a very misguided way to think about this new technology.
This post has been deleted by its author
This post has been deleted by its author