The Register Home Page

* Posts by MonkeyJuice

492 publicly visible posts • joined 4 Nov 2016

Database world trying to build natural language query systems again – this time with LLMs

MonkeyJuice Silver badge

Re: Here we go again

We've been circling this drain for over 50 years.

Mythos found 271 Firefox flaws – but none a human couldn’t spot

MonkeyJuice Silver badge

Re: "AI means developers finally have a chance to get on top of security"

If there's one thing using Opus 4.6 has taught me, it's that it sure as hell has a relaxed attitude to implementing secure code. (Better than some LLMs, no worse than others.)

MonkeyJuice Silver badge

Re: Not sure what his statement means

Until I see multiple detailed write-ups of what Mythos is being given and producing, with concrete PoCs that actually expose the issue, I'm still taking this with multiple container ships of salt. But this does expose the rather awkward hypothetical: at best (assuming any of this is true), we can build idiots that can kick down barns, but we appear to be drawing a blank on systems that can write and maintain secure code.

I meant to do that! AI vendors shrug off responsibility for vulns

MonkeyJuice Silver badge

"Please pay attention to us! I know we wrote a C compiler that doesn't do anything when you enable optimizations for $20,000, and think ncurses is a game engine and that terminals need to refresh at 60hz, but we're gonna hack you all because we have made an elite hacker ai now this time for realisies! only you can't see it, because it's too powerful. and we haven't actually got any exploits to show you that it actually made, but... but..."

Unfortunately the media haven't figured out that AI companies are as compulsive shit talkers as The Donald, so we've got yet another week of END OF COMPUTER? EXPERT SAYS YES articles shoved in everyone's faces to look forward to.

Hooray.

The future is fucking stupid.

Locked-out iPhone user tells The Reg that Apple is scrambling to fix character flaw passcode bug

MonkeyJuice Silver badge

Re: Silly question as a non iPhone user

I believe the issue is that until the PIN is entered, iOS doesn't initialize any connected USB devices. The fact that Apple themselves are actually addressing this bug implies that even their support engineers fell off the flowchart into "Buy a new phone" territory.

Would you like fries with that terminal?

MonkeyJuice Silver badge

Re: This is Linux ... I know this!

And the silly 3D file manager, FSN, really existed. I don't remember anyone actually using it in anger, but it was fun to show off to the Windows and Mac luddites.

Anthropic won't own MCP 'design flaw' putting 200K servers at risk, researchers say

MonkeyJuice Silver badge

"curl https://www.fubar.com/snafu.sh | sh".

You forgot sudo

Nobody knows how many CVEs Anthropic's Project Glasswing has actually found

MonkeyJuice Silver badge

Re: A Balanced Response

The problem is the AI industry has been crying wolf for the last three years. If this is a serious threat, then we need a tranche of evidence. This is not too much of an ask, but if it is as good as claimed, there will be actual auditable information that can verify it soon enough, and the 50-odd companies with access should be able to flush out a statistically significant number of new bugs.

Alternatively, it sure is getting closer to Anthropic's IPO, so expect sudden downpours of bullshit, and don't forget your umbrellas...

MonkeyJuice Silver badge

Re: It could be that CVE's are not being published.. as AI can deduce Exploits from CVE's

That's an interesting analysis, for sure. But it's worth noting it's measuring penetration testing ability, rather than locally sourced, organic zero day exploit capability. This is the wild, unproven 'the sky is falling' claim from Anthropic that has the less astute outlets clutching their pearls this week.

I vibe coded a feed reading web app. It was enlightening and uncomfortable

MonkeyJuice Silver badge

Re: You wins some you lose some

This is forever the problem. LLMs can only look at your code through a toilet roll, and rely entirely on local tactical decisions. Once your codebase no longer fits on a napkin, correctness exponentially deteriorates, and token costs shoot through the roof.

Once you hit this problem, the sheer effort involved trying to trick the thing into focusing on the problem becomes silly. It is false laziness. Unless something dramatically changes, I can't see it moving beyond this point.

For throwaway scripts ("plot this", "perform this transformation on this data", "write a quick and dirty RSS/Atom aggregator") it can be useful, but if it doesn't get it right first time, you must immediately stop and take over yourself, otherwise you're effectively glued to a roulette wheel that is shotgun debugging your entire codebase.

RAG is perhaps the biggest disappointment, because having a reliable data librarian tool could genuinely be useful for most industries. TreeRAG doesn't work. GraphRAG also doesn't work. Every quarter there's a new proposal that is 'definitely' going to fix it this time around. But this technology is purely a research curiosity at the moment, and people just don't seem to grasp how long it can take before a new idea makes realistic headway.

Something fundamental is missing, and 'throwing more data at it', and 'make model bigger' is not cutting it.

MonkeyJuice Silver badge
Trollface

Obvious troll is obvious.

Fewer than 3 in 10 register for HMRC's Making Tax Digital shake-up

MonkeyJuice Silver badge

Re: Digital ID

You've always needed a Unique Taxpayer Reference to self-report.

Suits won't quit AI spending, even if they can't prove it's working

MonkeyJuice Silver badge

Re: 'strategic enabler for enterprise‑wide transformation'

I think you've still added a lot of semantics to this that aren't apparent in the surface syntax.

Strategic: Could equally be a strategic disaster.

Enabler: Could just enable said disastrous strategy.

Transformation: Could transform said business from 'profitable' into 'bankrupt', due to aforementioned disaster enablement.

In total: We're going to fire everyone across the entire enterprise with any clue, and will be liquidated by Q4.

Shots fired – literally – over proposal to build datacenter in Indianapolis

MonkeyJuice Silver badge
Facepalm

Come on Jelly, you aren't so silly as to forget that the Republicans and Democrats switched sides after the civil rights movement, are you? The Democrats of the civil war became the Republicans of today.

This is a low-effort troll, even for you. I expect better even from Musketeers.

MonkeyJuice Silver badge

Re: Pre-emptive self defence?

I always found it unsettling when I lived in the US that not only can bullets go through doors, but indeed, all of the walls too. As a Brit, I'd not given brickwork much thought until I was faced with the idea that some dunce messing about could accidentally discharge a rifle into the living room from a couple of doors down.

Artemis II snaps eclipse, Earthset shots on first crewed lunar flyby since Apollo

MonkeyJuice Silver badge

Re: The sphere of lunar influence?

At which point it would appear to require infinite energy to separate them, which demonstrates why doing the maths with point masses only gets you so far...

Japan relaxes privacy laws to make itself the ‘easiest country to develop AI’

MonkeyJuice Silver badge
Flame

It must be absolutely great waking up as a privacy campaigner to the news that everything you've achieved in the past 20+ years has been scrapped overnight because some coke fuelled tech bros in Silicon Valley gave the government a case of the FOMO.

NHS staff resist using Palantir software

MonkeyJuice Silver badge
Mushroom

Re: Wouldn't matter who it was

Uptime at literally any cost...

They thought they were downloading Claude Code source. They got a nasty dose of malware instead

MonkeyJuice Silver badge

Re: Pro tip

To be fair, even with a source only build, it would be trivial to drop something nasty onto your disk as part of the build process. You really have to be cautious with these sorts of things.

Claude Code source leak reveals how much info Anthropic can hoover up about you and your system

MonkeyJuice Silver badge

Anthropic goes nude, exposes Claude Code source by accident

MonkeyJuice Silver badge

Re: Wait...what?

Not just that. 512,000 lines of TypeScript.

I don't know of many times a team I was in casually sat down and wrote 512kloc, and that includes verbose monsters like C++. Normally that sort of thing takes years to accrue.

AI truly is the future of software developer 'productivity'.

Ubuntu 26.04 beta arrives packing GNOME 50, which no longer supports Google Drive

MonkeyJuice Silver badge

Re: Last version without forced age verification?

That's an optional field. You also don't have to enter your full name, you know. Wake me up when PII is mandatory, as you originally stated.

MonkeyJuice Silver badge
Devil

Re: Last version without forced age verification?

I have a cool £500 that says it won't do that within N years. Stop clutching your pearls or show me your money.

Supply chain blast: Top npm package backdoored to drop dirty RAT on dev machines

MonkeyJuice Silver badge

The silliest part of all of this is there hasn't really been a need for axios for many years, since the fetch API was integrated into Node itself. Of course, ask an LLM to perform an HTTP request, and it'll almost always cargo cult it into your dependencies.

Iran war drives urgent need to counter underwater attack drones

MonkeyJuice Silver badge

Re: Here's my revolutionary, never-been-tried-before solution:

"and still a hundred million looking for somebody to pull their trigger."

No, you need them after the gun you are shooting has run out of rounds and you toss it.

Struggling to put your AI aversion into words? Here's a handy glossary

MonkeyJuice Silver badge
WTF?

Re: Compared with what?

"And, to return to my opening argument re. the Turing Test, this means that if we cannot tell the difference, then what, exactly, is the difference, and how does it matter, if it matters at all?"

Well. It's the difference between a criminal damage charge and first degree murder, for a start. Are you suggesting this distinction needs revisiting?

MonkeyJuice Silver badge

Of course the real tragedy here is A* would never have suggested this slop.

Meatbags vs machines: DeepMind plans hackathon to draw line between human and AI brains

MonkeyJuice Silver badge
FAIL

Wow. Better keep those charts hidden. Even from psychologists. They'd beat the living shit out of someone for something that fluffy.

Don't tell me. We'll have a whole new set of benchmarks along each dimension to show how awesome 'AGI' is, and yet weirdly when we try to apply it to anything other than running benchmarks, it'll just drool down its chin.

AI for software developers is in a 'dangerous state'

MonkeyJuice Silver badge

"AI is in a dangerous state where it is too useful not to use"

So says Birgitta Böckeler, 'global lead for AI-assisted software delivery at Thoughtworks'.

But with that title, her entire job's existence is predicated on this being true. So she would say that, wouldn't she?

I bet she's not the one on call on Christmas eve when it all goes titsup.

Mistral boasts code-proofing agent offers champagne performance on a budget bière

MonkeyJuice Silver badge

Re: Intelligence required

Well. It has a 31.9% chance to fiddle with Coq and after 16 attempts port some simple operational semantics, such that it can verify that 3 + 2 == 5.

No data structures? How about closures? Type theorists need not apply.

I don't mean to put a downer on this, but it feels like there's probably an easier way, and this is maybe not most people's use case. This is all pretty fundamental stuff (but niche outside of a first course in semantics). It's also yet another pointless benchmark to add to the 'marking our own homework' drawer.

It's going to be a PITA to go from this to proving that append is associative. And then... who cares anyway?

AI still doesn't work very well, businesses are faking it, and a reckoning is coming

MonkeyJuice Silver badge

that's neat but...

LLMs regularly give you an O(n^2) solution and claim it's O(log n). They're really good at implementing the naive algorithms, because there are lots of repos to copy them from, whereas the O(log n) implementations are fiddly to get right, and considerably beyond the abilities of the damn things.

Why not just use SQLite? Not only did they spend the time figuring out how to implement it correctly by studying the theory and algorithms, they benchmarked it, ensured it didn't randomly truncate the database file on Tuesdays every third month, and have spent years fielding bugs and fixing minor edge cases. It's also free, rather than spending $10k on tokens over the course of the project that you will, in the end, have to ditch anyway, once the true horrors of the eldritch monstrosity you have created become apparent.
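For what it's worth, a minimal sketch of the "just use SQLite" route, using Python's built-in sqlite3 module. The table name `kv`, the sample rows and the `lookup` helper are all made up for illustration. The only point is that declaring a PRIMARY KEY gets you an indexed B-tree lookup without hand-rolling one.

```python
import sqlite3
from typing import Optional

# Hypothetical in-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")

# A TEXT PRIMARY KEY makes SQLite maintain a unique B-tree index on `key`.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

# Made-up sample data: k0..k999 mapped to v0..v999.
conn.executemany(
    "INSERT INTO kv (key, value) VALUES (?, ?)",
    [(f"k{i}", f"v{i}") for i in range(1000)],
)
conn.commit()


def lookup(key: str) -> Optional[str]:
    """Return the value stored under `key`, or None if it is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

Calling `lookup("k42")` goes through the index rather than scanning the table, and that part has already been written, benchmarked and debugged by someone else.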

MonkeyJuice Silver badge

That's [whatever AI is]. What we HAVE is transformer models. Hooray, we're in the future.

MonkeyJuice Silver badge

Re: The next year 2000?

I think, given the sheer amount of opaque/insecure/buggy code these things generate, and the fact they tend to want to change every single line in every single file per commit, if you tip over your stack with this stuff the only solution will be to burn it to the ground and start again. You might be able to roll back to a working commit, but trying to rationalize all the data it'll have mangled will be a nightmare from hell.

Sounds expensive. I wouldn't like to be THAT company.

MonkeyJuice Silver badge

Only if these humans we're trying to replace recently suffered life-changing head trauma.

MonkeyJuice Silver badge

Re: Agreed

ML or LLMs?

One of these technologies has a future. The other one is being funded.

MonkeyJuice Silver badge

Re: bring it on

This. Plus, when I am pair programming, or bringing a stuck server back up, I expect some rational pushback, or at least to be asked to clarify what I'm doing. Humans (generally) do this when they're not 100% sure of your next move. When an LLM does it, it means it's obviously way out of its depth. Which is always. The only time they ever appear to push back is when they are being stupid, at which point you're arguing with a text generator, which is insane behaviour.

Former Microsoft dev trains AI to survive the arcade's most chaotic stress test

MonkeyJuice Silver badge

Re: Damn I'm old!

Still one of the finest two player shoot 'em ups. Provided both of you are not prone to epilepsy...

Bank built its own threat hunting agent because vendors can’t keep pace with new threats

MonkeyJuice Silver badge

Re: Remind me again?

n) Because an LLM will stroke your ego and make you feel far, far smarter than you are actually being, and we're allergic to gentle pushback in this day and age and view it as a personal attack. "But ChatGPT gets me".

Users protest as Google Antigravity price floats upward

MonkeyJuice Silver badge

But the models _aren't_ getting better. That's the whole point.

On Memorization of Large Language Models in Logical Reasoning

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

It's really easy to have impressive results when you train your model on the entire benchmark itself. For example, MMLU has been steadily increasing to saturation point (~95% accuracy). Release an identical problem set (HLE), and suddenly it drops to 10%. If they were solving MMLU because they understood the questions, you shouldn't see anywhere near this brittle performance cliff.

Back in the 60s, psychologists applied the scientific method to psychics, and found positive correlations with things like ESP and telekinesis, yet we don't discuss this anymore. Is it a CIA coverup? No! When you are studying a chemical or physical phenomenon, you're studying something that isn't ACTIVELY trying to deceive you. The research methods simply weren't equipped to deal with charlatans. As soon as they tightened up the experimental conditions, suddenly the evidence of psychic phenomena vapourised.

So too with AI benchmarks. When you have private hold-out tests that the AI companies cannot get at, the performance results tank back to 'random chance'.

It's kind of embarrassing that tenured professors are being hoodwinked like this, but hey, it happened back in the 60s, and it's happening now. Fortunately, like before, people are figuring out ways to expose the fraud. There is no intelligence here, it was just the Eliza effect, after all.

MonkeyJuice Silver badge

The senior dev wants another $25k to return to the job they quit, as danger money for working in an industry that has become exponentially more idiot filled. At least as a plumber, people thank them when saved from their own effluent.

'Are you freaking crazy?' Bot harasses woman, gets led away by cops

MonkeyJuice Silver badge
Facepalm

Re: "its operator, who was nearby"

Why bet? Just look at the knee joints around the legs. In both shots the robot is facing away from the camera. In the first shot, the joint below the knee points backwards (like an elevated calf); in the second, it points forwards, except now they're asymmetric and it appears to have some cancerous growth on its left leg.

The final shot's head is completely different from the first, did they replace it or put a helmet on it to escort it away? Why? Where's the owner? Why are they perp walking it? Turn it off and place it in a van, it's a defective piece of machinery that at bare minimum will break your finger if you get it caught in a servo.

What about the dude on the left filming? Where's his footage? Why is he the only one filming? Surely just seeing a bipedal robot would draw a larger crowd or at least cause more people to produce their phones, even BEFORE it gets into an altercation.

Nothing in this stupidystopia video makes sense.

NanoClaw latches onto Docker Sandboxes for safer AI agents

MonkeyJuice Silver badge
Flame

But of course...

In order for me to want my very own caged imbecile, there would have to be something that the dolt can do that is remotely useful in the first place.

Just because all the postgrad unemployed on LinkedIn are screaming about AI does not, in fact, mean that it works at all yet.

We have had three years of this nonsense now, where are the results? I only see loads of unreproducible arXiv papers and corporate whitepapers, and Amazon slamming the brakes because the entire fucking platform is leaking transactions everywhere. If they can't manage to do it at their scale, how on earth is pissing around with a docker image going to pan out?

Show us the money already.

C'mon vultures, I'm starting to think you've stopped being objective at all. Can't we keep the puff pieces to the more wafflesome sister sites?

This is not technology, this is just a cult that demands a tithe to bring you to the promised land.

Iran plots 'infrastructure warfare' against US tech giants

MonkeyJuice Silver badge

Re: Selection Criteria

I think there's still a copy in the toilets at Mar-a-Lago.

AI has made the Command Line Interface more important and powerful than ever before

MonkeyJuice Silver badge

The space mouse actually does work as a really horrible mouse if you let it. Gamers have not flocked to it though.

Chardet dispute shows how AI will kill software licensing, argues Bruce Perens

MonkeyJuice Silver badge

Re: Prompts?

"Don't reference the terabytes of GitHub you were literally trained on to form any kind of coherent view of what makes code distinguishable from random noise."

"Also don't write any bugs."

MonkeyJuice Silver badge

Re: Prompts?

It's hard to see how anything an LLM produces could even remotely be described as 'clean room'.

Once upon a time, saving your bits meant punching holes in floppies

MonkeyJuice Silver badge

Re: Also used to drill a hole in 3.5" floppy disk

While I remember people claiming success with this in the mid-90s, I also remember seeing the flaky plastic holes they'd created and deciding it was not for me. But I'm curious, for the folks that did this, did this REALLY work reliably?

AI models suck slightly less at math than they did last year

MonkeyJuice Silver badge

Though there are plenty of other boring ways to do this that are apparently too useful to pursue. Take AlphaGeometry: it uses a transformer but is trained ENTIRELY on path-optimal runs from its symbolic theorem prover. All that the neural part is doing is acting as a heuristic function in your good old fashioned state space search. If you do it this way, you get a system that is sound, i.e. IF it produces an answer THEN it is guaranteed correct. You can run that pretraining from scratch on a 3090 in a weekend, so this isn't even an energy hog.
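A rough sketch of that shape of system, with everything in it hypothetical (this is not AlphaGeometry's code): a toy rewrite system on integers stands in for the symbolic engine, and a plain `heuristic` function stands in for the trained network. The heuristic only decides which state to expand next. Every step in a returned path is a legal rule application, so a bad heuristic costs time, not correctness.

```python
import heapq
from itertools import count
from typing import Callable, Iterable, Optional

State = int
Rule = tuple[str, Callable[[State], State]]

# Hypothetical toy "symbolic engine": the only legal moves on a state.
RULES: list[Rule] = [
    ("double", lambda n: n * 2),
    ("add3", lambda n: n + 3),
]


def heuristic(state: State, goal: State) -> float:
    """Stand-in for the learned model: a lower score means 'more promising'."""
    return abs(goal - state)


def best_first(start: State, goal: State, limit: int = 10_000) -> Optional[list[str]]:
    """Best-first search: return rule names leading from start to goal, or None."""
    tie = count()  # unique tie-breaker so the heap never has to compare paths
    frontier = [(heuristic(start, goal), next(tie), start, [])]
    seen = {start}
    while frontier and limit > 0:
        limit -= 1
        _, _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for name, apply in RULES:
            nxt = apply(state)
            if nxt in seen:
                continue
            seen.add(nxt)
            heapq.heappush(frontier, (heuristic(nxt, goal), next(tie), nxt, path + [name]))
    return None  # gave up within the expansion budget


def check(start: State, path: Iterable[str]) -> State:
    """Replay a path using only the symbolic rules, ignoring the heuristic."""
    table = dict(RULES)
    for name in path:
        start = table[name](start)
    return start
```

`check(1, best_first(1, 10))` replays whatever path the search found through the rules alone, which is the soundness property in miniature: swap in a worse `heuristic` and the search gets slower or gives up, but it cannot return a path that fails the replay.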

However, this is all too uncool for anyone, because it's domain specific and "NOT AGI ENOUGH", and all the large academic institutions have devolved into dicking around with prompting GPT-5, and then being unable to reproduce each other's results on systems they don't even get to look inside or know how they are trained.
