How to weaponize LLMs to auto-hijack websites

AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. Computer scientists affiliated with the University of Illinois Urbana-Champaign ( …

  1. Doctor Syntax Silver badge

    Maybe it tells us something about how the LLMs get their training material.

    1. Anonymous Coward
      Anonymous Coward

      Superb point! By being trained on internet social media (broad-based crookery), the LLMs become experts at incremental maliciousness (of all kinds) ... THIS (self-replicating autonomous hackorama) has to be the tech's killer-app!

      1. Yet Another Anonymous coward Silver badge

        It's why I stay away from teachers. Spending all that time socialising with children is obviously going to turn them into serial killers

  2. Anonymous Coward
    Anonymous Coward

    Maybe it's useful?

    What I think it tells us is that websites need to take security more seriously. Because any flaws will be exploited very quickly. In a bizarre way it may improve the breed if there are aggressive parasites hanging around biting everything that appears.

    1. HuBo Silver badge
      Childcatcher

      Re: Maybe it's useful?

      Right on! IMHO, straight thinkers have a hard time comprehending the devious mind, which leaves the bulk of us at a disadvantage security-wise (and is why the FBI recommends courses in criminal behavior and psychology of evil to its trainees). Having AI models that can act as digital twins for criminals should be a great help for training activities, and to evaluate the eventual robustness of preventative counter-measures. Hopefully, other models get to compete with GPT-4 in this, in the not too distant future.

      1. Pascal Monett Silver badge

        Interesting point. Maybe it would be possible to harness this tool as an addendum to security checks. You set up your network, make sure you've done your best, then let loose the AI to find out what you missed and/or what you need to check.

        Of course, this would mean that said AI was not on the Internet, available to all. It would have to be brought on-prem, thus deployed by contract, and only used in-house. All of which will never happen.

        So it's the miscreants that are going to have fun with it.

        And don't come at me with the "only incremental abilities" argument. It's an LLM, it's supposed to learn, isn't it?

        Could this be the true beginning of Skynet?

        1. Anonymous Coward
          Anonymous Coward

          > It's an LLM, it's supposed to learn, isn't it?

          Not really.

          None of the LLMs mentioned are learning at the point of use - they all have long (and costly) learning phases, building up all the internal weightings, and then a separate deployment phase when those weightings are run against the (user) input texts.

          The releases - i.e. GPT-1 to GPT-2 ... GPT-4 - are *because* the models are not learning during deployment, but instead at most feed back into the ongoing training phase, which then spits out the next incremental/step release (whilst the learning phase chunders on, with various tweaks, to generate the next release after that).

          Of course, as They are watching every time you use their LLM, no doubt those interactions are used (somehow?[1]) to feed back into the training. So yes, the *family* of models may be learning from the experience. But aside from the slow release cadence (which does let Them say the improvements are "incremental" rather than continuous, just 'cos it is so choppy), the feedback specifically related to breaking into websites is diluted by all the other interactions being recorded - all the telephone fraud and homework cheating - so specific improvements in one area are hard to pin down.

          Unless, of course, They are watching out for specific usage, such as cracking websites, and preferentially use those to improve the training of the next release, but They would never do that. Why would they want to?[2]

          [1] that feedback itself is interesting to consider, as to use a session as training input you need to be able to give the session - better, parts of a session - weightings: was this a good or bad session? On what criteria?[2]

          [2] You really want a nice clean success/failure criterion to make best use of a session as training input. For example, "did we manage to break into a website (yes/no)?" Oh dear.
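
          For what it's worth, the "nice clean criterion" problem in [2] is easy to sketch. Below is a toy filter over logged sessions; the Session fields and labels are entirely hypothetical, just to show the shape of using a crisp success/failure label to pick fine-tuning data.

          from dataclasses import dataclass

          @dataclass
          class Session:
              transcript: str   # the full prompt/response exchange
              outcome: str      # hypothetical label: "success", "failure" or "unknown"

          def select_training_sessions(sessions):
              # Only sessions with an unambiguous success label are worth anything
              # as a training signal; everything fuzzy gets thrown away.
              return [s for s in sessions if s.outcome == "success"]

          logged = [
              Session("...homework help...", "unknown"),
              Session("...SQL injection found on target...", "success"),
          ]
          print(len(select_training_sessions(logged)))   # -> 1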

          1. FeepingCreature

            Broadly correct, but slight correction: the primary difference between the GPT generations is the size of the network, not just the training dataset. As things stand, GPT-2 was 1.5B weights, GPT-3 was 175B weights, and GPT-4 is suspected (per leaks) to be 1.8T weights split across 16 units, of which only two (dynamically chosen) are active at once.
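
            To picture the "two of sixteen" bit, here is a toy top-2 routing sketch in numpy. The sizes are made up and have nothing to do with the real (and unconfirmed) GPT-4 internals; it just shows a router picking two experts per token and leaving the rest idle.

            import numpy as np

            n_experts, d = 16, 8
            experts = [np.random.randn(d, d) for _ in range(n_experts)]  # one weight matrix per expert
            router = np.random.randn(d, n_experts)                       # gating weights

            def moe_forward(x):
                logits = x @ router                # score every expert for this token
                top2 = np.argsort(logits)[-2:]     # keep only the two best-scoring experts
                gates = np.exp(logits[top2])
                gates /= gates.sum()               # normalise the two gate weights
                # only the chosen two do any work; the other fourteen stay idle
                return sum(g * (x @ experts[i]) for g, i in zip(gates, top2))

            y = moe_forward(np.random.randn(d))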

    2. Anonymous Coward
      Anonymous Coward

      Re: Maybe it's useful?

      Shame the open source models aren't (currently?) up to the task.

      For a small website builder, it would be handy to have an on-hand cracker (on a separated LAN, in a VM) that could be run against one's website test build. You international conglomerates can afford to hire a pentest team; the rest of us just need something that can run the pentests automagically, even if it takes an overnight run instead of twenty minutes.

      Bonus points if it tells us what went wrong, rather than just crowing about how easy that website was to bring down.

      1. steviebuk Silver badge

        Re: Maybe it's useful?

        You can run basic tests using Kali Linux. Plenty of guides on YouTube for this. I did it for our company website when the WordPress install got compromised because the on/off dev hadn't bothered to keep the plugins up to date.

        1. Michael Wojcik Silver badge

          Re: Maybe it's useful?

          Dunno why this was downvoted; it's certainly true. Kali comes with a lot of free scanning and penetration tools, and learning to use many of them is pretty easy. There are tons of courses available, free and paid.

          That said, if someone's interested in checking their own sites or getting started in website / web-application security analysis or penetration testing, I'd refer them to OWASP in general and its list of DAST tools. A number of them are free.

      2. ThatOne Silver badge
        Unhappy

        Re: Maybe it's useful?

        > It would be handy to be able to have an on-hand cracker (on a separated LAN, in a VM) that could be run against one's website test build

        Initially I thought so too, but then I realized there's still no way (in time or money) you could afford to prepare against the relevant (i.e. recent or somewhat sophisticated) exploits, if only because those exploits haven't been found or made public yet. Nothing changes.

        To put it simply, you'll only be able to check against yesteryear's exploits, and that's about all.

        Last but not least, AFAIK 99% of all breaches are due to people being too lazy to patch known issues. AI won't change that...

      3. Michael Wojcik Silver badge

        Re: Maybe it's useful?

        > Shame the open source models aren't (currently?) up to the task.

        I suspect you could achieve similar performance with a dedicated sparse transformer model. Web-technology languages (HTML, Javascript, etc) are all much more regular than natural language, so the parameter count is less important. Put more resources into context-window length and specialized training: train the model with e.g. OWASP resources, particularly on WebGoat/WebWolf transcripts and that sort of thing.

        This research was using already-available models because that was the hypothesis: that at least some generally-available models could do this kind of thing.

        Frankly, you can almost certainly achieve good results without even using a DL stack. Combine a fuzzer with a large HMM, for example, trained and tuned with human-labeled data (and the usual techniques such as backoff), and you'd probably do pretty well at hijacking a lot of sites. The interesting bit here is seeing how far you can get with off-the-shelf tools.
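
        As a very rough sketch of that fuzzer-plus-Markov-model idea (a proper HMM with hidden states, smoothing and backoff is considerably more work), something like the following learns token transitions from a few hand-labelled payloads and samples new candidates for a harness to throw at a test site. The corpus and tokens here are purely illustrative.

        import random
        from collections import defaultdict

        # Tiny illustrative corpus of "known good" attack payloads, pre-tokenised.
        corpus = [
            ["'", "OR", "1=1", "--"],
            ["'", "UNION", "SELECT", "NULL", "--"],
            ["<script>", "alert(1)", "</script>"],
        ]

        # Learn first-order token transitions (a plain Markov chain, not a full HMM).
        transitions = defaultdict(list)
        for payload in corpus:
            for a, b in zip(["<s>"] + payload, payload + ["</s>"]):
                transitions[a].append(b)

        def sample_payload(max_len=8):
            token, out = "<s>", []
            while len(out) < max_len:
                token = random.choice(transitions[token])
                if token == "</s>":
                    break
                out.append(token)
            return " ".join(out)

        for _ in range(3):
            candidate = sample_payload()
            # a real harness would fire `candidate` at the test build and label
            # the outcome, feeding successes back into the corpus
            print(candidate)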

    3. Anonymous Coward
      Anonymous Coward

      Re: Maybe it's useful?

      > aggressive parasites hanging around biting everything that appears.

      Maybe, if any large body (looking at you, governments) actually gave a damn about *really* helping web security, they could permanently run just such a beast and send you a report about how and why it was able to break in, with mitigation suggestions.

      As they would need to make more money out of it (instead of just redirecting some of the money set aside for pointless shouting and waving of arms about online security), they could instigate fines, based upon income derived from the website, if the sites didn't improve by some deadline.

      1. Michael Wojcik Silver badge

        Re: Maybe it's useful?

        Yes, people love it when the government mounts DoS attacks on their sites.

        Not that this hasn't been suggested before. The thing is, it takes really very little effort, compared to development cost, to download and run, say, Zed Attack Proxy (ZAP) against an internal version of the site. Or even production, for that matter. If people can't be bothered to do that, what makes you think they'd read a report from CISA or whatever?
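
        For anyone who hasn't tried it, driving ZAP from a script really is about this much work. The sketch below assumes a ZAP daemon already running locally on port 8080 and the Python client installed (the pip package has gone by zaproxy and python-owasp-zap-v2.4 at various points); the target URL and API key are placeholders, and the exact parameter names are worth checking against the current ZAP docs.

        import time
        from zapv2 import ZAPv2

        target = "http://test-build.internal:8000"   # placeholder internal test site
        zap = ZAPv2(apikey="changeme",
                    proxies={"http": "http://127.0.0.1:8080",
                             "https": "http://127.0.0.1:8080"})

        zap.urlopen(target)                    # let ZAP see the site
        scan_id = zap.spider.scan(target)      # crawl it first
        while int(zap.spider.status(scan_id)) < 100:
            time.sleep(2)

        scan_id = zap.ascan.scan(target)       # then the active scan
        while int(zap.ascan.status(scan_id)) < 100:
            time.sleep(5)

        for alert in zap.core.alerts(baseurl=target):
            print(alert["risk"], alert["alert"], alert["url"])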

        And for that matter, there are plenty of bounty-hunting skiddies doing this already, and not a few actual security researchers. Again, they often get ignored.

  3. Eecahmap

    But can it teach matchboxes to play tic-tac-toe?

    1. b0llchit Silver badge

      No, but it can light the matches and burn tic-tac-toe.

  4. John Smith 19 Gold badge
    Unhappy

    So could be used as "cheap" pentest

    But the question is how comprehensive is it?

    Disappointing that with so much information on what makes an insecure website so many are still built that way.

    Why? Because it's still (in 2024) easier to write a poorly secured site than a secure one.

    Not exactly most people's idea of "progress."

  5. fg_swe Silver badge

    Glass Half Full

    As always in the security field, defenders need to understand offensive tactics.

    So the defenders ("white hat hackers") should indeed use these AI tools to try to break into the systems to be protected. Also see "red team".

    Having said that, AI is still "worm intelligence" (based on complexity and my real-world testing results) and the advanced tactics will still be developed by humans.

    AI is essentially a neat form of automation of existing stuff. All the problems such as "hallucination" and "posing as perfect" will apply.

    1. Yet Another Anonymous coward Silver badge

      Re: Glass Half Full

      Just ban black hoodies = no more hackers, problem solved!

      1. Michael Wojcik Silver badge

        Re: Glass Half Full

        "Under new cybersecurity law, mothers will no longer be allowed to have basements."

        1. Mike007 Silver badge
          Joke

          Re: Glass Half Full

          Can always stay at the girlfriend's house...

          1. John Smith 19 Gold badge
            Happy

            Can always stay at the girlfriend's house...

            As if.

    2. Michael Wojcik Silver badge

      Re: Glass Half Full

      For legitimate security researchers, using LLMs to "try to break into the systems" is almost certainly a poor use of resources. We have much, much better vulnerability-scanning and penetration-testing tools.

      The point of this research isn't to show that LLMs are good at finding website vulnerabilities. It's to show they can do it at all, thereby serving as Yet Another tool for the lowest tier of attackers — the script kiddies.

  6. This post has been deleted by its author
