PyTorch dependency poisoned with malicious code

An unknown attacker used the PyPI code repository to get developers to download a compromised PyTorch dependency that included malicious code designed to steal system data. Developers who last week downloaded the nightly builds of the open source PyTorch framework also unknowingly installed a malicious version of the …

  1. bvae8osfhk

    "John Bambenek, principal threat hunter at Netenrich, told The Register that while there are benefits to open source software, there is little institutional protection beyond an almost entirely voluntary effort to address inherent supply chain risks. Until more money is directed to the issue, the problems will continue, Bambenek said."

    The problem with PyTorch is... that there isn't enough money behind it? Bold claim.

    1. Lil Endian

      Bambenek is referring to the supply chain (e.g. PyPI), not an individual package (e.g. PyTorch).

      Regardless, his statement says nothing. Risks apply to closed source supply chains just as they do to OSS. However, exposure/visibility are not the same, nor is funding - and the two are tied. Institutionally, a proprietary for-profit dev house isn't going to advertise its fuck-ups if it can help it, because: ow! profit!

      1. Yet Another Anonymous coward Silver badge

        The difference is pulling in other dependencies.

        Closed source apps - I need to jump through a whole bunch of internal processes to use an external library, and if it's also a closed source product it will take a year for legal/purchasing to sort it out.

        Open source: I'm going to pull in every existing command line/logging/networking/etc lib I can, and each of these is going to pull in others, etc etc.

        For the product I need to have everything provably built from source (I even need a Fortran compiler for a BLAS-based library), but for R&D I can easily see myself leaking .ssh keys cos Spyder included some package which included some other package which was compromised.

        1. An_Old_Dog Silver badge

          Security Model and Automatically Pulling-In Dependencies

          Zeroth, programmers have to learn and accept that in the current threat environment, we can no longer blindly pull in code and use it without understanding it.

          First, managers have to learn and accept this -- and its corollary: coding will take longer than it currently does.

          Second, we've got to replace the blind auto-pull mechanisms currently in use with new mechanisms (programs) which help us review chained external dependencies in an organized and logical way.

          Third, dependency pulls have to be locked to a specific, cryptographically-signed version. In other words, your code might ask for the latest version of YLib, and get joe.smith.foomasters.org/YLib-1.2.2. You'll check that code, and if you accept it, further builds of your product will automatically request joe.smith.foomasters.org/YLib-1.2.2, even if more-recent versions of YLib are in the repo. If you want to use a newer version of YLib, you'll have to "unlock" the dependency-loading mechanism, which will download, say, joe.smith.foomasters.org/YLib-1.3.5. Once you review and approve it, further builds of your product will automatically download that specific version.
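
          For what it's worth, pip can already approximate this with hash-pinned requirements (pip install --require-hashes -r requirements.txt). A minimal sketch of the locking idea itself, with the package name, version and digest below purely invented:

          ```python
          # Sketch of a locked-dependency check: refuse to build with a downloaded
          # artifact unless its version and SHA-256 digest match what was recorded
          # when the code was reviewed. Names, versions and digests are placeholders.
          import hashlib
          from pathlib import Path

          LOCKED = {
              # package: (approved version, sha256 of the reviewed artifact)
              "ylib": ("1.2.2", "placeholder_digest_recorded_at_review_time"),
          }

          def verify(package: str, version: str, artifact: Path) -> None:
              approved_version, approved_digest = LOCKED[package]
              if version != approved_version:
                  raise RuntimeError(f"{package} {version} is not the approved {approved_version}")
              digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
              if digest != approved_digest:
                  raise RuntimeError(f"{package} {version}: digest mismatch, refusing to build")

          # verify("ylib", "1.2.2", Path("downloads/ylib-1.2.2.tar.gz"))
          ```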

          Fourth, a Bill of Materials should be automatically produced as a text file, so a programmer who reads "joe.smith.foomasters.org/YLib-1.2.5 is compromised!" on Slashdot can run a script which greps the BoMs to check which, if any, of the programs s/he's worked on contain the compromised code.
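
          That grep-the-BoMs script really can be a few lines. A sketch, assuming each project keeps a plain-text BoM with one pinned dependency per line (the filename and layout here are assumptions):

          ```python
          # Walk a tree of projects and report which ones list a compromised
          # dependency in their bill of materials. "bom.txt" is an assumed name.
          import sys
          from pathlib import Path

          def affected_projects(root: Path, compromised: str):
              for bom in root.rglob("bom.txt"):
                  if compromised in bom.read_text():
                      yield bom.parent

          if __name__ == "__main__":
              # e.g. python check_boms.py ~/projects joe.smith.foomasters.org/YLib-1.2.5
              root, needle = Path(sys.argv[1]).expanduser(), sys.argv[2]
              for project in affected_projects(root, needle):
                  print(project)
          ```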

  2. Elongated Muskrat Silver badge
    FAIL

    Logical extension of a ML package

    As far as I can tell, ML (such as those "AI" art algorithms) is largely based on slurping up as much "training data" as possible and then denying all knowledge when it produces things that are suspiciously similar to copyrighted works. This is just the logical next step - cut out the middle man and just move directly to stealing valuable data.

    In all seriousness, though - why is this trusting a third-party package repository over an official one, and if they want any credibility at all, why does that third-party repository just allow people to upload anything they like, including packages with identical names to ones in the official repository?

    I expect to be reading more about how their security model is woefully broken in Bruce Schneier's next blog post...

    1. Crypto Monad Silver badge

      Re: Logical extension of a ML package

      For comparison: in Go, module dependencies point to exact versions of external code, with SHA256 checksums. And each module is named using a domain name and path whose ownership is clear: commonly "github.com/<author>/<repo>", although it doesn't have to be.

      It works well.

      1. Elongated Muskrat Silver badge

        Re: Logical extension of a ML package

        Mentally, I'm comparing this to nuget repositories; there are official repos and there are unofficial ones. We wouldn't dream of letting some random submit a package to one of our internal nuget repos, and although I've not tried it, I'm willing to bet that the process of getting something into an official repo at least requires authentication and certification of the package. I also suspect these are curated by Microsoft, and anything dodgy would (hopefully) get removed with short shrift.

        The process is probably pretty analogous to Go repos (I don't really know enough, technically, about either), and whilst I'm sure a third party org could quite easily make a nuget repo publicly available, the question of trust should be foremost in the mind of anyone thinking of using it. In this case, the issue of trust should also have been considered. Not only using untrusted third-party packages, but using that package source in preference to the official one. I get that these are "nightly builds" and not release builds, but still, where's the due diligence?

        Moving fast and breaking things indeed...

        1. Yet Another Anonymous coward Silver badge

          Re: Logical extension of a ML package

          >but still, where's the due diligence?

          How much are you paying for it?

        2. John Brown (no body) Silver badge

          Re: Logical extension of a ML package

          "I also suspect these are curated by Microsoft, and anything dodgy would (hopefully) get removed with short shrift."

          MS don't even QA their own patches these days, so what makes you think they bother to QA other people's patches, especially for OSS projects?

        3. katrinab Silver badge
          Alert

          Re: Logical extension of a ML package

          Aren’t Microsoft responsible for curating npm these days?

          If you want a secure, trustworthy package management system, look at what npm are doing, and do the opposite.

  3. Lil Endian

    Epeen > Research

    @Anonymous Security Researcher: even giving you the benefit of the doubt and believing your claims of research, clearly your epeen is more important to you than following genuine research principles.

    This is how a significant number of viruses came into being during the 80s & 90s: a disenfranchised hacker all alone in their bedroom, wanting to prove their mettle.

    1. Anonymous Coward
      Anonymous Coward

      Re: Epeen > Research

      They only get the benefit of the doubt if they publish their approved thesis documents, so we know who to blame for approving them in the first place with no notification of the package maintainers. Seems some other university did that within the last 2 years and was banned from kernel changes.

    2. heyrick Silver badge

      Re: Epeen > Research

      While there are obvious problems in packaging and how dependencies are pulled in, one should not be able to use "research project" as an excuse for screwing with a live system in use by people. This should be treated as a deliberate malware attack, and this researcher and whatever establishment they're associated with blacklisted.

      1. Elongated Muskrat Silver badge

        Re: Epeen > Research

        Equally, a live system in use by actual people shouldn't be pulling packages from an untrusted, uncurated repository, especially not in preference to trusted packages from an official repository.

  4. Claptrap314 Silver badge

    Plenty of blame to go around

    First, for something like PyPI, it DOES make sense to have a "preferred" repository override stable--that's what nightly builds are for, after all. What does NOT make sense is making said repository open. "Dependency confusion" is NOT something new as of 2021. It's been around for decades. Those claiming otherwise need to buy a badge that says "security" on it.
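
    To make the mechanics concrete: pip's --extra-index-url gives no priority to any one index, so a naive resolver that merges candidates from every index and takes the highest version lets a same-named upload on the open public index win. A toy model (index URLs and version numbers invented for illustration):

    ```python
    # Toy model of dependency confusion: merge candidates from every configured
    # index and blindly prefer the highest version, ignoring where it came from.
    # The index URLs and versions below are made up for illustration.
    CANDIDATES = {
        "https://example.org/nightly-index": {"torchtriton": "2.0.0"},
        "https://pypi.org/simple": {"torchtriton": "3.0.0"},  # attacker's upload
    }

    def naive_resolve(name: str):
        best = None
        for index, packages in CANDIDATES.items():
            version = packages.get(name)
            if version and (best is None or version > best[1]):
                best = (index, version)
        return best

    print(naive_resolve("torchtriton"))
    # -> ('https://pypi.org/simple', '3.0.0'): the poisoned copy shadows the real one.
    ```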

    And yeah, the dev. Utterly irresponsible behavior. If you're going to write POC exploit code, you either put in guards to make **** sure that you are only grabbing data that you in fact have legitimate access to, or expect a knock on the door with an offer you cannot refuse to have your room and board provided by the taxpayer for an extended stay. Do not pass Go. Do not collect $200.

  5. drankinatty

    Accessed $HOME/.ssh/ -- there go your GPG private keys...

    This "Research Project" excuse holds little water if, in fact, it targeted $HOME/.ssh/. That's where your GPG keys live by default (both the PRIVATE key(s) and your public key(s)). Since github, and virtually all ssh-accessible hosts, allow public-key/private-key authentication, with both your keys the "researcher" can, in many instances, simply add your private key to his own $HOME/.ssh/ directory and turn around and ssh into your box or any system you have used public/private key authentication to access. (And just why were the .git config files targeted? Oops, yes: to identify the repositories you have access to.)

    Yikes! And with most of the GPG keyservers down since the 2018 debacle, retracting a key is a thing of the past. Better ssh-keygen again and then withdraw your old public key hash from all authorized_keys files on each server you access. (easier said than done since it is something akin to removing your credit-card number from each site you have purchased from). Unfortunately -- you are the card company regarding your GPG keys.
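
    If you do end up evicting an old key everywhere, here's a rough sketch of the clean-up, assuming Python is available on each host (the fingerprint value is a placeholder; it's what ssh-keygen -lf prints for the compromised public key):

    ```python
    # Drop any authorized_keys entry whose OpenSSH SHA256 fingerprint matches
    # the compromised key. The fingerprint below is a placeholder.
    import base64, hashlib
    from pathlib import Path

    COMPROMISED_FP = "SHA256:REPLACE_WITH_REAL_FINGERPRINT"

    def fingerprint(line: str):
        parts = line.split()
        for i, field in enumerate(parts):
            # The base64 key blob follows the key-type field (options may precede it).
            if field.startswith(("ssh-", "ecdsa-", "sk-")) and i + 1 < len(parts):
                try:
                    blob = base64.b64decode(parts[i + 1])
                except ValueError:
                    return None
                digest = hashlib.sha256(blob).digest()
                return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
        return None

    def scrub(path: Path) -> None:
        kept = [l for l in path.read_text().splitlines()
                if fingerprint(l) != COMPROMISED_FP]
        path.write_text("\n".join(kept) + "\n")

    if __name__ == "__main__":
        scrub(Path.home() / ".ssh" / "authorized_keys")
    ```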

    This is the "nightmare scenario" where the "researcher" basically gets all the keys to the kingdom and your kingdom's address from your .git config files. Let's hope there were no admins using the compromised python package -- or all remote customer sites are now likely wide-open to this so called "researcher".

    And just what were the first 1000 files from $HOME for??? Bad juju all the way around...

    By all means, we trust you when you say you have now deleted all ill gotten data. The real question is what did the "researcher" do with it before he got caught? Times like this make me really glad I hate python so much.

    1. Crypto Monad Silver badge

      Re: Accessed $HOME/.ssh/ -- there go your GPG private keys...

      Confused between SSH keys and GPG keys? GPG keyservers have nothing to do with SSH keys. And you can't "retract" an SSH key.

      But yeah, SSH private keys are valuable for exactly the reasons you describe. The user *should* have protected them with a passphrase strong enough to make brute-force decryption infeasible - but not everyone does.

      I guess the moral is: do all your development inside a sandbox of some sort (e.g. VM, docker container, lxd container, whatever).

  6. Jay 2

    Unconvinced

    So you're a "security researcher" and you think it's a good idea, even in a PoC, to grab the contents of .ssh amongst other things? That doesn't sound right. And if they did snaffle some data, can we really believe they actually deleted it? Yeah I may be paranoid, but there's just too much data leakage/theft about as it is.

    Though if nothing else it does shine a light on such repo dependencies and how such things can be subverted. I'll bear that in mind a bit more. It's fairly obvious when you think about it, but sometimes you may not be paying too much attention when an install says I'm going to install X (which you do expect) from Y (which is not where you'd usually expect it to come from).

  7. Anonymous Coward
    Anonymous Coward

    “research”

    Well… I am sure this data was collected in the name of research. Once caught taking it…

    Shouldn’t SELinux be protecting these files though? Why can Python scripts access .ssh?

    1. Jason Bloomberg Silver badge
      FAIL

      Re: “research”

      We appear to have a whole range of issues here: the so-called researcher doing it, PyPI allowing package dependencies to be hijacked, the OS allowing access to the files, and workflows which allowed it to be effective.

      What surprises me is how easy it seems to have been to do it.

  8. Version 1.0 Silver badge
    Unhappy

    malicious code is normal ...

    ... it's always an option if you are using software that you didn't write yourself. I just arrived at work and did my normal "start of the day" activity ... deleting all the viruses that have been quarantined overnight on the mail server. Malware isn't a Python issue, it's a universal issue - the only fix is to write all the code you use yourself, or with your trusted team.

  9. vekkq

    fewer dependencies please

    I sure hope this causes devs to cut down on their dependencies.
