Python Package Index had one person on-call to hold back weekend malware rush

The Python Package Index (PyPI), home to more than 455,000 Python code repositories, caged itself to new users and their projects over the weekend because it could not deal with a rush of efforts to create malicious accounts and code libraries. "The volume of malicious users and malicious projects being created on the index in …

  1. VoiceOfTruth

    And I bet

    That some people somewhere complained about PyPI. 'Tut tut. If only PyPI would get off its arse and fix this.' The same sort of people who complained about Log4J while doing nothing at all ever to help.

    I tip my hat to the dedicated people in the open source world who maintain this stuff. They are often nameless (until the sh** hits the fan).

    1. Korev Silver badge
      Pint

      Re: And I bet

      > I tip my hat to the dedicated people in the open source world who maintain this stuff. They are often nameless (until the sh** hits the fan).

      Exactly!

      See icon for what they deserve

      1. FIA Silver badge

        Re: And I bet

        They deserve money.

        We may be hitting the point where the digital world needs to start to learn to pay for things more.

        If I started a brick and mortar business I would have to rent the property, pay for having it fitted out, pay staff, pay for electricity and other utilities. Then whatever I produced, I'd have to pay suppliers for equipment and raw materials, including maintenance and upkeep costs, etc etc...

        However, as a software developer I can build quite complex projects entirely without cost (other than my time) using the work that others have generously given away for free.

        We need to find a way to bridge this gap as an industry, open source is a real boon and offers many many benefits, the innovation that comes from it shouldn't be underestimated, but just as farmers deserve to be paid for their efforts, so do open source contributors; especially when projects are critical to swathes of the digital infrastructure.

    2. CowHorseFrog Silver badge

      Re: And I bet

      No, the real cause of Log4j is defaults. Someone somewhere enabled a default that was dumb and opened up the exploit. Using defaults is a sign of laziness and poor workmanship. Everything should have a switch which you turn on or off, so you are actually aware of what you are getting.

      1. Aitor 1

        Re: And I bet

        Defaults should be safe. Yesterday I pushed a new default into a system, and it is disabled by default so clueless people don't screw up their systems.

  2. Len

    Difference between PyPi and NPM?

    Forgive me because this is not really my field but isn’t it the case that PyPi and NPM are quite different?

    I thought that PyPi hosts packages that are included by developers in their projects but that are ‘compiled’ into bytecode on deployment. Every visitor of the web app would always run that bytecode until it’s updated in a new deployment.

    With NPM the code isn’t downloaded until a visitor of the website starts the app by visiting the site. It’s then executed together with all the other latest packages.

    That would mean that a Python app would be considerably safer if you have a policy of only using packages that are at least a week (or so) old to allow malware to be picked up, whereas with an NPM app an infected package would be used by millions within seconds.

    Or am I mistaken?
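
    The "only use packages that are at least a week old" policy could be sketched roughly as below. This is a minimal illustration, not an existing tool: it assumes release metadata shaped like the per-file entries in PyPI's JSON API (each with an `upload_time_iso_8601` string), and the function names `release_age` and `old_enough` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def release_age(files, now):
    """Return the age of a release, measured from its earliest uploaded file.

    `files` is a list of dicts, each carrying an ISO 8601 timestamp under
    the key `upload_time_iso_8601` (assumed metadata shape).
    """
    uploaded = [
        # Older Python versions do not parse a trailing "Z", so spell out UTC.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in files
    ]
    return now - min(uploaded)


def old_enough(files, now, min_age=timedelta(days=7)):
    """True if the release has been public for at least `min_age`."""
    # A release with no files cannot be dated, so treat it as too new.
    if not files:
        return False
    return release_age(files, now) >= min_age


# Hypothetical metadata for two releases, checked against a fixed "now".
now = datetime(2023, 5, 22, tzinfo=timezone.utc)
fresh = [{"upload_time_iso_8601": "2023-05-20T09:00:00.000000Z"}]
settled = [{"upload_time_iso_8601": "2023-05-01T09:00:00.000000Z"}]
print(old_enough(fresh, now))    # False: two days old
print(old_enough(settled, now))  # True: three weeks old
```

    A check like this only buys time for someone else to spot the malware; it says nothing about whether the package is actually safe.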

    1. thames

      Re: Difference between PyPi and NPM?

      It's apparently a typosquatting attack. The malicious author creates a new package which has a name which is very similar to a commonly used legitimate package.

      When a developer who wants to install a legitimate package misspells the package name he may get the malicious one instead. It then gets installed and used in a project which the developer is working on.

      The target seems to be web developers. When the victim tests his project with his web browser, the malicious package injects some JavaScript which looks for bitcoin addresses in his browser's clipboard. It then changes the address to the malicious author's own so that any bitcoin transactions go to the attacker's own wallet.

      The attackers are using automated scripts to create hundreds of new package names which are very similar to legitimate ones, and counting on human error to occasionally get picked due to a typo.

      This is a problem inherent to any repository which is not manually curated, regardless of the language.

      The most straightforward solution is to only install packages from your distro's repos instead of directly from PyPI.

      1. ChoHag Silver badge

        Re: Difference between PyPi and NPM?

        > The most straightforward solution is to only install packages from your distro's repos instead of directly from PyPI.

        Maintainer incompetence trumps developer malice:

        https://www.theregister.com/2008/05/13/debian_openssl_bug/

        Boy that was fun. Blind trust will get you nowhere [good].

        1. captain veg Silver badge

          Re: Difference between PyPi and NPM?

          I don't do much in Python, but I always check first whether a package is available in the $distribution repository, not out of blind trust, but because I don't want a distribution upgrade breaking everything.

          Frankly I don't trust third party libraries at all, and try to avoid them.

          Our PHBs just got excited about Meta's Robyn library. OK, it's partly R, but with major Python dependencies. On investigation it turns out that we already have most of what it does covered by 100% in-house code! None of it is rocket science.

          Do it yourself. Know what you've got.

          -A.

        2. F. Frederick Skitty Silver badge

          Re: Difference between PyPi and NPM?

          To be fair, that was a particularly awful bit of undocumented code doing something unusual in amongst a swamp of other code that was utterly awful. I followed the OpenBSD folks as they cleaned up the sh*tshow that is the OpenSSL library, and quite frankly it's scary how bad it was.

          1. sev.monster Silver badge

            Re: Difference between PyPi and NPM?

            And it got so bad that some maintainers rebelled, the LibreSSL project was started, multiple major projects switched over to it, and eventually those changes made it back to mainline OpenSSL.

            Funny how that whole thing played out.

      2. CowHorseFrog Silver badge

        Re: Difference between PyPi and NPM?

        This is another example of convenience, or let's be honest, laziness causing the problem. Less typing means more opportunities for typosquatting; if multiple items are required to identify a package, there's less chance of this problem.

    2. claimed Silver badge

      Re: Difference between PyPi and NPM?

      Depends on how you configure your web app. You can deploy a web app that doesn’t do this, and I wasn’t aware you could do this TBH - don’t know why you would want untested code delivered straight to end users! My understanding is npm wouldn’t do this by default as you can tie yourself up in knots with dependency chain hell.

      You can also use the “package-lock.json” file for npm to pin specific package versions.

      Whenever I’ve used this I’ve bundled the whole thing and deployed a static set of packages assembled at build time.

    3. unimaginative
      Devil

      Re: Difference between PyPi and NPM?

      You are mistaken.

      For a start there is now a lot of Node backend JS code, and for some reason it tends to have huge numbers of dependencies.

      Also, with front end code you use node to install code in the backend where the build system would generate the front end code that would actually be served.

      I think the big difference is that Python projects simply have fewer dependencies, and a high proportion are widely and well maintained ones so the indirect dependencies are both fewer and better monitored. The large standard library helps too.

  3. ChoHag Silver badge

    > "... I did a quick check with the rest of the team to make sure they felt like it was okay. And then I pulled that lever so that I wouldn't feel personally responsible."

    It's called the weekend for a reason.

    Let the people who are paying for that 24 hour tech support complain. Give them their money back maybe.

  4. mpi Silver badge

    The problem isn't solvable imho

    As long as you have a central packaging index that's open for everyone to use, everyone will use it, including bad actors.

    So, stop doing that.

    Golang shows how an entire language ecosystem can work without relying on central package authorities. (No, goproxys don't count, because they aren't required for the system to function). What's a package? A collection of plaintext files. How do we store such files? In repositories. How do we identify them? By their URL and semantic versioning.
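
    To make "identified by URL and semantic version" concrete, here is a rough sketch of parsing a Go-style `path@vMAJOR.MINOR.PATCH` reference. It is an illustration only: the `ModuleRef` type, the `parse_ref` helper and the example module path are hypothetical, and the pattern deliberately ignores pre-release and build-metadata suffixes.

```python
import re
from typing import NamedTuple


class ModuleRef(NamedTuple):
    """A package identified purely by where it lives and which version."""
    path: str
    major: int
    minor: int
    patch: int


# Simplified pattern: <path>@v<major>.<minor>.<patch>, nothing else.
_REF = re.compile(
    r"^(?P<path>[^@\s]+)@v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_ref(ref):
    """Split a reference such as 'example.com/tools/widget@v1.4.2'."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a path@vX.Y.Z reference: {ref!r}")
    return ModuleRef(
        match["path"],
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
    )


ref = parse_ref("example.com/tools/widget@v1.4.2")
print(ref.path)                          # example.com/tools/widget
print((ref.major, ref.minor, ref.patch))  # (1, 4, 2)
```

    Nothing here involves a central index: the path says where to fetch the files and the version says which snapshot to take.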

    1. Richard 12 Silver badge

      Re: The problem isn't solvable imho

      That is objectively worse as there are more letters available for typosquatting, and nobody to report malicious packages to.

      The only reason PyPI exists is so that there is a single trusted party with the ability to quickly take down malicious packages.

      As long as there is the Internet, bad actors will use it.

  5. Charlie Clark Silver badge
    Stop

    Registry not repository

    > The Python Package Index (PyPI), home to more than 455,000 Python code repositories

    The repositories for the code are almost entirely elsewhere. PyPI holds software releases only.

  6. Claptrap314 Silver badge

    Typosquatting defense

    The general defense against typosquatting would seem to me to be just slightly more sophisticated than requiring a Hamming distance of 2 between projects. Therefore, if "projectwonderful" is valid, "pojectwonderful" is NOT. Or "ppojectwonderful". Or "porjectwonderful".

    Certainly, this does not catch everything, but it would go a long ways.
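
    A minimal sketch of such a check follows. Hamming distance is only defined for strings of equal length, so it cannot score the dropped letter in "pojectwonderful"; the sketch swaps in the optimal string alignment distance, which counts an insertion, deletion, substitution or swap of two adjacent characters as one edit each. The function names and the threshold of 2 are assumptions for illustration, not anything PyPI is known to use.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ro" typed as "or".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def too_similar(candidate, existing_names, min_distance=2):
    """Return the existing names that `candidate` is confusingly close to.

    An identical name is skipped here; that is an ordinary name clash,
    not a typosquat.
    """
    return [
        name
        for name in existing_names
        if name != candidate and osa_distance(candidate, name) < min_distance
    ]


registry = ["projectwonderful", "requests"]
for name in ["pojectwonderful", "ppojectwonderful", "porjectwonderful"]:
    print(name, too_similar(name, registry))
```

    All three example names come out one edit away from "projectwonderful" and would be rejected. A quadratic comparison against every existing name is fine for a sketch but would need indexing at the scale of a real registry.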

  7. Kevin McMurtrie Silver badge

    Hacker tolerance

    I still don't understand the social and government tolerance for hacking. Somebody can spend all their time crafting attacks and it's everyone else's job to spend all their time on defenses. The government doesn't care. The hosting provider doesn't care. The network peers sell DDoS services so they're definitely not going to interfere.

    I'm all in favor of good security but at today's levels the attacks are a continuous drain on good resources. Any automation to prevent abuse is met with new automation creating abuse.

    1. claimed Silver badge

      Re: Hacker tolerance

      So the government should pass some kind of law against the misuse of a computer? Once we have such an act of parliament (or equivalent in your country), this will solve hacking? Are you sure…

      https://www.legislation.gov.uk/ukpga/1990/18/contents

      Or do you mean the government *should* implement a country wide network filter, inspect all traffic, and shut down anything that it chooses to label as “hacking”/“bad”…

      I didn’t downvote but I don’t understand what you’re advocating here

      1. Kevin McMurtrie Silver badge

        Re: Hacker tolerance

        There are plenty of laws already around the world. They're just not enforced on any level of operations.

        I've played the whack-a-mole game at home and at work to stop hackers. You block a user ID, you block their activity pattern, you block their new activity pattern and new user ID, you block their network, you block their new activity pattern and new network and new user ID... It never ends. I care little about false positives on network blocks (go find a clean ISP) but blocking activity patterns over and over harms legitimate customers too. It's no wonder that the PyPi admin needed a break.
