Update on PHP source code compromise: User database leak suspected

PHP maintainer Nikita Popov has posted an update concerning how the source code was compromised and malicious code inserted – blaming a user database leak rather than a problem with the server itself. The PHP code repository was compromised late last month with the insertion of code that, if left in place, would have enabled a …

  1. Mike 137 Silver badge

    Surprise v. reality

    'This user database was part of "very old code on a very old operating system/PHP version," said Popov, who added that a vulnerability "would not be terribly surprising."'

    No more surprising maybe than a vulnerability in newer code? If not, why do we have the regular "updates"?

    The persistent idea that 'legacy' code is more buggy than new code is based on the false premise that we're constantly improving, whereas in reality we're merely constantly changing things - and I suspect in some cases actually progressively getting worse as we make things ever more complicated.

    1. Anonymous Coward
      Anonymous Coward

      Re: Surprise v. reality

      After working in the motion capture field for 30 years, I've seen far more issues with modern code than with the original FORTRAN code. I always wonder whether that may simply be because we're better at finding problems these days than we used to be... but when I look at the old legacy code, I have my doubts.

      I think you're right; I agree with you.

      1. big_D Silver badge
        Mushroom

        Re: Surprise v. reality

        In the old days, the code had to be tight, efficient and it was thoroughly tested, often by hand (dry-runs), because actually running the code on a mini or mainframe cost a lot of money. That, combined with limited resources, meant that the code was, generally, fairly well written and well tested.

        In many areas, there was associated paperwork and you could compare the paper results with the calculated results and problems surfaced more readily.

        Security was also not a "problem" back in the pre-Internet days, as Microsoft, among others, keeps finding out to its chagrin.

        Over time, processors have become more powerful, memory and storage more abundant, and the price per CPU-second has plummeted to negligible levels, so it is often now cheaper to just run the code and see if it breaks than to properly analyse it: there is too much code, and coders and testers now cost more than CPU time. Throw in agile coding practices and you have a recipe for disaster, in terms of security at least.

        Many start-ups don't worry about security until they get big enough to be a target, for example, because the investors want a return on their investment, and "wasting" time making sure the system is secure isn't finding new users and new sources of revenue... Bling bling is more important than the ring ring of security alarms going off.

        In my opinion, we are looking at two distinct sets of problems: old code that was never designed to be on the open Internet being dragged, screaming, into the Internet age; and new code, which is highly complex and cannot be as effectively tested as previous generations of code, because the time and money just aren't there, results are expected NOW, and if it breaks, well, we can just fix it going forward.

        Until businesses are made legally responsible for their mistakes, the money saved by skimping on proper testing and outsourcing the fallout onto unpaid users will go into bonuses, not into making the products actually safer.

        1. juice

          Re: Surprise v. reality

          > In the old days, the code had to be tight, efficient and it was thoroughly tested, often by hand (dry-runs), because actually running the code on a mini or mainframe cost a lot of money. That, combined with limited resources, meant that the code was, generally, fairly well written and well tested.

          Not entirely sure I'd agree with that.

          I mean, I agree to some degree. There's the old article from Joel Spolsky, about how Netscape screwed the pooch by deciding to rewrite their codebase, and failing to realise that much of the "cruft" was actually there for a good reason...

          https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/

          But beyond that...

          In the first instance, the era of "running code on a mini or mainframe" was arguably back in the 70s and 80s. But when we got to the 90s - which is over thirty years ago, don't forget[*] - code was being written on Pentium desktops and being rolled out to Dell, HP, SunOS and Irix servers. There was even this upstart called Linux, though you had to download a lot of floppies to get this running on your home PC...

          And that's the era this code comes from. Disk space was no longer as much of an issue - at least for source code files - but memory and CPU time were still expensive, and with the original dotCom bubble, there were a lot of people flooding into the industry who were perhaps more focused on the financial rewards than on achieving coding excellence.

          (I doubt that's the case with this stuff, but there's a reason that the term code-monkeys was invented...)

          Beyond that, old code can be efficient, and could be well tested - after all, it's had years for all the edge cases to be explored and fixed. On the other hand, it's far more likely to look like "checksummed line noise" (to quote an old description of Perl code), far less likely to be commented or have documentation, often makes use of implicit/obtuse features of the language - and is likely to take shortcuts in the name of performance.

          E.g. see the old story about Mel, in which the eponymous programmer hand crafted incredible code which took maximum advantage of the physical hardware, but which was virtually impossible to maintain.

          http://www.catb.org/jargon/html/story-of-mel.html

          Beyond that, we're in a very different world these days, even without considering the many security concerns. Data payloads are bigger (I've recently been having fun with running out of memory while decoding JSON files which are several hundred megabytes in size; sadly, the "streaming" JSON decoders I've found and tested are several orders of magnitude slower than the standard system library), and we now need to deal with multi-byte UTF-8 characters and the like rather than assuming that everything is plain and simple ASCII. Etc etc etc.
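
          As a sketch of the memory point, assuming the data can be restructured as newline-delimited JSON (NDJSON), decoding one record per line keeps memory bounded without a third-party streaming parser. The file name is made up for illustration.

          <?php
          // Hypothetical sketch: decode a large newline-delimited JSON (NDJSON) file
          // one record at a time instead of loading the whole payload with
          // file_get_contents() + json_decode(). Assumes each line is a complete
          // JSON object; a single monolithic JSON document would still need a
          // third-party streaming parser.
          $handle = fopen('big-export.ndjson', 'rb');   // placeholder file name
          if ($handle === false) {
              exit("Could not open input file\n");
          }
          while (($line = fgets($handle)) !== false) {
              $record = json_decode($line, true);       // one record at a time
              if ($record === null) {
                  continue;                             // skip blank/malformed lines
              }
              // ... process $record here; memory use stays bounded per record ...
          }
          fclose($handle);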

          [*] As much as I'd like to believe that it's actually only around 2008 or so...

          1. big_D Silver badge

            Re: Surprise v. reality

            Yes, it is scary: my first IT "job", teaching my elderly neighbour to use a ZX81, was 40 years ago!

            > Beyond that, old code can be efficient, and could be well tested - after all, it's had years for all the edge cases to be explored and fixed.

            I think that is part of the problem, that is an assumption many people make. They haven't tested it themselves, but they assume, somebody, somewhere must have after all this time, right? The truth is that it has worked until now, so nobody has touched it in case it stops working, or they have assumed somebody else has already tested it for the new edge cases; otherwise, what is it doing in production?

            I agree with you on the rest.

            1. juice

              Re: Surprise v. reality

              > I think that is part of the problem, that is an assumption many people make. They haven't tested it themselves, but they assume, somebody, somewhere must have after all this time, right?

              I probably should have been a bit clearer in my statement :)

              Over time, it's likely that the majority of edge cases for the original system will have been explored. As the old saying goes about Microsoft: everyone may only use 10% of MS Office's features, but the problem is that they each use a different 10%. Overlapping Venn subsets a-go-go!

              (It's certainly the case for the system I currently help to develop/maintain: the system is very flexible and is used by dozens of customers, each of which has their own unique configuration. We often have to return to mechanisms which were written 5 or even 10 years ago, thanks to some new combination of A + B + C which has thrown up a new edge case...)

              The big problem comes when things change. Update your compiler, use a new version of the language, upgrade your system libraries, move your system to a different OS, expose an endpoint to the internet, let people use non-ASCII text, etc, etc.

              Because it's at that point where the limitations of the old code start to show. Assuming all text is ASCII and failing to handle multi-byte characters? That's a paddlin'. Accepting tainted input as-is, because it always used to come from a trusted source? That's a paddlin'. Making use of some obscure language feature/quirk which has been deprecated? That's a paddlin'. Processing data-lumps in RAM because it's quicker and the payloads used to be small? That's a paddlin'...

              And so on.
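
              To make the ASCII one concrete, a quick, hypothetical PHP sketch: the byte-oriented string functions quietly miscount and mangle text the moment the input stops being plain ASCII, while their mb_* counterparts (which need the mbstring extension) don't.

              <?php
              // 'Zürich' is 6 characters but 7 bytes in UTF-8, so byte-oriented
              // functions miscount it and can cut a character in half.
              $city = 'Zürich';

              var_dump(strlen($city));                    // int(7) - counts bytes
              var_dump(mb_strlen($city, 'UTF-8'));        // int(6) - counts characters

              var_dump(substr($city, 0, 2));              // "Z" plus half of "ü" (mangled)
              var_dump(mb_substr($city, 0, 2, 'UTF-8'));  // "Zü"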

            2. martinusher Silver badge

              Re: Surprise v. reality

              > teaching my elderly neighbour to use a ZX81, was 40 years ago!

              Actually, you may just have identified the problem. The ZX81 wasn't that good an example of a computer; it was just popular because it was cheap. There were a lot of much more powerful systems about even then, but the budget needed to develop and deploy them was far beyond what most people could afford (and often beyond what companies in the UK were prepared to pay). Code developed for Unix systems, for example, was based on the idea that small, modular components that were well defined and easy to test could be assembled into quite complex systems. The concept was carried over to Linux (since it was a clone of Unix) and explains why Linux systems to this day tend to be more robust and easier to work with than legacy desktop systems.

              Desktop systems like Windows are deliberately made more complex than they need to be; it's part of their fundamental marketing strategy, since untangling the internal interactions would also open up opportunities for competing components. (This has certainly been Microsoft's strategy since the earliest days of MS-DOS.) Unfortunately, unbridled complexity just offers boundless opportunities for exploits. Since most programmers cut their teeth on these desktop systems, it's not surprising that the mindset and methodology have carried over into other systems and now dominate software development. The result is outlined in another article on this site about how programmers fix problems by adding complexity rather than removing outdated or redundant components. (This matches industry practice, where 'maintenance' is a dirty word, a task often given to the more junior programmers while the experienced ones design that new component, the one that rules everything...)

    2. Charlie Clark Silver badge

      Re: Surprise v. reality

      I think you're operating from a false premise. The reference here is specific to a codebase where older code has not had the same kind of reviewing and testing that newer code has. Static code analysis should have picked this up but it was obviously never run when the code was taken over by the current group.

      1. big_D Silver badge

        Re: Surprise v. reality

        I agree, although my experience of modern coding practices is that levels of testing are much lower today, because it costs money that could be better spent on marketing...

        1. juice

          Re: Surprise v. reality

          > I agree, although my experience of modern coding practices is that levels of testing are much lower today, because it costs money that could be better spent on marketing...

          Depends. One of the key elements of DevOps - one which I've seen implemented with varying levels of success - is the use of automated testing.

          Admittedly, that does then depend on getting said automated testing properly set up, and then maintaining it...
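
          For anyone who hasn't seen it in the PHP world, a minimal sketch of what that automated testing might look like with PHPUnit; the class under test and its behaviour are made up for illustration.

          <?php
          // Hypothetical PHPUnit test; Slugger is an assumed class under test,
          // not part of any real codebase. Typically run via ./vendor/bin/phpunit
          // as part of the CI pipeline.
          use PHPUnit\Framework\TestCase;

          final class SluggerTest extends TestCase
          {
              public function testTitleIsLowercasedAndHyphenated(): void
              {
                  $this->assertSame('hello-world', Slugger::slugify('Hello, World!'));
              }

              public function testEmptyTitleIsRejected(): void
              {
                  $this->expectException(InvalidArgumentException::class);
                  Slugger::slugify('');
              }
          }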

    3. teknopaul

      Re: Surprise v. reality

      SQL injection flaws are not acceptable these days. Sony got hammered for that, and it wasn't even their own code. php.net can hardly claim they don't have PHP devs about to fix old code.

      Very well-written code can have security flaws; security flaws are inevitable in all but the simplest code. But acceptable code cannot have SQL injection flaws.

      It's trivial to ensure that it does not.
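
      For instance, one common way to do it in PHP is a PDO prepared statement with bound parameters, so user input never becomes part of the SQL text. The DSN, credentials and table layout below are placeholders, not anything from php.net's code.

      <?php
      // Sketch of a parameterised lookup via PDO; connection details and schema
      // are placeholders. The bound value cannot change the query's structure.
      $pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'secret', [
          PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
      ]);

      $stmt = $pdo->prepare('SELECT id, password_hash FROM users WHERE username = :username');
      $stmt->execute([':username' => $_POST['username'] ?? '']);
      $row = $stmt->fetch(PDO::FETCH_ASSOC);

      if ($row !== false && password_verify($_POST['password'] ?? '', $row['password_hash'])) {
          // authenticated
      }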

      1. big_D Silver badge

        Re: Surprise v. reality

        I worked on the eRetail site of a UK retailer at the turn of the century, and I was specifically called in to test for things like SQL injection. They had a huge hole and wouldn't acknowledge it until I dropped the whole development database in a demonstration, simply by "logging in" with a SQL injection attack (forget Johnny Tables, I was Johnny Database that day). So there should be absolutely no excuse these days.
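
        For anyone who hasn't seen that class of hole, a hypothetical reconstruction of the vulnerable pattern (not the retailer's actual code): the query is built by pasting user input straight into the SQL text.

        <?php
        // DO NOT USE - illustrative only.
        $user = $_POST['username'];   // e.g.  admin' --
        $pass = $_POST['password'];

        $sql = "SELECT * FROM users WHERE username = '$user' AND password = '$pass'";
        // With the input above, the query becomes:
        //   SELECT * FROM users WHERE username = 'admin' --' AND password = '...'
        // i.e. the password check is commented out. On drivers/APIs that allow
        // multiple statements per call, a payload like  '; DROP TABLE users; --
        // goes a lot further.
        mysqli_query($conn, $sql);    // $conn assumed to be an existing connection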

    4. Anonymous Coward
      Anonymous Coward

      Re: Surprise v. reality

      “...and I suspect in some cases actually progressively getting worse as we make things ever more complicated.”

      Agree 100%. For examples, see every piece of software from Apple and Microsoft. Quality going through the floor.

  2. don't you hate it when you lose your account

    Legacy is always a problem

    Whatever I do my mum just keeps getting hacked.

    1. keith_w

      Re: Legacy is always a problem

      you should drop Mum 1.0 and upgrade to Mum 1.1. Much more secure.

      1. ortunk

        Re: Legacy is always a problem

        don't bother and switch to Wife 1.0

        1. NonSSL-Login

          Re: Legacy is always a problem

            It can be costly to upgrade to Wife 2.0, depending on how long Wife 1.0 was in place.

  3. RegGuy1 Silver badge

    Whatever happened to write once, run anywhere? Surely that which has already been written will by now have been hardened through time and experience.

    Or is write once, run anywhere bollocks? Just asking for a friend.

    1. Anonymous Coward
      Anonymous Coward

      clue's in the acronym

      WORA load of bollocks

    2. Dave559 Silver badge

      Ancient code which has already been written doesn't just miraculously update itself and automagically fix crufty code (like that which permits SQL injection) by itself, you know!

      Someone has to review the old code, find the mistakes, and then update/rewrite it…

    3. big_D Silver badge

      Write once, run anywhere sort of works, but it can't eliminate sloppy coding and you have to constantly update all systems running that code to keep them safe. If that code gets customized along the way...

      And that code still needs to be thoroughly tested, and if its scope changes, it needs to be re-evaluated and re-tested. WORA isn't any better or worse than any other type of code; it is the testing and security practices further down the line that make the critical difference. But constantly checking and re-checking code, and ensuring all systems are up to date, costs a lot of money that very few are willing to invest.

    4. Antonius_Prime

      "Or is write once, run anywhere bollocks?"

      Always was. They fed us that fallacy in Uni with C, C++ and Java.

      It ends up being write once, debug 7 trillion times, publish, hope t'f*** nothing breaks...

      ... and go again...

  4. FIA Silver badge

    > The actions now taken include resetting all passwords, and amending the code to use parameterised queries, to protect against SQL injection attacks.

    What? The? Actual? Fuck?

    Come on people, it's 2021.

    > Ancient code which has already been written doesn't just miraculously update itself and automagically fix crufty code (like that which permits SQL injection) by itself, you know!

    > Someone has to review the old code, find the mistakes, and then update/rewrite it…

    This is PHP, one of the most prominent internet programming languages, being exposed to one of the most common 'rookie mistakes'. You'd hope that at least one of the many many many many many many many many many many times this has happened might've prompted some kind of 'Might be a good idea if we were seen to be doing it right' security audit? (I mean seriously... I'm starting to think about retirement... and parameterised SQL was a thing well before I even started in this field... certainly well before PHP was a thing.)

    1. Anonymous Coward
      Anonymous Coward

      PHP's database integration used to be such a clusterfuck that SQL injection vulnerabilities were no surprise, especially since most online tutorials and even books contained code that didn't sanitise inputs or use parameterised queries.

      As much as I hate it, PHP does now offer better database APIs, and has done so for quite a while. The core devs should be more aware than anyone of how shitty old PHP code tends to be, and their own infrastructure should not contain this kind of crap.

    2. Dave559 Silver badge

      Of course the system "shouldn't" still have been using old sloppy coding practices and functions, but my point was more that I suspect that the PHP site developers have as much difficulty as the rest of us in finding the requisite number of round tuits to review existing (and seemingly "working") code. All too often, things just don't get looked at again until they "really" need to. It's not good, of course, but it is the way things tend to be.

  5. Anonymous Coward
    Anonymous Coward

    Infrastructure written in PHP turns out to be insecure.

    Who'd have guessed?

  6. Anonymous Coward
    Anonymous Coward

    "older source code management system called Subversion"

    C'mon... it's not so old that people need to be reminded what it was called...
