How a botched kernel patch broke Ubuntu – and why it may happen again

If you spent the early days of June fighting kernel panics in Ubuntu 20.04, you were not alone – and we now know why. A problem with an Ubuntu-specific Linux kernel patch early last month rendered many systems running Docker on that flavor of the operating system unusable, and it probably won't be the last time. The whole …

  1. Paul Crawford Silver badge

    Alas, that is always the risk of patching a system: don't and you might get hacked, do and you might get borked.

    1. Spazturtle Silver badge

      Only a risk if you are patching your live system without having tested it on your test system first, but who has one of those these days.

  2. sambaynham
    Pint

    Panic! at the distro? Have your pint, you magnificent bastard.

    1. devin3782 Silver badge
      Pint

      Seconded, that brought me tears of joy

    2. This post has been deleted by its author

    3. Pierre 1970
      Coat

      What's next? "This charming man" when they borked the help pages?

      1. CAPS LOCK

        Heaven Knows I'm Miserable Now...

        ...more likely...

        1. Pierre 1970

          Re: Heaven Knows I'm Miserable Now...

          Ouch...my bad...I thought that the reference was to Panic by The Smiths.... Pure musical illiteracy

    4. Anonymous Coward

      Fire in the Distro

      Fire in the Distro

      Fire in the...update shell.

      Danger danger. Botched update!

    5. cd

      The root!

      The root!

      The root is on fire!

  3. FIA Silver badge

    "Maintaining an out-of-tree kernel patch for any length of time is an arduous task," Webb wrote, adding that the situation is unlikely to get any easier for the Ubuntu kernel devs and may actually become more difficult before long.

    It's almost as if Linux would benefit from a well defined, documented, versioned and stable driver interface rather than a collection of source code patches and hope.

    <Runs for cover behind something fireproof>

    (I am actually making a serious point, but I know it doesn't chime well with the open source ideals, or something? Never quite understood why good engineering should be forgone for the GPL, but hey ho.

    It would also be virtually impossible to do in Linux as it would require a lot of co-operation and work by people who would ultimately decry the outcome as it would allow people to more easily ship binary drivers.

    But imagine if an Android kernel upgrade didn't need to be patched with the latest drivers as they just worked to a stable, well defined and well versioned interface.

    It's like how Windows XP could support SATA drives even though it was released first.... it has a well defined and documented driver interface the manufacturers could write storage drivers for.)

    1. Mike 137 Silver badge

      Never understood...

      "Never quite understood why good engineering should be forgone for the GPL, but hey ho."

      Good engineering has not featured in software development in general for absolutely ages - probably since the days of mainframes and batch jobs. In those days, your bad code could bring everyone's jobs to a halt, so you'd get pretty unpopular. As computing became 'personalised' the exposure to censure reduced. However it's potentially back again as 'everyone' migrates to the 'cloud' (your mainframe in the sky), so a foul-up by the service provider can take all of you down.

      Unfortunately, sloppy habits are by now so engrained that the situation is unlikely to improve. It would be magnificent if 'software engineering' became worthy of the name, but I don't think it ever will now.

      1. vtcodger Silver badge

        Re: Never understood...

        Mike137:

        Having been around in the days of mainframes and batch jobs, let me assure you that early on, the hardware was so unreliable that the fact that the software (mostly) wasn't all that good was not a huge problem. In the 1960s, as much as 8 hours a day was scheduled for preventive maintenance. And unscheduled maintenance wasn't all that infrequent. The software didn't have to do much and what it had to do was often pretty clear, and it usually did it after a fashion. And there were some issues that (thankfully) no longer exist. Like every vendor having their own (incompatible) character set for text. And byte order issues that had to be fixed in application level code. And hardware divide operations so complex that one could spend an entire afternoon trying to figure out how the hardware could possibly produce the results it had from the inputs it was given.

        I'm with you in feeling that "software engineering" is, and always has been, pretty much an oxymoron. But I don't see any sign that anyone feels any need to fix that. Maybe when (if) the full magnitude of the computer security problem starts to become evident there will be pressure to change. But probably not.

        There is also the problem that writing a decent software specification looks to be extraordinarily difficult, time consuming and expensive. Harder than programming really. In 30 years in the system business -- 1961-early 1990s -- I saw exactly one such spec. Programming to it was a joy. I doubt that writing it was all that much fun.

        So, I look forward to a world where most everything except our backyard vegetable gardens is dependent on flaky, poorly engineered, software. I doubt it'll be all that much fun.

        But it'll be interesting.

        1. mtrantalainen

          Re: Never understood...

          The problem is that the only "decent software specification" is the actual source code. Anything less is either code that can be automatically converted to source code or a more or less handwavy description of an imagined end result.

          I've been writing software professionally for two decades and I've come to the conclusion that in most cases a software developer is actually an interpreter between normal humans and computers. It's the developer's task to ask enough questions of the different parties to figure out what is actually required or wanted.

          A long time ago it was imagined that there's a job for a designer that works as this interpreter and then there's another human working as a programmer. It turns out that the detail level needed to transfer the requirements from the designer to the programmer is about the same as actually writing the source code.

          What we actually need is more resources for testing that the generated source code actually matches the true requirements because the software developer may have misunderstood the human parties at the start.

          In addition, there may be actual bugs in the implementation but those are much easier to detect and fix. And that will happen as a side-effect if the testing to verify the behavior is actually done.

          In most cases nobody wants to pay for testing to improve quality, though.

          1. Mike 137 Silver badge

            Re: Never understood...

            "It turns out that the detail level needed to transfer the requirements from the designer to the programmer is about the same as actually writing the source code"

            Agreed re the level of detail but it's different detail. For example, although a programmer for several decades I'm not a Java programmer (i.e. I don't know very much about the language). However, when developing an application written in Java, my job as application designer is to specify what it should do and what the operational constraints are (functionality, algorithm, performance, reliability), whereas the programmer's task is to implement those requirements optimally in Java - and come back to discuss any problems the language imposes on achieving them. So we are collaborators with different functions (just like in any other engineering group project). If a given mechanism to be coded is high priority, time critical or must be robust against specific dodgy input, the programmer won't be able to infer that unless I've told them (i.e. it's in the specification).

    2. Totally not a Cylon Silver badge
      Boffin

      A good idea.

      So, write one. That after all is the whole point of Open Source; you think something could be done better then you write it......

      1. FIA Silver badge

        A good idea.

        Thank you.

        So, write one. That after all is the whole point of Open Source; you think something could be done better then you write it......

        I thought the point of Open Source was to make the source code available so it may be modified/improved/inspected/maintained by the user of the software?

        But regardless, I'm afraid I don't have the engineering skills required. I am a trained software engineer, but that was a long time ago, and as Mike 137 points out above, the job I do (non life critical software development) means I have had, like most people in the field, many many years of what really isn't 'engineering' level software development. I'm also quite shy, so don't have the force of personality of someone like Poettering, which would also be needed in a project like this.

        The reality is what I am suggesting would require design and thought by people competent at that low level where software and hardware meet. Specifications would then require implementation, which would require broad consensus, and it would also require long term buy in by every contributor to the kernel.

        Doing this well is hard, it is "software engineering", not "programming".

        Doing this well when you're a singular monolithic organisation with control over the whole thing is hard... (NT and NeXT weren't developed overnight).

        Doing this well, or even at all, when you're a globally collaborative project with many contributors all with broadly similar but subtly selfish motives is probably next to impossible.

        1. georgezilla Silver badge

          So ...................

          ..... what you are saying is that because Ubuntu, and apparently ONLY Ubuntu, had a problem, everyone or anyone else but them, should fix THEIR fuck up?

          It's a problem with Ubuntu, NOT with Linux. And this ISN'T the first, and won't be the last, time Ubuntu fucked something up.

          ALL.

          BY.

          THEMSELVES!

          So they should fix THEIR shit, and leave Linux in general out of it.

          And is but another reason ( one of many, many ) I don't, won't EVER, use or recommend Ubuntu to ANYONE!

          1. david 12 Silver badge

            Re: So ...................

            Building your own kernel is part of what distributions do. That isn't limited to Ubuntu.

            Last century, we used to do our own kernel builds, to support particular drivers or even particular software. That's what 'open source' meant. What's changed is that now most 'open source' users never look at the source, and use the kernel builds provided by their distribution. That's what 'open source' means now.

            1. mtrantalainen

              Re: So ...................

              Building custom kernels is totally expected. That means taking the official Linux kernel and configuring the kernel config as you wish.

              However, what Canonical did here was to take the kernel source code, apply untested code changes to it and then distribute the results to the end users.

              Had the problematic code that they added been executed even once, they would have caught the bug before release.

              This is purely about Canonical trying to maintain their custom kernel code without adequate resources.

        2. mtrantalainen

          The original Linux kernel patch was reviewed and tested by multiple developers. However, Canonical didn't apply the patch as-is and the problem was introduced by their modifications.

      2. elsergiovolador Silver badge

        The point of Open Source has been long gone.

        You will commit your free time to make a beneficial change that big corporations will make billions off of and you won't see a penny.

        Open Source is promoted by big corporations, so they don't have to pay wages to talented engineers.

        1. vtcodger Silver badge

          "Open Source is promoted by big corporations, so they don't have to pay wages to talented engineers."

          Actually I should think the corporations would prefer to pay the engineers and lock their customers in with un/poorly documented proprietary, trade secret protected, and/or patented technologies. But they'll settle for some occasional support of open source as that helps keep the anti-trust folks at a safe distance.

          At least that's my guess. Who knows what actually goes on in the mind (if any) of an MBA toting manager?

          1. LDS Silver badge

            "lock their customers in with un/poorly documented proprietary"

            They do it anyway. Google locks you in, AWS locks you in, etc. etc. Just they save boatloads of money because they don't have to pay licenses or write all that code themselves.

            "trade secret protected, and/or patented technologies"

            They still do it - how many patents Google & C. amasses while using open source code wherever they can use it for free and don't need to make technologies freely available? All Google algorithms are a secret or a patent, and still they can be run at that scale because open source gave them the way to cut costs greatly.

        2. georgezilla Silver badge

          So who is it exactly that is putting a gun to the heads of those that are writing opensource software making them write it?

          Wait ..............

          Hang on .............

          NO ONE IS!

          Don't like opensource, or the people making money off of it?

          Simple solution .....................

          DON'T USE IT.

          But that appears to be a concept that is beyond some people.

          Just like with those that whine about the GPL.

          No clue.

          But are butt hurt and whine anyway.

      3. georgezilla Silver badge

        Oh look .............

        Twice as many people that don't have a damn clue about opensource as those who actually do understand it.

        ( 4 up vs. 8 down at the time I wrote this )

        And people wonder why I tell those that don't, to just please ...............

        Fuck Off!!

    3. Spazturtle Silver badge

      Linux is a monolithic kernel, you should not be using out-of-tree patches and drivers, they should have all been merged into the kernel. If Qualcomm added their drivers to the Linux kernel then there would be no issues with upgrading the kernel on Android.

      1. FIA Silver badge

        If the in tree drivers were written to driver interface version x, barring bugfixes or feature additions, until that interface version was deprecated there'd be no need to touch them at all.

        Simply having something 'in tree' doesn't magically negate the effort required to keep it up to date.

        Open source developers free from the drudgery of needless updating would have more time to write other open source software.

        1. mtrantalainen

          This bug was about applying a patch meant for an older kernel to newer kernel with new features. And the problematic patch contained broken code that got activated when the new feature was available.

          It was just by lucky accident that the broken patch worked with the old kernel.
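The mechanism described above can be illustrated with a toy model. This is only a sketch and not kernel code: `patched_lookup` and the `new_feature` flag are invented names, standing in for a backported patch whose extra branch only becomes reachable once the running kernel offers a newer feature.

```python
# Toy model only (not kernel code): a backported patch whose extra branch
# is reachable only when the running "kernel" advertises a newer feature.
# The function name and the "new_feature" flag are hypothetical.

def patched_lookup(table, key, kernel_features):
    """Return table[key]; the feature-gated branch contains a latent bug."""
    if "new_feature" in kernel_features:
        # Never executed on the old kernel, so the bug stayed hidden there.
        # Bug: looks up a key that does not exist and raises KeyError.
        return table[key + "_missing"]
    return table[key]


table = {"a": 1}
old_kernel = set()              # feature absent: buggy branch is dead code
new_kernel = {"new_feature"}    # feature present: buggy branch activates

print("old kernel result:", patched_lookup(table, "a", old_kernel))

try:
    patched_lookup(table, "a", new_kernel)
except KeyError as exc:
    print("failure on new kernel:", repr(exc))
```

The same source behaves correctly or fails depending only on what the surrounding kernel provides, which is why testing on the old kernel alone says nothing about the new one.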

        2. Spazturtle Silver badge

          If your change to the kernel breaks a driver then you are required to fix that driver in order to get your change approved. From the driver developer's perspective there is much less effort required to keep it up to date.

      2. georgezilla Silver badge

        But you see that's not how Ubuntu works.

        They have .............

        But it's not ours, and ours is better.

        NOW PAY ATTENTION TO ME! .... syndrome.

        Mir.

        Unity.

        Convergence.

        Phones,

        Tablets.

        To name a few.

        All failed.

        And all someone else's fault.

        "Nobody likes me.

        Everybody hates me.

        I think I'll go eat worms."

        1. mtrantalainen

          It sure appears that Canonical has trouble prioritizing the work they want to do.

          With enough resources to correctly implement all the features they have introduced (e.g. upstart, Unity, kernel patches) they would have a solid product.

          Instead they keep starting new projects and no project is ever finished to a really high quality product.

          They have still become very popular because the competition has even worse quality out of the box. (And here I'm including "good defaults" in the quality. You can tweak nearly any Linux distro to your needs but Canonical has done a great job figuring out defaults that please many people.)

      3. mtrantalainen

        It's not just Qualcomm having hardware without drivers in vanilla kernel. Nearly all Android OEMs also tweak the hardware in their devices so even with generic drivers in the kernel, some custom hardware still wouldn't work unless those OEMs follow the GPL terms and actually distribute the source code matching the released firmware.

        Most OEMs distribute some source but it rarely contains all the code for the latest firmware.

    4. david 12 Silver badge

      a well defined and documented driver interface the manufacturers could write storage drivers for.

      A requirement IBM demanded for DOS 2.x. Which was why, although "DOS only supported 10MB hard drives", 40MB drives were in wide use. There was a clearly documented block-storage driver interface.

    5. mtrantalainen

      The problem wasn't an unstable driver interface but a botched backported patch that was then blindly used in a more modern kernel version without verifying that the patch was valid.

      The issue was that when the security patch in question was backported to an older kernel version, the patch was partial because the features not supported by the older kernel were simply silently dropped instead of backported. That is, the patch contained only part of the original Linux kernel patch required to fix the security problem, and the missing (broken) part of the patch was included in the code but was not active with the old kernel.

      The kernel crash problem then appeared when this broken patch was blindly applied to a more modern kernel without understanding that the broken part would then activate.

      Had Canonical applied the original Linux kernel patch to their newer kernel instead of their own modified and broken version, there wouldn't have been any issue.

      Or if they had run *any* code after patching the kernel that actually used the patched code, they would have noticed the issue.

      The real problem was a broken patch which introduced broken logic, and no stable interface could have helped at all.

      Actually, having an automated test case that uses the modified kernel feature would be best. In the best case that automated test could identify whether the system is running with the original security issue or not. And if such a test had been run before releasing the broken kernel version, it would have crashed the test machine! One could hope that would have been a clear enough signal that the patch wasn't good enough for release.
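The kind of pre-release check suggested above might look roughly like the following sketch. Everything here is hypothetical: `run_release_gate` and the check names are invented, and the crash is simulated with an exception, whereas a real kernel panic would take down the test machine and would more likely be observed as a timeout by whatever drives the tests.

```python
from typing import Callable, Dict, List


def run_release_gate(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check and return a description of each one that raised."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any failure at all blocks the release
            failures.append(f"{name}: {exc!r}")
    return failures


def exercise_patched_feature() -> None:
    # Placeholder for code that actually drives the modified kernel path.
    # The failure is simulated; nothing here touches a real kernel.
    raise RuntimeError("simulated crash in patched code path")


def unrelated_check() -> None:
    # Passes, but proves nothing about the patched code.
    pass


failures = run_release_gate({
    "patched_feature": exercise_patched_feature,
    "unrelated": unrelated_check,
})

if failures:
    print("release blocked:", failures)
else:
    print("release may proceed")
```

The point of the design is that at least one check must execute the code the patch touched; a suite made only of checks like `unrelated_check` would pass no matter how broken the patch was.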

  4. elsergiovolador Silver badge

    Dumpster fire

    20.04 has been a dumpster fire for over a week now if you have automatic updates enabled.

    It keeps crashing randomly and then the computer won't even turn back on.

  5. fishman

    Stock kernels

    I'm running an Ubuntu variant, Mint. It's been years since I've used the kernel that comes with the distro (except during initial installation); I just download the latest release from kernel.org, compile and go. I started doing that when I had some problems with the distro kernel.

    1. CAPS LOCK

      Linux Mint seems to ship with generic kernels...

      $ uname -r

      5.4.0-121-generic
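The release string above follows the Ubuntu-style pattern of base version, ABI number and flavour. As a rough sketch, assuming that three-part pattern (the helper name `parse_release` is made up), a script could tell such a distro kernel apart from a release string that does not follow it:

```python
import platform
import re


def parse_release(release):
    """Split an Ubuntu-style release string such as '5.4.0-121-generic'.

    Returns None when the string does not follow that pattern.
    """
    match = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)-([a-z0-9]+)", release)
    if match is None:
        return None
    return {
        "version": match.group(1),
        "abi": int(match.group(2)),
        "flavour": match.group(3),
    }


print(parse_release("5.4.0-121-generic"))

# Same value that `uname -r` reports on Linux; the result depends on the host.
print(parse_release(platform.release()))
```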

  6. Anonymous Coward

    It seems a trite and poor excuse that the 5 year support period for Ubuntu LTS requires the HWE kernel to run anything useful and is therefore hard to maintain, when you consider that Debian, which Ubuntu is based on, provides the same support term and yet neither needs a specific HWE kernel nor suffers from these kinds of unpleasant surprises. I may be wrong, but I do think that the bloat brought about by Ubuntu flying in the face of the Linux philosophy of doing one thing and doing it well, by trying to be both a server and desktop OS in the vein of Windows, is why these issues keep cropping up.

    Don't get me wrong, Ubuntu is a fantastic OS but suffers from self inflicted problems. For example, one of the easiest to solve, and one commonly automated around in the industry, is the whole systemd-resolved piece, which causes no end of pain, increases complexity and does nothing to improve DNS lookups.

    I do feel that rather than blaming support complexity for these issues, the Canonical team needs to take a step back and reconsider some of their design decisions, which have been major pain points for the community for a number of years, rather than doubling down on them in the footsteps of Microsoft, who even after being bitten by a well known annoyance during a keynote speech (surprise updates anyone?) continue along the same line of thinking without reconsidering their ideas dispassionately. In short, when you make a change to a system that works well and always has, please ask yourself why, and if you cannot provide a good reason, don't do it!

    1. This post has been deleted by its author

    2. Robert Carnegie Silver badge

      It's not my field of expertise but surely you want a LTS release to just run its original workload. Except for support updates, you shouldn't need to change anything. Now - this is difficult to follow - that seems to be why it broke... if you stayed with kernel 5.13 as you're entitled to do, and you updated it, then - that's the problem? But, it's still safer than if you ignore LTS and you do update everything to a latest version whenever you can.

  7. Anonymous Coward

    Not the only recent goof-up Canonical made

    This is a long-ish story, but it describes a mistake in submitting a patched version of glibc, which was then corrected a couple of days later. But not before it affected quite a few systems, necessitating a re-install of the OS. And the usual costs of lost productivity.

    At some point (April 28 2021) a version of glibc (2.31-0ubuntu9.3) was released by Ubuntu. This would, as is normal, be installed on systems as part of the usual updates. However there was a problem with that version and it was removed. This caused problems on systems which had already received this update.

    Some systems would have packages based on the glibc 2.31-0ubuntu9.3 source whilst others would have packages based on the glibc 2.31-0ubuntu9.2 source. It didn't appear to affect anything except that it made it impossible to install libc6-gdb:i386 and libcs:i386 if they were not already installed and the OS had packages based on the glibc 2.31-0ubuntu9.3 source. If one attempted such installs it could cause conflicts with the package manager which were time consuming to fix.

    My guess is that it was a goof-up on Canonical's side which they quietly and promptly fixed, but not before it was out long enough to be installed on some systems.

    Looking at the release history:

    https://launchpad.net/ubuntu/+source/glibc/2.31-0ubuntu9.2/+publishinghisto[..]

    Superseded on 2021-04-26 by glibc - 2.31-0ubuntu9.3

    https://launchpad.net/ubuntu/+source/glibc/2.31-0ubuntu9.3/+publishinghisto[..]

    Deleted on 2021-04-27 by ...

    Update might be causing regressions in snaps and the core20 snap (LP: #1926355)

    https://bugs.launchpad.net/snap-core20/+bug/1926355

    The practical upshot was that affected systems would have a hard to fix conflict in the package manager, which meant re-installing the OS would be less time consuming than trying to fix the conflict (if that was even possible).

    I am sure such regressions and resulting fixes are rather common. But I don't remember, in my 20+ years of working on Debian and derivatives, one causing such a strange conflict in the package manager that it necessitated re-installing the OS.
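A mixed state like the one described, with installed packages built from two different versions of the same source, can at least be detected mechanically. The sketch below is illustrative only: `mixed_sources` is a made-up helper, the sample rows are invented apart from the two glibc versions quoted above, and on a Debian-style system the rows could presumably be gathered with something like `dpkg-query -W -f='${Package} ${source:Package} ${source:Version}\n'`.

```python
from collections import defaultdict


def mixed_sources(rows):
    """Return {source: [versions]} for sources installed at more than one version.

    Each row is a (binary_package, source_package, source_version) tuple.
    """
    versions = defaultdict(set)
    for _package, source, version in rows:
        versions[source].add(version)
    return {src: sorted(vers) for src, vers in versions.items() if len(vers) > 1}


# Invented sample data; only the two glibc versions come from the thread.
sample = [
    ("libc6", "glibc", "2.31-0ubuntu9.3"),
    ("libc-bin", "glibc", "2.31-0ubuntu9.2"),
    ("bash", "bash", "5.0-6ubuntu1"),
]

print(mixed_sources(sample))
```

This only reports the inconsistency; it does not attempt to repair the package manager state.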

  8. Kevin McMurtrie Silver badge

    Zwhat?

    I can't believe I actually avoided a bug by using the Docker ZFS storage driver.
