Aviation delays ease as airlines complete Airbus software rollback

Airlines around the world have rushed to roll back software that powers Airbus A320 planes after the aviation giant discovered a recent update could put the aircraft in danger. This story starts on October 30th, when flight 1230 operated by US airline JetBlue made an unplanned diversion to Florida’s Tampa International Airport …

  1. Anonymous Coward

    The mystery wrinkles

    I'm a bit puzzled by this as there were X-class solar flares in the first half of the year (Jan-Jun), but then nothing until November ... so the October 30 flight receiving "intense solar radiation" seems odd (could it really have been affected by the M4.8 flare of October 15?)

    And also, which part of the software rollback fixes the issue ... something needed for ECC, something to do with the operation of lock-step redundant units, was the previous software update running Python or JavaScript with maliciously trojanized Shai-Hulud npm packages, vibe coding? It would be nice to get some clarity on this imho ...

    1. Jim Mitchell
      Boffin

      Re: The mystery wrinkles

      There is just more radiation at the altitude of a transcontinental flight. I suspect that this was a low probability event and the one flight got unlucky.

    2. FirstTangoInParis Silver badge

      Re: The mystery wrinkles

      Python etc? Absolutely not. Any software that goes anywhere near an aircraft must be deterministic and is very heavily evaluated to DO-178. Software that keeps the aircraft in the air receives the highest assurance level (DAL A) meaning every requirement and every line of code is scrutinised. I’m told POSIX C is ok for the lower assurance levels, but likely you’ll need an RTOS for the higher levels.

    3. Paul Kinsler

      Re: The mystery wrinkles

      I was also a bit mystified; as far as I could tell from the CNN report the incident was c. 18:48 UT on the 30th (from 13:48 EST), although (as you say) nothing shows on space weather live. Perhaps it was a radio-only flare, and thus would not appear on the GOES X-ray flux?

    4. Acrimonius

      Re: The mystery wrinkles

      Airbus would never confess to using vibe coding. All hell would break loose if they did.

      1. spireite

        Re: The mystery wrinkles

        Why not? It appears to me that Boeing use vibe design......

    5. jonfr400

      Re: The mystery wrinkles

      The early November 2025 flares reached X5 levels, followed by a strong CME. The X5 flare was the last of the three solar flares that happened that week.

      ESA wrote about that solar flare activity: https://www.esa.int/Space_Safety/Space_weather/Lessons_from_the_November_2025_solar_storm

      Here is also a good list that is updated regularly.

      https://en.wikipedia.org/wiki/List_of_solar_storms

      1. Paul Kinsler

        Re: The mystery wrinkles

        Well, indeed; but the event described in the article appears to have occurred on 30 October, a day characterised by an unremarkable Kp, low solar proton flux, and no flares as evidenced by significant X-ray flux. One can, for example, scroll back and zoom out at ...

        https://spaceweather.knmi.nl/viewer/

        ... to compare the two dates in question.

        This aviation event may indeed have been a solar-activity related occurrence - spaceweather.knmi is far from comprehensive - but there seems no reason that I can see that justifies claiming it occurred during any kind of "solar storm" (although any corrections and additional evidence are, of course, welcome).

    6. Fursty Ferret

      Re: The mystery wrinkles

      Solar flares don't really bother aircraft electronics, the impact is on radio transmissions mainly. The big cause of SEU is cosmic rays which come from outside the solar system, as they carry enough energy to make it a significant way through the atmosphere. In fact, high solar activity tends to reduce the amount of cosmic rays reaching lower levels.

  2. Winkypop Silver badge
    Mushroom

    intense solar radiation

    Ironic.

    The same intense solar radiation that the flying punters want to lie out in when they reach the Costa Del Lobster.

  3. ITMA Silver badge
    Devil

    Redefinition of airliners

    Flying used to be fun.

    Now it has become a depressingly frustrating and often unpleasant experience - yes I'm looking at you Manchester Airport Group and Ryanair (and the like).

    To rationalise that, I started thinking of commercial airliners not as aircraft, but busses with wings.

    I think a new definition is in order - flying microwave ovens....

    1. ajadedcynicaloldfart

      Re: Redefinition of airliners

      @ITMA

      Quote "Flying used to be fun.

      Now it has become a depressingly frustrating and often unpleasant experience - yes I'm looking at you Manchester Airport Group and Ryanair (and the like)". Unquote.

      What has that got to do with the subject of the article?

      1. Anonymous Coward

        Re: Redefinition of airliners

        And if you don't like ryanair, don't fly with them. I won't

      2. spireite

        Re: Redefinition of airliners

        Not wanting to stay offtopic, but if you're flying out of Manchester on Ryanair too - that's on you and nobody else.

        1. Yet Another Anonymous coward Silver badge

          Re: Redefinition of airliners

          The real problem is flying on a plane you don't own

          1. Brave Coward Bronze badge

            Re: Redefinition of airliners

            'The real problem is flying on a plane you don't own'

            Flying in the cloud, you mean ?

            1. Anonymous Coward

              Re: Redefinition of airliners

              IR

            2. Yet Another Anonymous coward Silver badge

              Re: Redefinition of airliners

              As the man said: If it flies, floats or fscks - rent it

        2. ITMA Silver badge
          Devil

          Re: Redefinition of airliners

          You'd be surprised which airports not in Manchester are owned by MAG (Manchester Airport Group) such as Stansted.

          1. Yet Another Anonymous coward Silver badge

            Re: Redefinition of airliners

            That explains a lot about the Stansted experience

    2. Roland6 Silver badge

      Re: Redefinition of airliners

      > I started thinking of commercial airliners not as aircraft, but busses with wings.

      The US domestic airlines were already that in the 1980s.

    3. Anonymous Coward

      Re: Redefinition of airliners

      > To rationalise that, I started thinking of commercial airliners not as aircraft, but busses with wings.

      Have you heard of this airliner called the... air bus? Seriously. They call it a bus that flies in the air. Who'd have ever thunk it.

      1. I ain't Spartacus Gold badge
        Mushroom

        Re: Redefinition of airliners

        Airbus is rubbish though! I mean, their flying buses aren't even nuclear powered!

    4. Anonymous Coward

      Re: Redefinition of airliners

      Well, there is a sizeable HINT in the name of the company.

  4. Ken Shabby Silver badge
    Facepalm

    I initially thought isn’t error correction done at the hardware level?

    But maybe some CRC check when transferring data. But I would have thought there would be multiple versions of the truth voted on.

    Sounds like it was a screwed up patch for the initial problem, that has issues that may or may not be related to the initial problem.

    I see many airlines saying “we are not affected”, wondering if that is because they don’t update very often.
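
    For readers unfamiliar with "multiple versions of the truth voted on", here is a minimal sketch of a 2-of-3 majority vote in C. The function names `vote3` and `copies_agree` are made up for illustration, and nothing here is claimed to reflect what the Airbus units actually do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority vote: each output bit takes the value held by
 * at least two of the three inputs, so one corrupted copy is outvoted. */
uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* True when all three copies agree, i.e. nothing had to be outvoted. */
bool copies_agree(uint32_t a, uint32_t b, uint32_t c)
{
    return a == b && b == c;
}
```

    A caller would typically use the voted value and log a fault whenever `copies_agree` returns false. The vote only masks a fault confined to one of the three copies.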

    1. Anonymous Coward

      It stands to reason they don't update very often. Imagine a flight using Microsoft software on Patch Tuesday needing a reboot at 20,000 feet up...

    2. Richard 12 Silver badge

      They don't.

      Airlines don't update software very often. It's normally only done at regular scheduled overhauls.

      The only time it's done ASAP is to correct a flight safety issue.

      Installation is not trivial. There's a full audit trail of the state before the update, verification steps to ensure it actually occurred, and the state afterwards.

      Even "normal" industrial software/firmware only gets updated during scheduled downtime, and generally only to correct specific known issues or add specific desired features. Downtime costs too much.

      1. FirstTangoInParis Silver badge

        Re: They don't.

        > Even "normal" industrial software/firmware only gets updated during scheduled downtime,

        And indeed a lot has *never* been updated because of continuous processes. And then some bright spark decides to connect them to the interweb causing all sorts of exposure issues due to vulnerabilities and lack of patches. Not to mention zero day issues.

  5. Jason Hindle Silver badge

    Something doesn't add up

    "Left unexplained, for now, is exactly how intense solar radiation corrupts data."

    Presents as a hardware design defect; fixed in software?

    1. pip25

      Re: Something doesn't add up

      Not necessarily. For example, they may have incorrectly calibrated some algorithm to expect values from an instrument within a specific range - which is true 99% of the time, but such rare events may push readings outside the assumed boundary.
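
      A rough sketch of the kind of range assumption described above, in C. The limits, the struct and the function name are all invented for illustration; the real algorithm and its calibration are not public.

```c
#include <stdbool.h>

/* Hypothetical plausibility limits for a sensor reading; the numbers are
 * invented for illustration and come from no real avionics system. */
#define READING_MIN (-50)
#define READING_MAX 150

typedef struct {
    int  last_good;   /* most recent reading that passed the check */
    bool fault;       /* set when the latest reading was rejected  */
} checked_input;

/* Accept a reading only if it lies inside the assumed range; otherwise
 * keep the previous good value and raise a fault flag for the caller. */
int accept_reading(checked_input *in, int raw)
{
    if (raw < READING_MIN || raw > READING_MAX) {
        in->fault = true;
    } else {
        in->last_good = raw;
        in->fault = false;
    }
    return in->last_good;
}
```

      If the limits are calibrated too tightly, a rare but genuine reading is rejected; if the check is missing altogether, an out-of-range value is acted on as though it were real.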

      1. Anonymous Coward

        Re: Something doesn't add up

        I believe unexpected accelerometer readings are what caused the Schiaparelli Mars lander to think it was on the surface when it was a few miles above the surface, with the effect that it decided it no longer needed its parachute

        1. fg_swe Silver badge

          V-Model, HIL, Redundant Sensors

          In theory, this kind of thing can be avoided by running the control unit in realistic HIL tests. As mandated by the V-Model.

          Sensor failure is part of proper HIL Testing.

          Redundant Sensors detect Sensor failure.

          HIL Testing can be done in front of a particle beam, which simulates the sun and other radiation sources.

          One should think that well-educated and enlightened engineering managers could think of this and make the necessary time, money and machinery available.

        2. Like a badger Silver badge

          Re: Something doesn't add up

          "I believe unexpected accelerometer readings are what caused the Schiaparelli Mars lander to think it was on the surface when it was a few miles above the surface, with the effect that it decided it no longer needed its parachute"

          Robotic Darwinism.

        3. anothercynic Silver badge

          Re: Something doesn't add up

          That was due to metres/sec vs feet/sec. Makes a bit of a difference that one...

          1. aaronmdjones

            Re: Something doesn't add up

            You are confusing two incidents. The person you are replying to was correct; the Schiaparelli EDM crashed because it ejected its parachute too early after its sensor fusion reported a negative altitude reading (making it think it had already landed, when it was still more than 3km above the ground). The crash due to confusing metric and imperial units of measurement was an entirely different mission; the Mars Climate Orbiter, 18 years prior.

    2. Anonymous Coward Silver badge
      Boffin

      Re: Something doesn't add up

      The equivalent of switch debouncing: wait until the data is in a stable state before acting on it.

      Then someone comes along and "optimises" the code - "why are we waiting here? Delete"

      That's my experience anyway... I'll leave it to you to work out whether I was the one who wrote the initial code, or the one who sped it up.
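
      The "wait until stable" idea can be sketched in C roughly as follows. The threshold of three samples and the names `debouncer` and `debounce` are assumptions chosen for illustration only.

```c
#include <stdint.h>

#define STABLE_SAMPLES 3u  /* assumed threshold, chosen only for illustration */

typedef struct {
    int32_t  candidate;  /* value currently being watched       */
    uint32_t count;      /* consecutive times it has been seen  */
    int32_t  accepted;   /* last value that passed the check    */
} debouncer;

/* Feed one sample; the accepted value changes only after the same sample
 * has been seen STABLE_SAMPLES times in a row. */
int32_t debounce(debouncer *d, int32_t sample)
{
    if (sample == d->candidate) {
        if (d->count < STABLE_SAMPLES) {
            d->count++;
        }
    } else {
        d->candidate = sample;
        d->count = 1u;
    }
    if (d->count >= STABLE_SAMPLES) {
        d->accepted = d->candidate;
    }
    return d->accepted;
}
```

      A one-off corrupted sample never reaches `accepted`. Deleting the wait is equivalent to setting `STABLE_SAMPLES` to 1, at which point every glitch goes straight through.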

      1. Ellipsis
        Boffin

        Re: Something doesn't add up

        Except at the very highest assurance levels for safety-critical software: (a) “why are we waiting here?” would find an answer in the traceability of every line of code to a requirement; (b) “Delete” would cause the independent verification testing of that requirement to fail; and (c) the unsanctioned modification would be caught by the change review process before the update could be released.

        (In theory, at least…)

        1. Anonymous Coward Silver badge
          Alien

          Re: Something doesn't add up

          "In theory, theory and practice are the same. In practice, they are not."

    3. fg_swe Silver badge

      Clearing Up Physical Mysteries

      1.) Measure real-world radiation

      2.) Talk to a particle physicist about how to simulate 1000x the radiation in a lab

      3.) Strap control unit with the real software in a HIL setup in front of artificial radiation source (linear accelerator or the like)

      4.) See what the HIL reports.

      5.) Change software and or hardware.

      6.) GOTO 3.

      It is almost as if we spend lots of money on CERN and almost as if Airbus could cooperate with CERN on this matter. As both entities are funded mostly by EU states.

      1. Mishak Silver badge

        Good luck

        If you could get some electronics into the beam at CERN, it's not going to survive.

        Don't quote me on these figures, but the energy in the beam (at full power) is equivalent to something the mass of a large aircraft carrier travelling at 40 knots. There are dump tanks round the ring that are filled with water, and you do not want to be near one when it's used.

        The sort of event that could have caused the Airbus issue is likely to be a (single) energetic particle causing a single bit upset.

        1. fg_swe Silver badge

          So ?

          Put the control unit behind a properly designed armourplate/water tank to get the required radiation spectrum and particle count.

          1. Irongut Silver badge

            Re: So ?

            Large hadron colliders are not toys to be played with, summoning Nyarlathotep is serious business you know.

            1. Yet Another Anonymous coward Silver badge

              Re: So ?

              Imagine if you ran it anticlockwise, what would you summon?

              1. A. Coatsworth
                Holmes

                Re: So ?

                You'd get Petohtalrayn, shirley?

              2. Someone Else Silver badge

                Re: So ?

                It'd be like playing a Beatles record backwards

              3. The Organ Grinder's Monkey Bronze badge

                Re: So ?

                Widdershins?

        2. Excused Boots Silver badge

          Re: Good luck

          Indeed, although the dump tanks aren’t filled with water but they are water cooled.

          From CERN’s own website: "Each beam dump absorber consists of a 7m long segmented carbon cylinder of 700mm diameter, contained in a steel cylinder, comprising the dump core (TDE). This is water cooled, and surrounded by about 750 tonnes of concrete and iron shielding. The dump is housed in a dedicated cavern (UD) at the end of the transfer tunnels (TD)."

          I recall reading an article written by one of the CERN engineers who claimed that ‘if we have to ‘dump’ the beam, the very, very last place on Earth that you want to be standing is at the end of the tunnel’!

          1. anothercynic Silver badge

            Re: Good luck

            You don't want to stand anywhere near *any* accelerated beam in *any* facility. There's a reason why proton beam therapy is used to kill cancerous cells, and why people undergoing PBT are effectively locked into a solid frame to ensure *nothing* moves.

            I've seen what a beam does to a living thing (a spider that happened to be in a beamline cell when the cell was lit up)...

            1. Ken Shabby Silver badge
              Black Helicopters

              Re: Good luck

              I hope it didn’t bite you before it croaked

              It looks like a radioactive spider ====>

      2. anothercynic Silver badge

        Re: Clearing Up Physical Mysteries

        Sorry to have to say this, but you clearly have no idea what science CERN is involved in.

        That said, you *can* potentially find light sources (like Diamond Light Source in the UK, ESRF and SOLEIL in France, ALBA in Spain etc) that could potentially simulate extreme radio events like the ones potentially found at 40-45,000 feet up. If anything, those sources (which use electrons at close-to-light-speed to generate radio waves of various frequencies, which in turn are bundled into beams a few microns in size) would be more appropriate than a proton-smasher like CERN. You don't want to use linear accelerators either, because that's not what they are designed for. It's not simply a case of "oh, just bung that into a box in front of a linear accelerator"... some experiments (in LINACs, light sources etc) take *months* to design and implement, others take years (CERN being one giant experiment with some experiments on the side).

        And just to set the record straight, Airbus is not 'funded mostly by EU states'. It's a publicly listed and traded company. CERN on the other hand *is* funded by governments, including non-EU and non-European ones.

    4. Anonymous Coward

      Re: Something doesn't add up

      But the fix was to rollback software, implying that the bug was introduced through a previous upgrade?

      Now we know where the laid off Micro$oft programmers are going.

  6. fg_swe Silver badge

    Details, Aerospace Software

    It transpires:

    1.) The problem was a bitflip, caused by solar storm radiation. For some hard to explain reason, the affected variable in main memory was not protected by CRC, ECC or the like.

    1.2) Protection is ideally done by hardware, but can also be done in software: Store multiple copies of the variables and compare them upon each use. Handle deviation in a proper way.

    2.) The affected software controls the horizontal control surface. This means the aircraft can potentially pitch up or down wildly, up to a breakup of the a/c structure.

    3.) The software rollback again protects against solar storm particles.

    Questions:

    A) Shouldn't Airbus have found this problem in a HIL Test rig under simulated solar radiation ? Particle beam accelerators do exist and are not expensive for a fleet of thousands of aircraft. Needs to be done once for each release and all a/c

    Note: Control systems of this kind are typically programmed in Ada, C, C++ and execute on a RTOS like Integrity-178, VxWorks, QNX or the like. Unixes or Windows do not fit the bill, as they are not hard realtime capable. CPU could be an embedded version of PowerPC, ARM or 680x0.
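
    A minimal sketch of the software approach in point 1.2, in C: the value is kept alongside its bitwise complement and both are compared on every read. The type and function names are hypothetical, and this detects a flip without correcting it.

```c
#include <stdbool.h>
#include <stdint.h>

/* A value stored together with its bitwise complement. volatile forces the
 * compiler to perform every read and write instead of assuming they match. */
typedef struct {
    volatile uint32_t value;
    volatile uint32_t inverse;
} guarded_u32;

void guarded_store(guarded_u32 *g, uint32_t v)
{
    g->value = v;
    g->inverse = ~v;
}

/* Read the value back; returns false if the two copies no longer agree,
 * in which case *out is left untouched and the caller must handle it. */
bool guarded_load(const guarded_u32 *g, uint32_t *out)
{
    uint32_t v = g->value;
    uint32_t inv = g->inverse;
    if ((uint32_t)~inv != v) {
        return false;
    }
    *out = v;
    return true;
}
```

    What "handle deviation in a proper way" means is left to the caller here: reset, fall back to a default, or hand over to a redundant unit.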

    1. fg_swe Silver badge

      Re: Details, Aerospace Software

      Note 2: software of this type is developed with the V-Model approach, which is vastly different from the quick-and-dirty approach used for most beancounting and general IT software.

      See

      https://di-fg.de/RobusteSoftware.html

      Airbus does have a good history of faithfully executing the V-Model and this appears to be an unfortunate exception. Nevertheless, they should now subject ALL of their safety-critical control units to artificial particle beam while executing inside a HIL test rig.

    2. Mishak Silver badge

      Protection is ideally done by hardware, but can also be done in software

      Maybe, but it is non-trivial; simply storing multiple copies is not enough. For example:

      #include <stdbool.h>

      bool f( int x )
      {
          static int c1;
          static int c2;
          c1 = x;
          c2 = x;
          return c1 == c2; /* a compiler may fold this to 'true' */
      }

      Many C compilers will optimise this code to always return 'true', as there is no way that the values of 'c1' and 'c2' can differ within the abstract machine that the language uses to execute the code; memory corruption is not considered by the machine.

      1. fg_swe Silver badge

        Re: Protection is ideally done by hardware, but can also be done in software

        In aerospace software development you do not blindly trust the compiler, rather you will review every single line of Ada code and resulting machine instructions.

        And you will test the effectiveness of your measures by a realistic particle/ radiation beam, which simulates hundreds of years of a/c operations in a matter of days.

        1. fg_swe Silver badge

          Assembly Code Review

          Of course machine code review will only be done for highly safety-critical parts of e.g. a flight control system("FlugLageRegler") and probably not on less critical things such as the radar, the radio and the like. Focus efforts on the most important parts and relax it on the lesser ones. Basic rationality goes a long way.

      2. ChrisC Silver badge

        Re: Protection is ideally done by hardware, but can also be done in software

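        #include <stdbool.h>

        bool f( int x )
        {
            static volatile int c1; /* volatile: every access must really happen */
            static volatile int c2;
            c1 = x;
            c2 = x;
            return c1 == c2; /* both values are re-read from memory here */
        }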

        1. Sumpbuster

          Re: Protection is ideally done by hardware, but can also be done in software

          Beat me to it !

          If I remember correctly, the way to force the compiler was to "volatile" the variables, so ensuring that the compiler would always re-read the value?

          1. ChrisC Silver badge

            Re: Protection is ideally done by hardware, but can also be done in software

            That's one way of doing it, and if all you need to do is force the compiler to re-read the value each time, without caring about any other optimisations it might be applying to the resultant code, then it's probably also the easiest/cleanest way. In some projects I've also used file-level optimisation overrides to prevent the compiler doing *any* optimisation on *any* of the code contained within said file, which is another, rather more brute-force, way of doing it, but which might be necessary for other reasons - e.g. to ensure cycle-accurate timing without the need to actually write the asm yourself...

            I'm getting a little long in the tooth to be subjecting my brain to that, though in my younger years I used to relish the chance to get stuck into some truly bare metal coding without even the crutch of a compiler for support, and I do still understand the instruction sets of the processor cores I work with these days well enough to know what the compiler is generating, which is a skill any coder working on lower-level systems really needs to have IMO, even if they never use it to actually write so much as a single line of asm in their entire career - unlike those earlier years, where I genuinely could write better asm than the compilers of the time (some of which would happily generate code that was utterly and hopelessly wrong), I'm happy to concede that a few decades of compiler development combined with the increasing complexity of the cores themselves means that the compilers these days almost certainly are doing a far better job than most of us could manage.

        2. Brave Coward Bronze badge

          Re: Protection is ideally done by hardware, but can also be done in software

          ... but who wants to fly among volatiles?

          1. Mishak Silver badge

            who wants to fly among volatiles?

            Or, as I call it, First Class?

      3. This post has been deleted by its author

      4. Roland6 Silver badge

        Re: Protection is ideally done by hardware, but can also be done in software

        Perhaps more importantly, even with optimisation disabled, the compiler would put c1 and c2 in adjacent memory addresses, rather than in different memory modules.

    3. Anonymous Coward

      Re: Details, Aerospace Software

      > For some hard to explain reason, the affected variable in main memory was not protected by CRC, ECC or the like.

      It's not really hard to explain at all, none of the CPUs involved support ECC.

      1. fg_swe Silver badge

        Re: Details, Aerospace Software

        Can you provide more details ?

        1. Ken Shabby Silver badge
          Alert

          Re: Details, Aerospace Software

          Ok there are cpus without ECC, but this is a safety critical system and not a gaming machine

    4. david 12 Silver badge

      Re: Details, Aerospace Software

      Particle beam accelerators do exist and are not expensive for a fleet of thousands of aircraft.

      You don't need radiation to test for bit-flip errors. You can use any kind of simulated hardware.

      Full emulation testing with simulated bit flips is very slow: it may take days to run minutes of simulation. Testing every possible bit-flip is like testing every possible chess game. And just using random radiation to provoke bit-flips would never test all possible situations.
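
      As a toy illustration of simulated bit flips, the C sketch below injects every single-bit fault into one 32-bit word and counts how many a simple parity check catches. The function names are made up, and parity is only a stand-in for whatever detection a real unit uses.

```c
#include <stdint.h>

/* Parity of a 32-bit word: 1 if an odd number of bits are set, else 0. */
unsigned parity32(uint32_t w)
{
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return (unsigned)(w & 1u);
}

/* Flip each of the 32 bits in turn and count how many of the injected
 * faults change the parity, i.e. how many the check would detect. */
unsigned detected_single_flips(uint32_t word)
{
    unsigned stored = parity32(word);
    unsigned detected = 0u;
    for (unsigned bit = 0u; bit < 32u; bit++) {
        uint32_t faulty = word ^ (UINT32_C(1) << bit);
        if (parity32(faulty) != stored) {
            detected++;
        }
    }
    return detected;
}
```

      Thirty-two cases for one word is trivial. Doing the same for every bit of state at every point in time across a whole flight control program is where the chess-game comparison comes in. Parity also misses any fault that flips an even number of bits.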

    5. anothercynic Silver badge

      Re: Details, Aerospace Software

      Particle beam accelerators do exist and are not expensive for a fleet of thousands of aircraft.

      Again - zero clue about the real cost of particle beam accelerators or what the science in them is...

  7. cookiecutter Silver badge

    compare with boeing

    airbus- this COULD cause a very rare issue in the future, let's get fixed!!

    boeing - we're told the average occurrence of this issue is once in 2 years so we'll leave it 2 years to fix the problem- HANG ON!! why did it only take 3 months for the next crash?! they said AVERAGE of two years!!

    1. Anonymous Coward

      Re: compare with boeing

      Be glad Ford wasn't involved. Their treatment of the Ford Pinto fuel tank issue was so bad it changed legislation..

    2. Excused Boots Silver badge

      Re: compare with boeing

      Which is why when any of my family are flying anywhere, one of the first things I ask them to check is who made the aircraft!

    3. anothercynic Silver badge

      Re: compare with boeing

      Bingo.

  8. fg_swe Silver badge

    Well

    This issue seems to expose a deficiency of current aircraft/spacecraft control unit development(HW+SW) processes.

    Why did they not find it in a HIL test strapped in front of an appropriate particle beam(simulating a solar storm's radiation over several years) ?

    I've added this subject to my document on these matters:

    https://di-fg.de/RobusteSoftware.html

    1. IGotOut Silver badge

      Re: Well

      Have you got shares in a "particle beam accelerator" company? You keep banging on about them like you have a fucking clue what you're on about, but you may have noticed all the down votes by people who DO know what they are talking about.

      1. This post has been deleted by its author

  9. Pussifer

    This is why, when seated, you should have your seat belt buckled - even loosely. If you hit turbulence, or a super rare incident like this, you're less likely to be injured by hitting the ceiling or overhead lockers. It wouldn't stop you being injured by an unsecured idiot around you though.

    If it's a Boeing I'm still not going.

    1. fg_swe Silver badge

      If you read carefully, the bit flip could occur in a high valued bit of an important variable. This would trigger maximum elevation of horizontal control surfaces. A/c would perform extreme pitch, resulting in high aerodynamic forces, resulting in structure breakup.

      Electronic control means the control unit must work almost perfectly. Bitflips must be properly dealt with. Sensor faults must be dealt with.

      The same argument can be made about hydraulic and mechanical controls, though.

      Engineers need to be on top of any failure mode.

      1. Anonymous Coward

        As I understand it, maximum in any one direction is OK, at least until you leave the flight envelope. There have been crashes from swinging control surfaces back and forth rapidly.

        1. fg_swe Silver badge

          "until you leave the flight envelope"

          This happens in a few seconds with maximum elevation of the horizontal stabilizer at full speed. A major aspect of this control unit is to keep the a/c pitch inside proper limits. A control algorithm only works as long as there are no unchecked bitflips... See the problem ?

          1. Someone Else Silver badge

            An "unchecked bitflip" when fed to a properly damped control algorithm, would tend to be ignored...unless the bitflip were somehow made persistent (e.g. the damaged value was stashed, and consistently reused without being refreshed at an appropriate interval). It would be odd that such a value were to be made persistent for values that control pitch, yaw or roll.

            1. fg_swe Silver badge

              No

              Imagine the bitflip occurring inside your imagined dampening filter. Filter value jumps from 0x0005 to 0x8005. The output of your control unit goes from "minimal" to "maximum" in 20ms or so. In a matter of seconds the a/c attitude goes into a dangerous pitch that will rip the a/c apart. In the meantime your dampening filter went from 0x8005 to 0x8001 - not a relevant difference to save the a/c.

              In reality it is probably much more complex PID controllers and filters working together to do the ELAC work. But any bitflip in the PID and the filters will hairraise the control software engineers. They want NOTHING of the like happening in their control codes. They want either immediate reset of the ELAC(in something less than 300ms) or a switchover to the Other Elac.
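
              The difference between a flip in a filter's input and a flip in its stored state can be shown with a toy first-order low-pass filter in C. The 1/8 gain and the function names are arbitrary assumptions; this is nothing like the real ELAC control law.

```c
#include <stdint.h>

/* One step of a first-order low-pass filter, y += (x - y) / 8.
 * The 1/8 gain is an arbitrary illustrative choice. */
int32_t lp_step(int32_t y, int32_t x)
{
    return y + (x - y) / 8;
}

/* Case 1: bit 15 flips in a single input sample; the state is intact. */
int32_t output_after_input_glitch(void)
{
    int32_t y = 0x0005;
    return lp_step(y, 0x0005 ^ 0x8000);
}

/* Case 2: bit 15 flips in the stored filter state itself, then the
 * filter runs one step on a clean input. */
int32_t output_after_state_flip(void)
{
    int32_t y = 0x0005 ^ 0x8000;   /* 0x8005 = 32773 */
    return lp_step(y, 0x0005);
}
```

              With these numbers the input glitch moves the output from 5 to 4101, an eighth of the jump, while the state flip puts the output at 32773 immediately and it is still at 28677 one step later. Damping attenuates a bad input sample but does little for a corrupted state variable.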

              1. fg_swe Silver badge

                Re: No

                Any proper control unit software engineer operates under the assumption that variables do not simply flip, as this nullifies any assumptions made about the code. Ideally the RAM, the Flash and the CPU itself perform ECC both in storage and in processing paths. The other option is to run a second unit in lockstep and compare the outputs, as identical bitflips in two ECUs are very unlikely.

                There is no way a control algorithm can accept a bitflip of control variables or of program code; the outcomes can all be catastrophic.

                1. Mishak Silver badge

                  operates under the assumption that variables do not simply flip

                  Any decent critical system will assume that they can happen, but will introduce mitigations to ensure they are detected.

                  The "simplest" way is to use hardware so that the software can be written so that it does not have to worry about them. Whilst it can be done in software, it's not so easy as even (e.g.) CRCs only give transient protection (the data was valid when its CRC was checked, but what about when it is then used?).
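
                  The check-then-use gap can be seen in a small C sketch of a CRC-protected record. It uses a bitwise CRC-8 with polynomial 0x07; the struct and the names `crc_seal` and `crc_valid` are hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), initial value 0. */
uint8_t crc8(const uint8_t *data, unsigned len)
{
    uint8_t crc = 0u;
    for (unsigned i = 0u; i < len; i++) {
        crc ^= data[i];
        for (unsigned b = 0u; b < 8u; b++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

typedef struct {
    uint8_t bytes[4];  /* protected payload */
    uint8_t crc;       /* CRC-8 computed when the payload was written */
} crc_guarded;

/* Copy the payload in and record its CRC. */
void crc_seal(crc_guarded *g, const uint8_t src[4])
{
    for (unsigned i = 0u; i < 4u; i++) {
        g->bytes[i] = src[i];
    }
    g->crc = crc8(g->bytes, 4u);
}

/* Recompute the CRC and compare it with the stored one. */
bool crc_valid(const crc_guarded *g)
{
    return crc8(g->bytes, 4u) == g->crc;
}
```

                  `crc_valid` only says the payload was intact at the moment it ran. A flip that lands after it returns, but before the bytes are consumed, goes unnoticed, which is the transient-protection problem described above.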

  10. MiguelC Silver badge
    Angel

    Re: Airlines around the world scrambled to make the fix, but many couldn’t avoid delays to their schedules

    And then there are the VIPs

    1. fg_swe Silver badge

      So ? You want the pope to fly on a more dangerous plane than others ?

      Quite dark view of humanity. Too much Marx Intake ?

      1. This post has been deleted by its author

      2. MiguelC Silver badge
        FAIL

        Just highlighting the fact that airlines had to send their planes to a few designated airports to get the fix, while the Pope gets a technician on home call

        Marx had nothing to do with anything here, but maybe your religious opinions blurred your view?

  11. david 64

    "...intense solar radiation..."

    Sounds a bit BOFH excuse generator to me...

    1. Excused Boots Silver badge
      Joke

      "...intense solar radiation..."

      of course that could just mean a bright light!

  12. Paul Barnett

    "A320 pilot Arjun Singh has identified the problematic software release as “L104” and said the rollback was to version “L103+”."

    So what is not clear is why this fixes the problem. Did L104 remove data checks that L103+ still had? And what other (safety) updates in L104 are we now missing by going back to L103+?

    1. fg_swe Silver badge

      Indeed

      Airbus and EASA are stonewalling on the exact details of this failure mode. I guess they consider control unit engineering to be their secret sauce and fear they could advance the competition by telling too much.

      My best guess is that L104 removed a redundant RAM storage+computation path, which would detect and mitigate the bitflip.

      Or maybe L104 simply turned off ECC by accident. This would align with their decision to perform a hardware replacement of older control units. They probably have no ECC at all.

      But then there is the Airbus philosophy of triplicate and higher control unit redundancy. Why did that not catch the bitflip ?

      1. Someone Else Silver badge

        Re: Indeed

        I guess they consider control unit engineering to be their secret sauce and fear they could advance the competition by telling too much.

        Don't be silly! Boeing would never consider copping another's control algorithm; they are way too busy screwing up their own, and well, NIH-syndrome, mutter, mumble.

        Now Embraer might be tempted, but their stuff works, so where's the incentive?

        1. fg_swe Silver badge

          Re: Indeed

          There are some nasty upstarts in that one Asian nation, who would love to take over Airbus and Boeing in one swipe...

  13. fg_swe Silver badge

    Radiation Testing Services

    Quick search with AI yields companies like this:

    https://radiationtestsolutions.com/services/radiation-effects-testing/

    https://www.northropgrumman.com/what-we-do/space/launch-vehicles/launch-vehicles-and-propulsion/testing-for-success

    It seems the capabilities already exist, ready-to-use, but some MBA beancounter decided it was not necessary to contract them.

    1. fg_swe Silver badge

      Re: Radiation Testing Services

      Boeing also seems to have exquisite capabilities and a network of even more exquisite partners to do radiation testing

      https://www.boeing.com/specialty/radiation-effects-laboratory#accordion-7aa0d0df7d-item-bfd8ce86b9

      As I wrote above, some radiation sources need national labs capabilities.

    2. Roland6 Silver badge

      Re: Radiation Testing Services

      A question has to be whether computer systems in a composite airframe are more exposed to solar radiation than those in an aluminium bodied plane.

      I'm sure someone has already done the research for real, rather than just relied on a computer simulation.

      1. fg_swe Silver badge

        Re: Radiation Testing Services

        Not hard to come up with an experimental rig for this purpose. The hardest thing will be to convince the MBA sitting on the purse. Can be done for less than 100k dollars, if you have access to Boeing and Airbus scrap parts.

  14. Electronics'R'Us
    Holmes

    Flight control system design

    Having actually done some of these, here are the pretty standard design requirements.

    Triplex design, galvanically isolated so if one lane goes out, the other two are not electrically affected. Processors must be 3 completely different architectures to make the chance of a microcode problem (spurious execution) so small as to be effectively zero, as all the processors will actually have completely different engines under the hood (in reality, the numbers are more like 10^-12 or so).

    Memory interfaces: L1 = parity protection. L2 and higher ECC (fix one, detect 2).

    All the above is in hardware.
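    To illustrate what "fix one, detect 2" means for the ECC mentioned above, here is a toy single-error-correct, double-error-detect (SECDED) code: an extended Hamming(8,4) code protecting one 4-bit nibble. This is a sketch only. The function names `encode` and `decode` and the bit layout are assumptions for the example; real memory controllers do this in hardware over much wider words, and nothing here reflects the actual ELAC hardware.

```python
from typing import Optional, Tuple


def encode(nibble: int) -> int:
    """Encodes 4 data bits into an 8-bit extended Hamming codeword.

    Bit 0 is the overall parity bit; bits 1..7 are the classic Hamming(7,4)
    positions, with parity at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) % 2  # makes the total number of set bits even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int) -> Tuple[str, Optional[int]]:
    """Returns (status, nibble); nibble is None when the error is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped. The syndrome gives its
        # position; syndrome 0 means the overall parity bit itself was hit.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return status, nibble


if __name__ == "__main__":
    cw = encode(0b1011)
    print(decode(cw))
    print(decode(cw ^ 0b00100000))  # one flipped bit
    print(decode(cw ^ 0b00100100))  # two flipped bits
```

    The point of the extra overall parity bit is the last branch: without it, a double flip would be silently "corrected" into the wrong value.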

    Every relevant sensor is read (typically 20 times per second) and the 3 channels vote with their results which will naturally have some margin for different physical sensors. That is usually done either in a processor or quite commonly now, a FPGA [1]. If there is a disagreement, the two in agreement will reset the other channel.
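    The voting step described above can be sketched roughly as follows, assuming three channel readings and a fixed agreement tolerance. The function name `vote`, the tolerance values and the choice of averaging the agreeing pair are all assumptions for illustration, not the actual Airbus voting logic.

```python
from typing import List, Optional, Tuple


def vote(readings: List[float], tolerance: float) -> Tuple[Optional[float], Optional[int]]:
    """Votes across three channels.

    Returns (agreed_value, outvoted_channel_index).
    - No pair within tolerance: (None, None), i.e. no majority.
    - Exactly one pair within tolerance: their mean, plus the index of the
      third channel, which the other two would reset.
    - Otherwise: the median reading, with no channel singled out.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three channel readings")
    agreeing = [
        (i, j)
        for i, j in ((0, 1), (0, 2), (1, 2))
        if abs(readings[i] - readings[j]) <= tolerance
    ]
    if not agreeing:
        return None, None
    if len(agreeing) == 1:
        i, j = agreeing[0]
        outvoted = 3 - i - j  # the index that is not in the agreeing pair
        return (readings[i] + readings[j]) / 2, outvoted
    return sorted(readings)[1], None


if __name__ == "__main__":
    print(vote([10.0, 10.5, 10.25], tolerance=1.0))  # all channels agree
    print(vote([10.0, 10.5, 55.0], tolerance=1.0))   # channel 2 is outvoted
    print(vote([0.0, 50.0, 100.0], tolerance=1.0))   # no majority
```

    The tolerance is what allows for the "some margin for different physical sensors" mentioned above; pick it too tight and healthy channels get reset, too loose and a drifting channel goes unnoticed.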

    Ultimately it is software that commands control surface movement based on pilot input against the control laws, a specification given to the manufacturer of the equipment by Airbus in this case. In this particular case, if Airbus are doing the top level or have replaced the third party vendor model it looks like the control laws may well not be properly defined.

    I don't know the current software stack in the A320 but I do know the process is very strict.

    1. FPGAs in this context are designed to the requirements of DO-254 (or the Airbus equivalent, which has the same requirements), which is very similar in scope and effect to DO-178 for software. FPGAs with SRAM configuration data are (or at least were) a big no-no. The old Actel ProASIC flash-based series were the go-to parts in this area for a long time. Flash is pretty much immune to free neutron hits.

    Most atmospheric particles are free neutrons, the density of which increases with increasing altitude (it is an air pressure issue and varies quite considerably across the world)

    1. Anonymous Coward
      Anonymous Coward

      Re: Flight control system design

      Actel is part of Microchip now.

      See AntiFuse FPGAs

      1. Electronics'R'Us
        Stop

        Re: Flight control system design

        Unfortunately, many AntiFuse FPGAs from US vendors come under ITAR and therefore will not be used in any civil design.

    2. This post has been deleted by its author

  15. JimmyPage Silver badge
    Boffin

    Left unexplained, for now, is exactly how intense solar radiation corrupts data.

    This isn't the BBC. We know how.

    1. collinsl Silver badge

      Re: Left unexplained, for now, is exactly how intense solar radiation corrupts data.

      Except it's not the job of the BBC to explain something based on their supposition of the situation when facts are involved, it's their job to report what the facts are that they're presented with.

      And if Airbus and EASA aren't saying, then the BBC can't report the explanation that they've given. That's why they said "Left UNEXPLAINED" - I.E. no one has explained it to the BBC, not that the BBC doesn't know how it COULD have happened. Facts are important in journalism, especially these days.

  16. Nematode Bronze badge

    I love safety-critical software. Not. I'll always remember one site start-up I did where the client's functional spec., which we followed, resulted in a hugemongous-inch pipeline valve moving when it shouldn't have*.

    Having decided that the client's design could therefore not be trusted after all, I did a 1-man instant on-site re-analysis of their "code" and found 3 more errors.

    * Had to throw out that particular pair of shorts.

    1. Roland6 Silver badge

      If you've written safety-critical software, I think you definitely have a different view of programming and testing to those whose only experience is VB and web apps.

    2. fg_swe Silver badge

      Guess What, Genius

      It is your job as a proper development engineer to find contradictions and bullshit, discuss with customer and then change the requirements document.

      THAT IS YOUR MOST IMPORTANT VALUE CREATION.

      sorry for shouting.

      1. collinsl Silver badge

        Re: Guess What, Genius

        Don't you mean change the design? A requirements document should say "we want the system to be able to do x" not "this is how the system will do x". The design should say how.

  17. Judge Mental
    Pint

    Good planning.

    It seems the airlines and Airbus planned this well. There's a four-day window between Thanksgiving and Monday when fewer people in the USA are flying and everyone is stuffing themselves with turkey or Black Friday shopping.

  18. Anonymous Coward
    Anonymous Coward

    Apparently the issue does not exist in previous software versions. So it's not really about the design or that there wasn't suitable protection (combination software/hardware), but that something went wrong in the new version.

    And I'm sure there is QA to try and make sure this can never happen. But I think with solar radiation, there must be some level of statistical analysis on what level of QA testing is necessary and what is not. For example, if all the cross-checked values in the right places of the memory of all the processors change in the same way - while not changing any of the other memory contents - it would be an undetectable error. But the chances of it happening would be way too low to care about. I seem to recall a story of Boeing doing some analysis of where statistically there would be no bit flips over the life spans of the entire fleet, even if they were constantly in a solar storm. Things like that.

    So that makes me wonder if the effect of the broken software change was to invalidate that earlier analysis? For example it could have changed the structure of the critical variables in memory to be more closely packed together, or maybe a new version of the compiler might have optimised away some critical check due to a latent bug from the 1980s that was never noticed before. This would be tricky to notice in testing or code review, and the solar radiation related QA done to the old requirement level wouldn't have picked it up either.

    Of course, it could also be something incredibly boneheaded. Like if Thales just fobbed the software off to someone who did the equivalent of "ChatGPT, please implement the feature described in ticket THA-34. Review and optimise the code base according to the coding conventions. Commit the changes and start a code review. In the code review, enter code improvement comments as the two persons with review rights, and then commit code changes according to those comments. Approve the code review as the two earlier commenters and release the new version as ELAC L104. Leave space between each step that would be consistent with a software developer doing these things manually".

  19. This post has been deleted by its author

  20. First Light

    From YouTube Blancolirio channel, two commenters explain (and I don't know how they know):

    @txkflier

    The omitted code was in the pitch attitude limitation module in alternate flight-control law. The L104 update for ELAC B introduced enhanced envelope protection against stall (e.g., pitch limits during failures), but omitted SEU detection/recovery logic for corrupted data, enabling uncommanded pitch-down under solar radiation.

    @robertbutsch1802

    The L 104 software version apparently was an effort to get the A-320 flight control system closer in performance to the, much newer, A-350 FCS. The software people (whoever they might be) may have cut corners a little bit in the Error Correction Code in the name of making the new software compatible with the ELACs. The fix is to revert to version L 103+.

  21. Occasional Comentard

    How does a software change cause or prevent corruption by solar flares? Surely it is the hardware that needs changing, by adding shielding etc.
