Software update borked radar, delayed hundreds of flights, says US FAA

The Federal Aviation Administration has blamed a software upgrade to its next-generation radar system for the hundreds of delayed and cancelled flights from Washington DC airports on Saturday. "The FAA identified a recent software upgrade at the Leesburg, VA high-altitude radar facility as the source of Saturday's automation …

  1. Mark 85

    Want to bet...

    .... that Lockheed-Martin comes out with "a sub-contractor" or "a sub-sub-contractor" did it? Outsourced or not... ?

    1. BillG
      Facepalm

      His name is Walter.

      So why that M*A*S*H photo? Why the picture of Radar O'Rei... oh, NOW I get it!

      1. Anonymous Coward
        Anonymous Coward

        Re: Mashup

        "Why the picture of Radar O'Rei... oh, NOW I get it!"

        You get it. I get it. Lots of other old fogeys get it too.

        What age profile do you think recognition of the picture will have?

        How about the other culturally iconic team of software engineers, Morecambe and Previn: "I'm encrypting all the right bits. Just not necessarily in the right order".

        The young people of today, they ... er how does the rest go?

  2. Sureo

    Upgrades

    I remember when I used to roll out upgrades to production. Usually no problem, but once when something went wrong my manager said "You did WHAT??"

    It helps to test first.

    1. midcapwarrior

      Re: Upgrades

      I'm sure they tested it, just not at a large enough scale or for long enough to see the system run out of memory space.

      Agile development....

  3. Someone Else Silver badge
    FAIL

    What? Test it first BEFORE we deploy?!? Shirley, you jest.

    This testing shit costs money which comes right out of our bonuses! We're not doing that!! Say....let's do like Microsoft does, and let our customers test this stuff for us! What? We don't have any "customers"? Of course we do...they're called "passengers", but why mince words. There... perfect!! Time for a Scotch....

  4. Notas Badoff
    Boffin

    Testing methodology

    "Yep, it passes all the tests... it's gold. Updates away..."

    If I'm reading between the lines correctly, the critical difference was between the code 'working' and the code working over time, e.g. a whole shift.

    Feature/function testing vs. ? What is it called when you test the code as it is actually used over a valid duration of time by real people?

    When the users are flight controllers - rather important users them - aren't their usage patterns well enough understood to be the basis of "real world" testing? Or is this the after-thought to now be remedied?
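    The difference between "passes the tests" and "survives a whole shift" can be sketched as a soak test: run the same workload many times and check whether memory keeps growing. This is only an illustration using Python's standard `tracemalloc` module; the function names, iteration counts, and growth threshold are assumptions and have nothing to do with the FAA's or Lockheed's actual tooling.

```python
import tracemalloc


def soak_test(workload, iterations=2000, warmup=200, max_growth_bytes=100_000):
    """Run `workload` repeatedly; return (growth_in_bytes, looks_leaky)."""
    tracemalloc.start()
    try:
        # Warm up first so one-off allocations are not mistaken for a leak.
        for _ in range(warmup):
            workload()
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            workload()
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    growth = current - baseline
    return growth, growth > max_growth_bytes


# Hypothetical workloads, invented for this example.
_retained = []


def leaky_workload():
    # Keeps a reference forever, so memory grows on every call.
    _retained.append(bytearray(1024))


def steady_workload():
    # Allocates and releases; memory stays flat over time.
    buf = bytearray(1024)
    return len(buf)
```

    A single call to either workload looks fine; only the repeated run separates them, which is the commenter's point about duration.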

    1. Alister

      Re: Testing methodology

      "What is it called when you test the code as it is actually used over a valid duration of time by real people?"

      Burn-In?

    2. Anonymous Coward
      Anonymous Coward

      Re: Testing methodology

      "What is it called when you test the code as it is actually used over a valid duration of time by real people?"

      Rare.

    3. swm

      Re: Testing methodology

      Forty-five years ago I was the chief architect of a 100 (eventually 200) user time-sharing system. One of the design goals was to make sure that users couldn't "hack" the system and interfere with other users. Another goal was for the system to be up >99% of the time. Any crash (which rebooted in less than a minute) was counted as 15 minutes of down time. Testing was done during experimental times when we would attempt to crash the system, put on large loads etc. We eventually could log a scheduled uptime of over 99% - this includes failures from all causes such as power failure, operator error etc.
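      To put the 15-minute rule in numbers, here is a small sketch of that accounting. The 15-minute penalty and the 99% target come from the comment above; the function names and the 30-day, round-the-clock schedule in the usage note are assumptions made up for the example.

```python
CRASH_PENALTY_MINUTES = 15  # from the comment: each crash counts as 15 minutes down


def uptime_percent(scheduled_minutes, crashes, other_downtime_minutes=0):
    """Scheduled uptime as a percentage, charging every crash the fixed penalty."""
    downtime = crashes * CRASH_PENALTY_MINUTES + other_downtime_minutes
    downtime = min(downtime, scheduled_minutes)  # cannot be down longer than scheduled
    return 100.0 * (scheduled_minutes - downtime) / scheduled_minutes


def max_crashes_for_target(scheduled_minutes, target_percent=99.0):
    """Largest number of crashes that still meets the uptime target."""
    allowed_downtime = scheduled_minutes * (100.0 - target_percent) / 100.0
    return int(allowed_downtime // CRASH_PENALTY_MINUTES)
```

      Under the assumed schedule of 30 days at 24 hours a day (43,200 minutes), `max_crashes_for_target(43200)` gives 28, so roughly one crash a day is already the whole budget, even though each reboot took under a minute.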

      We also had some users that would attempt to hack the system and they were partially successful a couple of times. Once they installed a trojan that did interesting things when run by a privileged user. They also found a hole in our disk quota system (a bad compare with the maximum integer value).
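      The comment gives no detail on the quota hole, so the following is only one plausible shape for "a bad compare with the maximum integer value": the usage total wraps past the largest signed value before it is compared with the quota. The 16-bit word size and every name here are assumptions for illustration, not a reconstruction of the original system.

```python
WORD_BITS = 16                         # assumed word size
MAX_INT = (1 << (WORD_BITS - 1)) - 1   # 32767 for a 16-bit signed word


def wrap(value):
    """Reduce an integer to the signed two's-complement range of the word."""
    value &= (1 << WORD_BITS) - 1
    return value - (1 << WORD_BITS) if value > MAX_INT else value


def buggy_allows(used, request, quota):
    # The sum can wrap to a negative number, which then passes the check.
    return wrap(used + request) <= quota


def safe_allows(used, request, quota):
    # Compare against the remaining room, so nothing can overflow.
    return 0 <= request <= quota - used
```

      With `used=30000`, `request=5000` and `quota=32000`, the buggy check lets the request through because the sum wraps negative, while the safe check refuses it.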

      Sometimes the system (after a year of pretty solid operation) would crash and we would find a bug that had been in the system for years. Examining the bug we couldn't understand how the system had ever worked - but it did.

      My takeaway from all of this is that it is extremely hard to make a large system completely airtight. Testing cannot be complete. Users change their behavior over time, exposing bugs. Updating a system that is running is even more difficult. I think one important thing is to recover quickly and not lose important data.

  5. 404

    Evidently...

    ... Lockheed installed Windows 10 Home instead of Pro?

    1. Sven Coenye

      Re: Evidently...

      No, that was the "disabling unauthorized hardware" feature at work...

  6. Chairo
    Flame

    A memory leak issue?

    One would think they'd check such critical software for memory leaks before rolling it out.

    Where is the electric fence icon, when you need it?

  7. Uberseehandel

    I can see that there might be a lot of money to be made out of supplying ATC developers with nightmare busiest day scenarios in digital form for the test factories/labs, so overload situations can become part of standard testing.

  8. Mintyboy
    Mushroom

    "Despite the outage, air traffic controllers safely handled 70 to 88 percent of Saturday's scheduled arrivals and departures"

    I'm confused what happened to the 12-30% of Saturday's scheduled arrivals and departures???

    Icon for obvious reasons

    1. eldakka

      I assume the 12-30% were the cancelled flights mentioned in the opening paragraphs.

      But it's still concerning they don't know the EXACT number, was it 70% or was it 88%? Surely "scheduled flights - actual flights" are 2 known, discrete numbers that can produce a known discrete number as a solution rather than a range?

  9. Where not exists

    It all depends

    If your goal is to find bugs, then you will find them. If your goal is to find zero defects, then that is what you will find. Different thought processes. Lockheed is a manufacturer. Manufacturers strive for zero defects. And that is what they found.

  10. Anonymous Coward
    Anonymous Coward

    Future Issues

    It's not necessarily been reported that...

    "In order to prepare for future software update failures, air traffic controllers are now practicing controlling air traffic while wearing blindfolds and earplugs."

    Anon Y. Mus

  11. JustNiz

    Seriously the FAA are running their next generation radar system on Windows? Wow. Whoever made that choice badly deserves to get fired for incompetence.

  12. Anonymous Coward
    Anonymous Coward

    Was it written in Java?

  13. eldakka

    That's a rather convoluted way...

    ...of saying the application has a memory leak.
