Want to bet...
.... that Lockheed-Martin comes out with "a sub-contractor" or "a sub-sub-contractor" did it? Outsourced or not... ?
The Federal Aviation Administration has blamed a software upgrade to its next-generation radar system for the hundreds of delayed and cancelled flights from Washington DC airports on Saturday. "The FAA identified a recent software upgrade at the Leesburg, VA high-altitude radar facility as the source of Saturday's automation …
"Why the picture of Radar O'Rei... oh, NOW I get it!"
You get it. I get it. Lots of other old fogeys get it too.
What age profile do you think recognition of the picture will have?
How about the other culturally iconic team of software engineers, Morecambe and Previn: "I'm encrypting all the right bits. Just not necessarily in the right order".
The young people of today, they ... er how does the rest go?
This testing shit costs money which comes right out of our bonuses! We're not doing that!! Say... let's do like Microsoft does, and let our customers test this stuff for us! What? We don't have any "customers"? Of course we do... they're called "passengers", but why mince words. There... perfect!! Time for a Scotch....
"Yep, it passes all the tests... it's gold. Updates away..."
If I'm reading between the lines correctly, the critical difference was between the code 'working' and the code working over time, e.g. a whole shift.
Feature/function testing vs. ? What is it called when you test the code as it is actually used over a valid duration of time by real people?
When the users are flight controllers - rather important users, them - aren't their usage patterns well enough understood to be the basis of "real world" testing? Or is this the afterthought now to be remedied?
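The usual name for that is soak testing (also called endurance or longevity testing): keep the system under realistic load for a whole shift or longer and watch what degrades. A minimal sketch of the idea, assuming a hypothetical handle_request() workload and an eight-hour window:

#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for whatever the system does per request. */
static int handle_request(long i) { return (i % 1000 == 999) ? -1 : 0; }

int main(void) {
    const double SOAK_SECONDS = 8.0 * 60 * 60;  /* assume one full shift */
    time_t start = time(NULL);
    long requests = 0, failures = 0;

    /* Drive steady traffic until the soak window has elapsed. */
    while (difftime(time(NULL), start) < SOAK_SECONDS) {
        if (handle_request(requests) < 0)
            failures++;
        requests++;
        /* A real soak test would also sample memory use, latency and
           queue depth here - the interesting failures show up hours in. */
    }

    printf("%ld requests, %ld failures over the soak window\n",
           requests, failures);
    return failures ? 1 : 0;
}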
Forty-five years ago I was the chief architect of a 100 (eventually 200) user time-sharing system. One of the design goals was to make sure that users couldn't "hack" the system and interfere with other users. Another goal was for the system to be up >99% of the time. Any crash (which rebooted in less than a minute) was counted as 15 minutes of down time. Testing was done during experimental times when we would attempt to crash the system, put on large loads, etc. We eventually could log a scheduled uptime of over 99% - this included failures from all causes, such as power failure, operator error, etc.
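For what it's worth, that bookkeeping (every crash charged at least 15 minutes against scheduled time) is easy to write down. A rough sketch with made-up outage numbers, assuming a one-month scheduled window:

#include <stdio.h>

/* Rough availability bookkeeping: every crash is charged at least 15
   minutes of downtime, even if the reboot actually took under a minute. */
int main(void) {
    const double scheduled_minutes = 30.0 * 24 * 60;   /* assume one month */
    const double crash_minutes[] = { 0.8, 0.5, 40.0 }; /* made-up outages */
    const int crashes = sizeof crash_minutes / sizeof crash_minutes[0];

    double downtime = 0.0;
    for (int i = 0; i < crashes; i++)
        downtime += crash_minutes[i] < 15.0 ? 15.0 : crash_minutes[i];

    printf("scheduled uptime = %.3f%%\n",
           100.0 * (scheduled_minutes - downtime) / scheduled_minutes);
    return 0;
}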
We also had some users that would attempt to hack the system and they were partially successful a couple of times. Once they installed a trojan that did interesting things when run by a privileged user. They also found a hole in our disk quota system (a bad compare with the maximum integer value).
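That "bad compare with the maximum integer value" has a familiar shape. Purely a guess at what it might have looked like, with hypothetical names: a quota check whose addition wraps around when the limit is set to the maximum value, so the compare passes anyway.

#include <limits.h>
#include <stdio.h>

/* Hypothetical quota check. If quota is UINT_MAX, used + request can wrap
   around, the wrapped sum compares as "within quota", and the user sails
   straight past the limit. */
static int quota_ok_buggy(unsigned used, unsigned request, unsigned quota) {
    return used + request <= quota;        /* sum wraps for large values */
}

/* Safer form: rearrange so nothing can wrap (assumes used <= quota). */
static int quota_ok_fixed(unsigned used, unsigned request, unsigned quota) {
    return request <= quota - used;
}

int main(void) {
    unsigned used = UINT_MAX - 10, request = 100, quota = UINT_MAX;
    printf("buggy says %d, fixed says %d\n",
           quota_ok_buggy(used, request, quota),
           quota_ok_fixed(used, request, quota));
    return 0;
}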
Sometimes the system (after a year of pretty solid operation) would crash and we would find a bug that had been in the system for years. Examining the bug we couldn't understand how the system had ever worked - but it did.
My takeaway from all of this is that it is extremely hard to make a large system completely airtight. Testing cannot be complete. Users change their behavior over time, exposing bugs. Updating a system that is running is even more difficult. I think one important thing is to recover quickly and not lose important data.
I assume the 12-30% were the cancelled flights mentioned in the opening paragraphs.
But it's still concerning that they don't know the EXACT number: was it 70% or was it 88%? Surely "scheduled flights" and "actual flights" are two known, discrete numbers whose difference is a single known number rather than a range?