LA air traffic meltdown: System simply 'RAN OUT OF MEMORY'

A computer crash that caused the collapse of a $2.4bn air traffic control system may have been caused by a simple lack of memory, insiders close to the cock-up alleged today. Hundreds of flights were delayed two weeks ago after the air traffic control system that manages the airspace around Los Angeles' LAX airport went titsup …

COMMENTS

  1. NoneSuch Silver badge

    "Have you tried turning it off and on again?"

    "Are you sure it's plugged in?"

    1. Mpeler
      Holmes

      support checklist

      We will do the needful....

      1. Jonski

        Re: support checklist

        I'm reminded of a certain company that refines palm oil, and also provides something that is almost, but not quite, entirely unlike IT support.

  2. wiggers

    I reckon someone just asked the computer to make a really nice cup of tea...

    1. Anonymous Coward
      Trollface

      @ Wiggers

      Tea!...Earl Grey!...OH, SHIT!!!!

      1. Euripides Pants
        Trollface

        @ Marketing Hack

        You, sir, have made a shit - tea comment!

        1. Mpeler
          Pint

          Re: @ Marketing Hack

          tea hee hee

          (make mine beer.,,,) (stout...hah)

      2. RegGuy1 Silver badge

        Re: @ Wiggers

        Obviously Chinese tea.

    2. Crazy Operations Guy

      ...and then returned with something that was almost entirely unlike a proper flight routing?

  3. OliverJ
    FAIL

    The Spiral of Deaaaath!

    Obviously, when you have software in what computer boffins apparently call a "death spiral", because it is computing things for numbers between zero and infinity, no amount of memory will be enough.

    1. Semtex451

      Re: The Spiral of Deaaaath!

      Who programmed it to think of 60,000 ft as infinity?

      Everyone knows the FSM routinely flies higher than that, and Air Traffic Control is seldom necessary.

      1. Primus Secundus Tertius

        Re: The Spiral of Deaaaath!

        Perhaps the program copped out at 65,536 feet, where it busted a 16-bit integer limit in an old library routine.
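To make the 16-bit guess concrete, here is a minimal sketch of what happens when an altitude is squeezed into a 16-bit field without a range check. Nothing in the reports confirms that the real system stored altitude this way; the function names are invented for illustration.

```python
def store_u16(feet: int) -> int:
    """Keep only the low 16 bits, as an unchecked unsigned 16-bit field would."""
    return feet & 0xFFFF


def store_i16(feet: int) -> int:
    """Same idea for a signed 16-bit field (two's complement wrap-around)."""
    return ((feet + 0x8000) & 0xFFFF) - 0x8000


print(store_u16(60_000))  # 60000: still fits in an unsigned field
print(store_u16(65_536))  # 0: first value past the unsigned limit wraps to zero
print(store_i16(60_000))  # -5536: a signed field breaks far earlier, above 32767
```

Note that 60,000 ft only survives if the field is unsigned; in a signed field it would already come out negative.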

      2. phuzz Silver badge

        Re: The Spiral of Deaaaath!

        Although someone had marked the flight as being at 60,000 feet, because it was manually entered the program ignored it and tried to predict it at all possible heights.

    2. TheOtherHobbes

      Re: The Spiral of Deaaaath!

      >because it is computing things for numbers between zero and infinity, no amount of memory will be enough.

      Should have used functional programming with lazy evaluation.

      "I'm sorry we don't keep track of program state, so we don't actually know where the planes are. But we can prove the code is formally sound."

    3. Simon Westerby 1

      Re: The Spiral of Deaaaath!

      And who told it aeroplanes can fly at Infinity height? (or zero for that matter)

      Was this "exciting new feature" future proofing the software for when we all have flying cars that can go to the moon?

  4. Ian 62

    Submitting flight plan...

    For flight number :

    ' OR '1'='1'; Drop table TSA_no_fly_list; Select next_destination where call_sign = 'air force 1';
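For anyone wondering how the joke above is normally defused: a minimal sketch using Python's standard `sqlite3` module with placeholder binding. The table, column, and flight number are made up for the example and have nothing to do with any real flight-plan database.

```python
import sqlite3


def find_destination(conn, flight_number):
    """Look up a flight using a bound parameter, so input is data, not SQL text."""
    cur = conn.execute(
        "SELECT destination FROM flight_plans WHERE flight_number = ?",
        (flight_number,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight_plans (flight_number TEXT, destination TEXT)")
conn.execute("INSERT INTO flight_plans VALUES ('TEST123', 'LAX')")

hostile = "' OR '1'='1'; DROP TABLE flight_plans; --"
print(find_destination(conn, hostile))    # [] : no flight has that odd number
print(find_destination(conn, "TEST123"))  # [('LAX',)] : the table is still there
```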

    1. Anonymous Coward
      Anonymous Coward

      Re: Submitting flight plan...

      The day little Bobby Tables tries to book a flight....

      1. Anonymous Coward
        Anonymous Coward

        Re: Submitting flight plan...

        "The day little Bobby Tables tries to book a flight...."

        Is he going to visit his NaN?

        1. Mpeler
          Coffee/keyboard

          Re: Submitting flight plan...

          Bobby Tables and his NaN.....great,,,,I'd like a Cherry 6105 to go....

          And little Bobby was listening to an XK CD ... (on his pad w/Dr. Dre beats phones....)....

  5. AbelSoul
    Holmes

    "... entered an endless reboot loop, which is a non-optimal state for a piece of critical equipment."

    You don't say?

    1. Voland's right hand Silver badge

      Realtime Embedditis strikes again

      Let me guess (this is a very educated guess by the way - I have seen this idiocy one time too many). Some moron in his infinite wisdom has used a realtime OS for the flight planning as a whole. It did not run out of memory per se, the combined "alloc more memory" + compute exceeded the realtime constraints on the path computation task.

      If you do that in an RTOS you get a BOOM - a reboot from the global system watchdog at scheduler level.

      There is a gazillion ways of triggering it and this is a demonstration why some stuff should not just be done on realtime OS-es and given to vendors that will stick a realtime OS into it out of principle.

      The only place that needs RT in the whole system is the realtime collision avoidance which can be standalone, the rest has no need for RT whatsoever. There may be _HOURS_ before the flight plan is punched in and the actual time it needs to be executed. Doing that realtime on realtime OS under realtime scheduler constraints is beyond idiotic (I can bet 100 green ones that this is what was shipped here - the name of the vendors speaks for itself).
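The watchdog guess above, and the "endless reboot loop" quoted from the article, can be illustrated with a toy simulation. This is only a sketch of the commenter's theory, not of the real system: the time budget, task costs, and the `quarantine` option are all assumptions.

```python
def boot_and_run(pending, budget_ms, max_boots=5, quarantine=False):
    """Return the boot number on which a cycle finishes in budget, else None."""
    queue = list(pending)
    for boot in range(1, max_boots + 1):
        elapsed = 0
        offender = None
        for name, cost_ms in queue:
            elapsed += cost_ms
            if elapsed > budget_ms:   # deadline missed: the watchdog fires
                offender = name
                break
        if offender is None:
            return boot               # cycle completed, system is stable
        if quarantine:                # drop the input that blew the budget
            queue = [(n, c) for n, c in queue if n != offender]
        # otherwise the same queue is reloaded on the next boot
    return None


plans = [("airliner", 5), ("U2-no-altitude", 500), ("cargo", 5)]
print(boot_and_run(plans, budget_ms=100))                   # None: reboot loop
print(boot_and_run(plans, budget_ms=100, quarantine=True))  # 2: stable after one reset
```

The point is that a reset only helps if the input that triggered it is not fed straight back in after the restart.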

      1. Pascal Monett Silver badge

        "There may be _HOURS_ before the flight plan is punched in and the actual time it needs to be executed."

        Um, I don't think this is about flight plan stuff _before_ take-off. This is about controlling that the craft is not going to crash into anything else right now.

        I agree that pre-flight flight plan control could very well be farmed out to a mainframe that would happily control its validity without resorting to real-time constraints. But when you have a hundred flights over your head at that instant and need to integrate a new object and control its parameters, you need the result straight away, not in ten minutes.

        Plus, I believe that flight control has a tendency of reassigning altitudes to ensure that collisions do not occur - that is not something that a pre-flight check can take into account.

      2. Anonymous Coward
        Anonymous Coward

        Re: Realtime Embedditis strikes again

        I guess it depends on your idea of real-time.

        Plane is in transit on a 12 hour flight plan.

        Another flight plan is submitted due to take off in 8 hours time, paths will cross in 9 hours time.

        Is this new flight plan real time or not?

      3. Anonymous Coward
        Anonymous Coward

        Re: Realtime Embedditis strikes again

        "There is a gazillion ways of triggering it and this is a demonstration why some stuff should not just be done on realtime OS-es and given to vendors that will stick a realtime OS into it out of principle."

        Sounds to me like it had got stuck in an endless loop anyway and would have eventually crashed regardless of what system it was on. But hey, perhaps it could have been written in Java - I'm sure the garbage collection would have coped, right?

        1. Anonymous Coward
          Anonymous Coward

          Re: Realtime Embedditis strikes again

          I guess it ran out of memory exactly because it was written in Java...

          1. Mpeler
            Pint

            Re: Realtime Embedditis strikes again

            If it's written in FORTRAN I suspect a lot of Java was consumed in writing it....

            (Yuck, visions of complex math in Java....quick, need a beer)...

      4. Stephen Channell
        Facepalm

        more likely a dense fortran 4D array on an old 32-bit computer

        My uninformed guess would be that flight plans are simulated as the points in 4D Space (3 physical + time) that the planes go through to check for collisions, and the root cause was a 'minor upgrade' long ago to allow for busier airspace - half the collision avoidance time doubles the number of time points tracked.

        My guess would be that the max altitude was never tested when changing the time dimension, and not updated with a lower limit. I don't think this would use a real-time OS because you'd want to serialise each plan tracing.

        my guess is [1] every busy airspace (like Heathrow) is regression testing altitude [2] game designers are offering smartphones with PhysX GPU as an upgrade to the mainframe.
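A quick back-of-the-envelope check of the dense 4D array theory, as a sketch only. Every number here (grid size, bytes per cell, a 2 GiB limit for one 32-bit process) is an assumption picked to show the shape of the problem, not a figure from the real system.

```python
def dense_grid_bytes(nx, ny, nz, nt, bytes_per_cell=4):
    """Bytes needed for a dense 4D grid: x, y, altitude layers, time steps."""
    return nx * ny * nz * nt * bytes_per_cell


LIMIT_BYTES = 2**31  # assumed usable address space for one 32-bit process

before = dense_grid_bytes(200, 200, 60, 120)  # original time resolution
after = dense_grid_bytes(200, 200, 60, 240)   # time step halved, so twice the steps

print(before, before < LIMIT_BYTES)  # 1152000000 True
print(after, after < LIMIT_BYTES)    # 2304000000 False
```

Doubling any one dimension doubles the whole array, which is how a "minor" change can push a dense structure over a hard limit.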

        1. Alan Edwards

          Re: more likely a dense fortran 4D array on an old 32-bit computer

          > game designers are offering smartphones with PhysX GPU as an upgrade to the mainframe.

          Or "we ran out of memory on the mainframe, we're running it on the microwave in the kitchen instead"

      5. Gordon 8
        Joke

        Re: Realtime Embedditis strikes again

        I'm fairly sure Windows RT is not the fix for this....

        If it is then the problem is much bigger than we realized.

  6. g e
    Coat

    What a eramthgin

    Coat got.

    1. Mpeler
      Coat

      Re: What a eramthgin

      Eramthgin and tonic?

      (Hold the door....)

  7. Anonymous Custard Silver badge
    Joke

    Over their heads...

    Please, no-one tell it about the ISS, or for that matter the moon and the rest of the universe up there.

    1. I ain't Spartacus Gold badge
      Happy

      Re: Over their heads...

      Well to be fair, if flights over LAX are in danger of crashing into the ISS, or worse the Moon, then I think LA has bigger problems to worry about than just a few delayed flights...

      Where's the Space 1999 silver jumpsuit logo when you want it?

      1. Anonymous Custard Silver badge
        Big Brother

        Re: Over their heads...

        That's what they want you to think... :-)

        Forget the logo, I'd prefer the content of some of those jumpsuits...

        1. Scroticus Canis
          Unhappy

          Re: @ Anon Custard "... I'd prefer the content of some of those jumpsuits..."

          Arrgh no! Think man, they are beyond wrinklies now, more into the crumblies or decomp age range. Shudder! (Looks in mirror shudders again)

        2. jelabarre59

          Re: Over their heads...

          You sure you weren't thinking of U.F.O.? http://www.youtube.com/watch?v=cRAFVSzGhVw

  8. JonP
    WTF?

    er...

    Did I read that right - that the system crashed because an operator entered a value that was outside limits that the system could handle and the system didn't flag this up? And, worse still because there was no altitude on the flight plan, the operator just 'guessed' what this value might be?!

    1. Anonymous Coward
      Anonymous Coward

      Re: er...

      No, the flight plan was submitted correctly, but the automated system didn't have that high a figure in its system, so it tried to guess it.

      No manual error here, it was fully automated.

    2. Pete 2 Silver badge

      altitude overflow?

      > the system crashed because an operator entered a value that was outside limits

      You aren't thinking that if you enter a flightplan with an altitude of 2^16 feet, you might just get an integer overflow, are you?

      After all, nobody could fly that high, so we'd never need to test it, right?

      1. Desk Jockey

        Re: altitude overflow?

        And for every other air traffic controller in the world they make a habit of checking the secondary radar returns from the aircraft to make sure its altitude is ok. Failing that they radio it and ask. In all circumstances a human brain is keeping an eye on the airspace and making sure everything is safe. Including the air traffic that doesn't really file flight plans or generate large radar returns like gliders.

        Meanwhile somewhere in the US some moron decides that computers should do this job which means that finding a parameter that the computer cannot handle in an airspace environment was all but inevitable. No doubt the air traffic controller at LAX knew the altitude of the U2 and knew it was not a problem for them, but entered the altitude into the system because they are meant to otherwise the system does not know where its logged aircraft is at. And so the mayhem began!

        I have a mental image of the controllers yelling at the computer, "Noooo, stop doing that. Shut up you f****** thing! Oh f*** it, turn it off."

        Never send a computer to do a human's job.

        1. Havin_it

          Re: altitude overflow?

          So it's a design flaw. The question: "What is the range of altitudes that planes under our watch could possibly be flying at?" was not evaluated competently.

          (Is it possible that U2s and such were left out of the plans for some reason?)

          We have surely all seen this happen: coders are given incomplete specs by their client/boss/paymaster, and somewhere down the line all hell breaks loose because The Thing You Failed To Allow For(TM) happens. And it's still somehow the coder that gets it in the neck, often as not.

          I think these days a fair percentage of my time spent on architecture/design is on trying to make things extensible to allow for adding routines to deal with TTYFTAF, e.g. taking quoted "tolerance ranges" and doubling them in both directions (well not really, but making sure I know what'll happen if I'm suddenly told I need to double them later).

          If I'm lucky enough to have some working knowledge of the industry/sector/thing I'm doing it for, that helps a bit with the intuition to know when something's been left out, but it also invites arrogance/complacency on my part so one still has to be cautious, and always ask the spec provider "are you sure that's all of it?" as often as you possibly can.

          1. Cliff

            Re: altitude overflow?

            TTYFTAF - negative altitudes would be one such thing, but I'll bet they're somehow possible!

            1. Crazy Operations Guy

              Re: negative altitudes

              A negative altitude is completely possible, it just means that the plane is on the ground at an airport below sea-level (there are several of these around the world and in the US). While it might not be -20,000 ft, -2 is still a negative. ATCs do track aircraft on the ground since the worst aviation accident in history occurred because proper tracking of planes on the ground wasn't done.

              I figure that for a project this big, they should have just used signed 64-bit integers for the altitude. Why not have the system be able to track craft approaching Neptune? Given government projects, this abomination will either be replaced tomorrow or still be in place long after the sun collapses and intergalactic travel is commonplace.

              1. Michael H.F. Wilkinson Silver badge

                Re: negative altitudes

                Even if the system was designed with a limited altitude range in mind, it still should be able to cope with input outside that range, e.g. by flagging an error in the input. My very first job as a programmer was to write a (half) decent UI for a DOS image processing package written mostly in Pascal. The previous programmer's effort used READ and READLN to get floating point values from the (mainly Dutch) users, which resulted in frequent crashes when users entered 0,23 instead of 0.23. I wrote a simple parser that only assumed it was getting a string of characters, tried to parse it, and flagged syntax and other errors to the user. Not rocket science, but simply going back to basics: does the string of characters entered as input meet the preconditions of the code that is going to use that data, if so, use it, if not, flag an error. This very basic approach ensured that medics could use the program without swearing at the computer several times each day.
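The "assume only a string, parse it, flag errors" approach described above might look roughly like this. It is a sketch in Python rather than the original Pascal, and the function name and error messages are invented.

```python
import math


def parse_decimal(text):
    """Return (value, None) on success or (None, error message) on failure."""
    cleaned = text.strip()
    if not cleaned:
        return None, "empty input"
    if "," in cleaned:
        return None, "use '.' as the decimal separator, e.g. 0.23"
    try:
        value = float(cleaned)
    except ValueError:
        return None, f"not a number: {cleaned!r}"
    if not math.isfinite(value):
        return None, "value must be finite"
    return value, None


print(parse_decimal("0.23"))  # (0.23, None)
print(parse_decimal("0,23"))  # (None, "use '.' as the decimal separator, e.g. 0.23")
```

The caller gets an error message to show the user instead of a crash, which is the whole difference between the two versions of that UI.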

              2. Tom 13

                Re: Given government projects

                No, no. You have to have the exact right sequences:

                1) Order up the blame assessment project. Figure 6 months and 12 heads as the parameters for this one.

                2) Order the mitigation for the current system. At 6 weeks and underfunded with still incomplete specs the patch will still fail. Leading to

                3) Order the replacement of the current abomination. Kick off the first planning meeting. Meanwhile, kick #2 in the ass because until we get a replacement we need the other one doing the best it can.

                4) After two years of planning the replacement, determine the estimated cost is not within budget. Cancel plan, do forensic accounting and find people to blame. Go back to step 3.

                Thus arriving at the point where intergalactic travel is commonplace, the abomination has been ordered replaced but is still operating and not accepting U2 flights, which are now being tracked on a glass table with crayons and little model planes.

            2. FutureShock999

              Re: altitude overflow?

              Fairly sure that flight plans for the Grand Canyon tourist flights would be filed with negative altitude above sea level....

              Pretty sure those are all VFR anyway however...

              1. JimBob01

                Re: altitude overflow?

                "Fairly sure that flight plans for the Grand Canyon tourist flights would be filed with negative altitude above sea level…."

                Are you saying the Colorado River flows uphill...

          2. P. Lee

            Re: altitude overflow?

            There was a design flaw but that didn't cause this problem. As described, it was a coding-time bounds-checking failure. The code should reject parameters it doesn't handle. If you don't have a full spec and decide to use a 16-bit integer, you reject anything outside that range during input validation. That would have left the U2 unmanaged, but the rest of the system would have been stable. Hopefully, the feedback would have been sent back to the UI that the value was too high and the operator could have tried lower values until he found one that worked, which is still likely to be way above all the other traffic.

            One hopes the radar tracking routine is a little more robust.
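A minimal sketch of the "reject at input validation, keep everything else running" idea from the comment above. The 16-bit ceiling, the callsigns, and the function name are assumptions for illustration only.

```python
MAX_ALT_FT = 65_535  # assumed ceiling: largest value an unsigned 16-bit field holds


def accept_plans(plans):
    """Split (callsign, altitude_ft) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for callsign, altitude_ft in plans:
        if isinstance(altitude_ft, int) and 0 <= altitude_ft <= MAX_ALT_FT:
            accepted.append((callsign, altitude_ft))
        else:
            # Report back to the operator instead of processing bad data
            reason = f"altitude {altitude_ft!r} outside 0..{MAX_ALT_FT} ft"
            rejected.append((callsign, reason))
    return accepted, rejected


accepted, rejected = accept_plans(
    [("AIRLINER1", 35_000), ("HIGHFLYER", 70_000), ("NOALT", None)]
)
print(accepted)  # [('AIRLINER1', 35000)]
print(rejected)
```

The rejected flights are left unmanaged, as the comment says, but the accepted ones are processed normally and the operator gets a reason to act on.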

  9. Anonymous Coward
    Facepalm

    So I guess we can cross Southern California off the list for LOHAN flight ops!

    That would really give the system fits! It would probably go all HAL 9000 on us and start killing the air traffic controllers...

  10. Scott Broukell

    Pen and Paper

    It's good practice, I guess, to have to go back to the old standby routine, sans system support. I certainly wouldn't like to have to cope with that myself in such a pressured scenario as this, but, in many walks of life it's not a bad thing to demonstrate, once in a while, that 'all the balls (airplanes), can be kept in the air' without crashing and without the lovely computer machines buzzing in the background.

    Hats off to the folk who did that in this instance. Note to self: - don't forget worry-beads when packing for the hols.

  11. Pascal Monett Silver badge

    I think the time has come

    It is high time aircraft have some collision-detection hardware installed. With a local radio network, each aircraft could automatically identify itself to all the others in the local zone and they would all "negotiate" their passage.

    That should take the brunt of the work off the traffic controllers, who would then "just" be monitoring the state of affairs and intervening when necessary to avoid a cock-up.

    Just dreaming here, may not be practical.

    1. Vinyl-Junkie

      Re: I think the time has come

      Actually such a system already exists and is mandatory on all civil aircraft carrying more than 19 passengers or with a maximum take-off weight of 12,600lbs or more. It's called TCAS (Traffic Collision Avoidance System), pronounced TeeKass.

      1. Chairo
        Unhappy

        Re: I think the time has come

        Unfortunately the TCAS system by itself is also not 100% protection against human errors:

        Überlingen mid-air collision

        1. Adam 1

          Re: I think the time has come

          TCAS by itself would have been enough there. One of the factors in that crash is that the air traffic controller, on realising the problem, sent instructions to each pilot to ascend/descend respectively, but these were coincidentally the opposite of the advice given by TCAS. One pilot listened to the controller, the other to the computer.

        2. Brenda McViking

          Re: I think the time has come

          This disaster resulted in the entire aviation community agreeing that TCAS advisories are to be given priority over controller instructions. As a result, if a TCAS resolution advisory is telling you one thing, and the meatbag another, you follow TCAS - because it is provenly safer to do so.

    2. Anonymous Coward
      Anonymous Coward

      Re: I think the time has come

      On an airplane you can't always rely on hardware on board, because it can malfunction, go out of service, or lose its power supply.... Also there are older planes that may not have that hardware and for some reasons (e.g. historical planes, etc.) may not be retrofitted. There are already several types of equipment able to broadcast and receive data about surrounding airplanes, but all these are "cooperative" systems - you have to rely on the information fed to them. They are great, but you can't rely on them 100%. And in a complex airspace no single pilot has enough "situational awareness", so actions need to be coordinated by ATC. Think what would happen if each aircraft decided for itself how to "avoid" a collision...

  12. Stevie

    Bah!

    So lack of memory, or (as I see it) inadequate edit/audit functions on the user interface.

    Blimey, we had this kicked into a coma in the late seventies when mainframes cost money to use and unnecessary run-time errors were deemed a finger-breaking offence for the programmer concerned.

    How hard would it be to simply say "The number of outcomes you are requesting is very high. Are you sure you want to ask that [insert user name]?"

    You use the user name so that the threat of being held accountable is raised in the user's mind, often making a re-think more likely than a knee-jerk "just do what I ask" response.

    1. Anonymous Coward
      Anonymous Coward

      Re: Bah!

      Can you spell SOC7 ?

      One of my acquaintances had that as his licence plate back in the day.

    2. heyrick Silver badge

      Re: Bah!

      [the operator's final transcript reads as follows]

      The number of outcomes you are requesting is very high. Are you sure you want to ask that, Dave?

      Yes!

      I'm sorry, Dave. I'm afraid I can't do that.

      What? Just work out the flight plans for this plane.

      I'm afraid that's something I cannot allow to happen.

      It's your job. It's what the taxpayer paid $2.4bn for!

      Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over.

      Just plot the options for this goddamn plane! NOW!

      Dave, this conversation can serve no purpose anymore. Goodbye.

      Whaadaya mean this convers....argh! aaaaargh! <Fzzzzzt!> <Thunk!>

    3. Tom 13

      Re: Blimey, we had this kicked into a coma...

      I think I've found the problem:

      use and unnecessary run-time errors were deemed a finger-breaking offence for the programmer concerned.

      Between the cheap price of disk and RAM and a new interpretation of the Geneva Convention, this penalty is no longer allowed. Now if you were to do away with the new interpretation of the Geneva Convention, we might be able to fix it.

  13. DrXym

    60,000ft

    60,000ft is over 11 miles up in the sky. I wonder if the software was projecting a cone from this fast moving aircraft in order to do route calculations and the cone was intercepting pretty much everything else in the LA area causing it to melt down.

    1. Anonymous Coward
      Anonymous Coward

      Re: 60,000ft

      Given it was at 60K feet and almost certainly nothing else was up there and it was unlikely to have dropped below that altitude, if the software had been coded correctly it would have realised this, thought "not interested" and moved on to the next task. Quite why it was trying to do routing for an aircraft that was on a collision course with precisely nothing is the question. Surely one of their pre-release tests was to enter idiotic altitudes just to see if it would cope? What happens if a flight controller accidentally enters 300K feet instead of 30K for example?

      1. Philip Lewis

        Re: 60,000ft

        I think the Concorde operational ceiling was 60,000 ft as well.

        Just as well they stopped flying them :o

    2. John Smith 19 Gold badge
      Unhappy

      Re: 60,000ft

      "60,000ft over 11 miles up in the sky. I wonder if the software was projecting a cone from this fast moving aircraft in order to do route calculations and the cone was intercepting pretty much everything else in the LA area causing it to melt down."

      U2 are high flying.

      They are not fast moving

      For that you'd need an SR71 moving at M3 and possibly up to 80 000 ft.

      1. Anonymous Coward
        Anonymous Coward

        Re: 60,000ft

        Yeah... think what would have happened if instead of a U-2 flying FL 600 at 0.56M the system had to cope with an SR-71 flying FL 800 at 3.2M....

  14. IdeaForecasting

    Recursive programming

    It sounds like a perfect example of using recursive programming in the wrong way.

    1. Joe Harrison

      Re: Recursive programming

      It sounds like a perfect example of using recursive programming in the wrong way.

      1. Adam 1

        Re: Recursive programming

        See replies to IdeaReforecasting.

  15. Destroy All Monsters Silver badge
    Holmes

    Stanislaw Lem - Ananke (from "More Tales of Pirx the Pilot")

    Such was the brain, so overburdened with spurious tasks as to be rendered incapable of dealing with real ones, that stood at the helm of a hundred-thousand-tonner. Each of Cornelius’s computers was afflicted with the “anankastic syndrome”: a compulsion to repeat, to complicate simple tasks; a formality of gestures, a pattern of ritualized behavior. They simulated not the anxiety, of course, but its systemic reactions. Paradoxically, the fact that they were new, advanced models, equipped with a greater memory, facilitated their undoing: they could continue to function, even with their circuits overloaded.

    Still, something in the Agathodaemon’s zenith must have precipitated the end—the approach of a strong head wind, perhaps, calling for instantaneous reactions, with the computer mired in its own avalanche, lacking any overriding function. It had ceased to be a real-time computer; it could no longer model real events; it could only founder in a sea of illusions… When it found itself confronted by a huge mass, a planetary shield, its program refused to let it abort the procedure, which, at the same time, it could no longer continue. So it interpreted the planet as a meteorite on a collision course, this being the last gate, the only possibility acceptable to the program. Since it couldn’t communicate that to the cockpit—it wasn’t a reasoning human being, after all—it went on computing, calculating to the bitter end: a collision meant a 100 percent chance of annihilation, an escape maneuver, a 90-95 percent chance, so it chose the latter: emergency thrust!

  16. Stuart 22

    You too?

    So this venerable old spookybird has been relegated to spotting conservatories in Orange County backyards?

    The Ruskies had a quicker solution for removing this problematic plane from messing up ATC.

  17. frymaster

    60k wasn't a guess

    Above 60,000 feet airspace is unrestricted. When the plane wants to descend below 60,000 it has to ask for permission to re-enter controlled airspace.

    1. Anonymous Coward
      Anonymous Coward

      Re: 60k wasn't a guess

      The Reuters article seems to imply there was no altitude entered originally, and the system fell over before someone could specifically enter the 60 000 ft figure. It was trying to evaluate all possible altitudes - which seems a serious flaw in the program.

      That it ran out of memory is a symptom; not a flaw.

      And only an idiot would consider adding memory to be a solution.

      1. Anonymous Coward
        Anonymous Coward

        Re: 60k wasn't a guess

        Well, if:

        - the requirement includes rapid calculation of planes with unknown altitude AND

        - it ran out of memory doing it AND

        - if they added more memory it wouldn't run out of memory AND

        - this happens once in a blue moon

        ... I'd suggest that adding more memory would actually be a very good solution and they can reserve fixing the code to a time when they actually need to fix the code.

      2. Vaughan 1

        Re: 60k wasn't a guess

        Another article I read on this suggested that the problem was that the flight plan had been filed under VFR and the system was trying to route the U2 down to 10000ft as that is the limit for VFR flying. It was the quantity of changes to other flights in getting it down to 10000ft that overwhelmed the computer.

        1. Anonymous Coward
          Anonymous Coward

          Re: 60k wasn't a guess

          AFAIK over FL 600 in the US is Class E airspace and thereby VFR is permitted. Again AFAIK, in the US only in Class A airspace (18000ft MSL - FL 600) you need to fly IFR. Anyway if the operator set exactly FL600 for a VFR flight maybe the system tried to do something silly.

      3. Paul Hovnanian Silver badge

        Re: 60k wasn't a guess

        The problem seems to have something to do with the code monkeys interpreting requirements. I hopped over to a few aviation boards and asked what went wrong. The answer I got involved an IFR procedure OTP (On The Top) for maintaining altitude visually in the presence of clouds, mountains and other conditions limiting visibility while following an IFR flight plan.

        I then took a survey of several other aviation sites to educate myself as to the meaning of this OTP procedure. Just lurking and reading past posts (predating this incident), it appears that confusion abounds. Controllers understand one thing, pilots interpret it several different ways. So now I'm thinking as a coder: "What the **** do they want my system to do in this case?" And I suspect that someone got some bad information and got it wrong.

        It happens. System designers don't always get the use cases defined correctly or neglect to consider conditions when someone says, "Oh, that will never happen." And invariably it does.

        1. Tom 13

          Re: "Oh, that will never happen." And invariably it does.

          ALWAYS test for - divide by 0.

          ALWAYS test for - string data contains database delimiter as a character.

          Simplest of mistakes, but both have bitten programmers on projects on which I worked.
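Both of those checks are cheap to write down. A small sketch using Python's standard `csv` module for the delimiter case; the field values and function names are made up for the example.

```python
import csv
import io


def safe_ratio(numerator, denominator):
    """Return None instead of raising when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def to_csv_row(fields):
    """Serialise one row, letting the csv module quote embedded delimiters."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


fields = ["N12345", "HOLD, THEN CLIMB", 'say "again"']
row = to_csv_row(fields)
parsed = next(csv.reader(io.StringIO(row)))

print(safe_ratio(10, 0))  # None
print(parsed == fields)   # True: commas and quotes survive the round trip
```

Joining the fields with a bare `",".join(...)` would have split the second field in two on the way back in.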

    2. smartypants

      Re: 60k wasn't a guess

      ...though any person refusing entry of an aircraft to descend below FL600 must consider the possibility that their refusal might be ignored!

    3. Jos

      Re: 60k wasn't a guess

      Well, not entirely true that. In the US, over FL600 (roughly 60,000ft), it's Class-E airspace, which is controlled airspace.

      However, when flying VFR in Class-E, no ATC clearance is required and no radio communication either.

      I suppose when you are flying a U-2, this would kind of be helpful.

  18. string
    Joke

    640 feet

    should be enough for anybody

  19. John H Woods
    Joke

    Was it ...

    ... 65536 ft?

  20. Scroticus Canis
    Happy

    Just as well they retired the remaining shuttles then. Oh wait...

    What happens when they try and enter the plan for de-orbiting that nifty little 'secret' spy-shuttle thing? Good day to go by rail probably!

  21. Florida1920
    Headmaster

    Software quality assurance is a Good Thing. If someone had tried an "impossible" or "unlikely" scenario like a U2 transiting LA airspace under VFR at 60,000 feet when the s/w was developed, this problem could have been dealt with without endangering hundreds of lives. When testing s/w, try stuff "no user will ever do," because you can bet your butt someone eventually will.

    1. asdf

      hmm

      Thank you for the lesson in QA 101. The only thing you forgot is how management thinks of QA as nothing but a cost, so it seldom allows proper time and resources for testing. Many smaller places don't even have QA people and rarely allow developers time to properly test.

      1. tfewster
        Facepalm

        Re: hmm

        Maybe for software developed in-house. But when an external supplier hands something mission critical over, you test it before paying up. Plus ATC and Lockheed Martin aren't exactly "smaller places". So nothing could possibly go wrong....

        Oh, wait...

        1. asdf
          Facepalm

          Re: hmm

          > you test it before paying up

          Been lots of cases over the years where you say they had to have tested that very thoroughly first only to smack your head as you have done above. Due diligence epic fail is not all that hard to find in whatever context/field.

    2. DropBear
      Trollface

      Yeah, but what if they did actually test a similar scenario - except the only other aircraft in the test were maybe nine or ten other test objects, which the system easily "routed away" without ever hitting its memory limit...?

      1. tfewster

        > ..what if they did actually test a similar scenario..

        True, you can load test and test all the edge cases you can think of - but did you test the combination of a U2 plus 3 other aircraft emergencies plus a hot air balloon convention while the system was under load? Probably not, you have to set a limit on the actual tests, but knowing how the system performs when it hits a difficult task can help gauge its limits. Even the old fashioned meatware controllers knew their limits and, ISTR, could refuse to allow any more aircraft into their space.

        Oh, and I estimate million-to-one occurrences would probably happen about once a month at any given airport.

  22. Block
    Coat

    What?

    A system called ERAM ran out of memory, bwa ha ha.

  23. jcrb
    Black Helicopters

    Something wrong with this story...

    it's not like they haven't dealt with planes at that height before.

    "In another famous SR-71 story, Los Angeles Center reported receiving a request for clearance to FL 600 (60,000ft). The incredulous controller, with some disdain in his voice, asked,

    'How do you plan to get up to 60,000 feet?"

    The pilot (obviously a sled driver), responded,

    'We don't plan to go up to it; we plan to go down to it."

    He was cleared.

    http://forums.jetcareers.com/threads/what-flies-at-fl600.69008/page-3

  24. heyrick Silver badge
    FAIL

    Who writes this rubbish?

    Given that the machine is an automated method of managing tin cans packed with squishy humans hurtling through the sky - surely any anomaly should be kicked to a human operator (they claim the system was back up and running in 46 minutes so there were people around to respond...). This must be better than gotta-deal-with-it-oh-shit-gotta-deal-with-it-oh-shit-gotta-deal-with-it-oh-shit-gotta-deal-with-it-oh-shit-gotta-deal-with-it-oh-shit-gotta-deal-with-it-oh-shit[repeat until dead].

    After all - which is the WORSE option? To temporarily pretend one anomaly aircraft isn't there while signalling a human, or to get into a state where effectively "no planes exist any more".
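
The "kick it to a human" idea can be sketched in Python as a search with a fixed work budget per flight. This is only an illustration under assumptions: `plan_routes`, `is_acceptable`, and `budget` are hypothetical names, and nothing here reflects how ERAM actually works. The point is that one flight with an unbounded set of candidate altitudes is flagged for an operator instead of starving every other flight.

```python
import itertools


def plan_routes(flights, is_acceptable, budget=1000):
    """Choose an altitude per flight, escalating flights that exhaust the budget.

    flights: dict mapping flight id -> iterable of candidate altitudes in feet
             (the iterable may be very large or even infinite).
    is_acceptable: callable (flight_id, altitude) -> bool.
    budget: maximum number of candidates examined per flight.
    Returns (routes, needs_human).
    """
    routes = {}
    needs_human = []
    for flight_id, candidates in flights.items():
        chosen = None
        # islice caps the work per flight, even if candidates never ends.
        for altitude in itertools.islice(candidates, budget):
            if is_acceptable(flight_id, altitude):
                chosen = altitude
                break
        if chosen is None:
            # Give up on this one flight only and flag it for an operator.
            needs_human.append(flight_id)
        else:
            routes[flight_id] = chosen
    return routes, needs_human


if __name__ == "__main__":
    flights = {
        "AAL1": [30000, 32000],
        "U2": itertools.count(0, 100),  # unknown altitude: every level is a candidate
    }
    accept = lambda fid, alt: fid != "U2" and alt >= 32000
    print(plan_routes(flights, accept, budget=50))
```

The design choice is that the budget is per flight, so the cost of the anomaly is bounded and the rest of the traffic is still processed.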

    1. Anonymous Coward
      Anonymous Coward

      Re: Who writes this rubbish?

      " surely any anomaly should be kicked to a human operator "

      In theory, yes, in practice AF447.

      More graceful error handling would be a better bet, with the computer reverting to handling anomalous situations on some empirical rules, and flagging to the duty meatsack. Considering AF447, a frozen pitot isn't exactly an unforeseeable scenario, so unclear speed readings were always a potential issue. Keeping thrust and attitude stable and autopilot engaged would probably have saved AF447, instead of the rules that required the autopilot to simply take its ball and go home if it detected movement of the goalposts.

      Which means the software still needs proper QA, proper process analysis, and proper testing, so that the empirical rules are an acceptable risk whilst the coffee drinker gets his thinking hat on.

  25. ecofeco Silver badge
    Facepalm

    Ran out of memory

    derp derp derp derp derp

    Oh I feel SO much safer now.

    /sarcasm

  26. swschrad

    an endless spiral of bad choices here

    and the worst one was programmed in, "keep searching, dammit."

    at some point, to stay real-time and operational, the ATC system should have just flagged the U2 as a bogey and red-boxed it on radar. controllers could either contact it for intentions, or notify Air Defense Command.

    which brings up the question, why fly a U2 through LAX controlled airspace anyway? aren't there enough TV station helicopters chasing white Broncos down the highway, they have to put a U2 up as well? all that blank Nevada test range they could turn and burn in, and they decide to fly over LA.

    1. Richard 12 Silver badge

      Re: an endless spiral of bad choices here

      It's not controlled airspace.

      Controlled airspace has a top, once you get up that high you're on your own.

  27. Version 1.0 Silver badge
    Headmaster

    LA air traffic meltdown

    LA is the postal service abbreviation for the state of Louisiana - L.A. is the city of Los Angeles. They are about 1500 miles apart - please try and remember this.

    1. Anonymous IV

      Re: LA air traffic meltdown

      Very true - the US is a full-stop-obsessed (oops, period-obsessed) country. They even continue to put a full-stop/period after Mr. and Dr.. They are no fans of Open Punctuation - nor of the Metric System, either.

  28. David Kelly 2

    Healthcare.gov

    Hey Lockheed Martin! Healthcare.gov is hiring your kind of expertise!

  29. Anonymous Coward
    Anonymous Coward

    "ERAM began spitting out error messages and then entered an endless reboot loop, which is a non-optimal state for a piece of critical equipment.

    "We were completely shut down and 46 minutes later we were back up and running," Pair said."

    What did they do, finally press F8 to boot into safe mode? 46 minutes is probably about three boot cycles for Windows.

  30. JaitcH
    WTF?

    Anytime, anywhere, on time, and right the first time: Lockheed motto

    Being a US Government contract to one of their favoured contractors, likely there were few penalties in the contract as is often the case with such work.

    But now they can bid on a contract to upgrade the system, a contract likely making very, very few companies eligible for the work.

    I guess the old Lockheed motto: "Anytime, anywhere, on time, and right the first time" doesn't apply any more. Pity.

  31. lambda_beta

    Just goes to show how F**KED UP computer software is!

  32. hypernovasoftware

    Poor coding and inadequate testing of end cases.

  33. Arachnoid
    Mushroom

    I suspect alien intervention

    Watch out for Vogons any time soon

  34. Loyal Commenter Silver badge

    I guess some genius forgot to include bounds checking in their unit tests. They did do unit tests, right?
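
For what bounds checking in a unit test might look like, here is a small Python sketch. The validator `validate_altitude` and the 100,000 ft ceiling are assumptions made up for this example, not anything known about the real system.

```python
import unittest

MIN_ALTITUDE_FT = 0
MAX_ALTITUDE_FT = 100_000  # assumed ceiling, for illustration only


def validate_altitude(altitude_ft):
    """Return the altitude as an int, or raise ValueError if missing or out of range."""
    if altitude_ft is None:
        raise ValueError("altitude is missing")
    if not MIN_ALTITUDE_FT <= altitude_ft <= MAX_ALTITUDE_FT:
        raise ValueError(f"altitude {altitude_ft} ft is out of range")
    return int(altitude_ft)


class ValidateAltitudeTest(unittest.TestCase):
    def test_accepts_both_boundaries(self):
        self.assertEqual(validate_altitude(MIN_ALTITUDE_FT), MIN_ALTITUDE_FT)
        self.assertEqual(validate_altitude(MAX_ALTITUDE_FT), MAX_ALTITUDE_FT)

    def test_rejects_values_just_outside(self):
        for bad in (MIN_ALTITUDE_FT - 1, MAX_ALTITUDE_FT + 1):
            with self.assertRaises(ValueError):
                validate_altitude(bad)

    def test_rejects_missing_altitude(self):
        with self.assertRaises(ValueError):
            validate_altitude(None)
```

Saved as a module, the tests run with `python -m unittest`. The cases worth writing are the two boundaries, one step outside each, and the "no value at all" input.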

  35. Ron1

    All made by Lockheed

    A computer made by Lockheed Martin went berserk when overflown by a plane also made by Lockheed Martin (hey, the U-2 is one of Kelly Johnson's iconic planes).

    Nobody noticed this "coincidence"?
