Alaska Airlines grounded by mystery IT meltdown

Timing is everything – except when it isn't. US carrier Alaska Airlines has grounded its fleet once again due to a mystery IT issue. The problem began at 3:30 pm Pacific Time on October 23 with a "failure" at the company's primary datacenter. Alaska Airlines insisted it wasn't a cybersecurity event or related to "any other …

  1. Anonymous Coward

    too soon?

    "You're only supposed to blow the bloody doors off"

    1. Evil Auditor Silver badge

      Re: too soon?

      Never too soon.

      It's Friday. So my first thought was: «don't you ever piss off the BOfH!»

    2. The Oncoming Scorn Silver badge

      Re: too soon?

      Boeing.

      "Bloody Doors are NOT supposed to blow out!"

      1. Evil Auditor Silver badge
        1. Martin-73

          Re: too soon?

          I knew what that was before clicking. There's nothing out there. Except sea, fish, birds, 20,000 tons of crude oil, and a fire. And the part of the ship the front fell off

  2. Ochib

    mystery IT meltdown

    It's never DNS unless it's DNS

    1. Jimjam3

      Re: mystery IT meltdown

      It’s not Oracle, is it?

  3. DarkwavePunk Silver badge

    Jamaica?

    Alaska.

    1. Paul Herber Silver badge

      Re: Jamaica?

      Used to be part of Russia. He wants it back.

    2. Ken Shabby Silver badge

      Re: Jamaica?

      No, she went of her own accord.

  4. Vincent Manis

    Contrary to their name, Alaska Airlines (which I fly on quite regularly) is based in Seattle, Washington.

    1. Yet Another Anonymous coward Silver badge

      But Alaska is just off the coast of Seattle, in a little box of its own

      1. Vincent Manis

        British Columbia, where I live, is in the way.

  5. Yet Another Anonymous coward Silver badge

    Fallback

    Given the number of these fsckups and the cost of having all the planes out of position, wouldn't it be worth having the day's movements and crew rosters printed out at the start of each day?

    At least it would give them a day to fix the computer

    1. Doctor Syntax Silver badge

      Re: Fallback

      They don't even have to go completely old-school. They can use the computer to print it out. When it's working.

      1. Yet Another Anonymous coward Silver badge

        Re: Fallback

        And this wasn't another failure of ATC or weather or NOTAMs or any of the safety critical stuff. It was just the airline not knowing where this plane was going next - that shouldn't be beyond the wit of man, or a well trained monkey and a blackboard

        1. MachDiamond Silver badge

          Re: Fallback

          "or a well trained monkey and a blackboard"

          Ook?

          (translation: Did somebody just use the M word?)

    2. Marty McFly Silver badge

      Re: Fallback

      It is a cascading failure, and it is more about crews than aircraft.

      Modern aircraft fly day after day with minimal daily maintenance. Sure, after a certain number of cycles they go in for programmed maintenance, but that is planned well in advance.

      Crews, however, are a perishable commodity. They are strictly regulated with the number of hours on duty before a mandated rest period. If a crew experiences a significant delay, they will 'expire' before the flight can depart because there is not enough time left to ensure they reach their destination before their mandated rest period.

      Other regulations, for example, require crews to have printed hard-copy weather reports on-board before departure. There are other in-house documents that flight crews need before their birds can fly. For example, the calculated weight of the aircraft would be needed for performance (ie: fuel) calculations. Thus something like an outage of the reservation system could cause the flight crew to not know their number of passengers, thus their weight, and be unable to fly.

      For the cascade...

      Let's say a flight crew is halfway through their day. They flew SEA (Alaska Air hub) to AUS (Austin, Texas, not an Alaska Air hub) and are now returning. Same bird, same crew, a comfortable margin of two flight hours left in their day when they get back to SEA. The weather computer that prints the hard copy goes down. It takes three hours to fix. That crew has just expired and now needs to go to mandatory rest. No spare Alaska Air crews are in place to take over (not a hub airport). Flight canceled. Same bird, same flight crew will likely fly the next day.

      No idea what caused Alaska Air's outage, but it is easy to see how a data center problem & legal flight requirements can conspire to keep birds out of the sky.
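      The crew arithmetic in the SEA/AUS example can be sketched in a few lines of Python. This is only an illustration: the 4.5-hour flight time is an assumed figure, the function name is made up, and real duty-time rules have far more conditions than a single comparison.

```python
from datetime import timedelta

def crew_can_complete(duty_remaining: timedelta, flight_time: timedelta, delay: timedelta) -> bool:
    """True if the crew can sit out the delay and still land before duty time runs out."""
    return delay + flight_time <= duty_remaining

# Hypothetical figures: the AUS-SEA flight time is assumed, not taken from a schedule.
flight_time = timedelta(hours=4, minutes=30)
margin = timedelta(hours=2)              # slack left on arrival if nothing goes wrong
duty_remaining = flight_time + margin    # duty time left at the planned departure

for delay_hours in (1, 3):
    ok = crew_can_complete(duty_remaining, flight_time, timedelta(hours=delay_hours))
    print(f"{delay_hours} h delay: {'crew can fly' if ok else 'crew expires, flight canceled'}")
```

      With a two-hour margin, a one-hour outage is absorbed and a three-hour outage is not, which is the whole cascade in miniature.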

      1. Yet Another Anonymous coward Silver badge

        Re: Fallback

        But all those can be done manually and locally at the plane/airport level. Instead, the airline shuts down for 3 days in a cascading failure because the loyalty card system was down and it's so tied into everything else.

      2. MachDiamond Silver badge

        Re: Fallback

        "There are other in-house documents that flight crews need before their birds can fly. For example, the calculated weight of the aircraft would be needed for performance (ie: fuel) calculations."

        It's not as efficient, but weights can be assumed to be near maximum so fueling can be calculated. Since fuel prices can vary and there's no need to take on a full load every time (and it's problematic for many reasons), airlines will add fuel according to price. A daily hard copy of the aircraft's itinerary with passenger/freight numbers is not that hard. Weather reports aren't an issue as there are many ways to get them and pilots should be able to do flight planning without their own in-house system. Even many small municipal airports have a pilot's lounge with Wi-Fi and sometimes computers for flight planning. I would expect a bigger passenger airport to have similar facilities everywhere. Weather, NOTAMs and TFRs need to be checked immediately before every flight, as close to departure as possible. Getting airport information is also important and while it's checked again before push-back, a lot of stuff such as taxiway/runway closures won't change that often.

        Mainly it's a problem with internal airline information and whether they can get people checked in and onto the aircraft. The flying itself isn't terribly difficult to work around if there's a problem. There should be a local data storage system at each airport with the day's information to give at least a day's buffer. It might mean that people can't book online, print their boarding pass and whizz through the airport on the same day, but would need to go to the airport and take their chances on getting a flight in person. It would be another reason to not rely on "cloud" tickets/passes and just print out your travel documents as a backup.
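        A rough Python sketch of that "local copy with a day's buffer" idea follows. Everything in it is hypothetical: `load_schedule`, the 24-hour limit, the dates and the flight labels are made up for illustration and say nothing about how any airline's systems actually work.

```python
from datetime import datetime, timedelta

MAX_SNAPSHOT_AGE = timedelta(hours=24)  # assumption: the "day buffer" from the comment

def load_schedule(fetch_central, local_snapshot, now):
    """Prefer the central system; fall back to a fresh-enough local snapshot."""
    try:
        return fetch_central(), "central"
    except ConnectionError:
        taken_at, schedule = local_snapshot
        if now - taken_at <= MAX_SNAPSHOT_AGE:
            return schedule, "local snapshot"
        raise  # snapshot too old to trust, so surface the outage

def central_down():
    # Stand-in for a call to the (unreachable) primary datacenter.
    raise ConnectionError("primary datacenter unreachable")

# Hypothetical snapshot written at the airport at 05:00 that morning.
snapshot = (datetime(2024, 1, 1, 5, 0), ["FLIGHT 1 SEA-AUS", "FLIGHT 2 AUS-SEA"])
schedule, source = load_schedule(central_down, snapshot, datetime(2024, 1, 1, 15, 30))
print(source, schedule)
```

        The point of the age check is that a stale copy is worse than none: after a day the sketch refuses to guess and re-raises the error.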

    3. Chrissy

      Re: Fallback

      Unfortunately many airlines have decided that their IT is robust enough that "going manual" is no longer an option.

      This leads to painful experiences, such as the time I watched a certain large British airline board 300-plus people onto a New Orleans to LHR flight, checking each ticket manually by having a single person with a mobile phone call back to Operations in London.

      Which was fun.

  6. FirstTangoInParis Silver badge

    Cloud?

    Maybe an in-house data centre, one would hope. Unless some fsckwit subbed it out to $cloudVendor with a single point of failure in its authentication system…

    1. MachDiamond Silver badge

      Re: Cloud?

      "Maybe an in house data centre, one would hope."

      It's looking like they need to install a secondary center that can be used as a fallover. The margins for airlines are not that great so a day or two of sitting still might be all of the profit for the quarter. They will have to get people holding tickets to their destinations and that might cost a wee bit more money to make happen. There's also the reset, moving/housing crews and purchase contracts for catering and other services that will have to be evened up. How much was a second data center again?

      1. An_Old_Dog Silver badge

        Systemic Failure Inherent in Human Organisations

        "Yeah, redundant/failover systems are nice, but they cost money. I can say, 'No,' get a promotion for 'saving money,' and be out of the hot-seat before things break. While I am gambling doing this, the odds look great to me!"

        1. Judge Mental

          Re: Systemic Failure Inherent in Human Organisations

          Which is why passenger planes have more than one engine.

        2. MachDiamond Silver badge

          Re: Systemic Failure Inherent in Human Organisations

          ""Yeah, redundant/failover systems are nice, but they cost money. I can say, 'No,' get a promotion for 'saving money,' and be out of the hot-seat before things break. While I am gambling doing this, the odds look great to me!""

          It's then a game of musical chairs where those who are operating the company at the time of a meltdown get the blame. Sadly, you are correct about how companies get manipulated so certain people will get a bonus for doing things that are detrimental to the long-term health of the company.

          I change out the tyres on my car when they start looking a bit worn/old rather than trying to run them all the way down. I'd save money by using up more of the tyres, but the risk of sitting on the side of the road with a shredded tyre for hours and damage to the sheet metal is more than I want to take on.

      2. nintendoeats

        Re: Cloud?

        Altogether now...

        "If you think safety is expensive, try having an accident."

      3. Anonymous Coward

        Re: Cloud?

        Failover.
