too soon?
"You're only supposed to blow the bloody doors off"
Timing is everything – except when it isn't. US carrier Alaska Airlines has grounded its fleet once again due to a mystery IT issue. The problem began at 3:30 pm Pacific Time on October 23 with a "failure" at the company's primary datacenter. Alaska Airlines insisted it wasn't a cybersecurity event or related to "any other …
It is a cascading failure, and it is more about crews than aircraft.
Modern aircraft fly day after day with minimal daily maintenance. Sure, after a certain number of cycles they go in for programmed maintenance, but that is planned well in advance.
Crews, however, are a perishable commodity. Regulations strictly limit the number of hours they can be on duty before a mandated rest period. If a crew experiences a significant delay, they will 'expire' before the flight can depart, because there is not enough time left to reach their destination before that rest period begins.
Other regulations, for example, require crews to have printed hard-copy weather reports on board before departure. There are other in-house documents that flight crews need before their birds can fly. For example, the calculated weight of the aircraft is needed for performance (i.e. fuel) calculations. So something like an outage of the reservation system could leave the flight crew without a passenger count, and therefore without a weight, and unable to fly.
For the cascade...
Let's say a flight crew is halfway through their day. They flew SEA (Alaska Air hub) to AUS (Austin, Texas, not an Alaska Air hub) and are now returning. Same bird, same crew, a comfortable margin of two duty hours left in their day when they get back to SEA. The weather computer that prints the hard copy goes down. It takes three hours to fix. That crew has just expired and now needs to go to mandatory rest. No spare Alaska Air crews are in place to take over (not a hub airport). Flight canceled. Same bird, same flight crew will likely fly the next day.
No idea what caused Alaska Air's outage, but it is easy to see how a data center problem and legal flight requirements can conspire to keep birds out of the sky.
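To put rough numbers on that cascade, here is a minimal Python sketch. The 12-hour duty limit, the 6 hours already worked, and the 4-hour return leg are made-up illustrative figures chosen only to reproduce the two-hour margin described above; they are not actual regulatory limits or Alaska Air schedules, and the function name is hypothetical.

```python
def duty_margin_h(duty_limit_h: float, on_duty_h: float,
                  delay_h: float, flight_h: float) -> float:
    """Hours of duty time left after landing (negative means the crew expires).

    All inputs are illustrative assumptions, not real regulatory values.
    """
    return duty_limit_h - (on_duty_h + delay_h + flight_h)


# Assumed figures: 12 h duty limit, 6 h already worked on arrival at AUS,
# 4 h return leg to SEA.
on_time = duty_margin_h(12, 6, delay_h=0, flight_h=4)
delayed = duty_margin_h(12, 6, delay_h=3, flight_h=4)

print(f"No delay: {on_time:+.0f} h margin, crew can fly: {on_time >= 0}")
print(f"3 h outage: {delayed:+.0f} h margin, crew can fly: {delayed >= 0}")
```

With these assumed numbers the on-time crew lands with two hours to spare, while the three-hour outage puts them one hour over the limit before they could reach SEA, so the flight cannot legally depart.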
"There are other in-house documents that flight crews need before their birds can fly. For example, the calculated weight of the aircraft would be needed performance (ie: fuel) calculations. "
It's not as efficient, but weights can be assumed to be near maximum so fueling can be calculated. Since fuel prices can vary and there's no need to take on a full load every time (and it's problematic for many reasons), airlines will add fuel according to price. A daily hard copy of the aircraft's itinerary with passenger/freight numbers is not that hard. Weather reports aren't an issue, as there are many ways to get them and pilots should be able to do flight planning without their own in-house system. Even many small municipal airports have a pilot's lounge with wi-fi and sometimes computers for flight planning. I would expect a bigger passenger airport to have similar facilities everywhere. Weather, NOTAMs and TFRs need to be checked before every flight, as close to departure as possible. Getting airport information is also important, and while it's checked again before push-back, a lot of it, such as taxiway/runway closures, won't change that often.
Mainly it's a problem with internal airline information and whether they can get people checked in and onto the aircraft. For the flying itself, it isn't terribly difficult to do a workaround if there's a problem. There should be a local data storage system at each airport with the day's information to give at least a day's buffer. It might mean that people can't book online, print their boarding pass and whizz through the airport on the same day, but would need to go to the airport and take their chances on getting a flight in person. It would be another reason to not rely on "cloud" tickets/passes and just print out your travel documents as a backup.
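As a rough illustration of the "local copy with a day's buffer" idea, here is a minimal Python sketch. Everything in it is hypothetical: the function name, the JSON snapshot file, and the `fetch_central` callable standing in for whatever the airline's central system actually exposes. It is not a description of how any airline does this.

```python
import json
from pathlib import Path
from typing import Callable, Tuple


def load_daily_info(fetch_central: Callable[[], dict],
                    snapshot: Path) -> Tuple[dict, str]:
    """Return the day's data plus a label saying where it came from.

    On success the local snapshot is refreshed; if the central system is
    unreachable, the last saved snapshot is used instead.
    """
    try:
        data = fetch_central()
    except (ConnectionError, TimeoutError):
        # Central data center is down: fall back to the airport's local copy.
        with snapshot.open() as f:
            return json.load(f), "local snapshot"
    # Central system answered: keep the local copy up to date.
    snapshot.write_text(json.dumps(data))
    return data, "central"
```

If the central system fails before any snapshot has been written, the `FileNotFoundError` propagates, which is the point at which someone has to reach for the paper copies.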
Unfortunately many airlines have decided that their IT is robust enough that "going manual" is no longer an option.
This leads to painful experiences, such as the time I watched a certain large British airline board 300-plus people onto a New Orleans to LHR flight, checking each ticket manually by having a single person with a mobile phone call back to Operations in London.
Which was fun.
"Maybe an in house data centre, one would hope."
It's looking like they need to install a secondary center that can be used as a failover. The margins for airlines are not that great, so a day or two of sitting still might be all of the profit for the quarter. They will have to get people holding tickets to their destinations, and that might cost a wee bit more money to make happen. There's also the reset, moving/housing crews, and purchase contracts for catering and other services that will have to be evened up. How much was a second data center again?
""Yeah, redundant/failover systems are nice, but they cost money. I can say, 'No,' get a promotion for 'saving money,' and be out of the hot-seat before things break. While I am gambling doing this, the odds look great to me!""
It's then a game of musical chairs, where those who are running the company at the time of a meltdown get the blame. Sadly, you are correct about how companies get manipulated so certain people will get a bonus for doing things that are detrimental to the long-term health of the company.
I change out the tyres on my car when they start looking a bit worn/old rather than trying to run them all the way down. I'd save money by using up more of the tyres, but the risk of sitting on the side of the road with a shredded tyre for hours and damage to the sheet metal is more than I want to take on.