"in short there's plenty of blame to go around"
Another way of looking at it is that there's always someone onto whom to whuffle one's own blame. So nobody takes responsibility.
The cause of a power outage that cut the juice to London Heathrow airport in March has been identified - along with a chain of failures that allowed it to happen. You may recall that in March a fire broke out at an electrical substation serving London Heathrow and surrounding communities, disrupting operations at one of the …
The ATC at LHR has robust kit and processes to handle total power loss: UPS for all the kit, mostly on a per-rack basis, to keep things going before their generators kick in. They have processes to maintain and test it, which they do periodically. But it's only specced to keep ATC running long enough for them to land anything on approach and hand off the rest of their inbounds to other airports - nominally about an hour.
As for the rest of the airport... who knows. Based on my experience working with other airports, they will have more people managing advertising hoardings and leasing retail space than managing resilience. Having said that, if almost anything else that's vaguely operational goes down - baggage handling, security screening or any of their IT, networks or databases - then they are buggered based just on capacity. I was working at a much smaller airport once when they simulated the check-in system going down and the staff had to do it all manually for a few flights. I've never seen so much effort put into helping people count up to 200. Shagwanking disaster doesn't come close.
It's a sorry tale all along the line.
NG didn't do anything about the initial transformer problem (moisture detection), extended the maintenance intervals on the transformers, then "deprioritised" scheduled maintenance to cover problems elsewhere... the faulty transformer wouldn't have been seen to until November 2025! I suppose they'll now be looking for other "accidents waiting to happen".
But it's not just NG or Heathrow; the bottom line is a culture of penny-pinching and then hand-wringing when it goes wrong, and it's prevalent across both the public and private sectors.
Quote
"But it's not just NG or Heathrow, the bottom line is a culture of penny pinching and then hand wringing when it goes wrong, and it's prevalent across public and private sector."
Aye, just look at the Hatfield train crash when a rail broke under the load of a high speed train and caused several deaths.
Why did the rail break? Because the surface cracks had been allowed to grow beyond the safety point by an engineering maintenance team who didn't have any qualifications in rail maintenance. Admittedly, that rail was due for replacement, but then the railway inspectors found just how many other lengths of rail were in bad shape and slapped a 50mph speed limit on most of the network as a result.
Not so. There were 732 deaths from 146 accidents in the 46 years of state-run national rail between 17 April 1948 and 15 October 1994. There were 92 deaths from 50 accidents in the 28 years of privatised rail between 31 January 1995 and 24 August 2021.
The nationalised period includes about 26 years before the Health and Safety at Work Act 1974. That Act may have led to an awful lot of "Health and Safety gone mad" stories - but it also led to a dramatic drop in the number of deaths and injuries from accidents. I'd like to see a graph of "death rate vs time", and I would not be surprised to see an uptick after 31 January 1995. The trouble of course is that railway deaths tend to be very "lumpy", so it's hard to see a signal in amongst the random noise.
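Purely as a back-of-the-envelope illustration (my own numbers-juggling in Python, not anything official): normalising the totals quoted above to per-year rates, then doing a quick Poisson draw, shows how lumpy annual counts would look even if the underlying rate never changed.

import numpy as np

# rates derived from the figures quoted in the post above
nationalised_rate = 732 / 46   # ~15.9 deaths/year, 1948-1994
privatised_rate = 92 / 28      # ~3.3 deaths/year, 1995-2021
print(f"state-run era:  {nationalised_rate:.1f} deaths/year")
print(f"privatised era: {privatised_rate:.1f} deaths/year")

# Deaths arrive in a few large accidents rather than a steady trickle, so
# annual counts swing wildly even at a constant underlying rate - which is
# what makes any post-1995 uptick hard to spot by eye.
rng = np.random.default_rng(1)
print("ten simulated years at the privatised-era rate:",
      rng.poisson(privatised_rate, 10))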
I'd like to see a graph of "death rate vs time", and I would not be surprised to see an uptick after 31 January 1995.
Not hard to find, and you'd be wrong. One spike in 1999, the Ladbroke Grove accident when a driver passed a signal at danger, and another for Great Heck in 2001 when a driver fell asleep and rolled his Land Rover onto the track, but generally low and falling.
The point was about Hatfield which happened when UK rail infrastructure was run by Railtrack, a private company.
Your rail statistics don't differentiate between infrastructure and private train operators.
If privatisation of infrastructure is such a great idea why aren't our roads all run by private companies?
Indeed, but there you still have the choice to go the slow, state-funded route for free or take the shiny fast private toll road. And that's what the UK gets so wrong with privatisation - the lack of competition. We can't choose which substation to get our power from, or which water company to use, or which train company to take between London and Bristol, or even which hub airport to use. So we get state-mandated monopolies who can just focus on extracting profit for shareholders without needing to improve their service to keep our custom.
Southwestern is a fair point (more so before they cancelled the direct service) but a coach operator isn't an equivalent competitor any more than the only supermarket serving a large town is kept on its toes by a weekly farmers market. Fair competition on the railway is LNER vs Lumo to Edinburgh, or Avanti vs Chiltern vs West Midlands to Birmingham.
It always amuses me that a country like France, which vaunts its socialist model of égalité, fraternité, etc. still constructs an elite motorway network so that those with money can pay for fast and smooth travel, yet supposedly capitalist countries like the UK and USA mostly provide an open high-speed road network available to all for no additional charge.
Ofgem played its role controlling spending to keep customer bills down.
In the current price control period, it cut the "non-load" capex asked for by NG by a third, from £2,650.9 million to £1,765.8 million. "Non-load" covers maintenance of all its assets. It did, however, increase "load" capex to more than NG asked for. Load capex is about increasing grid capacity, so it covers adding new grid connections for renewables.
It's worth noting that LHR has three separate power supplies, but these are not linked to a common bus-main, so the failure of any one supply will cause some equipment to stop, forcing an inevitable (planned) shutdown of the airport. This risk was known and deemed high impact, low probability. Well, we all know about "low probability"; if it didn't happen last week, you're on borrowed time.
It's a 275 kV site with three power transformers. Credit to all the services involved: no-one was injured.
LHR don't DR test their power supply and they don't have a resilient infrastructure
Sounds like LHR don't DR test their power supply because they don't have resilient infrastructure.
If your DR process is "we can reconfigure our power intake if we shut down critical systems for 12 hours" then you wouldn't test it either!
You might fix that rather glaring issue though...
I see the BBC are reporting:
'Heathrow told the BBC National Grid "could and should" have prevented the fire and that it expected it to "take accountability for those failings".
"Those failings that resulted in significant damage and loss for Heathrow and our airlines," a spokesperson added.'
While I agree that National Grid could have maintained their equipment better, I think it's rather cheeky, and over-optimistic, of LHR to suggest that NG is somehow responsible for LHR's own failure to implement a decent changeover or backup system for their power supply. Your electricity supplier isn't under any obligation to maintain a continuous supply, as far as I'm aware. Critical users need to implement their own backup systems. Heathrow's losses are their own to bear for failing to do that.
If the airlines sue LHR then it'll be pots and kettles. None of the airlines I've flown with in the UK has the sort of resilience that they seem to be demanding of LHR. If a single aircraft goes tech then they just cancel the flight and it can have knock-on effects for passengers for the next 12 hours or more. They could keep a fleet of spare aircraft and crews scattered across airports around the country and they'd have the same sort of resilience they expect airports to have. Suggest this to them and they'd tell you that it would add £10 to the price of a ticket and everyone would therefore stop flying immediately.
If LHR did invest in, say, its own backup power for the whole airport to run for a few days, then the airlines would be the first to complain about LHR wasting more money and then whinge when the price of slots went up.
and had a plan to use them to pick up the slack
In the modern world, that is possible.
But note that for safety reasons, supplies must be isolated, and there must be no easy way to energize a dead section from a second source.
Network supplies are different, as are military installations, so it's not a new and different idea, but it's a major shift in thinking that is normally implemented "from new", not something retro-fitted to an existing system. Because there is such a danger that something could go wrong.
Automatic supply changeover is common and very simple. It's how genset or battery backup works!
Essentially there's a Chunky Relay that powers the distribution board or individual rack from one supply at a time. Then you configure rules for switchover like "prefer A", "switch over X seconds post failure", "switch back Y seconds after restore" to limit how fast the overall load on either supply changes.
Requires "dual supply" signage, but it's very common.
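For the curious, here's a toy sketch in Python of that rule set - my own illustration, nothing to do with any real switchgear firmware, and the delays are made up: prefer A, only move to B once A has been dead for a few seconds, and only move back once A has been healthy again for a good while, so the load doesn't flap between supplies.

FAIL_DELAY = 5        # seconds A must be dead before moving to B (assumed)
RESTORE_DELAY = 300   # seconds A must be healthy before moving back (assumed)

class Changeover:
    """Single-feed changeover: powers the load from A or B, preferring A."""
    def __init__(self):
        self.active = "A"
        self.a_healthy = True
        self.a_since = 0.0    # when A's health last changed

    def tick(self, a_healthy, b_healthy, now):
        if a_healthy != self.a_healthy:       # note the moment A's state flips
            self.a_healthy = a_healthy
            self.a_since = now
        held = now - self.a_since
        if self.active == "A" and not a_healthy and b_healthy and held >= FAIL_DELAY:
            self.active = "B"                 # A has been dead long enough
        elif self.active == "B" and a_healthy and held >= RESTORE_DELAY:
            self.active = "A"                 # A has been back long enough
        return self.active

c = Changeover()
print(c.tick(a_healthy=False, b_healthy=True, now=0.0))   # still "A" (debounce)
print(c.tick(a_healthy=False, b_healthy=True, now=6.0))   # now "B"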
A lot of kit supports redundant PSUs that bus together at the DC side, so you can have PSU 1 permanently connected to A, while 2 is on B, and either is sufficient. Simpler as it doesn't require an electrician to set it up, but the load changes are instant so may not scale.
Because it is! Big numbers are just small ones, repeated.
60MW is less than a thousand 100A 400VAC three phase supplies. It's not difficult, there's just a lot of them.
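To put a number on it (my arithmetic, assuming unity power factor and a 400V line voltage):

import math

per_supply_w = math.sqrt(3) * 400 * 100          # 3-phase: sqrt(3) x V_line x I
print(f"one 100A 400VAC 3-phase supply: {per_supply_w / 1e3:.1f} kW")  # ~69.3 kW
print(f"supplies needed for 60 MW:      {60e6 / per_supply_w:.0f}")    # ~866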
The largest loads are often simply split without any changeover at all, because losing ½ or ⅓ for a few days is acceptable - half the HVAC on A, half on B etc.
Other loads get changeovers or use inbuilt redundancy, depending. You don't want to put "too much" on a single contactor because a contactor can fail too, so they're rarely "very big", maybe 100A-200A?
Others have no backup - if the advert screens don't work, planes still fly safely.
Been there, done that, long since lost the hi-vis.
In reality, choosing what doesn't need a backup is the most difficult task.
It costs money and time, and of course a maintenance team - on a site that size something is always broken.
> ....failure to implement a decent changeover or backup system for their power supply themselves.
I'm near a Tourist Trap Island. Electric power comes from the mainland over a very long cable. As the summer population rose and A/C got common, power got brown much of the time. A fatter cable is a major expense and chore. So for over 20 years the island got along with three "jet engines" at the center of the island. Turbo-generators. They mostly sat but I did hear them run a couple of times. Not the cheapest power, but they were topping, not base generation. Here they probably drank #2 heating fuel oil. But an airport will have millions of liters of jet fuel - same thing, different additives.
Backup generators often don't start. Or they run for an hour and run dry. An airport has turbojet mechanics experienced at getting balky airplanes going. And unlike my old job with a few small tanks (maybe stale), the airport has a huge supply of fresh fuel. The accountants will freak out b/c you're not supposed to make field electricity with fuel budgeted to sell to airlines, but that can be sorted later. Turbogenerators run to at least 150 MW, which would power a hundred towns like mine.
Yes, from my childhood it was an average of two days a week sitting in front of a non-working TV with candles on top due to power cuts, and three days sitting in front of an empty TV table as it was away having the valves repaired. Even when it worked it was only black and white, unlike everyone else's.
The CEO was fast asleep with his phone on silent and they couldn't raise him. Why someone wasn't dispatched to wake him up, who knows...
Which meant it fell to the Operations Director to handle the incident, and he made the call to shut the airport for the entire day, the best part of 24 hours.
Which doesn't appear to have been remotely necessary as several areas - including Terminal 5 - had power sufficient to handle passengers safely before the morning was out.
So aside from not testing their power resilience, their chain of command fell through, and the person left in charge by default made a poor decision that actually caused much of the financial harm.
Now they're pointing the finger of blame at others to try to deflect from their own failings.
There's no way the CEO would be better informed than the Operations Director.
What is obvious is that they hadn't actually written any procedures for loss of supply - at least, not ones that matched the reality of how their critical systems are powered.
This should have been a case of "Open Big Red Folder to page 95, follow the flowchart".
And the very last item on the list would be "Inform CEO that we've done this"
Yet it was not.
The 'Big Red Folder' will no doubt contain various procedures for dealing with a wide range of emergencies, and if arranged sensibly, would be ordered according to the likelihood of them happening and/or the severity of the consequences. Total failure of the electrical supply would be fairly well down the list I would have thought, and so the emergency procedures related to a total power blackout would probably not be at the top of the list, and therefore not on page 1.
Whilst you could have a separate different coloured folder for different types of emergency, keeping all the procedures within one folder may well make it easier and faster to find the appropriate set of instructions.
There's no way the CEO would be better informed than the Operations Director.
Yeah, the only function of the CEO appears to be to carry the can publicly when something goes catastrophically wrong. At all other times, regardless of whether the company performs well, badly or indifferently, they extract annual remuneration packages that most people wouldn't see in multiple lifetimes and can keep their phones on silent.
CEOs should be strategic, not operational.
Whatever that means.
Most corporate CEOs jump around between sectors every few years. Bit of banking, then a supermarket, then an airline. They will barely have grasped the industry they're currently working in before swanning off somewhere else collecting the requisite golden goodbyes and handshakes along the way.
It's all a scam, but thus far the standard of living of the plebs has not quite fallen to the point where they start questioning why the CEO is earning 150x the median wage in their company.
I'd put money on it that the CEO - had he been awake - wouldn't have decided to close the entire airport for 20 hours when the fire had only been burning for an hour or so.
I also can't imagine that that level of detail of decision is codified in the big red "oh shit" book.
Aircraft that were inbound on long haul, and still 6+ hours away from needing to make a decision about landing ended up turning back to their origin. For BA and Virgin in particular, they've now got lots of passengers and tens of planes on the wrong side of the world, and a massive logistical headache that will take days to unwind.
Heathrow's Ops Director likely doesn't give as much of a damn about the knock-on effects as the CEO would. The Ops Director will have been in tactical mode; the CEO would likely have taken a bigger-picture, strategic approach to decision making.
Terminal 5 could have been accepting flights from mid-morning onwards, so the long hauls still airborne could have continued and arrived more or less on time, those that had diverted to other UK airports could have shuttled down to Heathrow, and a large percentage of the knock-on disruption of having people and planes in the wrong places would have been avoided.
Choosing to close the airport for the entire day was demonstrably the wrong decision, and wholly unnecessary. The only thing that will have saved the Ops Director's bacon is the fact his boss screwed up by being uncontactable and sleeping blissfully ignorant until 6am.
And I still imagine boss's first words to the Ops Director will have been along the lines of "you've fucking done what!?"
“ I'd put money on it that the CEO - had he been awake - wouldn't have decided to close the entire airport for 20 hours when the fire had only been burning for an hour or so.”
You may well be right, but I'd also put money on it that, had a decision not to shut down gone horribly wrong, resulting in wholesale death and/or injury, said CEO wouldn't have been in a hurry to step forward and accept responsibility for it. Well, maybe in a mealy-mouthed, platitudinous statement to the press kind of way, but not in an actual personally-accepting-consequences way…
"Terminal 5 could have been accepting flights from mid-morning onwards"
Comes back to knowing your level of resilience and having practiced it to prove that what you think you know is reflected by reality. If there was any doubt at all over whether the Pret in T5 could still rustle up a flat white then they'd still call the whole thing off "for the safety of our staff and co-workers".
>And the very last item on the list would be "Inform CEO that we've done this"
From PTerry's description of an unlikely incident at a nuclear power station:
"Twenty-seven people were got out of bed in quick succession and they got another fifty-three out of bed, because if there is one thing a man wants to know when he's woken up in a panic at 4:00 A.M., it's that he's not alone."
"The CEO was fast asleep with his phone on silent and they couldn't raise him. Why someone wasn't dispatched to wake him up, who knows... Which meant it fell to the Operations Director to handle the incident"
As it should.
In the old days when I worked in IT operations, any major incident would inevitably lead to multiple C-level nitwits huddling around the one poor sysadmin tasked with getting things up and running. The first thing our new Operations Director did was to throw them all out. I think he would have liked to revoke their door passes for the IT department as well.
After that, the number of "high business impact" incidents plummeted.
Should the CEO be directly involved with people fixing underlying faults - hell no. And a good management structure that acts as an umbrella to shield the workers from senior management is invaluable and always appreciated in a crisis.
Should the CEO be directly involved in making a judgement-call decision, i.e. "shut the airport until 11pm" - that will cost the business - and its customers - tens of millions in disruption - abso-fucking-lutely.
You're conflating different types and scales of decision-making here. They should have sent someone to get him out of bed to make the decision, and held off making knee-jerk decisions for the hour or so it would have taken. Even if you've got nobody local to him, you send a taxi local to him to go and hammer on his door.
Hospitals, homes, airports and schools go dark but the data centres keep on trucking. Particularly important now that simple search queries like 'blonde porn' have now been replaced by complex AI queries such as 'where do I find porn, blondes, that my VPN can access, dodging government ID checks', using so much more energy. That's progress for you!
Like others, I find it ironic that Heathrow deems it appropriate to criticise National Grid for their failure to fix a problem long known about, whilst ignoring the fact that they (Heathrow) failed to manage the sudden drop-out of a vital power connection properly.
Yes, National Grid should've fixed the problem when it was first identified. And no, the problem should've been fixed regardless of whether or not National Grid knew that the substation in question was vital to Heathrow.
And yes, Heathrow deserves as much criticism as National Grid does over not properly implementing a recovery plan (other than "shut the airport down for 12+ hours") for when a power supply fails. From what NESO mentioned in an interim update, Heathrow has supply from THREE different 'substations' in the area, so one failing should not have brought down the airport unless, well, the airport failed to set up (and test) something that would seamlessly transfer the load to the other two when something bad happens.
And then the airport has the cheek to want to sue? I'd sue back.
From reading the report it seems that NG really eff'ed up on this one. Numerous failures to address serious issues within that site.
What cracked me up was that NG or SSEN had given North Hyde a 12.75/100 rating for the site in recent times, where zero is a brand-new installation and 100 is a site which should be shut down immediately...
I remember noticing a substation close to my workplace was growing a mini forest inside of it. I duly called the phone number shown on the door of the enclosure and stated the ID number of the substation in question. I was then asked to provide several pieces of personal information (why?) as well as the address of the substation. The latter was difficult to supply since it was located on a patch of grass, accessible by a public footpath, between two industrial units that themselves were on two separate roads. Why on Earth the bod on the end of the line wasn't able to locate the substation from the ID number is anyone's guess. Suffice to say, several months later nothing had changed aside from the foliage reaching new heights.
I believe this substation was administered by UK Power Networks which, according to Google, is 100% owned by several companies based in Hong Kong. Perhaps the lackadaisical approach to maintenance is all part of the plan.
Maintenance costs money and affects the bottom line.
Let's just sweat the assets longer and make it someone else's problem in the future.
Stupid UK senior management at its finest.
Although it doesn't help that incentivization is almost always based around financial results rather than things like reliability.
I think you'll find the same attitude to system/subsystem maintenance in the UK applies to many other countries, including the USA.
It also applied to many ex-Soviet countries too. Several years back when I was a competitive free flight model flier, I used to know a lot of the top Soviet guys and one of their frequent sayings was "The State pretends to pay us and we pretend to work".
We also used to regularly visit Hungary, Czechoslovakia, Romania etc. to fly in their International competitions, where it was very noticeable that private dachas, etc were in *much* better condition than anything the State owned.
So in 2018 National Grid found moisture in the oil and did nothing. OK, so the maintenance report got misfiled, it happens. But why didn't the same checks find the same problem six months later, and every six months after that? This suggests either maintenance checks weren't being performed, weren't being performed properly, or findings were routinely being ignored. And since this clearly wasn't a one-off, the same must be true for other substations covered by the same maintenance regime. So... how many more are about to blow?
Anon, as a friend of mine (now retired) used to work (in the labs) for a company that manufactured high-performance oils for various industrial uses (ranging from lubrication through to cooling) and also had oil analysis / test services they offered / upsold to their customers.
He told me about the truly impressive amount of tests the lab can do on (correctly taken) oil samples.
He also mentioned that tests could flag warning-level results of several sorts: moisture, oil breakdown by-products past acceptable limits, or contaminants beyond acceptable levels. (In friction-lubricant roles, for example, there will inevitably be some frictional damage, but the specs typically set quite low limits on anything that could be a friction by-product, so problems can be nipped in the bud; moisture in oil can also make the oil more corrosive, so that can sometimes be the cause rather than frictional damage.)
The depressing thing he mentioned was that often customers would have a series of failing tests (results getting progressively worse) before they were OK again - suggesting a problem was present for a while before being fixed / oil replaced - essentially gambling on putting things off for a while & hoping for the best.
* In cases where it was moisture-related, these were customers that did not have any effective workarounds in place to treat the oil themselves (e.g. filters, centrifuges, heat treatment). Some customers did have systems in place for dealing with moisture levels.
Isn't this down to what the 'requirements' are? I suspect that someone at LHR, while contingency planning, worked out that the £ impact of the supposedly very unlikely event of total power loss wasn't worth the £ cost of standby generators and their maintenance. The essential systems of ATC had standby power arrangements; losing those could lead to loss of life, but a few delayed bags and check-ins are not on the same scale. Although they could have smartened up their procedures for taking power from other substations.
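Something like this sum, presumably - every figure below is invented by me purely to show the shape of the calculation, not anything LHR actually used.

ANNUAL_PROBABILITY = 0.01         # assumed: chance per year of total power loss
OUTAGE_COST = 60_000_000          # assumed: £ hit from a day-long closure
BACKUP_COST_PER_YEAR = 2_000_000  # assumed: £/yr amortised generators + upkeep

expected_annual_loss = ANNUAL_PROBABILITY * OUTAGE_COST   # £600,000 with these numbers
print(f"expected annual loss: £{expected_annual_loss:,.0f}")
print("backup pays for itself:", expected_annual_loss > BACKUP_COST_PER_YEAR)  # False here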
That does not excuse the lack of routine maintenance at the substation.
Just my 2P's worth. :-)