So by messaging he means some sort of enterprise service bus was taken down? Also, power surge protection is normal, so what went wrong with the switchover to redundant power? The hints from this, a bit like the TalkTalk hack, suggest something very simple and not some unavoidable impossible to understand failure they would like to media-steer us towards.
BA CEO blames messaging and networks for grounding
The catastrophic systems failure that grounded British Airways flights for a day appears to have been caused by networking hardware failing to cope with a power surge and messaging systems failing as a result. The Register has asked BA's press office to detail what went wrong, what equipment failed, what disaster recovery …
COMMENTS
-
-
-
Tuesday 30th May 2017 12:16 GMT Anonymous Coward
"I realise that backup systems have been mentioned"
I used to work for a company, a large company, that provided DR services. The vast majority of companies treat DR as a compliance checkbox. They buy some DR services so they can say they have DR services... but in the event of a primary data center loss, there really is only the rough outline of a plan. Basically their data, or most of it, is in some alternative site, and they may have the rest of their gear there too, or not. There is rarely anything resembling a real-time switchover from site A to site B in case of a disaster, in which their entire stack(s) would come up at site B without any manual intervention. Mainly because architectures are a hodgepodge of stuff which has collected over the years. Many companies never rewrite or modernize anything, meaning much of the environment is legacy with legacy HA/DR tools... and there is sparse automation.
-
Tuesday 30th May 2017 13:12 GMT wheelybird
There's a difference between disaster recovery and high-availability (though they do overlap).
It's perfectly reasonable that disaster recovery is a manual fail-over process. Fully resilient systems over two geographically separated locations can be hard and expensive to implement for smaller companies without much in the way of a budget or resources, and so you have to compromise your expectations for DR.
Even if failing-over can be automated, there might be a high cost in failing back afterwards, and so you might actually prefer the site to be down for a short while instead of kicking in the DR procedures; it works out cheaper and avoids complications with restoring the primary site from the DR site.
Not every company runs a service that absolutely needs to be up 24/7.
A lot of people designing the DR infrastructure will be limited by the (often poor) choices of technology made by the people that wrote the in-house stuff.
As an example, replicating your MySQL database between two datacentres is more complicated than most people would expect. Do you do normal replication and risk that the slave has lagged behind the master at the point of failure, losing data? Or use synchronous replication like Galera at the cost of a big latency hit to the cluster, slowing it right down?
If it's normal replication, do you risk master-master so that it's easy to fail-back, with the caveat that master-master is generally frowned upon for good reasons?
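To make the trade-off concrete, here's a minimal sketch (Python with mysql-connector-python; the host name and credentials are invented for the example) of the sort of lag check a failover script would want to run before promoting an asynchronous replica:

```python
# Sketch only: decide whether a replica is safe to promote before failing over.
# Host/credentials are hypothetical; assumes mysql-connector-python is installed.
import mysql.connector

MAX_ACCEPTABLE_LAG = 5  # seconds of replication lag we are willing to lose

def replica_safe_to_promote(host, user, password):
    conn = mysql.connector.connect(host=host, user=user, password=password)
    try:
        cur = conn.cursor(dictionary=True)
        cur.execute("SHOW SLAVE STATUS")
        status = cur.fetchone()
        if status is None:
            return False  # not configured as a replica at all
        lag = status["Seconds_Behind_Master"]
        if lag is None:
            return False  # SQL/IO thread stopped: replication is broken
        return lag <= MAX_ACCEPTABLE_LAG
    finally:
        conn.close()

if __name__ == "__main__":
    print(replica_safe_to_promote("dr-db.example.com", "failover", "secret"))
```

If Seconds_Behind_Master is NULL or too large, an automated failover has to choose between losing data and staying down - which is exactly the judgement call that keeps a lot of DR manual.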
I think it's disingenuous to berate people for implementing something that can be very difficult to implement.
Though of course, large companies with lots of money and lots to lose by being down (like BA) have no excuses.
-
-
-
Tuesday 30th May 2017 06:26 GMT Anonymous Coward
suggest something very simple and not some unavoidable impossible to understand failure
The weasel isn't going to take any personal responsibility - even though he is THE chief executive officer. But it is his fault, all of it, in that capacity. The total and absolute failure of everything is clearly a series of multiple failures, and he (and BA) are trying to control the message as though that denies the reality of this catastrophe. He should be fired for his poor communication and poor leadership if nothing else. But that's what you get when you put the boss of a tiddly low cost airline into a big, highly complex operation with a totally different value proposition.
Looking around, press comment reckons that it'll be two weeks before all flight operational impacts are worked out (crews, aircraft in the wrong place at the wrong time, passenger failures made as good as they can), and the total cost will be about £100m loss of profit. I wonder if that will affect his bonus?
-
Tuesday 30th May 2017 06:51 GMT Anonymous Coward
But that's what you get when you put the boss of a tiddly low cost airline into a big, highly complex operation with a totally different value proposition.
Whatever you might think about his performance during this unmitigated balls-up, there's much more relevant experience in his biography than just running a "tiddly low cost airline".
-
-
Tuesday 30th May 2017 17:11 GMT Anonymous Coward
Re: "I wonder if that will affect his bonus?"
He didn't get one last year.
"Alex Cruz, the Spanish CEO of British Airways, will not receive a bonus for 2016 from the IAG airlines group. The company said in a statement to the National Stock Market Commission that he will be the only one of the 12 senior executives not to receive a bonus. "
-
Wednesday 31st May 2017 06:25 GMT John Smith 19
"he will be the only one of the 12 senior executives not to receive a bonus. ""
Which suggests he has been trying extra hard to get one.
And look what his efforts have produced.....
I think he's going to be on the corporate naughty step again.
IT.
It's trickier than it looks in the commercials.
-
-
Tuesday 30th May 2017 07:19 GMT Anonymous Coward
The weasel isn't going to take any personal responsibility - even though he is THE chief executive officer.
IT and the CIO don't fall under him, IT is provided by [parent company] IAG "Global Business Services" as of last year. But of course, Cruz has fully supported all the rounds of cuts that have been made.
It smells like a store-and-forward messaging system from the dawn of the mainframe age
JMS-based ESB.
Ex BA AC
-
Tuesday 30th May 2017 09:45 GMT Anonymous Coward
But you would think that something as critical as an ESB at BA would mean that they had built it with high availability in an active/active configuration, with plenty of spare capacity built in, with nodes in different locations and on different power supplies. And of course ensuring that the underlying data network has similar high availability.
Otherwise you have just built in a single point of failure to your whole enterprise and as Murphy's law tells us - if it can go wrong then it will go wrong and usually at the most inopportune moment.
-
Tuesday 30th May 2017 13:04 GMT Anonymous Coward
IT and the CIO don't fall under him, IT is provided by [parent company] IAG "Global Business Services" as of last year.
As a director of BA, he is in fact responsible in law, even if the group have chosen to provide the service differently. I work for a UK based, foreign owned energy company. Our IT is supported by Anonco Business Services, incorporated in the parent company's jurisdiction, and owned by the ultimate parent. If our IT screws up (which it does with some regularity), our customers have redress against the UK business, and our directors hold the full contractual, legal and regulatory liability, whether the service that screwed up is in-house, outsourced, or delivered via captive service companies.
-
Tuesday 30th May 2017 15:29 GMT Anonymous Coward
Director?
If he is a director of BA, that is. A search of Companies House finds a director of a BA company in the name of
Alejandro Cruz De Llano
I'm guessing this is him?
A member of staff of a company only has legal responsibility if they are a registered director with Companies House. The fact that the company calls them a CEO or director does not mean they are a registered director.
-
-
-
Tuesday 30th May 2017 08:06 GMT John Smith 19
"and the total cost will be about £100m loss of profit. I wonder if that will affect his bonus?"
You can bet that any "profit improving" (IE cost cutting) ideas certainly did.
This should as well.
But probably won't, given this is the "New World Order" of large corporate management that takes ownership of any success and avoids any possibility that their decisions could have anything to do with this.
If you wonder who most modern CEOs' role model for corporate behavior is, it's simple.
Carter Burke in Aliens.
-
Tuesday 30th May 2017 12:42 GMT Anonymous Coward
Cruz previously worked at Vueling which has a terrible record for cancellations, lost bookings and cruddy customer service - so he's clearly brought his experience over.
He was appointed to cut costs at BA which he's done by emulating RyanAir and EasyJet whilst keeping BA prices. He's allowed the airline to go downmarket just as the Turkish, the Gulf and Asian carriers are hitting their stride in offering world-wide routing and don't treat customers like crap. Comparing Emirates to BA in economy is like chalk and cheese.
BA's only hope is if the American carriers continue to be as dreadful as ever.
-
-
Tuesday 30th May 2017 06:29 GMT Voland's right hand
It smells like a store-and-forward messaging system from the dawn of the mainframe age (Shows how much BA has been investing into its IT). It may even be hardware + software. Switching over to backup is non-trivial as this is integrated into transactions, so you need to rewind transactions, etc.
It can go wrong and often does, especially if you have piled up a gazillion new and wonderful things connected to it via extra interfaces. An example of this type of clusterf*** is the NATS catastrophic failure a few years back.
That is NOT the clusterf*ck they experienced though, because their messaging and transaction processing was half-knackered on Friday. My boarding pass emails were delayed 8 hours, the check-in email announcement by 10 hours. So while it most likely was the messaging component, it was not knackered by a surge; it was half-dead 24h before that, and the "surge" was probably someone hired on too little money (or someone hired on too much money giving the idiotic order) trying to reset it via a power-cycle on Saturday.
This is why, when you intend to run a system and build on it for decades, you have to upgrade, and you have to START each upgrade cycle by upgrading the messaging and networking. Not do it as an afterthought and an unwelcome expense (the way BA does anything related to paying, with the exception of paying exec bonuses).
-
Tuesday 30th May 2017 07:03 GMT James Anderson
If it was a properly architected and configured mainframe system it would have just worked.
High availability, failover, geographically distributed databases, etc. etc. were implemented on the mainframe sometime in the late '80s.
Some of the commentards on this site seem to think the last release of a mainframe OS was in 1979, when actually they have been subject to continuous development, incremental improvement and innovation to this day. A modern IBM mainframe is bleeding edge hardware and software presenting a venerable 1960s facade to its venerable programmers. Bit like a modern Bentley with its staid '50s styling on the outside and a monster twin turbo multi valve engine on the inside.
-
-
Tuesday 30th May 2017 10:15 GMT MyffyW
no such verb as "to architect".
I architect - the successor to the Asimov robot flick
You architect - an early form of 21st century abuse
He/She architects - well I have no problem with gender fluidity
We architect - sadly nothing to do with Nintendo
You architect - abuse, but this time collective
They architect - in which case it was neither my fault, nor yours
-
This post has been deleted by its author
-
-
-
Tuesday 30th May 2017 14:24 GMT Anonymous Coward
Die off is fine. So is die back. They're descriptive and worth keeping. Architect as a verb is more or less OK, although why did someone assume 'design' wasn't good enough? It's a correct description of the process, which makes architect as a verb a replacement for a word that didn't need replacing.
-
-
Tuesday 30th May 2017 14:13 GMT Anonymous Coward
<rant> True. But at least that's one I can, however reluctantly, imagine.
For me, by far the worst example of this American obsession with creating non-existent 'verbs' is, obviously, 'to leverage'.
Surely that sounds as crass to even the most dim-witted American as it does to everyone else in the English speaking world, doesn't it? I'm told these words are created to make the speaker sound important when they are clueless.
I can accept that some lone moron invented the word. But why did the number of people using it ever rise above 1? </rant>
-
-
Wednesday 31st May 2017 16:41 GMT dajames
There is no such verb as "to architect".
That's the beauty of the English language -- a word doesn't have to exist to be usable. (Almost) anything goes.
It's not always a good idea to use words that "don't exist" -- especially if you're unhappy about being lexicographered into the ground by your fellow grammar nazis -- but most of the time you'll get the idea across.
[There is no such verb as "to lexicographer", either, but methinks you will have got the point!]
Ponder, though, on this.
-
-
Tuesday 30th May 2017 15:06 GMT CrazyOldCatMan
A modern IBM mainframe is bleeding edge hardware and software presenting a venerable 1960s facade to its venerable programmers.
And always has done. In the early '90s, I was maintaining TPF assembler code that was originally written in the '60s (some was older than me!).
And I doubt very much if those systems are not still at the heart of things - they worked. In the same way as banks still have lots of stuff using Cobol, I suspect airlines still have a lot of IBM mainframes running TPF. With lots of shiny interfaces so that modern stuff can be done with the source data.
-
Tuesday 30th May 2017 08:48 GMT yoganmahew
@Voland
"It smells like a store-and-forward messaging system from the dawn of the mainframe age "
You mean teletype? No, it doesn't sound like that; TTY store and forward is simple in its queuing and end points. If the next hop to the destination is unavailable, it stops. When it is available, it restarts. To me, this sounds like MOM, that wonderful modern replacement for TTY. That heap of junk that hits queue-full and discards messages, that halts with only writers and no readers, or readers and no writers. That cause of more extended system outages than any other component in a complex system.
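For the avoidance of doubt this has nothing to do with BA's actual stack, but the "only writers and no readers" failure mode is easy to show in miniature - a toy Python sketch of a bounded queue whose consumer dies while the producer keeps writing:

```python
# Toy illustration of the MOM failure mode described above: a bounded queue,
# a producer that keeps writing, and a consumer that dies early. Not any real ESB.
import queue
import threading
import time

q = queue.Queue(maxsize=10)   # "queue-full" after 10 undelivered messages
discarded = 0

def producer():
    global discarded
    for i in range(50):
        try:
            # put_nowait mimics a broker configured to discard on queue-full
            q.put_nowait(f"msg-{i}")
        except queue.Full:
            discarded += 1
        time.sleep(0.01)

def consumer():
    # Consumer processes a handful of messages, then "dies" (simply returns).
    for _ in range(5):
        print("processed", q.get())
        time.sleep(0.02)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"left on queue: {q.qsize()}, silently discarded: {discarded}")
```

Whether the broker blocks, discards or pages someone at that point is configuration - and it's exactly the configuration nobody revisits until the backlog is already enormous.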
@James Anderson
Damn straight!
On the issue of local resources:
The use of 'resources' is as usual interesting - it speaks volumes of identikit, replaceable, hired-in, temporary. An airline doesn't refer to its pilots as resources, or its aircraft engineers. Yet apparently it should refer to its IT staff as such... my guess is that of course it was local 'resource' fixing the problem; they'd be the only ones with the access to touch the systems, unless BA has gone even more loony than some on here suspect. The local resources would be on a bridge with a cast of hundreds from the supplier, all shouting at each other, all pointing in different directions. First the load balancer would be failed over and resynched. Then the firewall. Then the DNS. Then someone would point out that some component that's never failed before has everything going through it, and that it has a single error report in a log somewhere, not highlighted in automation because it's never been seen before.
Then, as said above, when some numpty decides to restart the box, a part of it fails catastrophically - something to do with electricity... it may be sunspots or a power surge, yes, we'll call it a power surge.
-
Wednesday 31st May 2017 08:38 GMT Mellipop
We're doing the RCA
This is why I like reading the comments on El Reg; we explore the problems and suggest changes.
If only some of those managers in the PMO carefully read these comments instead of 'socialising' requirements or 'managing expectations' then more of these chaotic and complex system evolutions would actually get better. Dare I say become antifragile?
My suggestion is that an AI-based system monitoring tool would have worked wonders.
https://www.moogsoft.com for example.
My two penny worth.
-
-
Tuesday 30th May 2017 10:28 GMT TitterYeNot
"So by messaging he means some sort of enterprise service bus was taken down?"
Sounds something like it. To quote Cruz - “we were unable to restore and use some of those backup systems because they themselves could not trust the messaging that had to take place amongst them.”
So, production system suffers major power failure, production backup power doesn't kick in, and either:
A) Power is restored to production but network infrastructure now knackered either due to hardware failure or someone (non-outsourced someone, obviously, 'coz he said so <coughs>) not saving routing and trust configuration to non-volatile memory in said hardware, so no messages forwarded.
or
B) DR is immediately brought online as the active system, but they then find that whatever trust mechanism is used on their messaging bus (account/ certificate/ network config) isn't set up properly so messages are refused or never get to the intended end-point in the first place, leaving their IT teams (non-outsourced IT teams, obviously, 'coz he said so <coughs>) scrabbling desperately through the documentation of applications they don't understand trying to work out WTF is going wrong.
Same old story, again and again...
- Mr Cruz, did you have backup power for your production data centre?
- Yes definitely, the very best.
- Mr Cruz, did you test your backup power supply?
- Erm, no, that takes effort and costs money...
- Ah, so you didn't have resilient backup power then, did you? Mr Cruz, did you have a DR environment?
- Yes definitely, the very best money can buy, no skimping on costs, honest...
- Mr Cruz, did you test failover to your DR environment?
- Erm, no, that takes effort and costs money...
- Ah, so you didn't have resilient DR capability then did you Mr Cruz?
- Mr Cruz did......etc. etc. ad nauseam...
-
Tuesday 30th May 2017 15:12 GMT TkH11
messaging
I doubt they are modern enough to be using an SOA architecture.
I read a transcript of what he said the other day and he was alluding to network switches going down. So I think he's trying to dumb down his words for a non-technical audience: "messaging", aka packets being switched or routed across the network between servers and apps.
-
Thursday 1st June 2017 15:04 GMT Matt Bryant
Re: TkH11 Re: messaging
".....he was eluding to network switches going down....." Ahem, not wanting to point fingers, of course - perish the thought! - but, knowing some of the "solution providers" involved in the designs of BA systems, has anyone asked CISCO for a quote about the resilience of their core switches in "power surge" situations?
-
-
Tuesday 30th May 2017 06:22 GMT Voland's right hand
Even if it is sourced locally
He personally is grossly incompetent.
IT is not a cost center in a modern airline. It is a key operational component and in fact a profit center. Without IT you cannot operate online bookings, notifications and most importantly you cannot dynamically price your flights. That is wholly dependent on the transactions being done electronically. All of the profit margin of a modern airline comes from dynamic pricing. If it prices statically it will be in the red.
He, however, has systematically treated IT as a cost to be cut, not as a profit center to be leveraged. So even if he hired the staff for these systems locally, they were most likely hired under the market rate and the results are self-explanatory.
-
Tuesday 30th May 2017 06:32 GMT Anonymous Coward
Re: Even if it is sourced locally
IT is not a cost center in a modern airline. It is a key operational component and in fact a profit center.
It's more than a profit centre. A modern airline is an IT business, one that just happens to fly aircraft. There are no manual processes to back up ops management, scheduling, pricing, customer acquisition, customer processing, ticketing and invoicing.
Until BA (and many large businesses) get a grip on this concept and start treating IT (people, infrastructure, systems) as the core of their business, we'll continue to see this sort of screw-up.
-
Tuesday 30th May 2017 07:15 GMT Mark 110
Re: Even if it is sourced locally
I can almost guarantee that whilst the physical infrastructure was locally supported, the application support will have been off-shored. It's the typical model in these organisations. Always seems to end up costing more than the planned savings, in my experience.
The question to ask is: when was the DR plan last updated? When was it tested? Was it successful?
If the answer is 'what plan?', which isn't as uncommon as you might think, then someone's head will probably roll.
-
Tuesday 30th May 2017 08:13 GMT John Smith 19
"A modern airline is an IT business, one that just happens to fly aircraft. "
Which echoes the comment that banks are IT businesses (big ones if they are retail) which just happen to have a banking license
There is at least one major IBM iSeries app that was basically a complete banking system, just add money, banking license and customer accounts.
I wonder how many major lines of business have been so automated that manual reversion is simply impossible. I'm guessing the fruit and veg arms of all big supermarkets.
-
Tuesday 30th May 2017 11:10 GMT A Non e-mouse
@John Smith 19 Re: "A modern airline is an IT business, one that just happens to fly aircraft. "
I wonder how many major lines of business have been so automated that manual reversion is simply impossible. I'm guessing the fruit and veg arms of all big supermarkets.
I read about the history of the LEO computer. Even back then, they realised that although this new fangled computer could improve things enormously, they were buggered if it went down. So they didn't go live with it until a second, backup, unit was in place.
-
-
-
Tuesday 30th May 2017 08:37 GMT Charlie Clark
Re: Even if it is sourced locally
He personally is grossly incompetent.
And presumably already negotiating his exit so that blame doesn't spread up to the IAG board.
So, he'll leave early on a fat settlement and the cuts will continue, presumably after a round of pink slips for those who just happened to be in the building.
-
-
Tuesday 30th May 2017 12:03 GMT Peter2
Re: Even if it is sourced locally
Devil's advocate for a moment: couldn't you say the same thing about electricity?
Yes, you could.
Which is why important servers have UPS's to ensure they don't lose power for more than a few milliseconds and backup generators which can then ramp up and take over from battery backups.
-
-
-
Tuesday 30th May 2017 06:23 GMT Norman Nescio
Ethernet
At a basic level, the progress of Ethernet datagrams around a network is 'messaging', so the problem could be as simple as a switch failing to operate properly, and a backup/fail-over process not working. Rebuilding a switch configuration from scratch in a data-centre might take a while, especially if documentation is missing or inaccurate.
-
Tuesday 30th May 2017 07:21 GMT Warm Braw
Re: Ethernet
Networks are supposed to be redundant - that's the whole point of spanning trees and routing protocols. A switch failing to operate shouldn't be catastrophic. And anyone found responsible for "missing or inaccurate" documentation in a critical operation of this kind should be hung from the cable trays as a warning to others.
-
Tuesday 30th May 2017 08:44 GMT Norman Nescio
Re: Ethernet
@ Warm Braw
"Networks are supposed to be redundant - that's the whole point of spanning trees and routing protocols. A switch failing to operate shouldn't be catastrophic. And anyone found responsible for "missing or inaccurate" documentation in a critical operation of this kind should be hung from the cable trays as a warning to others."
I agree completely, they are supposed to be redundant, and the cause of 'missing or inaccurate' documentation does need to be determined and rectified.
I'm not saying that a 'simple' switch failure is the cause of the BA issue. Just pointing out that 'messaging' in CEO-speak is not inconsistent with standard LAN protocols: you don't necessarily need to invoke message-passing applications, although that would be the standard interpretation of what the CEO said.
It might surprise you to find out that spanning tree may well not be enabled. In the core of a big data-centre you are likely to have a pair (or more) of switches that are meant to share throughput in normal running, but be able to fail over to each other in case of need. This isn't done by spanning tree, and could be implemented on virtual switches, just to make things a little more complicated.
The thing about change control on switches and routers is that it is hard. There are expensive solutions out there, but any device that has a running configuration and a stored configuration, where the running configuration can be changed without a reboot, is susceptible to someone making a change and not making the corresponding change in the stored configuration file. There are expensive solutions that force all* access to managed devices through a server where sessions are logged, keystroke-by-keystroke on an account-by-account basis, so you can see who did what to which device and when. Reviewing those logs is tedious, but sometimes necessary. In addition, most changes are automated/scripted to prevent typos. There are periodic automatic audits of the configuration on the device against the configuration stored in the management system. Despite all this, discrepancies occur.
What could easily happen is a power glitch that kills one of two switches (high-end switches can have multiple power supplies and multiple control processors, but it is still possible for a dodgy power feed to bring one down terminally, possibly letting the magic blue smoke out). The other switch picks up the load, but - maybe the fail-over logic doesn't work properly so it reboots, or a power-cycle is needed - at which point you find the stored config being loaded is not the same as the old running config, and all hell breaks loose: systems that should be able to talk to each other can't, and systems that should be on isolated VLANs suddenly can chat to each other. You find you are using physical cabling that hasn't been used for a while, some of which has a fault, so you need to start tracing cables between devices using data-centre cabling documentation last updated by an external contractor whose mother tongue wasn't English; and the technician in the data-centre has to come out of the machine room to talk to you because the mobile signal inside is poor and the noise from the fans so high you can't reliably hear him (or her), and there are no fixed-line phones. It can take days to sort out. Unfortunately I have experienced all of the above at one time or another. What should happen and what does happen can be remarkably different.
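A minimal sketch of that audit step, for illustration only (it assumes some other job has already exported both the stored and running configs to text files; the filenames here are invented):

```python
# Minimal running-vs-stored config audit. Assumes another job has already
# exported both configs to text files (paths below are invented for the example).
import difflib
import pathlib
import sys

def audit(stored_path, running_path):
    stored = pathlib.Path(stored_path).read_text().splitlines()
    running = pathlib.Path(running_path).read_text().splitlines()
    diff = list(difflib.unified_diff(stored, running,
                                     fromfile="stored-config",
                                     tofile="running-config",
                                     lineterm=""))
    if diff:
        print("\n".join(diff))
    return len(diff) == 0

if __name__ == "__main__":
    ok = audit("core-sw-01.startup.cfg", "core-sw-01.running.cfg")
    sys.exit(0 if ok else 1)   # non-zero exit so a scheduled job can raise an alert
```

The hard part, as ever, isn't the diff - it's making sure the export runs against every device and that somebody actually acts on a non-zero exit.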
I hope we find out what actually happened to BA, but I suspect a veil of 'commercial confidentiality' will be drawn over it. An anonymised version may turn up in comp.risks or on https://catless.ncl.ac.uk/Risks/ someday.
*Of course, in 'an emergency' there are means of gaining access without going through the management system. Reconciling any changes made then with what the management system thinks the configuration should be is always fun.
-
Tuesday 30th May 2017 15:36 GMT Anonymous Coward
Re: Ethernet
My interpretation of his dumbed-down explanation is that they lost power to the network switches. You have to remember, they've consistently referred to a power failure; if you've lost power to all their switches, then spanning tree won't help you.
I'm suspecting a poorly maintained UPS, with knackered batteries, and they lost power to the entire equipment room.
-
Wednesday 31st May 2017 06:35 GMT John Smith 19
"I'm suspecting a poorly maintained UPS, with knackered batteries,"
Yes, that should do it.
Dropping a spanner across a couple of power bus bars in the main electricity distribution room of a building is also quite effective. Spectacular to witness, apparently, but I only saw the after-effects at one company.
In fact the premium-rate power repair service took hours to turn up, the system-wide UPS batteries had not been charged, and the backup generator was due to be fuelled the next week. IOW, a perfect storm.
The Director level pyrotechnics were quite spectacular.
-
-
Tuesday 30th May 2017 17:21 GMT Alan Brown
Re: Ethernet
"Networks are supposed to be redundant - that's the whole point of spanning trees and routing protocols."
Spanning tree operated over a domain more than 5 switches wide is a disaster waiting to happen (that's about the limit it was designed for). It can be a disaster on domains even smaller than that if there's LACP involved (any LACP disturbance results in a complete spanning tree rebuild, so you don't want LACP to your servers unless the switches they're connected to don't use spanning tree to the rest of the network).
Thankfully there are better alternatives.
I deployed TRILL on our campus a couple of years ago. Whilst the switchmakers primarily push it as a datacentre protocol, it was _DESIGNED_ for campus-wide/MAN applications and will work across WANs too. (Bypass Cisco and look around; there are a number of OEMs all selling switches built on Broadcom's excellent Trident2+ descendants, with far better levels of support than Cisco sell.)
Naked TRILL does leave a (small) SPOF - routers for inter-subnet work, but that was plugged a while back: https://tools.ietf.org/html/rfc7956 - The distributed L3 extension to TRILL takes care of that nicely and means one less complicating factor can be taken out of the loop (no need for VRRP or OSPF or other failover protocols within the network, just at the edges)
-
Wednesday 31st May 2017 00:06 GMT Anonymous Coward
Re: Ethernet
i hope "a couple" is > 2. TRILL was all very cool for the datacentre 5 or 6 years ago, but it's a dead duck nowadays (VXLAN is where it's at if you still need layer 2 - ideally with EVPN as the control plane). In the campus I can't see any reason to have layer 2 domains that span multiple switches...
-
-
-
Tuesday 30th May 2017 10:28 GMT Bill M
Re: Ethernet
I set up a resilient IATA Type B interface for airline industry messaging at the tail end of last century with gateways at multiple geographic locations so it would all carry on working even if an entire data centre went TITSUP.
A couple of months after this I was on a conference call with the Messaging Provider about another project, and the concept of such resilient messaging was mooted. The Messaging Provider stated that it was impossible to set up geographically dispersed resilient messaging. When I said I'd done just this for a previous project there was a pregnant pause, followed by indignation from the Messaging Provider saying it was impossible, and even if it was possible then we were not allowed to do it.
The Global IT Director had been on the conference call and visited me a couple of days later to see how I had set up the resilience. Soon after the Global IT Director announced we were changing our Messaging Provider and bunged me a pay rise and a cash bonus.
Makes me wonder what Messaging Provider BA uses.
-
-
Tuesday 30th May 2017 11:51 GMT Bill M
Re: Ethernet
He was a good Global IT Director.
I can remember a panic conference call a few years later when a major system was effectively down due to performance issues. I was sorting the issue with a techie from the supplier, but the Global IT Director kept wittering on about other things and I snapped at him and told him to either shut up or fuck off.
He did shut up and I got things sorted with the techie and back up soon after. I was a bit worried in case I had overstepped the mark when I got no phone calls or emails from him for a few days. But then I got an email from him bunging me another, albeit modest, pay rise.
-
-
-
Wednesday 31st May 2017 16:54 GMT dajames
Re: Ethernet
Rebuilding a switch configuration from scratch in a data-centre might take a while, especially if documentation is missing or inaccurate.
Oh, I know this one ... or the documentation is complete, accurate, and thorough ... but everyone assumes that if there is any documentation at all it will be patchy, inaccurate, and misfiled, so they don't even look for it, and just make stuff up as they go along.
(Cynic, moi?)
-
-
-
-
Tuesday 30th May 2017 10:11 GMT Nick Kew
Re: Where was the "power surge"
I've seen suggestions of a lightning strike,
Might've been me, in the last El Reg commentfest on the subject.
but that could be someone confusing Saturday's cock up with Sunday's thunderstorm.
You mean, my comment posted here on Saturday referenced Sunday's alleged storm?
More likely having observed the very big and long-lasting thunderstorms we had around the wee hours of Friday night / Saturday morning. Caused me to power down more electricals than I've done any time in my four-and-a-bit years since moving here. All my computing/networking gear, including UPS protection. Even the dishwasher, which had been due to run overnight.
-
Tuesday 30th May 2017 14:30 GMT Arthur the cat
Re: Where was the "power surge"
You mean, my comment posted here on Saturday referenced Sunday's alleged storm?
I was thinking old fashioned print media (except read by the magic that is the Internet :-). Trouble is I have several in the news feeds and can't remember which it was.
More likely having observed the very big and long-lasting thunderstorms we had around the wee hours of Friday night / Saturday morning. Caused me to power down more electricals than I've done any time in my four-and-a-bit years since moving here. All my computing/networking gear, including UPS protection. Even the dishwasher, which had been due to run overnight.
$DEITY, that bad? Worst I've ever had in 30+ years round here (Cambridge) was a nearby lightning strike that took out an old fax modem and left everything else standing. Probably because the place is rather flat and the church steeples (and the UL) stick up above most other buildings.
-
-
Tuesday 30th May 2017 11:45 GMT Tom Paine
Re: Where was the "power surge"
Indeed. There are rather a lot of commentards here and on the previous story of the 27th offering detailed, blow-by-blow accounts of what must have happened, and they're all different -- except that everyone's dead certain Cruz is an incompetent idiot and that it's all due to the famous outsourcing deal.
I'm not saying the outsourcing had nothing to do with it -- I'm not in the position to know -- and I don't take his assurances that it wasn't a factor completely at face value. I just wonder how so many people know so much more about it than I do, when we all read the same article...
-
-
Tuesday 30th May 2017 15:43 GMT TkH11
Re: Where was the "power surge"
At the start of the incident, all references were to a power failure. It's only Cruz that introduced 'power surge'. Power surges are generally caused by the grid power supplied into the site by the electricity company; power failures can be anything, either external to the site or within the internal power distribution network in the data centre: UPS, generators, circuit breakers tripping...
Either an attempt by Cruz to deflect blame on to the power company, or just a poor choice of words.
-
-
Tuesday 30th May 2017 06:43 GMT Brett Weaver
I'm not a BA Customer or Shareholder
I just have 40 years' experience in IT. The CEO should be fired (after he fires the CIO).
Actually I am sick and tired of organisations insisting that issues that have been addressed successfully for generations are somehow new and different and could not have been anticipated...
-
Tuesday 30th May 2017 06:54 GMT Blotto
Sounds like they had some type of encryption system running, perhaps encrypting their WAN traffic, that relies on a central key server that went AWOL.
Something like a Group Encrypted Transport VPN
http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/sec_conn_getvpn/configuration/xe-3s/sec-get-vpn-xe-3s-book/sec-get-vpn.html
Very secure, but:
Lose your key server / config or timings and things go bad quickly.
Loss or delay of replication leads to out-of-sync data, leading to corruption, leading to a mess.
-
Tuesday 30th May 2017 15:46 GMT TkH11
Encrypted traffic
F. Me! I think I've just seen a UFO. Anybody else got any entirely random and made up theories?
Let me offer this one: a cat found a small hole in the side of the building, jumped up on to the circuit breakers and urinated. Bang! circuit breakers blew, crashing the databases whilst they were in the process of writing records.
-
-
Tuesday 30th May 2017 07:56 GMT Anonymous Coward
There is Testing it and there is Testing it
Doing a once-only failover at 02:00 on a Monday morning is not going to cut it.
Doing it regularly at 02:00 on a Monday morning is a step in the right direction.
Doing it at least once on a very busy holiday weekend is testing it but only once you are really confident in the previous step.
Doing a DR failover with little or no load is one thing. Doing it when the system is 80% loaded is IMHO a proper real world test.
In my last job (also in the airline industry, which was outsourced to India) we would send someone to the main DC and tell them to sit there all day. At some point in the day they would hit the big RED button that would power down that DC. They would do it without warning to anyone else. That way we would know if the systems could fail over properly and come up again when power was restored.
All that anyone outside of the person tasked with doing the power down knew was that sometime that day, a failover would happen.
Almost the first thing that the outsourced team did was to put a stop to those regular DR tests. The reason they gave was that they'd have to send almost the whole team to the site to be there when it happened. That would be too expensive and time consuming.
After a 'Doh' moment, I took my redundancy and left them to it.
As I said, there is Testing and there is TESTING!
-
Tuesday 30th May 2017 08:37 GMT Anonymous Coward
Re: There is Testing it and there is Testing it
It's noticeable how prominent those big Red Emergency Power Breaker buttons look to a disgruntled employee whose job is being outsourced pretty soon and knows it won't end well for the Company involved. Especially when it controls the power to a (hypothetical, of course) data centre the size of 5 football pitches.
I know of one example - a US company that was very jittery about their size and prominence in the newly built data centre when they set up a European base. It's odd, the need to explicitly explain that you do need to be able to quickly find these big red buttons to kill the circuit breakers in an emergency - and, emphasising the point, that you don't let anyone you have any doubts about into those areas.
-
Tuesday 30th May 2017 09:55 GMT Anonymous Coward
Re: There is Testing it and there is Testing it
And sometimes for special customers who are concerned about Data Center resilience, they are invited to come along and turn the power off themselves with the BRB (Big Red Button). It seems that only half of them actually hit the BRB, the other half chicken out and the Data Center manager has to do it himself.
-
-
Tuesday 30th May 2017 09:53 GMT Anonymous Coward
Re: There is Testing it and there is Testing it
"Doing it at least once on a very busy holiday weekend is testing it but only once you are really confident in the previous step."
... however, can you imagine the headlines etc. if we get the report that "BA reveal the bank holiday IT failure was due to their IT department deciding to test their failure recovery system on one of the busiest days of the year. The spokesperson added that 'the IT department were confident from previous tests that this would work, but unforeseen circumstances brought the system to a standstill'. BA would like to thank all the passengers that have been delayed for their patience and for playing their part in this learning experience."
-
Tuesday 30th May 2017 11:26 GMT A Non e-mouse
Re: There is Testing it and there is Testing it
I think this is happening because IT is often siloed into "systems" (SAN, Networking, Servers, etc) rather than services. Which means that few (if any) people really know how everything glues together.
(Anyone remember this IBM TV advert from a few years back...?)
-
Tuesday 30th May 2017 13:06 GMT Roland6
Re: There is Testing it and there is Testing it
I think this is happening because IT is often siloed into "systems" ... rather than services. Which means that few (if any) people really know how everything glues together.
Another cause is that few enterprise applications (off-the-shelf or bespoke) have been truly written to incorporate instrumentation and measurement hooks for service monitoring and mapping, and even fewer organisations have systems management tools that do service-level monitoring of applications. So few (if any) people really know which interfaces/systems are the bottlenecks and critical points in the business functions/application cloud.
-
-
Tuesday 30th May 2017 12:59 GMT Roland6
Re: There is Testing it and there is Testing it
Re: As I said, there is Testing and there is TESTING!
Agree, hitting "the big RED button that would power down that DC" is a very dramatic, but quite simple failure, basically I would list it in the 'clean' fail list. What I suspect happened at BA was a dirty fail combined with a rather large systems estate - 200 hundred (application) systems is rather a lot of systems to keep in sync.
Suspect in the final analysis, the fault, that actually caused the recovery failure, will be found in some poorly written Web2.0 application (ie. written without regard for the constraints of distributed real-time transactional processing).
-
-
Tuesday 30th May 2017 07:10 GMT Anonymous Coward
'backup power systems then failed'
- Well, if you're a terrorist, now you know which areas BA is vulnerable on...
- Like the recent United meltdown it doesn't show much actual leadership, just finger pointing and excuse making while this CEO waits for his million-dollar-bonus and golden parachute, until the next elite arrives.
- If anything too, it shows growing distancing between the elite and the workers, in everything except PR statements: 'sure aren't we all on the same shop floor lads'???
- Have to say it was nice watching this particular CEO squirm. BA kept $2K after they refused to let us embark recently and wouldn't return the money, so forgive a moment of smugness.
-
Tuesday 30th May 2017 07:36 GMT MonkeyCee
Latest config not saved
Alright, my bet is that the power failure was caused by putting too much load on the circuits at one time, either from a mass reboot or the cooling systems not behaving nicely, or being mismanaged. Some piece of vital network kit wasn't on the UPS, and lost its current config on reboot. Or mangled it in a fun way.
-
-
Tuesday 30th May 2017 10:17 GMT Anonymous Coward
Re: Latest config not saved
Mmmh, proper design of a datacenter won't allow too many machines to turn on at the same time. There are managed PDUs which monitor the load and can control the power on/off sequence.
Also a hot reboot doesn't draw as much power as a cold one - but even patching of large installations is done in a rolling fashion.
-
-
Wednesday 31st May 2017 19:00 GMT Alan Brown
Re: Latest config not saved
"Allright, my bet is that the power failure is caused by putting too much load on the circuits at one time. Either from a mass reboot or the cooling systems not behaving nicely, or being mismanaged."
If any of the above happen, then the power system is mismanaged.
If it can't survive a mass reboot then you beef it up or don't let such a thing ever happen
I can't fathom DCs which appear to only have one (or two) cooling systems. This falls under "not a good idea" for a number of reasons - and after some nasty experiences with dickheaded wiring on "centrally managed" cooling(*), I'm minded to insist on completely independent command and control systems for each one even if that puts the cost up a little.
(*) Hint: Guess what happens to your 5 "independent" cooling systems if the central panel goes "phut" ?
-
-
Tuesday 30th May 2017 07:41 GMT pele
One possible explanation, since he mentions that backup systems would not trust other systems, is that the backups were running expired certificates (for DB connections etc.), or certificates for the backup sub-domain, so backup machines would talk to each other but would not talk to primaries (because of the aforementioned certificate issues), since no one ever bothered to check if they even could (cross-talk between the mix of primary and backup boxen).
In a previous 70k-plus-employees organisation we had a team of 5 who would only worry about certificates and keep them fresh on all machines, all day, every day. We even had a random person in charge of changing batteries in servers on any given day, for chrissake! And there were no "backup systems" per se; everything was live at all times, in parallel, but spread across 3 physical locations. And we would be knocking machines out daily in the process of rolling out new versions and patches, so we were constantly testing the "backup systems". Every Friday the PSUs for the building were kicking in just for fun. As did the fire and smoke alarms. Something like this BA "incident" was not possible.
So to my mind the CIO/CTO bear the full responsibility for this, together with a lot of people downstream from there. And the CEO, just for choosing an incompetent CTO/CIO for the job. And the whole board for bringing on board such an incompetent CEO. YES, outsourcing crucial brain power to the Indian subcontinent DOES have EVERYTHING to do with this episode. Just like it did for RBS 5 years ago: an assowl comes along who thinks he can just "upgrade" the MQ and "let's just clear this stupid queue there, who needs that anyway". I say someone needs to fine BA heavily and use the money to re-hire all the laid-off people. And then re-nationalise the whole lot.
No, I never worked for BA, nor am I a member of any workers' union. I was just a well-pleased BA frequent flyer who is sad to see such a great airline being degraded by small greedy men who have no basic understanding of the quality they (used to) represent.
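For what it's worth, the certificate-expiry part of that is cheap to check. A minimal Python sketch (the host names are invented; a real estate would pull the list from inventory and run it on a schedule):

```python
# Quick-and-dirty certificate expiry check across a list of hosts.
# Host names below are invented for the example.
import socket
import ssl
from datetime import datetime, timezone

HOSTS = ["primary-db.example.internal", "backup-db.example.internal"]

def days_until_expiry(host, port=443):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # notAfter looks like "May 30 00:00:00 2018 GMT"
    not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return (not_after.replace(tzinfo=timezone.utc) - datetime.now(timezone.utc)).days

for host in HOSTS:
    try:
        print(f"{host}: {days_until_expiry(host)} days left")
    except (OSError, ssl.SSLError) as exc:
        print(f"{host}: check failed ({exc})")
```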
-
Tuesday 30th May 2017 08:05 GMT kmac499
It's looking increasingly likely that, whatever the initial cause of failure, the recovery was hampered by poor or zero maintenance of backup hardware, software and configurations.
Maybe the CIO should take a leaf out of the aircraft maintenance process: scheduled checks and refurbishments, signed checklists stamped by authorised/certified engineers, as dictated by external regulators.
I wonder what BA's public liability insurance quote is going to look like next year?
-
Tuesday 30th May 2017 10:51 GMT pele
You don't need to take anything out of aircraft trade - you need, however, to have listened in CS208 Software Engineering class when the lecturer was talking about software lifecycle management etc etc. Which would have been a requirement to finish off your Comp Sci degree. Which I am certain would have been a requirement to have been hired as a CTO in the first place.
This mentality of blaming EVERYTHING nowadays on "IT issues" is getting on my nerves big time. All cock-ups that incompetent idiots cause can simply be referred to your friendly local IT guy, who'd be more than happy to take all the blame. Or, failing that, a "terrorist threat".
-
-
-
Tuesday 30th May 2017 07:42 GMT Ochib
“All the parties involved around this particular event have not been involved with any type of outsourcing in any foreign country ,” he said [Reg emphasis]. “They have all been local issues around a local data centre who has been managed and fixed by local resources.”
So where is the data centre? Because you could read this as:
"All the parties involved around this particular event have not been involved with any type of outsourcing in any foreign country [Data Centre and Staff are in the same town]. They have all been local issues around a local data centre [Based in Mexico] who has been managed and fixed by local resources [Also based in Mexico]."
-
Tuesday 30th May 2017 08:00 GMT lglethal
um what?
"There are no redundancies or outsourcing taking place around this particular hardware live operational systems resilience set of infrastructure in this particular case."
Is this a misquote, or did he really say something which can best be described as gibberish, or perhaps a poor attempt at bullsh%t bingo?
-
Tuesday 30th May 2017 08:06 GMT Anonymous Coward
My sympathies are with the passengers caught up in this...
Like many (all?) commentators here, I get the feeling there's a lot more to this than a simple random act of God power surge type event. And I also get the feeling we'll be seeing more and more of this as companies that don't consider themselves as IT-reliant refuse to invest in keeping their systems up-to-date with the demands placed on them, outsource vital support to far away and cheap lands, and generally believe that fizzy TV ads and pastel coloured websites are more than enough to keep the bad things at bay.
People who work at these companies are getting fed up with being treated like dirt, the people in the far away lands (fill in the applicable country yourself) are starting to appreciate what being called a 'low cost environment' really means, are starting to appreciate their importance to the company, and are starting to quietly down tools at critical moments. This will only get worse as every company engages in a desperate race to the bottom.
But my sympathies go out to all the passengers caught up in this nonsense - air travel is horribly soul destroying and stressful at the best of times, so to add to the misery almost feels criminal. But alas it's going to take a major air catastrophe before we see any sort of real improvements.
-
Tuesday 30th May 2017 08:12 GMT JimmyPage
EFFECTED ?????
I think he meant "affected".
What the fuck is going on with the UK that the CEO of an allegedly serious global company is allowed to make such a schoolboy error????
Also, really, El Reg should have noted it as "(sic)".
It really is a bad day for the image of UK education when French and Indian colleagues can point out a Brit's bad grammar .....
-
Tuesday 30th May 2017 08:41 GMT Fenton
Hardware vs software
Even if the hardware is locally maintained, it's the software that goes tits up when the hardware fails. And it's the software management that has been oursourced.
The software is old and does not have automated resyncing (taking referential integrity into account),
there are likely to be hundreds of little scripts that have been created to get around functionality gaps, many of which will not have been documented or architected properly (likely to have been created to fix a P1 issue).
I've been part of many an outsourcing deal, with perspective from both sides.
The recipient doesn't know what they don't know; the people giving up the system don't know what they have forgotten and are generally pissed off that they may be redundant soon.
The best moves have been the ones where a ground up review/re-write/rearchitect/retest have been part of the move adding new functionality, plugging gaps properly and properly re-testing.
-
-
Tuesday 30th May 2017 11:44 GMT Doctor Syntax
Re: Hardware vs software
There is still no verb "to architect".
Could you please suggest a pre-existing alternative.
The implication of the word is something at a higher level than "design" would cover, dealing with the overall form and how the components fit together but not quite the same as "specify".
If there isn't such a verb - and off-hand I can't think of one - then it might be necessary to devise one. Importing a word from one part of speech into another is a long established practice in English. All it requires is for enough people to do it so that it becomes accepted. Objections along the lines of "there's no such word" seem to be part of the process.
-
Tuesday 30th May 2017 16:05 GMT Commswonk
Re: Hardware vs software
There is still no verb "to architect".
@Doctor Syntax: Could you please suggest a pre-existing alternative.
Try this:
...there are likely to be hundreds of little scripts that have been created to get around functionality gaps, many of which will not have been documented or <strike>architected</strike> structured properly...
-
-
-
Tuesday 30th May 2017 11:24 GMT Doctor Syntax
Re: Hardware vs software
"The best moves have been the ones where a ground up review/re-write/rearchitect/retest have been part of the move adding new functionality, plugging gaps properly and properly re-testing."
And we all know what the chances of that happening are when the objective is cost-saving.
-
-
Tuesday 30th May 2017 08:41 GMT richard_w
I dealt with BA as a supplier and I also worked there. As a supplier, the IT department was super fast to blame us when there was a hardware problem, but refused point blank to take the steps which would have avoided the problem in the first place. Because it would have cost more, or because they did not need to do what we suggested on their small systems (as opposed to multi terabyte enterprise database systems).
When I worked for them it became apparent that the multiplicity of disparate systems all of which had to communicate to keep the airline running were an accident waiting to happen.
-
Tuesday 30th May 2017 08:59 GMT Nick Ryan
When I worked for them it became apparent that the multiplicity of disparate systems all of which had to communicate to keep the airline running were an accident waiting to happen.
<= THIS
Organisations need to periodically refactor and simplify their systems, particularly after what can sometimes be decades of accumulation and evolution. Yes, there is a risk in doing this, however there is probably more of a risk in not doing this because of unknown, undocumented systems being somehow critical to operations. Like insurance, it's a gamble and in this case BA lost - now it's just a case of them working out how much money they lost as a result.
Some organisations refactor and simplify on a continual basis which, if done well, should remove the need for more drastic measures later. However this requires more ongoing investment and planning, but the upside is better systems overall and, most likely, a lower cost in the long run.
-
Tuesday 30th May 2017 10:07 GMT richard_w
In BA's case there were two things which militated against simplification. One was a lack of understanding of the individual architectural components (for instance, there was a mission-critical PC system in many airports around the world which exactly one person in the airline understood). If it went wrong, they put him on a plane (first, of course) to fix it.
The other was the fact that many of the systems dated from the 1980s or possibly earlier, and had been changed repeatedly. The only way to fix this was to re-architect them. Not quick, expensive, and gave no bang for your buck unless you improved the functionality from the business perspective.
-
Tuesday 30th May 2017 10:33 GMT Anonymous Coward
Everything was a profit centre at BA when I was there (too long ago and too unimportant in the scheme of things to have any insight into this fuck up). Not as in profitable, but in that it had to make a profit otherwise bad things would happen.
And with a set-up like that, IT is considered a dead weight because it's got no external clients, instead of being the backbone of the company.
-
-
Tuesday 30th May 2017 16:07 GMT Anonymous Coward
I think BA did refactor their systems, to modernise them. But poor design, low-calibre software engineers, possibly cheap off-shore outsourcing of the development, and inadequate testing are what is now leading to most of the problems BA is experiencing. I think the most recent incident is probably a combination of an initial hardware problem and then a massive issue of trying to recover crashed applications, databases and messaging systems which have left apps in inconsistent states and thus very difficult to recover.
-
-
Tuesday 30th May 2017 16:03 GMT Anonymous Coward
Risk Appetite
I work in the transport business supporting some large systems. The level of risk appetite is scarily low. I could tell you a few horror stories, but I'd be recognised. Hardware, software is not updated, operating systems not patched, because the customers are too frightened in case it causes future operational issues. The customer is frightened to reboot servers in case they don't come back up.
-
Tuesday 30th May 2017 21:00 GMT Anonymous Coward
Re: Risk Appetite
That would probably be the same transport company that has no meaningful DR plan, and wouldn't know which systems to restore in what order in the event of an outage. Senior exec capability is scarily low in both BAU and projects. Actually I think most transport cos are like this from what I've seen...
-
-
-
Tuesday 30th May 2017 08:54 GMT John Smith 19
A note on message passing.
In IBM MF land, message queues (MSGQ in the AS/400 command language) are effectively named pipes which can link processes. They can expand if the "writer" is producing a lot more data than the "reader" can accommodate at any one time. IIRC they can also do character set translation (e.g. EBCDIC to ASCII), which is handy given a lot of stuff is not EBCDIC as standard.
BTW there is also an MS version of MQ series.
I can't recall whether, if the reader dies, that pauses the writer process or whether the queue just keeps getting bigger (the simple programming option is that MQ just deals with it; no special-case handling required).
I can see the joker in the pack being different processes dying at different times, leaving different queues holding mixed amounts of good and bad data that are not synchronised, making it very difficult to decide which entries (BTW they are called "messages", but the definition of "message" is very flexible) to discard.
However these issues are completely predictable and MF devs and ops have been dealing with them for decades. BA should definitely have some tools to manage this and some procedures in the Ops manual to use them.
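Nothing BA-specific, but a toy Python sketch of the two behaviours described above - a bounded queue that pauses the writer when the reader falls behind, versus an unbounded one that just keeps growing:

import queue
import threading
import time

def writer(q, n):
    for i in range(n):
        q.put(f"message-{i}")        # blocks here when a bounded queue is full (backpressure)
        print(f"queued message-{i}, depth now {q.qsize()}")

def reader(q, n):
    for _ in range(n):
        q.get()                      # deliberately slow consumer
        time.sleep(0.05)
        q.task_done()

if __name__ == "__main__":
    q = queue.Queue(maxsize=10)      # maxsize=0 would make it unbounded: the writer never pauses
    threading.Thread(target=reader, args=(q, 100), daemon=True).start()
    writer(q, 100)
    q.join()                         # wait for the reader to drain everything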
As for configuration I find it very hard to believe that in 2017 a business this big does not have a set of daemons checking all its network hardware and recording their actual (working) configurations.
This is also one of those moments when labeling all those cables and power plugs with what they power and what they should be plugged into turns out to be quite a good idea.
So much HA and DR is not in the moment. It's in the months of prep before the moment.
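On the record-the-configs point, a minimal sketch of what such a daemon might look like, assuming placeholder details (device list, credentials, the netmiko library, a cron or systemd timer to drive it) rather than anything BA actually runs:

import datetime
import pathlib

from netmiko import ConnectHandler   # one of several libraries that can do this

# Placeholder inventory - a real deployment would pull this from a CMDB or inventory file.
DEVICES = [
    {"device_type": "cisco_ios", "host": "10.0.0.1",
     "username": "backup", "password": "REDACTED"},
]

def snapshot_configs(out_dir="config-snapshots"):
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = pathlib.Path(out_dir) / stamp
    target.mkdir(parents=True, exist_ok=True)
    for dev in DEVICES:
        conn = ConnectHandler(**dev)
        running = conn.send_command("show running-config")
        (target / f"{dev['host']}.cfg").write_text(running)
        conn.disconnect()

if __name__ == "__main__":
    snapshot_configs()   # schedule it, then diff successive snapshots to spot drift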
-
Tuesday 30th May 2017 10:09 GMT Anonymous Coward
Re: BTW there is also an MS version of MQ series.
I well know - I integrated it into the Avery "Weighman" suite to connect to British Steel (as was).
It was designed to be a reliable protocol, on top of the unreliable TCP/IP stack - so no data loss. Unread messages are queued (MQ - "Message Queuing") until they can be processed. Obviously there is a theoretical upper limit imposed by memory and disk space, but if you hit that, you have far bigger problems than a missed message.
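A hedged sketch of that put/get behaviour using the pymqi bindings for IBM MQ - the queue manager, channel and queue names below are the placeholders from pymqi's own examples, not anything from BA's (or British Steel's) setup:

import pymqi

QMGR_NAME = "QM1"                   # placeholder queue manager
CHANNEL = "DEV.APP.SVRCONN"         # placeholder client channel
CONN_INFO = "localhost(1414)"
QUEUE_NAME = "DEV.QUEUE.1"

qmgr = pymqi.connect(QMGR_NAME, CHANNEL, CONN_INFO)
q = pymqi.Queue(qmgr, QUEUE_NAME)

q.put(b"placeholder payload")       # stays on the queue until something reads it
print(q.get())                      # raises MQMIError if the queue is empty (unless told to wait)

q.close()
qmgr.disconnect()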
While we're talking reliable protocols ... am I alone in finding that more (younger) people really struggle with the idea ... ("We'll send it by email.", "er, what if it doesn't arrive ?", "but that never happens ...").
-
Wednesday 31st May 2017 19:13 GMT Alan Brown
Re: BTW there is also an MS version of MQ series.
> am I alone in finding that more (younger) people really struggle with the idea ... ("We'll send it by email.", "er, what if it doesn't arrive ?", "but that never happens ...").
No you're not, and many of the worst offenders are people old enough and experienced enough to know far better.
They're also the same people who use email as a multi-terabyte document repository, then complain because the servers (which haven't been upgraded to cope) are struggling or crashing, or won't let you change the storage methods to something which _WILL_ cope because of the "risk".
-
-
Tuesday 30th May 2017 13:08 GMT Alistair
Re: A note on message passing.
MQs runs on linux, AIX, HPUX and Solaris as well as on MF and Windows.
Actually runs rather well on all of that save windows.
does inline encryption between MF and *nix. No experience with MF to Windows encryption.
have a rather critical pipeline from HPUX <-> MF <-> linux <-> vpn tunnel(s) <-> ??
Terrifying days for that are when one of those elements updates MQ. 8 years and we've never had an update take it out. Only SSL cert (changes) take it out.
-
Tuesday 30th May 2017 16:27 GMT John Smith 19
"MQs runs on linux, AIX, HPUX and Solaris as well as on MF and Windows."
Thank you for reminding me. It's been a while. MQ's multi-platform nature is one of its strengths. I called it MF land because of IBM's MF-centric view.
As with anything involving HA systems, I'm sure no update would go to the live environment unless it had been thoroughly tested first.
All of which makes the CEO's story about this being a messaging failure seem stranger and stranger.
-
-
-
Tuesday 30th May 2017 09:43 GMT cynical bar steward
Dump bad operating systems NOW
If you want reliable clustering and computing, dump the permanently-bugged M$ stuff (broken since Win 3.1 and they have NOT fixed it yet). Sorry to tell you guys, but this emperor is stark naked and always will be. Put proper stuff in. Legacy means IT WORKS. M$ just keep dangling that carrot beyond W10? Is that coffee you can smell?
With VMS, reliability was designed in, not cobbled on around the edges. Unixen may be more secure, but they were never designed from the ground up for reliability.
[Does BA use any of this? Who cares - it's RUBBISH and this proves it.]
Programming for VMS railroads you into doing things properly, or at least considering that.
Oh, another thing: VMS and [insert name of exploit]? It doesn't need patching 15 times a day for security holes. And it's soon to appear on the very same hardware that M$ cannot run securely.
VMS is alive and well, you'd almost think that the old naysayers were the lemmings leaping towards that cliff [of unreliability]... Best place for them, probably.
Yeah yeah you've committed too much into your IT infrastructure to change direction? It's your funeral...
The cost of the IT rarely amounts to much compared with the value of the data or the company, and in these days of bleeding talent to save costs, sight of that has been lost: penny wise and pound foolish.
Fine - it's like the CVIT (Cash and Valuables In Transit) firms switching to bicycles because they are cheaper.
-
Tuesday 30th May 2017 09:43 GMT Anonymous Coward
Only speculating here but my bet would be something like this:
http://www.cisco.com/c/en/us/about/supplier-sustainability/memory.html
Note that this affected other vendors and not just Cisco but... having seen other customers impacted by the issue above, it feels very familiar (short power failure/maintenance caused network devices to die).
Looking forward to reading some root cause analysis...
-
Tuesday 30th May 2017 09:44 GMT AbsolutelyBarking
What a SNAFU. Any decent middleware/MQ platform is (or should be...) designed to be fault tolerant and, in particular, work in a distributed environment. One node falls over and it all carries on working, albeit maybe with more queuing latency. Quite good fun seeing this work properly when you go around yanking network cables out (in the dev environment obviously..) and back in. An infrastructure architecture FAIL IMHO.
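A rough illustration of the "one node falls over and it all carries on" idea - just a client-side sketch that tries a list of placeholder queue-manager endpoints in turn; in practice you would lean on MQ's own multi-instance queue managers and client auto-reconnect rather than hand-rolling it:

import pymqi

ENDPOINTS = ["mq-node-a(1414)", "mq-node-b(1414)"]   # placeholder hosts
QMGR_NAME = "QM1"                                    # placeholder names throughout
CHANNEL = "DEV.APP.SVRCONN"

def connect_to_any():
    last_error = None
    for conn_info in ENDPOINTS:
        try:
            return pymqi.connect(QMGR_NAME, CHANNEL, conn_info)
        except pymqi.MQMIError as exc:   # that node is down or unreachable - try the next one
            last_error = exc
    raise last_error

qmgr = connect_to_any()
print("connected - carry on queuing")
qmgr.disconnect()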
Senior people typically only interested in reducing short term costs = this year's bonus. Well that worked out well didn't it....
-
Tuesday 30th May 2017 10:16 GMT eldakka
My theory:
They are probably using IBM mainframes and IBM P-series servers and IBM Datapower appliances, all with IBM providing the support with a dedicated team of 20 professionals who have worked on the contract for years with tons of corporate knowledge.
Then IBM fired all their contractors, and the team was suddenly down to 5 permanents as the 15 contractors were let go - and those permanents are mostly management types. So there were no skilled staff left to handle what would otherwise have been a routine failover, and they botched it.
-
Tuesday 30th May 2017 11:21 GMT Anonymous Coward
Plus...
the IBM Support desk is in India where the first thing they ask you to do is 'Switch it off and on again'...
They seem blissfully ignorant that 'switching it off and on again' is not something you do to an IBM Mainframe setup. Perhaps that was the power surge?.... {mind boggles}
Anon as I make a living from implementing MQSeries, WAS and IIB on a variety of platforms.
-
-
Tuesday 30th May 2017 10:34 GMT Domeyhead
Quite surprised at some of the immature and uninformed comments on a site that should provide more information than the mainstream media. I have worked with several large organisations and in 20 years experienced 4 or 5 serious (datacentre-scale) failures. In all cases fast failover technology was available, but in all cases it failed to function fully and correctly when needed. In one case the business in question had 3 levels of UPS to protect a datacentre and it was one of these that caused the outage!
Only one company in that list actually bothered to rehearse DR events, and this is crucial - when did BA last rehearse a full DR at the datacentre in question? The most fundamental requirement of any DR is to prove it works, at any time, as much for the people as for the technology. I wrote the DR specification for one particular system (not BA, I hasten to add). Auditors would call the DR at an unspecified time and then it was down to the staff - and although operations rehearsed recovery regularly (around once per year) they never passed the DR audit in the 4 attempts I was involved in. But they did get a whole lot better (and calmer) at recovering from the inevitable spanners that chance throws into the mix.
Let's get something clear here. a CEO may be accountable for the ultimate financial performance of a company - that is his job - but he is clearly NOT responsible for the success or otherwise of a particular DR. Accountability and responsibility are not the same thing. The CIO (or IT director in some places) IS accountable. And the IT Operations Director IS responsible, because he/she can actually make decisions that influence the quality of the solution - especially the requirement to homogenise and to rehearse under an independent auditor.
I suspect the problems here were caused by multiple hardware and software layers having inconsistent commit and rollback points. I've seen this myself - storage using some kind of block-level mirror across sites, while up above the database commits at a row level and the application above that commits at a transaction level. All these bits flow along a bus at some point - but it's not the bus that's the problem. It's a lack of DR planning and rehearsal (Ops Director); it's a lack of communication between delivery and operational teams (technical team managers); and it's thinking that a whole load of expensive software and hardware enables you to do without Business Continuity professionals (CIO). There are plenty of war stories out there - look for what they all have in common - and it isn't the CEO of BA.
-
Tuesday 30th May 2017 11:58 GMT Doctor Syntax
"Let's get something clear here. a CEO may be accountable for the ultimate financial performance of a company - that is his job - but he is clearly NOT responsible for the success or otherwise of a particular DR."
He is - or should be - accountable or responsible, whichever you prefer, for ensuring that the CIO has taken whatever steps are necessary to ensure the proper operation of his side of the business.
If the CEO has lost sight of the fact that his business relies on IT for the moment to moment operation of the business (not just day to day) and not acted accordingly then he should cease to be the CEO.
-
-
Tuesday 30th May 2017 11:41 GMT Anonymous Coward
Well it sounds like a Network Failure (power) caused the ESB to fail. It must be more than that though because I cannot believe that BA would have a single ESB node connected to a single switch for communication to the rest of the enterprise.
If they did then they deserve everything they got and have got plenty of architecture redesigning to do.
-
Tuesday 30th May 2017 11:57 GMT smartypants
Taking a step back for a moment...
Regardless of the nature of the issue and the reasons for it, in general, things do go wrong when humans are involved. The aspiration we have of things never breaking is a good one, but reality is in control here.
What has changed in the last 40 years is the sheer number of IT systems out there, and for each, the average complexity of what the systems do and the number of people they serve. But although methodologies and technologies for ensuring service integrity have proliferated, the static component in the mix is the mere human, and this hasn't really changed much at all in the same period (perhaps fatter?). We're just as able to underestimate the risk factors as we always were, and we are just as subject to the lure of following the money (or the promise of spending less of it).
There will generally be more of this sort of thing in the future. In each case, the broken service will return to what it was before the event in just a few days, which is more than can be said for the system known as the "United Kingdom", which some idiots have taken the back off and are planning to improve it by cutting a load of wires with shears.
-
Tuesday 30th May 2017 12:22 GMT simmondp
Business School 101
Business School 101; never outsource anything which is critical to your business. Outsource that which is not critical and gives you no competitive advantage (for example; payroll & HR systems).
Seems BA has forgotten the essentials.......
Any CEO who signed off on such a deal, irrespective of whether it was the root cause, should be falling on their sword.
Any CEO who does not understand the criticality of the IT systems (and it may be negligible) in their business should not be in the job.
-
Tuesday 30th May 2017 12:57 GMT Anonymous Coward
Rule 1 of Press Releases
Rule 1 of press releases is... delete the evidence when your client screws up big time!
-
Tuesday 30th May 2017 13:49 GMT cdegroot
20/20 hindsight is easy...
But let's look at the other side of the coin. You're the CEO of an airline. Competition is incredibly fierce, and because of that and deregulation, ticket prices have been dropping like a rock over the decades (when my dad took us to Spain in the late '70s, I think the charter tickets were 600 guilders; I guess I can pick up an AMS->ALC roundtrip for 200 euros or thereabouts?). We're all profiting from the collective consumer consciousness driving prices into the basement, let's not forget that.
Efficiency and cost cutting are pretty much your only options, and pretty much they need to be done across the board. As a CEO, you're in the risk taking business - cut just enough cost, you make a profit; cut just too much somewhere, and you have a disaster. Cut cost on aircraft maintenance, the disaster is people dying, cut cost on IT, the disaster is a huge inconvenience, egg-on-face, and a - probably nicely predictable - monetary loss. Yes, this may wipe out this year's profits, but across the board, it's probably the better outcome.
You know what? I'd probably cut IT costs if I'd be running an airline. I think it's just too simple to blame this on stupidity. Yes, you could probably do better than BA's IT (I've only worked three air travel related systems - booking, ATC and airport - and that was enough to put me off for good), but I wouldn't be surprised if these IT systems are a mess of old and new kept together by the Kermit protocol and similar band-aids. For all I know, the C-levels took reasonable decisions with measurable risks, and the coin just fell on the wrong side.
-
Tuesday 30th May 2017 15:37 GMT Doctor Syntax
Re: 20/20 hindsight is easy...
"Efficiency and cost cutting are pretty much your only options"
Your IT systems then become the key to providing that efficiency. Your business becomes, as has been said in another comment, an IT business that happens to fly planes. Or, another way to put it, IT becomes one of your core competences. Becoming incompetent at that is stupidity.
-
-
Tuesday 30th May 2017 14:02 GMT JaitcH
Dependable Utilities Generate a Blasé attitude
I lived for many years in Canada where the water flows free and plentiful and generates most of our power needs, year after year, decade after decade. Then in 1998 came the Great Ice Storm.
Quebec was knocked flat. Much of its power comes from Labrador (part of Newfoundland) and many of the stately pylons carrying the life blood of today's lifestyle collapsed - simply crumpled - under the weight of the ice.
'Low voltage' (local distribution) failed too, with trees and hydro (electricity) wires coming down and utility poles cracking or snapping off under the combination of 7-11 cm (3-4 in) of ice and cold weather. But after a month everyone had their power restored.
Then came the Northeast blackout of 2003, a widespread power outage that hit parts of the Northeastern and Midwestern United States and Ontario, Canada, on Thursday, August 14, 2003, just after 4:10 p.m. EDT.
Two very hot days in a big city like Toronto, with no power, is not fun.
Now, living in VietNam, power failures are a non-event. Most businesses have portable generators; larger entities such as hotels and hospitals have impressive generators, housed in 'soundproof' box-shaped containers.
My wife owns two mid-sized hotels and I own a mid-size 4-floor office building and a couple of homes. All have fire alarms and stand-by power systems. The hotels have LED lighting, throughout, and their initial back-up is through batteries, big batteries.
My office has battery/generator back-up, as do my homes.
Every one of our buildings has an automatic, human intervention-free, fire and power system. They are programmed to randomly power off, or sound the alarm, so that all occupants are very familiar with emergency procedures.
I wonder how often BA actually tested their facilities without giving prior notice? How often does YOUR company do the same? It's the only way to really test emergency equipment.
-
Tuesday 30th May 2017 14:40 GMT Dodgy Geezer
Culled from the Pilot's forum...
"On Saturday morning around 9:30 there was indeed a power surge that had a catastrophic effect over some communications hardware which eventually affected the messaging across our systems..."
...Mr Cruz said the surge was “so strong that it rendered the back-up system ineffective”, causing an “outage of all our systems” at 170 airports in 70 countries. Power companies denied that there had been any supply problems at the company’s main hub at Heathrow or the airline’s headquarters, north of the airport perimeter. SSE and UK Power Networks, which both supply electricity in the area, said that there had been “no power surge”....
-
Tuesday 30th May 2017 15:38 GMT SharkNose
BA used to be a massive user of IBM MQ. I'd be astonished if that isn't still in the picture for intersystem messaging. I've used it extensively for pushing 20 years and found it to be incredibly resilient when configured correctly.
The BA problem sounds more like a lack of robust, tested procedures for reconciling disparate systems following an outage.
-
Tuesday 30th May 2017 17:09 GMT Anonymous Coward
As insane as the BA fiasco is, I think I can top it.
In the 1990s I worked for a cable TV and broadband provider based in the Boston suburbs. We had a single data center which delivered phone, TV and internet service to the entire New England region (and parts of NY state) and it just happened to be in the same building as our corporate HQ, call center and depot. In fact, almost the whole company was based in a single building.
Security was non-existent. Anyone could walk through the front door and enter the data center at will. For example, FedEx guys would routinely access the DC wheeling in their deliveries unescorted and unchallenged.
One day a FedEx guy was wheeling in several boxes of new hardware. Someone - I forget who - politely held one of the 2 main doors open all the way so he could wheel in his dolly/cart/trolley. The only problem was that behind the door, on the wall, was the Big Red Button (BRB). The door opener leant against the BRB and, well, you know what happened next. The DC was shut down and all of New England went offline - TV, phone, cable - the lot. The call center went offline. Corporate systems - including phone - went offline. The only route to the outside world was a single analog phone mounted to a wall in the DC.
RCA was simple - someone hit the BRB. The long-term solution? Encase the BRB in a perspex box. And.....er.....that was it. Amazing.
Then again, this was the company who hired an IT Director who made the decision to (and remember this was the mid 1990s) equip the 200 seat call center with Macs rather than Windows PCs. His reason? He simply preferred Macs. That no commercial 3rd party systems/tech - phone, CTI, databases, etc. - integrated with Macs was irrelevant. He just liked Macs. Madness.
-
Tuesday 30th May 2017 18:50 GMT CJames
Not monitoring - not smart
Never underestimate the importance of monitoring data centre performance. In a millisecond world, why is it acceptable to monitor only every few minutes, or not at all? The world has changed: application performance rules any online activity, so application and infrastructure owners need to continuously monitor performance in real time, not just availability, or agree a performance-based SLA with their supplier. Modern Infrastructure Performance Monitoring (IPM) platforms work at an application-centric level and can proactively prevent slow-downs or outages before they happen by alerting before end users are impacted.
-
-
Tuesday 30th May 2017 21:40 GMT Anonymous Coward
Re: data centre performance..continuously monitor performance in real-time..performance based SLA
"2 posts in less than 2 hours and both posts look remarkable alike."
More than "remarkably alike". Word for word cut and paste (near enough?) in two different topics.
Registered today too.
I wonder if El Reg have sent Chris a copy of the advertising tariff?
-
-
Tuesday 30th May 2017 20:22 GMT cantankerous swineherd
from the grauniad:
The global head of IT for IAG, Bill Francis, has been attempting to find savings of around €90m per year from outsourcing. Francis, with no previous IT experience, was previously in charge of introducing new contracts and working practices to cabin crew at BA during the bitter industrial dispute of 2010-11.
draw your own conclusions.
-
Tuesday 30th May 2017 21:34 GMT John Smith 19
"Francis,..no previous IT experience, was previously in charge of introducing new contracts "
So basically a management goon.
Doesn't sound like someone who takes advice, especially from subordinates, before, during or after an IT situation.
Which is a bit of a problem if a) you know nothing about IT and b) the s**t has hit the fan.
-
-
Wednesday 31st May 2017 03:47 GMT Jtom
While there has been much discussion about Cruz's future vis-à-vis the system crash, I have seen no mention of the other half of this debacle: the complete and utter meltdown of BA's airport services. Rebooking desks with no agents, poor communications, wrong info given out, unretrievable luggage, etc.
I have always told my staff that everyone makes mistakes; it's how you handle them that shows your worth. BA failed miserably, and has continued to provide poor customer service since. They seemingly have no plan for what to do when they aren't flying due to their own cock-ups (e.g., they don't have the luxury of being terse and saying, "Don't blame us, it's the weather. Sorry you can't find a hotel room, but we aren't responsible.")
As soon as it hit the fan and flights started getting cancelled, someone should have been calling in caterers to provide free food and drinks, sending reps into the crowds to give them updates, doing what they could to help special-needs passengers, working with airport and nearby hotels to get as many bookings as possible (and threatening them with lawsuits if they tried to price-gouge), and so on. From what I have read, the major airports (Heathrow, Gatwick) were in shambles and the BA staff were running around in circles.
-
Wednesday 31st May 2017 07:51 GMT anonymous boring coward
These people aren't suited to thinking for themselves or taking any initiative.
I was once at an airport where my luggage would never reach the pick-up area because the belt was full of unclaimed luggage from some earlier flight (and those passengers were nowhere to be seen).
I asked the staff there why they didn't remove the luggage so the new luggage could reach us. They "weren't working for the luggage handling company".
Of course, the orange-vest people on the other side of the rubber curtains couldn't come out into the hall to remove luggage, so they just stood around waiting for the belt to move...
So I and another passenger removed all the luggage, so the belt could move again and our luggage could get to us.
And this was in Western Europe. Not bloody USSR.
-
-
Wednesday 31st May 2017 07:39 GMT anonymous boring coward
Cruz has also promised that passengers will never again have such an experience with BA, in part because the carrier will review the incident and figure out how to avoid a repeat.
Is that because the only way such an experience can happen is if this incident occurs in the exact same way again?
Very intelligent thinking by that fantastic CEO. Obviously this has nothing at all to do with having enough of the right kind of people available to quickly identify issues and resolve them urgently. Nothing at all.
Looks like BA has a f*cking politician at the helm -and we all know how well that usually goes.
-
Wednesday 31st May 2017 09:41 GMT scubaal
currently in LHR
In the BA Lounge at LHR, still trying to get to where I am supposed to be (Spain) from Oz, due to the BA cock-up.
The most helpful BA office was in the USA - try calling that from Oz - but at least they answer.
The Oz support number ON THE BA WEBSITE has a recorded message that says 'this number is no longer in use - please call the number on the web site'.
You couldn't make it up.
-
Wednesday 31st May 2017 12:23 GMT mojacar_man
BA didn't need to outsource offshore to India. TATA, a massive Indian company with its finger in many pies, has had a major IT presence in the UK for many years. (For instance, they took over the IT department of Legal and General Insurance about a dozen years ago. Programmers had the choice of working for TATA or walking. Many walked rather than accept the new terms!)
TATA, of course, are free to share their workload - with their Indian head office!
-
Wednesday 31st May 2017 14:14 GMT Anonymous Coward
The big picture has historical precedents...
Some light has been shed upon the BA systems outage over the Bank Holiday:
"Speaking to Sky News, he [BA CEO Cruz] added a little more detail about what went wrong, as follows:
On Saturday morning around 9:30 there was indeed a power surge that had a catastrophic effect over some communications hardware which eventually affected the messaging across our systems.
Tens of millions of messages every day that are shared across 200 systems across the BA network and it actually affected all of those systems across the network. "
From a policy perspective for politicians as an IT Systems Engineer I would draw the following conclusion, and warning, to us all:
The BA disaster will be followed by others until all telecommunications are seen as a single multi-modal utility channel and as a critical local resource by every region across the planet and brought under common control and direct accountability to the people it serves.
-O-
What BA CEO Cruz describes is simple: there was a failure of their Critical Component Failure Analysis process. This is de rigueur - an everyday functional department of any major IT operation.
The constant problem these exercises live with is defining the boundaries of "your" IT operation, over which you have control, against the risks beyond it, where you have little or no control, and mitigating those risks while hoping that all the component sites making up that external environment are as diligent in their CCFA processes as you are yourself.
The potential resilience and durability of complex systems that rely upon the Interconnection of many independent participants/components is theoretically high... if every component has available to it many alternative ways of making the connections it needs.
But the degree of interconnection of any specific independent site/component in a network of commercial entities is the consequence of cost-benefit driven decisions taken in its strictly local context. It is also often a factor the specific details of which are regarded as being commercially sensitive and are kept confidential within each local operation.
The result is the failure of the 'Internet' to achieve the absolute resilience which was the raison d'etre, and the main design feature, of the 1960s US DoD 'Arpanet' out of which the 'Internet' has evolved.
Historically, just as chaotic complexity eventually characterised the commercially competitive generation of electrical power and the laying of railway track, the entire 'multi-modal' telecommunications enterprise has now reached the same level of criticality to our economic existence and displays the same state of chaos and susceptibility to CCF events.
The BA disaster will be followed by others until all telecommunications are seen as a single multi-modal utility channel and as a critical local resource by every region across the planet and brought under common control and direct accountability to the people it serves.
Clearly there are existential impediments to achieving this, but they go beyond the froth of ICANN haggling and Steely Neely's earlier resistance to non-EU corporations' pressures. There is a context in our Economic History and the philosophies of political economy by which they are formed and interpreted. For some of that.. and a peek into the Pandora's box that lies there, read on... you should have time before they let you board your next flight...
-O-
The 'Internet', like roads, running water, rail and energy reticulation before it, has now evolved from being a luxury novelty to being a critical part of the infrastructure of many national economies and their global trading links.
In the past, whenever a complex system technology has reached this stage it has been brought under total central control within its country of operation. This 'nationalisation' has been achieved, in general, either by public ownership or by nationally controlled 'sole operating licences' granted to commercial operators in clearly defined, non-overlapping domains, whose service levels, revenues and operating profits are expected to be tightly controlled in exchange for an absence of competition, except when periodically bidding for licence renewals.
In the past the evolution of a new technology from 'novelty' to 'utility' was easily confined within the geopolitical domain of a country and regulated accordingly. It is inherent to the Internet, and telecommunications in general, that this technology creates complex local systems interconnected in complex global systems which transcend geopolitical domains.
However, the Internet is no longer alone. As it evolved so too did the global energy reticulation system and the rail and the road systems. None of them have any significant degree of resilience designed into them. Some resilience is there, but tends to be at the duplicative level. There is very little evidence of the complex interconnections that high levels of 'resilience' require. UK's winter gas supply depends completely on 2 'taps'. One in Russia and one in Sweden. The few remaining liquid methane tankers do not have the capacity, or operational flexibility, to quickly step in to the energy gap that would be created by a sustained cessation of piped gas from across the North Sea.
After commercial interests have successfully exploited a new technology, moving it from a market 'novelty' to a national 'utility' we have seen over a period of more than 200 years that it has been left to the national exchequer to foot, or underwrite, the bill for the expenses of mitigating the risks of economic disruption to which the commercially driven development of 'utilities' has exposed the general population of businesses and households.
Examples include energy stockpiles, rail track networks, channel tunnels, enlarged and deepened port facilities, motorway links to outlying areas, fibre-optic cables into rural areas, cellular phone networks, into areas with poor signal propagation characteristics and low population density even though having numerically large numbers of potential users.
The trans-national geopolitical domain which is the European Union, despite subversion by commercial and neoliberal politico-economic interests of late, has been very effective in creating the conditions for achieving high resilience, at lower overall cost, of many such complex 'utility' technologies. Beyond its geopolitical domain it is the USA which seeks to arrange such things to its own benefit and under its geopolitical control, through a combination of financial, gunboat and global 'administrative' functions and organisations which it economically dominates through their dependence upon its - the USA's - high levels of funding relative to other contributors.
Brexit affords UK entities, which relative to their EU ones have large global and hence also US interests, to continue to exploit the 'special relationship' which the largest UK financial and trading corporations, through their governments, have sought to preserve since becoming indebted to the US by WWII. By this means they have been able to transform without loss the value of their global commercial interests from being under the direct protection of the then defunct Empire to the indirect, but potentially more effective, protection provided by the USA.
Now, in 2017, the EU has become the single largest trading entity in the world, outstripping the USA in the position it has long jealously guarded for itself. This has threatened UK interests whose commercial roots lie in the days of our own Empire and in the 'special relationship' which is so often spoken of but which is so difficult to define, unless one says exactly what it is. In the historical context, it is no more and no less than a global trading alliance protected by the mutually exercised global projection of military power.
By extending its commitment to the EU through the Lisbon Treaty, the UK was suddenly able to play 'both ends to the middle'. It was de facto a member of the two largest trading empires in the world, both commercially and militarily through NATO. During the last two decades the EU welcomed the strength in world markets that the UK added to it. On the other side of the Atlantic the US might be forgiven for feeling a bit double-crossed as the EU weakened its own growing need for global trade pre-eminence.
Entities participating in and benefiting from the 'special relationship', on both sides of the Atlantic, were now finding themselves increasingly sidelined by UK and other EU commercial allies, and by increasing EU resistance to US-dominated global organisations and to the demands of US global corporations. At the same time the US was having to compete with a resurgent China, then SE Asia, and soon to be followed by India.
The EU's military capacity was largely confined to a defensive posture, mandated to protect its members' trading routes. The EU was a poor partner for UK global interests as compared with the indirect protection of the US loosely allied to the UK's own military. A Brexit presented, for them and the US, a win-win situation.. strengthen the 'special relationship' that mitigates risk in international trade, and weaken the competitive trading power of the EU.
In so far as those US global trading interests identify with the US political right, as do their UK counterparts, it bears some speculation as to how far back their overt alliance goes, and to what extent Mr Farage's UKIP is a child, and deliberate pawn, of the UK and US political right embedded in the foundations of both their main political parties. Certainly the public appearance of the close relationship between Trump and Farage was very sudden and very intense. And once one speculates this far, it raises the prospect of speculating upon the actions and influence, upon anti-EU moves to the right by players in many EU nations, of the other player that is mooted to be in the game to re-assert global influence... Mr Putin.
-O-
-
Thursday 1st June 2017 09:00 GMT Anonymous Coward
Wow
Did you write this as a response to the BA debacle? A good piece, if it was sourced elsewhere please say where. Up vote as I think a lot of what you say is valid.
Britain will be no more independent than we were a year ago, except we will have different masters. I'm pro-US and pro-EU, but to some extent we have burnt our bridges. As for BA, it will be one of the first to go: its holding company (IAG) already has its registered office in Madrid, and it has a Spanish boss.
-
Friday 2nd June 2017 13:06 GMT FudgeMonkey
A network issue?
Of course it's a network problem - anything you can't explain or don't understand is ALWAYS a 'network problem' in my experience.
If I had a pound for every time a misconfigured server, badly written application or lack of hardware maintenance was described as a 'network problem', I could hire somebody else to tell them it isn't and why...