Re: Not only in Ireland
Yep,
I remember years ago when a light aircraft crashed into an Irish cemetery!
The rescue team were still discovering bodies months later...
I've never done the wrong server restart but I've got a very similar t-shirt to that, and not so long ago either!
The company I work for had a 3rd party system that I was the SME for. In a nutshell, said system consisted of several Windows servers running services which would connect to our IVRs and telephony systems, grab the CTI events being chucked out of said systems, then write them to a very large SQL database. My company would then use that data for popping customers' details onto agents' screens as the customers' calls hit the agents' phones.
Just sidetracking a bit: screen pop, for those who don't know, is very widely used in contact centers as it can give a better customer experience (for example, you know who's calling and can answer with "Hello, is that Mr Smith?"), and it can also drastically reduce the average handling time of an incoming call, as your agents don't have to look the customer up in your customer DB etc.
So back to said system, it generally worked well - until there was a WAN flap or local circuit drop at a site. Then the CTI events would be missed or collected out of order causing all magnitudes of chaos, such as no screen pop for that site!
When the shit hit the fan, fixing said system was generally fairly simple: just restart one of the Windows services on one of the several boxes. There was one exception to that, one service we never restarted! But more on that later...
So now you know a little about the system and its general importance, let me set the scene for my t-shirt moment.
I'm at a team meeting down in Birmingham, the whole team were there including the boss - and the boss's boss. Now, at our team meetings, one of us normally does a mini workshop on one of the systems we either spend lots of time supporting, or are SME on. Just a quick overview, a 'get to know it a bit better' for the rest of the team.
It was my turn that month to do 'said system' that I'm supposed to know so well!
So the boss is waffling on about process or something when we gets a fault in for my system. Boss suggests that rather than do the overview later, why don't I go through how I'd fix said fault now - live - with the team.
"What a great idea" I thought, it gets my bit out of the way and it means I can now have a pint at lunch too.
So I takes a look at the fault, it's simple enough, a circuit fault at a site has caused that site to drop the CTI connection and the site isn't getting screen pop. All I need to do is to restart the service on box 3, it's a ten second job!
After hooking the laptop up to the projector, I start explaining the system a bit to the team and RDP in to box 3 to show the team the service we need to restart. After waffling a bit I then log in to box 10 via RDP to point out some related and useful bits on there.
Now remember I said earlier that there was one service we never restarted? That was on box 10, and it logged the agents' phone logon and logoff events for every site. Now screen pop would only work if there was a phone logon record for the agent, and for reasons way too boring to go in to here, restarting the service on box 10 cleared all those logon records!
So if that service was ever restarted, to get screen pop working again, every agent in our company would have to log off their phones then back on. Doesn't seem like a big thing, but it would cause call queuing, customer experience issues and large headaches for the call centre managers, who'd have to coordinate said exercise and would complain bitterly to upper management.
Oh one other thing about this system, while each box had a different function, the one service for said system which ran on each box was called the same thing!
But back to my t-shirt....
After waffling a bit more I restarted the service to clear the issue, then I logged off RDP feeling smug that I'd done my bit for both the fault and the team meeting...
Which was when the new fault for no screen pop across the entire company came in!!
And I realised (as did the whole team) - that I'd restarted the service on the wrong box, oops!!!
How many agents had to log off and back in?
Oh about 7500.
Not quite true,
It's not banned, you're just not allowed to encrypt traffic. Most if not all vendors now have settings for just this sort of thing. Likewise there are certain places where encryption has to be enabled (GOV for example).
If deploying Cisco you have to be careful with that one, as during install it asks about it and if you pick the wrong one, you're fecked further down the line and it's a nuke / start the install again.
D