That's what happens when
you allow an NT expert near comms kit
A hamfisted worker at colo provider Telecity shut down "several" ISPs and their customers across the country when he started pulling plugs at one of its datacentres late last night. The hapless wire-scrapper then proceeded to make matters worse, by trying to fix the mess himself. El Reg hasn't been able to discover the …
...when we used to have a supplier who sent in their man every now and then to do an upgrade of something or other. Anyway, he had an annoying habit of just powering off machines by the power button on the front. Inevitably, it was the wrong machine.
He was affectionately known as Pokey Pat. He was so good at it that he managed to shut down a MicroVAX when he was supposed to be working on a Wintel Tower.
Subsequently, he was barred from the computer room (yet somehow escaped a beating)
Being an engineer for an ISP I take access control in my stride, requiring multiple IDs, pre-authentication, the right badge etc etc etc..... THEN! Then let some idiot & Son in to pull cables.... Most datacenters are extremely tidy, which is a total contrast to some customer-end comms rooms with cables lying everywhere. I really wish some IT Managers would pull their finger out and sort out the mess I encounter every day; no wonder things get accidentally pulled out. If it happens to you, the 1st thing to do is tell someone and show them exactly what you did.
they cannot keep their sticky little fingers off the frikin equipment.
This is the difference: developers love the equipment and cherish it, they dance with it in a cyberbraced tribute to logic and intelligence.
Dave, the big red button pusher, though, has no concept of the underlying beauty of how this is all orchestrated, so they just rip cables, in between scratching between their buttocks and picking their nose, in a desperate bid to evolve from the subpar primeval soup they have just emerged from. Oaths.
Oh, they are always called Dave.
I've stopped counting the number of times I've seen that sort of thing happen in a data centre. The best excuse I ever heard was "we didn't know what this big yellow cable was so we cut through it" after we discovered that the hapless hacks hired by one of our customers to "clean up" the spaghetti junction that was their data centre had been slightly over-zealous. For some reason, the traders who couldn't get any market info until the ethernet backbone was replaced weren't very impressed.
What's more worrying is the fact the guy even had a tech job, and was then let loose in a data centre unsupervised. A remotely controlled fire suppression system could have nipped this problem in the bud quickly. Gas the feckers next time. (Before they start fixing things)
Can't fault Andrews & Arnold whenever there is a problem tho. They blog everything so you know where you stand, and you can normally grab the techs in IRC if there is an ongoing issue. Compare that to the drones that service the lines of the 'big' ISPs, it's pretty refreshing!
We had our service down for about 4 hours (shortly after midnight to shortly before 4am). But we couldn't contact our ISP's local router, so either it's a coincidence, or our connection is trunked all the way to London, which seems an odd way of doing it.
We have a fibre connection from Thus (previously Yourcomms). Knowing what I do about their local topology, I was sure there's a Cisco router at the local POP.
obviously Qualified to sit about all night drinking lots of coffee, surfin 4 pron.
New Telecity job vacancy.
Critical qualifications:
1. To actually be truthful on your CV.
2. Actually know what the hell you're doing when wandering about near Racks stuffed with miles of cabling.
3. Ability to leave cat6 and fibre racks alone when bored, and not use overhead cables and ducting as things to swing from.
4. Have at least 4 years hands on experience in data centres and in IT.
STOP..... Monkeying around...
Employ someone who is experienced instead of some monkey that has lots of worthless CBT qualifications and no real world experience. There's plenty of unemployed Good techies out there.
Plugged them back in randomly eh!?
So this must be the first case of ISP data roaming - cue bad joke about roaming data charges.
I agree, the person responsible is an idiot, more so. Clearly somebody was let into a 'comms' cabinet or two who is irresponsible enough to make it worse by plugging them back in at random. I mean, who nowadays who works in server/networking or even cable routing DOESN'T realise that the correct ports matter? Did the prick also try plugging a fibre cable into an RJ45 port?
Anyone who comments on here probably just feels guilty because they once did something equally boneheaded but they either got away with it or else they managed to cover it up.
I know I have done a bonehead before but not for a while now.
Nobody's perfect and anyone who has worked a night shift knows how easy it is to make mistakes when yer tired and overworked. It's easy enough to get the wrong rack in those warrens and some of the places I have worked just entering the server room was dangerous due to birds nest wiring.
So, go easy on the poor muppet as it could easily have been you.
And before you all tell me that you would never make such an obvious mistake, you are either lying or else you think you are God (in which case you should be in Sales!). <LOL>
This is depressingly predictable and well short of the worst case, just quite visible right now.
I could tell stories of the raised floor void being used as the pizza fridge by night shift, cable trays that nobody would work on because of the ops staff nasty habit of vomiting through the floor hole in the bottom of the rack after a heavy night out and it running along the cables until dry, large Internet systems you have almost certainly used being taken out by a VP deciding to shut a rack door on the bulging 'messy' cabling to clean up for a customer visit, vendor engineers who can't tell which is the failed PSU by the big red light on it and pull the working one with the big green light taking out a financial platform... But I won't...
By happy coincidence, the BBC is running a story at the moment about some id10t builders who did something similar: they went to a house and ripped out the interiors ready for demolition... except it was the wrong house, and the occupier was just away on holiday.
A long time ago, in Outback Australia where nerds are rare, I was once called in to do a routine check on a rack for a largeish wholesaling / retailing ISP. I'd never done anything with these people before...
One of their tasks was to 'run a diagnostic' on their APC UPS and report on battery levels, etc.
There was a nice familiar looking 9 pin serial connector on the back of the UPS, and I just happened to have a nice 9 pin modem cable and a copy of HyperTerminal - so figured it'd be a 9600,none,8,1 connection followed by a ? to learn what commands the UPS knew..
Instead, upon connection of the cable I got a clunk, and a sudden drop in white noise as a number of fans ceased to fanerize. A quick trip to the front of the rack and press of a power button on said UPS returned the room to its former white noisiness, which was soon interrupted by a number of calls to my mobile wondering what happened.
So, a lesson here is that just because it looks like the plug goes in the hole, it might not be the right type of plug! (And even today I wonder at the logic that someone would design a connector like that which if used with the wrong lead would immediately power down the UPS.)
Gave a system admin a bundle of fibre cables going to the SAN during a lease return and told him to unplug them from the patch panel. I go around the other side to pull up cables from another returning storage array, I come around the front and he has a wonderful grin on his face of "boy did I do a good job". Unfortunately mine, I'm sure, was a look of horror: instead of unplugging the bundle of 15-20 cables I handed to him, he unplugged *ALL* the fibre cables in the patch panel. I grunt and squeak something in unintelligible half words and run out of the datacenter (leaving him perplexed as to what happened but knowing that it wasn't good), on my way through the building I yell at the ops folks that "shit is going down", they say "what stuff?", I say "everything!". Get to my laptop, pull up the cable report (that I luckily had updated 2 days before), run back in the datacenter, tell him to get out of the way and I start plugging things back in. Spent the next 2 hours getting filesystems and databases happy again.
It's kinda funny now, back then it wasn't so much
"And before you all tell me that you would never make such an obvious mistake, you are either lying or else you think you are God (in which case you should be in Sales!). <LOL>"
OK, I'll bite.
About four years ago, tracing and auditing comm lines prior to a PBX upgrade at a large department store, we identified and tagged all except two. For the life of us we couldn't figure out where they went, so we disconnected them, figuring that whoever's service we had just cut off would get on the phone sharpish.
Which would have been fine, had they not been the emergency phones in the lifts. And had one of the lifts not subsequently broken down over the weekend, trapping a box full of customers between floors for several hours until someone heard their plaintive cries for help and called the Otis engineers to rescue them.
Whoops.
Damn right I'm posting this one anonymously!
Well done for being honest. And yes, I have done it too.
I was trying to sort out the mess left by the previous incumbent - I've eaten spaghetti that was tidier than his cable work. After having stripped out all of the co-axial cable that was running between walls, over ceilings, along gutters covered in green slime, through drains covered in brown slime, it filled up a 15 cubic metre skip and still had loads left over to nearly fill a second skip.
The best bit was the point where 2 cables were joined in an electric terminal block - he had jammed a 3 inch nail through to make the connection. And yes, there were also 2 live electric cables in the other part of the block!
Unfortunately, I also removed the cable for the CCTV feed. We only found out 2 weeks later when they turned it on for the first time in a couple of months. As you can tell, security was a high priority.
I might have been forgiving if he had stopped when told to stop. Not stopping by plugging wires back in randomly disqualifies him for any sympathy. He is a dangerous idiot and should be fired.
And yes I certainly have made a few boneheaded mistakes in my time. However, not knowing what "STOP! DON'T TOUCH ANYTHING!" means isn't one of them.
This is probably the same prat who, when I was helping a client install Linux and finding the rack KVM keyboard unusable, said that borrowing a keyboard was going to cost 250+VAT as a "support callout". Thankfully the client told the bod on duty "No thanks" and we then rummaged through the company lockers.
Lo, we found an old keyboard and managed to continue with the install. When the prat saw us using a keyboard he stormed in shouting "where did you get that keyboard from!". This time my client told this idiot to "go F**k himself". He stormed off in a rage.
Really great customer service!
Jacqui
It's not a case of making a mistake, this is not having a clue
We have all closed the wrong port, pulled the wrong cable or even hit the wrong power button, shit does indeed happen.
However you do not pull a number of cables (even if you are convinced you know what they are) without tracing, marking and documenting them yourself.
You DO NOT rely on other people's markings or documentation, because it will be wrong when it matters the most.
Once you do unplug them you secure them nearby (just in case you got it wrong); you finally remove the unplugged cables once you are sure things are still fine (leaving at least a full regular working day is best)
If told to stop then you do exactly that
I'm sorry but there are too many muppets in the IT business already, without trying to up skill the canteen staff with the thought of cutting costs.
I personally never had any issues when some developers & server staff strung fibre and CAT5 at head height across aisles in a data centre, when they tried to bypass the comms dept I previously worked in.
My shift partner and myself could work wire cutters perfectly well, and amazingly it only took 8 cables to be cut before they actually raised a work request to have their new boxes installed. Though the last 3 times we did cut it into several sections to make the point
Apparently the 1st five times were ignored because they only thought that somebody had caught the cable and had some sort of accident (yeah go figure)
Paris cos she stops after removing things from orifices
My favourite is the contractor who changed the burned out light bulbs in the computer room. It seems that one of the steps on his step-ladder was the exact same height as the emergency power off (EPO) button beside the door. And, guess where he leaned his step-ladder on the way out? Any idea of what a room full of mainframes sounds like when the EPO button is pushed?
But, the even better part is the contractor who came in to put a shield around the EPO button so that random step-ladder leanings wouldn't hit the button. The first thing he did was to bring his step-ladder into the room and lean it against the wall beside the door. <CLUNK!> Yes, again. :-(
Dave
"developers love the equipment and cherish it, they dance with it in a cyberbraced tribute to logic and intelligence"
Another case of the look at me.. I'm important because I know perl/php <insert other simple language here>. I've been a sysadmin for 8 years and you won't believe the number of times developers have fucked up databases or web servers. Number of failures by NOC or by Sysadmin ... 0.
As for TCR, they are IMHO a shit ISP (we are at their prospect house facility). They have absolutely no concept of security... don't believe me? Phone up their support, tell them you are from say Discovery/Google and ask them to reboot your servers....
They NEVER ask for authentication of who you are... Bad move TCR!!!
And linx are in bed with them... we're doomed!
There's a lot of people expected to work long hours in the IT industry. I know what it feels like. I know what it can do to my mental state.
We don't know why this guy did what he did, but I've heard enough stories about Programmer Death Marches. And the sort of thing he did: it feels familiar.
I'd suggest that there is an institutional recklessness in IT management. Most of the bugs you curse in the software you use are probably down to the same ultimate cause: an excess of programmers running on pizza and Jolt cola.
Tech going into one of our client's computer rooms saw it was dark, put his hand around the door to find the light switch, found a switch and pushed it ...
I wasn't there, but I imagine the silence emanating from a previously humming server / comms room when you hit the Master UPS Shutdown button would be deafening ...
Thanks for sticking up for me there. I promise not to tell anyone I was pinching the cables for your xbox 360.
Still. I think we got away with it this time.
It's not going to affect our contract to host the new ID card database is it ?
You are the best boss in the whole world.
Love you ! xx
Been there, not quite done that... It's an easy screwup to make but one that only a noob should make.
Document, document and document before you unplug anything, I carry a label printer for just that but I generally find the best thing is to get someone else to hit the big red switch or pull the requisite cables/fibres.
It's so much fun watching a sysadmin pull the mains on the *wrong* mail server or storage controller (it happens more often than you'd imagine).
Yeah, I'm the BSEH, Bastard Service Engineer from Hell. I'll be the one with a *huge* smirk on my face, arms folded and sucking my teeth when you down your SAN by accident.
Paris (yay, she's back), because I'd be smiling if she went down too....
Once, back when we used leased 64Kbps lines, I learned (self taught; leaving manuals near me is dangerous) how to switch on compression on the routers...
Talking a user through telnetting into a router and changing the port settings is... painful...
I was actually in HEX 8/9 when the chaos broke loose, although it was on a different floor AFAIK (we were on 3F).
The security at TC, on paper, looks good. Sign in at building reception to get your visitor pass, then to TC reception to get your card (and you need 2 forms of ID and a random number emailed to the cage owner).
In practice though, it's trivial to get into TC and start yanking crap out WITHOUT any of that. Hell, we got our delivery driver, with no ID on him, all the way into the cage without any checks at all. And HEX8/9 3F is completely unpatrolled.
So I'm not really surprised by this. Needless to say none of my own boxes are hosted at Telecity premises (my employer's are though).
I am not sure that the network admins in that place have actually ever seen a physical cab. I used to work in a datacentre and rebus was one of our clients. They had four cabs, with 2Mb connections. Eventually they filled all four cabs. They used to phone up and ask for port swaps. However I don't think they stopped to consider the physical nature of nearly 400 2Mb co-ax connections. They certainly didn't stop to consider the consequence of repeated swapping from port 2 to port 45 on cab 1 and port 18 to port 50 on cab 3, and so on and so on.
We attempted to tell them their cabs were becoming a nightmare to work on. So we invited him in to sort it out. Ho ho ho. We had to strip all the cables out and start again, as it was impossible to do any more work on the cabs. The cabs resembled some kind of network bondage session.
Why weren't the cables labelled? Every cable in a datacentre should say where it comes from and where it goes to at each end (including port numbers). Cables are designed to be unplugged and plugged back in, it is therefore pitifully obvious that this will happen at some point, and if you have systems that are port specific you need to know you can unplug everything and plug it back in with no probs at all. As a storage person, I would be pretty pissed off if I had to replace a director card and found that none of the fibres had been labelled.
It may be the fault of the person who unplugged the cables, but the extended outage seems to be the fault of the installation engineers and the processes that they work under.
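For what it's worth, the "where it comes from and where it goes to at each end" rule is easy to hold in a small cable report. This is only a rough sketch: the `CableLabel` class, the `replug_plan` helper, and all the cable IDs and port names below are made up for illustration, not anyone's real layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CableLabel:
    """One cable, recording both ends including port numbers (hypothetical naming scheme)."""
    cable_id: str
    end_a: str
    end_b: str

    def text(self):
        # The same text would be printed on the label at each end of the cable.
        return f"{self.cable_id}: {self.end_a} <-> {self.end_b}"


def replug_plan(report, pulled_ids):
    """List where each pulled cable belongs; refuse to guess if one is undocumented."""
    by_id = {c.cable_id: c for c in report}
    unknown = [cid for cid in pulled_ids if cid not in by_id]
    if unknown:
        raise KeyError(f"no record for cables: {unknown}")
    return [by_id[cid].text() for cid in pulled_ids]


# Invented example data.
report = [
    CableLabel("F001", "san-dir1/card2/port4", "patch-A/port17"),
    CableLabel("F002", "san-dir1/card2/port5", "patch-A/port18"),
]
```

The point of raising an error on an unknown cable is the same as in the comment above: if it isn't documented, you stop and trace it rather than plug it in somewhere plausible.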
I have had personal experience of This Place and their lackadaisical attitude to client kit - in my case, it was tape stackers that they....
..actually, I can't go into detail, my blood pressure can't take it.
My opinions on that particular visit from the F***-Up Fairy can be found here:
http://dungeekin.blogspot.com/2008/03/event-visit-from-fuckup-fairy.html
It's time for my therapy session, I'm still trying to handle the memories and the repressed rage!
(Paris coz even she could do a better job!)
A good number of years back, I was working in a company which had a number of switches, but hadn't sprung for any of the bonus features, like backplane links or anything.
So all the switches were connected with patch leads (I know, I know, bandwidth bottlenecks and what have you)
Anyway, someone, it may be me, it may not, was 'tidying' up some cabling, and ran a new patch lead between two switches......
Two switches, that were already patched together....
Oh the joy, the joy...
Traffic on the network fairly rapidly ground to a halt.
When the culprit cable was identified, the person responsible was of course on the wrong end of many jokes, and the cable itself was mounted on a piece of wood, and hung above the server room door as a reminder.
Anonymous, as I do have some professional pride!
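For anyone wondering why one extra patch lead can do that: a second path between switches that are already connected forms a loop, and unless something like spanning tree is blocking it, broadcast frames circulate round it indefinitely. A minimal sketch of checking a proposed lead against a list of existing ones, using a union-find over switch names; the function name and the switch names are hypothetical, and it assumes you actually have an accurate list of existing links.

```python
def would_create_loop(existing_links, new_link):
    """Return True if new_link joins two switches that are already connected."""
    parent = {}

    def find(x):
        # Find the representative of x's connected group, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups joined by every existing patch lead.
    for a, b in existing_links:
        parent[find(a)] = find(b)

    a, b = new_link
    # Same group already means the new lead would be a second path, i.e. a loop.
    return find(a) == find(b)


existing = [("sw1", "sw2"), ("sw2", "sw3")]
```

Here `would_create_loop(existing, ("sw1", "sw3"))` is true even though those two switches are not directly patched, because they are already reachable through `sw2`.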
So, looking at the story and originating blog: some guy writes that some guy says that some other guy is an idiot.
It's mostly fact free and yet we're happy to point and laugh at some poor f'ker, who presumably will know about the incident's reporting on the Reg.
Cables should be labelled. People who are working on live kit should be qualified and have some sort of plan or process for the work they are doing. If this was not the case here it's a failure in managing the datacentre.
I take it you've never made a mistake in your life then?
'My' story is of an Electrician (I was a Computer Operator at the time, when computers were real computers and not these dinky little beige boxes - ah, George 3!) who had to change a lightbulb in the Data Centre (only we called 'em Computer Rooms at the time)...
Poor chap had had to wait a couple of minutes when he arrived as the 'front door' to the computer room had keycard access control on it and we were kinda busy. One of my colleagues let him in and he changed the bulb, then headed for the door. Since we were all still working, our little electrician friend decided to see himself out.
Darn, forgot to mention that the computer room had a raised floor, with a ramp from the 'front door' to an inner door, bit like an airlock if you will. So Alfie gets to the inner door and in a rather loud voice says, "Which button do I press to get out? Is it this red one?" as he reaches for the EPO...
Y'know that bullet time effect from Matrix? It was like that - visuals slowed down, sound seemed to drop several octaves and become a long, drawn-out shouts of "N-o-o-o-o-o-!" and half-a-dozen Ops start heading towards one hapless electrician but we were too late - he pressed the button and...
Nothing happened. Not a sodding thing. Well, the button went click and six grown men collapsed on a skinny bloke in blue overalls but nothing really significant.
Our much-vaunted EPO, designed to kill ALL power to the computers in case one of us came into contact with a live (415volt, lots of Amps, 'chargrilled Op in 30 seconds') wire, did ab-so-lutely nothing. Nada. Diddly-squat.
Ah, but there was some wailing and gnashing of teeth over *that* one, let me tell you!
I can have a little sympathy for the guy. Many times I've ripped out code that clearly could not work, has no references in the main codebase, and was commented everywhere with "Doesn't work." Of course the next day is catastrophic. Some obscure system that used to spit out errors all day now crashes. The obscure system is still listed as a health check for the hardware that replaced it two years ago. The health check failures cascade and everything is dead. Now comes 100 "WTF happened" e-mails, followed by a dozen meetings on the same topic, followed by a new set of rules so specific that they have no general value for future events whatsoever.
Knowing Datahop and their attention to detail - they would have been but if it was just a random member of Teleshity's 'intelligent hands' staff that was doing something - I'm surprised they could even read the work request properly.
We had a feed from DH a while ago for over 2 years - never had any issues with it. On the other hand, Telecity constantly screwed simple requests up from rebooting a wrong machine (we send request for server 123, and they reboot 124!?) to removing the wrong hot swap drive even when given the exact label.
At least everything is back up now.
At my last place of employment (an electronics factory), I turned off a flow soldering machine because none of the isolators in the room where I had been working late sorting out someone else's foul-ups were labelled. Those things take a full day to heat up to working temperature. I only kept my job because of how much s#!t the company would have been in if I did the usual "disgruntled former employee" act. (And nobody else would have done my job for what they were paying me.)
And a former colleague of mine once managed to write a WHERE clause that matched every record in the database, not just the one that he wanted to alter. This was in an old version of MySQL that didn't support transactions, but fortunately most of the information in the database was duplicated elsewhere -- in disparate files. It took some amazing Perl coding to fix it. Since then, he always added LIMIT 1 to the end of his queries; and I test out my WHERE clauses by first using them in a SELECT and seeing if the result set is sane. If anybody asks, it's to cache the matches in RAM and so speed up the real query.
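The SELECT-first habit can be wrapped up so you can't forget it. A minimal sketch, with assumptions stated: it uses Python's built-in `sqlite3` as a stand-in for MySQL, the `users` table and the `guarded_update` helper are invented for the example, and the WHERE text is trusted input written by the person running it (only the values are bound as parameters).

```python
import sqlite3


def guarded_update(conn, new_email, where_sql, where_params=(), expected_rows=1):
    """Preview a WHERE clause with a SELECT, then run the UPDATE in a transaction."""
    # Step 1: dry run. Count what the WHERE clause matches before changing anything.
    count_sql = f"SELECT COUNT(*) FROM users WHERE {where_sql}"
    matched = conn.execute(count_sql, where_params).fetchone()[0]
    if matched != expected_rows:
        raise ValueError(f"WHERE matched {matched} rows, expected {expected_rows}")

    # Step 2: the real change. `with conn` commits on success and rolls back on error.
    with conn:
        cur = conn.execute(
            f"UPDATE users SET email = ? WHERE {where_sql}",
            (new_email, *where_params),
        )
    return cur.rowcount


# Invented example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)
conn.commit()
```

A clause like `id = id`, which matches every record, is refused at the preview step before any row is touched, which is roughly what the manual SELECT check buys you.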
Well, apart from the cleaner pressing the data center EPO button, my favourite story goes back to the days of IBM Mainframe consoles: hardwired keyboards and screens with a cluster of keys next to the main keyboard, one of which would halt the processor. Not a problem in itself, but putting it next to the tape machines, where real people work, was the problem.
...... Long long time ago when backups and processing was performed to and from tape, big spools of 1/2" tape, the write enable mechanism was a plastic ring in the back of the tape spool. Good practice was removing the ring when taking it off the tape machine, but what to do with it, well throwing it into a box was the usual practice, but during night shift, just sometimes you might throw it high into the air, say 20 or 30 feet and let it fall into the box, if not, collect them all up at the end of your session.
Yes that's it ... (a million to one chance) the falling tape ring managed to hit the stop button, bringing the mighty mainframe to a crunching halt. Hours of delays and lots of explaining to do. Paul, that did make the rest of us laugh.
The fix? Well, a small clear plastic box to cover the keys.
I'd say the good old days, but really I did hate night shift.
I once did a planned work that took me 10min instead of the usual <2min, when I explained to the boss that I had been a bit thick in not noticing a "transmit inhibit" had enabled after I changed some settings (neglecting to mention the pub dinner before shift start) he said: "That's why planned work notices are scheduled for 10min even if it's a two minute job."
As my lecturer in broadcast engineering said to us, "You're not a real engineer until you've taken a broadcast network off-air." .... I'm a real engineer several times over.
Paris, because she knows where everything goes.
Big_Boomer your logic genuinely scares me. I don't understand how people like you and the plug-puller have the brain power to walk, let alone do anything with computers (like say, turn them on, or pull out their plugs..)
The worst thing about this story is that once again, one person can be blamed for a MAJOR problem. But it's not really fair to blame him alone since he shouldn't have been allowed in that situation in the first place. The whole system is f****d up. Quite scary really. What next ?
PS - What does "I know I have done a bonehead before but not for a while now" actually mean ??
PPS - Are you sure?