Single point of failure
These buggers are hidden in plain sight.
Welcome to another working week, loyal readers, and another dose of Who, Me? – the Reg's weekly safe space in which readers submit stories of times when tech support went not quite so well as they might have hoped. This week's hero we'll Regomize as "David" and his story has something in common with recent tales of finding out …
It’s the wild colour scheme that freaks me. I mean, when you try an’ operate one of these weird black controls which are labelled in black on a black background, a small black light lights up black to tell you you’ve done it. What is this? Some kind of intergalactic hyper-hearse?
Icon (Which is probably quite incapable of drinking coffee!).
We put the power button on the bottom so its functionality wouldn't spoil the pure perfect lines of perfection (said in either a supercilious French accent or a rather sinister Dr Strangelove German).
I would hate to have to find the Fire Alarm button in Apple HQ, you probably have to swipe clockwise 3 times on a glass door
My favourite single point of failure so far had to be discovering that an entire office of 25 desks, PCs et al. had been wired through a single domestic light switch by the entrance door. Walk in, flick the switch and everything sprang to life with a nice loud -crack- from the switch.
Turned out the previous tenant of the space was rather lazy and/or ignorant of basic electrical practice.
Convincing the area manager to sign off the rewiring job was difficult until there were a few "full office shutdowns" in the middle of the working day due to someone flicking the switch by mistake.
"Always keep your KitKat wrappers"
Many years ago when I was a child, a cruise ship, crew section up front.
A big fuse board that looked like it wanted to be an old fashioned telephone exchange when it grew up. Loads of fag packets at the bottom.
Why? When a fuse blew, just take some foil from the packet and wrap it around the fuse and pop it back in.
Icon, because it wouldn't have been the first time. At least there was a plentiful supply of water...
Any metal will do, right?
This may be an urban legend (or rural, depending on where you think it may have originated), but a .22 caliber round was apparently used in lieu of a BUSS fuse.... until it got too hot.
My classic story for that was the "temporary" radio hut installed in 1948 adjacent to a water tower (for the antennas) which was only replaced when bits of concrete spalling off the tower started coming through the roof (250kg lump at one point)
The fact that said radio hut housed critical infrastructure for police, ambulance, civil defence, local councils and fire services was of no interest to the accountants.
When the racking support rails were finally unbolted, the walls fell off and promptly crumpled as they were totally rotted out
Depends on the time period and location. Modern-ish PCs will tend to use around 50W, add about 25W for a monitor so 75W per setup, for 1875W in total.
In Europe most circuits are 230V 16A, so the PCs would only use about half of what the circuit allows.
A quick Google search shows even €3 light switches are rated for 10A (I rarely see stuff rated for less around here), so that would pass as well.
Is it a good idea? Absolutely not. Would it pass inspection? Most likely.
I do hope they don't have too many outlets to attach space heaters or vacuum cleaners though...
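For anyone who wants to check the sums, here is a rough sketch of that arithmetic. It uses only the figures quoted above (25 desks, about 50W per PC, about 25W per monitor, a 230V 16A circuit, a 10A switch). The function name is made up for illustration. It covers steady-state load only and ignores startup surges or anything else that gets plugged in.

```python
# Back-of-envelope check of the figures quoted in the comment above.
# Assumed inputs: 25 desks, ~50 W per PC, ~25 W per monitor,
# a 230 V / 16 A circuit and a light switch rated for 10 A.
# Steady-state only: inrush current and extra appliances are ignored.

def steady_state_load(desks, pc_watts, monitor_watts, volts):
    """Return (total watts, total amps) for a row of identical desk setups."""
    total_watts = desks * (pc_watts + monitor_watts)
    return total_watts, total_watts / volts

watts, amps = steady_state_load(desks=25, pc_watts=50, monitor_watts=25, volts=230)
circuit_watts = 230 * 16  # what a 16 A circuit can carry at 230 V

print(f"load: {watts} W, {amps:.1f} A")
print(f"share of the 16 A circuit: {watts / circuit_watts:.0%}")
print(f"within the 10 A switch rating: {amps <= 10}")
```

On those numbers the switch is inside its rating, though with less than 2A to spare, which is why the space heater worry is a fair one.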
Would it pass inspection? Most likely.
Genuine question, as I'm not an electrician, but would it really pass inspection? If the PCs ("et al"!) are plugged into wall sockets and not hard-wired in in some way then that 10A switch is controlling 13A sockets (in the UK at least), don't you have to take into account that something else might get plugged in? In any case my laptop on USB PD will happily pull 80W regularly and 100W at times (have measured with a metered cable), and that's only because it uses a lower energy mode when on USB, the normal power supply is 230W. Tower workstations can easily use more, and all computers will tend to pull maximum power during startup (except maybe the GPU isn't fully engaged).
In the UK number of sockets on a ring main is unlimited (!, but a ring is meant to be limited in the area it serves) and a ring is 32A. It looks like a smaller radial can be 20A, but even then don't you need an actual isolator rather than a light switch? OTOH, isolators do make a bigger snap when toggled than single pole switches, so maybe what was wired in was actually an isolator, in which case probably fine electrically and just an unwise choice practically.
"but would it really pass inspection?"
Depends on the country and how willing the inspector is to look the other way for consideration.
Here in France it would likely not pass, as there are many rules about how things are to be wired and this is a country obsessed with paperwork. That said, opening a bottle does tend to make the atmosphere considerably more jovial, if you know what I mean...
"But the fuse on a lighting circuit *should*[1] be 5A"
Fuse? Hehe... My place is a beautiful example of "farmhouse wiring". There's a backbone of three phase, and stuff is hooked in wherever and however. The three phase is black, red, red, and red. Earth was patched in later, sort of.
Oh, and just the other day I discovered the back kitchen light had been rigged up with the neutral on the blue wire, and the live on the green and yellow wire. WTF?
I don't do anything without probing with a multimeter first. One of the phases in a socket in the kitchen is dead, I'll need to find out why. Looking forward to that adventure.......
Back in the mid 90s PCs had a pretty low draw. Every member of the Pentium/Pentium MMX family had a max draw of less than 20 watts. You'd need other stuff like a graphics card, RAM, and hard drive, but even without power management 50 watts for the PC was probably doable for a typical office that wouldn't have the highest specced stuff.
Not a chance in hell for any monitor of that era to draw only 25 watts, but typically removing power meant they were "off" and restoring power required pressing the "on" button so they shouldn't have contributed to the overall load.
"Every member of the Pentium/Pentium MMX family had a max draw of less than 20 watts"
Yes.... but the average CRT drew 100-150W with a startup surge of 500-900W as the degaussing coils flicked (that's the "bong" or "thump" noise when older TVs are switched on)
> Depends on the time period and location. Modern-ish PCs will tend to use around 50W, add about 25W for a monitor so 75W per setup, for 1875W in total.
However, at startup these PCs are, for a brief moment, almost short-circuits. If a lot of them are behind one fuse, and started simultaneously (such as after a power outage), the fuse blows even if during normal operation the current is reasonable.
Learned this when in a previous job we had a room full of various PCs and servers for testing purposes (not a production server room, fortunately). They had been placed there and started over time. Then a power outage occurred and the fuse blew when power came back up. Happened a couple of times before we understood why.
Why?
They aren't motors, they don't have any inrush current. I guess there is a motor if they have HDDs, but that's 10-15 watts per HDD. There's some inrush current to charge up the capacitors in a switching power supply I suppose, but they aren't going to be fully discharged if the power is suddenly cut to turn it off and even then they aren't that large.
The CPU isn't going to be running at full power instantly upon startup, and the GPU doesn't become fully active until a few seconds after boot has started so there's some margin out of the PC's power budget to account for any inrush current to the HDD and PSU.
I have a laptop next to me running without the battery installed (because the battery went bad) and just for kicks I put it on a Kill-a-watt and started it up. Far fewer watts at startup vs when I maxed out the CPUs. It is new enough it has an SSD instead of HDD though.
That setup would have lasted even less time at a previous workplace I was at. This particular workplace had access-controlled doors for every floor/wing of the building. You know, the usual scan RFID card to open the door setup. Then when leaving, there was a button to press which opened the door, all standard stuff. Except that the door-release button looked remarkably like a light-switch, and was in fact right next to the actual light switch.
I've honestly lost count of how many times people tried to open the door and just switched the lights off instead because they hit the wrong button...
I've seen emergency shutdown buttons mounted next to a maglocked exit door which looked remarkably similar, and yes, the obvious happened.
Apparently it took 3 full shutdowns before someone got permission to put a shield over the "do not touch" switch.
Imagine the feeling hitting the exit button and hearing all the fans spin down to the sort of deathly silence usually experienced in graveyards, and that in the days before widely available defib units..
There is something quite otherworldly about a server room losing all power.
Many moons ago a fire suppression system test resulted in such a situation. The person testing it had removed the solenoid from the huge CO2 bottle so that would not fire, had switched the interlock with the main building fire alarm system so that would not trigger BUT had not turned the magic key for the UPS main contactor.
He triggered the alarm, it did everything as expected and suddenly we were all stood in the dark in absolute silence. And this was mid morning on a work day!
At least you didn’t have those old crt monitors.
It was the first thing we did after a power failure. Run around and try to switch them all off before power came back on. Even a few of them could blow a single fuse with their startup surge. And too many of them could blow the main fuse even if the local fuses didn’t.
The other fun thing was to leave 2 identical monitors side by side to power on at the same time and watch them trying to degauss each other.
Mind you these were 21’ Sun or IBM monitors. The buggers officially required 2 people to handle them.
Nigel gave me a drawing that said 18 inches. Now, whether or not he knows the difference between feet and inches is not my problem. I do what I'm told.
But you're not as confused as him are you? I mean, it's not your job to be as confused as Nigel.
Spinal Tap, where the stage designer builds a Stonehenge set 18 inches high from a sketch by a confused heavy metal band.
Not as bad as BS Johnson (the BS is for Bloody Stupid and Bergholt Stuttley) when you end up with a cruet set which is turned into four apartments and stores grain, a mail sorting engine where pi had been "cleaned up" to exactly 3 (so it sorted mail from alternate dimensions and from all time periods), and a few spectacular triumphs which fit in a matchbox.
Wow, we've found someone who hasn't seen "This is Spinal Tap", including the legendary scene where inches and feet are mixed up for a piece of scenery:
"The problem wasn't that the band was down, the problem may have been that there was a monument of Stonehenge that was in danger of being crushed... by a dwarf!"
"The buggers officially required 2 people to handle them."
Colour CRTs used to be heavy because of the lead glass. I can remember a large media panic in the mid 70's because someone had realized that if you point an electron gun at the metal foil just behind the coloured phosphor in the tube making your evening's Black & White Minstrel Show (*) pictures, you'd also created an x-ray generator. The population were being treated to the warm glow of ionizing radiation.
(*) To left-pondians, this was a real thing, bad for many reasons even at the time, never mind in hindsight.
They had color CRTs with shadow masks from 1953 in the USA (tests from 1951), so I'm sure no UK CRTs gave X-rays, as colour didn't launch till 1967, though not many had colour till 1970. B&W TV using CRTs was from 1935.
Certainly the B&W Minstrel show was weird, even by UK standards, but it predated UK Colour TV by nearly 10 years. It ran from 1958 to 1978: a weekly variety show that presented traditional American minstrel and country songs, as well as show tunes and music hall numbers, lavishly costumed and presented with male cast members in blackface. There was also a stage version from 1962 to 1972.
I'm old enough to remember it in 405 line B&W TV and thought it was boring. Obviously supposed adults liked it as it even won awards and was exported!
My dad (an early adopter for any new stuff available) got a colour TV in 1972 I think, and though I can remember the scare stories my memory of them is a bit sketchy bearing in mind I was 8. But what's not to like about x-rays? Sadly I never met one of those machines that x-rayed your feet to see if those nice new shoes you were trying on were the right size.
You're right about the Minstrels being boring. The Good Old Days were just as bad too. But I did like the test card because of the music - I got the CDs of it.
TV designers were aware of X-rays and voltage. The CRTs were rarely run that hard.
There was a mini-panic when one type of high voltage rectifier was found to be making more X-rays than expected. The company paid a bounty for each old-type removed from service. This happened again a couple years later with a high-voltage regulator, again they paid the TV Service Technicians to cull the bad tubes.
CRT color TVs were just heavy all over, not just the lead-glass faceplates.
Yes, they _could_ emit X-rays, but this is dependent on the EHT voltage.
26kV is low enough that any X-rays emitted are low energy and can't penetrate the metal foil on the inside of the tube.
There were definitely issues with higher EHT voltages and insufficient shielding, but the people most at risk were BEHIND the TV.
Monochrome sets were never an issue as they only used 7-10kV.
Basically had this at my first (and last) custodian job looking after a UK school computer cluster in the 90s (ring networked). I had to come in in the morning, switch everything on, clean etc. I'd check on it during the day, then switch off the lights and a single power bank by the door at the end of the day - leaving the computers to their server commanded start up and shutdown jobs. Nobody told me, or the chap they got in to do a machine-by-machine backup and update, to do anything differently the day he was in. He'd moved the lighting controller to another socket to make room for the enormous plug for his backup trolley, so when I left for the night and did what I was told to do by switching that off, I wiped out a minimum of 6 hours' work and possibly totalled a machine or two.
My contract wasn't renewed at the end of the year. Luckily, I've never studied or worked anywhere since that had electrical and network setups that easy to destroy. The school itself was shut down a couple of years later - no idea what became of the pricey OS X kit they had just purchased along with a new on-site BOFH. I hope they both found a new home.
"Convincing the area manager to sign off the rewiring job was difficult until there were a few "full office shutdowns" in the middle of the working day due to someone flicking the switch by mistake."
Did you help the process along with a flick or two? Were you tempted?
Reminds me of the tale an old friend of mine used to tell. He was a TV repair man (remember them?) working for a TV rental chain (remember them?). When the area went colour, the showroom was filled with shiny new colour TVs, all powered through a time switch so that they would be up and running in time for the punters heading to work to see them through the shop window.
Every morning the manager arrived to find the TVs definitely not lit up, and the mains trip off!
Turned out that these things drew a 15 amp transient when switched on! A normal fuse (13 amp in the UK) could cope with this brief surge, but when a shop full all turned on at the same time, the main breaker objected strenuously!
I recall a visit to our HPC centre with students from all over Europe in the 1990s. Our Cray J932 was of course one of the centrepieces of the kit on display, and it boasted a large power LED and somewhat recessed power button below. One student asked what would happen if he pressed that button. My deadpan reply was that a small claw would come out of the recess containing the button and would clip off his finger, and if that failed I had some pliers that would perform the same task manually.
It never ceases to amaze me that folks will install the proverbial Big Red Mushroom Button on a wall or a rack and NOT put a transparent safety cover over it.
Seriously, folks? If you really, really NEED to mash that button, you'll have that cover open before anyone can say "NOOOO!", and if you DON'T need the button being mashed, the cover will prevent lots of needless drama and expense. There is *no* downside to the safety cover, especially if you cover it with painter's tape when the wall is being repainted so it remains transparent afterwards. Apologies to those painting contractors who really aren't that dumb; I know you're out there somewhere, busy as hell.
Previous employer - who will remain nameless - had a medium-ish sized data room with servers for several councils, itself and other medium sized businesses. Unfortunately, the emergency power off button was next to the door. And a poor HP engineer carrying some failed components out after a repair reached out to press the button he thought would open the door. Cue immediate silence as the entire place just powered down...
He wasn't popular, but it should at least have had a cover over it.
I've already told this story, but it seems fitting to tell it again.
I was once (long ago) an operator for a Bull DPS 7. Back then, that meant an entire room, with four dishwasher-sized things for holding one HDD each, HDD which could hold a magnificent (at the time) 40MB of data. Not to mention the four backup tape arrays. It was a pretty impressive site at the time, is what I'm saying.
There were no less than three engineers on site, and one day, two of them entered the server room in the middle of a lively discussion. It wasn't an argument, from where I was sitting (the other side of the room, in front of the console), they were just talking about something and it was obviously interesting.
The point is, the big red button was right next to the door, and one of the engineers was finishing what was obviously his big story. To emphasize the end of his story, he spread his arms wide and - whack - hit the big red button.
I believe that he didn't expect his story to end with the total silence of the server room . . .
Re: EPO next to the door button: That's where it needs to be and shouldn't really be covered - Imagine if the operator's hands are burnt and they need to hit the button with an elbow while evacuating but can't open the cover...
However, I recall hearing a tale that, after an incident where a tape-ape had hit the EPO instead of the door opener, senior management convened to find out exactly what had happened. The poor soul was ordered to retrace his steps exactly.
"Well, I was leaving the room carrying a stack of tapes that obstructed my view, like this. I reached out like this..."
And the computer room went dark again.
An old classmate of mine did something similar in school. Our teacher left the classroom for a few minutes, and this guy took the opportunity to showcase his karate abilities in front of the whole class (he was That Kind Of Guy). Given that we were in a science room, his fanciest spinning jump kick naturally landed on the emergency shower trigger and unloaded about five buckets worth of ice cold water over him. The teacher came back and just shook her head and said that the punishment for doing that is mopping up all the water from the floor. As he was just finishing up, whilst on all fours, he reversed straight into the trigger paddle again, this time butt first, and got doused once more. We all saw it coming and no-one warned him.
Similar thing only with a guy dropping (Throwing) not lowering a calibrated weight into a vat of chemicals, which resulted in outside cold showers all round, sent home in paper overalls.
Elfin Safety came along & he did the same in the reconstruction, which resulted in fresh ablutions.
Needless to say he was sacked.
We had been discussing the replacement of some old machinery for 25 years!!! The customer (a statutory 'Authority') got round to updating and finalising the specification as the technology had moved on over the years and after a long process, we were fortunate to win the order. Big contract, four large machines controlled by modern wizardry with all the data acquisition and control you could imagine. Because of the importance of the site, a full spare machine was included in the quotation on the basis that 'At least three machines must be available at all times'. Also, a changeover had to be completed within 48 hours and would be demonstrated as part of the contract. Our proposal was to exchange two machines and their kit to prove the concept.
On the day of the 'demonstration' the customer, consultant and all the hangers-on assembled for the breakfast we had purchased for these big-wigs. As the clock was started, they dashed for the best cuts and fresh coffee whilst we got on with the job. An hour later they came out to see how we were getting on and were surprised to find the job complete.
"Impossible" they said, so we repeated the task before they went off for a self-congratulatory lunch, again at our expense.
We never did get the order for the spare.....
So here I am, 17:00, short planned outage so production lines have shut down for the day.
The job: adding an extra switch for a production subnet. The only place available is in one otherwise full rack where after a lot of careful mounting hole by mounting hole shifting of live units (not my choice, but I managed) I managed to free the 3U needed to fit this bit of kit. The cabling is running over the rack vertical rails and, by the look of it, that pile has been quietly building over the last few years (I was only there temporarily).
I myself would have re-cabled the lot from the ground up, but you guessed it: no time, and I'm on my own where normally 3 people would be present. And none of them would have a clue about networking and no, it's not my expertise either but I make do, and I am old enough to understand RS232 to a level that I managed to patch some management ports through to the comms room that are on the other side of the factory saving me a 10 minute walk - but I digress.
Switch fitted, dual power and network link to the 3-unit core stack connected, and just when I turn around to the stack with a laptop, PuTTY in serial mode and a cable for the terminal I hear the core stack fans go full tilt. Which is NOT a good sound: full tilt fan speed on a Cisco means it's done a restart. Which I didn't initiate, at least not willingly - I'll get back to that in a minute.
After waiting long enough for the stack to reboot I note there's no WiFi signal. There's no LAN. No WAN either - the exit router doesn't see the switch. So I call the network people, who are in another country. Who can't see the core stack - it's offline.
By now I have worked out that under that large bundle of cable over the LH side of the top of the stack there's a tiny button that the cable bundle has been resting on, and the movement of installing the new switch has moved the bundle resting on that button past the point where it says 'click'.
And kept it pressed.
I am now on the phone to the senior network guy who tells me that holding this tiny button for more than 10 sec means the stack has not just done a reboot, but a full factory reset - there wasn't a shred of config left in the stack. Thank God they do frequent config backups. I take a couple of zipties and work the bundle away from the danger zone.
I spent the next 3 hours slowly putting back a config file with my network guy keeping an eye out via Teams on my phone's hotspot (now hooked up to power), using PuTTY on a USB serial lead, feeding it chunks of text at a time because even at 19200 baud the Cisco can apparently barely keep up.
Yes, it made for paid overtime, and yes, everyone was happy we managed to get this back online before the next day but I could have done without this..
Ah, the "password recovery" button. There were even some models where the face plate was so cramped that if you inserted an STP cable with the rubber-covered cap into port 2 (bottom left, that model was 1-indexed if I recall correctly) it would push that button and keep it down simply by clicking into the port...
I once worked in an office where we had about 20 PCs plugged into the floor sockets which ran off a single 30A ring main. It was OK as long as you didn't try to start them up all at once after a power cut.
Then one day in the middle of winter, the heating failed. So I sent the finance lady, who had the company credit card, to hire a big space heater to warm up the office. And then I went to lunch. She went away and decided that it would be far better to buy 20 small domestic fan heaters from Argos and put one under every desk than to hire a big noisy space heater. When I came back from lunch I found all the PCs were off because the fuse had tripped out, because she was too stupid to follow instructions and didn't understand that a heavily loaded 30A ring main can't power an additional 260A of heaters.
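The 260A figure is easy to reconstruct, give or take. The comment doesn't say what the heaters were rated at, so the sketch below assumes about 3kW each on a 230V supply, which is roughly 13A per heater. The function name and those two values are assumptions for illustration only.

```python
# Rough reconstruction of the heater load described above.
# Assumptions (not stated in the comment): each fan heater is about 3 kW
# and the supply is 230 V. The 20 heaters and 30 A ring are from the story.

def total_current(count, watts_each, volts):
    """Return the combined current in amps for `count` identical loads."""
    return count * watts_each / volts

HEATERS = 20
RING_AMPS = 30

per_heater = total_current(1, 3000, 230)
all_heaters = total_current(HEATERS, 3000, 230)
# How many heaters the ring could carry with nothing else plugged in.
max_heaters = int(RING_AMPS // per_heater)

print(f"per heater: {per_heater:.1f} A")
print(f"all {HEATERS} heaters: {all_heaters:.0f} A on a {RING_AMPS} A ring")
print(f"heaters an otherwise empty ring could carry: {max_heaters}")
```

Under those assumptions an empty 30A ring manages two heaters, and this one already had about 20 PCs on it.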
Ahh! Beancounters. Who, on their own initiative, rewrite your purchase order to better conserve the company's valuable money.
Even more fun, when said beancounters are in a different country and have absolutely no understanding of what you're doing or why you're doing it.
Yet more fun when the PO in question is time sensitive, and you discover that the beancounter who approves your department's POs dislikes doing so, and only does them on [day of week most distant from the day you need it]
Someone told me that their test department needed new servers and disks. They needed big disks because they supported many levels of software, and had to boot and test with them all.
About 20 machines were requested and approved.
The machines when they arrived had small disks, and smaller CPUs. The person in finance found a good deal - "smaller machines at half the original price" - so a "win" for the company.
The disks were not big enough, and the machines were too slow - it took 2 days to run all the tests instead of half a day.
The cost of the original machines was charged to the bean counter's department, to their embarrassment.
Eventually the right machines came - and boosted productivity!
Quantity surveyors are like that too
The city library where I went to high school was a 5 floor building - with insufficient strength to hold bookshelves on the upper 3 floors thanks to the intervention of one such animal who'd decided things were wildly overspecified for an office building.
It is pretty hard to break a modern database. There was a maddening stretch some years ago when a switch would reboot itself once a week, interrupting a SQL Server machine's connection to its storage. I ground my teeth a fair bit, and I did get tired of running dbcc checkdb. But we lost no data.
Why am I picturing a trolley full of equipment being dragged across a door strip, sticking, the puller giving a good yank.... and the whole pile tumbling crashing to the floor?
Also, why do people somehow think the best way to push something is from the point the furthest away from its contact with the floor, which is another way of saying "force it to fall over"?
Big red button stories are plentiful. My contribution is one of several that have occurred on offshore oil and gas production platforms. There are often Emergency Shut Down buttons at various locations around the installation. The Central Control Room (CCR) would normally be in full control of operations throughout (though the drill floor is often run separately as shutting that down could, under certain situations, turn an emergency into a disaster). All sensors (and there are lots) will show up on the central screens and there are well drilled procedures on how to respond should they deviate from "normal". However, there are conceivable situations where somebody elsewhere on the platform will know that the proverbial may soon hit the spinner before anything shows up in the CCR. For example, seeing a small fire (or even smoke) in the gas compression module isn't time to find a phone or radio to explain it to the CCR - you find the nearest ESD and push it. That will immediately trigger various valves to either shut in or open to vent all flammable gas in the production system straight to the flare (where it will burn in a far less hazardous place). One of the lessons from Piper Alpha.
Anyway, I recall one time when some new arrivals were being given their familiarisation tour of their installation and one of the ESD buttons was pointed out to them. "These will cause an immediate shut down of all production - know where the nearest one is to your work station but only press it if it's actually needed." "Press it like this?" asked one bright spark, as he lifted the flap and pressed the button. He experienced his first and very last offshore experience, as he was on the next helicopter back to shore (which wasn't much later as one of the responses to an ESD is to divert all available helicopters to the area to assist in any possible evacuation). Thereafter, the pre-tour briefing included the advice regarding the ESD buttons - with a BIG LETTER warning about not testing them (that's done as part of scheduled drills).
Oracle database eh?
I remember one time at a customer site where the 2-node Oracle database cluster (Sun kit) had failed completely. It turned out that the "fully redundant Sun hardware solution" did have a SPOF - the "independent" dual drive arrays were on a chassis sharing a passive backplane which had developed a fault that took out both arrays lol
Even more comedy developed when the Sun engineer arrived at the site entrance within the hour with a replacement chassis and was refused entry by security as his name was not on the list (typically access had to be applied for 48+ hours in advance). It took multiple frantic phone calls by customer staff to ever higher levels of management before the security guard was told by someone senior enough to "let him in or you're fired with immediate effect".
"let him in or you're fired with immediate effect".
That's a bit rough for someone doing their job and following proper procedures. Of course he had to wait for proper authorisation from someone high enough to give the ok. That's the whole point of security. More so when the person screaming down the phone probably set the procedure in the first place.
Used to manage a network for a national stockbrokers, and we had VMs set up so we could vmotion them between sites. An electrical contractor was finishing up some work in our Melbourne computer room and removing his gear. Unfortunately, he leaned his ladder on the wall next to the door, right on top of the electrical isolation switch which did not have a molly guard. We found out that day what equipment was not protected by UPS. Cue screaming from the trading floor that this was "costing millions". A shutdown for power redistribution work and molly guard installation was programmed shortly after.