And the lesson we learn today is:
Percussive maintenance should never be performed with a hammer drill.
As any BOFH will tell you.
Welcome once again to “Who, Me?”, the reader-contributed column in which we invite Reg readers to tell tales of the times when they got things very wrong. This week, meet a reader we’ll Regomize as “Slim” who told us of his time as head of IT in Asia for a global trading company – the sort of outfit that plays the markets and …
In that situation the lazy admin is the hammer drill, and the object to be drilled is the nearest large block of reinforced concrete. Drill to a depth of 2m, or until the drill bit has dismembered^H^H^H^H^H^H^H^Hintegrated and seal with quick-dry. Job done.
and as a follow-up:
Percussive maintenance should never be performed with a hammer drill... unless it is done during the removal of equipment for decommissioning. Where that equipment was processing sensitive data, you MUST always use a hammer drill.
It's also quite therapeutic to destroy old HDDs that way - with the added benefit of a lump hammer. We even raised some funds by charging a small fee to destroy a disk or two.
Yes, sensitive information on disks ripped from systems we were binning. Had a little maintenance area set aside for it, too, so it didn't disturb anyone, or anything of importance (I don't include rodents, obviously). Saved a fortune in disposal costs, too: No need to send those disks off to be wiped securely after they've been torn apart.
Shame that approach was halted and we had to pay for 'proper' disposal by sending intact disks off to a third party to be securely wiped and disposed of. At cost. Despite finding out the company 'lost' disks before they were wiped, and we've no idea if any of ours were amongst the missing disks 'cause we just got a receipt and certificate to say we'd handed over the disks and they'd been disposed of, before they'd been disposed of... although we did delete everything from the disks when decommissioning the servers, but as anyone who has had to recover 'deleted' data from disk knows: Data isn't truly lost until a hammer drill destroys the platter.
We had quite a number to dispose of. A coworker's wife had a pottery kiln and we just wanted to heat everything over the Curie temperature.
Went a bit overboard and melted them down into puddles while having a barbecue at their house. Enough witnesses signed a statement that the media had, indeed, been rendered unrecoverable.
There was a video a few years back (possibly even mentioned here on t'Register at the time) of a guy shouting loudly at a rack of HDDs, which then caused all the blinkenlights to go crazy.
https://www.youtube.com/watch?v=tDacjrSCeq4
When things are spinning at 7,200rpm - or even 10,000rpm for enterprise drives - it doesn't take too much to disrupt things!
I used to sell large-scale CCTV systems to casinos and the likes. As a tiny offshoot of a US globocorp, we were only allowed to sell Dell storage arrays, but I remember one customer insisting on using HP after watching this advert.
I asked him if he'd ever heard of active shooter incidents in UK datacentres, he said it didn't matter, he wanted to be ready if there was one...
I recall a former IT Manager who performed a similar exercise on a rack, making the holes bigger, at a client site.
I laughed my head off when I heard that this was his method of fitting an HP server into a rack.
He did have a tiny weeny bit of sense to shut down the other servers in the same rack before the drilling commenced.
The client's boss was not impressed and sent him packing when he found his staff were sitting at desks with hands over their ears, not doing any work.
The IT Manager didn't last long in our company after telling the company boss that he could not be sacked!
I remember a few occasions, where things went wrong in the server room.
One was a new AC unit being put in. The builders were let into the server room but couldn't find a free socket, so they just unplugged the first one they came across, which happened to be for the main microVAX running the site...
And the classic of the DEC engineer turning up to upgrade a VAX 11/780 with more memory. All jobs and users transferred to the next VAX in the line (computer room with 6 VAX units). Ops shut down the VAX and the console said it was safe to turn off the power, so the DEC engineer went behind the unit to the wall and threw the switch. Nothing happened, the console still said it was safe to turn off the VAX. The engineer poked his head out from behind the VAX... Then the screams started, from ops of the next VAX in the line, everything had suddenly gone dark!
New AC! Installed in the ceiling of a 4mx4m server room, directly above the racks while they were ON! I mentioned it to the Facilities Manager, and said "I hope someone is insured for this idiocy." Nothing bad happened, but the AC wasn't installed correctly the first time, the second time, but come the third try, a senior installation engineer was sent and found the issue in about 3 nano seconds... He was not happy... I'm sure there was a full round of retraining for the first two engineers!
FacePalms all around!
Rather tangential, but that reminds me of the time my home phone service failed (back when I still had a land line). I determined that the failure was on the telco's side of the demarc box, and so it was their problem to deal with. They duly sent someone out.
I don't recall the actual problem, but I sure recall what he said about it: "I don't know how this ever worked!"
It's also why I keep a few old, beige power cables around, from back in the day before everything turned black.
Those cables are reserved for my systems' case power supply sockets -- it's drilled into everyone's head that you can unplug any of the black or whatever cables, but you NEVER touch the beige ones.
So far, the only unplanned shutdowns any of those machines have had were when the power went out at the source.
Not vibration related, but just after the Sperry Univac engineers had moved a 'mainframe' (read a big minicomputer) into a brand new office at the borough council that was my first proper job, they found that they needed another hole in one of the floor tiles. Rather than taking the tile outside of the machine room to cut, they decided to drill (or was it maybe angle-grind) the tile inside the machine room (stupid idiots - they should have known better).
Well, not only did they spread metal dust inside the clean machine room, they also set off the smoke alarms that meant that the entire office building was evacuated.
They could not escape blame, because the machine room was a big glass observation area, with full length glass internal walls on two sides, and normal windows on a third, and it was clear what they did.
To say that the DP manager (this was before IT was in common use) was not pleased is an understatement, and he got all of the dust filters in the system changed at the engineer's expense, and got a disclaimer plus a promise of extended maintenance at no extra cost in case there were any additional consequences.
This whole column, and On Call as well, occupy the infernal panel of The Garden of Nerdy Delights.
Whose lowest circle is reserved for uppity AIs. When I went to double-check the name of the real Bosch work, my phone auto-incorrected "earthly" to "early". Then DDG's AI summary had the nerve to lecture me that "The correct name is The Garden of Earthly Delights."
Tell you what, AI. How about you tell that to my phone and leave me out of it, m'k?
We were doing a major migration and this involved decommissioning a lot of old kit. The last one standing was an Ultra 1 which ran the Trade Waste system. They eventually came up with a migration with a couple of days to spare till the deadline for vacating the old computer room. I didn’t have time to run my little script that writes /dev/urandom over each disk in turn, so I brought my hammer in and proceeded to reduce the disk platters to something less useful. It was highly therapeutic!
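For anyone who does have the time, a script of that sort might look roughly like the sketch below. It is only an illustration, not the original script: the function name `overwrite_with_random` and the chunk size are made up, it uses Python's `os.urandom` rather than reading /dev/urandom directly, and it assumes a regular file (a disk image, say), since a raw block device would need its size worked out some other way.

```python
import os


def overwrite_with_random(path, chunk_size=1024 * 1024):
    """Overwrite an existing regular file in place with random bytes.

    Hypothetical helper for illustration only. Returns the number of
    bytes written, which equals the file's original size.
    """
    size = os.path.getsize(path)  # assumption: regular file, not a block device
    written = 0
    with open(path, "r+b") as target:
        while written < size:
            # Write in chunks so large images don't need to fit in memory.
            n = min(chunk_size, size - written)
            target.write(os.urandom(n))
            written += n
        # Push the data out of the OS cache before declaring victory.
        target.flush()
        os.fsync(target.fileno())
    return written
```

One pass of random data is a long way short of a hammer, and on SSDs or anything with remapped sectors it may not touch every block, so treat it as a first step rather than a guarantee.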
Concrete saws are designed to cut through rebar and other reinforcing steel[0], so no, not surprising. They are rarely (never?) on the approved list.
Thermite is faster, quieter, often on the approved list, and more fun :-)
The Barrett is faster still, but much louder. Toss up on funner, but never (rarely?) on the approved list.
Industrial shredders are faster than thermite, can be louder longer than the Barrett, are not much fun, but on the approved list and thus most often used.
[0] Note that cutting through too much alumin(i)um will gum up the blade of a concrete saw. Gets spendy quick.
"even if the clean up is tedious."
Recently took some old drives apart. Decided to try and get the platters out to put into the 'Will be useful some day bin'. Tried to lever a platter out, then discovered it was glass, not aluminium :(. Just as well that I was wearing safety glasses! And that corner of the workshop needed vacuuming anyway :)
Also had to dispose of hundreds of CDs and DVDs with back-ups. Having them shredded turned out to be quite expensive. Cutting through a stack with a jigsaw wasn't a great success as the plastic melts and coats the sawblade. And probably not very safe. Eventually cut them up with tinsnips, that didn't take too long - quite therapeutic and meditative. Did that inside a clear plastic bag to catch any plastic splinters which were flying around.
Last time I had to decommission a drive (failing, so mostly to make it unusable because I couldn't be arsed to try to wipe it around its various sector errors), I put it on the ground and hit it really hard with the pointy end of a sledgehammer. Multiple times just to be sure.
It was quite cathartic. Picking up all the pieces, a little less so.
When I got up, there were a few emails from an old VM host used for dev&test guests. No problem, it is RAID5.
When I got into work, there was a bit more panic, as someone had been given a guest for a demonstration. And it was down. Odd.
Checking RAC logs, two disks had died within a couple of seconds. Really?
This was a 1U machine with 2.5" disks (*). Powering them on the bench, the one that died first sounded and vibrated like a tiny rock crusher. The second had *many* bad blocks in SMART data.
The two disks had been on top of each other, and the conclusion was that the first had failed with bad bearings, creating lots of vibration and taking out the second.
(*) Stacked like:
----
----
Two, pah. I had three SSDs fault over a two day period just a couple of months ago, on my home server with five disks in RAID-Z. Naturally I looked for explanations and found... sequential serial numbers from this batch.
Yes I knew the rule about buying from different batches, but try that when you're buying from an online retailer and see how you get on. End result? Three new disks purchased, three more replaced under warranty a few weeks later and thanks to ZFS there was some data loss but I knew exactly which files. Replaced disks, deleted and replaced the bad files from backup, zero downtime. ZFS is awesome.
I have deliberately run proper calibrated tests to vibrate a disk to the point of error (they were going into kit in a harsh and moving environment) and it takes a reasonable amount of proper vibration to get a drive to persistently error. I never managed to outright kill one.
It also takes a significant amount of energy to vibrate a big heavy rack - the actuator we mounted my test rig on was *big*.
A fixed in place old telecoms rack full of servers is a very heavy solid lump of metal, and what would inevitably have been a battery drill/driver is not going to meaningfully shake it - even a high impact SDS wouldn't. More likely it would be the operator doing the shaking as the much lighter wobbly fleshy part of the equation.
And having actually drilled holes into a rack with a battery drill 1) you wouldn't accidentally have it on hammer for more than a fraction of a second, and certainly not hard, and b) you'd soon work out it was a lot of noisy effort & find a lazier approach.
So not to cast doubt but based on science and hands-on experience if someone managed to kill their systems it wasn't via a drill.
Plus - as a final killer - racks are cheap. With all these purchase orders where was the plan to adapt or replace the rack? Anything I've ever seen which was incompatible wasn't ever going to be fixed with a drill.
> takes a reasonable amount of proper vibration to get a drive to persistently error
Bullshit. I had a Western Digital 500GB drive in one of those USB-3 external enclosures, next to my keyboard, plugged in and running. (back when these were a newfangled idea and EXPENSIVE)
I had my glasses sitting on the keyboard. They slid off and landed on the enclosure, and the drive immediately started clicking and giving filesystem errors in the log.
I stood there in shock for quite a while, unable to believe what I had just seen.
I unplugged it and plugged it back in, and all it did was click. I was never able to get the data off, even taking the drive out of the enclosure and attaching it as SATA.
I'm still pissed about it, years later.
They can survive that kind of drop just fine. In the few years I spent working at a PC builder I saw harddrives (and every other component) subjected to all kinds of drops, shocks, impacts, and intentional drop kicks, and a surprising number of them continued to work fine.
Or at least, they worked long enough to pass QC and get packaged up, we didn't care once they were on their way to the customer.
I learnt a lot there; like, did you know your motherboard doesn't need all those capacitors? One or two can break off and it'll still boot fine. Also, server fans spin fast enough to cut your finger down to the bone. SATA is 100% hot-swappable although your OS might disagree. RAM on the other hand is not.
You get to find out a lot when you have a very boring job, and an inquiring mind.
Interesting.
My first PC had a 40Mb Seagate MFM HDD (the one with a stepper motor).
It was so solid, you could slap it really, really hard, and it did not crash the heads at all. I tried all sorts to induce a head crash, but nope... just soldiered on.
Then, on the other hand, had a 500Gb external USB HDD. A slight bump, and it's dead. :(
Most telco racks are M5 or close imperial equivalents (2BA or 1/4 BSW)
As a junior I was tasked with rethreading a number of older racks. It was long before the days of cage nuts and most of the telco racks were made of heavy steel/aluminium U-channel
Being almost the same diameter it was a simple matter of using a tap, but 42U * 4 holes per U per side made it a job nobody wanted to do
Thankfully one of the managers realised this was a bullshit way of "saving money" telling the perpetrators of the orders to stop being stupid and replace the bloody racks "they're worth less than the labour, and don't use juniors as grunts" (same perpetrators had a tendency to have people doing stupid non-technical tasks rather than allowing anyone to be sitting and reading documentation or studying - can you tell it was a government department?)
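To put a number on the tapping job described above, here is a quick back-of-envelope sketch. The 42U and 4 holes per U per side come from the comment; the assumption that "per side" means two uprights per rack is mine, as is the function name.

```python
def holes_to_tap(rack_units=42, holes_per_unit_per_side=4, sides=2):
    """Count the holes to rethread in one rack.

    sides=2 is an assumption (left and right uprights on one face);
    the other two figures are taken from the comment.
    """
    return rack_units * holes_per_unit_per_side * sides


print(holes_to_tap())  # 336 holes per rack
```

Multiply that by "a number of older racks" and it is easy to see why nobody volunteered.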
Concur. But I had to try it.
I stuffed a couple of mid-90s servers, i486DX2WB CPUs, 8 megs of RAM, and drives into a 19" relay rack. Formatted the drives, FUJITSU M1606TAU, 1G, 5400rpm (I have a box full), and installed Slackware 3.0 with kernel 1.2.13. Set all three "servers" to compiling the kernel with the mods I used back then, the minimal RAM ensuring that swap would be used heavily.
And then I broke out the drills ... Even the lightest weight "hammer" drill, an older Makita LXPH01, promptly shattered the bit. Turns out that drill steel isn't built to be hammered into racks. Whodathunkit. Concrete (rock/stone) bits might survive, but they'd burn up before they made it even part way into a rack's 5/8" uprights.
So I changed bits and re-set the tool to the proper drill setting. Drilled half a dozen new holes on each side of the rack.
Note that I didn't bother covering the equipment, making this a worst case scenario. You might say I let the chips fall where they may.
That was several hours ago. The servers are still happily hammering the swap files, and there are no unexpected errors in the compilation logs. The HDDs are certainly not dead ... It'll be several more hours before they are done compiling.[0] I'll post here if anything odd happens.
[0] Kids these days have NO idea ...
I call BS on the story. Came just to ask about this type of thing, and also point out that according to the story, this wasn't even an in-use location for this company. So I'm confused about that. They hadn't installed and brought up any equipment yet BECAUSE the racks couldn't take their servers. How did the entire company go down while he was drilling in a rack that had none of their active equipment? This supposed downtime happened before the deadline to complete racking the equipment. They went live BEFORE he had even finished racking and the entire company became reliant on this new equipment that hadn't even been tested in place?
A little more info....
- Racks provisioned by the telco - could not be changed (they were those awful beige colour comms racks with trays)
- Drill was a Bosch 220v corded hammer drill - I know because mr lazy pants borrowed it from me (I still have it - great drill)
- We had earlier provisioned the racks (cut from this story by El Reg) and gone live before mr lazy pants joined.
- He was there to add another dozen or so servers to the rack that were already drilled and mounted.
- He had strict instructions to unrack and remove all servers before drilling during weekend time only.
- Rack currently held 12 x Dell R610 servers and 3 Dell R710 servers. All mirrors were 15k disks, RAID5 disks were 10K. (This is pre-SSD days)
- R610's: 24 x 15K disks R710's: 6 x 15K disks, 15 x 10K disks. Total 45 disks. All servers showed the dreaded RED on Solar Winds in under a minute. Every disk failed Seagate/WD tools test after this.
- The PDU's and Cisco switch survived.
- I still have the photo the data centre manager sent me.
And yet no one is blaming the rack manufacturers for creating incompatible kit. Computers have been full of standards & compatibility for decades - SATA, USB, IDE, Vesa local bus (just to prove I am old), etc.
Why can't equipment rack manufacturers agree on a standard mounting system and leave it at that? No, every one of them needs to push vendor lock in with incompatible rails & racks.
If they agreed on a standard, people could continue to use racks that were hundreds of years old! That means a lot less merchandise sold by rack mfgrs. They make more money by trying to lock customers into proprietary setups and by making equipment their competitors can't use. The more it costs to switch providers, the less likely they will do so. But we know all this. :)
> racks that were hundreds of years old! That means a lot less merchandise sold by rack mfgrs.
> The modern 19" rack is from AT&T and that itself comes from the telecoms relay rack before that.
I suspect the 19" rack is from Strowger, or Western Electric. AT&T owned patents, even claimed to invent stuff, but there's little credit for "a mere rack". Who knew it would get taken by radio, TV, radar, instrumentation, and even gunnery and payroll machines?
There really are reasons for different racks and different makers. Actual relay racks are in secure spaces and like to be totally open. Other types need more stability and even security. Sometimes blue paint is the most important detail. Historically it has been easier to build a rack-factory than to build a market segment. Broadcast makers had to have the metal-whacking machines for quick model changes. Inventorying complete racks takes a huge warehouse, so keep a shallow inventory and be ready to whomp-up more as ordered. (Yes, utility racks store knock-down, but other types will be welded and finished complete.)
Relay racks can't go back much before 1880, <150 years, but I grant your point because I have worked in racks that looked medieval.
Sadly, this seems to be the world we live in. Sensible compatibility for otherwise trivial equipment and stuff seems to hurt some C-Suite types.
Case in point: exam boards stipulate special booklets for work written outside the question/answer paper, like if they need extra space. It's just a paper booklet, laid out according to how the publishers have chosen to design it. There is no sensible reason why these cannot be standardised (or just use bloody A4 lined, supplied by the school to make sure there's no messing about, not that official booklets can't be got hold of easily enough). The official booklets all look pretty similar, and all contain the same spaces for the same student information, albeit positioned differently. But they won't. So if there are three exams from three different boards in the exam room (quite common) there need to be three piles of booklets. And the invigilators have to make sure that they give out the correct one if a kid asks for some. If they don't get it right there has to be a report to the exam board and lots of admin. But the reality is that it doesn't make the blindest bit of difference. Just that each publisher wants to make sure they get their share of the money, without having to lose their little mini-monopoly and compete for the sales.
Telco racks have been solid rails drilled at M5 for 60+ years. Computer racks went through a bunch of changes depending on the vendor
Square holes and cage nuts allow people to pick their poison, but you only usually find them in CABINETS or 4-post racks
You can solve the rail and arm issue by using generic slide drawers with arms. They're generally about the same cost as buying the vendor's own stuff but the advantages usually only become obvious when systems are being changed out regularly (among other things, you aren't tied to using kit that's rack-mountable anymore, handy if you need to house a dozen RasPi units in the datacentre for some bizarre reason (I have, it's a long story))
Does this remind you of your own experience when doing code review?
I had a young team member. He is a bright guy and does know how to do his assignment. But when writing code, tomorrow doesn't exist for him.
He chooses the most meaningless variable names and takes the shortest possible path (read: hard to maintain!). And of course no comments, no docs. Every time I review his code, the review (and the time!) is longer than the code.
Of course he understands the value of nice, well-maintained code, because he needs to read other people's code too. But somehow, he doesn't see code quality as a value. As long as it works, that's all that matters.
We also got a mini nightmare out of a "time saving" habit that spared him maybe 15 minutes of coding carefully. It somehow escaped code review and was deployed everywhere. Then a subtle bug appeared a month later. It took many weeks to fix and schedule redeployments.
Although this guy is a special case, I notice that the tendency to do a quick job shows up very frequently in my code reviews.
Sometimes it makes no sense. You'd think after the third or fourth (heck, it should be the first, but let's cut some slack) code review that's essentially "this works fine but fix the indentation" that they'd check the indentation before submitting future ones, but no.
Place I worked at in the 80s was losing Unix servers to thieving thievy thieverizing thieves who thought they were PCs.
So the higher-ups put some contractors to work securing the servers.
What this *should* have looked like was mounting each server in a hefty cage that was securely fastened to the scenery, the whole built to withstand a damn good crowbarring.
What it *actually* looked like was some idiot drilling through the case with a long electrician's bit and running a security cable through the holes in the case and around some more-or-less sturdy thing in the office.
All the "secured" servers ended up dead and having to be replaced.
Some because the mucho-expensive Winchester hard drives had been drilled through (upside: no-one was going to steal them), some because the motherboards had been perforated (each 'security installation specialist' apparently had their own ideas and techniques for exactly how to drill the holes for maximum security), some just because when powered up the metal flakes scattered all over the naked electronics by all this drilling (which by some miracle had missed every vital component) turned out to be Notte An Goode Thynge.
This one was at work but all on me. I often had hammers at work, from half-pound tappers to my father-in-law's short-handle 3-lb sledge for pounding-out car ball joints. (May have been using a star-drill for wall anchors.... this was some time before affordable 'lectric hammer-drills.)
But for some reason I had the 10-pound long-handle sledge at work. When I took it home I set it on the passenger-side seat. WHACK! The head slid off the plush velour to the floor, the handle levered-up and smacked the windshield. Foot-long crack in the glass, grew edge-to-edge by the time I got home. Told my boss I needed the next afternoon off to get a new windshield. (Didn't take that long but why rush?)
When the first-gen AOLNet dial racks were built, they were all specced with these giant, honking 6-rack-unit UPSes at the bottom, because there were a lot of install locations where we couldn't rely on constant, clean power.
A few years in, all of those UPSes were filled with aging batteries that had to be replaced. So maintenance started a battery-refreshing tour. The UPSes were built as drawers, there was a nice handle on the front you could pull to slide it open and expose the batteries for swapping.
Unfortunately, between the vibration of air-shipping and installing the cabinets, their age, the impressive weight of those UPSes, and the fact that they'd never been opened once since the racks were built, many of the drawer mechanisms were... not in the best of shape. All of the maintenance people ended up a little bit haunted by the memory of pulling open those drawers, an act that was far too frequently followed by the sound of dozens of ball bearings clattering onto the floor of the data center like a rainstorm.
(We left the UPSes out of future iterations of the design, deeming them more trouble than they were worth. Redundancy was observed to be the more effective insurance against downtime: If the power went down to an installation, customers could just dial in to a different one. That's why AOL had at least 2 or 3 local phone numbers in most areas.)
This hammer drill user was phoning it in.
For real vibration dealt to the workpiece one simply cannot beat a sawzall (or tiger saw if you go Porter Cable like me) fitted with a coarse "cut anything" blade.
Such is the earthmoving vibration produced that I once used one to cut out a burglar-proof door frame (metal inserts in frame, frame screwed into the house framing so securely no screwdriver would work; if it hadn't been an excuse to buy the tool I'd have been very disheartened by the prospect).
Halfway through the first cut one side of the frame disengaged from the house and hung on the saw for a fraction of a second, ripping out the top and other side. Then the entire frame was hurled into the back yard, triggering manly howls of triumph and leatherface-like saw brandishing from yours truly.
Never had to drill holes in a server rack, but I remember a few years ago working for a Uni. We had a switch in one of our patch panels that was refusing to connect, and just displaying an error light. It *was* end of life, and was eventually replaced, but to get it up and running again, I thought I'd give it a quick reboot. I did, eventually, and it came back up, but not before I'd accidentally unplugged the wrong switch, taking out the connection to two computer labs that were both full, and had lectures booked.
Luckily I was able to get the switch back up and running before any users complained.