What an smf of a manager... I'm glad the days of "On Call" are behind me...
Here's the Beer your manager owes you! Clever clogs...
Welcome to On Call, The Register's regular foray into the increasingly unreliable memories of those who have to pick up the phone when everything is on fire. Today's hot and steamy tale comes from "Chandan", who was keen to make sure we knew he worked within the hallowed halls of "Sales", where we assume expense accounts are …
Our company, a large brewery, had a "President's Award" for this kind of thing - a $1K check and a restaurant gift certificate for two. This was about 20 years ago. Our VP was rather free with handing them out for this sort of thing, so I got a couple. One was for a situation similar to this, where I had to be onsite for about 24 hours. He also told me to take the day off.
I don't guess anyone would be surprised to find out that this was the place everyone in the area wanted to work.
That’s the way to say thank you! Great company and managers. Every manager should have the power to give extra perks when someone sacrifices their time and family life to go the extra km. When you are given some money, then you really know that your company appreciates what you have done (applicable to peons only).
Empty promises and indifference to employees’ efforts erode morale more than anything else.
I didn't volunteer sir! Everyone else took one pace back.
To quote Spike Milligan. Though I doubt that particular bit of his output was original... As Heinlein said, "come on you apes" was probably said by Caesar's centurions too. Military humour is required in order to cope with military life.
I expect you can go back further than Rome: Alexander's hoplites, Assyria, Ancient Egypt and the Sumerians of 3,000BC left us art showing identically equipped spearmen en masse. The Epic of Gilgamesh has 'motivate the troops' parts. Gilgamesh was made king as the citizens wanted an Elite warlord. Part of that was to motivate the troops.
while working in 3rd line engineering (so no Sales excuse for me) for a large organisation. One of our DCs was on the ground floor of the building, facing onto a main road with opaque windows that could be, but obviously shouldn't be, opened (at some point it had been an office). One day there was a panicked call that our otherwise very efficient sea-water cooled (we were by the harbour) air-con system had packed up and temps were rising alarmingly fast. With no ability to get the air-con going again (IIRC the inlet pipe had been blocked it later transpired), we ended up opening all the near-floor-to-ceiling-just-wide-enough-you-could-walk-through windows facing onto the street and deploying whatever desk fans we could borrow/commandeer/steal from the offices above to get air circulating. Obviously having all the windows open meant that we had to stay in the DC to ensure nobody thought to sneak in. At least it wasn't winter!
In almost every company I have previously worked for the biggest issue encountered is with air-con in the server room (either our own or that of a supplier that we relied on).
Now I always insist on dual AC with both as independent as possible (even, where possible, on separate electric supplies) and able to take the full load each (and have the switch set to allow turn on after power loss). Then a couple of the big ground fans left in the server room (and not taken out for use elsewhere on a hot summer day). Then some way of extracting air out and bringing air in to the room securely. Sometimes this was just a chain and padlock that allowed external doors to be opened slightly, other times it was knocking a hole in the wall and installing a couple of vents that could be opened when needed.
The risk of a single aircon unit will, given enough time without replacement, cause you big issues.
However for a small server room, unless you go for an expensive purpose built rack system, the options are poor - especially the control panels. They are designed as a home office system and giving you proper alerting, snmp, dual unit monitoring, even current input temperature isn't available for a reasonable cost outside a BMS system with options from companies like Mitsubishi (not that I've found anyway).
If you're a home office or small-to-medium business, I highly suggest getting those water-cooling blocks usually used for GPU cards and putting them on every spinning hard disk and SSD drive, every network card and CPU backplane card you can fit them on.
In our off-site offices, it is ridiculous to spend $100,000+ US for custom air-conditioning gear when $3000 worth of bulk-buy water cooling blocks can be retro-fitted onto every hard drive, network and GPU card drive and CPU backplane card we've installed in the racks. From there, we just buy six of the 1600 Watt air con units from Costco and we can supercool a 10 metre by 5 metre rack room in mere minutes to acceptable temps (18 degrees Celsius) all for less than $10,000 US.
Since each 2 RU blade server is doing maybe 500 watts and in total we got six of them for each of the local offices, we only need to remove 9,000 to 12,000 watts of thermal energy total with those recycling water blocks, six air con units and some cheap room fans to get the job done.
Yes that will also work! The KEY ISSUE is to actually do some MATH and figure out how much wattage you are pulling from the mains on a continuous basis by ALL your computer gear. Not just your computers but also your UPSs, the routers and switches, the network NAS appliances. i.e. Count EVERYTHING in your room! Then ASSUME that is the amount of wattage worth of thermal energy you need to REMOVE from said room!
If you're pulling 10,000 Watts always ASSUME you need at least that amount of thermal pulling power from your air conditioners and ADD 10% to 20% as a safety factor. Get enough air conditioners that can PULL the X-required amount of watts or BTU from said confined server room space!
You don't have to spend a fortune BUT you DO need to do the MATH and get it right so you KNOW how much air conditioning you need!
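For anyone who wants the MATH spelled out, here is a minimal sketch of that sizing rule. The inventory figures, the 1500 W per-unit cooling capacity and the function names are all made up for illustration; only the "add everything up, then add 10% to 20%" rule comes from the post above. The watts-to-BTU/h factor (roughly 3.412) is the standard conversion.

```python
import math

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of heat is roughly 3.412 BTU/h


def required_cooling_watts(loads_watts, safety_factor=0.2):
    """Sum the continuous draw of everything in the room and add a safety margin."""
    total_draw = sum(loads_watts.values())
    return total_draw * (1 + safety_factor)


def units_needed(required_watts, unit_cooling_capacity_watts):
    """Smallest whole number of air-con units whose combined capacity covers the load."""
    return math.ceil(required_watts / unit_cooling_capacity_watts)


# Hypothetical inventory: count EVERYTHING, not just the servers.
room_loads = {
    "servers": 6 * 500,
    "ups_losses": 300,
    "switches_and_routers": 400,
    "nas": 300,
}

needed_watts = required_cooling_watts(room_loads, safety_factor=0.2)
needed_btu_per_hour = needed_watts * WATTS_TO_BTU_PER_HOUR
unit_count = units_needed(needed_watts, unit_cooling_capacity_watts=1500)

print(f"Heat to remove: {needed_watts:.0f} W ({needed_btu_per_hour:.0f} BTU/h)")
print(f"Air-con units needed at 1500 W cooling each: {unit_count}")
```

Note that the rating printed on a cheap unit may be its electrical draw rather than its cooling capacity, so check which figure you are plugging in.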
Right up until the equipment changes (gets added to, or rearranged for prettiness without regard to airflow) and the air conditioning gets left out of the considerations.
Everything works fine - was originally specced with a redundant unit running in parallel - until one of the units farts its last, and it turns out a single unit can't handle the full load.
Management will mangle...
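That failure mode is easy to check for on paper. A rough sketch, assuming you know each unit's cooling capacity and the current heat load (all numbers and the function name here are hypothetical): drop the biggest unit and see whether what is left still covers the room. Rerun it every time kit gets added.

```python
def survives_unit_failure(unit_capacities_watts, heat_load_watts, failures=1):
    """Return True if the load is still covered after the largest `failures` units die."""
    surviving_count = max(0, len(unit_capacities_watts) - failures)
    # Sort ascending and keep the smallest units, i.e. assume the biggest ones fail.
    surviving_units = sorted(unit_capacities_watts)[:surviving_count]
    return sum(surviving_units) >= heat_load_watts


# As originally specced: two units, either of which can carry the room alone.
print(survives_unit_failure([5000, 5000], heat_load_watts=4000))

# After a few years of "just one more rack": both units together still cope,
# but the redundancy has quietly gone.
print(survives_unit_failure([5000, 5000], heat_load_watts=7000))
```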
"Right up until the equipment changes (gets added to, or rearranged for prettyness without regard to airflow) and the air conditioning gets left out of the considerations."
Which is why ass-clenchingly high paranoiac levels of power and environmental monitoring are ALSO required.
Get it on the way in, because once the kit's up and running you'll spend decades arguing to get that level of monitoring to know what's likely to overload where and when - until about 15 minutes after the shit hits the fan in a massive way, at which point you'll get the blame for not having emphasised the scale of the problem being of such criticality to the business. (You can turn this into a "How much money do you need to fix this?" and a "How much money do we need to avoid recurrences?" moment)
If you're a home office or small-to-medium business, I highly suggest getting those water-cooling blocks usually used for GPU cards and putting them on every spinning hard disk and SSD drive, every network card and CPU backplane card you can fit them on.
At home, I've often thought of grabbing some of them blocks and running lines from the header tank for the hot water heater (and yes, it does heat hot water - anything below 75°). 2 lines per room (in and out), with some taps off them for each machine. On the plus side it reduces the cooling requirements and also adds a little pre-heating to the header tank (so the system will pay for itself - maybe in 20 or 30 years!) and if you use 4mm and 13mm irrigation pipes for most of it it's really quite cheap. Downside is you're limited in where you can put your machines, have to have pipes running around, and if you sit something on a pipe you crimp it and eventually overheat, or somehow dislodge it and you have water everywhere plus an expensive mess if the water gets inside the machine.
Still, I have a noisy fan on the main switch in a cupboard not far from the header tank, maybe that can be a test case :)
--> Closest we have to water, and it's also the patron saint of stupid ideas!
Seen that done wrong.
Technical people were as usual kept well away from any planning
Since the company had a subdivision delivering aircon, the project leader basically ordered the largest units he could fit two of. And placed them in the ceiling somewhere.
Spring comes, and we're moving in. all sorts of cables have been run nicely to the racks. The racks have been nicely bolted to the floor.
IT comes carrying the servers, and notice the aircon units are set to pull air from the front of the racks, and blow over the tops of the racks to the rear.
Not ideal, but can't be that bad, right?
That summer, we had a heatwave. And the massive stand alone UPS units shut down from heat. I seem to recall some of the legacy kit didn't come back up after. (And that would be the silver lining of this story.)
This one wasn't on my shift, but in my wife's lab, where they do climate modelling pretty much 24/7 on their on-premises servers. The air con died during a scorching summer period 2 years ago, quickly raising room temperature to high 50's. Solution was similar, although not that effective (as the outside temperature was over 40ºC), but the air con on the adjacent room was turned to its lowest, doors wide open and some ventilators strategically located to increase airflow between rooms. Instead of having one room at hellish temperature, they got two at purgatory levels.
When finally maintenance came to fix it, their only (immediate) solution was to put portable air con units as the fixed ones had really died the death.
" The air con died during a scorching summer period 2 years ago, quickly raising room temperature to high 50's. "
Won't happen in my server room.
Mainly because there's a crowbar dropped on the power rails if the room temperature goes over 35C
Most of the kit is now set up to monitor its IPMI and, if the inlet temperature goes high, will power down before that point anyway (dumb little cron jobs)
It's easier to recover from a power outage than dealing with cascades of failures resulting from a mega heat excursion.
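For the curious, a "dumb little cron job" along those lines might look something like the sketch below. This is a guess at the shape of it, not the poster's actual script: the 32C threshold, the sensor name containing "inlet" and the exact column layout of the `ipmitool sdr type Temperature` output are all assumptions that vary by vendor, so check what your own BMC reports first.

```python
import re
import subprocess

POWER_DOWN_THRESHOLD_C = 32.0  # assumption: somewhere below the 35C crowbar point


def parse_inlet_temp_c(sdr_output):
    """Return the first inlet temperature (degrees C) found in the sensor listing, or None."""
    for line in sdr_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Assumed layout: "Inlet Temp | 04h | ok | 7.1 | 23 degrees C"
        if "inlet" not in fields[0].lower():
            continue
        match = re.search(r"(-?\d+(?:\.\d+)?)\s*degrees C", fields[-1])
        if match:
            return float(match.group(1))
    return None


def should_power_down(temp_c, threshold_c=POWER_DOWN_THRESHOLD_C):
    """Power down only when we have a reading and it is at or above the threshold."""
    return temp_c is not None and temp_c >= threshold_c


def check_and_power_down():
    """Entry point for cron: read the sensors and halt the box if the inlet is too hot."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Temperature"],
        capture_output=True, text=True, check=True,
    )
    if should_power_down(parse_inlet_temp_c(result.stdout)):
        subprocess.run(["shutdown", "-h", "now"], check=True)
```

A missing reading is deliberately treated as "do nothing" here; whether a silent sensor should instead count as an emergency is a judgment call for whoever owns the room.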
Except if someone's multi day modelling run is mid run when the system powers down you have failed in your job of keeping the stuff going 24/7.
My youngest is in Bioinformatics and can do runs lasting two days on the server. She has the admins sweet so they know not to kill her processes.
Academic stuff is not like commercial uses. Think bleeding edge. When I think of some of the specialist tech I've had my hands on. Back in my electron microscope days the glass for ultramicrotome knives had to come by sea from Sweden (we were in NZ) so needed to be ordered 3 months in advance. The technicians, if the scopes fell over, were 400km away. I still have the sweatshirt, we were the world's Southern Most EM Unit. I think there's one on the Ice now.
Out of the 4 technical roles I've left previous employers from, 2 started off as sales.
With regards to air-con, at least it came back on and didn't then proceed to pour lukewarm condensate all over the floor in an ever increasing torrent. Thankfully we'd had the good sense to make sure no servers were directly under the unit and that the racks kept everything at least 6 inches off the floor but still.. I strangely do not miss the place.
(icon because closest thing to swimming goggles)
in a previous job we made the mistake of trusting the aircon company to properly deal with a decommissioned ceiling vent, which happened to have a rack underneath it (there wasn't anywhere else in the room for it to go).
The Monday after this happened we came in to find the carpet around the rack soaked, and the one server in the rack full of water. Which led me to try to invoke our support agreement at HP (they would send us the parts we needed to recover from an emergency - in this case it was just disks needed). Which led to what seemed like hours of trying to get the support person to break out of their script - it was certainly long enough for me to end up putting them on loud speaker and random people wander in to listen. It went something like this:
Me: *explains problem, and that we need to invoke our agreement and need this many disks of this size*
Them: "What happens when you turn it on?"
Me: "It's full of water - that's not happening."
Them: "I understand. But what happens when you turn it on? Does it beep?"
Me: "It's full of water, I'm not turning it on"
Them: "I understand this. But if you turn it on, what happens?"
Me: "Electricity and water don't mix well. I'm not turning it on"
Them: "OK. I understand you don't want to risk turning it on, but what noise does it make when you turn it on?"
At around this point one of the people listening in muttered, I think just loud enough to be heard by the support person, "If you want to find out you come and turn it on"
Shortly after the support person did understand that I wasn't going to turn it on, I wasn't interested in trying to fix the server and that I just wanted disks to start the rebuild of the server on other hardware.
Standard office towers are not your hosting friend.
A few years ago.... office a/c units on top of the building.... no-one clears the previous year's leafy detritus out of the a/c water drains.... hot days... a/c water drains back-up... no problem!... backup plan is there is a catchment pan under the a/c units... it has a drain as well... which someone has attached a hose pipe to and creatively routed it down the nearby lift shaft head (no kidding) and into the office kitchen sink drain on the top office level below. All good except the cleaners wonder what the **** this hose pipe is under the kitchen sink and move it out of the way!
Water flows down pipe onto floor and down to the floor below which has a server rack (ok single tower) just in the right place to catch it!
1. If you have equipment in an office environment always know what is above (kitchens etc.).
2. If someone is stupid enough to put equipment in these environments just put a water pan and diverter pipe on top of the rack ( seriously I recommend this to clients now when assessing things... so you could then pass it on to the floor below...).
3. Ask the office owners for their a/c maintenance plan that includes annual environmental maintenance.
I used to work for a large cheese wholesaler.
One very hot summer a lorry delivering 20 tonnes of brie arrived, I could tell as soon as it backed into the warehouse that its chiller was off, as I opened the back doors the smell hit me, the temperature in the truck after driving from France was almost 60’c.
We basically had 20 tonnes of fondue but no bread to take advantage, so I resealed the doors and sent him back to France...
That was my thought. Open doors, see wall of brie flowing towards you. Man drowns in freak cheese accident. Frantic workers tried to save him, but didn't have enough bread or crackers.
But.. I wonder if the OP saved contact details? I have a plan to survive the apocalypse in Mileștii Mici, so a good supply of cheese and crackers would make that even more agreeable.
ps.. soo.. why was the on-call engineer 3hrs away? A sharp sales person could have finessed that into at least a case of beer.
I was visiting a small city in the US to upgrade some software about 30 years ago. We could not work on Friday because the "old" tape drives from IBM were being replaced with some nice shiny ones from another supplier. At 4pm the engineer departed saying "see you on Monday". At 5pm they started the weekly payroll run. This had to run or else the city employees did not get paid, and so the buses did not work etc and the city stopped.
At 17:05 there was a problem in that the "tapes would not load". At 17:30 they still were not working so they called the support number for the new tapes. "Sorry, they have officially not been installed yet. We'll send someone out on Monday". They phoned the IT manager who was not in: "he's out driving his new car". Was there a shady deal, new tape drive -> new car?
Meanwhile the IBM engineer who was doing paper work in his cave, quietly came out, took the covers off the competitors tape drives and fixed it in about 15 minutes. He came past our desk and just said, "they are working now".
On Monday there was a crate of beer hidden in the cave. No one could shout about the good deed the IBM hardware engineer had done, because it was not IBM kit.
Next time I came to visit there was a new IT manager.
It feels like the IT manager would have been replaced because it would have been him that signed off the idea of updating a critical system on (1) Friday (2) payroll day.
Friday evening after payroll is completed? Less bad.
Another Friday that wasn't payroll day? Less bad.
Any day of the work week that isn't a Friday? Much better.
The number of times I've had to patiently explain to senior management that nothing, repeat nothing, should ever be launched on a Friday. (Usually explaining that they will have to respond if anything goes wrong and/or there'll be a whole weekends worth of irate customers to deal with, works).
Of course any sensible tech support manager would have run the tape systems in parallel for at least 4 weeks gradually migrating processes to the new devices. This would normally involve testing older not recently accessed tapes as drive heads can drift and this can lead to a tape only being readable on the older device.
Of course any sensible tech support manager would have run the tape systems in parallel for at least 4 weeks gradually migrating processes to the new devices.
But this was a government site, wasn't it?
Government sites don't tend to have the kind of provisioning that allow for this extravagant style of migration. Either there isn't enough floor space or there aren't enough power outlets or the contracts aren't funded to overlap in that way.
Been there, done that.
Had similar as a semiconductor process engineer working at customer with a 2-vendor policy.
Our tools running fine, but competitor (the 2nd vendor) tool causing them all sorts of issues. Their support was "leaving something to be desired" (to be diplomatic), so as customer guys were good to us and I had time I quietly asked them to explain the issue they were having.
Half an hour of chat later, some "suggestions" given as to what they could try and the following day came in to smiles and gratitude as the line was up again.
A great way to reinforce the "we're all a team working toward the same target" mentality with the customer, not to mention severe kudos with them and got a good free dinner as well.
I have quite a bit of time for some IBM people. Years ago I bought one of their thin clients on eBay and was trying to get linux running on it. I was having trouble getting it loaded. I thought I would take a punt and email the IBM service folks. I actually told them that I got it second hand, and was just playing around with it, so it wasn't critical. After being asked some questions about the problem, I was given an email address to which I was supposed to send detailed information, which I did. It turns out it was the person who wrote the bootloader for it. I got it running.
Actually service from IBM just over 10 years ago for servers was spot on (24x7 contract 4 hour response), had several blades fibre cards die after losing power. Moved some cards to the important blades so core systems and reconfigured the SAN.
Logged a call on a Sunday, wasn't asked whether the firmware was up to date, and 3 hours later a black cab arrived with the replacements. These days you're lucky to get a call back within 4 hours.
Actually service from IBM just over 10 years ago for servers was spot on
Going back over 30 years I installed one of the first PS/2 Model 60 servers in the UK. When the network card (StarLAN IIRC) failed shortly after there wasn't one anywhere in the UK. Within 48 hours IBM Greenock had completely fabricated a replacement and delivered it to site.
As we were a college, and this was a student use system, we couldn't claim it was a critical issue but they responded as if it was.
IBM used to have the reputation of selling over priced hardware but with superb maintenance. If your mainframe failed because it was under water they would air mail out a new mainframe and get you time on another mainframe. Once, there was a problem with a disk pack array that could tip over if all of the drawers were pulled out simultaneously. They airmailed out lead bricks to be attached to the back of the units.
"They airmailed out lead bricks to be attached to the back of the units."
This is something I never expected to read! :D
It would be the source of said lead bricks that would be the more... disturbing...
I hear that when a certain IBM manager heard of the design flaw that made these units so dangerous, he became so angry he started sh........
The IBM of olde had a LOT of pride in their kit, their company, their customer support and encouraging people who were taking steps into computing hardware to discover the products - various folk in the company had my back when I had to take steps to kick the ISP part out of one of the larger IRC networks for a prolonged period because of chronic network abuse coming out of Asia (cyber warfare and script kiddies are nothing new). It was the support of those staffers which enabled those bans to stick AND to force IBM to make policy changes about the way they owned handling of abuse originating inside their networks instead of disowning what their customers were up to.
The attitude coming from those parts of the company made it clear that the days of the "IBM of olde" were numbered and whilst Marketing and Sales were eventually spanked that time (quite hard - at least one of the managers involved was "removed", only to show up at another ISP shortly afterwards) it wasn't long before the tail finally went into full wag the dog mode and the company lost a lot of very good people.
Our IT insists on pushing out Windows Updates on Fridays.
I don't blame him.
I spent some 4 hours of precious SMGT [1] this morning watching the dreaded-yet-common "Installing updates... Updates failed, reverting... installing updates... updates failed reverting..." [2]
So if your company doesn't work over the weekend, Fridays would be the best time to do the installs so you have some (albeit slim) chance of them being done before Monday arrives.
[1] Sunday Morning Game Time - an exceptionally precious very limited resource worth more than its weight in blood (either of opponents or those who cause you to miss it)
[2] My fault for not reading up on the recent CERT issue (which doesn't affect W7), and deciding to quickly install a couple of others while I was there.
--> There's someone I'd like to meet at MS. Some manager who let the update system be such a mess (perhaps related to issues with the file system). I have a nice little "house warming" gift for them.
Well, it was a bunch or two of years ago and I got a text message from the sensor in the computer room at the office to say that the temperature was heading past "comfy" into "pleasant" for the hardware there ensconced, which included public-facing web sites (those were the days). The A/C just wasn't up to the task for the number of servers in its small space.
Since a recent restructure, and as Software not Hardware, I had given up access to the computer room door-code, however I was the only person within an hour's drive. And yes, I didn't have remote access to any of the servers to power anything down remotely.
I drove into work and attempted to try all the possible combinations that the door used to have, Nah.
Obviously as the server room was *so* well secured, it still participated in the same suspended ceiling as the rest of the office. So I pushed some filing cabinets together and climbed up them, lifted tiles out of the way and, yes, could get into the space above the server room... I pulled some more tiles off and could see over into the warm, rack-filled, "large cupboard" masquerading as a server room. Closing my eyes I dropped into the room... Because of the lack of space, I couldn't drop easily to the floor or even risk flopping over and so I dropped down on straight legs and the jar on my spine seemed like it was going to push it up through my skull. Now in agony I stumbled to the door which luckily opened easily from the inside because I wasn't going to climb out of there in my new state. Luckily the main office was a number of degrees cooler than the server room, so I grabbed a large portableish fan and propped it in the door. (*)
I then identified as many servers as I could and ascertained a number I could "safely" power down rudely...
With the room temperature down to serviceable levels, I then rang the missus to drive me to the inappropriately named "walk-in-centre" at the local hospital and promptly took the next few days off work in a codeine-induced haze.
Someone did say "thank you".
(*) I've seen enough TV shows where people go through a door and it locks behind them. Always wedge a door open. Always.
...and I've BEEN one of those people who has fallen through a false ceiling. Somebody laid rolled fiberglass insulation across the mezzanine floor and onto the ceiling leaving no indication of where one ended and the other began. 12' unexpected drop.
Fortunately for me, and unfortunately for them, there was a bundle of 25-pair trunk cables looped through that attic and I was just quick enough to grab hold and swing on them like Tarzan on a vine as I came down and survived without a scratch. The cables, however, were a bit stretched and every one of them now had broken pairs.
As for the hot pseudo-DC, I've reached across the wall, lifted the panels, and placed 20" box fans sucking the hot air out and into the plenum, letting it draw cool air in from under the door. Not very efficient, but enough to keep the heat rise down until someone could get there with the door codes.
I got lots of temperature warnings from monitoring at 3am and assumed the aircon had failed, as it had been acting up the week before. Unfortunately, when I tried to call building maintenance/security to get them to meet me there and unlock, I found out the building was on fire...
"The temperature had reached the point where the in-room air conditioner simply threw up its hands in despair and refused to turn on."
Really ? I've never seen this in any DC ... Fortunately as I've seen plenty of over-temp conditions, one was recently at 65 degrees C !
"'I owe you a beer' from the service desk manager," recalled Chandan... "which never materialised."
Makes me remember I spent one full afternoon, years ago, to rescue very important pictures from one dead laptop HD. I had to try to manually copy every one of those via an xterm on Linux, since Windows would just do nothing.
The dude promised me a full pack of Duvel beers. I got it eventually 2 years later :)
I set up my friend's parent's WiFi for them - back when that was not as easy as get router from ISP. But it's not like it was hard. The most difficulty I had was getting the login info out of him, then BT "tech support". I was asked how he could repay my kindness and I said my usual "fee" was a bottle of wine or some chocolate.
He went out to buy the joint for Sunday lunch, I guess also part of my payment, and came back and presented me with 3 bottles of wine. Which was nice. Overkill I thought.
Until I opened the door to a delivery the next morning, and it was a case from the local wine merchant. Rather better than the previous day's supermarket offering too (not that I was complaining as that was perfectly pleasant). At which point I was a bit embarrassed that I'd been over-paid. I suppose he was a consultant who worked from home, and I got about what he'd have paid if he'd hired someone professionally through the business - but then those weren't the terms, given that I'm not a professional at this - though I do fix the pooters at work, but only because there are 6 of us in the company and I do everything from technical sales to the VAT returns.
I used to work for a large VAR in the UK, and walked into the server room early one day to be met by a wall of "superheated" air.
Normally it was a "fleece needed" room if you were in there for any length of time.
It was behind two doors from the main tech area, one with a limited access lock on the door, so they both got wedged open and all the stolen fans from the sales area set up to blow the hot air out.
We managed to keep it cool enough until the aircon was repaired later in the day - shutting down was not an option as we were turning over in excess of £1 million a day in software sales (Yay the end of budget splurges in the UK!) and one server was getting backed up every hour!
First time I had been in an environment that dried my eyeballs in seconds, had to go to Phoenix, AZ and Death Valley to have that experience again.
the aircon was repaired later in the day
Lucky you ! A few years ago we had a client with their own server room, nicely air conditioned. Needless to say, the A/C failed so they did the usual "open all the doors, use a fan" trick to keep things running.
When the A/C "engineer" turned up, he denied absolutely that the A/C was faulty - he declared that it was because the server room wasn't properly insulated and too much heat was getting in from the space it's within. As the client's staff didn't know enough to tell the guy he was spouting manure, they even (at his suggestion) went to the builder's merchant and got a roll of insulation that they laid on top of the room. It didn't make any difference.
After several days they rang us, I spoke to the engineer who didn't seem to consider the fact that it had been running for 2 or 3 years without any problem as being any sort of clue. So I rang the "engineer"'s office and spoke to the service manager there. I explained the situation, he agreed that it "didn't sound right", and within a couple of hours the A/C had been fixed ! It was a faulty reversing valve at the outdoor unit end - yes it was a standard office type system, but the specs specifically allowed for the dry conditions in a server room (I checked before it got installed).
After several days they rang us, I spoke to the engineer who didn't seem to consider the fact that it had been running for 2 or 3 years without any problem as being any sort of clue.
I've met them. Sometimes the idea that the customer does actually know what they've had for the last few years just doesn't seem to cross their mind, the customer must always be wrong,
Worst case of ours was a Chorus [1] person who spent 6 hours on site, wasn't able to fix the issue, but left us - a business that relied heavily on incoming phone calls - with one semi-functional jack out of 6 (2 lines, 3 jacks each) and no internet. Tried to argue it'd been like that when he arrived, I spoke to the telco who told us the Chorus tech had told them things were already so. Asked the telco to check traffic and call logs and... Oh yes, we had several calls per day on each line plus lots of traffic up until the time the Chorus person arrived, then soon after everything stopped.
So Chorus sent out a technician. Same guy. 8 more hours on site, "everything working ok". But when he left, actually no nothing at all.
We trespassed the guy, let our Telco know (and spoke to someone very high up - the joys of having had a popular BBS years before and getting to know many people who went on to "become someone") who also made it clear to Chorus that monopoly or not, they had trouble headed their way. The next guy they sent round strangely was only on site for half an hour, yet when he left everything was working.
[1] Chorus are part of what once was telecom - a rather terrible telco split up by the gubbermint when LLU and multiple telcos became a thing in NZ. They put their worst and dullest into Chorus, which became the main national lines maintenance and installation firm. The most clueless idiot at Vodafone or Telstra or [insert preferred incompetent telco here] would be over-qualified for Chorus.
I used to work in an office on the top floor of a converted country house.
Every summer the office temp was over 30C, but manglement denied all requests for aircon.
One year we got new servers.
First day of summer they all overheated and shut down.
Second day of summer we had aircon.
First time I came across it was as a junior TV service engineer in the mid 1960s. I was called up by the branch manager on a Sunday, and asked to sort out a TV for one of his friends, with a promise of a reward. I was far from keen (obviously) but in view of my status didn't think I had much choice. The problem with the TV wasn't particularly difficult to sort out but very time consuming.
And the reward? I don't know, I never got it - carefully worded hints were ignored. I wasn't there for much longer. I'd learned the lesson, and from then on was quite hard nosed about my availability out of hours. No shortage of attempts though. A few years ago I even had a complaint that my personal phone was switched off.
That's why a burner SIM for job apps is worth it: find a job, turn off phone / bin SIM and from then on calls go nowhere.
Work calls: "Sorry, I don't get a signal at home, constant hassle with calls dropping, phoned the network about it multiple times but nothing gets done... I realise it's a nuisance, sorry, I don't own a landline phone either."
I'm an over-hyped spreadsheet wrangler now (AKA "VBA Developer" or "EUC Specialist", god help me), but in the early stages of my IT career I had the job of babysitting a room full of 486s running stats for a finance company. They were housed in a "server room" - AKA a converted cloakroom under a post-war office with aircon, some real servers and a UPS for said servers - that one day had a fire.
I got a call on the Sunday - can you come in early on Monday, check the machines are okay, see what can be up and running before the finance peoples come in.
I agreed, and made my way in at dawn o'clock, wandered down to the basement, badged into the server room, walked past the burned-out hulk of the UPS, and set to work. Most of the old pentiums fired up fine: three out of the 50 didn't. Two were just dead, the other would randomly crash periodically: not bad for a hard-power-off followed by a battery acid fire.
Later in the day I was chatting to the head of our local IBM team, who said "I wanted to go rubberneck, but security say no-one should be down there unaccompanied". Guess whose manager neglected to mention that when calling me in?
So anyway, I had a nasty cough for weeks afterwards. Might not have been related, but... yeah, I probably should have thought of that myself, really.
Used to work for Midland Bank back when they were still a thing. We had a power failure in London after a JCB dug through a mains cable, PDU and generators did their thing, but we soon realised that whoever installed it hadn't hooked the aircon up to the PDU so there was no cooling available. Luckily this was noticed almost immediately so we were able to shut down all but the essential systems, and open all the doors & windows before things got too steamy and we survived for what turned out to be a 5-6 hour outage. Sadly the JCB driver wasn't so lucky.
I'm pretty sure I remember that event... I was working with a small ICL mainframe at a stockbroker's site, inevitably positioned in the basement.
Suddenly we were left with just the emergency lights. No chance of a generator in anywhere near that old building.
The disk drives had impressive bearings - they stopped spinning a full five minutes later.
Panic ensued. Data had to reach the Stock Exchange before The Deadline or serious penalties would ensue, and there was now no way to generate the mag tapes.
80-column cards were found, and staff started the data prep, under the emergency lights - on HAND PUNCHES !
Working for the IT company rather than the broker, I headed off, carefully forgetting to mention my speed with a hand punch.
Because the power event hit so many offices, the Exchange actually decided to waive the massive fines for that day.
Big bank's DR site in East London was brand new and ran a bunch of dev workloads... and some illicit production workloads too. Oops.
JCB went through the main power. Oops.
No worries, we have UPS and a generator. UPS kicked in just fine, yay. Generator kicked in just fine, yay.
But then the whole DC went off abruptly - with the generator still chugging away outside. Hmmm.
Turns out that if you don't wire the fire system into a UPS it can get fried when a JCB cuts through the mains.
This meant that it tripped into "fill the sprinklers" mode. Guess what the next automatic (and legal) consequence of filling glass bulbs full of water in a running DC is? Yes, it automatically cut the power.
While stationed in Germany as a tanker, 1AD decided to deploy something called "Warlord Notebook".
They were laptops running Red Hat Linux on them.
That's right. The U.S. Army back in the 90s thought it would be cool to issue Linux laptops to soldiers who didn't really know a damn about computers much less Linux.
Blowing things up with M1A1 Heavy tanks they were pure genius about, but logging into a Linux laptop and using email and custom software on it they didn't know their collective asses from the holes in the ground their tanks would often create.
Out of the entire division there were two of us that had a clue about using them: an artillery guy and me, the tanker. No one in commo really knew anything about Linux back then. We played with it for laughs in our limited spare time and were comfortable with it.
And since division didn't believe in using DNS servers at the time, we had to run around to every single notebook and hand-load a custom hand-built HOSTS file that contained the IP addresses and names of every other Warlord Notebook laptop out there. Luckily there were only a few hundred.
And the IP addresses changed for every exercise.
Luckily the contractor who built them left all the default "Games" installed, including netdoom.
Soon there were large classified network LAN parties of netdoom going on.
Eventually the entire "Warlord Notebook" program went the way of the Dodo and died off, thank God.
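For anyone wondering what that HOSTS chore looks like when scripted, here is a minimal sketch in Python. The laptop names and 10.x addresses are made up for illustration, and nothing here reflects how the Warlord Notebook software actually worked; it only shows that one name-to-address table per exercise can be turned into the file text automatically instead of being typed by hand.

```python
def build_hosts(entries):
    """Return hosts-file text: a localhost line, then one 'IP<TAB>name' line per laptop, sorted by name."""
    lines = ["127.0.0.1\tlocalhost"]
    for name, ip in sorted(entries.items()):
        lines.append(f"{ip}\t{name}")
    return "\n".join(lines) + "\n"


# Hypothetical names and addresses for a single exercise (assumptions, not from the post).
exercise = {
    "warlord-002": "10.0.0.12",
    "warlord-001": "10.0.0.11",
}

hosts_text = build_hosts(exercise)
print(hosts_text)
```

When the addresses change for the next exercise, only the table changes; the same text then gets copied to every laptop, which is still a lot of walking without DNS.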
Back in the day, a hospital lab I worked with had a computer room off to the side filled with PDPs floor to ceiling. One night a rat attempted to crawl through a fan ... CHOMP! and then dead silence with dead rat ... eventually for some reason the temperature in the PDP climbed high enough that the fire suppression system turned on (the days before Freon systems) and the entire room looked like Christmas!
They cleaned it all up with an industrial vacuum cleaner, even had to vacuum out the RK05s but, once the rat was removed, the fan restarted and everything was rebooted ... easy in those days, you just toggled the boot sequence in on the front panel - took about 20 seconds.
Sorry, but getting hold of a sales guy out of hours on a company line? Maybe the support desk used his own personal number.
Sales guy with technical experience, OMFG just no. There is a reason that tech guys get moved to sales and it's not usually considered a promotion.
Letting freezing cold air into a hot server room? May seem like a good idea at the time but remember to turn the dehumidifier to max please.
Been clearing up for sales for 30 years, the fekkers usually get to move on with their fat commission and leave the crap imaginative solutions that they sold for techs to resolve.
Proper reply "oh I'm sorry I've just started on my 3rd double whisky/rum/vodka/gin and I'm nowhere near fit to drive" "Sorry I got a flat tyre on the way home and I don't have a spare on my car" "my wife's just gone out with the car and she doesn't have her mobile on her, said she was going to visit her mother, no I have no idea how long she'll be, could be hours, wouldn't be the first time she's stayed overnight" "my wife's just left with the car for a night shift, won't be back till past 7am tomorrow" etc etc etc
Or from the significant other "sorry he's currently doing the technicolour yawn, I think I undercooked the chicken a bit and when I say a bit...I mean a lot, sorry I have to go I think he's choking" "sorry he's not come in yet, no his phone was going flat, of "course" I'll mention it when he gets in *mis-repeat message until caller gives up in fury*"
... Aircon failed in server room at hospital.
Not picked up by monitoring systems... found 50°C temps after temp warning on a switch.
Collected as many fans as possible and wedged doors open
Very nervous new year: not fixable by the aircon supplier until after the new year.
Fortunately, everything survived.
Another time (different org) went into server room to be hit by wall of heat and fans sounding like 747 at take off. Estates / Facilities office bod had to escort someone in there and turned the AC off as it was a bit chilly for her and didn't switch it back on...
'Estates / Facilities office bod had to escort someone in there and turned the AC off as it was a bit chilly for her and didn't switch it back on...'
"incident: unauthorised interference with equipment, damage ongoing (heat stress causing premature failure), cost estimated to be not less than £10000 (replacement overstressed hard drives, etc)"
You can profit out of that one for a year and estates/facilities will leave your equipment the hell alone in future
AND..... as a very cool story about just how far some server farm centres will go to cool their systems, here is a story about a late 1990's/Early 2000's era system in the USA near the Paradise Valley region of Nevada that is DEEEEEEEP underground, fully rad-hardened, and guarded by 20 metre thick concrete walls and STAINLESS STEEL WALLS and floors all underscored by tonnes of sandstone and rock.
Hither 200 metres+ down the elevator was a HUUUUUUUUUGE ROOM probably about 75 metres long by 25 metres wide by 5 metres high filled with racks of servers from floor to ceiling of unimaginable CPU/DSP chip horsepower and based upon MY calculations was at least half-an-exaFLOP of DSP-specific 32-bit number crunching power (for the late 1990's time period, that was unfathomable processing horsepower!)
Now what was REALLY COOL was that this ENTIRE ROOM was built out of METAL and was actually IMMERSED in a larger POOL of de-ionized water. We were actually FLOATING in an extremely large underground reservoir and all the heat was being sucked from the racks and routed into a large pool of water via water cooling tubes and fully immersed cooling fins which formed a vast hydro-thermal cycle system!
This system was NOT classified confidential or secret at the time but it WAS held in high secretive regard in certain mathematics and physics circles because at the time it was operated by a well-known university. It is TODAY STILL ACTIVE but NOW it has been moved over to being operated by a major national laboratory and ALL activities are FULLY classed as being Top Secret, although its actual physical location IS known within the math/physics community!
Again, this ENTIRE 75x25x5 metre room is a floating box of stainless steel kept within a larger cooling pool !!!! Talk about expensive to build and run!
Sooooo...... for comparisons against TODAY'S available CPU/GPU horsepower, 75 metres is about 130 of the 72 inch high racks long (allowing for some spacing) by 2 racks high by about 15 racks wide so in total about 3900 racks where each rack would contain about 20 blade server cases with power supplies so in total about 78,000 of the common 4RU cases with each 4 RU case containing about 8 blades each with two EPYC or XEON CPU's per blade and some in-case SSD drives.
So we are talking of a total of about 1,248,000 CPUs and if it's AMD Epyc 7742-series in 2019/2020, that is 3.4 Teraflops per CPU so it is now probably at about 4.2 ExaFLOPS of 64-bit real number and integer crunching in JUST THAT SERVER ROOM ALONE !!!! And I have ALWAYS heard about even more secretive underground server rooms that were MUCH MUCH MUCH LARGER than the 75x25x5 metre one (i.e. 10x larger in volume !!!!!).
Since this national lab deals EXCLUSIVELY with nuclear physics and plasma physics for its compute requirements, I am assuming it's a DOE (Department of Energy) laboratory probably operated by the SAME USC/Bechtel/BWX/AECOM/TexasA&M consortium that operates and manages the OTHER national physics labs!
So there you have it --- Some people take server cooling REALLY SERIOUSLY !!!
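The rack arithmetic in the post above can be checked in a few lines of Python. This sketch only reproduces the post's own assumptions (130 x 2 x 15 racks, 20 cases per rack, 8 blades per case, 2 CPUs per blade, 3.4 teraflops per CPU); none of those figures are verified here, and the variable names are purely illustrative.

```python
# All inputs below are the figures quoted in the post, taken at face value.
racks = 130 * 2 * 15          # racks long x racks high x racks wide
cases = racks * 20            # 4RU cases per rack
cpus = cases * 8 * 2          # 8 blades per case, 2 CPUs per blade

tflops_per_cpu = 3.4          # per-CPU figure as claimed in the post
total_exaflops = cpus * tflops_per_cpu / 1_000_000  # 1 exaFLOPS = 1,000,000 teraFLOPS

print(racks, cases, cpus, round(total_exaflops, 2))
```

The totals come out at 3,900 racks, 78,000 cases, 1,248,000 CPUs and roughly 4.2 exaFLOPS, so the multiplication in the post is at least internally consistent, whatever one makes of the inputs.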
I always go to my work's Christmas do. It's a free meal, at the very least.
Nothing says I can't take my Kindle and just sit in a corner reading, when I'm not eating and drinking. And when I've had enough I can head back to the office or home, depending on the time and whether I have anything pressing I want to get done.
I had a customer who was complaining their telco servers were corroding and failing in place in Thailand. Went to the customer site and everything looked great with their environmental systems and this was looking to be a mystery until.. as I was leaving they shut down the AC and opened the doors of the CO these servers were in. I asked what was going on and they said, oh, we always do this late in the day to save AC costs and the fog coming in from the ocean is so cool at night.
Worked for a large enterprise customer in Newcastle (Australia). A lot of customers would co-lo their gear to a datacenter in the area, who also acted as an ISP with their own dark fibre network. We received a bunch of temperature warnings in the wee hours of the morning from our DR rack that was located at the datacenter, turns out the AC unit(s) had failed. AC tech couldn't get the replacement circuit board quickly, so he ended up propping every door open between the datacenter racks and the main entrance, and had a fan running. Bye bye security.
Was interesting seeing the various lies being spouted by the co-lo provider as they tried to cover it up lol
Fortunately our racks were closest to the entrance and survived, other customers such as banks and health insurance providers probably fared much more poorly though!
... or "deploying special measures" as I've seen it referred to in the past. Many years ago a DC hosting one of our racks had an aircon failure, it was detected and sorted before anything broke. The official update from the company informed us that their engineers had "deployed special measures" and were working to resolve the problem. Once it was all fixed we unofficially asked one of the engineers who knew us well what "special measures" entailed, strongly suspecting we knew the answer... yep, they'd opened all the doors to the DC and dug out all the office fans they could to get air flowing. :)
1980s, I was an apprentice in my teens at Ferranti, working in the computer room (three VAXen, 11/750, 11/780s if I recall, and various MicroVAXes) and arrived one morning to consternation. One of the massive floor to ceiling sized aircons had failed, and the other had been forced to work so hard that it had iced up, rendering it useless too.
Later in my career, visited a customer (who would later be better known as Phones4u) in their brand new offices in Stoke, and the drip tray on the aircon had overflowed, pouring water into the "computer room." It missed their single server box, but did make a mess of the power distribution panel for the entire floor!
Biting the hand that feeds IT © 1998–2020