And this is why every server should have iLO, iDRAC or similar tech installed.
The swift in-person response is part of the service (and nothing to do with the thing I broke while trying to help you)
Corporal Cockup meets Major Outage in this week's episode of On Call as a reader's helpful walkthrough takes down the telephony server. Today's recollection by reader "Simon" (for that is not his name) takes us back more than a decade to the early part of this century, where he was putting in time doing first line support for …
COMMENTS
-
Friday 7th May 2021 07:55 GMT Anonymous Coward
.. provided you protect that as hard as the server or you have just created a major command line backdoor. Doubly so if you build it into the CPU (I'm looking at you, Intel, for making me lose a serious chunk of server capacity just to keep that backdoor closed).
We have them on a separate network, excluded at the router from going anywhere else.
-
Sunday 9th May 2021 13:35 GMT Nick Ryan
Which causes no end of hilarity with Windows.
A good few years ago we had a newbie quit his role because he couldn't cope with our "non standard ways of doing things"... which in his mind was knowing what we were doing and using the command line to fix things that no matter how much he waved the mouse pointer at them just weren't getting fixed.
Scripting allows us to produce consistent, reproducible management processes. Waving a mouse pointer, while sometimes quick and easy, is neither reproducible nor as accountable.
-
Sunday 9th May 2021 08:13 GMT roytrubshaw
I see your borked network switch and raise you an entire server room!
So the guy was here to service the UPS; so of course I have to switch the server room over to the mains.
There are two switches:
1) A bypass switch - to allow the current from the mains to reach the UPS ring-circuit directly
2) An isolation switch - to isolate the UPS from the UPS ring-circuit.
He couldn't do the switch-over as he wasn't allowed to.
So the customer (me) had to do it. This was something I had done several times before (I had actually installed the UPS originally) so no problem!
Guess which switch I inexplicably threw first?
Cue the silence of thirty (or more) servers (and their cooling fans) suddenly going very quiet!
Cue the red-faced guy quickly turning on the bypass switch and hoping that no-one "upstairs" will notice the short outage!
-
Friday 7th May 2021 13:45 GMT longtimeReader
Really remote shutdown
Back in 1989 a group of us were working in the company's Texas offices. Some of the rest of the team remained in Hampshire. We were all working with Unix boxes, but there was no direct TCP/IP connection available between the sites - the only communication where we could work on the "other side's" Unix system required using a 3270 terminal emulator and then using the SNA network which did run around the world to log onto a mainframe in the right country from where we could do line-mode telnet to the Unix box. Which was not very usable but just about OK for occasionally looking at any essential logs or executing simple commands. That single line input prompt did make it very obvious whether you were working on your own local Unix system or a transatlantic machine.
So I was very surprised when my machine halted unexpectedly, and then found out that S (for that was his initial) had managed to type "shutdown -f" in the wrong window and sent the command across the ocean.
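For anyone who wants a seatbelt against the wrong-window shutdown: one common approach is to make the operator type the name of the machine they think they are on before anything destructive runs. A minimal Python sketch of that idea follows. The function names (`short_name`, `confirmed_for_this_host`, `guarded_shutdown`) are made up for illustration, and the actual shutdown is passed in as a callable rather than hard-coded, so nothing here is tied to a particular OS or tool.

```python
import socket
from typing import Callable, Optional


def short_name(host: str) -> str:
    """Normalise a hostname: trim, lowercase, drop any domain suffix."""
    return host.strip().lower().split(".")[0]


def confirmed_for_this_host(typed: str, actual: Optional[str] = None) -> bool:
    """Return True only if the typed name matches the machine we are really on.

    `actual` defaults to the local hostname; it is a parameter so the check
    can be exercised without depending on the machine running it.
    """
    if actual is None:
        actual = socket.gethostname()
    typed_short = short_name(typed)
    # An empty answer never counts as confirmation.
    return typed_short != "" and typed_short == short_name(actual)


def guarded_shutdown(typed: str, action: Callable[[], None],
                     actual: Optional[str] = None) -> bool:
    """Run `action` (hypothetical: whatever really halts the box) only on a match."""
    if confirmed_for_this_host(typed, actual):
        action()
        return True
    return False
```

A wrapper script would prompt for the hostname, pass the answer as `typed`, and refuse to continue when `guarded_shutdown` returns False. It does not stop a determined typo, but it does force a look at which window has focus.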
-
Friday 7th May 2021 19:48 GMT DS999
I remember doing something like this
Long ago in a job far far away I was installing a new and fully patched version of HP-UX 9.x on all the engineering workstations. To avoid disrupting the engineers, I was doing it a few each evening - I'd remotely boot them off another server, reformat the drive, run a script to untar the OS image, make it bootable etc., then manually update a couple things that differ from the clone image like hostname and IP address, then reboot and confirm everything was OK. Only about two minutes of actual work per server, the rest of the time was waiting for the tarball to be unspooled so I'd watch an hour of TV then move onto the next.
So of course eventually I fat fingered the hostname / IP address on one so it didn't come back up. Or rather it did, but I couldn't access it to confirm so I knew something was wrong. And of course it was the ME application lead's workstation, who had been the most resistant to these updates - he was an "if it ain't broke don't fix it" guy so the ME workstations were NEVER patched and some of them were running HP-UX 8.x versions five years out of date without a single patch! I had to "prove myself" by updating all the EE workstations first (the EE app lead was the guy who got me the job and was 100% behind this effort) and then a couple of ME workstations as test cases with his power users, who were the most picky, before the ME app lead was willing to go forward.
Worse was the fact this guy was an early bird: he was a former drill sergeant, legendary for coming in at exactly 6am every morning. So I had to get up at about 4:30am to ensure I was there at 5:30am to manually fix his workstation. I did so, rebooted and confirmed it was all good, and just as I was walking down the stairs to my office I saw him come in the door. I quickly ducked off at another floor so he didn't see me, and he was never the wiser that the ONE time I messed up in the whole project was on his workstation!
-
Friday 7th May 2021 20:05 GMT Willie T
Hence the reason for AdminSDHolder
The whole "locking yourself out of the room you've left the keys in" problem is why the AdminSDHolder object and SDProp process exist in Active Directory. Certain built-in accounts can't be de-privileged because the OS just stamps their privileges back again on a regular basis. I'm guessing before that was in place Microsoft support engineers fielded a few calls from admins in tears.
-
Saturday 8th May 2021 11:37 GMT Jim Willsher
I did something worse.
Remote Desktop onto a server, happily working away editing code in Notepad++. Cue 5PM and I suddenly remember I’m late and will be slaughtered by SWMBO.
Ctrl+S, exit, start, shutdown, confirm, grab coat, run.
I’m now sitting in the car on my way home whilst rest of office is wondering why they can’t access stuff.
-
Saturday 8th May 2021 14:05 GMT Tom 7
Set a few machines up in Paris once
came home and loads of complaints about very slow responses on an NT server. Then I realised I'd got bored after a couple of bottles of red and a whole round of Camembert and a pack of butter and two baguettes for lunch and had played with the Pipes 3D screensaver and obscene French wrap-around textures for it. This of course used every last drop of CPU it could and it took me what seemed like a lifetime to ISDN into the machine, kill the bloody screensaver and disable the bloody thing. Would very much have liked to have been sent back for some more Camembert - it's the President stuff in the round wooden box that in the UK is fuck all like the stuff you get in France.
-
Sunday 9th May 2021 14:27 GMT Nick Ryan
Re: Set a few machines up in Paris once
Had a customer with that, possibly in Southampton so not a short drive, who experienced arbitrary, serious slowdowns on the NT4 server that our in-house hardware sales team had configured and supplied. They'd gone and set one of the 3D screensavers with the name of the client, and therefore whenever we turned up and looked at the performance everything was fine; we went to lunch, and 20 minutes after we had left the server performance tanked again.
I was not impressed, and I deleted every 3D screensaver from the server. The problem never returned and the hardware team was given very strict instructions to do the same (and to go fix other sites that they had recently supplied)
These days servers have load balancing preventing a shell task from using 100% CPU but back then there was no such thing and Microsoft were too busy shouting about how wonderful their pre-emptive multitasking was while carefully glossing over the fact that it was in fact still cooperative multitasking rather than true multitasking.
-
Saturday 8th May 2021 14:41 GMT Bruce Ordway
Wonder where these addresses are coming from?
Took a couple of days once to figure out why a site had started seeing random user problems.
Finally tracked it down to a rogue DHCP server that had been enabled on that switch.
Oops.. it was the spare switch I had grabbed from the equipment room earlier in the week when I needed a few extra network connections at my test bench.
Now before installing any gear that has been "lying around", I ALWAYS reset to factory default and configure as needed
Nobody realized I had been the cause; they were just happy it was resolved.
-
Saturday 8th May 2021 21:22 GMT Soruk
Been there, I typoed the default gateway when remotely configuring a new physical server. No iLO or iDRAC either. Oops... (Though it wasn't yet live, thank $DEITY).
I was preparing myself to have to take that long journey to the data centre to fix it at the console, when it dawned on me that I had access to another box on the same subnet. Thankfully this worked and I was able to SSH in and sort it from my desk, but that was a really close shave. Almost too close.
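A typoed gateway is one of the few remote-config mistakes that can be caught before hitting Enter, because a usable default gateway has to sit inside the interface's own subnet. Here is a rough Python sketch of that sanity check using the standard `ipaddress` module. The function name `gateway_looks_sane` and the example addresses are invented for illustration, and the check only proves the address is plausible, not that a router actually lives there.

```python
import ipaddress


def gateway_looks_sane(interface_cidr: str, gateway: str) -> bool:
    """Check that `gateway` is a plausible default gateway for the interface.

    `interface_cidr` is the host address with prefix, e.g. "192.168.10.25/24".
    """
    iface = ipaddress.ip_interface(interface_cidr)
    gw = ipaddress.ip_address(gateway)
    network = iface.network

    # An IPv4 gateway for an IPv6 interface (or vice versa) is never right.
    if gw.version != network.version:
        return False
    # The gateway must be on-link, i.e. inside the interface's subnet.
    if gw not in network:
        return False
    # Pointing the default route at ourselves gets us nowhere.
    if gw == iface.ip:
        return False
    # For ordinary IPv4 subnets, the network and broadcast addresses are not hosts.
    if gw.version == 4 and network.prefixlen < 31:
        if gw in (network.network_address, network.broadcast_address):
            return False
    return True
```

With an interface of `192.168.10.25/24`, a gateway of `192.168.10.1` passes, while the classic dropped-digit typo `192.168.1.1` is rejected because it falls outside the subnet.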
-
Sunday 9th May 2021 12:06 GMT MotionCompensation
Windows NT 3.51 unintended shutdown
After logging in and out of the console of a Windows NT 3.51 fileserver many times, it had become routine. Click and hold File in Program Manager, release the mouse button on Logoff (the menu item right above Shutdown) and click OK on the “This will end your Windows NT Session” dialogue box that followed. Until the inevitable happened: I released the mouse button a little late. And automatically clicked Ok. And watched the server shut itself down.
I went to the phone, waited for the calls and explained that some emergency maintenance was underway, but it would not last more than 10 minutes. Within 10 minutes, the server was back up. One of the few times IT correctly predicted how long maintenance would take.
-
Sunday 9th May 2021 15:11 GMT Anonymous South African Coward
Remote shutdowns
I'm part of the "shutdown the remote server by accident" club.
Oh that joyous feeling when you realize too late that you've selected shutdown (either via GUI or CLI) on the server you're working on...
iLO was enabled on the remote server, however the VPN was down. The client still has to arrange for a port forward to enable the VPN.
Luckily an onsite tech was there to start the server up. Said server also hosts a couple of VM's...
Lesson learnt. Are looking at vays und means of makink ze shutdown button goe awaye.
-
Monday 10th May 2021 08:49 GMT gh1978
Ah the joys
Whilst it's mainly aimed at the academic world, until a few months ago, we'd been using Impero for our remote desktop support solution for about 300 devices on one site.
It worked very well.
Though both engineers on site have at least once managed to restart every PC on site by not reading what's on the screen (went to reboot one device, and it said "you haven't got any devices selected, do you want to select all?").
Cue the phone ringing off the hook (our network wasn't too bad, but it really wasn't geared to 300 users all trying to log in at the same time in the days when roaming profiles were in favour).
-
Saturday 15th May 2021 19:47 GMT Jou (Mxyzptlk)
GPO controlled firewalls are worth so much!
Every network I come across which does not have a GPO to control the Windows firewall on clients and servers gets some from me. Avoids exactly that type of problem, avoids unreachable servers because they are stuck in the state of "detecting network" or "unidentified network" after reboot and so on.
Avoid deactivating it or even disabling the service. I had cases where Windows couldn't install a font due to a deactivated firewall, a paranoia Microsoft added quite a while ago to catch weird-hacker-fonts.