The swift in-person response is part of the service (and nothing to do with the thing I broke while trying to help you)

Corporal Cockup meets Major Outage in this week's episode of On Call as a reader's helpful walkthrough takes down the telephony server. Today's recollection by reader "Simon" (for that is not his name) takes us back more than a decade to the early part of this century, where he was putting in time doing first line support for …

  1. Trygve Henriksen

    And this is why every server should have ILO, iDrac or similar tech installed.

    1. Anonymous Coward

      .. provided you protect that as hard as the server or you have just created a major command line backdoor. Doubly so if you build it into the CPU (I'm looking at you, Intel, for making me lose a serious chunk of server capacity just to keep that backdoor closed).

      We have them on a separate network, excluded at the router from going anywhere else.

      1. tip pc Silver badge

        a secured management network is the way to go.

        1. lglethal Silver badge
          Trollface

          a secured management network ...

          That's the network you don't let management anywhere near, right?

          1. J. Cook Silver badge

            Yes, because they'll find the game server(s) sitting there...

            1. Yet Another Anonymous coward Silver badge

              Remember to also have some sort of remote access to the remote management network router.

              Perhaps a dialup modem and a serial port ?

              1. jake Silver badge

                Dial-back modem, to be precise.

                A very good reason to learn how to admin a network with nowt but a CLI.

                1. Nick Ryan Silver badge

                  Which causes no end of hilarity with Windows.

                  A good few years ago we had a newbie quit his role because he couldn't cope with our "non standard ways of doing things"... which in his mind was knowing what we were doing and using the command line to fix things that no matter how much he waved the mouse pointer at them just weren't getting fixed.

                  Scripting allows us to produce consistent, reproducible management processes. Waving a mouse pointer, while sometimes quick and easy, is neither reproducible nor as accountable.

  2. Dave K

    Are we sure that Simon didn't disable the voicemail server intentionally? And I'm pretty sure he would have brought his cattleprod to help with the "user" as well...

    1. John G Imrie

      Simon wouldn't disable voicemail, an off site backup with all the juicy details of who was shagging who is more his style.

      #BOFH

    2. Korev Silver badge
      Pint

      I'm pleased I wasn't the only one to think BOFH when I saw the name Simon...

      1. Antonius_Prime

        Wishing for BOfH-mas

        We all did...

        Subconsciously, we need him... want him... on the system...

        (Throwback to a Few Good Bastards...)

      2. The Oncoming Scorn Silver badge
        Coat

        I thought it was very brave of El Reg to use the BOFH's name in vain like that.

        Personally I'd be getting the fuck out of town, hence icon.

  3. juul

    Not a Nottingham company

    Netop was a Danish company at the time.

  4. nintendoeats

    Pedantry detected on radar, incoming, brace for impact

    Technically, isn't this really a Who, Me?

    1. algotr
      Facepalm

      Re: Pedantry detected on radar, incoming, brace for impact

      I agree, this belongs to "Who, Me?"

      1. Excellentsword (Written by Reg staff)

        Re: Re: Pedantry detected on radar, incoming, brace for impact

        It was discussed, but since a call was involved and the whome mailbag is rather full, we dropped it into oncall. "I'm sure the readers will be quick to tell me I'm an idiot," was also uttered.

  5. Anonymous Coward

    Been there, done that at the switch level by breaking the 'allowed vlans' statement on the switch's uplink ports. Whoopsie.
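
    A minimal sketch of the sort of pre-flight check that might catch this, assuming the allowed-VLAN list is written in the usual comma-and-range form (e.g. "10,20-22,99"). The function names and the management VLAN number are hypothetical, and nothing here talks to a real switch; it only compares strings before anyone pastes them in.

```python
def parse_vlan_list(spec: str) -> set[int]:
    """Expand a string such as '10,20-22,99' into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like '20-22' covers both endpoints.
            low, high = (int(p) for p in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def dropped_vlans(current: str, proposed: str) -> set[int]:
    """Return the VLANs carried today that the proposed list would drop."""
    return parse_vlan_list(current) - parse_vlan_list(proposed)


def keeps_management(proposed: str, management_vlan: int) -> bool:
    """True only if the proposed list still carries the management VLAN."""
    return management_vlan in parse_vlan_list(proposed)


if __name__ == "__main__":
    current = "10,20-22,99"   # what the uplink carries now (made-up values)
    proposed = "10,20"        # what is about to be typed in
    print("Would drop:", sorted(dropped_vlans(current, proposed)))
    print("Management VLAN 99 kept:", keeps_management(proposed, 99))
```

    Run against the made-up values above, it reports that VLANs 21, 22 and 99 would vanish from the uplink, which is the moment to stop and re-read the command.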

    1. roytrubshaw
      FAIL

      I see your borked network switch and raise you an entire server room!

      So the guy was here to service the UPS; so of course I have to switch the server room over to the mains.

      There are two switches:

      1) A bypass switch - to allow the current from the mains to reach the UPS ring-circuit directly

      2) An isolation switch - to isolate the UPS from the UPS ring-circuit.

      He couldn't do the switch-over as he wasn't allowed to.

      So the customer (me) had to do it. This was something I had done several times before (I had actually installed the UPS originally) so no problem!

      Guess which switch I inexplicably threw first?

      Cue the silence of thirty (or more) servers (and their cooling fans) suddenly going very quiet!

      Cue the red-faced guy quickly turning on the bypass switch and hoping that no-one "upstairs" will notice the short outage!

  6. longtimeReader

    Really remote shutdown

    Back in 1989 a group of us were working in the company's Texas offices. Some of the rest of the team remained in Hampshire. We were all working with Unix boxes, but there was no direct TCP/IP connection available between the sites - the only communication where we could work on the "other side's" Unix system required using a 3270 terminal emulator and then using the SNA network which did run around the world to log onto a mainframe in the right country from where we could do line-mode telnet to the Unix box. Which was not very usable but just about OK for occasionally looking at any essential logs or executing simple commands. That single line input prompt did make it very obvious whether you were working on your own local Unix system or a transatlantic machine.

    So I was very surprised when my machine halted unexpectedly, and then found out that S (for that was his initial) had managed to type "shutdown -f" in the wrong window and sent the command across the ocean.

  7. DougMac

    NetOp

    I loved using NetOp. Worked on so many things where other solutions barely did.

  8. DS999 Silver badge

    I remember doing something like this

    Long ago in a job far far away I was installing a new and fully patched version of HP-UX 9.x on all the engineering workstations. To avoid disrupting the engineers, I was doing it a few each evening - I'd remotely boot them off another server, reformat the drive, run a script to untar the OS image, make it bootable etc., then manually update a couple things that differ from the clone image like hostname and IP address, then reboot and confirm everything was OK. Only about two minutes of actual work per server, the rest of the time was waiting for the tarball to be unspooled so I'd watch an hour of TV then move onto the next.

    So of course eventually I fat fingered the hostname / IP address on one so it didn't come back up. Or rather it did, but I couldn't access it to confirm so I knew something was wrong. And of course it was the ME application lead's workstation, who had been the most resistant to these updates - he was an "if it ain't broke don't fix it" guy so the ME workstations were NEVER patched and some of them were running HP-UX 8.x versions five years out of date without a single patch! I had to "prove myself" by updating all the EE workstations first (the EE app lead was the guy who got me the job and was 100% behind this effort) and then a couple of ME workstations as test cases with his power users, who are the most picky, before the ME app lead was willing to go forward.

    Worse was the fact this guy was an early bird, he was a former drill sergeant legendary for coming in at exactly 6am every morning. So I had to get up at about 4:30am to ensure I was there at 5:30am to manually fix his workstation. I did so and rebooted and confirmed it was all good and just as I was walking down the stairs to my office I saw him come in the door. I quickly ducked off at another floor so he didn't see me, and he was never the wiser that the ONE time I messed up in the whole project was on his workstation!

  9. Willie T
    Facepalm

    Hence the reason for AdminSDHolder

    The whole "locking yourself out of the room you've left the keys in" problem is why the AdminSDHolder object and SDProp process exist in Active Directory. Certain built-in accounts can't be de-privileged because the OS just stamps their privileges back again on a regular basis. I'm guessing before that was in place Microsoft support engineers fielded a few calls from admins in tears.

  10. Jim Willsher

    I did something worse.

    Remote Desktop onto a server, happily working away editing code in Notepad++. Cue 5PM and I suddenly remember I’m late and will be slaughtered by SWMBO.

    Ctrl+S, exit, start, shutdown, confirm, grab coat, run.

    I’m now sitting in the car on my way home whilst rest of office is wondering why they can’t access stuff.

  11. Tom 7

    Set a few machines up in Paris once

    came home and loads of complaints about very slow responses on an NT server. Then I realised I'd got bored after a couple of bottles of red and a whole round of Camembert and a pack of butter and two baguettes for lunch and had played with the Pipes 3d screensaver and obscene French wrap around textures for it. This of course used every last drop of cpu it could and it took me what seemed like a lifetime to ISDN into the machine, kill the bloody screensaver and disable the bloody thing. Would very much have liked to have been sent back for some more Camembert - it's the President stuff in the round wooden box that in the UK is fuck all like the stuff you get in France.

    1. Nick Ryan Silver badge

      Re: Set a few machines up in Paris once

      Had a customer with that, Southampton possibly so not a short drive, that experienced arbitrary serious server slow down issues on their NT4 server that our in house hardware sales team had configured and supplied. They'd gone and set one of the 3D screensavers with the name of the client and therefore whenever we turned up and looked at the performance everything was fine, we went to lunch and 20 minutes after we had left the server performance tanked again.

      I was not impressed, and I deleted every 3D screensaver from the server. The problem never returned and the hardware team was given very strict instructions to do the same (and to go fix other sites that they had recently supplied)

      These days servers have load balancing preventing a shell task from using 100% CPU but back then there was no such thing and Microsoft were too busy shouting about how wonderful their pre-emptive multitasking was while carefully glossing over the fact that it was in fact still cooperative multitasking rather than true multitasking.

  12. Bruce Ordway

    Wonder where these addresses are coming from?

    Took a couple of days once to figure out why a site had started seeing random user problems.

    Finally tracked it down to a rogue DHCP server that had been enabled on that switch.

    Oops.. it was the spare switch I had grabbed from the equipment room earlier in the week when I needed a few extra network connections at my test bench.

    Now before installing any gear that has been "lying around", I ALWAYS reset to factory default and configure as needed.

    Nobody realized I had been the cause; they were just happy it was resolved.

  13. Soruk
    Facepalm

    Been there, I typoed the default gateway when remote configuring a new physical server. No ILO or iDrac either. Oops... (Though it wasn't yet live, thank $DEITY).

    I was preparing myself to have to take that long journey to the data centre to fix it at the console, when it dawned on me that I had access to another box on the same subnet. Thankfully this worked and I was able to SSH in and sort it from my desk, but that was a really close shave. Almost too close.
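
    A minimal sketch of a sanity check that could be run before committing remote network settings, assuming the address and gateway are available as plain strings. It uses Python's standard ipaddress module; the function name and the example addresses are made up for illustration.

```python
import ipaddress


def gateway_problems(address_with_prefix: str, gateway: str) -> list[str]:
    """Return a list of reasons the gateway looks wrong (empty if it looks fine)."""
    iface = ipaddress.ip_interface(address_with_prefix)
    gw = ipaddress.ip_address(gateway)
    problems: list[str] = []

    if gw.version != iface.version:
        # An IPv4 host cannot use an IPv6 gateway, or vice versa.
        problems.append("gateway and address are different IP versions")
        return problems

    if gw not in iface.network:
        # The classic typo: the gateway is not on the local subnet at all.
        problems.append(f"{gw} is not inside {iface.network}")
    elif gw == iface.ip:
        problems.append("gateway is the host's own address")

    return problems


if __name__ == "__main__":
    # Hypothetical values: the second gateway has a fat-fingered third octet.
    for gw in ("192.168.10.1", "192.168.1.1"):
        issues = gateway_problems("192.168.10.25/24", gw)
        print(gw, "->", issues or "looks plausible")
```

    It only proves the gateway is on the right subnet, not that anything is actually listening there, so the spare box on the same subnet remains worth having.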

  14. MotionCompensation

    Windows NT 3.51 unintended shutdown

    After logging in and out of the console of a Windows NT 3.51 fileserver many times, it had become routine. Click and hold File in Program Manager, release the mouse button on Logoff (the menu item right above Shutdown) and click OK on the “This will end your Windows NT Session” dialogue box that followed. Until the inevitable happened: I released the mouse button a little late. And automatically clicked Ok. And watched the server shut itself down.

    I went to the phone, waited for the calls and explained that some emergency maintenance was underway, but it would not last more than 10 minutes. Within 10 minutes, the server was back up. One of the few times IT correctly predicted how long maintenance would take.

  15. Anonymous South African Coward Bronze badge

    Remote shutdowns

    I'm part of the "shutdown the remote server by accident" club.

    Oh that joyous feeling when you realized too late that you've selected shutdown (either via GUI or CLI) the server you're working on...

    iLO was enabled on the remote server, however the VPN was down. The client still has to arrange for a port forward to enable the VPN.

    Luckily an onsite tech was there to start the server up. Said server also hosts a couple of VM's...

    Lesson learnt. Are looking at vays und means of makink ze shutdown button goe awaye.
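
    Short of removing the button, one common mitigation is to make the operator type the name of the machine before anything drastic happens. A minimal sketch under that assumption follows; the function name and the example hostnames are hypothetical, and the script only returns a yes/no answer rather than shutting anything down itself.

```python
import socket


def guard_shutdown(ask=input, hostname=None) -> bool:
    """Ask the operator to type this machine's short hostname.

    Returns True only when the typed name matches, so a caller can
    refuse to continue when the command landed in the wrong window.
    """
    host = hostname or socket.gethostname()
    short = host.split(".")[0]  # compare against the short name only
    typed = ask(f"About to shut down {short!r}. Type the hostname to confirm: ")
    return typed.strip().lower() == short.lower()


if __name__ == "__main__":
    if guard_shutdown():
        print("Confirmed; the real shutdown command would be run here.")
    else:
        print("Hostname did not match; doing nothing.")
```

    The `ask` and `hostname` parameters exist so the check can be exercised without a terminal, for example `guard_shutdown(ask=lambda prompt: "db01", hostname="db01.example.com")`.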

  16. wyatt

    Ah yes, I've disabled a NIC before without thinking things through. Fortunately it wasn't that far of a drive, it took longer to get into the server room once I was there.

    I rarely work with customers who connect the management NIC.

  17. gh1978

    Ah the joys

    Whilst it's mainly aimed at the academic world, until a few months ago, we'd been using Impero for our remote desktop support solution for about 300 devices on one site.

    It worked very well.

    Though both engineers on site have at least once managed to restart every PC onsite by not reading what's on the screen (went to reboot one device, it said 'you haven't got any devices selected, do you want to select all?').

    Cue the phone ringing off the hook (our network wasn't too bad, but it really wasn't geared to 300 users all trying to login at the same time in the days when roaming profiles were in favour).

  18. Jou (Mxyzptlk) Silver badge

    GPO controlled firewalls are worth so much!

    Every network I come across which does not have a GPO to control the Windows firewall on clients and servers gets some from me. Avoids exactly that type of problem, avoids unreachable servers because they are stuck in the state of "detecting network" or "unidentified network" after reboot and so on.

    Avoid deactivating it or even disabling the service. I had cases where Windows couldn't install a font due to a deactivated firewall, a paranoia Microsoft added quite a while ago to catch weird-hacker-fonts.
