Junior techie rushed off for fun weekend after making a terminal mistake that crashed a client

Shifting focus from weekend fun to the reality of a return to work can be hard, so The Register tries to ease the transition with a fresh instalment of "Who, Me?", our reader-contributed column that tells your stories of making mistakes and making it out alive afterwards. This week, meet a reader we'll Regomize as "Bill" who …

  1. Mishak Silver badge

    Not me, but...

    I once worked for a company down south that provided energy management systems for office blocks and the like.

    The hardware was a eurocard version of the BBC micro, with the code being stored in battery-backed RAM. An external board had been added to provide a watchdog with a thirty second timeout - if this wasn't serviced, then a reset was triggered to restart the system. A modem was used to transmit alarms, to allow settings to be changed, and to support remote software updates.

    However, there was a "known risk" with the "software updates". These needed the running program to be stopped so that new code could be entered at the command line over the modem (redirected from a file). The trouble was, the watchdog was not being serviced when the program was stopped, so the person loading the new software had to ensure that the blocks were small enough that "resetWatchdog" could be manually entered at least every 30 seconds; timing was therefore critical and distractions a real risk.

    Of course, the inevitable happened - with a shout of "I'm just off to the airport to get the next available flight to Glasgow"...
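
For anyone wondering how the block sizing might be worked out, here is a rough Python sketch, not the original tooling. It splits an update into blocks that can each be sent inside the watchdog window and puts a "resetWatchdog" between them. The function name, the per-line send time and the safety margin are all assumptions made for illustration; only the 30 second timeout and the "resetWatchdog" command come from the story.

```python
def plan_upload(lines, seconds_per_line, watchdog_timeout=30.0, margin=5.0):
    """Return the lines to send, with a watchdog reset between blocks.

    seconds_per_line and margin are illustrative assumptions; a real
    modem link would need measured figures.
    """
    budget = watchdog_timeout - margin           # time we allow per block
    per_block = int(budget // seconds_per_line)  # whole lines that fit
    if per_block < 1:
        raise ValueError("a single line takes longer than the watchdog budget")

    plan = ["resetWatchdog"]                     # start from a known state
    for start in range(0, len(lines), per_block):
        plan.extend(lines[start:start + per_block])
        plan.append("resetWatchdog")             # service before the next block
    return plan
```

The margin is there because typing the reset takes time too, which is exactly where a distraction hurts.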

  2. Bebu sa Ware
    Coat

    "clients of IT service providers don't pay to have their servers broken and applications go down"

    Not only before the advent of mobile phones, but also cloud services and likely before MS.

    Now pretty much de rigueur.

    1. Peter Gathercole Silver badge

      AS/400

      AS/400s aren't as old as that. Launched in something like 1988, a long time after MS was set up (1975 or 1981 for the incorporation date), and even after some versions of Windows were produced!

      Of course, the AS/400 was the follow on replacements for the System/36 and System/38 systems, so you could say that they pre-dated MS (maybe... I've not checked the actual dates!).

      1. aelfheld

        Re: AS/400

        S/36 came out after the S/38 - it (the S/36) was a replacement for the S/34 which was, IIRC, announced at the same time as the S/38 but came out about 4 years before the S/38.

  3. Mishak Silver badge

    And one closer to the story here...

    It was the end of the week, so the sys admin responsible for the CAD servers at a large automotive OEM executed "shutdown -h now" on his local workstation.

    "That's strange", he thought, "it's still running".

    At which point all the phone lines lit up - and he remembered he had been remoted into the main CAD system...

    1. phuzz Silver badge

      Re: And one closer to the story here...

      Anyone who claims they've never done something similar is lying.

      I should probably look at installing molly-guard by default on new servers.
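
The idea behind that kind of guard is simply to make you type the name of the machine you think you are on before anything gets shut down. A minimal Python sketch of the check follows; it is not molly-guard's actual code, and the function names and the injected `shutdown` callable are hypothetical, used here so the logic can be tried without touching a real system.

```python
import socket


def confirm_host(typed_name, actual_name=None):
    """True only if the operator typed the name of the host we are really on."""
    actual = actual_name if actual_name is not None else socket.gethostname()
    return typed_name.strip().lower() == actual.strip().lower()


def guarded_shutdown(typed_name, shutdown, actual_name=None):
    """Call shutdown() only after the hostname check passes.

    shutdown is any callable; passing it in keeps this sketch harmless.
    """
    if not confirm_host(typed_name, actual_name):
        return False          # wrong window: refuse and do nothing
    shutdown()
    return True
```

Typing "workstation" while actually logged in to the CAD server fails the check, which is the whole point.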

      1. Jedit Silver badge
        Joke

        "Anyone who claims they've never done something similar is lying."

        I haven't, and I'm not lying!

        (I've also never been a sysadmin. Am I cheating?)

        1. handle handle

          Re: "Anyone who claims they've never done something similar is lying."

          Yes.

      2. Vometia has insomnia. Again.

        Re: And one closer to the story here...

        Turned the reset key in the wrong computer. :| The one that the C-suite secretaries used. As much as I might try to defend myself because the then ops manager had decided in his wisdom to mismatch the computers and their consoles, it was pretty careless. I realised when I had it at full lock.

        Also left a call to a comedy "panic" script in the sysadmin login to wind up the day-shift ops staff. Then, inevitably, something actually important cropped up and while I was dealing with it I forgot all about my prank and went home. The night shift triggered it and called out the emergency engineers whose manager didn't see the funny side. My manager did a good job of protecting me from the fallout, thankfully (for me), but my teenage years seemed to extend through most of my 20s too.

        1. 8bitHero

          Re: And one closer to the story here...

          Oh to be young and dumb again.

          At the time I was working for a large national freight carrier in the US. I was part of a team that had taken their paper and pencil driver scheduling system and had automated it performing least cost driver to load assignments "in real time" (did a solve once every 15 minutes). Data was transferred from the IBM 3090 mainframe to an AT&T 3B2 mini-computer to do the processing. Disk was too slow, so we did in-memory data structure updates processing changes only from the mainframe. A full reload from scratch would take 6 to 8 hours to get back in sync with the records on the mainframe, so we kept the development box in sync with production and wrote scripts to allow it to fail over if the production box was not communicating.

          The system was still fairly new and we had been spending long weeks with some 24 hour days monitoring and troubleshooting some recent updates. The inevitable happened and the production server fell over. The development server picked up and everything ticked over as it should. I called the AT&T tech who said they wouldn't look at the server unless I removed it from the rack. Stopped by my boss's desk on the way to the data center to tell him what I was up to, and he said no - come in Saturday night and remove the server and have the tech come in on Monday to look at it.

          I left his office, thought about it a bit and decided no, he was wrong, there was no way I was coming in at midnight on Saturday having already worked too many nights supporting this system this week - I am not an idiot, I can get the server out of the rack without touching the production server now, and he'll never know the difference. I grab my buddy, we go into the data center and for reasons I will never know decide to undo the power cord at the plug, in a rat's nest under the raised floor, instead of the back of the already dead machine. I have my buddy pulling on the cord so I can identify it and, now sure I have the correct cord, pull it out of the socket. I immediately hear a hard drive spinning down. Odd, the machine was already off and without pulling my head out of the raised floor... "Jeff" - yes - "That's the production box" - yes - "sigh".

          Finish getting the machine out of the rack (I am going to do the time I might as well do the crime), repower the backup and get a resync with the mainframe started. Longest walk of my life heading back to the boss's office where I fess up to what I did. He starts (justifiably) ripping me a new one and I am just waiting for him to get to the point where he hands me a box and tells me to clear out my desk. In the middle of this tirade the senior VP of supply chain comes in and starts yelling at both of us because the company is blind, we don't know where any of our drivers are, can't make assignments to new loads, etc. My boss stops him mid-rant, tells him the production system fell over, but there was a communications glitch and the backup machine didn't pick up. I had already sorted the issue and started the reload and he should be thanking me for my quick action and to leave me alone and let me finish fixing the problem. VP leaves in a huff, boss turns back to me to finish giving me the dressing down I had earned - but I am smiling now because I know he isn't going to fire me. He wings a stapler in my general direction (it was a different time) and tells me to get out. Best boss I ever had, and I never did anything that stupid again.

    2. mhoulden

      Re: And one closer to the story here...

      One of the things I've picked up from these stories is the importance of making your local desktop look different from the remote one. A different colour at the very least. The sysinternals app BGInfo will add the hostname and other info for you. I think we have it running on most of our remote systems.

      1. collinsl Silver badge

        Re: And one closer to the story here...

        I proposed a colour scheme at my last work (which was never adopted) using bginfo and group policy which went something like this:

        RED - PRODUCTION

        YELLOW - BC/DR

        BLUE - TEST

        WHITE - DEV

        This caters for people who are red/green colourblind but not unfortunately for anyone who is fully colourblind

        You may also add borders to the backgrounds (manually created and applied per OU by group policy) for customer, security classification, datacentre/location, etc. BGInfo can of course give you hostname, OS version, free text for datacentre/location/virtualisation platform etc as you see fit.

        I've also started using a similar product to BGInfo called DesktopInfo - also freeware, but this one can update "live" and give you disk usage stats, network IP addresses on the fly etc. More useful for desktops I find as it's a bit wasteful on servers if they never change those details.
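
As a rough illustration of the scheme above, the lookup is tiny. This Python sketch does not drive BGInfo or group policy; it only shows the environment-to-colour table, and the key spellings and function name are assumptions made for the example.

```python
# Colours as proposed in the scheme; the dictionary keys are assumed spellings.
SCHEME = {
    "production": "RED",
    "bcdr": "YELLOW",
    "test": "BLUE",
    "dev": "WHITE",
}


def background_for(environment):
    """Return the background colour for an environment label such as 'BC/DR'."""
    key = environment.strip().lower().replace("/", "").replace(" ", "")
    if key not in SCHEME:
        # Fail loudly: an unlabelled box should not quietly get a "safe" colour.
        raise ValueError(f"unknown environment: {environment!r}")
    return SCHEME[key]
```

Refusing unknown labels rather than falling back to a default seems the safer choice, since a default colour would defeat the purpose.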

      2. C R Mudgeon

        Re: And one closer to the story here...

        At one job, where my desktop machine ran Windows but I spent most of my day ssh'ed into various Unix boxen, I arranged something along that line using per-host prefs settings in the ssh client (SecureCRT, I think it was). Pale pastel background colours so as not to make my eyes hurt, but with distinctly green/yellow/pink casts as appropriate.

        At another job, one of the web developers made a conspicuously messed-up version of their site's main top-of-page banner, and hacked his own desktop's hosts file such that he saw his weird banner on the dev site but the proper one on production. (This being "Who, Me?", one wants a punch line about his messed-up banner getting promoted to prod, but I'm not aware of that ever happening.)

        In a more distantly related sort of visual cue, another web dev had, as his desktop's background, a series of rectangles (not concentric but, umm, con-top-left-corneric), sized to match then-common monitor resolutions, to remind himself of the limited visual real estate our users had available, as opposed to his own generous monitor.

  4. Michael Hoffmann Silver badge
    WTF?

    Kept his job?!

    Rather than being forced to eat his own testicles while the customer was watching?

    Was this in a parallel universe?!

    1. Fading

      Re: Kept his job?!

      I'm guessing the agreed and known processes were not robust enough to prevent this eventuality and hence management were not entirely blameless. Hard to place all the blame on a Junior techie if you are responsible for their actions, training and access levels.

    2. NoneSuch Silver badge
      Holmes

      Re: Kept his job?!

      This was back in the day when HR wasn't a thing, "policies" were unheard of and a few pints and/or bruised knuckles sorted things out.

      Those days are long gone.

      1. ICL1900-G3 Silver badge

        Re: Kept his job?!

        And no, things are not better now.

    3. Doctor Syntax Silver badge

      Re: Kept his job?!

      He had just learned a lesson at considerable expense to the company. Why double down on that expense by dismissing a now wiser employee?

      1. DS999 Silver badge

        Re: Kept his job?!

        Yep if someone who works for me makes one really huge mistake I'll chalk it up to gaining valuable experience. Do it again, and I'll figure they learned the wrong lesson by not being fired. You'd have to pull him from that client's account for sure though, they aren't outsourcing so the people managing their systems can learn on the job.

        1. John Brown (no body) Silver badge

          Re: Kept his job?!

          "they aren't outsourcing so the people managing their systems can learn on the job."

          Except one of the main reasons for outsourcing is to lower costs, so while the out-sourcer might put the "A" team on at the start of the contract, they will very quickly move them on to the next new client while the now existing client gets the cheaper, less experienced staff who often *are* learning on the job. In some cases, constantly due to staff turn-over.

  5. An_Old_Dog Silver badge

    Less-of-a-Problem These Days

    ... one would hope, as many Unix and Linux distros default the command prompt to something like:

    (machinename) $

    I don't recall that older IBM mainframe OSes let you do anything like that.

    1. Anonymous Coward
      Anonymous Coward

      Re: Less-of-a-Problem These Days

      Ah yes .. the hostname in the prompt,

      so that you have something to look at to confirm the sinking feeling that you were indeed typing the command in the wrong prompt ..

      Been there, done that ...

      Have sat with someone else where we both confirmed that the terminal window with the yellow lettering on the black background was the remote server and the one with the green font on the black background was the remote ... and even so the reboot command was entered and typed in the wrong window

      Have accidentally shut down a remote modem which was the only way into the remote system even though it had a different shutdown command than the server I was supposed to be rebooting ...

      Restarting servers on a friday afternoon, just before going home .. just say no ...

      About the only thing that works is to not type the shutdown command, instead go for a walk (to the coffee machine, bathroom, whatever) maybe splash some cold water on your face .. think about something else for a few minutes and even then mistakes have been made...

      1. Yet Another Anonymous coward Silver badge

        Re: Less-of-a-Problem These Days

        >Ah yes .. the hostname in the prompt,

        Followed by a corporate policy to stop naming servers "gandalf", "bilbo" etc and call them "SYSYSRVP123" for testing and "SYSPSRVP123" for production

        1. Anonymous Coward
          Anonymous Coward

          Re: Less-of-a-Problem These Days

          This particular change (renaming all servers to alphabetsoupnumber) was the direct cause of my shutting down the wrong server on at least one occasion when they differed by a single digit
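
A naming scheme like that could at least be checked for near-collisions before it is rolled out. Here is a small Python sketch, with hypothetical function names, that lists every pair of hostnames of equal length differing in exactly one character. It is only an illustration of the check and does not catch names that differ by an inserted or dropped character.

```python
from itertools import combinations


def differs_by_one(a, b):
    """True if a and b have the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def confusable_pairs(hostnames):
    """Return sorted pairs of hostnames that are one character apart."""
    unique = sorted(set(hostnames))
    return [(a, b) for a, b in combinations(unique, 2) if differs_by_one(a, b)]
```

Run against the names quoted above, it flags the test and production pair and leaves "gandalf" and "bilbo" alone.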

  6. Anonymous Coward
    Anonymous Coward

    Anonymous since it happened very recently, while tidying up hosted servers and despite taking my usual care and making the usual checks I somehow managed to delete a live one.

  7. Giles C Silver badge

    I know someone who did that

    An AS/400 admin demonstrated the command format to a junior, but with a 30 minute delay, then hit enter instead of cancel.

    However the machine would not let them cancel the shutdown request and so frantic calls went out informing of an impending loss of service.

    The worst was the restart time was up to an hour in those days….

  8. Ian Johnston Silver badge

    I once typed "SHUTDOWN" into the Vax 11/780 of the university engineering department where I was working to see what checks it would make. Answer: none. As I found out immediately, any user could shut down a system which typically had 20+ users and long batch queues. Oops. Still, I was forgiven and even thanked for finding a problem which they had been able to fix.

    A fortnight later some friends in another research group asked me if it was true that I had shut down the departmental computer and, if so, how. "I'll show you", I said. "It's quite safe, because they've fixed it."

    No they hadn't. One formal written warning later ...

  9. NXM

    "professional services"

    Is this a euphemism?

  10. Anonymous Coward
    Anonymous Coward

    I did disable the network on a virtual machine in the cloud... never could get back into it. Fortunately it was only a test box so didn't matter that it went missing.

    1. Yet Another Anonymous coward Silver badge

      Is that a Quantum computer?

      A computer in a closed rack that you can't observe - is it alive or dead?

  11. bemusedHorseman
    Mushroom

    Ah yes, Wrong Window Syndrome... Are you truly an I.T. Guy™ (gender neutral) if you haven't done something like this at least once?

    1. Doctor Syntax Silver badge

      There are a number of - shall we say unintended uses of admin privilege? - in that category. With luck exposure to one provides immunity to all. At least until the next time.

  12. Anonymous Coward
    Anonymous Coward

    Have done this.

    And learned my lesson.

    Now when I'm in your data centre, you power it off.

    Which has led to plenty of ohnosecond duration moments when the sysadmin, manager etc has pulled the wrong power cable, pushed the wrong switch or somehow shut down the wrong box.

    1. Yet Another Anonymous coward Silver badge

      Re: Have done this.

      It's Dell's fault for making them all look the same.

      At least with SGI you knew to turn off the slightly-mauvish-purple one

      1. Richard 12 Silver badge

        Re: Have done this.

        Just turn off the black one.

        Using the black button lit by a black light.

        It'll flash to let you know it's been pressed.

  13. IGotOut Silver badge

    We did this a couple of times..

    ...when applying updates.

    We would apply the update to the hot standby, reboot (yes, Windows), flip over and, if all was good, update the main system, reboot and flip back.

    After a couple of fuck ups (note there was a lot of dodgy hacks to get these hot standbys running, as the manufacturer told us it was impossible to do, but we did) we came across the genius idea.

    As we had to do the work via RDP, simply changing the desktop colours completely eliminated the "Which server am I on?" Issue.

    1. Anonymous Coward
      Anonymous Coward

      Re: We did this a couple of times..

      Years ago I worked as a techie for a company providing paging services and was able to watch as they hit a bug in the software that put the whole network off the air

      The sales guys added new customers to both the live and hot-standby system and had to allocate the pagers to up to 20 zones using single, comma separated or a range as input. Unfortunately there was a bug that meant that when one person typed '1-200' it overflowed a buffer and halted the live system... and crashed the hot-standby as it executed the same input!
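
The fix for that class of bug is to validate the range before expanding it. A hedged Python sketch of such a parser follows; it is not the paging system's code, and the function name and the assumption that zones are numbered 1 to 20 are mine. It accepts a single zone, a comma separated list or a range, and rejects anything out of bounds.

```python
MAX_ZONE = 20  # "up to 20 zones" per the story; numbering from 1 is assumed


def parse_zones(spec, max_zone=MAX_ZONE):
    """Parse '3', '1,4,7' or '1-5' into a sorted list of zone numbers."""
    zones = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty zone entry")
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)   # int() rejects non-numbers
        else:
            lo = hi = int(part)
        # Check bounds BEFORE expanding, so '1-200' never allocates anything.
        if not (1 <= lo <= hi <= max_zone):
            raise ValueError(f"zone entry {part!r} is outside 1-{max_zone}")
        zones.update(range(lo, hi + 1))
    return sorted(zones)
```

Replaying the same rejected input on the hot standby then produces the same harmless error rather than a second crash.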

  14. storner
    Mushroom

    Not entirely a mistake, but ...

    Just as "security" had begun sneaking into the minds of IT people - 1998 - I was working for a very young start-up doing security testing. We had a bunch of tools to perform various tests trying to wriggle information from systems, send mails via systems that shouldn't, work around firewall rules (if there were any) etc.

    And a few tools to perform destructive tests like shutting down systems. We didn't use those unless the client specifically asked us to do so and authorized it - in writing.

    One day I was working on-site at a customer with a large dinosaur of an IBM system, trying to make my way in. The sysadmins were quite smug about this "security test", and I was supposed to run the full set of tests - including the potentially destructive ones. So late in the afternoon I dig into that section of the toolbox and begin poking around the network interfaces using SNMP. The server joyfully provides all sorts of interesting info - the configuration, IP addresses of systems it is connected to etc. Okay, let's see what else is possible - we had been authorized to try the potentially destructive tests, so I fired off the SNMP "write" command with the default password to switch off the primary network interface.

    Which it did.

    Quite a bit of frantic activity followed to get the system back online. I just leaned back and pulled out the (virtual) popcorn.

    Another big-iron experience - mid 1990's - was when connecting one of those newfangled "unix" systems at a branch office to the REAL mainframe computer via an X.25 connection. All went well until we should try sending some data: The mainframe crashed hard enough that a full IPL (reboot) was needed. Turned out there was a bug in the mainframe X.25 comms stack which the unix system accidentally triggered.

  15. Anonymous Coward
    Anonymous Coward

    oof

    I pulled the same trick as Bill, except I intended to do a restart, not a shutdown, and accidentally chose no restart. Fortunately the system was in the same room as the terminal.

    Unfortunately, the power switch on the old AS/400 systems required an alarming amount of force to convince the system to turn back on. Not wanting to make a bad situation worse, I assumed something was wrong, and quit trying to force it lest I break the switch. Time for a call to IBM tech support.

    I'm sure the support tech is still laughing about the ticket he closed by telling the PFY to "press harder".

  16. Jou (Mxyzptlk) Silver badge

    Yes I have

    That is when you call the customer and ask them to be so kind as to hit the switch. Similar for firewalls, when you adjust the config - and luckily you cannot save the config if you lock yourself out.
