Your trainee just took down our business and has no idea how or why

Welcome once more to On Call, The Register's Friday foray into tech support memories contributed by you, our much-appreciated readers. This week, meet a Reg-admirer we'll anonymize as "Leanne" who once worked as a senior architect in a mainframe environment – a role that saw her keep an eye on several internal chats in …

  1. simonlb Silver badge
    Facepalm

    Whoopsie!

    You'd think the vendor would have had the forethought to tell the trainee not to do something unless they were certain they knew what they were doing, but more importantly, to check with the client first to see if it was ok to actually do anything.

    Personally, I once had a brain fart and shut down a node of an Exchange 2003 cluster. Fortunately, it failed over to the other node as it should have done, and I had to connect to the iLO and power the server back on.

    1. Anonymous Coward

      Re: Whoopsie!

      We looked after a small operation's servers (machines on the customer's site, in a broom cupboard, connected via leased line) and there was a stack of updates to install which required multiple reboots (manual and automatic), so it was decided that the updates would be applied over a bank holiday weekend, when servers pretending to be yo-yos wouldn't impact staff.

      The first couple went fine: connected in using VNC, applied the OS updates (auto reboot) and then applied the software updates (manual reboot).

      3rd machine went fine until the manual reboot, "Start", "Shutdown" "Enter"..... Shit, that said shut down not restart!!!!

      And that is how I ended up having to drive 200 miles to press a power button

      1. Bent Metal
        Happy

        Re: Whoopsie!

        >And that is how I ended up having to drive 200 miles to press a power button

        ...and in a perfect world, a followup discussion about the upgrade led the clients to use servers with a baseboard management controller, like HP's iLO or Dell's DRAC; so (almost) everything can be done remotely.

        For an extra fee to the server vendor, of course.

        1. Anonymous Coward

          Re: Whoopsie!

          I don't think such things existed (affordably) at the time (1997/1998)

          From memory, the server specs were:

          dual Pentium II 350 MHz (newly released)

          ATX power supply (brand new tech at the time)

          dual 9.1 GB IBM Ultrastar SCSI hard drives (top of the line at the time)

          128 MB RAM (I think it was still 72-pin DIMMs)

          Windows NT 4

          Thinking back, I'm fairly sure it was the upgrade from Exchange 5.0 to 5.5 combined with installing a service pack that was happening.

          1. ShortLegs

            Re: Whoopsie!

            ILOs existed for Compaq ProLiants in 1997/98... I specced all my servers with them, and still have an original from "the day"

            1. Tom Chiverton 1

              Re: Whoopsie!

              Or power strips you can telnet to.

            2. An_Old_Dog Silver badge

              Re: Whoopsie!

              Let's also remember the software associated with the iLO boards, which the server techs dubbed "Compaq InFright Manager".

          2. Anonymous Coward

            Re: Whoopsie!

            1997... SDRAM DIMMs were a thing back then. 72 pin SIMMs existed, but not 72 pin DIMMs.

        2. J. Cook Silver badge

          Re: Whoopsie!

          Yup. [RedactedCo] also has some fancy power strips that the servers are plugged into controllable via the KVM the servers have, so if the server is hard locked along with the iLO/Drac/CIMC/etc., the "unplug it and plug it back in" of last resort is still possible. (which I've had to do on at least one Cisco UCS box, annoyingly- the machine had purple-screened, and somehow managed to take the CIMC down with it.)

      2. Alan Brown Silver badge

        Re: Whoopsie!

        If what you're doing remotely is critical enough to need to be done after hours, it's critical enough to require the client to have someone onsite to push reset if necessary.

        1. John Brown (no body) Silver badge

          Re: Whoopsie!

          It very much depends on the client. Just having a server and remote support doesn't mean they have money to burn and actually understand or care about resilience. At least not until something bites them hard enough to do the downtime versus investment calculation in favour of the investment. Remember, the OP was talking about the late '90s. There were lots of ad-hoc networks that had "just growed" and may never have been designed, especially in smaller companies where the "server" sat in a broom cupboard and might only ever have been intended to serve half a dozen users when it was installed.

          1. nintendoeats Silver badge

            Re: Whoopsie!

            I have a counterpoint to this, which I'm afraid I must heavily anonymize.

            Friend: "Most of our customers carefully plan out their infrastructure, with years from acquisition to full rollout. However, there is ONE HUGE client that is from another industry and they basically just build stuff at random".

            Me: "Who?"

            Friend: "I really can't tell you, you know, security and such."

            Me: "It's definitely $MASSIVE_WELL_KNOWN_COMPANY."

            Friend: "...I said I can't tell you that you are correct."

      3. The Oncoming Scorn Silver badge
        Pint

        Re: Whoopsie!

        This is why I remove the Shutdown option on the Start Menu for my machines at home that I remote into when travelling.

        Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PolicyManager\default\Start\HideShutDown

        In the right pane, set the entry named "value" to 1.
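A minimal sketch of the same change made from a script rather than the Registry Editor, in Python. It only builds the `reg add` command for the key quoted above and prints it for review. The function name `hide_shutdown_command` is made up for illustration, the entry is assumed to be a DWORD, and the command would need an elevated prompt on the Windows machine in question.

```python
# Key path as quoted in the post above (without the leading "Computer\").
KEY = r"HKLM\SOFTWARE\Microsoft\PolicyManager\default\Start\HideShutDown"


def hide_shutdown_command(hide=True):
    """Return the `reg add` argument list that sets the entry named "value".

    Assumption: the entry is a DWORD where 1 hides the Shut down option
    and 0 shows it again. The function name is hypothetical.
    """
    data = "1" if hide else "0"
    return ["reg", "add", KEY, "/v", "value", "/t", "REG_DWORD", "/d", data, "/f"]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(hide_shutdown_command()))
```

Calling `hide_shutdown_command(False)` gives the command that puts the option back.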

        1. JR

          Re: Whoopsie!

          I put a logoff shortcut on the machine's desktop and a reboot shortcut in a 'reboot' folder on the c drive (to keep accidental reboots/shutdowns from occurring).

      4. yetanotheraoc Silver badge

        Re: Whoopsie!

        Thus proving the wisdom of waiting for the bank holiday weekend.

      5. Anonymous Coward

        Re: Whoopsie!

        I had my boss's machine's BIOS set to power up whenever mains power came back after being cut.

        Miraculously, the Dell power supply and BIOS honoured the ATX standard, which averted several trips: I just asked people to power cycle the mains, and the VPN became available again while he was locked down at home during the COVID craze.

    2. Dave K

      Re: Whoopsie!

      Agreed, this is definitely the primary fault of the vendor. You could argue that the rookie/trainee should have asked for help instead of pressing the big red button, but the point is if you give a trainee the power/ability to do this then don't provide sufficient support for them, you're asking for disaster to occur sooner or later.

      1. Doctor Syntax Silver badge

        Re: Whoopsie!

        And yet at some point the trainee has to fly solo. Nobody said it was easy.

        1. Anonymous Custard
          Headmaster

          Re: Whoopsie!

          Arguably though, when they're ready to fly solo, they are no longer trainees...

          The key point is enough supervision to understand properly when they reach that point, and not rushing it.

          That latter one of course being an alien concept to the beancounters and general manglement sometimes, who only see headcount and not knowledge or experience (let alone capability and competence).

          1. Zippy´s Sausage Factory

            Re: Whoopsie!

            That latter one of course being an alien concept to the beancounters and general manglement sometimes, who only see headcount and not knowledge or experience (let alone capability and competence).

            This is why there seems to be an uptick in job adverts that require ten years' experience and a PhD for an "entry level" position...

            1. Anonymous Coward

              Re: Whoopsie!

              there seems to be an uptick in job adverts that require ten years' experience and a PhD for an "entry level" position

              I put that down to "lazy" recruiters and "fake it till you make it" types in the industry. I reckon it goes like this :

              Advert: Required: Experienced in (e.g.) Windows Server admin

              Applicant: Well, I once put the CD in, clicked OK to all the questions, and got a "working" server at the end (i.e. it booted up to a login screen). That's "experience".

              Interviewers: (when ranking interviewees) Well they were a bunch of *****s weren't they, we need to be recruiting decent candidates.

              Feedback to recruiters: They were a waste of time, we asked for experienced Windows Server admins.

              Recruiters: Well we asked for experience, it's clear candidates are exaggerating - so let's up the requirement to compensate.

              Next job advert (in 2020 say): Required: 10 years admin experience with Windows Server 2016.

              Me: (and I assume many other autistic (mostly, I guess, Asperger's) people who gravitate to IT) look at the adverts, take them at face value and don't apply as I can't tick all the boxes - we know the agency will filter based on ticking boxes. And on the rare occasion we might manage to get an interview, we might struggle to avoid a meltdown, and in any case will tend to fail by answering the questions asked instead of the questions some strange neurotypical person thought we ought to be able to mind-read: the ones they didn't ask but wanted answers to!

              1. Anonymous Coward

                Re: Whoopsie!

                My son falls on the spectrum and has quit 2 jobs because his bosses did not understand his need to do the job properly and to industry standards, rather than half assing it like their client wanted it.

                At the last job, his new manager took him into a meeting room, berated him because he would not do the job the way the client wanted it (knowing that the client's methodology would screw the network completely), and then when my son stood up and left because he was not comfortable, followed him into the office and continued accusing him of all sorts of things. HR completed his resignation papers because he would not return to work for that manager. I have since heard that manager has an unusually high staff turnover rate.

                1. nintendoeats Silver badge

                  Re: Whoopsie!

                  I'm NOT on the spectrum (I think) and I'm currently unemployed because I left my last job for essentially the same reason :/

            2. Not Yb Bronze badge

              Re: Whoopsie!

              Also, five years of experience in ____, where the blank is filled by something that's only existed for 2 years.

          2. Doctor Syntax Silver badge

            Re: Whoopsie!

            "The key point is enough supervision to understand properly when they reach that point, and not rushing it."

            Nobody said that was easy.

    3. heyrick Silver badge
      Happy

      Re: Whoopsie!

      "Fortunately, it failed over to the other node"

      Which means... nope, totally not a brain fart. You were conducting a resilience test to ensure that the system is working as it is supposed to.

    4. Vometia has insomnia. Again. Silver badge

      Re: Whoopsie!

      You'd think the vendor would have had the forethought to tell the trainee not to do something unless they were certain they knew what they were doing, but more importantly, to check with the client first to see if it was ok to actually do anything.

      You'd think so, but I have experienced (and witnessed) that particular form of genius management whose idea of training is to yell at the new kid to get it sorted out and to not embarrass anyone by asking "stupid" questions. Which tends to result in chaos, predictably. Well, predictable to anyone but that particular type of manager. The article kinda reads like that, and it might explain the outcome, too...

  2. This post has been deleted by its author

  3. aje21
    Facepalm

    Leanne's console told her all was well with the big box

    If the only storage for the mainframe was offline yet "Leanne's console told her all was well with the big box" then it tells you there was something not right in how she was monitoring things.

    1. Anonymous Coward

      Re: Leanne's console told her all was well with the big box

      Unless Leanne's employer was the Post Office.

      1. Anonymous Coward

        Re: Leanne's console told her all was well with the big box

        Unless Leanne's employer was the Post Office.

        In which case it would report the storage not offline, but in fact embezzled by an octogenarian postmistress in the Outer Hebrides...

    2. Robert Carnegie Silver badge

      Re: Leanne's console told her all was well with the big box

      I assume that errors were to be logged on the DASD. DASD goes missing = no errors are logged?

  4. Aladdin Sane

    I hope the kid wasn't sacked.

    1. wolfetone Silver badge

      We've seen enough of On Call to realise no one has protection from the hurt ego of a manager.

      I hope they weren't sacked, but I would imagine an embarrassed manager would have gleefully signed the kid's P45.

    2. chivo243 Silver badge
      Happy

      I learned early on, hands in pockets when unsure of which button to push... ask a senior engineer for a second set of eyes on my plan.

  5. batt-geek

    so the kid had enough privileges to take the storage offline but not enough to bring 'em back online - that in itself is somewhat dumb to my way of thinking

    but no getting away from the fact the kid should never have been let loose unsupervised...

    1. b0llchit Silver badge
      Angel

      The "privilege" for taking something offline is rather easily obtained. You can also do it with a screwdriver. However, bringing it back online... that is a different story.

      1. Anonymous Coward

        > You can also do it with a screwdriver

        You need to mention that to El Reg readers?

        (Cough) grandmothers, sucking eggs (cough)

        1. Doctor Syntax Silver badge

          Other persuasive hand and power tools are available.

    2. AustinTX

      And I bet that afterwards, they made absolutely no changes to the privilege hierarchy.

    3. John Brown (no body) Silver badge

      "so the kid had enough privileges to take the storage offline but not enough to bring 'em back online"

      As others have said, shutting something down can be easy. Bringing it back might be a much more complex task. The kid likely had privileges to do both, but not the skills to do the latter.

  6. Dave Schofield

    A couple of decades ago, we were showing the new DBA around the datacentre and he logged onto the production server console - he hadn't got his laptop yet - and was checking out his permissions, etc. He went to log out, but instead of clicking on "logout" the idiot moved the cursor to "shutdown" and we weren't fast enough to stop him. Luckily it was a cluster and failover worked so users had a temporary blip. He never really lived that down...

    A few years later, at a different vendor, a hardware engineer was walking towards the DC door carrying his toolbox and various spare parts and reached out to press the door unlock, but for some reason known only to themselves, an idiot had put the emergency power down near to the door - and the engineer pressed that by mistake. Cue suddenly silent DC and sounds of panic as the helpdesk was swamped with calls from 5-6 councils and a few businesses...

    1. Mishak Silver badge

      Also fire alarms

      I've not seen that with the emergency power, but I've seen loads of fire call buttons next to light switches near doors.

      Reach round frame to turn lights on... oops...

      Not so bad if they have Molly Guards.

      1. Doctor Syntax Silver badge

        Re: Also fire alarms

        Hospital toilets are apt to have similarly placed alarm buttons or, even worse, pull switches which look like light switches. Based on observation rather than experience, I should add.

        1. Bluecube
          FAIL

          Re: Also fire alarms

          Also hotels. I was assigned a disabled room in a particularly posh hotel. I pulled a light cord in the bathroom which didn’t work because, as it (eventually) turned out, it was an alarm cord. When reception called to check everything was ok, I then proceeded to complain to the confused receptionist about the lack of free champagne I’d been promised…

        2. Anonymous Coward
          Facepalm

          Re: Also fire alarms

          This is especially fun when you're the only person in the building who ever bothers to respond to the panic alarm. *Check panel to see which room has been activated, traipse over there, shout through the door to check the occupant is okay, then loiter outside until they're done so you can deactivate the alarm*

          (This is made ten times more irritating as all the rooms with panic alarms have their lights activated by motion sensors...)

    2. Doctor Syntax Silver badge

      "silent DC and sounds of panic as the helpdesk was swamped with calls"

      Providing you can power up quickly enough, the trick is to tell them not to answer the phones for a few minutes, then tell them to check again: it's all running here so it must have been a comms failure at their end. Providing the customers don't get together to compare notes.

      1. Diogenes8080

        I believe that a hasty attempt to reverse an inadvertent power-down was what caused The Great Fire of Hounslow whereby British Airways' global booking system irretrievably desynchronised itself.

        Was that the 2017 event or am I thinking of another occasion?

      2. Gene Cash Silver badge

        So you're saying you work for Amazon AWS then...?

        1. Doctor Syntax Silver badge

          I'm saying I've encountered hell desks.

    3. Anonymous Coward

      and that's why you always have a flip up plastic cover over those type of buttons!

      I did manage to take the whole of my college campus down back in the late '80s. I was doing electrical engineering, and for a bit of coursework we had to go into the main intake room for the campus and sketch and label what we found. The room was pretty small, with about six of us crammed in with our A4 notepads. I came across the main breaker, a big old lever type that looked like a one-armed bandit; I knew what it was. On the side of said breaker there was this little flapper-style switch. To this day I don't know why I did it, but I thought "what's that?" and pressed it. CLUNK as the breaker tripped: yep, it was the test button for the breaker! All the power to the campus went, and lecturers came out of teaching rooms wondering what was going on. Thankfully we were a close bunch of lads and my mates said I knocked the switch with my notepad. For the rest of the college term the bell was out by about 10 mins, as that's how long they took to get the power back hahahahahah

      1. adam 40 Silver badge
        Mushroom

        When I was at school doing Physics pre-O-level, I was playing with the on-bench mains sockets, as one does, put a big nail into the earth and a compass into the live, and joined them together.

        My teacher was watching me do it, apparently...

        Then I threw the switch on the socket and BLAM! big flash and the power went out.

        The scary head of department then appeared about 10 seconds later and demanded to know why the power was out.

        I don't know why to this day, but luckily the teacher gave some excuse, and the head went away.

        I got a minor bollocking, which was all good in retrospect. I went on to get A's in Physics (and more besides) then A's in Physics A level, being taught by said head of department, plus S level too, and AS Electronics. And then a degree in Electronics/Computing and the rest is history.

        I still have the compass somewhere with the weld mark in it.

        1. Doctor Syntax Silver badge

          I hope someone looked at why the power circuit fuse for the bench failed to protect the rest of the power.

          1. John Brown (no body) Silver badge

            Hah! It was a school, at least pre-1988 (approx end of 'O' level, start of GCSEs)

            1. Anonymous Coward

              yerp I was in the last year to do 'O' Levels and CSE's class of 87

        2. John Brown (no body) Silver badge

          "I got a minor bollocking, which was all good in retrospect. I went on to get A's in Physics (and more besides) then A's in Physics A level"

          If he was a good, or even just decent, teacher, he'll have already marked out those with ability, curiosity and interest and will protect and nurture them (and all of their students, of course, but especially those likely to do well) :-)

      2. yetanotheraoc Silver badge

        Fatal attraction of finger to button

        "to this day I don't know why I did it"

        I think this is why people invent theories of the devil, as in "the devil made me do it". The simpler explanation is the brain is a complex thing and the rational part is not always in control.

    4. Alan Brown Silver badge

      This is why EPOs should be covered with a flap. It's sometimes tempting to simply demonstrate why instead of just saying it

      1. Doctor Syntax Silver badge

        You're thinking that would make it foolproof. You'll just discover a better grade of fool who knows the flap is there to protect the light switch or the door release or whatever it is he's looking for.

    5. chivo243 Silver badge
      Coat

      Cue suddenly silent DC

      I've silenced a DC... it's deafening, really. Once all those fans and spindles stop, it's eerie.

    6. Anonymous Coward

      One of the MOD data centres kept getting shut down for this reason.

      It was also the data centre which didn't have another to provide redundancy. (MOD being cheapskates).

      It was the data centre with the big wigs as it had the BES Servers....

      1. An_Old_Dog Silver badge

        A Horde of Angry Monkeys

        ... unable to access email via their CrackBerries is a fearful thing.

    7. Ozmosis

      Hope no-one is colour blind!

      Yup, we had that at a DC I worked in. Red, yellow and green buttons all side by side. Green did the door, red shut off the power, and I genuinely can't remember what the yellow did. Once the power to the DC got killed a couple of times, the button was moved and covered with a shroud. Professional DC designers were paid to come up with this sh*t....

      1. John Brown (no body) Silver badge

        Re: Hope no-one is colour blind!

        "Professional DC designers were paid to come up with this sh*t...."

        But, like architects, it's not *always* their fault when some builder or sparky or plumber decides to do things the "usual" way instead of what is stated on the drawings.

  7. Mark White
    Facepalm

    VM Network

    I was doing some investigation on a cloud hosted VM to see why it wouldn't connect and for some reason thought it would be a good idea to turn off the network.

    Yeah, never spoke to that VM again, quickly spun up a new one and didn't try that again.

    1. Anonymous Coward

      Re: VM Network

      You've not lived until you do that to a physical server and have to get someone to drive there and turn it back on.

      1. collinsl Bronze badge

        Re: VM Network

        Had to do that a couple of times for some kit which wasn't responding at all over the network. Drive 45 minutes there, 15 minutes to get through security, climb the stairs to the data hall, walk in to the aisle, hold down the power button, wait for it to start, someone logs into it remotely, turn round, walk out, 10 minutes to get out of security, 45 minute drive back to the office.

        1. John Brown (no body) Silver badge
          Facepalm

          Re: VM Network

          Boss: Next one of you to remotely shut down a device instead of restarting it gets to go there and manually turn it back on

          Staff: But boss, I don't have a car!

          Boss: So? You'd better be extra careful then!!!

      2. This post has been deleted by its author

    2. Anonymous Coward

      Re: VM Network

      Closest I came to that was a thin client.

      They rolled out an automatic update, and for some reason my client shut down instead of restarting.

      Contact Helpdesk: "Oh that client is handled by team X, you'll need to contact them instead."

      Contact TeamX: "You'll need to log into the cluster and turn your client on through vsphere."

      Me: "I can't"

      TeamX: "Here's some instructions from the net."

      Me: "No I literally can't, I can only access that through my client."

      TeamX: "And?"

      Me: "My client which is currently shut down, that I'm trying to restart."

      TeamX: "Oh, right."

      It wouldn't have been as bad if this only happened once.

  8. The Dogs Meevonks Silver badge

    A little knowledge is dangerous in a user

    I'm sure we've all had users who consider themselves 'tech savvy' and like to think that they know what they're doing... Only to start poking around in things they shouldn't and screwing everything up.

    Thankfully most of mine have been dumb users who tried to adjust settings in windows and screwed it up.. like the user who back in the late 90's and NT4 days wanted a quicker refresh rate and higher resolution on their basic low end CRT monitor.... then wondered why the screen turned into a big fuzzy garbled mess. Or users installing software and using company networks to download dodgy stuff.

    It wasn't 'all' the users' fault though... we'd been arguing for tighter control over user systems for months after the first malware incident from a rogue user with a tiny bit of knowledge. The dept heads had first ignored, then resisted making changes to the whole network and a couple hundred systems.

    That was until a user infected a system in the payroll dept causing delays to theirs and everyone else's wages being paid on time.

    1. ShortLegs

      Re: A little knowledge is dangerous in a user

      We have a few IT people who consider themselves "network savvy", to CCIE levels of savviness. Despite not having a CCNA between themselves.

      In reality, they are DBAs and the ilk, and are CCFAs

      1. The Dogs Meevonks Silver badge

        Re: A little knowledge is dangerous in a user

        Networks are my achilles heel... For some reason I just can't wrap my head around them and struggle with even simple things on my home network... It's been like this for 20yrs or more.

        I once got very angry just trying to make a WinXP and W2K system talk to each other on my home network. No amount of twiddling and tweaking, rebooting and adjusting this would work. After a few hours of back and forth every now and again... as well as getting advice from a friend who WAS a decent enough expert... I gave up and went to bed.

        The following day... everything worked just fine... I hadn't even rebooted the systems.

        Just the other day, my broadband provider was changed to my new one... I currently have to use 3 powerline adapters to get it up to my office and pick up the wifi signal from my solar inverter and battery system.

        2 reconnected just fine... I even managed to reset my mesh router upstairs (which thankfully was a slightly older version of the new ISP supplied one (Fritzbox 7530AX).

        But that 3rd adapter for the garage... took me almost 2hrs to get working again.

        Maybe I'm just getting old and being a near middle aged man... have no desire to start learning new shit.... he mumbles under his breath whilst looking to get a Rasp Pi to set up and run Home Assistant to monitor cheap electric rates to auto charge the battery at night.

    2. ShortLegs

      Re: A little knowledge is dangerous in a user

      Or the two MCSEs who, looking at a recently built pair of NT4 servers with two NICs apiece, decided that "forward IP packets" should be enabled (can't recall the exact wording of the option, this was 1998)

      Cue havoc as RIP updates were injected into an OSPF routing environment. MCSE - Must Call Someone Experienced

      1. GlenP Silver badge

        Re: A little knowledge is dangerous in a user

        That's reminded me of an incident at a former employer where we were running 10BASE2 ethernet for PCs running 5250 emulators. I inherited the infrastructure which was basically the maximum 3 segments - the first ran from the computer room to two-thirds of the way along the main building, the second from there to the end of the building and across to a separate office block, then the third in that block. The separate office had another group company in it who had their own server of some kind (it wasn't my responsibility) but they then started complaining about having a lot of network problems.

        After a lot of investigating we found that they'd wanted to add some extra PCs so their server support people had created a fourth segment using a second NIC in the server, without bothering to check whether this was acceptable, with forwarding so they could access the AS/400. My response was to unplug the extra segment and tell them to add to the existing segment properly, not bodge the job.

    3. Doctor Syntax Silver badge

      Re: A little knowledge is dangerous in a user

      "until a user infected a system in the payroll dept causing delays to theirs and everyone else's wages being paid on time"

      At which point, no doubt, it became your fault for not "doing something about it".

      1. The Dogs Meevonks Silver badge

        Re: A little knowledge is dangerous in a user

        This is why you create a paper trail... or in my case... emails repeatedly putting forward the recommendation to lock down user accounts to prevent programs being installed and maybe even disable USB ports

    4. An_Old_Dog Silver badge

      Re: A little knowledge is dangerous in a user

      When money talks, management listens.

  9. Anonymous Coward

    Fastest firing after major prod cockup....

    The fastest firing I've ever seen was about 90 mins after a problem. There was an overnight callout and one of the contract admins was asked to roll back the overnight prod batch database; they accidentally restored the test DB over the prod DB.

    1:15am - Callout

    1:20am - Restore started

    1:55am - Restore finished

    2:00am - Batches restarted

    2:15am - Problem found by data analyst

    2:25am - Issue identified as wrong restore

    2:42am - Contractor sacked on phone by systems manager

    3:00am - I and a colleague get called out to fix it!
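One way to make that kind of mix-up harder is for the restore script to refuse to put a backup from one environment over a database in another. A minimal Python sketch, assuming a hypothetical naming convention in which the environment is the last underscore-separated part of the name (`batch_prod`, `batch_test`); none of these names come from the story.

```python
class RestoreMismatch(Exception):
    """Raised when a backup and its restore target belong to different environments."""


def environment_of(name):
    # Hypothetical convention: "batch_prod" -> "prod", "batch_test" -> "test".
    return name.rsplit("_", 1)[-1].lower()


def check_restore(backup_name, target_db):
    """Return the shared environment, or raise if the two names disagree."""
    source_env = environment_of(backup_name)
    target_env = environment_of(target_db)
    if source_env != target_env:
        raise RestoreMismatch(
            f"refusing to restore {backup_name!r} ({source_env}) "
            f"over {target_db!r} ({target_env})"
        )
    return target_env
```

A check like this only helps if names are trustworthy, so it is a backstop for a tired admin at 1:20am rather than a substitute for a second pair of eyes.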

    1. yetanotheraoc Silver badge

      Re: Fastest firing after major prod cockup....

      2:43am - Contractor sleeps like a log for the first time in ages

    2. An_Old_Dog Silver badge

      Re: Fastest firing after major prod cockup....

      "I know what I mean when I say, 'restore'. What do you mean when you say, 'restore'?"

  10. phuzz Silver badge

    Has an ignorant kid broken your boxes?

    Yes. It was me. I was the ignorant kid.

  11. Hopalong

    YTS and the IMPL button

    We once had a YTS (Youth Training Scheme) lad who managed to initiate the Microcode load process on an IBM Mainframe

    He was yacking on the phone and for some reason known only to himself, was gently tapping this little blue button on the console, which went 'click'.......

    A couple of days later, a button guard appeared over the button.

    1. An_Old_Dog Silver badge

      Re: YTS and the IMPL button

      But that button was such a fascinating shade of blue and naturally attracted the tapping back-end of a ballpoint pen ...

  12. Anonymous Coward
    Anonymous Coward

    We were a NetWare 4.11 shop in the late '90s/early 2000s. I was introducing Windows 2000 AD and had a few boxes up. My boss was an old-school minicomputer and NetWare guy, so I was showing him RDP into one of the W2K boxes. He decided to have a play, finished, and then went Start button, Shutdown. Oops. Luckily the server room was just across the corridor and only a few users had been migrated over to W2K at the time.

    1. Admiral Grace Hopper

      2000 AD

      Zarjaz!

      1. Return To Sender

        Re: 2000 AD

        Squaxx dek Thargo

    2. An_Old_Dog Silver badge

      "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

      BGINFO and color-coded screen background colors are your friends. Red for PROD, yellow for ADM, green for TEST, and blue for DEV.

      1. I could be a dog really Bronze badge

        Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

        You do know two interesting statistics don't you ?

        "IT" is a very male dominated world.

        Roughly 1 in 12 males is red-green colourblind. I am one of them. And this is also why well designed systems also use things like relative position (e.g. for traffic lights*, red is the top one, green is the bottom, so even if they all look the same then you know which is which) or shape.

        The relevance of those two facts to using red & green for "safe" and "dangerous" should be obvious.

        As an aside, some years ago I found myself working for a very short time on a ship built in the 50s in Italy. These days, it's universal on starters** for green to mean start/on and red to mean stop/off. On this ship, green meant stop/off and red meant start/on - which if you think about it makes more sense as "red for start" correlates better with "red = dangerous" than having "green = safe" while having "green = turn some body mangling machine on".

        * UK ones at least.

        ** Generic name for controls, typically used with motors, where pressing a button turns it on with a clunk as the internal "relay" pulls in, and pressing a different button turns it off.
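
A small sketch of the "don't rely on colour alone" idea applied to the PROD/ADM/TEST/DEV backgrounds mentioned above: each environment gets a text label and a distinct border character in addition to its colour. The `MARKERS` table, the `banner` helper and the hostname are made up for illustration; this is not how BGINFO itself is configured.

```python
# Hypothetical environment markers: colour is only one of three cues.
MARKERS = {
    "PROD": {"colour": "red", "label": "PROD", "border": "#"},
    "ADM": {"colour": "yellow", "label": "ADM", "border": "="},
    "TEST": {"colour": "green", "label": "TEST", "border": "-"},
    "DEV": {"colour": "blue", "label": "DEV", "border": "."},
}


def banner(env: str, hostname: str, width: int = 40) -> str:
    """Build a plain-text banner that is readable without any colour."""
    marker = MARKERS[env]
    rule = marker["border"] * width
    title = f"{marker['label']}  {hostname}".center(width)
    return "\n".join([rule, title, rule])


print(banner("PROD", "payroll-db01"))
```

Even on a monochrome console, or to a colourblind admin, the label and the border pattern still say which box this is.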

        1. Giles C Silver badge

          Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

          Yep another colourblind person here.

          Mind you, I came across a dashboard that had been built which used dark grey on dark blue to indicate that something was selected. So I had a complaint about it on a call. They didn’t seem inclined to fix it until I decided to say I would make a discrimination complaint with HR for designing a system which was not usable…

          Not that I was going to make such a complaint but it shows a lack of understanding.

          As for network ports with green / amber lights depending on settings……

          1. ricardian

            Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

            Me too! BT "support" asked what colour lights were showing on the router and got annoyed when I said I couldn't say for sure

          2. An_Old_Dog Silver badge

            Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

            I had a yellow/green colourblind co-worker tell me over the phone that the network LED on the back of a racked server was on.

            Me: "Yes, but is it green, or yellow?"

            Him: "It's ON."

            Me: "Oh, right, sorry."

        2. An_Old_Dog Silver badge

          Color-Coding

          As a child, I always had wondered why industrial motor controls -- "Square-D" brand -- were color-coded red and black, vs red and green.

          I also couldn't work out which was which: is red for danger I would be creating by turning something on, or was red to stop the motor in case of danger?

        3. nintendoeats Silver badge

          Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

          Just a point, traffic lights have special filters on them so they can be clearly distinguished by colorblind people (relative position isn't that useful when it's night time, and you can only see one light).

          The sole exception in the entire universe is the traffic light at the exceptionally cursed offramp from the A15 Decarie Avenue to Decarie Boulevard in Montreal.

        4. Giles C Silver badge

          Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

          Just listening to an episode of Tom Scott’s Lateral podcast.

          There is a set of traffic lights in Tipperary Hill, New York, where the colours are inverted. This is because a lot of the people there came from Ireland and kept vandalising it over the colour order, as red (English) was above green (Ireland).

          https://www.youtube.com/watch?v=1QGDH6dE35c

          Weird but true

          1. jake Silver badge

            Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

            It's one traffic light, and it's in Syracuse, New York, hanging over the center of the intersection at Milton Avenue and Tompkins Street.

            That part of the city is in the westside, and is known as the Tipperary Hill district, named after Irish immigrants from that part of Ireland. The inversion supposedly happened in the 1920s, when Syracuse first installed the light. Legend has it that it was broken several times before the City Council capitulated to the vandals, but there is no proof of this in the town records, and indeed no proof that it existed before the mid 1940s. Here's a picture of it, with a nice Irish flag hanging in the background:

            https://www.google.com/maps/@43.0465423,-76.1855833,3a,90y,333.01h,131.61t/data=!3m7!1e1!3m5!1saAh7TKat-q4J-RZo1RccmQ!2e0!6shttps:%2F%2Fstreetviewpixels-pa.googleapis.com%2Fv1%2Fthumbnail%3Fpanoid%3DaAh7TKat-q4J-RZo1RccmQ%26cb_client%3Dmaps_sv.share%26w%3D900%26h%3D600%26yaw%3D333.00591609534223%26pitch%3D-41.61353032466428%26thumbfov%3D90!7i16384!8i8192?coh=205410&entry=ttu

            The flag is at the district's Irish Heritage Center.

      2. Trygve Henriksen

        Re: "Doublefuck!! Wrong [rebooted|powered-off] Computer!"

        No, you make the DEV background FLASH.

    3. Giles C Silver badge

      Someone did that on the first Citrix box installed where I worked, logged in and chose shutdown to exit not log off and took the whole box down.

  13. Bebu Silver badge
    Windows

    I thought it was just me...

    Having shutdown, reboot etc options on the same menu as logout always seemed brain dead to me, as the two do two different classes of thing - the first takes the machine down, losing all volatile state, whereas the second just ends a user's session.

    Having been caught out on my own systems and from pre GUI days I always perform shutdowns and reboots from the command line (even Windows always had that option.)

    Remote graphical sessions like RDP or Nomachine are an easy trap to fall into.

    If the trainee was from a service provider it might have made more sense for her to get some experience on something less formidable than an enterprise production mainframe.

    Unix/Linux workstations might be a good place to start and learn judgement and the names of things outside the mainframe bubble.

    If you went to your local PC hardware / parts supplier and asked for a DASD for your PC it would be poetic justice if you were sold a magnetic drum but you would likely also get a couple of K of core in a suitcase thrown in. ;)

    I have a lot of sympathy for trainees in general. In most organisations they seem to be assigned to the least competent or useful staff member who is often more clueless than the trainee. In this way I think a lot of talent is wasted and frequently lost. Some understandably turn to the dark side of the conveniently open window school of system administration.

    1. Mishak Silver badge

      I always perform shutdowns and reboots from the command line

      That didn't help a friend of mine who looked after CAD systems for a large automotive company.

      End of the day he did "shutdown -h now".

      His machine did not shut down and looking at the screen showed "ssh session terminated" - followed shortly after by all the support phones going off.
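
A sketch of one possible guard against shutting down the wrong box from a remote shell: make the operator type the name of the machine they think they are on, and compare it with the machine's actual hostname. The `confirm_target` helper is hypothetical; tools such as Debian's molly-guard take a broadly similar approach for SSH sessions.

```python
import socket
from typing import Optional


def confirm_target(typed_name: str, actual_name: Optional[str] = None) -> bool:
    """Return True only if the typed name matches this machine's short hostname."""
    if actual_name is None:
        actual_name = socket.gethostname()
    # Compare short names, case-insensitively, so "CADBOX" matches "cadbox.example.com".
    typed_short = typed_name.strip().split(".")[0].lower()
    actual_short = actual_name.strip().split(".")[0].lower()
    return typed_short != "" and typed_short == actual_short


# The operator believes they are on their own laptop, but the shell is on the CAD server.
if not confirm_target("laptop", "cadbox.example.com"):
    print("Refusing to shut down: that is not the machine you are logged in to.")
```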

      1. I could be a dog really Bronze badge

        Re: I always perform shutdowns and reboots from the command line

        Ha ha. My very first introduction to WANs was when I worked for a small manufacturer of gift items, and we'd acquired another business. Cue setting up of network links between the sites - which was totally new to me. I was at our main site, while the "consultant" was there setting up his end - IIRC we'd been to a nearby warehouse and pre-configured the router there first. He's busy setting up IP addresses and the like, then I hear words to the effect of "oh dear" - he was connected to the console port of one router, but telnetted to the remote router and changed the IP address of the WAN link. Luckily he was able to change the local end to regain communication, then fix the mistake.

        On Cisco kit, "reload in x" can be a life saver - set it before making changes, and if you've screwed up and lost your ability to connect, wait out the x minutes (say 10), at which point it'll reboot and you are back to where you were. Still an outage for the clients, but much less of one than having to wait while you get in the car and drive an hour to fix it. Just remember to cancel it ("reload cancel") when you've made the changes and not locked yourself out.
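
The pattern described above (arm an automatic revert before a risky remote change, then cancel it once you have confirmed you still have access) can be sketched generically. This is not Cisco code: `arm_rollback`, the `config` dictionary and the addresses are assumptions made up for the illustration.

```python
import threading


def arm_rollback(delay_seconds: float, rollback) -> threading.Timer:
    """Schedule rollback() to run after delay_seconds unless the timer is cancelled."""
    timer = threading.Timer(delay_seconds, rollback)
    timer.daemon = True
    timer.start()
    return timer


# Hypothetical "device configuration" and a saved copy to fall back to.
config = {"wan_ip": "192.0.2.1"}
saved = dict(config)


def rollback() -> None:
    """Restore the configuration that was in place before the change."""
    config.clear()
    config.update(saved)


timer = arm_rollback(600, rollback)  # 1. arm the safety net first
config["wan_ip"] = "198.51.100.9"    # 2. make the risky change
timer.cancel()                       # 3. still connected? then cancel the rollback
```

If step 3 never happens because the change cut you off, the timer fires and the old settings come back without a drive to site.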

    2. Anonymous Coward
      Anonymous Coward

      Re: I thought it was just me...

      you can disable the ability to shutdown a windows box via GPO only allowing logout

  14. Anonymous Coward
    Anonymous Coward

    I did something similar as a rookie

    Back in the mid 80s I was working as a high-school intern on a proprietary computer system based on multiple 6509/6502 processors.

    It hosted dozens of terminals for the business. Accounting, warehouse, payroll, etc... The whole thing. My role was programming in BASIC for various subsystems.

    Eventually after 3 months I was given the responsibility to execute backups of the system which gave me access to the room where the computer resided (a closet).

    I was shown how to remove/insert the tape, how to package and stream the data to tape, etc. I was mesmerized by the wire-wrap bread boarded internals of the system.

    On the front of the machine was a switch labeled "Turbo and Normal". It was red.

    I inquired about the turbo switch and was told "DON'T touch that". No explanation why.

    Day after day, week after week I looked at the switch. Eventually.... of course, I flipped it. Maybe two months I held off my impulsive internal voice.

    Nothing happened.

    Everything was OK. Terminal response was snappy. All was well.

    About 2 weeks later the system finally had a fault and crashed.

    I switched the switch back to normal to hide my malfeasance. But another day later we had to do a recovery from tape for the Arthur Andersen annual audit. It didn't work. The data was scrambled and unrecoverable. My sponsor/boss was super angry and mumbled something to the effect of "This damn thing isn't reliable even in Normal mode, first they told us we couldn't use turbo and now normal doesn't work"

    And then..... I knew what the issue with the turbo switch was. Did my 18-year-old self tell him I was responsible? No way.

    1. Phil O'Sophical Silver badge

      Re: I did something similar as a rookie

      a recovery from tape for the Arthur Andersen annual audit. It didn't work. The data was scrambled and unrecoverable.

      Ah, the days when people just put busy loops in the code to get timing intervals right for tapes...

    2. nintendoeats Silver badge

      Re: I did something similar as a rookie

      This sounds like such a nightmare frankenputer. The idea of such a system using a cluster of 6509s...

      I have programmed assembly for a 6509. There is joy in its simplicity, but also pain.

  15. Management Order

    We have all done it once...

    su

    cd olddir

    cp . newdir

    # get rid of olddir

    rm -rf .*

    # hey, where did my system go?
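
For anyone who has not been bitten yet: the trap in the snippet above is that the pattern `.*` does not only match dotfiles, it also matches `.` and `..`, so a recursive delete can climb out of the directory you meant to clean. The sketch below uses Python's `fnmatch` module purely to illustrate the matching. It is not a shell, and current `rm` implementations generally refuse to remove `.` and `..`, so treat it as an explanation of the classic failure rather than a prediction for any given system.

```python
import fnmatch

# Directory entries as a shell would see them, including the two special ones.
entries = [".", "..", ".bashrc", ".profile", "notes.txt"]

# ".*" means "a dot followed by anything, including nothing or another dot".
greedy = [name for name in entries if fnmatch.fnmatchcase(name, ".*")]
print(greedy)  # ['.', '..', '.bashrc', '.profile']

# ".[!.]*" requires a second character that is not a dot, so "." and ".." are excluded.
# Note that it would also skip unusual names that start with two dots, such as "..foo".
safer = [name for name in entries if fnmatch.fnmatchcase(name, ".[!.]*")]
print(safer)  # ['.bashrc', '.profile']
```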

  16. Anonymous Coward
    Anonymous Coward

    Inconvenience is better than death

    I have taken down a number of sites when I have kicked exposed live wires against neutral wires after a mains socket has been removed -- presumably after getting damaged.

    I consider that a fair consequence for what they were prepared to accept. A 'choc-block' and some insulating tape or blanking plate would have avoided both. Fitting a replacement socket would have been the correct course but I accept that is not always immediately possible.

    1. I could be a dog really Bronze badge

      Re: Inconvenience is better than death

      Leaving live wires sticking out of the wall is NEVER an acceptable course of action regardless of limitations. As you say, at the very least some chock-blocks - which in extreme could be applied live if the circuit really couldn't be taken down for a few minutes.

  17. billdehaan

    Has an ignorant kid broken your boxes? Have they ever

    I've worked in the defence, finance, energy, transportation, medical, food, and general IT sectors over the past few decades, and almost every one of them has some variation of an "unsupervised new hire brings the company to a halt" story.

    Bank trading floor brought down by a new hire plugging in incompatible equipment? Check.

    Server room and business center evacuated because new hire thought the big red button was the "unlock the exit door" in the server room, when it was really the HALON fire system? Check. "Fortunately", the HALON actually malfunctioned, so the new hire wasn't killed, at least.

    Run the "build a hex file from the source tree and copy it to the EPROM programmer" scripts in wrong order, and accidentally overwrite the project's entire, and not recently backed up, source code base? Check.

    Start the test bench sequence in the incorrect order and start a small fire? Check.

    Send confidential (fortunately only embarrassing and not legally concerning) information out company wide by using REPLY-ALL and attaching the wrong file? Check.

    The details all differ, but the common problem was that an untrained and most importantly unsupervised new employee was given duties/responsibilities/access to resources far beyond their current state of knowledge, and/or training, and expected to have the same skill and knowledge as an experienced employee. In many cases, it wasn't even standard industry practices, but an in-house created, and usually arcane process that the company was convinced should be obvious and intuitive when it was anything but.

    In looking at the aftermath of some of these disasters, my reaction has been "well, what did you expect?". In one case, the poor new hire had to execute a script that included warnings like "Does the J: drive have enough free space?", and "Is the M: drive mapped correctly?". How the hell is a new hire going to know what is enough free space, and what the correct drive mappings are?

    In one case, the FNG (fricking new guy) was told to "run the script on the G: drive". When he asked what the script was called, he was told he'd know it when he saw it. He saw the script directory had half a dozen scripts with extremely similar names, picked the most likely one, and caused a near-catastrophe. In the end, it turned out IT had incorrectly mapped his drive letters, so his G: drive was mapped to a completely different system than it should have been. There was literally no way the poor guy could have even accessed the script he needed, he had no idea what it was called, and when he asked, he not only got zero help, he was called an idiot for not being able to figure it out.

    While most supervisors blame the new hire for not being omniscient and magically knowing undocumented corporate lore, there have been some good ones. The best response I ever saw in this situation was the new hire, having caused high five figures of loss because of his actions, fully expected to be fired by his manager. The manager's boss, the VP, interjected, and said "why should we fire you? Your manager just spent $80,000 training you!", clearly showing that he understood the real fault lay with the manager and the lack of guidance provided.

    1. Doctor Syntax Silver badge

      Re: Has an ignorant kid broken your boxes? Have they ever

      Realistically, the manager spent $80,000 training himself. Perhaps he was told that in private.

    2. An_Old_Dog Silver badge

      Free Space on J:, G: Mapping and FNG

      When I was the FNG, I had (and asked) so many questions I could tell my supervisor was starting to become annoyed, until the day I asked, "How much free space is the \\SHARE1\SHR: volume supposed to have on it? It's lookin' ... pretty low, though I don't know --"

      Manager ran out of my office, yelling the names of our two top engineers, and, "My office! Right now!"

      The answer was (a), "No, the volume did not have enough free space for the (ad-hoc, yet critical) business processes which depended on it and were due to automatically kick off in an hour and a half." and, (b) the source of the low free space was an unknowing doctor who "backed up" his PC by using Win3.X drag-and-drop to copy his PC's entire C: drive to his department's subfolder on the SHR: volume.

  18. Boris the Cockroach Silver badge
    Facepalm

    The kid did

    indeed break stuff

    Trouble with what I do is that things can not only be expensive to fix after being broken, but are also liable to eject large heavy objects spinning at 2000 rpm ..... which leads to the 'having a really bad day'

    So with that in mind, and with little joy, a previous PFY, having been given instructions on how to load programs, how to set tooling, how to check the robot loader etc etc, and been closely supervised by myself and the mangler while setting a job up, decided to ignore the part on the bottom of the next job sheet that said "RUN IN SLOW MOTION TO VERIFY JOB" (actually it says that on every job sheet).

    And went 100% speed on everything and hit the go button.

    On the plus side the loading robot was completely untouched by the carnage.

    On the downside it cost £20K to fix the rest of the machinery.

    The PFY was duly grabbed by the griddlins and had his futtock roasted.

    1. collinsl Bronze badge

      Re: The kid did

      > The PFY was duly grabbed by the griddlins and had his futtock roasted.

      Hopefully you twisted his woggling irons as well until his nadgers scroped.

  19. QuiteEvilGraham

    Sounds unlikely.

    If IBM, varying devices offline is not quite the thing a trainee would enter at the console.

    Unless the site were idiots like a past employer where there were some v. dodgy commands defined to the ops console function keys.

    1. billdehaan

      Re: Sounds unlikely.

      I worked at IBM on contract in the early OS/2 days (as in, OS/2 1.x days). And while I have many (many, many, oh so many) criticisms of IBM, one of the things they did right, and better (in my experience) than any other large organisation, was the on-boarding process for new hires (and in my case, contractors).

      My first day, I was assigned a (very) small office, a phone, a short tour, and a map of the building floor I was on, highlighting the paths to most important things I needed to know: the fire exits, the washrooms, my group leader's office, the coffee machines, and the cafeteria.

      Most importantly, I was given a huge (200+ page) 8.5x11 inch binder of paper. Each page was to be read, and initialed that I'd read it, and agreed to it. There was a scratchpad where I was to write out any questions and/or objections. The binder included not only job duties and responsibilities, but restrictions, processes, and how to escalate questions. The overall tone was "if you don't know or understand, don't guess, ask".

      Being young, and this being early in my career, I thought this was silly, and overkill, as 90% of it was self-evident, or of the "well, duh" type of information that should be obvious.

      Later in life, when I saw the disasters at other companies because they didn't have a good on-boarding process, I understood the importance of it. It may well have been that 95% of that initial on-boarding was redundant or useless, but the 5% that prevented new hire disasters more than paid for itself over time.

      Of course, although everyone agrees that 95% of it is useless, no one could agree on which 95% can be cut, so it stays. Today, whenever I see one of these new hire disaster stories, I keep looking to see if any have hit IBM yet, but they don't seem to (although many other types of hits occur, certainly).

      This was 30 years (or more... sigh) ago, so it could well have changed, but back in the day, IBM's new hire on-boarding was the gold standard.

      1. Anonymous Coward
        Anonymous Coward

        Re: Sounds unlikely.

        You probably won't see this type of mistake from IBM now, simply because they spun technology services off a few years ago into a new company with a very forgettable name.

  20. Mishak Silver badge

    My most memorable mistake

    Working on motor drives, I accidentally typed < (less than) when I meant > (greater than) while I was working on braking software.

    Testing involved spinning a "small" (1kW) motor up to speed in one direction and changing the direction - it was supposed to brake to a halt and then accelerate in the other direction.

    The typo meant it thought the motor had stopped as soon as the direction was changed, so full power was applied in the other direction.

    The motor leapt half a meter in the air at the same time as all the mosfets exited the controller at high speed.
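
A guess at the shape of the bug described above, reduced to a single comparison. The threshold, the function names and the command strings are all hypothetical (the original drive firmware is not shown in the comment); the sketch only illustrates how flipping one operator turns "brake first" into "full power in reverse".

```python
STOP_THRESHOLD_RPM = 5.0  # hypothetical speed below which the motor counts as stopped


def reversal_command(speed_rpm: float) -> str:
    """Intended logic: while the motor is still turning, brake; once stopped, reverse."""
    if abs(speed_rpm) > STOP_THRESHOLD_RPM:
        return "brake"
    return "drive_reverse"


def reversal_command_with_typo(speed_rpm: float) -> str:
    """Same logic with '<' typed instead of '>': a spinning motor looks stopped."""
    if abs(speed_rpm) < STOP_THRESHOLD_RPM:
        return "brake"
    return "drive_reverse"


print(reversal_command(3000.0))            # brake
print(reversal_command_with_typo(3000.0))  # drive_reverse
```

A bench test that asserts the command at full speed, before any hardware is attached, is a cheap way to catch this class of slip.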

    1. Phil O'Sophical Silver badge

      Re: My most memorable mistake

      We had somewhat larger motor-generator sets in the Uni basement lab. The motors were fed by 200v DC via a rheostat, which was used to bring them slowly up to full speed. When you switched off, the rheostat holding magnet was supposed to release & drop the rheostat back to zero for the next restart.

      One student didn't notice that their magnet hadn't released, and so re-powered-on the motor at 0 ohms. The very thick cables all sagged noticeably, the fuses surprisingly didn't blow as the motor went from zero to several thousand RPM in under a second. The two things I remember were watching the thick steel case of the motor visibly flex under the torque (but remain attached to the floor) and the deafening tortured scream from inside it which brought staff running from the adjoining lab. I suppose we were lucky it didn't detach from the floor & exit via a wall. No-one ever dared to try it again.

      1. Antron Argaiv Silver badge

        Re: My most memorable mistake

        We had those motor generators in the basement of our EE building, with big, honking jacks for plugging jumper cables into. This was 1970s. Never once saw them under power or even plugged in, and they were covered in dust. Curriculum changes, I suppose. I was doing Computer Systems Engineering. Closest I got to them was etching PCBs, which used the same room.

        1. I could be a dog really Bronze badge

          Re: My most memorable mistake

          Nah, it'll be the "can't let students anywhere near that, the risk assessment says no".

  21. PRR Silver badge
    Mushroom

    > We had those motor generators in the basement of our EE building,... This was 1970s. Never once saw them under power

    They may have been from the early 1950s, when my father did his EE. Motor Theory and (power) Networking were hot topics. He knew all the types of motors/generators and how the flux flowed. He did a lot with analog computers, the NEW way to figure power flow over reactive networks (a couple tubes/valves instead of actual wound-iron on a breadboard). FWIW, he also published an essay about how computers could guide missiles in wartime. It was a different world. (And in some ways, still with us.)

    {The icon I never wanted to use --->}

  22. Anonymous Coward
    Anonymous Coward

    "We f**ked up"

    I joined a large UK service provider company a few years ago and was charged with patching servers as a "fresher" task. I was given a list of servers that were OK to do, but it turned out not to be as OK as I thought. I sent an email out asking if it was OK to do a particular server and no-one replied saying no so, after a half hour, I cracked on. It turned out I took down the internal phone system for a city Fire Brigade. If you were affected I am truly sorry. As far as I was aware, the kit was all good to go. The section head was, and is, a decent fellow, and mildly chastised me but fessed up to said FB that "we f**ked up".

    I am definitely old enough to know better but I took people's words at face value......

    Anon for obvious reasons, and I can't reveal which FB got done over for various reasons, including saving my own ass!

    1. I could be a dog really Bronze badge

      Re: "We f**ked up"

      Your defence there is that it was one you were told to do, you did double check and no-one questioned it, so it's not really your fault if it caused a problem.

    2. John Brown (no body) Silver badge
      Thumb Up

      Re: "We f**ked up"

      "I sent an email out asking if it was OK to do a particular server and no-one replied saying no so, after a half hour, I cracked on."

      Yeah, been there done that. The lesson learned was that if not sure, do not proceed if no reply. Maybe others were too busy and didn't reply yet, or thought others who should have replied would have done so using REPLY instead of REPLY-ALL. If there's no response of any kind in a reasonable time, ask again :-)

  23. CowHorseFrog Silver badge

    Maybe they should ask the CEO of the company to fix the problem, after all they pay them at least 20x more than the newbie... the company needs to get some value from their leader.

  24. Andy A
    Facepalm

    I used to be part of a small team supporting a customer with over a hundred Netware servers. Novell certification was a job requirement.

    We had a standard question when interviewing for the team.

    "You have a NetWare 3.12 server with 80 users. You need to take it down to replace a disk drive. How do you do it?"

    Most would spout the Official Novell Answer - "At the console you use the DOWN command followed by EXIT".

    Bye bye candidate. You missed the important information. There are now 80 ANGRY users.

    1. David Hicklin Bronze badge

      > You missed the important information. There are now 80 ANGRY users.

      And there you have the big difference between big companies and small ones

      For many years I worked at small companies where IT was an afterthought and done by me as a part-time activity. Wanting to move on to bigger things, I did a few interviews, and one day one was at a banking/credit co. The 3 questions asked were: when something broke, how and what did you do to fix it? Well, with a small co you just pile in and fix the problem....simples!

      ...it was not until I was with my current Much Bigger Co where IT is a core dependency that I got to understand the concepts of planning, change control, escalation up the food chain and finally after the event lessons learned and how to avoid it in the future.

  25. ColinPa

    House rules

    With my team we had super user authority, and could do anything on the system and we often worked late at night. I came up with some rules which worked well, and we had no problems

    - If you break it, you fix it

    - If you do not know how to fix it, do not break it.

    - It is better to ask, rather than hope and pray.

    - Everyone needs to check that their backups work - and that they can restore critical data.

    - If people notice you've broken it - you buy the beers till it is fixed

    - If you do anything "naughty" or unusual, send a short email to the owners of the system as a CYA in case they get audited, or you subtly change the system.
