Uptime guarantees don't apply when you turn a machine off, then on again, to 'fix' it

With another working week almost behind us, The Register has found another tale to tell in our On-Call column – the home of reader-contributed stories of thankless tech support tasks. This week meet "Rod" who, in the late 1990s, found himself in a new country, seeking a new job. Rod had dabbled with computers in the past …

  1. Korev Silver badge
    Coat

    Rod stuck in the job long enough for the wheel to turn: a re-org or two later he was again asked to take on storage support, trained another group of colleagues, and watched them leave with the skills he had imparted on their CVs.

    Looks like he had to leave to keep his SANity...

    1. Anonymous Coward

      Yeah, sounds like a NASty place to work...

      1. Korev Silver badge
        Coat

        He needed to fibre channel his anger somewhere...

        1. Zippy´s Sausage Factory
          Coat

          Sounds like a SCSI way to treat your employees to me.

          1. Korev Silver badge
            Coat

            > Sounds like a SCSI way to treat your employees to me.

            You mean terminating them?

            1. MrDamage Silver badge

              What a RLLy hIDEous pun.

          2. Anonymous Coward

            Not the sort of puns I want to read on a SATAday.

    2. b0llchit Silver badge
      Coat

      NAScent technology keeps your SANity.

      1. The Bobster

        ISAy it's a good job he didn't get run over by a BUS, as he was so important to the organisation.

        1. TrixyB

          So you could say...

          He has been around the block

          1. mdubash

            Re: So you could say...

            Well, he hadn't SATA upon his laurels...

            1. Korev Silver badge
              Coat

              Re: So you could say...

              He should file for damages...

  2. Prst. V.Jeltz Silver badge
    Thumb Up

    meet "Rod"

    Those quotation marks are so much more efficient than that tedious and repetitive "... who the Regonomiser has dubbed Rod, for that is not his real name" crap. Well done!

    1. MJB7

      Re: meet "Rod"

      I'm never really very convinced about how effective the Regonomiser is, and how true the "not his real name" bit is.

      1. Philip Stott

        Re: meet "Rod"

        I've submitted a couple of 'who me' and 'on call' stories, and can confirm that they do change your name.

        1. Phones Sheridan Silver badge
          Holmes

          Re: meet "Rod"

          Yes but now you’ve given us a clue that you’re at least 2 of them. Only a matter of time before we figure out who you were!

          1. The commentard formerly known as Mister_C Silver badge

            Re: meet "Rod"

            and we can start by ruling out the ones about "Philip"

            1. Anonymous Coward

              Re: meet "Rod"

              ...or can we!?

    2. PRR Silver badge

      > Those quotation marks are so much more efficient than that tedious and repetitive "... who the Regonomiser has dubbed Rod, for that is not his real name" crap. Well done!

      Dear Abby (mother and daughter) has been doing this for over 50 years. Recent examples:

      DEAR ABBY: My daughter, "Maddie," 34, just left what I thought was a great marriage. .....she cheated on her husband, "Glenn."

      DEAR ABBY: My 20-year-old niece, "Andrea," came to visit her grandmother wearing a nose stud.

      DEAR ABBY: A dear friend, "Lorraine," contacted me because her daughter "Gabby" came out to her....

      DEAR ABBY: I have been friends with a woman I'll call "Blanche" for 40 years.

      Readers are not confused.

      www.DearAbby.com

      1. Sceptic Tank Silver badge
        WTF?

        Sorry, what on earth is all this about?

      2. Prst. V.Jeltz Silver badge

        "Readers are not confused."

        Nor should they be; it's perfectly clear and much more elegant than the Reg's usual clumsy "for that's not his name" thing.

        I think my original comment came across as sarcastic rather than genuine, due to the sarcasm count in normal Reg commenting, I guess.

        Hence the 50/50 up/down votes.

        For the record, I was serious.

        1. Anonymous Coward

          It's just a quirky writing style. The author knows that we know the name isn't real.

          Maybe they should write their stories in concise bullet points?

  3. Prst. V.Jeltz Silver badge

    Automation needed

    > Rod explained that machines designed to achieve 99.999 percent uptime were not designed to be turned off and turned on again.

    Sounds like a bad setup to me. I can buy that "switch it off and on again" is not the first port of call with that machine but that shouldn't mean it needs vendor support and action to do so.

    Whatever this "He duly guided the customer through the correct startup sequence" is, surely it can be automated, or at least documented as a procedure the user can do fairly easily, rather than the machine "has lost all configuration and data".

    Perhaps also put a series of "Are you really really sure? This machine isn't meant to be rebooted" warnings on the off button.

    I mean ... given the number of cleaning ladies that The Reg would have us believe are constantly unplugging things, it's bound to happen!

    1. Sgt_Oddball

      Re: Automation needed

      Generally for such boxes you make sure it's in a room that does not see a cleaning lady (or man, such things do exist). Secondly, the box might come up just fine but that doesn't mean the attached servers will automatically play nice, seeing the boxen mit blinkenliten immediately and carrying on their merry way.

      Sometimes, you have to restart things in a specific order. Usually the order is passed down in arcane rites by grey beards in locations where the sun dare not shine. Sometimes you get lucky and just have to remind the servers that things exist again.

      I've been there a few times, even more fun having UPS's ripple start only to have one release the arcane smoke thus causing further embuggerance after a lengthy power outage.

      1. Sam not the Viking Silver badge
        Pint

        Re: Automation needed

        Your comment about a cleaning lady/man rang a distant bell..... When we were ejected from the old company and set up on our own, we were successful and eventually decided we could employ cleaners rather than an unenthusiastic rota.

        The consortium that got the contract, to clean about five offices, reception, toilets and kitchen etc. comprised some ex-employees from the old company. One of them was a super-highly skilled machinist; his task had been typically to machine 8-m long crankshafts for large diesel engines on a bespoke lathe. These forgings were very expensive before they reached the machining stage. Not to belittle cleaners but it felt such a waste of talent/skill to take him on as a cleaner. He took the contract with good grace and did a good job until we were taken over and the accountants appointed a different set of cleaners; it will surprise no-one that this lot were slightly cheaper and much less competent.

        For all those who deserve better ---->

        1. J.G.Harston Silver badge

          Re: Automation needed

          "Rod"'s situation seems to be similar. An engineer taken on as a technician. Sadly, all too common. "Wotyer complaining about, it's "computers" you said you want to work in "computers"".

          1. JulieM Silver badge

            Re: Automation needed

            A technician is somebody who knows at least as much as an engineer, works at least as hard as an engineer and gets paid half as much as an engineer.

            An engineering manager is somebody who gets paid five times as much as an engineer for working half as hard as an engineer.

        2. FIA Silver badge

          Re: Automation needed

          Was he happy? That's all that really matters.

          (Also, there is a saying... 'where there's muck, there's brass').

      2. Anonymous Coward

        Re: Automation needed

        DO NOT SWITCH OFF! - plastered on all the VAXstations back at uni. It was a nightmare reclustering them, apparently.

    2. Terry 6 Silver badge

      Re: Automation needed

      Well, stuff happens. A machine might be on for years, but there's always a significant chance that it may not remain that way. Apart from our beloved cleaners; floods, fires, maintenance to the building and other acts of fate will eventually befall.

      1. Killfalcon Silver badge

        Re: Automation needed

        I once worked with a many-terabyte SAN that had several years of uptime - close to a decade since last cold-start, with new storage being occasionally added or replaced in situ as it went.

        Then came the call to migrate it to a datacentre and not the basement of a grade-2 listed building. A long bank holiday weekend was chosen, the setup replicated, data copied over several weeks. The old SAN was shut down, the new one started and... nope. Computer Does Not. After a few hours, they thought "hell, we better make sure we can bring the old one back on line!"

        It took the entire long weekend to work out a safe boot order.

        All of those upgrades added had created a tangle of dependencies between different nodes and clusters and other network storage words - a tangle so bad that even when they worked out the right order, it still took several hours to boot.

        It's all been designed and rationalised since, thankfully, but it was a hell of a shock at the time. Our infallible, bulletproof storage that had gone years without problems was, in fact, very unhappy.
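
        A "safe boot order" of the kind described above is, in the abstract, a topological sort of a dependency graph. The sketch below is only an illustration: the node names and the `depends_on` map are invented, and nothing here reflects how that SAN was actually laid out.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map (invented for illustration):
# each node lists the nodes that must already be up before it starts.
depends_on = {
    "fabric-switch": [],
    "controller-a": ["fabric-switch"],
    "controller-b": ["fabric-switch"],
    "shelf-1": ["controller-a"],
    "shelf-2": ["controller-a", "controller-b"],
    "nfs-head": ["shelf-1", "shelf-2"],
}


def safe_boot_order(deps):
    """Return an order in which every node starts after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle.
        raise ValueError(f"circular dependency, no safe order: {err.args[1]}") from err


boot_order = safe_boot_order(depends_on)

if __name__ == "__main__":
    print(" -> ".join(boot_order))
```

        The hard part in practice is presumably discovering the dependency map, not sorting it; a cycle in the map means no cold-start order exists at all until something is untangled.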

        1. An_Old_Dog Silver badge

          File Mount Entanglements

          Home directories residing on NFS mounts. There's a darned good reason we had user root's home directory located in /root.

        2. DS999 Silver badge

          Re: Automation needed

          Why in the world would they shut down the old one before the new one was up and running and users were happily using it none the wiser that anything had happened other than an email about a weekend downtime they ignored?

          I wouldn't even shut down an ordinary server that had been migrated until customer acceptance was deemed complete. If the new one has the same IP or something like that then you unplug it from the network, you don't power it off!

          1. Prst. V.Jeltz Silver badge

            Re: Automation needed

            IKR?

            "close to a decade" of uptime he says, and then switched off a day too early. lulz.

        3. ilmari

          Re: Automation needed

          After 10 years, I'd imagine spinning rust would no longer spin up. Unless it was spun down and up a few times before shutdown...

          1. gnasher729 Silver badge

            Re: Automation needed

            There was a story about a power company. Being a power company, they didn’t get electricity bills. Getting no electricity bill, nothing ever got turned off.

            They moved one day, and none of their hard disk drives worked. Running for five years turns the lubricants in a hard disk drive into something different that is hard until it gets moving. The solution was dropping the hard drive from three inches on the desk _just_ at the right moment when rebooting your machine. The shock turned the lubricant back to a fluid for a tiny amount of time and allowed the drive to spin up.

            1. An_Old_Dog Silver badge

              Stiction-like Symptom Fixes

              At least the drop-fix was quicker than pulling the drive, leaving the cables connected, removing the cover (thus voiding the warranty) of the hard drive, starting the platter stack moving by pushing a soft eraser-stick ("Staedtler" or "Momo" brands) against the spindle hub, powering up, then reassembling ...

              1. Missing Semicolon Silver badge

                Re: Stiction-like Symptom Fixes

                I used to remove the drive from its cage, connect it by suitably-long cables, then power on whilst executing a deft hand-twist on the drive. This would unstick things enough to allow it to go. High-G shocks would give me the heebies. Even then, I had colleagues (not aware that the limit was acceleration, not motion) look on in horror.

            2. Anonymous Coward Silver badge
              Holmes

              Re: Automation needed

              The spinning disk drop fix is to fix a stuck head, not the motor. The heads normally fly above the surface, but when the disk is not spinning they touch the surface (not good, hence they're meant to park off-disk) and stiction then prevents the disk from starting up again.

            3. Elongated Muskrat Silver badge
              Boffin

              Re: Automation needed

              If anyone really cares, this phenomenon is known as thixotropy

      2. Caver_Dave Silver badge
        Unhappy

        Re: Stuff happens

        Left the home office (for the first time in months) to meet colleagues today. Unplugged the laptop from where it has been happily sitting all that time.

        Plugged it in when I got back and the wired Ethernet ports have completely disappeared from Windows including from the device manager!

    3. Jou (Mxyzptlk) Silver badge

      Re: Automation needed

      > Sounds like a bad setup to me.

      That is a bit hasty, considering the box was probably pre-2000, maybe pre-1995. Things weren't so simple back then, you should try such old storage boxes...

    4. TRT Silver badge

      Re: Automation needed

      Basically there's two ways to look at this... every machine is going to need some kind of period of time for recovery (excepting maybe a Commodore 64, which you can measure in microseconds), and the question is WHEN do you put in the time... do you have an overly elaborate shut down procedure which puts everything into the right state and the right position ready for powering on, OR do you have an overly elaborate start up procedure?

      I needed to power off our S3 storage due to extended electrical works which were going to take out both of the supplies (there's ALWAYS a single point of failure, somewhere).

      I looked through the manual and there was a section for starting the system, which was about 8 pages long, but nothing about powering it off again.

      I put in a support call to the vendor and they went "um... ah... hang on... we'll write a procedure for you." They came back with a 14 step process that covered 4 sides of A4. "And the restart procedure?" I asked... "Is that the same as the one in the manual?"

      "No! You just push the power buttons in the sequence you generated in step 3 of the shutdown procedure, leaving 5 minutes between each node. See? It's that one line there, right at the end of the page 4."
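
      For what it's worth, the one-line restart step quoted above amounts to replaying a recorded node sequence with a fixed gap between power-ons. A minimal sketch, assuming a hypothetical `power_on` callback and a sequence saved during shutdown; none of these names come from the vendor's procedure:

```python
import time


def restart_nodes(sequence, power_on, gap_seconds=5 * 60, sleep=time.sleep):
    """Power nodes on in the recorded order, pausing between each one.

    `sequence` is the node order written down during shutdown and
    `power_on` is whatever actually presses the button; both are
    placeholders for illustration.
    """
    for index, node in enumerate(sequence):
        if index > 0:
            sleep(gap_seconds)  # wait between nodes, not before the first
        power_on(node)


if __name__ == "__main__":
    # Dry run: zero gap and print instead of touching any hardware.
    restart_nodes(["node-2", "node-1", "node-3"], power_on=print, gap_seconds=0)
```

      Passing `sleep` in as a parameter keeps the five-minute waits out of any dry run or test.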

      1. HappyDog

        Recovery Commodore 64 style

        "every machine is going to need some kind of period of time for recovery (excepting maybe a Commodore 64, which you can measure in microseconds)"

        That'll be a recovery to a blinking cursor and empty RAM, sounds more like a reset to me?

        1. An_Old_Dog Silver badge

          Re: Recovery Commodore 64 style

          **** COMMODORE 64 BASIC V2 ****

          64K RAM SYSTEM 38911 BASIC BYTES FREE

          READY.

      2. Anonymous Coward

        Re: Automation needed

        "do you have an overly elaborate shut down procedure which puts everything into the right state and the right position ready for powering on, OR do you have an overly elaborate start up procedure?"

        Yes.

        1. Overly elaborate shutdown procedure which should put everything in a safe, consistent state, with a simple startup procedure if that was followed.

        2. Overly elaborate startup procedure to make sure everything's ok, for use if #1 wasn't followed. (Ex. EPO, UPS failure, cleaners...)

    5. An_Old_Dog Silver badge

      Documented End-User Procedures

      > or at least documented as a procedure the user can do fairly easily

      We had clients who could follow the documented Unix shutdown procedure (with literal checkboxes) correctly, and those who ... could not. (This was before Ext- and BTRFS-type filesystems had been invented.)

      1. that one in the corner Silver badge
        Trollface

        Re: Documented End-User Procedures

        > with literal checkboxes

        So that's one checkbox for each of the times you type 'sync' before the 'shutdown -h'?

        1. An_Old_Dog Silver badge

          Re: Documented End-User Procedures

          There's more to it than that. I don't recall the full details, but it was something like, but with more user-centric language than below:

          [ ] 1. Ensure everyone is logged out of the app. (You may have to visit their offices.)

          [ ] 2. Ensure everyone but you is logged off the computer. (The output of the 'w' command will be helpful in showing who is still logged in. You may have to visit their offices.)

          [ ] 3. When everybody else is logged off, log off yourself.

          [ ] 4. Log in as 'root' from the main console.

          [ ] 5. Shut the system down to single-user mode [I don't recall the parms of the shutdown command for their flavor of Unix]

          [ ] 6. [multiple instructions re making a daily tape backup, and a monthly tape backup, if it was the last day of the month]

          ...

          [ ] 12-ish. Give the shutdown command [whose parms I don't recall]

          [ ] 13. When you see the message, "*** SAFE TO POWER OFF ***", shut off the computer power using the Big Red Button.

          Important note to Reg readers: Unix file system consistency != application-file consistency, hence the rigmarole.

    6. Joe W Silver badge

      Re: Automation needed

      Ah...

      Let me guess, you have no experience with older hardware. This stuff needed arcane rituals to come up clean again - especially after ungracefully shutting it down. Thus the "and we'll dispatch a warlock (ah, no, engineer) immediately" clause.

    7. spuck

      Re: Automation needed

      > Whatever this "He duly guided the customer through the correct startup sequence" is , surely it can be automated, or at least documented as a procedure the user can do fairly easily, rather than the machine " has lost all configuration and data"

      The startup procedure probably is documented, right after the section called "Shutdown Procedure" which the customer also didn't read...

    8. adamr001

      Re: Automation needed

      But if they told you how to do it, why would you pay them $$$ for the support contract?

      1. An_Old_Dog Silver badge

        Support Contracts (Even When Given Full Instructions)

        The customer purchases a support contract so that when they screw things up, they don't have to pay the much-more-expensive out-of-contract per-incident fee to get it professionally fixed.

    9. Sceptic Tank Silver badge
      Meh

      Re: Automation needed

      As is usual in these stories, the customer is always an idiot. They who pay your salary.

  4. Anonymous Coward

    So - The 'customer' should have had to wait till a support person arrived -

    possibly days depending on the customer location -

    rather than have the problem fixed in maybe under 1 hour - typical manglement.

    He ought to have been praised rather than rebuked.

    1. wolfetone Silver badge

      You're missing the bigger picture.

      Rod doing that took work away from a service engineer, their boss, and affected their own quota bullshit.

      Rod ruined a particular department's day. The nasty man. How bloody well dare he etc.

      But, the more surprising thing is Rod stayed around. I'd have been out the door if management started being pricks about it like that.

      1. TRT Silver badge

        But what the discerning reader really wants to know is, where were Jane and Freddy?

        1. Korev Silver badge
          Coat

          > But what the discerning reader really wants to know is, where were Jane and Freddy?

          They could help with the Bungled storage shutdown

    2. MJB7

      Re: wait till a support person arrived

      I don't think that is what the problem was.

      They should have dispatched the engineer _straight away_, in case the on-site engineer was needed. Then they should have diagnosed and fixed the problem remotely (and then told the engineer to come back).

      The alternative is wait an extra half an hour while they diagnose the problem and realize they need an on-site engineer. That's half an hour wasted.

      1. ChrisC Silver badge

        Re: wait till a support person arrived

        "The service plan bundled with the product promised that any failure would be met by the instant intervention of a crack support squad"

        Which, by the sounds of it, is what the customer received - that description doesn't say anything about the intervention needing to be in the form of someone being sent on-site...

        So I'm going to assume here that something has been lost in the mists of time and that the actual service plan did indeed promise instant (or at least ASAP) dispatch of a field tech to the customer site, hence the bollocking received for failing to do that.

        1. Adrian 4

          Re: wait till a support person arrived

          'Crack' he may have been, but could he wrangle an alligator ?

      2. AnotherName

        Re: wait till a support person arrived

        And get bollocked instead for wasting engineer's time travelling on an unnecessary trip?

        1. An_Old_Dog Silver badge

          Bollocked for Dispatching an Engineer on a 'Wasted' Trip

          It depends on your company's / boss's policy. If said policy is based on how your boss is feeling, today, you need to find employment under a different and sane manager.

      3. Trygve Henriksen

        Re: wait till a support person arrived

        This.

        That half an hour may be the difference between 'barely catching the plane going vaguely in the customer's direction', and 'there's a plane heading there tomorrow'

    3. A.P. Veening Silver badge

      > He ought to have been praised rather than rebuked.

      No good deed goes unpunished.

  5. ColinPa

    What do you mean - its never been rebooted ?

    Someone told me of when they worked in a small software company doing pretty advanced stuff. You could get fixes onto the system and incorporate them without having to restart the system. For 6 months the development team fixed all their problems this way. It was fast, it was slick - and looked great.

    At a moment's notice they were asked to do a demo for an important customer. The unlucky person created a memory stick image, and rushed to the conference room with his laptop. The system almost started. When the boot got to their code, it fell over. For the previous 6 months their server had not been rebooted. He sat there trying to fix it, when the manager came out and said "We are overrunning, do you mind just giving a 2 minute description rather than the demo?" Suddenly all was calm, and he said that was fine by him.

    1. dmesg
      Mushroom

      Re: What do you mean - its never been rebooted ?

      A friend of mine used to serve on submarines in the US Navy, and part of his duties involved maintaining some on-board computer systems. Said systems were engineered in such a way that he could (and sometimes did) replace memory _while the system was running_.

      Icon for the event I'm glad never happened.

      1. Jou (Mxyzptlk) Silver badge

        Re: What do you mean - its never been rebooted ?

        Oh, those must be really old systems. 'cause as soon as it was possible to cluster two computers doing it that way was cheaper than those extremely expensive boxes with hot-replace of RAM, CPU and so on...

      2. An_Old_Dog Silver badge
        Facepalm

        Hot-Swap RAM

        I observed my officemate, a software engineer, working on his Compaq 486 Prolinea desktop PC. The cover was off, and he had some RAM DIMMs sitting on the power supply. I heard the PSU fan running, looked, saw the power light on, and (I began) saying, "That box does not have hot-swap RAM!", but didn't get more than halfway through the sentence before he'd popped out one of the DIMMs. The PC then failed as a hardware-knowledgeable person would expect ...

    2. phuzz Silver badge

      Re: What do you mean - its never been rebooted ?

      uptime is a measure of how long it's been since you last verified that a server would boot correctly.

  6. chivo243 Silver badge
    Flame

    Internal squabbles

    The only thing that burns my butt more than internal squabbles, is a flame about 75cm high... I've had a couple.

  7. Martin Gregorie

    A strange feature of fault-tolerant systems...

    ...is that, when Stratus fault tolerant systems (also re-badged as IBM System/88) were the new, cool, kit most of the faults were caused by operators showing off its fault tolerance to their chums. They did this by opening the cabinet and pulling out boards while exclaiming "See: it's still working!".

    Unfortunately, the machine reacted to a removed board by dialling home to report the failure almost immediately, which resulted in an engineer and new board arriving at the site only to find that all was well and that this was an unnecessary call-out. Fortunately, the fix was simple: a 10 minute delay was introduced between fault detection and dialling home: if the fault disappeared, i.e. the operator put the board back within ten minutes, the pending call was not made.

    The reliability of the Stratus was excellent: the one I used for a secure networking project shared a small machine room and mains supply with a Tandem fault tolerant system. During our project the Stratus crashed twice: both times because the Tandem power supply failed, blowing the computer room's mains fuse because its failure caused a mains short. This caused the Stratus to switch to its built-in batteries which didn't have the capacity to keep it running for the rest of the weekend, so it did a clean shutdown just before its batteries were fully discharged.
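
    The ten-minute hold-off described above is a simple debounce: a fault is only reported if it is still present once the delay has elapsed. The sketch below shows the general idea only; `FaultReporter` and its methods are invented names, and this is not a description of how Stratus implemented it.

```python
HOLD_OFF_SECONDS = 10 * 60  # the ten-minute delay from the anecdote


class FaultReporter:
    """Hold each fault for a while before 'dialling home' (illustrative only)."""

    def __init__(self, hold_off=HOLD_OFF_SECONDS):
        self.hold_off = hold_off
        self.pending = {}  # board id -> time the fault was first seen
        self.calls = []    # boards for which a service call was placed

    def fault_detected(self, board, now):
        # Keep the earliest detection time if the fault is reported again.
        self.pending.setdefault(board, now)

    def fault_cleared(self, board):
        # Board was put back: cancel the pending call, if any.
        self.pending.pop(board, None)

    def tick(self, now):
        # Place calls for faults that have outlasted the hold-off period.
        for board, first_seen in list(self.pending.items()):
            if now - first_seen >= self.hold_off:
                self.calls.append(board)
                del self.pending[board]
```

    Timestamps are passed in explicitly (in seconds) so the behaviour can be checked without waiting ten minutes.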

    1. GlenP Silver badge

      Re: A strange feature of fault-tolerant systems...

      Operators used to like demonstrating these features.

      DEC did vertical tape drives with self closing doors that had a trip if the door was obstructed. I was visiting a government department computer room (long story) where they demonstrated this with a packet of cigarettes. Apparently they'd previously often put their hand in the way but on one occasion, fortunately, had used cigarettes and the door trip failed. The operator concerned had some very quick smokes, as the packet was chopped in half, but at least he still had his fingers.

      1. WonkoTheSane
        Headmaster

        Re: A strange feature of fault-tolerant systems...

        Still being able to count up to ten without removing one's shoes and socks is always a plus at the end of the day.

        1. keith_w

          Re: A strange feature of fault-tolerant systems...

          that would be 5, 5 and a half, 6, 6 and a half, seven?

          1. James Wilson
            Coffee/keyboard

            Re: A strange feature of fault-tolerant systems...

            Thanks Keith. The invoice for the replacement keyboard is in the post.

        2. JulieM Silver badge

          Re: A strange feature of fault-tolerant systems...

          Pah! I can count up to 31 without breaking off from scratching my butt .....

    2. Joe W Silver badge
      Flame

      Re: A strange feature of fault-tolerant systems...

      One of my mates tried to demonstrate the hot-pluggability of his new system - and dropped the NIC onto the mainboard. Sparks... only a Realtek card got destroyed, the rest was still fine.

  8. Bebu Silver badge

    not designed to be turned off and turned on again

    "99.999 percent uptime were not designed to be turned off and turned on again."

    So why have a power switch on the device at all? A well-hidden reset button behind a pinhole which accepts a mutilated paperclip would be more the job - not one of those brightly coloured plastic-coated paperclips either. The designer should have the option of sending a couple of kV up an uninsulated paperclip. Nothing like a near-death experience to focus concentration. I imagine in reality the device has a standard 3-pin plug and, if it's lucky, is plugged into a UPS; if not, the charwoman's wall socket.

    Just worked out the five 9s is just over 5 minutes per year - just long enough to give the machine room a quick hoovering.

    Naively I would have thought high availability devices would be designed to recover extremely rapidly and reliably.

    1. An_Old_Dog Silver badge
      Flame

      Re: not designed to be turned off and turned on again

      > So why have a power switch on the device at all?

      You want a hardwired turn-it-off power switch in case the device catches fire.

      (Icon, because I've seen flaming computers before)

  9. I Am Spartacus
    Thumb Up

    Got to love 5x9's uptime

    I love the 99.999% uptime. That's a downtime of 0.0001% or about 52 minutes a year.

    Interestingly I did have one box, a Tandem non-stop server, that absolutely refused to go down. Even when the power went off because the computer room was flooding this box stayed on, serving credit card transactions, storing them up to pass to the customer services Sequent when that finally came back online hours later.

    1. Anonymous Coward

      Re: Got to love 5x9's uptime

      Um, a quick mental calculation suggests you are out by a factor of 10 in one of your results and by a factor of 10 in the other direction for the other. 100% minus 99.999% is 0.001% not 0.0001%.

      0.001% is 10ppm. A year is (roughly) 31 million seconds, so 10ppm is roughly 310 seconds, or five and a bit minutes. 0.0001% downtime would be only ~31 seconds.
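
      The arithmetic above is easy to check. A small sketch, assuming a 365.25-day year; the function name is made up for the example:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # about 31.56 million seconds


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 means 99.999%)."""
    unavailability = 10.0 ** (-nines)  # 5 nines -> 0.00001, i.e. 0.001%
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        secs = allowed_downtime_seconds(n)
        print(f"{n} nines: {secs:,.1f} s/year ({secs / 60:.2f} min)")
```

      Five nines comes out at a little over five minutes a year. The "52 minutes" figure is what four nines (99.99%) would allow.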

      1. Joe W Silver badge

        Re: Got to love 5x9's uptime

        Pi seconds is a nanocentury.

        1. jfm

          Re: Got to love 5x9's uptime

          I've always found that more useful in the form "pi megaseconds is a millicentury" (31.4 Ms in a year, which is within 5% of the actual value), and for quick estimations, 30 megaseconds—and knock off as many zeros as there are 9s in the reliability.

  10. Boris the Cockroach Silver badge
    Happy

    Well

    There's a new question to ask interviewees before we hire them

    "Can you recite the machine's source code backwards while wrestling a crocodile underwater?"

    But that leads to the further question of

    "What's the best response to the aforesaid question?"

    1. J.G.Harston Silver badge

      Re: Well

      } ; 0 nruter } .....

    2. David Robinson 1

      Re: Well

      "Fresh or saltwater?"

    3. hplasm
      Happy

      Re: Well

      "Yes - it goes - 'blub blub blub bloop grr grr blub...' "

    4. Sceptic Tank Silver badge
      Windows

      Re: Well

      Maybe you should get the plumber in first.

  11. Bruce Ordway

    Linux.... back when Slackware came on floppy disks

    Yeah, wasn't that an exciting time?

    1. Francis Boyle Silver badge

      Re: Linux.... back when Slackware came on floppy disks

      My introduction to Linux, though by then someone had thoughtfully decided to put the disc sets onto a CD.

      1. Anonymous Coward

        Re: Linux.... back when Slackware came on floppy disks

        It's been all downhill from there

    2. Antron Argaiv Silver badge
      Thumb Up

      Re: Linux.... back when Slackware came on floppy disks

      ...and the excitement lasted much longer (due to dialup Internet), too!

      I still remember the joyful feeling when Walnut Creek CD started publishing the releases on CD.

    3. Anonymous Coward
      Anonymous Coward

      Re: Linux.... back when Slackware came on floppy disks

      Yup. I still have that set somewhere. It wasn't Slackware 5.x as that came on CD, but something earlier, 3.x?

  12. Paul Hovnanian Silver badge

    Not following procedures?

    I don't understand why they came down on "Rod" for handling the call rather than dispatching the elite bug hunting team. Certainly the customer's rebooting the machine stepped outside of the normal restoration flow chart. This reminds me of the apocryphal mechanic's hourly rate sign:

    $70 per hour

    $90 per hour if you watch

    $110 per hour if you help

    $150 per hour if you already tried to fix it first

    $300 per hour if you try to tell me how to do my job

    As an aside: What sort of parents would name their child with enclosing quotes, such as those of "Rod"? He must have had a difficult childhood (excuse the presumption about pronouns).

    1. Joe W Silver badge

      Re: Not following procedures?

      Rod";

      I guess the semicolon was left out?

    2. aerogems Silver badge

      Re: Not following procedures?

      As Rod explained in the story: internal politics.

      1. that one in the corner Silver badge

        Re: Not following procedures?

        >> What sort of parents would name their child with enclosing quotes.

        > As Rod explained in the story: internal politics.

        So sad when families start to filibuster and demand that voting lines be redrawn (he is only your step-grandfather!)

        Ending up, at the christening, with the mother handing over a note reading: his name is "Rod", not "Roderick". "Rod"!

        They were lucky the pastor couldn't pronounce the pling.

        https://www.youtube.com/watch?v=Qf_TDuhk3No - Victor Borge, Phonetic Punctuation

        1. that one in the corner Silver badge

          Re: Not following procedures?

          The advanced version of Victor Borge's routine for real punctuation enthusiasts:

          https://www.youtube.com/watch?v=TIf3IfHCoiE

          (aka "curses, couldn't find the video I'm looking for and the edit window is about to close")

        2. Cheshire Cat
          Go

          Re: Not following procedures?

          Esmeralda Margaret Note Spelling of Lancre!

          - known as "Spelly" to her enemies.

    3. An_Old_Dog Silver badge
      Joke

      Unusual Names

      He may have been related to this kid: Robert'); DROP TABLE Students;--

      [https://xkcd.com/327/]

      1. JulieM Silver badge

        Re: Unusual Names

        Surely that would only do anything if the `name` field was last in the table and the query had been built up in a string using 'single' speech marks around values?

        If "double" speech marks had been used, that ' would not close them, but just be a normal character within the string. And if the original programmer had used 'single' speech marks when building queries but there were more fields after `name`, then the resulting mismatch error in the first query would stop execution of the whole string of queries?
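
A minimal sketch of the two single-quote cases described above, using Python's built-in `sqlite3` module. The table layout, the `year` column and the helper names are assumptions made for illustration; other databases and drivers may behave differently, and the double-quote case is not covered here.

```python
import sqlite3

NAME = "Robert'); DROP TABLE Students;--"


def table_exists(conn):
    """True if a table called students (any letter case) is still present."""
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master "
        "WHERE type = 'table' AND lower(name) = 'students'"
    ).fetchone()
    return row[0] == 1


def unsafe_insert(columns_sql, insert_template):
    """Build the query by string formatting with 'single' quotes, then run
    it as a multi-statement script. Returns whether the table survived."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE students ({columns_sql})")
    script = insert_template.format(name=NAME)
    try:
        conn.executescript(script)
    except sqlite3.Error:
        pass  # a failing statement stops the rest of the script
    return table_exists(conn)


def safe_insert():
    """Parameterised query: the name is passed as data, never parsed as SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES (?)", (NAME,))
    stored = conn.execute("SELECT name FROM students").fetchone()[0]
    return table_exists(conn), stored


# Case 1: `name` is the last (only) value, so the injected ') closes the
# statement cleanly and the DROP runs.
survived_name_last = unsafe_insert(
    "name TEXT",
    "INSERT INTO students (name) VALUES ('{name}');",
)

# Case 2: another value follows `name`, so the first statement has too few
# values, raises an error, and the DROP is never reached.
survived_name_first = unsafe_insert(
    "name TEXT, year INTEGER",
    "INSERT INTO students (name, year) VALUES ('{name}', 3);",
)

survived_safe, stored_name = safe_insert()
```

In this sketch the table is dropped only in the first case. With the placeholder version the whole string, quote and all, is simply stored as the student's name.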

  13. Mishak Silver badge

    Not support related, but...

    I was once contracting for a large company. We were all sitting in the large, open-plan office when the "get ready to leave" alarm sounded (it was a big place, and a fire alarm in one block would set this off when it was triggered as a "heads-up").

    Shortly after, we noticed a slightly unpleasant smell, followed by sore throats, coughing and streaming eyes. We unilaterally decided that, as it appeared as if we were being poisoned, it was probably a good idea to put on our coats, pick up our laptops, and evacuate the building.

    Many minutes after we left, the alarms changed to "evacuate" - a transformer in a large UPS had got upset, and the combustion products were getting sucked into the ventilation system.

    The permanent staff were forced to go through mandatory fire drill "retraining" as they had clearly failed to follow the procedures by evacuating before they were told to do so. They were also told that putting on coats and taking laptops was against the evacuation policy, which they had just been told was not in force when they left the building...

    1. Antron Argaiv Silver badge

      Re: Not support related, but...

      You would hope...that the exhaust air from areas likely to have combustion (server rooms, etc) would not be shared with areas containing people.

      You would hope.

      1. yetanotheraoc Silver badge

        Re: Not support related, but...

        I am without hope.

  14. spuck

    When it isn't Joe-Proof

    I used to support a classified system that had a 5-node Isilon storage array. Each of the 5 nodes held 40 disk drives. Built-in redundancy meant that even with 2 nodes off-line the system was able to keep running, and up to 2 HDDs in each node could also fail before that node would drop from the quorum.

    The redundancy was important, because being in a classified environment, the storage array was not allowed to phone home to report failed drives and getting replacement hard drives into the area involved extra-ordinary hoops to jump through. We had assured the customer that a monthly check for trouble lights would be sufficient to keep things running.

    Everything worked great, until the customer's on-site engineer (Joe) was tasked with the quarterly audit. The outgoing engineer had hated this task: spending all day on his knees in front of the rack, removing all 200 drives, one at a time, and squinting against the poor server room lighting to check the serial number and asset label on each drive against a printed list.

    What made it even more tedious was the need to re-insert each drive and wait for it to be marked as healthy before proceeding to the next drive.

    But Joe was smarter than that. Rather than take an entire day to do the audit with the system running, he took advantage of a planned outage when the system would be powered down for other work. Then he pulled all 200 drives and loaded them onto a cart, which he was able to then wheel over to a more comfortable work area.

    It all worked great until that afternoon. Only after wheeling the cart back to the rack did Joe realize he had no idea which slot each drive belonged in. But never one to shy away from a challenge, Joe soldiered on, taking the drive off the top of the pile and putting it in slot 1, the next drive in slot 2, and so on.

    The customer seemed genuinely puzzled why their data was not available once the system was turned back on.

    1. DS999 Silver badge
      Facepalm

      Re: When it isn't Joe-Proof

      Maybe they should have put the asset label on the FRONT of the drive?

      1. Antron Argaiv Silver badge

        Re: When it isn't Joe-Proof

        Or that at least ONE of the techs might have profitably spent their down time making legible stick-on labels with the serial numbers & stuck them on a part of the drive visible when it was installed.

        1. DS999 Silver badge

          Re: When it isn't Joe-Proof

          I'm assuming the asset labels were added by the government, so they could have put them anywhere they wanted. Maybe there's a separate asset team and they don't know which way the drives will be oriented after installation, but after something like this happened once I'd make sure to contact the asset team and tell them "don't affix asset labels on any IT equipment until you talk to us first to figure out the best place to put them!"

          1. John McCallum

            Re: When it isn't Joe-Proof

            "don't affix asset labels on any IT equipment until you talk to us first to figure out the best place to put them!" OHH tempting, very tempting!

        2. yetanotheraoc Silver badge

          Re: When it isn't Joe-Proof

          "stick-on labels"

          Wouldn't that trigger an audit to verify the labels on the front still match the tags on the back?

          1. DS999 Silver badge

            Re: When it isn't Joe-Proof

            The asset tags I have experience with are nearly impossible to remove - they definitely cannot be removed without causing visible damage to the label making it impossible to re-affix elsewhere without it being very very obvious. Plus who is going to swap asset labels between drives going into the same array? Swapping the asset label for a desk lamp and some super expensive device small enough to smuggle out is what you have to worry about.

            If the serial number on the back was sufficient they wouldn't need asset tags, they'd just keep a list of serial numbers...

      2. spuck

        Re: When it isn't Joe-Proof

        You'd think so, huh? ;)

        In fairness to them, they were supposed to confirm the serial number on the label that was installed at the factory, which isn't on the edge of the drive.

        If it were me being asked to do the job I would have tried to push for putting barcodes on the visible edge of all drives in the system. Beep, beep, beep... drives all scanned and then ask a computer: Does this match the list from last month or not? Computers are good at that sort of thing.

        Sadly, the government doesn't always welcome the idea of being told how to do their job, and especially when their security teams get involved the best way to keep sane is to not think too much.
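
The "ask a computer" step could look something like the sketch below. Everything in it is hypothetical: the slot labels, the serial numbers, and the idea that last month's list records which slot each drive sat in are assumptions, not details from the story.

```python
def audit(expected, scanned):
    """Compare this month's scan against last month's list.

    Both arguments map a slot label to the drive serial found there
    (hypothetical format). Returns the serials that are missing, the
    serials that were not on the list, and the slots whose drive changed.
    """
    expected_serials = set(expected.values())
    scanned_serials = set(scanned.values())
    return {
        "missing": sorted(expected_serials - scanned_serials),
        "unexpected": sorted(scanned_serials - expected_serials),
        "misplaced": sorted(
            slot
            for slot, serial in expected.items()
            if slot in scanned and scanned[slot] != serial
        ),
    }


# Made-up example: every drive is present, but two have swapped bays.
last_month = {"node1/bay01": "SN-A", "node1/bay02": "SN-B", "node1/bay03": "SN-C"}
this_month = {"node1/bay01": "SN-B", "node1/bay02": "SN-A", "node1/bay03": "SN-C"}

report = audit(last_month, this_month)
```

A plain inventory check would pass this example, since nothing is missing. Only the per-slot comparison flags the two swapped bays.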

        1. DS999 Silver badge

          Re: When it isn't Joe-Proof

          I've never seen an asset tag that didn't have a barcode, so I would think a simple app on your phone or similar device could tie with GPS (assuming you have some assisted GPS available for places like datacenters where you may not "see" the satellites) and verify the tagged device is in the assigned location and it will alert you if it isn't where it was supposed to be. Then you can deal with the exceptions and missing tags manually but 98% of the drudge work would be eliminated.

          I haven't dealt with asset tags since smartphones became a thing so maybe that's how they are doing it now.

          1. Anonymous Coward Silver badge

            Re: When it isn't Joe-Proof

            No GPS will be accurate enough to distinguish between drive bays in a cluster.

            Possibly some form of augmented reality would work, but not worth building the environment for such a niche job.

    2. Anonymous Coward Silver badge
      Big Brother

      Re: When it isn't Joe-Proof

      > "check the serial number and asset label on each drive against a printed list"

      Surely the printed list indicated which slot the drive should have been in?

  15. J. Cook Silver badge
    Boffin

    I will cheerfully admit that [RedactedCo] has had a few interesting issues with the various brands of storage we've had in the nearly 15 years that I've had the 'owner' hat for the storage systems.

    We've had:

    a Netapp FAS eat one of its controller heads. (We had it set as an active/active pair, so the userbase didn't notice a damn thing.)

    a couple Dell Powervault MD3000 arrays; One had a double drive failure on it, eating the raid volume and the backup data spool therein. We had another one that some chucklehead only allocated a single drive to the quorum volume that the database servers that connected to it used for cluster management, and no hot spares on the entire array. (To that appliance's credit, that one drive failed after I had virtualized one of the database cluster nodes that used it; That virtual machine ran as a 'cluster of one' for a few months as we decommissioned the last application from it.)

    a Nimble CS1000 getting powered off the hard way when the entire data center shut down from low battery when the site was flooded and the substation in the sub-basement of the building was drowned. (that was a fun couple months, but I was very impressed when it came back up with only minor complaints!)

    a brand spanking new Pure storage box that ate not one, but two controllers before deciding the third one was OK. (with some collateral damage of me having to replace a fiber patch cable when the field engineer slammed the insert/extract lever over it and destroyed the connector - I was Not Pleased.)

    And finally, a Nexsan E18 that one day decided that several of the drives had packed it in (along with one of the controllers burping). The support engineer set up a Zoom meeting and walked me through both using the super sekret page on the device's web interface to un-fail the drives and re-sync the RAID array, with zero data loss, and getting this very panicked admin out of the 'oh shit oh shit oh shit' anxiety attack that was occurring. :)

    a "small business" Qnap appliance decided that it would shut down and never power back up after we rebooted it in order to install firmware updates. While I was able to transfer the disks over to its replacement and didn't lose any data from that adventure, we are looking to move that data... elsewhere.

    The one appliance we didn't really have a problem with was the Data Domain, outside of the usual "I'm going to take an hour to re-hydrate a ~800 GB database backup" process and the "you want how much for the year four support renewal!?!?!" shenanigans.

  16. J.G.Harston Silver badge

    In Rod we trust?

  17. Ozan

    I'm an oldie but I never installed Slackware from floppies. I did boot and root disk, but packages were on the CD.

    1. Toe Knee

      Nothing like a carefully selected set of floppies with JUST the package sets you need. In duplicate, of course, because these are second and third hand disks with a tendency to fail at the worst possible moment.

      Some things just scar you for life!

  18. Necrohamster Silver badge

    Outsourced

    … an outsourced chap did the deed…

    “Did the deed” or “Did the needful”?
