Untrained techie botched a big hardware sale by breaking client's ERP

Nobody starts the working week by planning to fail, but mistakes do happen and The Register likes to write about them in Who, Me? It's the reader-contributed column in which you tell us how you escaped from nasty scrapes of your own making. This week, meet a reader we'll Regomize as "Kane" who told us about a job he said was " …

  1. tip pc Silver badge
    Boffin

    IBM?

    sounds like those IBM/lenovo switches, bundled with IBM hardware as a supposed win win for the client.

    i used to like those HP procurve switches and deployed hundreds back in the day.

    can't beat a trusty cisco though. Not always best performance on paper but IOS consistency was the key to success.

    it's not uncommon to happen upon a cisco switch with uptime of a decade or more, of course no firmware updates in that time but that's a different story for a different day.

    https://www.reddit.com/r/Cisco/comments/bgv7a3/uptime_record/

    1. GlenP Silver badge

      Re: IBM?

      HP procurve switches

      I was a big fan of those until I had two fail to come up after a power off (due to fixed electrical testing on the premises which was too prolonged for the UPS to keep everything running). Of course this was late on a Saturday afternoon and, being a small office, there weren't spares readily available.

      I had everyone up and running on the Monday morning but it did involve delving into my personal stash of come in handy gear.

      1. Anonymous Coward

        Re: IBM?

        uptime is a record of how long it's been since your last successful boot.

      2. Phil O'Sophical Silver badge

        Re: IBM?

        Procurves have a lifetime warranty, and I was surprised how well that was handled. I found a few, many years old, lying in a dusty cupboard & reused them in a rack of lab systems where performance wasn't really an issue. After a while two of them indeed failed to restart after a power outage. Not expecting much I filled in the warranty form - and was pleasantly surprised to receive two new (well, refurbished) switches pretty much by return mail, with RMA labels for the two failed ones.

        1. Antron Argaiv Silver badge
          Thumb Up

          Re: IBM?

          I have one (and a spare) in my home wiring closet. One of them is a warranty replacement. I think I paid $75 for each. Well worth the money (until HP goes under, which, given the way they are behaving...)

          and plenty of performance for my needs.

          I do find the commercial grade switches stand up better to things like nearby lightning strikes, than the consumer grade ones. Those twisted pairs running through my walls are big antennas, and a nearby lightning hit has buggered more than a few consumer grade switches (to say nothing of my garage door openers...they're #1 for failures).

          1. Anonymous Coward

            Re: my garage door opener

            Failed after a real close lightning hit. Lightning hit the 130 year old tree that was about 10-12 feet from the garage. Luckily during testing, I thought to disconnect the wires running to the push button in the garage. The wire was shorted somewhere between the opener and the button. So I didn't have to go buy a new opener. And I think I had left over wires to do the replacement right away.

          2. collinsl Silver badge

            Re: IBM?

            Don't forget that those switches now probably come under Aruba, which is part of HPE, not HP. I'm sure if the consumer-oriented HP goes, HPE (enterprise) will continue on, especially since they seem set on acquiring Juniper.

    2. Trygve Henriksen

      Re: IBM?

      We used a lot of the old 8port unmanaged ProCurves back in the day. Never an issue with them...

      Until the PSU slowly died.

      Thankfully, I had kept 'a few' of the PSUs for the JetDirect EX3 print servers.

      1. JulieM Silver badge

        Re: IBM?

        Small electrolytic capacitor next to the ferrite transformer, by any chance? That's always the one to suspect if something won't turn back on again after being turned off.

        1. Trygve Henriksen

          Re: IBM?

          I would guess at a capacitor, too.

          I tossed out the last Procurve a few years ago, but I still have one or two spare PSUs, so there was no real reason to investigate. Besides, I think they're glued together.

  2. Caver_Dave Silver badge
    Angel

    Anit-Sales - or not?

    I worked for a small computer dealer in the late 1980's.

    My boss was high up in the local Chamber of Commerce and so we had a fairly captive audience (something critical to this story).

    Consequently, most of the local businesses came to us when they wanted their first computers.

    I used to go in and analyse their business needs and see where the use of computer would help. (This did not cover the PA typing the CEO's missives - that was always required!)

    Around 8 out of 10 times, I would demonstrate to the company management that they needed a business process change rather than a computer at this point in time. But I would also state how this would enable them to use a computer in the future, what benefits would be gained and by which metric they should decide that they needed the computer.

    My boss took the long term view and over the next few years we became the almost exclusive supplier to all the local businesses.

    I considered myself to be the exact opposite of most salespersons.

    1. IanRS

      Re: Anit-Sales - or not?

      I was once told, by a highly experienced and very capable salesman, that I should never go into sales. I was too honest about what the client really needed.

      1. GlenP Silver badge

        Re: Anit-Sales - or not?

        I once interviewed for a consultant role at an ERP provider that's no longer in business.

        By the end of the interview I think we'd separately decided the job wasn't for me: my focus was on solving the customer's problems (and I could have brought some talents to the company that they were clearly lacking in the UK), their focus was on "managing the project" or, in other words, selling the customer more consultancy time.

        1. lglethal Silver badge
          Go

          Re: Anit-Sales - or not?

          I can still remember the poster on the wall at a customer's facility that we were visiting for an important Review.

          Consulting - If you're not part of the solution, there's money to be made prolonging the problem.

          Funnily enough, our Engineering team got on fabulously with them, the Sales team on the other hand...

          1. FrogsAndChips Silver badge

            Re: Anit-Sales - or not?

            A classic demotivator from Despair, Inc.

          2. Caver_Dave Silver badge

            Re: Anit-Sales - or not?

            A consultant will never tell you anything that you don't already know (but may not admit to).

            1. Doctor Syntax Silver badge

              Re: Anit-Sales - or not?

              That depends on what's meant by "you". If it's the company it's probably true. If it's just senior manglement it isn't. Manglement could find out themselves but it would involve believing what oiks on a lower pay grade say. By adding a fee the consultant adds value because he's so much more expensive than the workers so must be right.

              1. Anonymous Custard Silver badge
                Headmaster

                Re: Anit-Sales - or not?

                It would mean opening the black box and understanding how the company actually does things.

                In my experience, that's often beyond the capabilities of manglement, either as it's too scary or it's beneath their opinion of themselves.

                They just look at spreadsheets and PowerBi dashboards without wondering where the data and profits actually come from.

      2. -maniax-

        Re: Anit-Sales - or not?

        I was in a not dissimilar situation a couple of decades back where a number of customers would only talk to me as they knew they'd get sensible answers from me rather than the sales answers they'd get from any of the sales team

        Needless to say the sales people weren't entirely happy but couldn't really do much as they couldn't block the customers from contacting me when I was the point of contact for any support issues

    2. Brewster's Angle Grinder Silver badge
      Pint

      Re: Anit-Sales - or not?

      I did that last month. Someone was offering to shower me with money to write a feature. I was prepared to do it, but I said, "It would be a waste of your money. There are dedicated tools [named a few] and this is the export feature you want to use them." They gave me a thank you "consultancy" payment.

    3. John Robson Silver badge

      Re: Anit-Sales - or not?

      There are reports around about being a successful car salesman by not looking to sell this car to someone, but wanting them to come to you for their *next* vehicle.

      Repeat trade is far easier to come by if you're just honest...

  3. Sam not the Viking Silver badge
    Pint

    A major project for a utility was to be discussed with the End-User, Consultant, machinery-supplier (us), Electricity board, Control-panel supplier, Telemetry and the cabling-contractor. For some reason, the control panel supplier was leading the meeting and they had determined all the major items of equipment to be supplied. The intention was to use state-of-the-art starters to drive the machinery and that supplier had been identified and was joint-leader of the meeting……

    After detailing their proposal, I innocently asked for some technical particulars regarding the starting procedure which I thought was unnecessary. A sort of waffle-response from the starter-people raised eyebrows. Seeking further clarification, the electricity supplier scratched his head and said that they would have to lay on a new HV cable through the town at a preliminary cost in the millions of pounds.

    Flabbergasted, the End-User looked at his Consultant, who looked at the control-panel supplier who looked at the starter-supplier whose face revealed panic.

    Consequently, the starting method was revised to the relief of the end-user. The control-panel had his order-value reduced to tatters. The starter-people went away with nothing.

    A good outcome for us.

    1. Doctor Syntax Silver badge
      Pint

      And for the customer. You deserve another.

  4. Bebu sa Ware
    Windows

    "Unbeknown to Kane, the GUI was buggy."

    Young Kane must have led a sheltered existence if he was sufficiently naive to believe any GUI was free of serious bugs.

    I run at the sight of a web interface and faster still if there is a whiff of java. Most platforms have a reasonable to decent command line interface (if you can get to it.)

    A broadcast storm is pretty decent achievement. Curious how he managed that. Turned off spanning tree?

    1. Korev Silver badge
      Coat

      Re: "Unbeknown to Kane, the GUI was buggy."

      > Young Kane must have led a sheltered existence if he was sufficiently naive to believe any GUI was free of serious bugs.

      Maybe he was Kaned at the time...

      1. PB90210 Silver badge

        Re: "Unbeknown to Kane, the GUI was buggy."

        Certainly not Abel...

    2. John Sager

      Re: "Unbeknown to Kane, the GUI was buggy."

      The switches I use at home (Netgear) seem to have a reliable web gui. I've certainly never had a bad config. There is a pseudo-IoS cli configurator but that's really only used for saving configs as text.

  5. ColinPa Silver badge

    Did you really pull the big red switch?

    Someone in our department was testing some software, and being a thorough tester turned off the power to the machines halfway through a test.

    When the systems finally came back the data was in a mess. Some transactions had been run twice. Some transactions had not run at all, and some transactions were stuck in the middle.

    The call to the development team went something like...

    Her:"I turned the power off at the wall and when it restarted..."

    Them: "You mean you shut the system down?"

    Her:"No - I turned the power off at the wall"

    Them:"But you never do that as it makes mess of the system - surely you just shut the system down in quick mode?"

    Her:"No I turned the power off at the wall - I was simulating a power cut"

    Them:" Wow - we've never seen anyone do that before..."

    She worked for a very effective test department who were asked to write an article on "testing". It started something like

    "If we were to make cars, the first thing we would do is drive it at top speed in first gear down a rough road with plenty of pot holes..."

    1. Anonymous Coward

      Re: Did you really pull the big red switch?

      One of our users in a remote location decided to proactively solve a (completely unrelated) problem by rebooting the router. Except that they couldn't be bothered to reach into the back of the cupboard where it was piled, so they just unplugged the extension. The extension which all the computers were also plugged into.

      And that's how we found out that modern NVMe drives have a write cache, which can be lost if they lose power in the fraction of a second between a file being written and being flushed to the SSD. (We eventually decided not to do anything about this, as the window for data being lost was so small).
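For anyone who does want to narrow that window, the usual software-side mitigation is to ask the OS to flush before trusting a write. What follows is a minimal sketch, not what the poster's systems did: `durable_write` is a hypothetical helper name, and it assumes the drive honours flush requests, which a volatile cache without power-loss protection may not fully guarantee.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` so a crash leaves old or new content, not a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to the device
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # On POSIX systems, also flush the directory entry so the rename itself survives.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Writing to a temporary file and renaming means a power cut mid-write leaves the previous file intact; the `fsync` calls only shrink the loss window as far as the hardware allows.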

      1. mirachu Bronze badge

        Re: Did you really pull the big red switch?

        You didn't know that? HOW?! The good ones have on-drive RAM cache, cheaper ones use host memory. Then there tends to be "SLC" cache which, while already written to the drive, is limited and gets rewritten when things quiet down to utilise all the levels of multi-level cells.

        1. that one in the corner Silver badge

          Re: Did you really pull the big red switch?

          > The good ones have on-drive RAM cache, cheaper ones use host memory.

          Hmm.

          *GOOD* ones would provide, say, an internal supercap to allow the cached writes to complete (you know, like industrial CompactFlash cards).

          Cheaper ones that just want to game the average user's speeds will have their own cache that will fill and cause your write speed to plummet if you do anything "out of the ordinary".

          The most cost effective & BEST will honestly report what they can do and allow you a way to mitigate that by sharing system RAM and allowing processes to be written to take all the factors into account (good old fashioned Producer/Consumer interactions). Giving you the opportunity to make your system consistent and predictable.

          > You didn't know that? HOW?!

          > utilise all the levels of multi-level cells.

          Not everybody who uses computers (enough to have them on a LAN with a router - low bar, these days) actually gives a damn about those details, they just want their boxes to Do All The Things. And they've been told that SSDs are so much faster/better than spinning rust - after all, anyone can easily understand the principle of "wait until it spins and puts the right bit of disc under the head" - but the flash doesn't suffer from (at which point, the salesman kicks the engineer who was about to spill the beans).

          1. G.Y.

            super? Re: Did you really pull the big red switch?

            Power comes in at 50 or 60Hz anyway; why doesn't the normal capacitor in the power-supply suffice?

            1. collinsl Silver badge

              Re: super? Did you really pull the big red switch?

              Probably because you've got every component in the computer pulling power from the capacitor as the incoming supply dies, so it's not fast enough for the drive to complete writing.

              After all, with modern PCs, especially ones with nonstandard power supplies (Dell for example does a whole-12V supply IIRC in most PCs) or TinyMiniMicro nodes which use an external ~20V laptop PSU, the motherboard is responsible for supplying the voltage required for the NVMe chips, which means they're at the far end of a long and complex power path which is shared by many other components. Thus, for (mostly) reliable write cached devices, they have onboard power sources which can guarantee sufficient power for buffers to be flushed and disk heads to be parked etc. These days that's usually in the form of some very large capacitors for a server with (possibly) spinning disks in the front of it, capacitors onboard SSD or NVMe drives, or a full UPS tray in the case of some Storage Arrays which are too large for capacitors.

    2. Caver_Dave Silver badge

      Re: Did you really pull the big red switch?

      Exactly the sort of test I am discussing at the moment with our testers. The key difficulty is repeatability - how do I prove that it was mid transaction when the power is removed? - including how fast do the power lines decay?

      I think in the end we are just going to have to programmatically put 'many' simultaneous, large transactions into flight at the same time as programmatically (possibly pre-emptively) turning off the power supply.

      (I did test a system a long time ago where the capacitors in the PSU were large enough that the power rails to the rotating rust allowed it to run for an extra 2.95 seconds after the 400V inputs were removed. The processor (on different power circuits) had stopped communicating long enough beforehand that the disc heads had parked safely before the disc lost power.)

      1. that one in the corner Silver badge

        Re: Did you really pull the big red switch?

        > how do I prove that it was mid transaction when the power is removed?

        Put a relay into the power lines, tied to a GPIO line [1] and flick it off mid-transaction. Depending upon what is doing the transactions, put this call into your C code, or use your external function interface from your database or - you figure it out.

        You can even measure the latency from the call to power off and move the trigger to just before the actual transaction.

        You can even leave this code in, if you don't want a separate test build - just don't wire up the relay! Put a manual toggle switch into the test box to disable the relay when you are doing another test.

        If you are really into it, put a relay onto the mains side and another onto the PSU outputs, then you can see if the PSU caps can keep going *just* long enough.

        Then go wild with setting it up to only trap certain calls, triggering the power on and off in rapid succession (how long until your PSU gives up the ghost?) etc. Think Evil.

        Note: solid state relays for a quick response time.

        [1] from an FTDI chip on a comm port - preferably not USB, to keep the latency down - or on a hardware serial port - check the motherboard connectors, they still exist - control lines or even the motherboard port for controlling RGB LEDs!
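The "call in your code" idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's rig: `PowerCutTrigger`, `transfer`, `fake_relay` and `PowerLost` are all hypothetical names, and the relay is faked with an exception so the sketch runs without hardware. On a real test box the callback would toggle whatever control line drives the relay.

```python
import time
from typing import Callable, Dict, Optional


class PowerLost(Exception):
    """Stands in for the machine dying; a real relay would simply cut the power."""


def fake_relay() -> None:
    # Hypothetical stand-in for toggling the GPIO/serial control line.
    raise PowerLost()


class PowerCutTrigger:
    """Fires a power-cut callback when execution reaches the armed checkpoint."""

    def __init__(self, cut_power: Callable[[], None]) -> None:
        self._cut_power = cut_power
        self._armed: Optional[str] = None
        self.fired_at: Optional[float] = None

    def arm(self, checkpoint: str) -> None:
        self._armed = checkpoint
        self.fired_at = None

    def checkpoint(self, name: str) -> None:
        if name != self._armed:
            return  # not armed here: behaves like the relay being unwired
        self._armed = None
        self.fired_at = time.perf_counter()  # timestamp of the cut request
        self._cut_power()


def transfer(ledger: Dict[str, int], src: str, dst: str, amount: int,
             trigger: PowerCutTrigger) -> None:
    """Toy two-step transaction with checkpoints where power can be cut."""
    trigger.checkpoint("before_debit")
    ledger[src] -= amount
    trigger.checkpoint("mid_transaction")
    ledger[dst] += amount
    trigger.checkpoint("after_credit")
```

Arming `"mid_transaction"` and calling `transfer` leaves the ledger debited but not credited, which is the repeatable half-done state the test is after. The real latency between the call and the rails dropping would still have to be measured externally, for example with a scope.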

    3. DS999 Silver badge

      If a system can't survive loss of power, it doesn't pass the most basic test

      If the developers knew a power cut would bork the system, why the heck didn't they tell anyone "hey we need help to determine why this is happening so it can be fixed". Putting their head in the sand and saying "you don't cut power" is pretty useless, given that power cuts can't be 100% prevented no matter how much redundancy you have in your power supplies, UPS, generators etc. Even if you got all that perfect by magic there's always the "oops" moment where the big red button is hit either by accident or because there is a fire or flood or something.

      1. Anonymous Coward

        Re: If a system can't survive loss of power, it doesn't pass the most basic test

        (Anon for reasons which will become apparent)

        A colleague of mine was working in a Big Government Datacentre here in Blighty - it cost £1m per hour if it were outaged, and took 16 hours to get back working again whenever it was shut down.

        The rack he was working in was close to the wall and on the wall was an Emergency Power Off (EPO) button. Recent legislation in the UK which the Government clearly took Very Seriously™ meant the EPO had to be disability compliant - to this end the lift-up plastic cover on it which had previously been present had been replaced with a cage which prevented stuff bashing into the button from the sides but which allowed free access to the button from the front.

        My colleague bent over to look at a piece of equipment, stepped backwards a bit, and suddenly heard silence...

    4. WanderingHaggis

      Re: Did you really pull the big red switch?

      Isn't this why we have UPSs (not sure about the plural) for anything that is that critical.

  6. Wang Cores

    > "thought the GUI wasn't buggy"

    > belief in not-buggy GUI.

    It's like the new kid in a war movie discovering his rifle was not the all-purpose killing machine the drill sergeant made him believe it was but a piece of kit made by the lowest bidder.

  7. chivo243 Silver badge

    either

    Netgear or Allied Telesis. I remember both having situations where using the CLI was the only way to be sure your commands worked.

  8. Doctor Syntax Silver badge

    Salesman. Always ready to drop you in it.

    1. Ken Shabby Silver badge
      Facepalm

      Something I have uttered a few times, “You told them what?”

  9. Anonymous Coward

    broadcast storm

    ""I ended up causing a Layer 2 broadcast storm that brought down most of the network,""

    I had the opportunity to watch this once, in a big DC. Very nasty stuff: all switches CPU maxed out, unable to log in, most packets dropped etc ...

    I think they had to switch them all off to get back on track ...

  10. An_Old_Dog Silver badge

    Sending the Janitor to Fix the Plumbing

    ... simply because both the janitor and the plumber have spent time in the plumbing access/supply closet is on the company.

    Likewise, sending a techie untrained on unit X.

  11. DS999 Silver badge

    Despite GUIs sucking

    I've seen a lot of companies that encourage or mandate their use. I consulted for a major pharma company for a couple years once, and their policy was that all changes to storage switches or arrays were made with the GUI. While they weren't (to my knowledge) buggy they were incredibly slow, and not because of internet delays. Since part of what I was doing on that project was not possible in the GUI they gave me an exception to use the CLI, so when I sometimes picked up tickets for simple tasks like adding storage for new servers I could handle 10 tickets in a half hour by enabling ports, creating the new zones, and creating and assigning the storage in the CLIs. The equivalent task for everyone else would pretty much require their entire 8 hour workday.

    I tried to tell the manager of that group how much time her people were wasting doing it the hard way, but she was convinced using the GUI led to fewer mistakes. I don't know, maybe that's true or more likely doing changes in the CLI had bit her in the past. I suppose if nothing else the waits between every click in the slow ass GUI gave you time to double and triple check every step. But damn considering the size of that team (at least a dozen people in India, three other stateside consultants aside from me, and two full time employees, all solely dedicated to storage - and just storage, backup was a separate team she managed) it was sure costing them a lot of money to do things the hard way! No wonder our drugs cost so much lol

  12. Jou (Mxyzptlk) Silver badge

    Not Kanes mistake

    Who was the dumb sending the fresh-trained to solve a switch problem in a production environment where ANYTHING could have caused that, like a LAN cable someone accidentally stepped on having a half short circuit or filter out high frequency in its squished state. Or having only 90% of the expected AC voltage, and the aging PSU starts acting up. No, I did not make those two examples up, they are real things I came across.

    I hope Kane found a new and better employer soon.

    1. TSM

      Re: Not Kanes mistake

      As per the article - it was the sales people's idea to offer "we fix your current problems" in exchange for "you consider buying our servers". As always with the sales department, offered without any understanding of whether it was technically feasible.

      The company quite likely didn't have anyone trained on the other vendor's kit. But hey, might as well have a stab at it, right? The worst that can happen is you don't get the contract, which is where you are if you don't send someone, so....

  13. Xalran Silver badge

    Four Switches in a Circle... What can go wrong ?

    That reminds me of an event that occurred in my previous life in Telecoms...

    My colleagues and I, at that time, wanted routers, and only routers, as gateways to the systems we were integrating in the $TELCO networks... It led to several fights with the pre-sales guy in charge of this product line, as he wanted to put in L3 switches because they were cheaper than routers (with a switch card in some cases)... we are talking about something like 300€ less overall on a whole thing that's being billed in hundreds thousands Kilo Euros or more....

    Since we also were the guys doing the On Call Support, the game was skewed our way : if we said we weren't going to support it, it meant getting a new team, and that meant getting that new team up to speed on all the legacy stuff we were covering. It was a big No Way, so after much fight we always got our routers.

    Then came the day where the last Legacy Stuff went the way of the dodo, and a brand new shiny thing was coming to replace even the not that old stuff... the Grumpy team got disbanded ( well technically we saw the writing on the wall and self disbanded, each of us finding a nice little corner to tuck in ) and a new team was put on the new stuff... We tried to tell them about the Switch against router thing... but they didn't really have a clue, and the new shiny system offered the option to use L3 switches instead of routers.

    And then one day, after they installed 2 new switches at a $TELCO, the backbone of said $TELCO went down, for 4 hours... That's the time it took to locate the two new, unconfigured switches that were creating a loop and breaking everything. After that at $TELCO an order was given to put all the unconnected ports down.

    The worst is that they did it again... at another $TELCO (this time it only took an hour to fix it)

    now the technicality of why we wanted routers and not switches :

    - we didn't have that many switched ports to connect and an 8 port switch card in a router was more than enough most of the time.

    - by default router ports come unconfigured and are administratively shut down, while by default switch ports are administratively up and put in a default VLAN.

    - It was much easier to get routing going in a router, and all the external flows were routed

    - we didn't have to bother with Spanning Trees (because obviously it's a story about spanning trees gone AWOL)
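For readers who have not watched a loop take a network down, here is a toy model of why "four switches in a circle" never settles without spanning tree. It is a rough sketch only: `flood` and `spanning_tree` are hypothetical helpers, the tree is built with a plain breadth-first search rather than real 802.1D root election and path costs, and `max_hops` exists purely to stop the simulation, since Ethernet frames carry no hop limit.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Tuple

Link = Tuple[str, str]


def adjacency(links: Iterable[Link]) -> Dict[str, List[str]]:
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def flood(links: Iterable[Link], origin: str, max_hops: int) -> int:
    """Count transmissions of one broadcast, flooded out of every port except the ingress."""
    adj = adjacency(links)
    queue: Deque[Tuple[str, Optional[str], int]] = deque([(origin, None, 0)])
    sent = 0
    while queue:
        switch, ingress, hops = queue.popleft()
        if hops >= max_hops:  # simulation cap only; real L2 frames have no TTL
            continue
        for peer in adj.get(switch, []):
            if peer == ingress:
                continue
            sent += 1
            queue.append((peer, switch, hops + 1))
    return sent


def spanning_tree(links: Iterable[Link], root: str) -> List[Link]:
    """Keep only the links on a breadth-first tree from `root`; the rest count as blocked."""
    adj = adjacency(links)
    seen = {root}
    tree: List[Link] = []
    queue: Deque[str] = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in sorted(adj.get(switch, [])):
            if peer not in seen:
                seen.add(peer)
                tree.append((switch, peer))
                queue.append(peer)
    return tree


RING: List[Link] = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
```

On the ring, `flood(RING, "A", n)` keeps growing with `n` because the two copies of the frame chase each other around forever, while `flood(spanning_tree(RING, "A"), "A", n)` stops after three transmissions once one link is blocked.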

    The most interesting bit of the story is that following generations went full custom hardware, with routers (in card form) integrated in the rack as the front end of the system towards the external world.

  14. Griffo

    Vlan Issues? Then Dell right?

    Dell switches I take it. Vlan issues and buggy webUI sound like Force10.

  15. Johan Bastiaansen

    Preparation, preparation, preparation and . . .

    a bit more preparation.

    Don't just go in there and wing it.

    Was it a big hardware sale? Why did they send the untrained techie then? Why couldn't he be bothered to fire up the GUI beforehand and give it a little test drive?

    These companies are going to code self driving cars now and they assure us they have their most competent people on it . . .

  16. Kane
    Black Helicopters

    Wasn't me

    Wasn't me
