Are you being robbed of sleep by badly designed servers?

How should we design the servers and end-user computers of the future? The construction of my testlab has given me the opportunity to play with technologies I normally wouldn't be able to get my hands on. The "advanced" features in them – standard fare by now for large enterprises – have caused a measure of introspection …

COMMENTS

This topic is closed for new posts.
  1. Fuzz

    another advert for supermicro?

    Why is it that these articles from Trevor all seem to read like adverts for Supermicro? This one starts off masquerading as an article about remote server management. Whilst this kind of stuff might be new to Trevor, I can't be alone in thinking this is something I had on the Dell servers I bought in 2003, and every other server I've bought since has had either a DRAC card or an iLO, and I don't remember paying extra for any of them. In fact I'm pretty sure that the two Dell 4400s I had, which were dual Pentium II Xeons and looked like the Jawa Sandcrawler from Return of the Jedi, even had DRAC cards.

    1. Eugene Crosser

      Re: another advert for supermicro?

      Indeed, Supermicro was late to the LOM party. And, at the time when I was in this business, much less reliable than the competition.

    2. Anonymous Coward

      Re: another advert for supermicro?

      It's quite sweet in a way, watching young Trevor get excited about discoveries of things that experienced Reg sysadmin readers have already been using for many years.

      1. Trevor_Pott Gold badge

        @dz-015

        The difference is price. The cost of this enterprise-standard tech has come down enough for there not to be an excuse for its inclusion in even the most basic of SMB gear. The tech is mature. The pricing is a transformative element enabling far wider adoption than was possible even two years before.

    3. Trevor_Pott Gold badge

      Re: another advert for supermicro?

      Enterprise vendors have had this for ages, but lots of folks who make "whitebox" kit (ASUS, Gigabyte, Tyan) don't. Or if they do, it is often quite a pricy extra. We're finally at the point that SMBs and bulk-buy folks using whitebox servers can buy IPMI-equipped stuff without pushing virgins into volcanos. It's time we stopped buying the crap that doesn't have lights out management. Send a message to companies like ASUS that if you market a "server board", it isn't okay for it to lack IPMI.

    4. Keith_C

      Re: another advert for supermicro?

      Just checking: are you referring to the full remote console iDRAC/iLO, where you can see even a GUI console (Windows), mount ISOs/floppy images, have mouse control, etc.? I was under the impression that was an iDRAC Enterprise set of features, which have a decent cost.

  2. Eugene Crosser

    But security?!

    I admit my experience with lights-out management is dated; I have been in development for the last seven years. Still, IPMI is rather problematic from the security standpoint. No audit (and usually no source available), no updates if/when problems are discovered, limited choice of connection/authentication methods. This boils down to the need for a well-maintained gatekeeper machine on premises, isolating your management network from the Internet. Which means extra cost in equipment and support, and partly defeats the purpose because you won't be able to remotely handle that machine if it fails. We could afford such an arrangement for a datacenter of a few hundred servers, but if you only have a few machines at a remote location, it's uneconomical.

    1. Trevor_Pott Gold badge

      Re: But security?!

      Remote gatekeeper = router /w VPN. Repair for router = PDU with network port on main network. Worst case: someone can reboot the management network router at will. Problem solved.

  3. keithpeter Silver badge
    Boffin

    diurnal cycle

    "I disagree that the day starts at some pre-determined hour simply because it coincides with the rotation of our local patch of mud to face some fusing ball of hydrogen plasma about eight light-minutes out. "

    Mr Pott may disagree, but his brain/nervous system/sensory system is wired up to synch with the ball of mud's rotations. Not hippy stuff, just how meat-based processors work. Local partners?

    1. Trevor_Pott Gold badge

      Re: diurnal cycle

      Actually, I have sleep phase disorder. Left to my own means, I naturally fall into sync of sleeping at 4am and waking at noon. It's certainly timed to the passage of the evil daystar, but significantly offset from the middle of the bell curve.

      1. Anonymous IV

        Re: diurnal cycle

        I think the medical name for what you're suffering from is "teenage"...

      2. Chris T Almighty

        Re: diurnal cycle

        Sleep phase disorder! I have the same pattern, I didn't realise it had a name. So I've learned about remote server management and circadian rhythm disorders in a single article. :o)

    2. Tom Chiverton 1

      Re: diurnal cycle

      It's not. If you lock people in the dark, for instance, they drift to a different rotation cycle...

      1. Ken Hagan Gold badge

        Re: "It's not."

        Surely the fact that we drift away when the sync is absent merely confirms that we are synchronising with it?

        1. SImon Hobson Bronze badge

          Circadian rhythm

          Look up circadian rhythm.

          If left without external clues/influences, I believe humans will generally fall out of sync by running at a period of a little over 24 hours. Normally, clues like that big fusion reactor coming into view each morning reset the clock each day so we stay synchronised to the rotations of our lump of rock.

          Obviously, not everyone is wired up the same, so it does vary a bit.

  4. Scott Bartlett

    iLOs and DRACs have been a standard part of my life for longer than I now care to remember. They're as much part of a "standard build" for servers now as a power supply or NIC is. They've saved my bacon enough times, and now that Dell are up there with HP for general reliability and stability on those things I'm far happier. I'm not really a desktop person, so can't really comment too much there.

    My personal annoyance has always been with HP: if you had a Windows server, and if you wanted to be able to KVM properly to it and see the GUI, then you *had* to buy the optional iLo "Advanced License" - that just totally smelt of price gouging in my view. It still rankles.

    1. Anonymous Coward

      My personal annoyance has always been with HP: if you had a Windows server, and if you wanted to be able to KVM properly to it and see the GUI, then you *had* to buy the optional iLo "Advanced License" - that just totally smelt of price gouging in my view. It still rankles.

      Yup - ditto here. I *hated* that, it got really annoying when I forgot to set up a Linux box so it remained in *standard* text mode, as I was by that time several miles away from where it was installed. Thankfully I could bounce the box and grab the boot up process to clean that up. Not impressed..

      1. Andydude

        Yeah that was annoying, but the "nice" thing about it is if you were really stuck you could get a trial and bung a key in and it could save your ass in a few minutes. And if you were naughty and the trial had run out there were keygens (not recommending/advocating their use!).

        However, if you had a Dell equivalent without an iLO you were screwed. You'd have to pay for a hardware upgrade, get downtime on the server, unrack it etc. So in that instance the HP license was better. Be even nicer if it was free and standard though...

    2. Alan Brown Silver badge

      "My personal annoyance has always been with HP: "

      There's a pretty effective solution to this problem: Don't buy HP.

      IPMI is bloody handy but there are still a surprising number of crappy implementations around (including what Supermicro used to flog)

      It's a little worrying to me that Supermicro is rapidly becoming our "go to" manufacturer for most servers, not least among reasons being that they refuse to certify hardware ex-factory for Linux distros (Their blade shelves have a few points of major suckage too) Intel/IBM/HP/Dell seem to be sleepwalking into irrelevance.

  5. P. Lee
    Trollface

    So much complexity, so little time

    stop-a

    ...

    go

    9600b is fine in case for some reason you don't have 3g either.

    Youngsters... always trying to re-invent the wheel by making it square!

  6. Peter Gathercole Silver badge

    It's good to see

    that what has been standard practice in the Mainframe/Midrange platform market is finally becoming a reality in other architectures.

    I'm an IBM pSeries and Power person, and we have been able to do remote IPL, console, and configuration/management for years. OK, you paid a premium for some of the features, but many of the base capabilities have been built in for well over a decade, and now include IVM/PowerVM. The RS/6000 F40 had a service processor when announced in 1996.

    It does help that most of the system admin can be done from a command prompt, though.

    One of the customers I worked at had a majority of their critical servers in lights-out, mainly unattended sites scattered around the country from before the Millennium.

    1. Adam R

      Re: It's good to see

      It's actually been standard in pretty much any kit from an enterprise vendor for a long time, whether x86 based or not (HP, IBM, Dell etc). Even on their low end gear, typically SMB stuff.

      What is new is that the white box manufacturers have finally caught up, so those who are used to building servers themselves can now get the features at a price point that is right for them.

  7. OhDearHimAgain

    Doesn't everybody run VMs these days? We ditched the "100s of little boxes" model years ago and run a mix of big Dells, with DRACs, and Supermicro with whatever they call it. The VMs all have VNC console access.

    My biggest concern is the lack of security on the remote management cards - typically they don't even have IP filtering.
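For what it's worth, the IP filtering that the management cards lack is simple enough to apply on whatever sits in front of them. A minimal Python sketch of the check itself, assuming the filtering is done by a gatekeeper rather than by the card; the function name and the example networks are made up for illustration:

```python
import ipaddress

def source_allowed(source: str, allowlist) -> bool:
    """Return True if the source address falls inside any allowed network.

    `allowlist` is an iterable of CIDR strings (hypothetical values below).
    """
    addr = ipaddress.ip_address(source)
    # A mismatched IP version (v4 address, v6 network) simply does not match.
    return any(addr in ipaddress.ip_network(net) for net in allowlist)

# Hypothetical admin subnet that is allowed to reach the management cards.
ADMIN_NETS = ["10.20.0.0/24"]

if __name__ == "__main__":
    for src in ("10.20.0.7", "203.0.113.9"):
        print(src, "allowed" if source_allowed(src, ADMIN_NETS) else "blocked")
```

In practice this logic would live in the firewall or VPN endpoint in front of the management network, not in a script.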

  8. Anonymous Coward

    Sounds to me like you're buying the wrong brand. HP have had basic lights-out via web app as standard for nearly a decade at least, even on the lowly DL140 G3 (G5) servers we have, which set you back under £100 on eBay at auction and can be fitted with decent memory and RAID. A DL160/DL360 G5 usually provides the disk performance by default for a little more, plus six 2.5" bays.

    1. Anonymous Coward

      DL120 is on the UK website from £499, but it's an additional £399 to add a full lights-out management licence. * "currently may not be available direct from HP"

      1. Anonymous Coward
        Thumb Down

        Well then they are ripping you off. I put together a number of these servers with battery-backed RAID, and it can all be done for under £250 if you know how to bargain hunt (I did say auction). Of course, if you go to a fixed-price reseller who might, say, send soliciting emails to non-tech-savvy businesses, you might pay over £1000 for a G5 server kitted out with stuff.

        The most powerful, a DL385 G5 with 2 VT-x capable quad core CPUs, 16GB of RAM, a VT1000 quad port network card, a P212 with 512MB BBWC and a P400 with the same, 4 10K 300GB SAS disks, and an additional single boot disk for the hypervisor, cost about £450 a year ago.

        The cheapest DL140 chassis without disks I saw sell had 2 Xeon 5160s and was £50. I've picked up a working P400 from the US for £12. You can even cut costs on the batteries by replacing the cell and keeping the management electronics if you really want to.

        The only thing you need to be careful with is the disks. I've had 1 fail in 18 months from about 15, but I also learnt fast not to buy disks from people with ratings of under 50, and to check every disk (3 eBay cases issuing a refund), as some sellers sell working disks which fail on a full scan (this happened twice).

        I've also had one dodgy server which went very cheaply, which I didn't check until it was too late, and all the signs were there that it was bad before I completed the transaction, along with a £12 replacement fan board expense.

        Of course, the electricity is another matter.

        If you care about the cost of an enterprise iLO license and are using multiple machines, then you can save a bucket load more money than that by buying pre-built usually lightly used entry level servers, replacing the disk infrastructure with something more capable, to build a small datacentre. Now though I expect something with full virtualization support (the i7 Xeons or AMD's not quite equivalents) would be a better target, at probably less than a white box.

  9. Joe Harrison

    Remotely flashing BIOS?

    Do people really do that, on boxes that matter? I wouldn't fancy it!

  10. Anonymous Coward
    Thumb Up

    Have an upvote.

    Is this a REAL sys admin extolling the virtues of being able to remotely flash a BIOS?

    Sure, 90-odd% of the time you'll be fine. You don't get a good reputation, though, when your response to one of the f-ups is "oh, I'm five hours away from you, see you lunchtime".

    1. Trevor_Pott Gold badge

      It isn't the end of the world if you bork a node in a cluster. But in the past three years of updates remotely, I've had 100% success on over 250 flashes. Good enough for me to consider it solid for most use cases.
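As a rough sanity check on what a clean run like that does and does not prove, here is a small Python sketch of the usual zero-failure bound (the "rule of three"). It assumes every flash is independent and has the same chance of failing, which real hardware may not honour; the function name is made up for illustration.

```python
def failure_rate_upper_bound(successes: int, confidence: float = 0.95) -> float:
    """Largest per-flash failure probability that is still consistent,
    at the given confidence, with seeing `successes` flashes and no failures.

    Solves (1 - p) ** successes == 1 - confidence for p.
    """
    if successes <= 0:
        raise ValueError("need at least one observed success")
    return 1.0 - (1.0 - confidence) ** (1.0 / successes)

if __name__ == "__main__":
    bound = failure_rate_upper_bound(250)
    # The rule-of-three shortcut is 3 / n, which gives a similar figure.
    print(f"exact bound: {bound:.4f}, rule of three: {3 / 250:.4f}")
```

So 250 clean flashes are consistent with a true failure rate of anything up to roughly 1.2%, which is good but not the same as "cannot fail".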

    2. Skoorb
      Alert

      These days what scares me isn't flashing a BIOS (it's a lot more reliable than in the 90s and every system I've come across also runs at least one verification check to ensure it worked), it's flashing other firmware; especially storage. Egads you are taking your life in your hands when you try that even sitting next to the machine.

  11. Paul Crawford Silver badge

    System watchdog?

    Quite a lot of ordinary motherboards have hardware watchdogs built in, for example the w83627 and similar chips that provide hardware monitoring (voltages, temperature, fan speeds, etc). This can provide a last-resort method of rebooting a sick server if you don't have lights-out support, but only SSH access.

    With Linux you can add the corresponding watchdog driver module (they are black-listed by default in Ubuntu) and then the watchdog daemon and configure it to check a few vital signs. Typically you would check the load averages are not stupidly high (say over 5 per CPU core), maybe that rsyslogd is running, that you can run a simple bash script, etc.

    If any of those tests fail then you get a moderately orderly reboot, and the hardware watchdog makes sure you get a reboot even if there is a kernel panic style of fault. Brutal perhaps, but it gets the system back up and hopefully either all OK again or at least you can SSH in to fix it.
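The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the real watchdog daemon: it assumes a Linux box with a watchdog driver loaded and exposed as /dev/watchdog, it uses the "5 per core" load limit and the rsyslogd check from the description, and it simply stops feeding the timer on failure instead of attempting an orderly reboot. All function names are made up.

```python
import os
import time

def load_ok(load1: float, cores: int, max_per_core: float = 5.0) -> bool:
    """True if the 1-minute load average is within the per-core limit."""
    return load1 <= max_per_core * max(cores, 1)

def process_running(name: str, proc_root: str = "/proc") -> bool:
    """True if some process listed under proc_root has this command name."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for pid in entries:
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while we were looking
    return False

def healthy(load1: float, cores: int, required=("rsyslogd",),
            is_running=process_running) -> bool:
    """Combine the vital-sign checks into one verdict."""
    return load_ok(load1, cores) and all(is_running(n) for n in required)

def run(device: str = "/dev/watchdog", interval: float = 10.0) -> None:
    """Feed the hardware watchdog for as long as the system looks healthy.

    `interval` is assumed to be shorter than the watchdog's timeout.
    """
    with open(device, "wb", buffering=0) as wd:
        while healthy(os.getloadavg()[0], os.cpu_count() or 1):
            wd.write(b"\0")  # any write resets the hardware timer
            time.sleep(interval)
        # We deliberately do not write the "magic close" character 'V',
        # so the timer keeps counting and the hardware resets the machine.
```

Calling `run()` as root starts the loop. If the script itself hangs or the kernel panics, the timer also expires, which is the last-resort behaviour described above.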

    1. Skoorb
      Thumb Up

      Re: System watchdog?

      This is a rather interesting idea...

  12. ElNumbre
    Thumb Up

    Network Connected PDU's

    A few years ago, we had a Linux box that would occasionally lock tighter than a nun's c....hurch donation safe.

    It was in a distant far flung place called London, and rather than someone driving up there to give it a quick kick in the PSU, we bought a PDU with a web-server built in. If the box fell over, one of the support techs could dial in remotely and flick the power off and on. 5 mins later, you could be back in bed, rest assured that your "hour" of billable time would get signed off as a job well done.
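The off-and-on flick is easy to script as well. A hedged Python sketch: every PDU has its own interface (web form, SNMP, vendor CLI), so the actual switching is left to a caller-supplied `set_outlet` function, which is hypothetical here, as are the outlet number and the delay.

```python
import time
from typing import Callable

def power_cycle(set_outlet: Callable[[int, bool], None],
                outlet: int,
                off_seconds: float = 10.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Switch one PDU outlet off, wait, then switch it back on.

    `set_outlet(outlet, on)` is whatever talks to your particular PDU.
    """
    set_outlet(outlet, False)
    try:
        sleep(off_seconds)  # give the PSU time to drain fully
    finally:
        # Restore power even if the wait is interrupted.
        set_outlet(outlet, True)

if __name__ == "__main__":
    # Stand-in for a real PDU: just log what would be done.
    def fake_pdu(outlet: int, on: bool) -> None:
        print(f"outlet {outlet} -> {'ON' if on else 'OFF'}")

    power_cycle(fake_pdu, outlet=3, off_seconds=1.0)
```

The try/finally is the one design choice worth keeping: a script that dies halfway should not leave the box powered off.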

    Then they bought proper appliances and servers and the problem went away.

    Oh the good old days.

  13. chris 143

    Managed PDUs and a serial console

    Managed PDUs and a serial console server: if you're managing more than a couple of servers, the actual cost per server isn't too bad. Admittedly not that helpful if you're using Windows, but if you're using Linux it can be a life saver if you break your network connectivity.

    Also, even quite old servers generally support serial BIOS....

    1. Nate Amsden

      Re: Managed PDUs and a serial console

      Windows has had serial console support for a while now.

      http://www.msexchange.org/articles-tutorials/exchange-server-2003/monitoring-operations/Windows-2003-Server-Emergency-Management-Services.html

      I have not personally tried it. But I do remember EMS support going all the way back to some Cyclades terminal servers I had in 2004. I'm sure it's gotten better in the last 2 generations of Windows servers.

      1. chris 143

        Re: Managed PDUs and a serial console

        It's there, but other than setting an IP or rebooting it, you're limited to the Windows command prompt, which isn't particularly helpful...

  14. Fazal Majid
    Boffin

    Remote PDUs + serial console won't help if the system is aborting in the middle of the boot sequence, e.g. due to fsck.

    The reason Google, Facebook and other hyperscale companies don't provision LOM on their servers is not so much cost as the fact their ops model treats individual servers like cattle vs. pets. If a server dies it is automatically failed over and the FRU is the server itself.

    1. Nate Amsden

      If it is a Linux box (for example) and it is set up correctly, then yes, a serial console will be fine for recovering a system that is stuck on a failed fsck. In most cases the full BIOS is accessible (3ware BIOS, for as long as I can remember, did not work with serial console), as are the Linux boot loader and full Linux kernel messages; single user mode works fine, multi user.. whatever. You can even access the magic "SysRq" sequence over the serial port in most cases. Serial makes good for logging too: the terminal servers can often send data going to the consoles to a syslog server.

      For DRAC and HP iLO at least you can normally stick to serial consoles (virtual serial ports) to access linux systems w/o having to pay for the enterprise/Advanced license for those that don't need things like virtual media.
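One common way to reach such a virtual serial port is IPMI Serial-over-LAN with ipmitool. A small Python sketch that only builds the command lines; it assumes the BMC has IPMI-over-LAN enabled and that ipmitool is installed, and the host and user names are made up. The `-E` option makes ipmitool read the password from the IPMI_PASSWORD environment variable, so it stays out of the process list.

```python
import shlex
from typing import List

def ipmitool_cmd(host: str, user: str, *action: str) -> List[str]:
    """Build an ipmitool command line for a remote BMC.

    -I lanplus selects the IPMI v2.0 interface, which Serial-over-LAN needs.
    -E reads the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *action]

if __name__ == "__main__":
    host, user = "bmc01.example.net", "admin"  # hypothetical values
    for action in (("sol", "activate"),            # attach to the serial console
                   ("chassis", "power", "status"),
                   ("chassis", "power", "cycle")):
        print(shlex.join(ipmitool_cmd(host, user, *action)))
```

Passing one of these lists to `subprocess.run` would execute it. For `sol activate` you would run it in an interactive terminal rather than capture its output.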

      I agree with other posters that this article is quite weak -- perhaps it would have been good to cover solutions for systems that do not have integrated management in them, like the Raritan someone mentioned. It's not cheap, but it works fine. In one deployment (many, many years ago) I put Raritan on top of remote serial consoles: I had one Raritan drop in each rack that on-site people could connect to in the event the serial console was not adequate.

      I have a friend who runs a big lab at MS, all HP stuff but they too use Raritan KVMs (no integrated PDUs as far as I know) instead of the integrated iLO. It's just what they are used to.

  15. J. Cook Silver badge
    Boffin

    iLO, DRAC, and Managed PDUs, oh my.

    While all of our Dell servers have a DRAC built in*, we don't use them at all. What we do use is a managed PDU/KVM combination that Raritan makes. While the units we have are quite pricey (the controller itself is something like 16 grand retail for starters!), it's definitely worth it when you have a server that's shagged itself and needs a kicking.

    * IIRC, they are standard on all PowerEdge servers at this point. I could be wrong, though.

  16. randommagic

    IPMI how I wish it was old school

    I thought IPMI was a thing of the past, but some vendors still use it. Dell's cloud systems still use IPMI as they only have a BMC installed on them. On the plus side, their BMC can be logged into and they have a nice GUI like a RAC card. I would still rather have the RAC, or should I call it iDRAC, with a separate NIC port to separate the management from the rest of the network. One cost cutting I would prefer never happened!

  17. Anonymous Coward

    Time warp? And who is this chap Eadon?

    This article felt like reading something from a timewarp.

    HP servers with iLO and these problems simply don't exist (and haven't for years!).

    And who is this Eadon chap ? What are his credentials ?

    Those of us who live in the real (Enterprise) world know that:

    - The Server OS you run is mainly dictated by the app you are trying to run on it.

    - Windows / Linux are both mature stable operating systems.

    - The argument that Linux is free goes out the window (no pun intended) as soon as you need support!

    - Some admin tasks are easier with a GUI, some are easier with a command line.

    - Command line and scripting are available on both operating systems.

  18. Anonymous Coward

    Wrong headline

    "Are you being robbed of sleep by badly designed servers?"

    If that's the case, you're being robbed of sleep by badly designed infrastructure. If the failure of a single server requires you to get up at night (or whenever you care to sleep) and do something about it urgently, you've got a massive single point of failure in a place where you cannot afford to have it.

    KVM-over-IP (Drac, iLo, IPMI, or external devices like Raritan/Peppercon) only help you go back to bed quicker, but don't solve that problem.

    That said, I cannot remember when I last ran any server without KVM-over-IP. Maybe 6 or 7 years ago, and only temporarily until I convinced the customer to hook them up to a Raritan. I agree with Trevor, though, that the last thing you need in case of a failure is a person who acts on your behalf (while you can't see what they see), very often underpaid and not-so-well trained staff who cover night shifts.

  19. nuclearstar

    Real Point?

    I saw the headline and started to read this with interest, but it soon became apparent that the "article" was really nothing useful for sysadmins at all. It just tells us stuff we already know; it's not a solution, it's not a tip. I didn't find it useful at all.

    If anything, the article has told me that I shouldn't be using physical servers with a single OS installed onto a single physical server. Instead of telling vendors to include IPKVM as standard on their servers, you should be purchasing a set of hardware with virtualisation, which will have built-in redundancy. No longer would you need to get up in the middle of the night to reboot a server. Just connect to the management console and check that it has migrated to a working physical server.

    1. Anonymous Coward

      Re: Real Point?

      It's so wonderful you live in a world where every single one of your customers can afford a minimum of two virtual hosts, centralized storage and the cost of the virtualization software + management tools. For many companies, that isn't an option. Especially if they are replacing one item at a time (or adding items) to an extant network. For many companies out there, the (conservatively) $15K you would need to drop for your entry-level virtualization cluster is simply not possible.

      But great job being a condescending douchebag to all those folks who have small clients, own a small business or are sysadmins for a small business.

      1. Matt Bryant Silver badge
        Stop

        Re: Re: Real Point?

        "It's so wonderful you live in a world where every single one of your customers can afford....." In my spare time I often help out local charities which often have next to zero cash. Key to building resilient solutions for such cash-strapped organisations is dumpster diving, auction haunting and guilt-tripping real businesses into giving you old kit for free, including license transfers for software. I always aim to get remote management cards or software included as these types of "customers" simply cannot afford 24x7 support. If you are willing to stay one or two steps behind the latest and greatest you can have remote management software very cheaply. And before you start excusing yourself by saying "Charities - pah!", you may want to go read about the IT department for the city of Largo in Florida, which bought all their thin clients, servers, desktops and even tablets for their police cars off eBay!

  20. Matt Bryant Silver badge
    Facepalm

    Are you being robbed of sleep by badly designed servers?

    No. Next question?

  21. koolholio
    Pint

    Harmony in a network

    I don't know whether it's specific to servers or network infrastructure, since infrastructure is what servers sit and rely on, so they must somehow work in unison.

    Here's a metaphor:

    The network is like a road, a server is like a petrol station... where the server gets its oil from is just as important as who it serves, but also the effect it can have on its customers' vehicles too.

    Dodgy batch (patch) of petrol? :-S Or is it 'the standard', 'the design' or 'the implementation'?

    The problem is, if a remote administrator can do something, and nobody has thought about who's doing what and when... is it possible for others to do it too? :-/

    Just a thought of metaphorical proportion?

  22. Hoe
    WTF?

    Don't understand why you are limiting this to servers. vPro is great, for example: remotely re-image an infected or corrupted machine etc. It has been around for years now and it is still not even close to common, let alone standard. :(
