Give in to data centre automation and change your life

As an IT professional, unless you've been living under a rock you are probably familiar with automation, even if only in passing. Automation has been in use in the business world for many years, yet somewhat paradoxically IT is usually the least automated department in any organisation. Whole data centre automation …

  1. Joe 48

    UCS Director

    I've had some hands-on time with this in the past 12 months and the potential is amazing. I certainly don't want staff installing ESXi or adding service profiles to blades. Simple admin tasks should be automated, imo. No techie wants to watch an ESXi install, especially on large-scale deployments.

    Regarding my recent experience: from the point of physically putting a blade into a chassis to it being online in the cluster with VMs running on it is around an hour. The only user input is assigning a name/IP and putting in the blade. The rest was automatic. Too many people assume it's their jobs on the line, when in fact they are now doing much more interesting work and, rather than fighting fires, are ahead in so many areas. It's made a huge difference.

    Complex to set up, yes, but worth it. Add in a little, cough, "ITIL", cough, and incorporate it with processes and procedures so it all hangs together.

    There are two rules of automation. 1. If you are going to do it more than once, automate it. 2. Assume you are going to do it more than once.
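    The hour-long blade-to-cluster flow described above can be sketched as a driver that takes only the name/IP pair. This is purely illustrative: the step names are assumptions based on the comment, not a real UCS Director API.

```python
def provision_blade(name: str, ip: str) -> list[str]:
    """Hypothetical orchestration driver: the operator supplies only a
    name and IP, and every step after that is automatic."""
    steps = [
        "discover blade and apply firmware policy",
        f"attach service profile for {name}",
        f"kick off unattended ESXi install, management IP {ip}",
        f"join {name} to the vSphere cluster",
        f"rebalance VMs onto {name}",
    ]
    completed = []
    for step in steps:
        # In a real system each step would call out to the orchestrator
        # and block until it reports completion.
        completed.append(step)
    return completed

for step in provision_blade("esx07", "10.0.0.7"):
    print(step)
```

    The point of the sketch is the shape of the workflow, not the calls: one small input, a fixed pipeline, no techie watching an install screen.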

  2. Philip Skinner

    Puppet & Chef

    We had to discount these for automating over 250 server instances because they run on Ruby.

    Ruby uses a terrible amount of memory when it's run for any length of time. You'll soon find your automation system taking up to 1GB of your RAM.

    A far better tool is Fabric, which lets you execute everything over SSH from a single point.
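    The agentless, single-control-point idea is easy to sketch even without Fabric itself. This minimal Python example fans a command out over plain `ssh` via subprocess; the host names are placeholders.

```python
import subprocess

def ssh_command(host: str, command: str) -> list[str]:
    # Build the ssh invocation; BatchMode avoids hanging on a
    # password prompt if key auth isn't set up.
    return ["ssh", "-o", "BatchMode=yes", host, command]

def run_everywhere(hosts, command, dry_run=False):
    """Run `command` on each host from this single control point."""
    results = {}
    for host in hosts:
        argv = ssh_command(host, command)
        if dry_run:
            # Just report what would be executed on each box.
            results[host] = " ".join(argv)
        else:
            proc = subprocess.run(argv, capture_output=True, text=True)
            results[host] = proc.stdout.strip()
    return results

print(run_everywhere(["web1", "web2"], "uptime", dry_run=True))
```

    Nothing runs on the managed hosts except sshd, which is the memory argument in a nutshell: the footprint lives on the control node, not on every server.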

    1. wigginsix

      Re: Puppet & Chef

      I haven't yet come across Fabric in my travels, but I'm always looking into new solutions so I'll have to schedule some lab time with it in the coming months. Having the ability to execute everything over SSH from a single point is an interesting point of difference.

      Thanks for the tip.

    2. Nate Amsden

      Re: Puppet & Chef

      I have crons on all ~500 of my systems that automatically restart chef if it takes more than 80MB; they run every 4 hrs. 408 restarts in the past 24 hrs. Seems to be pretty reliable: I set the cron up over two years ago and have never had an issue that I can recall. I have several other crons set to restart chef under various failure scenarios (getting stuck, etc.).
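      A watchdog like that is only a few lines. A hedged sketch: the 80MB threshold and process name come from the comment, while the restart command is an assumption to adjust for your init system.

```python
import subprocess

RSS_LIMIT_KB = 80 * 1024  # the 80MB threshold from the comment

def needs_restart(rss_kb: int, limit_kb: int = RSS_LIMIT_KB) -> bool:
    # Pure decision logic, kept separate so the policy is testable.
    return rss_kb > limit_kb

def check_and_restart():
    """Restart chef-client if its resident set size exceeds the limit."""
    try:
        pid = subprocess.check_output(
            ["pgrep", "-o", "-f", "chef-client"], text=True).split()[0]
        rss_kb = int(subprocess.check_output(
            ["ps", "-o", "rss=", "-p", pid], text=True))
    except (subprocess.CalledProcessError, IndexError):
        return  # daemon not running; nothing to do
    if needs_restart(rss_kb):
        # Hypothetical restart command; swap in your init system's.
        subprocess.run(["service", "chef-client", "restart"])
```

      Scheduled from cron every 4 hours, e.g. `0 */4 * * * /usr/local/bin/chef_watchdog.py`, it matches the pattern described: cheap to run, and it papers over the agent's memory growth indefinitely.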

      The topic came up of possibly migrating off of chef because it is too complex. As much as I hate chef, migrating off is more work than I'm willing to invest. I remember replacing a broken CFEngine implementation at a company a few years ago with a good implementation, without even changing the version by much; in that four-nines environment, doing it safely took well over a year. Chef sucks for most things I want it to do (it wasn't my choice and I wouldn't use it today; not sure what I'd use instead, though CFEngine v2 worked great for me for ~8 years), but it's not bad enough to switch to something else.

      I hate ruby too, using chef just rubs salt in that old wound. Fortunately there are other people on the team that do much of the work with chef, so I can focus more on stuff I care about (one of the driving reasons why I didn't fight the fight to replace it two years ago).

      But automation… there are of course levels of automation. The author of the article basically lost me at "web scale". Obviously 99% of orgs will never see anything remotely resembling web scale. We pumped more than $200 million in revenue through a dozen HP physical servers and two small HP 3PAR storage arrays. We have since added more gear, but are still sitting at less than three full cabinets of equipment. Getting to $400-700M in revenue, maybe we add another cabinet (we have one sitting empty already, and with our new all-SSD 3PAR, I/O is really not much of a concern - I can get 180TB of raw flash in the 4-controller system that is installed now without taking more space/power).

      We have quite a bit of automation, but to get significantly further, to me the return just isn't there. Spend six months automating the hell out of things that might otherwise take you two weeks to do manually in that time? Seems stupid. I've got better things to do with my time.

      1. wigginsix

        Re: Puppet & Chef

        Thanks for your comments Nate.

        I have to agree with you. I personally loathe the term webscale, but unfortunately Gartner has started using it and it's steadily making its way into the everyday terminology of vendors in the IT world. Everyone wants to pretend that their offering is webscale. Like "cloud", it has little to no real meaning beyond being a buzzword category that vendors pay Gartner to advertise them under.

        For most of the clients I've ever worked with, any installation of three physical servers or more IS webscale as far as they're concerned. In the real world most admins, myself very much included, won't ever see an implementation of sufficient size to be defined as webscale by the folks at Gartner. Does that mean we shouldn't use automation (or orchestration) to make things easier? For me the answer is absolutely not.

        I'm sorry that I lost you when I mentioned webscale (and I hope I didn't just lose you again), but I tried to make the point near the end of the piece that there is no one-size-fits-all equation when it comes to automation. When and for what you should use it is purely subjective.

        Personally, I use scripted automation to perform all the mindlessly repetitive tasks that need to be done every day and then to email me the log files for those tasks. I also use runbook-based orchestration (with a side order of scripted automation thrown in) to simplify build processes for virtual server instances (lab and production).
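        The "run the chore, email me the log" pattern can be sketched in a few lines of stdlib Python. The SMTP host and addresses here are placeholders, not anything from the comment.

```python
import smtplib
import subprocess
from email.message import EmailMessage

# Placeholder relay and address - substitute your own.
SMTP_HOST = "mail.example.com"
ADMIN_ADDR = "admin@example.com"

def run_and_report(name: str, argv: list[str]) -> EmailMessage:
    """Run one repetitive task and package its log as an email."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    msg = EmailMessage()
    # Put the exit code in the subject so failures stand out at a glance.
    msg["Subject"] = f"[automation] {name} rc={proc.returncode}"
    msg["From"] = ADMIN_ADDR
    msg["To"] = ADMIN_ADDR
    msg.set_content(proc.stdout + proc.stderr)
    return msg

def send(msg: EmailMessage) -> None:
    with smtplib.SMTP(SMTP_HOST) as s:
        s.send_message(msg)

msg = run_and_report("disk-usage", ["df", "-h"])
# send(msg)  # uncomment once SMTP_HOST points at a real relay
print(msg["Subject"])
```

        One such wrapper per chore, fired from cron, gives exactly the daily log-file digest described above.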

        In total, automation probably frees up around 10-15 hrs a week of my time on a light week and anywhere from 45-60 hrs when I've got heavy project work on. As a guideline, I won't automate a task I wouldn't feel comfortable farming off to a PFY, but if I would delegate it (and it can be automated without having to spend six months creating a runbook) then it's fair game.

  3. Anonymous Coward

    The eternal issue

    The eternal issue is highlighted here.

    Departments are under intense pressure to slash IT/central costs, so there is often no money for creating labs to test new technology and no valid environments to run against (dev teams get upset if you take out their estate).

    The next problem is that most IT departments are already overstretched, so even if you had a budget for the test labs, you wouldn't have the IT staff to dedicate to them.

    Which of course leads you into the spiral of pain: people want to cut costs, so they trim resilience and performance components, which means more issues, which means less time, which means people don't believe IT is competent, which means reduced budgets to pay for a consultant to tell you why IT is not competent, which leads to further trimming and your consultant probably going "you need more automation". That then leads to people going "why aren't you automating - we heard about it and it's going to save the world", the IT team goes "we'd love to but we have no staff and nowhere to test anything", and the business goes *jazz hands* make it happen. Which of course it doesn't, as there's no money or time. So the spiral continues. All the while, base costs (storage / backup / keeping the lights on) go up because there's no intelligence in the estate so you can't be smart, the business keeps adding new things, and at the same time IT morale tanks and the department falls apart, leaving you with juniors and furniture.

    Talking about automation is great - finding the time and budget to do it so that in the long run you can save time and money is the hard part.

  4. Alistair

    My current tool

    cfengine.

    try it some time.

    Learning puppet for Hadoop. Good things come from automation - even when it prevents the security team from breaking their *own* standards. (Yes, I have INDEED been there… and won the battle.)

    1. wigginsix

      Re: My current tool

      Thanks for the heads up Alistair.

      I've had a quick look at the CFEngine website and I like what I see. I've added it to my must-lab-soon list of products.

  5. Dan Paul

    The greatest expense is energy, NOT IT.

    But you can't control what you don't bother to measure. The vast majority of data centers are overcooled, and the IT department won't change that because they really don't want to.

    First, the incoming power feed needs to be measured and the data brought into a real Building Automation system that speaks the BACnet and Modbus communication protocols, so it can talk to the cooling system, CRAC units, and emergency generators.

    Measure how many BTUs are being delivered to the cooling system by measuring chilled-water flow and the supply and return temps. Communicate with the chiller and CRAC units and control them accordingly. Now you know how much energy is being used. That's the first step to saving it.
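    For chilled water, the standard rule of thumb is BTU/hr ≈ GPM × 500 × ΔT(°F), where the 500 folds together water's ~8.33 lb/gal, 60 min/hr, and 1 BTU/(lb·°F). A minimal sketch; the flow and temperature figures below are illustrative, not from the comment.

```python
def cooling_load_btu_per_hr(flow_gpm: float, supply_f: float, return_f: float) -> float:
    """Heat picked up by a chilled-water loop, in BTU/hr.

    500 ~= 8.33 lb/gal of water * 60 min/hr * 1 BTU/(lb.F).
    """
    delta_t = return_f - supply_f
    return flow_gpm * 500.0 * delta_t

# Illustrative: 100 GPM with 44F supply and 54F return water.
load = cooling_load_btu_per_hr(100, 44, 54)
print(load)           # 500000.0 BTU/hr
print(load / 12000)   # tons of refrigeration (12,000 BTU/hr per ton)
```

    Feed the flow meter and both temperature sensors into the BAS, log this calculation continuously, and you have the energy baseline the comment is asking for.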

    Utilize an "Outside Air Economizer" strategy on the CRAC units when the outside air is cold enough.

    Use a warmer setpoint temperature; a 55 F discharge temp is too cold for modern server racks. Those CPUs can handle temps that are 30 degrees higher.

    Put temp sensors in the racks so the CRAC units actually control rack temp, not room temp.

    Directly duct the A/C to the server racks and close up the back and sides so you cool the IT equipment and not the whole room. Open the bottom of the rack to the plenum floor and use that for the cool air supply, with the duct on top of the rack for the heated air return. This is way more efficient than curtains or "hot aisle, cool aisle" games.

    Use "load shedding" by turning off any DX cooling or other high loads at peak times.

    Use occupancy sensors for lighting.

    I could go on, but the upside is that by using these ideas you will definitely save 20 to 30% of your existing energy costs.

    Just "automating" the server operations will not compensate for the excess energy usage by the cooling system.
