Hyperscale data centres win between their ears, not on the racks

Organisations that hope to improve their own data centre operations by adopting the techniques used by hyperscale operators like Google or Facebook need to consider the stuff between their ears, not just the stuff on their racks, because changing data centre culture is more powerful than changing equipment. That was the gist …

  1. Anonymous Coward

    new metrics by which to measure on-premises data centre teams,

    Does "cost of failure, including some external costs" figure anywhere in calculations and decision making? I didn't see it, maybe I missed it.

    And how about who picks up that cost? The poor suckers paying for the undelivered service, or the IT Director's bonuses? Seems like an odd week to forget about those...

    And finally: why would on-prem and cloud providers not have the same fundamental metrics, based on service availability and impact (cost) of non-availability? Innovation is for innovators, coin-op consultants and their disciples, at least until it's been proven fit for use in critical services.

    1. Mark 110

      Re: new metrics by which to measure on-premises data centre teams,

      I think the 'blast radius' concept was addressing the cost-of-failure issue: ensuring you understand the implications of failing and limiting the impact.

      The metrics of quick failure detection and recovery will also reduce the cost of failing.

  2. Denarius Silver badge


    But the pair also declared that “the biggest opportunity is changing how people think and react”. The PHB class think? All I ever saw was a reflex action to protect bonuses.

    Not sure how much the hypergasm about outsourcery and cloud is dying, but it does seem that big organisations not in death throes may want to keep everything, including staff, on their premises, given all the caveats.

  3. Doctor Syntax Silver badge

    It's all easily explained

    Just go back and look who these guys are working for. Gartner.

  4. Mad Mike

    And this says it all

    “They won't do 100 changes at once because the blast radius is big,” Zeng said.

    The above statement shows the level of thought applied here. One change or 100 changes doesn't, by itself, determine the blast radius unless you assume all the changes fail. A single small change could have a huge blast radius (if it hits something critical), but 100 small changes against low-importance areas could have almost no blast radius, especially individually. The blast radius has nothing to do with the number of changes. Perhaps they need to go back to the drawing board if this is their thinking.
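    The commenter's point can be made concrete with a toy sketch (hypothetical change names and impact scores, nothing from the article):

    ```python
    # Toy model: blast radius is a property of what a batch of changes
    # touches in the worst case, not of how many changes are in it.

    def blast_radius(changes):
        """Worst-case total impact if every change in the batch fails."""
        return sum(impact for _name, impact in changes)

    # One change against something critical...
    one_critical = [("core-switch-firmware", 1000)]

    # ...versus a hundred changes against low-importance systems.
    hundred_minor = [(f"test-vm-{i}", 1) for i in range(100)]

    print(blast_radius(one_critical))   # 1000
    print(blast_radius(hundred_minor))  # 100
    ```

    On this model the single change is ten times the risk of the whole hundred-change batch, which is the commenter's objection in a nutshell.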

  5. Terje

    I think that one point they do have (even if possibly by accident) is that in many situations it might be better to plan for and expect things to fail, and to have plans for quick response and recovery, than to spend all effort on never failing.

  6. Androgynous Cow Herd

    Moderately interesting

    But while many a venture-backed startup talks or dreams of being the next hyperscale breakout, most of them are never going to need this sort of architecture. The one or two that make it to scale will end up defining things like "failure domains" specific to the workload they are delivering.

    Trying to design or run a normal-to-large data centre using methodology developed for a hyperscale solution is laughable, because at true hyperscale the failure domain is not the application, or the server, or even the rack. It is the entire room, if not the entire site. You can lose a rack or two of servers and maybe a load balancer twitches; potentially there is a slight performance degradation and you schedule a rip-and-replace of those racks for the next maintenance window. But no one has to stay late or freak out simply because you lost a couple of racks' worth of processing.

  7. Anonymous Coward

    Disaster Recovery anyone?

    Everything will fail one day, so get ready now.

    A DRP should encompass all possible scenarios and be tested. It doesn't matter if it's big or small; it just takes some thought.

  8. Anonymous Coward

    MTBF, MTTR, resilience, availability

    “In the enterprise we measure and pay people on mean time between failure,” Skorupa said. “The whole operating principle is to avoid risk at all cost.”

    Shirley a sensible principle might be to minimise service disruption (ie maximise service availability), taking into account the impact (cost?) of service disruption. But the article talks about risk without explaining what it means today.

    Stuff fails, inevitably. Some stuff fails more than others - to have advance knowledge of what's likely to fail (and what the associated effects might be) can sometimes be handy, but isn't always essential.

    If the overall system design includes suitable resilience, a single subsystem failure shouldn't lead to service disruption. Sometimes multiple failures can be tolerated without visible disruption. Sometimes transactional integrity is required, sometimes it's not; the required designs may differ depending on a particular setup's needs.
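    For what it's worth, the textbook relationship between those terms is availability = MTBF / (MTBF + MTTR), which makes the trade-off easy to play with (illustrative numbers only, not from the article):

    ```python
    # Long-run uptime fraction from mean time between failures (MTBF)
    # and mean time to repair (MTTR): availability = MTBF / (MTBF + MTTR).

    def availability(mtbf_hours, mttr_hours):
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Chasing MTBF alone: one failure a year, but a full day to recover.
    print(round(availability(8760, 24), 5))     # 0.99727

    # More frequent failures, fast recovery: monthly failure, 5-minute fix.
    print(round(availability(730, 5 / 60), 5))  # 0.99989
    ```

    Which is the point: the fragile-but-quick-to-recover system delivers better service availability than the one optimised purely for not failing.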

    All of which might actually have been their point, but it's kind of hard to tell.

    Notice where the terms MTBF and MTTR appear in the description above, and where "service availability" appeared in the article?

    It's almost as though the last couple of decades mostly never happened. Which might well be the case as far as lots of Gartner staff (and their MBA-indoctrinated clients) are concerned, but the two Gartneers in question here appear to have been around in the 1990s too.

  9. Anonymous Coward

    don't reboot memcache servers

    Meanwhile at my org, devs were brilliant in putting persistent data in memcache with no backup (a brand-new e-commerce app built from nothing about three years ago). So while we wait for them to migrate to Redis (close to three years now), management has asked that we not reboot those memcache nodes for things like security updates. Lucky for them, those servers have uptimes of over two years at this point. (The last outage was to move them to a newer VMware cluster with new CPUs; no vMotion between those two CPU types.)

    Just a small example... I laugh when people mention the possibility of using a public cloud provider for DR or for bursting into. Clueless... so clueless.

    The company originally used public cloud (years ago) and I moved them out before costs got too far out of control.
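    As an aside, the durability gap described above is exactly what Redis's append-only-file persistence addresses; a minimal redis.conf fragment (a sketch assuming a stock Redis install, not this poster's actual config) would be:

    ```conf
    # Log every write to an append-only file so data survives a restart,
    # unlike plain memcached's in-memory-only store.
    appendonly yes
    appendfsync everysec   # fsync once per second: at most ~1s of writes at risk
    ```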



Biting the hand that feeds IT © 1998–2022