Google reveals the wheels almost literally fell off one of its cloudy server racks

Google has revealed that the wheels almost literally fell off some of its servers. A late Friday post about the virtues of its site reliability engineering (SRE) teams told the story of a “recent incident” in which its uptime squad found “evidence of packet loss, isolated to a single rack of machines”. On closer inspection …

  1. Anonymous Coward
    Devil

    As Terry Pratchett reminds us

    “Scientists have calculated that the chances of something so patently absurd actually existing are millions to one.

    But magicians have calculated that million-to-one chances crop up nine times out of ten.”

    Kudos to Google for understanding this.

    1. Anonymous Coward
      Devil

      kudos to Google

      Understanding that overloading a rack creates issues?

    2. Peter2 Silver badge

      Re: As Terry Pratchett reminds us

      Scientists have calculated that the chances of something so patently absurd actually existing are millions to one.

      And when you have say 2 million users, at one chance per user per day a "one in a million" occurrence happens twice a day.
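
The arithmetic in the comment above is an expected-value calculation. A minimal sketch, assuming one independent trial per user per day; the function name `expected_daily_events` is made up for this illustration.

```python
def expected_daily_events(users: int, one_in: int) -> float:
    """Expected occurrences per day when each user gets one
    'one in `one_in`' chance per day (assumed independent)."""
    return users / one_in


# Figures from the comment: 2 million users, one-in-a-million odds.
print(expected_daily_events(2_000_000, 1_000_000))
```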

      1. phuzz Silver badge

        Re: As Terry Pratchett reminds us

        Or if someone tells you that "you're one in a million!", that means that there's at least seven and a half thousand people like you in the world.

        Or to put it another way, there's at least sixty people like you in the UK.

        1. ArrZarr Silver badge
          Boffin

          Re: As Terry Pratchett reminds us

          The trick is to do TWO things that each make you one in a million.

          That way you are almost certainly unique.
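
A rough sketch of why two one-in-a-million traits make you "almost certainly unique". It assumes the traits are independent, takes a world population of 7.5 billion (the figure implied by "seven and a half thousand" above), and uses a Poisson approximation for the chance that nobody else matches. The names are hypothetical.

```python
import math

WORLD_POPULATION = 7_500_000_000  # assumption, implied by the thread's figures


def expected_matches(population: int, one_in: int, traits: int) -> float:
    """Expected number of people sharing all `traits` independent traits,
    each of which occurs in one person out of `one_in`."""
    return population / one_in ** traits


def prob_nobody_else(population: int, one_in: int, traits: int) -> float:
    """Poisson approximation to the probability that no one else matches."""
    return math.exp(-expected_matches(population, one_in, traits))


print(expected_matches(WORLD_POPULATION, 1_000_000, 1))  # one trait
print(expected_matches(WORLD_POPULATION, 1_000_000, 2))  # two traits
print(prob_nobody_else(WORLD_POPULATION, 1_000_000, 2))
```

Under these assumptions one trait leaves thousands of matches, while two traits leave well under one expected match, so the chance of being alone is roughly 99%.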

          1. Will Godfrey Silver badge
            Happy

            Re: As Terry Pratchett reminds us

            Or...

            There is always someone who can do a thing you do better, but nobody else can do everything you do as well as you.

            1. redpawn

              Re: As Terry Pratchett reminds us

              Unlike others, I am not unique, so this does not apply to me.

              1. KarMann Silver badge
        2. W.S.Gosset

          Re: Or to put it another way, there's at least sixty people like you in the UK.

          OT, but this line reminded me:

          The standard UK forensics DNA test (police, court, etc) is for budget reasons sharply cut-down in accuracy compared to other countries' tests. And appears to be less accurate, have more variability, than 1-in-a-million.

          So that UK DNA tests will show you have at least 59 clones, just in the UK, and probably more. Your honour.

    3. IceC0ld

      Re: As Terry Pratchett reminds us

      I thought I would be the first to add the TerryP reference

      so, proof again that million to one happens all the time :o)

      although in this instance, it's millions of us trying to get THE one LOL

  2. Anonymous Coward
    Anonymous Coward

    What were they thinking?

    Using crappy cabinets that don't have solid feet? The company I work at hasn't had this happen, precisely because we calculate the total weight of the kit in the rack, and match it against the cabinet specs. Then, once the cabinet is in place, we screw down the feet that are meant to sit on the floor, instead of leaving the cabinet sitting on some weakling plastic casters, because that's what the other legs are there for. And then we bolt them in place, so a bloody earthquake doesn't take them out, even if we're not in an earthquake zone, because that's just what you do when you want uptime.

    1. Simon Sharwood, Reg APAC Editor (Written by Reg staff)

      Re: What were they thinking?

      I suspect they were thinking they want whole-rack modularity, so wheels make it easier to just swap whole racks in and out.

      1. Snake Silver badge

        Re: What were they thinking?

        Then upgrade the castors or add easily-removable supports, as I had to do with wheeled storage bins when their castors couldn't support the weight. The plastic castors are the weak link in the (steel chassis) chain, no need to throw out the baby with the bathwater just because of a module failure.

        1. LovesTha

          Re: What were they thinking?

          I would jump to a pallet jack solution. Integrating holes for the pallet jack to use into the racks. Management of them in the facility might get slightly annoying at times, but nothing unmanageable.

          1. phuzz Silver badge
            Devil

            Re: What were they thinking?

            "Management of them in the facility might get slightly annoying at times"

            You mean that everyone will have pallet-truck races, when the boss isn't watching?

            1. usbac Silver badge

              Re: What were they thinking?

              This reminds me of a great story. A few years ago I took one of my high pressure tanks to the shop for hydro-testing. Keep in mind that this shop tests and services commercial fire systems also.

              It seems that while the owner was on a business trip, his employees decided to have pallet-jack races with a little extra gusto. One of the things this shop tests is the large fire suppression tanks that are used in commercial kitchens (I think they hold about 200 lb of liquid CO2). These tanks have a pin valve, and once a chain is pulled, the entire tank discharges.

              The staff strapped two of these tanks to each pallet jack. Then they would sit on top and when someone said "GO", they would pull the chains!!.

              The owner said when he got back a little early from his trip, he pulled around behind the building, and a pallet jack went speeding by with a huge trail of white "smoke".

              I asked him what he did about it, and he said:

              "I told them they have one minute to get their asses in the shop...

              ...and, fill me up two tanks!!!"

              That's a cool guy to work for!

              1. dr john

                Re: What were they thinking?

                Along similar lines to the CO2 racing pallets.

                During my first year at uni, in the chemistry lab I spotted a very large shower-head and a long new shiny metal chain next to it.

                So I asked what it was for. It's an emergency shower in case you get covered in acid, or catch fire in an explosion or any similar chemistry accident. It dumps one gallon of water per second over you to save your life, says the lab assistant. That's 4.5 litres, 4.5 kg per second.

                Being a scientist I obviously asked how do you know it's one gallon per second? And the lab assistant burst out laughing, then eventually explained.

                Every holiday, just before each term started, two technicians would come in with a VERY large plastic bucket, a stop watch and a ladder. Up the ladder goes Techie 1 with the very large bucket and holds it so that the shower-head is well inside the bucket. On the count of three, Techie 2 pulls hard on the chain and starts the stop-watch, pulling the chain at the five second mark to stop the flow. Then they weigh the bucket to get the total flow in five seconds.

                And just before the current term started, when the chain was pulled hard, the valve opened, the water came down and the chain as well. Broken off at the very top above the shower-head.

                Techie 1 is now at the top of a ladder holding a container that is gaining weight at 4.5 kg per second, soon struggling to keep his balance, while Techie 2 is trying to jump very high to grab the lever that controls the valve, screaming for help, can someone turn off the water supply to the entire lab.

                Eventually, Techie 1 fell off, dumping everything over Techie 2 and after several minutes someone found the mains stopcock to switch off the water. The lab was flooded and closed for the afternoon.

                But the conclusion was that if the lab caught fire, under the shower was definitely the safest place to be! Many of the class would touch the chain occasionally, but none of us dared to pull it.

                1. Muscleguy

                  Re: What were they thinking?

                  Such things are de rigueur in many sorts of science lab, usually above a shallow drain in the floor. Back when we squirted phenol/chloroform about to clean up bacteria-grown DNA (now done via nifty spin columns) we had a wash bottle of PEG 2000 handy, for putting in your or someone else's eyes if they got phenol in them.

                  I was once sitting at the bench in shorts and a drop of p/c fell from the pipette tip and hit my bare thigh. It BURNED.

                  Women allow phenol solutions to be applied to their faces for 'face peels'. A lot less concentrated than 50% mind but burns have happened when the dilution went wrong. The idea gives me the heebie-jeebies.

                  Do NOT fuck with phenol.

            2. Will Godfrey Silver badge
              Happy

              Re: What were they thinking?

              One large open plan office where I worked had wheely chair races... organised by the boss!

              1. GeekyDee

                Re: What were they thinking?

                And remember, mop strings are not conducive to your health in chair races!

            3. Aussie Doc
              Big Brother

              Re: What were they thinking?

              ...and put the video of it on youtube.

              He's watching, too ---->

        2. the spectacularly refined chap

          Re: What were they thinking?

          It's easy to say that after someone has already encountered, identified and rectified the issue. Essentially they have said "we've had a problem with these..." and your response is "well don't use them then". They've already said they're replacing them so you don't get brownie points for stating the bleeding obvious with hindsight. The smart thing is avoiding the issue cropping up in future after it was initially encountered.

          1. Anonymous Coward
            Anonymous Coward

            Re: What were they thinking?

            It's not hindsight when you are already doing it before it happened and then state it should have been done to someone who didn't, especially since it would have been in the specs that the feet are to be used when the racks are in place and before it is loaded, and the casters shouldn't be used in a loaded rack.

        3. Anonymous Coward
          Anonymous Coward

          Re: What were they thinking?

          You're bikeshedding....

      2. Anonymous Coward
        Anonymous Coward

        Re: Swapping whole racks out

        Two questions come to mind.

        1) How long does it take to 'un-cable' a rack? i.e. disconnect it from networks, power and the like

        2) How long does it take to raise up the feet that are there to take the load of the rack?

        If 1 and 2 are about the same then the 'swap' argument pretty much falls on its face if the company sends two people into the DC as you should from a safety POV if power is being mucked around with.

        Plus

        Any server farm worth using would have at least one spare rack already in place just waiting to be connected, powered up and configured. Then the faulty rack can be removed to a workshop for attention.

        1. Fatman
          FAIL

          Re: Swapping whole racks out

          <quote>Any server farm worth using would have at least one spare rack already in place just waiting to be connected, powered up and configured. Then the faulty rack can be removed to a workshop for attention.</quote>

          Said the beancounter:

          "You can not do that as it represents an unnecessary CAPEX cost, which does not increase shareholder value. Budget request denied."

          1. cdrcat

            Re: Swapping whole racks out

            Why would they ever have unused hardware? That would be a waste of money - hardware should be used.

            “Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.”.

            The system is set up so that hardware failures are dealt with by restarting jobs. Google have done that since they started (optimising for cheaper machines that are expected to fail, rather than expensive reliable machines).

        2. Anonymous Coward
          Anonymous Coward

          Why bother?

          Honestly at Google's (or other major cloud providers') scale, they should just let racks fail in place. Making them mobile, and actually moving out old racks and bringing in new racks certainly offers far more than a "million to one" chance of something REALLY bad going wrong that risks losing other racks and could in theory require shutting down the whole datacenter if things went pear-shaped enough.

          Since it sounds like they are using liquid coolant, what happens if they break the piping during that process? What happens if a poorly installed tile fails while they are wheeling these overly heavy racks around and while falling it takes out multiple other racks?

          It isn't like enough racks in a whole DC will fail to make a difference to Google overall. Leave them there until the whole DC is slated for an upgrade. How many failed racks will there be at that time, maybe a couple percent of the whole DC? Big deal...

          1. TRT Silver badge

            Re: Why bother?

            Floor tiles are a weak point. I've seen cabinets sink into a vinyl floor before now. One really appreciates the energy required to roll one of these things up even a 1mm depression. Possible if you have an electrically assisted pallet truck, or some specific manual handling device for them.

            1. Antron Argaiv Silver badge
              Thumb Up

              Re: Why bother?

              Work, not energy.

              Work is required to move the rack, which, when moving, has kinetic energy.

              // why, yes, I am an engineer

              // Physics was a long time ago, so I admit to checking before posting

              // Once watched the installation of a CDC Cyber-74. Lots of work

              1. David Nash Silver badge

                Re: Why bother?

                Both.

                You do work by transferring energy to the rack, kinetic to get it moving, and potential if you are lifting it, even 1mm.

                It's worth noting that the SI unit of both work and energy is the same thing, the Joule.

                Also not a physicist so please correct me if necessary.
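
The work-versus-energy exchange above can be put into numbers. A minimal sketch, assuming a rack mass of 1,000 kg and a walking-pace speed of 0.5 m/s (neither figure is from the thread; only the 1mm depression is) and standard gravity of about 9.81 m/s². Both results are in joules, as the comment notes.

```python
G = 9.81  # m/s^2, approximate standard gravity


def lifting_work_joules(mass_kg: float, height_m: float) -> float:
    """Work done against gravity to raise a mass: W = m * g * h."""
    return mass_kg * G * height_m


def kinetic_energy_joules(mass_kg: float, speed_m_s: float) -> float:
    """Kinetic energy of a moving mass: E = 1/2 * m * v^2."""
    return 0.5 * mass_kg * speed_m_s ** 2


rack_mass_kg = 1000.0  # assumed, not from the thread
print(lifting_work_joules(rack_mass_kg, 0.001))   # climbing a 1mm depression
print(kinetic_energy_joules(rack_mass_kg, 0.5))   # rolling at 0.5 m/s (assumed)
```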

            2. Kobus Botes
              Stop

              Re: Why bother?

              @TRT: "...energy required to roll one of these things..."

              Our company moved into a new building in the late '90s. Since IT was seen more as a grudge expense by the local beancounters, the server room was specced to be the absolute minimum it could be (the rear doors on the racks could only be opened half-way whilst the front doors had about 5mm clearance to the wall).

              In order to facilitate working on the backs of the racks when required, a decision was made to put the racks on wheels.

              The first time we had to move the racks (four side-by-side, bolted together) three of us could not move it at all. We would have been able to push the whole lot over before it moved - in fact, we abandoned the exercise when we tilted the rack by a scary amount (due to lack of space to get down low enough to apply force to the bottom of the cabinets rather than at chest height).

              The general consensus was that the cabling was too heavy (it disappeared into the raised floor cavity), coupled with the fact that the rubberised wheels were flattened (seen upon closer inspection), as well as the fact that the wheels had caused depressions in the vinyl tiles. And there were only about four servers per rack (to provide for future expansion, which never happened, as almost all servers were later virtualised, rather than having single O/S boxes).

              Since we could not go ------------------------->

            3. EvilDrSmith Silver badge

              Re: Why bother?

              Would they have floor tiles?

              I would have assumed a concrete screed floor, much as any of the big logistic sheds (such as Amazon) that increasingly litter the place. Designed for heavy loads, very strict tolerances on tilt of storage racking and flatness/levelness of the slab.

              1. TRT Silver badge

                Re: Why bother?

                Repurposed canteen space in the case with tiles.

                Concrete screed covered with sealed vinyl in another.

                1. EvilDrSmith Silver badge

                  Re: Why bother?

                  TRT - Thanks for the answer (more helpful than just a downvote).

                  Never been in somewhere like that; I was assuming that they were all purpose built (which on reflection is obviously wrong), and the only details I've ever seen for them in general were proposed new builds with concrete/screed floor (no indication of any further floor finish, but I wouldn't necessarily have seen those details).

                  1. TRT Silver badge

                    Re: Why bother?

                    Also see proper linoleum floors. Antistatic. There were companies selling flooring at Data Centre World. Having a "softer" surface that's a manufactured finish can make them smoother, less prone to vibration, have lower maintenance costs due to painting & resealing, less prone to water ingress etc.

                    We had the same issue with a TV studio floor. Took umpteen coats of a proper latex sealant to reach a finish where the dollies would move around without a noticeable vibration in the image, especially when we went to HD cameras & I imagine 4K/8K/10K & IMAXdigital is even worse! Film, of course, is slightly more forgiving as there's no line structure.

        3. Anonymous Coward
          Anonymous Coward

          Re: Swapping whole racks out

          "1) How long does it take to 'un-cable' a rack? i.e. disconnect it from networks, power and the like"

          For AWS and Azure (and I assume Google and others), racks are built offsite and connected via a minimal number of cables to allow multiple rack generations as newer CPUs or changing customer requirements alter the mix of what they need to offer. From what I have seen, installing a couple of truckloads of racks (~40 in total?) takes about a day to deliver to site and I guess another few days to connect and commission.

          And as you mention, there is a level of resilience in the rack although mainly for networking/cooling/power - individual server failures just reduce the service capacity of the rack and I believe the cloud providers have largely moved away from repairing units in situ as the risk of causing other issues in the rack is too high.

        4. A.P. Veening Silver badge

          Re: Swapping whole racks out

          1) How long does it take to 'un-cable' a rack? i.e. disconnect it from networks, power and the like

          How long would it take you to unplug two power plugs and two network cables? Those are all the connections to the backplane necessary and two of each for redundancy, one of each is enough.

          1. The First Dave

            Re: Swapping whole racks out

            And the liquid cooling comes from where?

            1. Pascal Monett Silver badge

              Check any motherboard, video card or CPU cooler today. Liquid cooling is self-contained. I'm sure they can do the same for HDDs/SSDs.

              There is no hose coming through the wall to squirt cold water over the components.

      3. Anonymous Coward
        Anonymous Coward

        Re: What were they thinking?

        That's all fine and well... we do that where I work. We have casters AND feet on the cabinets. Roll cabinet into place, lower feet to support it for the duration of its lifecycle, then bolt it down to meet earthquake regulations AND good sense. When it's time to remove it, twist the feet back up, remove the hold-down bolts, and roll it away. A failing caster could still happen (in theory), but not in a way that would cause service degradation.

    2. Anonymous Coward
      Anonymous Coward

      Re: What were they thinking?

      On failures:

      Person A: I manage one rack and the rack has never failed

      Person B: I manage one rack and it failed catastrophically

      Person C: I manage 20 racks and 1 of them failed once

      Google: We manage 30,000 racks and get materials defects that make it into production in 1 in every 1000 cases.

      On availability:

      Person A: we never have maintenance or hardware failures so we never have downtime

      Person B: we patch asap and occasionally have hardware failures

      Person C: we have some resilience and redundancy but a power issue took everything down last Monday

      Google: we see frequent hardware failures, power issues and network outages but the entire system continues to self-heal and provide high levels of global availability.

      There are many ways of looking at a problem grasshopper....

      Which answer do you take?
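
The scale argument in the "On failures" list can be sketched numerically. This is an illustration only: it treats the quoted "1 in every 1000" as an independent per-rack defect rate, which is an assumption, and the function names are invented for the example.

```python
def expected_defective(racks: int, one_in: int) -> float:
    """Expected number of defective racks at a 'one in `one_in`' rate."""
    return racks / one_in


def prob_at_least_one(racks: int, one_in: int) -> float:
    """Probability that at least one rack is defective,
    assuming each rack is affected independently."""
    return 1 - (1 - 1 / one_in) ** racks


# Fleet sizes from the comment: 1 rack, 20 racks, 30,000 racks.
for fleet in (1, 20, 30_000):
    print(fleet, expected_defective(fleet, 1000), prob_at_least_one(fleet, 1000))
```

With one rack, a defect is a rare surprise; at 30,000 racks the same rate makes it close to a certainty that some racks are affected at any time.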

    3. katrinab Silver badge
      Meh

      Re: What were they thinking?

      Would bolting them in place protect against earthquakes?

      Surely you would want them on shock absorbing springs if you are worried about that?

      1. Anonymous Coward
        Anonymous Coward

        Re: What were they thinking?

        How much energy does it take to snap a rack when only one end is securely bolted to the floor? Is there a YouTube video for that?

      2. Anonymous Coward
        Anonymous Coward

        Re: What were they thinking?

        According to the regulations related to securing things against seismic events, yes. In theory? I haven't put a fully loaded and bolted down rack on a shaker table to find out. But, it's what all the DC providers in the Bay Area and Seattle do, so probably someone has tested it or at least done the math.

        1. W.S.Gosset

          Re: Earthquakes

          > it's what all the DC providers in the Bay Area and Seattle do

          Most of the *BIG* datacentres are now in Arizona: close to west coast, but cheaper energy and --key point-- no earthquakes.

          .

          Downside for the locals: the massive banks of refrigerators generate a low hum which drives many people bananas. Becoming an exponentially larger issue as the DC providers pile into Arizona, and are taking up land startlingly close to town. E.g., one 57-acre building on an 85-acre lot, in Phoenix, next to the park and coupla hundred yards from Nandos.

          1. Bitsminer Silver badge

            Re: Earthquakes

            and --key point-- no earthquakes

            But it's only about 1km from the Orbital-ATK rocket factory, also in Chandler. It probably won't go boom.

      3. Jon 37

        Re: What were they thinking?

        Leaving them loose means they're going to fall over and/or walk across the floor during an earthquake, with cables breaking. They might fall on a passing worker.

        Bolting them in place means they're going to move with the building during an earthquake. That's probably not great for hard disks, but the rest of the parts can probably survive that. Passing workers are safe.

        Fitting shock absorbers to each rack would be very expensive. It would make it easier for parts such as HDDs to survive. Passing workers are safe so long as the aisles are wide enough that they're not hit by rocking racks, although there is an amputation risk if a worker has their hand between two racks. Avoiding that risk probably requires drastically reducing the rack density, so there are large gaps between adjacent racks, then filling that gap with a flexible plastic sheet to allow hot-aisle/cold-aisle separation.

        The cost of bolting down can be easily justified; the cost of shock absorbers can't.

        At Google scale, losing a datacenter due to an earthquake isn't a big deal. And having to fit a lot of new HDDs in a datacenter is a risk worth taking when compared to the cost of shock-mounting the servers.

    4. Glenn Amspaugh

      Re: What were they thinking?

      If they want whole-rack modularity they shoulda used small but strong turtles.

  3. Anonymous South African Coward Bronze badge
    Coat

    The Leaning Rack of Google.

    Getting my coat to go visit the Leaning Tower of Pisa and make a comparison...

    ...oh wait, COVID-19 put a crimp in my plans...

    1. Jimmy2Cows Silver badge
      Coat

      No queues though...

      (yes I know you can't actually go there at the mo)

    2. Anonymous Coward
      Anonymous Coward

      Great time to visit Pisa. Cheap flights, no crowding. Return trip presents some problems, though.

  4. corestore

    A couple of degrees...

    Of tilt were enough to disrupt the operations of the machine?!

    I once had an IBM System/38 (google it) fall off the liftgate on the back of a truck. The whole thing dropped ~ 4-5ft onto concrete, landed on its back.

    Damage? Broke the cast alloy hinges holding the back doors in place.

    Dragged it upright again. It powered up no problem and IPLed (booted) and ran just fine.

    https://www.youtube.com/watch?v=2cAWArBXRhE

    IBM quality.

    1. Pascal Monett Silver badge

      Re: A couple of degrees...

      I checked the pic and had somewhat the same reaction. It's really not tilted much.

      On the other hand, liquid doesn't need much to start going down hill, so that tilt was certainly sufficient to have the coolant pool on one side, which means the other side didn't get cooled. It also obviously indicates that Google is not using pumps for its coolant system (saving on electricity), otherwise the liquid would not have a choice.

      1. phuzz Silver badge

        Re: A couple of degrees...

        Indeed, the watercooling on my home computer will work fine up to at least 70° off vertical (more in some axes).

        This is handy because I have to tilt it back and forth to bleed all the bubbles out every time I fill it.

      2. Anonymous Coward
        Anonymous Coward

        Re: A couple of degrees...

        "obviously indicates that Google is not using pumps"

        Or maybe they have short flexible hoses connecting to the top of the rack that got kinked and restricted the flow? Or maybe they are at the bottom and got smashed like the wheels. There could be more possibilities.

    2. Allan George Dyer
      Joke

      Re: A couple of degrees...

      @corestore - "Damage? Broke the cast alloy hinges holding the back doors in place."

      But what about replacing the concrete slab?

      1. corestore

        Re: A couple of degrees...

        It survived mostly intact!

    3. Antron Argaiv Silver badge
      Coat

      Re: A couple of degrees...

      They didn't call it "big iron" for nothing.

      402 Accounting Machine: big as a Volkswagen, weighed a heck of a lot more, and had a bigger (electric) motor

      // plugboard wires in the pocket

    4. Alistair
      Windows

      Re: A couple of degrees...

      @corestore, that quality was laid off and forcibly retired in the late 80's and early 90's. Sadly.

    5. Luiz Abdala
      Thumb Up

      Re: A couple of degrees...

      Their Aptiva desktops... if it had 8 screw holes to hold CD-ROM drives in place, it had 8 screws in them. It also applied to every ISA card, every PCI card, every riser rack, HDD... you get the point.

      I took one apart to replace a single HDD and I went through 36 screws before losing count. Put it back together with just 10 or so.

      Even IBM desktops were earthquake-proof.

    6. Bitsminer Silver badge

      Re: A couple of degrees...

      Ditto for a Sun rack with a few 420 sparc servers and some disks in it.

      The sides of the rack were a bit bent, and the paint somewhat peeled. The peeling was from me, screaming at the stupid c* that spun that c* F* S* F* R* and d* the * 100k ** ... etc etc. you can guess the rest.

  5. bazza Silver badge

    The post is of course self-promotion for how seriously Google takes its quest for uptime.

    Well, some of Google takes it seriously. Other bits of Google just seem content to just leave racks leaning over on wrecked plastic castors.

  6. chivo243 Silver badge
    Coat

    I used to wonder

    What is the weight of your data? I guess we know it's HEAVY!

  7. Giovani Tapini

    I must say I'm surprised....

    If the racks don't have feet, as well as castors, that probably means several hundred kilos on each castor, with I guess a square inch or so making contact - that's a problem for the floor not just the castors, particularly if it is tiles on stations... They are probably lucky not to lose a water cooled rack into the subfloor, which may have taken a bit more aggressive winching to recover :)

    I like the fact they had to send someone with eyes to look at it - in my world we have cameras that look at the racks and aisles to save on meatwork :)

    I do like the mantra that incidents should be novel too - now if only that would creep from the infrastructure people to the application people the world will be a better place !

    1. TRT Silver badge

      Re: I must say I'm surprised....

      The in-rack environmental control system should have vibration, tilt, noise, emf etc. sensors.

      1. Giovani Tapini
        Coat

        Re: I must say I'm surprised....

        All I have in my head is a sort of advanced pinball machine, made out of expensive and heavy components :)

      2. DCFusor

        Re: I must say I'm surprised....

        Something along the lines of "the more complex you make things, the easier it is to stuff up the plumbing"?

        I've run into that one a few times. Either false positive or false negative can really mess up your day.

        1. Anonymous Coward
          Anonymous Coward

          Re: I must say I'm surprised....

          "Either false positive or false negative can really mess up your day."

          The 2nd generation EELM KDF9 mainframe prototype was thought so technologically advanced that its ferrite bead memory didn't need parity checking. They soon had to add parity checks. However - when the system stopped with a "parity error" - it was often a checking circuit glitch that had given a false alarm.

          1. TRT Silver badge

            Re: I must say I'm surprised....

            Pirate IT systems, eh?!

            Bits-of-seven, bits-of-seven. Squawk!

            A parroty error.

          2. Mike 16

            Re: I must say I'm surprised....

            Parity?

            I recall a talk on the early Cray 1 computers. They were originally designed with only parity, but soon (after experience with the "production" use of Serial #1) switched to ECC. Serial #2 was too far down the assembly line to be worth the retrofit and was scrapped. All later systems were a bit taller than #1, to accommodate the ECC mod.

            (I assume the case of Leinenkugel beer intended for shipment with #2 was properly disposed of :-)

    2. JSIM

      Re: I must say I'm surprised....

      Castors! Fitting, that it sounds like a cuss word. Never met one that didn't eventually cause me trouble.

      While Google works on the latest design revisions of its custom-built racks, beefing up the castor specs, no doubt, they could do better.

      If the racks have wheels, I assume that one design goal must be quick and easy physical swapping/removal/addition of fully loaded modular rack assemblies.

      Ditch the castors for something like a custom-built roller jack to raise and move the custom-built rack. Two mating jacks, front and back, joined by lifting bars inserted through the rack. Foolproof quick connect fittings for all cabling, fibre, cooling. Solid footing. How about a super-rack system - like a rack for racks, maybe, while you're planning your next 1000-rack deployment?

    3. Anonymous Coward
      Anonymous Coward

      Re: I must say I'm surprised....

      "They are probably lucky not to lose a water cooled rack into the subfloor, which may have taken a bit more aggressive winching to recover "

      In 1970 a new 600MB hard disk was enormous in all dimensions and weighed well over a tonne. The engineers managed to squeeze it through the computer room double doors and trundle it down the central aisle. Then they turned towards the vacant space, at which point they left the concrete-based aisle. As the false floor tiles started to collapse they managed to drag it back onto the aisle. Otherwise they would probably have had to take the roof off the room for a crane.

  8. Lazlo Woodbine

    A few pounds saved often leads to a few hundred pounds spent later

    We used to get this all the time. A customer would buy a rack full of servers and UPSes, then, after spending the best part of £100k on the kit, they'd buy the cheaper 300kg casters "because we'll only push the rack once".

    Then a year or so down the line they'd complain the rack had tipped, and we'd say, "Well, you do have almost a tonne of kit in the rack standing on wheels rated to 300kg, what did you expect to happen?"

    So they have a few hours of downtime while they strip the rack out and fit the heavy-duty casters, if they're lucky. Sometimes the rack buckles at the edges when it tips, so they have to either sit it on the floor and never move it again, or buy a new floor for the rack, or a new rack...

    1. Roger Greenwood

      Re: A few pounds saved often leads to a few hundred pounds spent later

      ".. wheels rated to 300kg, what did you expect to happen"

      I would expect 4 wheels to withstand 1200kg, unless they were not up to the job......

      1. I am the liquor

        Re: A few pounds saved often leads to a few hundred pounds spent later

        The load rating for a set of 4 casters is typically 3x the rating of a single caster. If the thing they're supporting is rigid, and the floor is even slightly uneven (as floors invariably are), then one caster will often be carrying very little or zero weight.
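
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the comment's reasoning: the function name `set_load_rating_kg` is made up for the example, and the "count all but one caster" rule is the poster's rule of thumb, not a manufacturer's specification.

```python
def set_load_rating_kg(single_caster_rating_kg: float, casters: int = 4) -> float:
    """Rough working load for a set of casters on an uneven floor.

    Assumes one caster may be carrying little or no weight at any moment,
    so only (casters - 1) of them are counted. This is the rule of thumb
    quoted above, not a figure from any datasheet.
    """
    if casters < 2:
        raise ValueError("need at least two casters")
    return single_caster_rating_kg * (casters - 1)


# Four casters rated at 300 kg each: count three of them, not four.
print(set_load_rating_kg(300))  # 900
```

On that reading, four 300kg casters would give roughly 900kg rather than 1200kg, which is still short of "almost a tonne of kit" plus the rack itself.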

      2. Lazlo Woodbine

        Re: A few pounds saved often leads to a few hundred pounds spent later

        The set of 4 x 300kg casters is for a rack with a total weight of 300kg. We'd reasonably expect them to cope with a bit more than that, but not 3x the rated weight.

  9. Anonymous Coward
    Anonymous Coward

    Had it happen

    Had this happen: the rack provider at the data centre had left the cab on its transit wheels and not wound down the feet. I only noticed when I couldn't shut the door one day. Took me ages to spot the cab was lopsided. Had to get the data centre to jack the cab up and wind down the feet. It was only about half populated, if that. No downtime.

  10. Anonymous Coward
    Anonymous Coward

    Apparently the front fell off.

    They will need to take it out of the environment.

    1. W.S.Gosset

      Re: Apparently the front fell off.

      Harder to tow now, though, without any wheels.

      ( +1 for the Fred Dagg / John Clarke reference! )

  11. IHateWearingATie
    Devil

    I was working on the national insurance system in the early 2000s - we had 70 million people records (as pensions for live people sometimes rely on those who are not), and the saying was that one in a million possibilities happen 70 times in a single batch run, so we had to really think through the code design.
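
The arithmetic behind that saying is just record count divided by rarity. A minimal sketch, assuming each record has the same independent chance of hitting the odd case (the function name `expected_occurrences` is invented for illustration):

```python
def expected_occurrences(records: int, one_in: int) -> float:
    """Expected number of hits when each record has a 1-in-`one_in` chance."""
    return records / one_in


# 70 million records, each with a one-in-a-million chance of the odd case.
print(expected_occurrences(70_000_000, 1_000_000))  # 70.0
```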

    Have 15 slots for children in your database design? There's always a family with more kids than slots so the design needs to be different.

    Don't expect numbers in a name? One idiot changed his middle name to his national insurance number.

    Only one spouse? Bigamists pay national insurance like the rest of us. Can't just dump out an error there and crash the batch job.

    The list was endless - it taught me that whatever screwy situation you can imagine definitely exists out there in more numbers than you can imagine!
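
One way to avoid the fixed-slot traps listed above is to model children and spouses as variable-length lists and to treat names as free text. The sketch below is purely illustrative: the `Person` class and its fields are hypothetical and have nothing to do with the real national insurance system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # Free text: no "letters only" rule, since a name may contain digits.
    name: str
    # Variable-length lists instead of child_1 .. child_15 or a single spouse field.
    children: List["Person"] = field(default_factory=list)
    spouses: List["Person"] = field(default_factory=list)


parent = Person(name="A. Example")
parent.children = [Person(name=f"Child {i}") for i in range(1, 18)]
print(len(parent.children))  # 17, two more than a 15-slot design allows
```

The point of the design is that an unusual record is stored as data and handled by the batch job, rather than rejected by the schema.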

    1. Hans Neeson-Bumpsadese Silver badge

      Don't expect numbers in a name? One idiot changed his middle name to his national insurance number.

      That reminds me of an old Top Tip from Viz... "Avoid paying for expensive personalised licence plates by simply changing your name to your car's registration number - yours sincerely, C695 LCU, Southend"

    2. Anonymous Coward
      Anonymous Coward

      I heard similar stories about the census.

      In particular, they would find situations like a father apparently being younger than his son. You could just discard such data, but maybe you should guess what they meant - did they transpose digits in the date of birth? Did they just get confused between “your relation to person A” and “person A’s relation to you”? So they have lots of data cleaning to fix those errors.

      But great care was needed to make sure that it didn’t wrongly “correct” rare but possible situations - like housemates who just happen to share the same surname, or couples with a large age difference, and so on.

      1. Richard 12 Silver badge

        A stepfather could indeed be younger than their own stepson.

        1. IHateWearingATie

          "A stepfather could indeed be younger than their own stepson."

          See? Richard here gets it!

          Every single scenario you can imagine exists out there somewhere!

        2. Tomato Krill

          Hi Richard, how’s the weather in Norfolk? :-)

    3. NorthIowan
      Facepalm

      Re: Have 15 slots for children in your database design...

      I had car insurance from a company where the programmers thought you'd only have insurance on 4 vehicles. With two teens and a pull camper we had 5. So the extra vehicle came on a separate bill.

  12. cschneid

    Santayana, again

    Twenty-mumble years ago, I came into support of a roll-your-own DB/DC system built in the 1970s. It was kind of creaking, but we were in year twelve of the ten year migration out of the system and management wanted no maintenance done. My senior and I ignored that, quietly declaring that any production problem would be met with our intent of making that problem never happen again.

    At the time, the system was executing ~2,000,000 transactions every business day during prime shift. We used to say, if it's a one-in-a-million chance, it'll happen today, twice.
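
"Twice" is the expected count. The chance of seeing the one-in-a-million event at least once in a day can be estimated as below, assuming every transaction is independent and equally likely to trigger it (an assumption made for this sketch, not something stated about the real system). The function name is invented for illustration.

```python
import math


def prob_at_least_once(trials: int, one_in: int) -> float:
    """P(at least one hit) = 1 - (1 - p)^trials, with p = 1 / one_in."""
    p = 1.0 / one_in
    # log1p/expm1 keep the result accurate when p is tiny.
    return -math.expm1(trials * math.log1p(-p))


# About 2,000,000 transactions a day, one-in-a-million event.
print(round(prob_at_least_once(2_000_000, 1_000_000), 3))  # 0.865
```

So under those assumptions the event shows up on roughly six days out of seven.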

    It's nice that Google has learned these particular lessons that mainframe people knew decades ago. I wonder if they'll learn the rest?

  13. Jeffrey Nonken

    If your girl is one in a million, there are at least six of her in any major city.

    Unfortunately for me, several people have already made the same point, just with different phrasing. I won't even bother with my coat.

    1. swm

      But all of these have boyfriends.

    2. Anonymous Coward
      Joke

      This joke...

      is now one in a million. So common huh?

      1. Anonymous Coward
        Anonymous Coward

        Re: This joke...

        If I've told you once, I've told you a million times... stop exaggerating!

  14. BigE

    Propaganda

    If you are being throttled due to temperature then you should know that there is a cooling problem, usually the flow of heat in air or fluid isn't working, which is part of the usual monitoring. I cannot see why the broken caster is anything special. It could have fallen over due to a centre of mass problem too.

    Why don't they install Hall effect devices and get the actual level of the rack too? Or how about they practise some OH&S and physically secure the top of the rack to something that is sturdy and/or structural? I mean, your average plumber knows that the hot water heater can fall over due to its centre of mass, so all I take from this is that they (Google) are just reinventing what has been known for centuries.

  15. Christoph

    "a Googler was despatched to endure the indignities of meatspace and inspect the problem rack with their actual eyes"

    Sounds like time for Google to install little robots that can move round the datacentre with a camera so they can check remotely. And maybe have manipulators to fix simple problems such as shoving a loose connector back into place.

    1. Anonymous Coward
      Anonymous Coward

      Silent Running...

  16. Anonymous Coward
    Anonymous Coward

    Back in the 2nd Generation days when mainframe hardware glitches were common - my support team leader gave me a piece of advice. If something goes wrong and the cause isn't quickly obvious then it isn't "a problem". If it happens twice then it IS a problem and has to be sorted because it will probably keep happening. A pragmatic approach that avoids pouring scarce resources into an investigation that gets nowhere. Presumably the Boeing 737-Max was subjected to those criteria.

  17. Anonymous Coward
    Anonymous Coward

    Bearings. Brass bearings.

    In the apartment building where my parents lived, the bearings for the automatic garage doors were made out of PLASTIC. A steel gate, weighing at least 400/500 pounds, supported by FOUR PLASTIC DISCS.

    OF COURSE THEY FAILED every month. PTFE, Nylon, PVC, no matter, they were obviously crushed under the weight.

    So I suggested GOOD'OL BRASS for bearings/bushings, and a generous application of heavy axle grease. It could have been copper, aluminum, stainless steel, but brass tends to work better for this application.

    The thing operated flawlessly for 10 years before the cables were frayed.

    So yeah, keeping several hundred thousand dollars' worth of equipment, weighing almost as much as an overweight BOFH, on 4 plastic rollers is a certifiable "Galaxy Brain move™", and Google could get a hefty discount ordering them by the thousands in good'ol steel/metal, all over the world.

  18. Dr Gerard Bulger

    What's that on the second picture? I thought 240V kills Americans.

    Or have they at last woken up to the waste, and the waste of copper, at 110V?

    1. Anonymous Coward
      Anonymous Coward

      That label says "240VA." Which, really, doesn't make a lot of sense for a caution label, as 240VA (assuming PF=1 for simplicity) would mean 240W, which really isn't much power. If it were actually 240kVA, that would definitely warrant a caution sign, but is highly unlikely for a single rack. 24.0kVA would make sense for a high density rack, but still tells us nothing about the voltage being used (hint: it's probably 415V if they're drawing 24kVA, but hard to tell what Google's doing these days).
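
The figures above can be checked with a short calculation. This is a rough sketch: the function names are invented for illustration, and the balanced three-phase supply in the second function is an assumption of the example (the comment only guesses at 415V).

```python
import math


def real_power_w(apparent_power_va: float, power_factor: float = 1.0) -> float:
    """Real power in watts: P = S * PF."""
    return apparent_power_va * power_factor


def three_phase_line_current_a(apparent_power_va: float, line_voltage_v: float) -> float:
    """Line current for a balanced three-phase load: I = S / (sqrt(3) * V_line)."""
    return apparent_power_va / (math.sqrt(3) * line_voltage_v)


print(real_power_w(240))  # 240.0, i.e. 240 W at PF = 1
print(round(three_phase_line_current_a(24_000, 415), 1))  # 33.4 (amps per line)
```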

    2. Getmo

      We Americans already have 240v/230v coming into our homes to power large appliances, it's just that it gets split into 2 "legs" of 110v for everything else.

      Well, most homes. On the east coast where some houses are >100 years old, you might be lucky if the wiring includes a ground/earth wire.

  19. TheProf
    Headmaster

    Almost Literally

    I almost literally fell off the planet when I read that.

  20. Anonymous Coward
    Anonymous Coward

    As someone once said "There are a finite number of things that you can think of that could go wrong - out of an almost infinite number that could go wrong".

    Murphy's Law "Anything that can go wrong - will go wrong - at the worst possible moment".

    Donald Rumsfeld "...then there are the unknown unknowns".
