What did OVH learn from 24-hour outage? Water and servers do not mix

An external water-cooling leak crashed a Dell EMC VNX array at an OVH data centre in Paris and put more than 50,000 websites out of action for 24 hours. OVH is the world's third largest internet hosting company with 260,000 servers in 20 data centres in 17 countries hosting some 18 million web applications. The failure took …

  1. W4YBO

    Pure water isn't so bad...

    Pure water isn't so bad for electronics. It won't conduct electricity at all. I recommend a distilled water dunking for equipment that's been exposed to soda, fruit juice, or tea. I've even washed mouse pee off a motherboard, and rescued an ocean-water-dipped cellphone. Both lived.

    The problem with water-cooling occurs when you mix conductive pump lubricant and antifreeze with the water coolant.

    1. Prst. V.Jeltz Silver badge

      Re: Pure water isn't so bad...

      so if i got an old motherboard and dipped it in a fish tank ( not the PSU i guess, I'll just dangle it in via the low voltage side wires) what would happen?

      Can i put a fish in?

      1. Lee D Silver badge

        Re: Pure water isn't so bad...

        1) That's not pure water.

        2) If it was pure water, the fish would die.

        3) The fish dying would make it not-pure water.

        1. Prst. V.Jeltz Silver badge

          Re: Pure water isn't so bad...

          "The fish dying would make it not-pure water."

          Ah, so it would then support fish 2.0?

          How many ohms are we looking at to not conduct?

          1. Anonymous Coward

            Re: Pure water isn't so bad...

            How many ohms are we looking at to not conduct?

            I would not worry too much at exactly how much conductivity you get, instead worry about the fact that you also get a fun degree of electrolysis taking place which over time creates its own entertainment.

            Oh, and may again not be very conducive to fish occupancy :).

      2. Anonymous Coward

        Re: Pure water isn't so bad...

        so if i got an old motherboard and dipped it in a fish tank ( not the PSU i guess, I'll just dangle it in via the low voltage side wires) what would happen?

        You'd get fish and chips.

    2. Aitor 1

      Re: Pure water isn't so bad...

      pure water will cause corrosion.

      use car coolant liquid.

    3. Steven Jones

      Re: Pure water isn't so bad...

      Pure water may not be a good conductor, but it is not an insulator either. Even the purest water will still conduct to some extent, as there is a dynamic equilibrium due to H+/OH- dissociation, and it has to have dissolved monatomic (that is, inert) gases to be optimal. Degassed pure water is much more conductive and (more practically) ultra-pure water exposed to the atmosphere is worse again, mainly due to being exposed to CO2.
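As a rough check on the dissociation point, and on the earlier "how many ohms" question, the conductivity of ideally pure water can be estimated from the two ions it produces by self-ionisation. This is only a sketch: the molar conductivities and the 1e-7 mol/L ion concentration are approximate textbook values for 25 °C, not figures from the thread, and the function names are made up for illustration. It ignores dissolved gases and contaminants entirely.

```python
# Rough estimate of the conductivity of ideally pure water at 25 C,
# arising only from self-ionisation (H2O <-> H+ + OH-).
# All constants are approximate textbook values (assumptions, not from the thread).

LAMBDA_H = 349.8   # limiting molar conductivity of H+, S*cm^2/mol
LAMBDA_OH = 198.3  # limiting molar conductivity of OH-, S*cm^2/mol
ION_CONC_MOL_PER_L = 1.0e-7  # [H+] = [OH-] at pH 7


def pure_water_conductivity_s_per_cm() -> float:
    """Return the conductivity in S/cm from the two ion contributions."""
    conc_mol_per_cm3 = ION_CONC_MOL_PER_L / 1000.0  # 1 L = 1000 cm^3
    return (LAMBDA_H + LAMBDA_OH) * conc_mol_per_cm3


def resistivity_mohm_cm(conductivity_s_per_cm: float) -> float:
    """Convert a conductivity in S/cm to a resistivity in MOhm*cm."""
    return 1.0 / conductivity_s_per_cm / 1.0e6


if __name__ == "__main__":
    kappa = pure_water_conductivity_s_per_cm()
    print(f"conductivity ~ {kappa * 1e6:.4f} uS/cm")
    print(f"resistivity  ~ {resistivity_mohm_cm(kappa):.1f} MOhm*cm")
```

Under these assumptions the estimate comes out at roughly 18 MOhm*cm: a very poor conductor, but a finite one, and any dissolved CO2 or contamination from the leak path lowers the resistivity further.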

      Of course your ultra-pure de-gassed water isn't likely to remain that pure either as it picks up contaminants once it starts leaking.

      There is a reason that some old Cray supercomputers used freon to cool their circuit boards and not water.

      1. W4YBO

        Re: Pure water isn't so bad...

        "Pure water may not be a good conductor, but it is not an insulator either. "

        Klystron tubes in most high power UHF+ transmitters/amplifiers are distilled water or steam cooled, with anode voltages of 30,000 volts and up. So, it's close enough.

  2. Mark 110

    Not that bad

    Considering they had no warm redundancy for the array and had to go to backup I think 24 hours is astonishingly impressive. They did have an 'in principle' plan which had been tested and it worked.

    So I agree that in an ideal world they would have just flipped (hopefully automatically) to recovery infrastructure but that means you have to charge your customers double infrastructure costs.

  3. A Non e-mouse Silver badge

    I think they deserve credit for being so open about the incident.

    1. Prst. V.Jeltz Silver badge

      Well, "we spilled some water on it, but backup worked eventually" sounds a whole lot better than "we got pwned" and possibly lost / compromised data.

  4. wolfetone Silver badge

    Ah, if only 123-Reg followed OVH's lead in being so open about their fuck ups.

  5. Anonymous Coward

    Missed one.

    5. Use non-conductive coolants. Expensive, but not as expensive as trashing a shit load of servers, storage and reputation.

    1. Voland's right hand Silver badge

      Re: Missed one.

      Use non-conductive coolants.

      Most of the non-conductive coolants are either flammable, or toxic, or have special pipework requirements.

  6. Anonymous Coward

    Ah, that explains it!

    I was wondering why the number of hack attacks on customers' servers originating from OVH dropped so much.

    1. Anonymous Coward

      Re: Ah, that explains it!

      French spam probably decreased also....

    2. Alister

      Re: Ah, that explains it!

      I was wondering why the number of hack attacks on customers' servers originating from OVH dropped so much.

      Ha! I was going to post exactly the same. I noticed a reduction in the size of the firewall logs today; that would explain it.

      We gave in and black-holed a whole range of IP blocks belonging to OVH some time ago, as all we seemed to get from them was dictionary attacks on our mail servers.

  7. Hans Neeson-Bumpsadese Silver badge

    The system from Roubaix arrived at 4.30am with all the failed system's disks moved over by 6am. The system was fired up at 7am but, disaster, the data on the disks was still inaccessible. Dell EMC support was recontacted at 8am and an on-site visit arranged.

    Interesting that support was recontacted. I would have thought that the procedure would be to check that data was accessible before leaving the scene, and so not need to be contacted again.

    1. Anonymous Coward

      Rumour has it that the original support/maintenance contract had expired and, as is EMC's way, an expensive PO would need to be drawn up before work could continue.

  8. Old Used Programmer

    That fast?

    Only down 24 hours due to a water leak? Not bad. In a shop I worked in in the 1980s, we had a pipe joint failure in the water distribution unit for a water-cooled IBM mainframe. We were down for 4 days. Contributing factors were:

    Draining a 10,000 gallon storage tank (data center on the 14th floor, storage tank on the roof of the 45-story building) in part because no one had told the machine operators where the shutoff valves were (under the false floor).

    The water got into the main power duct and blew out a two-story high main bus bar for half the building. (We were in San Francisco, the replacement was air freighted in from Chicago.)

    The ultimate cause was a manufacturing defect in the water distribution unit.

  9. DagD

    was wondering why we had a downtick in hits from the outside

    OVH Down = bad day for malware dispensers.

    oh well...

  10. sgp

    Once

    I tried to register a domain with them once. The purchasing wizard failed every time on a page which tried to flog extra services. Makes you wonder about their service quality if you can't even buy something from them....

    1. Hollerithevo

      Re: Once

      A web design agency stuck me on OVH for a while. Migrated off at first opportunity.

  11. Anonymous Coward

    Heat exchanger on TOP of rack?

    Ugh, that's the part most likely to leak! Why not put it on the bottom? Assuming it is a raised floor you can move air at much higher velocities than you would where people go, and if a leak develops it won't hurt anything (assuming cabling is run in trays above the racks) and water sensors here and there on the floor can let you know.

    1. Mark 110

      Re: Heat exchanger on TOP of rack?

      It was in the basement I think the article said. Agree in principle.

  12. Lorribot

    Two things of note here. First, VNX support costs in years four and five generally equate to the price of a new storage system, so it would not surprise me if the array was not under a support agreement, as it was a 2012 purchase and so 5 years old.

    The other thing is, be nice to your tech guys as when the shit really hits the fan they are the only ones that can dig you out of that collapsed mineshaft your management decisions have led the company down.

    1. Anonymous South African Coward Bronze badge

      "The other thing is, be nice to your tech guys as when the shit really hits the fan they are the only ones that can dig you out of that collapsed mineshaft your management decisions have led the company down."

      Agreed and agreed. But most companies don't realize this fact until it is too late.

  13. JakeMS

    A lot of people hate OVH but...

    I know a lot of people hate OVH, most noticeably those on a certain forum talking about web hosting.

    The thing is, people hate them for their lack of "support". But honestly, I've been using them since 2013 for my public business servers, and they are a pretty decent host. Yes, you have to manage the servers yourself, but OVH are quick to replace faulty hardware or provision new servers for you. You get full KVM access (on newer servers) and can re-install your OS when and how you want, and you can also force-reboot your server if required (power reset). Oh, and don't forget the attempt at DDoS protection.

    The thing is, judged strictly by their own policy, they are spot on for what they do. They keep to the SLAs as best as possible (I think I've only had about 5 minutes' outage in all the time I've been with them) and provide real server hardware for very cheap prices (hence all the hate from other providers...).

    24-hours to fix a leak like this and restore a backup and have it up and going? That is a decent turn-around. Usually when something like this happens you could be waiting up to a week.

    I mean, yup it sucks it was offline for 24-hours for those customers, but liquid leaking onto hardware is a decent reason for why it should be powered off.

    Honestly, I just prefer to stick with OVH now; they've always been reliable for me. Which is more than I can say about 1&1, with whom I tried a server previously.

  14. jpb4516bs

    Business Continuity Issue, Not a DR Issue

    Hindsight is truly 20-20. The placement of the storage in close proximity to the water pipes should not have happened. The lack of a DR plan was a mistake that is unlikely to be repeated. But IMHO, shouldn't OVH be a high availability infrastructure and have had business continuity in place? There are several products out there that virtualize storage and would have provided a mirrored copy of the environment, immediately accessible without an outage. That would have turned the disaster into an inconvenience. Look at IBM SVC or EMC VPLEX.
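To put the 24-hour figure next to the high-availability argument, a quick back-of-the-envelope conversion between downtime and yearly availability may help. This is a minimal sketch: the function names are invented for illustration, and the 99.9% and 99.99% targets are generic examples, not OVH's actual SLA terms.

```python
# Convert between yearly downtime and availability percentage.
# The targets used below are illustrative, not OVH's published SLA.

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability over the period, given total downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def allowed_downtime_hours(target_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget in hours for a given availability target."""
    return period_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    print(f"24 h outage in a year: {availability_percent(24):.3f}% available")
    for target in (99.9, 99.99):
        budget = allowed_downtime_hours(target)
        print(f"{target}% target allows {budget:.2f} h of downtime per year")
```

On these numbers a single 24-hour outage leaves roughly 99.7% availability for the year, which falls short of either example target. That gap is the difference between recovering from backup and failing over to a mirrored copy.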

  15. Steve Harrington

    What if it could not leak water out, only air in?

    If they were using Chilldyne Negative pressure liquid cooling, a leak would have been a minor maintenance issue, not a disaster. Look up "Chilldyne cut the line video" for what happens when you cut a cooling line with our system.
