'No BS' web host Gandi emits outage postmortem, has 'only theories' on what went wrong

Hosting outfit Gandi has published its postmortem regarding this month's outage and concluded that while it still has "no clear explanation", the main problem was "the duration". So that's OK then. The mystery incident took down a storage unit in the company's Luxembourg facility at 14:51 UTC on 8 January. It wasn't until 13 …

  1. Still Water

    Chalk one up...

    ... to cosmic rays.... :-/

    1. Michael H.F. Wilkinson Silver badge

      Re: Chalk one up...

      Good call. Interference from sunspots won't fly that well, given the current solar minimum. I also rather like "routing problems on the neural net".

    2. theblackhand

      Re: Chalk one up...

      More likely /dev/null filled up and they weren't monitoring it

  2. Tom Paine

    For whom the bells toll

    those pointing smugly at their own storage and hosting setups would do well to take a careful look at Gandi's experience.

    Very well said. Hardware and software mostly do a very good job of abstracting and hiding the complexity behind the scenes, but who among us can honestly say they've never gone down a diagnostic rabbit hole and quickly come up against vast swathes of components, libs, interfaces, interactions, config options and mechanisms, such that Google can only show you how much you don't know? Who here has complete docs for every aspect of the systems or apps they're responsible for? (Yes that includes you Devs!)

    1. Anonymous Coward

      Re: For whom the bells toll

      Oh come now, it's far worse than that...

      The joys of using Java + Spring via Maven, and then... start adding some JavaScript library/framework du jour with some Node build tool, which routes through a Redis instance or five, and then slap it into Docker and run from AWS.

      Always feels like each and every part of the whole house of cards is usually on fire if not smoldering at the very least... Fun.

      Welcome to the modern world of serious websites with a hell of a legacy behind them.

      Anonymous because I'm not getting drawn into just how bad this can get.

  3. IGotOut Silver badge

    Well it's obvious what went wrong...

    They were still using blockchain technology to run the backups, when we all know it should be AI now.

    1. katrinab Silver badge

      Re: Well it's obvious what went wrong...

      ZFS literally is blockchain technology. It is, after all, based on Merkle trees. It is good, but not that good.
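There is a kernel of truth here: ZFS checksums every block and stores the checksums in parent blocks, forming a Merkle-style tree of hashes up to the uberblock, which is the same structure blockchains borrow. A minimal illustrative sketch of the idea (not ZFS's actual code; the block contents are made up):

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the Merkle root of a list of data blocks.

    Each block is hashed, then pairs of hashes are hashed together
    level by level until a single root remains -- the same principle
    ZFS uses to detect corruption anywhere in the tree."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last hash
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Changing any single block changes the root, which is how
# silent corruption gets caught on read.
root_a = merkle_root([b"block0", b"block1", b"block2"])
root_b = merkle_root([b"block0", b"BLOCK1", b"block2"])
assert root_a != root_b
```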

      1. Anonymous Coward

        Re: Well it's obvious what went wrong...

        IIRC dedupe carries a certain risk of hash "collisions". Most systems do (but I assume the maths puts those collisions where it's near impossible a user/computer would want that type of data, like 100% zeros, or the Star Wars prequel trilogy).

        ZFS is probably the same?

        1. dinsdale54

          Re: Well it's obvious what went wrong...

          ZFS allows you to just trust the hashes or do a full block verification. If you are using SHA-256, the chances of a hash collision are very, very, very small. According to the author, 50 times less likely than an undetected and uncorrected ECC memory error.

          I'd more likely suspect multiple disk failures after a power outage forcing recovery from backup when the procedure hasn't been tested recently.
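The odds dinsdale54 cites are in line with the usual back-of-envelope birthday bound: for n random b-bit hashes, the chance of any collision is roughly n(n-1)/2 divided by 2^b. A hedged sketch (the block count below is a hypothetical example, not anything from Gandi's setup):

```python
def collision_probability(n_blocks: int, hash_bits: int = 256) -> float:
    """Birthday-bound estimate of the chance that any two of
    n_blocks uniformly random hashes collide: n(n-1)/2 / 2^bits.

    This is an upper-bound approximation that assumes the hash
    output is uniformly distributed."""
    return n_blocks * (n_blocks - 1) // 2 / 2.0 ** hash_bits

# A petabyte of 128 KiB blocks is on the order of 2^33 blocks;
# even then the collision probability is astronomically small.
p = collision_probability(2 ** 33)
print(p)
```

This is why hash-only dedupe is generally considered safe with a cryptographic checksum, though ZFS still offers full-block verification for the truly paranoid.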

  4. katrinab Silver badge

    ZFS is not backup

    Nothing wrong with ZFS, and nothing wrong with updating it to the latest version. But it is not backup, and they need to implement a backup procedure as well.

    1. Anonymous Coward

      Re: ZFS is not backup

      Most blog/website post-mortems involving ZFS are pretty good at showing where things went wrong. "The user" is generally the answer. ;)
