MAID to order: Inside Facebook's cold-storage data ziggurats

Facebook is storing old photos in special cold storage halls, Zucky ziggurats housing racks filled with MAID (massive arrays of idle drives) using erasure coding and anti-bitrot scanning to increase storage density and lower power costs while providing faster-than-tape access. Some two billion photos are shared every day on …

  1. Dave Harvey

    Perfect for medical image data

    PACS (Picture Archiving, Communication and Storage) systems need exactly this sort of storage for radiological images (especially CT scans - typically now 0.5 GByte per study), and if/when pathology whole slide imaging takes off, that could be up to 1 TByte per slide. All this needs a lot of access initially, but with rapidly decreasing usage over time, and a requirement to keep the data for X years (varies by jurisdiction), so migration to this sort of system would be perfect.

    So - who would like to sell these systems off the shelf?
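
The access pattern described above (busy at first, rarely read later, kept for a fixed number of years) can be sketched as a simple tiering rule. This is only an illustrative sketch: the `choose_tier` function, the 90-day hot window and the 8-year retention period are all hypothetical and not taken from any PACS product or regulation.

```python
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # fast, always-on storage
    COLD = "cold"        # spun-down / MAID-style storage
    EXPIRED = "expired"  # retention period over, eligible for review


def choose_tier(acquired: date, last_access: date, today: date,
                hot_days: int = 90, retention_years: int = 8) -> Tier:
    """Pick a storage tier for one study.

    hot_days and retention_years are assumed values for illustration;
    real retention rules vary by jurisdiction.
    """
    # Approximate the retention period as 365 days per year.
    retention_end = acquired + timedelta(days=365 * retention_years)
    if today >= retention_end:
        return Tier.EXPIRED
    # Recently accessed studies stay on fast storage.
    if today - last_access <= timedelta(days=hot_days):
        return Tier.HOT
    return Tier.COLD


if __name__ == "__main__":
    today = date(2020, 1, 1)
    print(choose_tier(date(2019, 12, 1), date(2019, 12, 20), today))
    print(choose_tier(date(2017, 1, 1), date(2017, 3, 1), today))
    print(choose_tier(date(2010, 1, 1), date(2019, 12, 31), today))
```

Checking retention before recency means an expired study is flagged even if someone looked at it yesterday, which seems the safer default for a sketch like this.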

    1. HPCJohn

      Re: Perfect for medical image data

      Dave,

      I worked on the original PACS at the Hammersmith, then went to work at Guys/Tommies (*)

      Why not use a tape robot?

      I suppose that comment is not just relevant to PACS storage - it is relevant to all long term storage requirements.

      I also speak as someone who managed an HSM system with a tape library tier for an F1 team.

      Have a look at the Spectra Logic BlackPearl - an SSD array, with an S3 interface.

      The SSD array gets full, and the data is pushed onto a Spectra tape library, using LTFS.

      I would say that would be a perfect match for a PACS system.

      (*) War story alert. The Hammersmith PACS was engineered by Loral, the original design was done for the US Veterans Administration. It had a 20 (?) Gigabyte RAID array which took up one huge rack. It was considered 'portable' as it could be slung under a helicopter.

      (**) Second war story. Early digital X-ray sets were deployed aboard US aircraft carriers. They ended up using them more for X-raying the landing gear of Tomcat fighters after heavy landings than for sailors' broken limbs.

    2. HPCJohn

      Re: Perfect for medical image data

      Dave, I should also say that I just installed 4 Petabytes of disk storage, in a high performance parallel filesystem, for a UK STFC project. So I'm not against using disk for huge amounts of storage either!

      You can buy MAID systems - Google is your friend!

    3. Desidero

      Re: Perfect for medical image data

      Disk Archive already sells the ALTO off-the-shelf for petabyte spun-down near-online storage (<10secs), focused mostly on video & large image. And yes, they avoid RAID.

      Not sure if they've addressed the, ahem... 1 Terabyte "cat" photos, which seem a bit too close for, say, comfort or propriety.

  2. Justin Pasher

    Not just Facebook

    Backblaze also created a home-grown solution for storing large quantities of data, also using Reed-Solomon rather than traditional hardware RAID. It has some pretty impressive scalability (in theory). Some of the concepts seem similar to Facebook's solution, although they didn't put much focus on reducing power consumption and the like.

    It's a very interesting read

    https://www.backblaze.com/blog/vault-cloud-storage-architecture/
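
To make the Reed-Solomon versus replication trade-off above concrete, here is a rough sketch that only does the arithmetic: raw bytes stored per byte of data, and how many lost shards a scheme survives. It does not implement Reed-Solomon itself. The `ErasureScheme` class is hypothetical, and the 10+4 and 17+3 shard counts are illustrative assumptions rather than figures taken from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureScheme:
    name: str
    data_shards: int
    parity_shards: int

    @property
    def total_shards(self) -> int:
        return self.data_shards + self.parity_shards

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return self.total_shards / self.data_shards

    @property
    def tolerated_losses(self) -> int:
        """A Reed-Solomon style (MDS) code survives the loss of any
        `parity_shards` shards; replication is the 1-data-shard case."""
        return self.parity_shards


# Shard counts below are assumptions chosen for illustration.
SCHEMES = [
    ErasureScheme("3x replication", 1, 2),
    ErasureScheme("RS 10+4 (illustrative)", 10, 4),
    ErasureScheme("RS 17+3 (illustrative)", 17, 3),
]


def summarize(schemes):
    """Return (name, overhead rounded to 3 places, tolerated losses)."""
    return [(s.name, round(s.overhead, 3), s.tolerated_losses) for s in schemes]


if __name__ == "__main__":
    for name, overhead, losses in summarize(SCHEMES):
        print(f"{name:<24} overhead {overhead:.3f}x, survives {losses} lost shards")
```

Under these assumed parameters, the erasure-coded layouts store well under half the raw bytes of triple replication while tolerating at least as many lost shards, which is the density argument the article and the linked post are making.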

  3. Anonymous Coward

    A Digital Garbage Dump

    Ooh, ahh, very high tech.. too bad the information being stored has no value.

    When are Facebook's investors going to ask real questions, like - how exactly are you planning to monetize this information, given that you're using our money to build out all this hardware to store it? Facebook users rarely go back to look at old pictures, even though they love living in the past - because the pictures suck and just remind them of how banal their lives are.

    This is, of course, the same problem their pals over at the NSA are facing - the Utah Data Center is just a garbage dump, the information is useless by the time it gets over there. In both cases, the 'customer' is not willing to pay for the 'service', so in the case of Facebook it is funded by clueless investors, and in the case of the NSA by clueless taxpayers.

    1. Mark 85 Silver badge

      Re: A Digital Garbage Dump

      Valid points. I'm perturbed by all this tech and money going to store... meaningless crap.

      I suppose FB and other "social" sites could change their TOS to something like: "If the picture isn't accessed in 2 years, it's gone. Same for posts."

      If they did try that, I'm sure there would be a lot of screams along the line of: "Oh.. wait... the only pictures of our dear departed kitty are on Facebook"

      1. This post has been deleted by its author

        1. Steve K Silver badge

          Re: A Digital Garbage Dump

          Genius!

          Have you had that one saved up just for this day?

          Steve

    2. This post has been deleted by its author

    3. This post has been deleted by its author

    4. Natalie Gritpants

      Re: A Digital Garbage Dump

      You need to think a bit more meta. Each data may be worth $0 and if you add them all up you still get $0 but their presence is not worth $0. It's the cost of doing business for Facebook. If they start deleting stuff they will be replaced by the next social media start-up that does not. I am glad they are figuring out how to minimize the power consumption, not for Facebook but for the human race.

  4. url

    very cool

    pun intended

  5. Anonymous Coward

    I'll Tell You What I Want...

    ...I wanna really really really wanna zucky ziggurat

  6. gizmo23

    Astronomy

    Seems to me this goes with the article about the kilometre array that's going to generate enormous amounts of data.

  7. amusedscientist

    Schrödinger's Pictures

    may, or may not, show a live cat, depending on when a cosmic ray hits the ziggurat, and/or when someone looks at them....

  8. burjoes

    failure rate

    I'm very curious about the failure rate of hard drives that spin up and down so often. In my experience, doing data center moves, for example, drives that are more than a couple of years old have a drastically higher failure rate when powered off then back on. I haven't seen an MTBF comparison of constantly-spinning drives vs those that are powered off and back on regularly - anyone seen such reports?

  9. ecarlseen

    Cute, but does it actually work?

    I'm pretty sure that I'm not the only person who's found that searching for old images on their account is so slow that they may as well just delete them. I've seen plenty of instances where refreshing an image for several minutes does nothing to bring it up - which, combined with Facebook's "endless scrolling" interface always wanting to return you to the top of the page, more or less kills the functionality.

    So it's really cool that they've constructed a technically elegant system, but if it doesn't work for end users then ultimately who cares? Unless they have some other purpose for hoarding this data (possibly).


Biting the hand that feeds IT © 1998–2020