Google File System II: Dawn of the Multiplying Master Nodes

As its custom-built file system strains under the weight of an online empire it was never designed to support, Google is brewing a replacement. Apparently, this overhaul of the Google File System is already under test as part of the "Caffeine" infrastructure the company announced earlier this week. In an interview with the …


This topic is closed for new posts.
  1. Bassey

    I'm glad I'm not

    I'm glad I'm not in charge of the project to move the whole of Google from GFS to GFS2. Good luck to whoever is. I hope they are wearing massively reinforced underwear when the time comes to press that button!

  2. MyHeadIsSpinning

    I predict that...

    One day Google will announce the invention of a dynamic file storage and distribution service along the lines of a p2p grid computing system.

    Google will allow people to download a free screensaver to tie in with Google Desktop and Google Documents etc. which will use individual users' computers to help with the load, at virtually no cost to Google (assuming they don't use a master data centre to process what the users' computers spit out).

  3. Richard 102


    I've done my share of migrations in my time, but that is one that I'd think twice about.

  4. Jay Jaffa
    IT Angle

    buy a bigger server

    Why don't they buy a bigger server? We've just moved from a 2 CPU server to a 4 CPU server and added some more disks - everything runs much faster and we've got loads more room.


  5. Lockwood

    Extra help

    "Hey Dave! I'm stuck getting this migration to finalise. Any ideas?"

    "Google it - oh wait."

  6. Anonymous Coward

    Forgive me if it's obvious but why re-invent the wheel?

    Why the hell did they not use Lustre? Or haggle a promotional discount (for advertising space) on something like QFS?

    Sure, other distributed filesystems are out there and free, so why code an in-house special? Or am I thinking Lustre is stronger than it is (or imagining GFS is not really that large)?

    HPC eats data space like crazy, so why not Lustre?

  7. Lockwood

    Best plan ever...


  8. Anonymous Coward
    Anonymous Coward

    @jay jaffa

    Irony overload!


  9. Gary Littlemore
    Thumb Up


    Look at Google Wave

  10. Anonymous Coward
    Thumb Down

    @anon coward

    "Forgive me if it's obvious but why re-invent the wheel?"

    Because - to use the same analogy - they want 21 inch low profile alloys, not steel rim 17 inchers.

  11. Anonymous Coward
    Paris Hilton

    google reinventing the wheel.......

    for undisclosed reasons..... lustre, qfs, zfs, gpfs (and ..... ) come to mind. a proper mix of those should easily do the job.

    they've been doing it on a terabyte scale for many years, after all.....

    paris, cuz she would have figured that a long time ago.....


  12. valen

    GFS2... truly awe-inspiring

    We (the guys in charge of moving all the data from GFS to its successor, and the day-to-day maintenance) had a good laugh in the office about the comment "wearing massively reinforced underwear". Sometimes it's better not to wear underwear when doing these sorts of upgrades...

    As for "other tools": Lustre was invented as a local network filesystem. GFS was invented to handle thousands of tasks all reading & writing as fast as they could, all day, every day. The indexing pipeline: download the internet, index it, run a few MapReduces over it to mark down spammy sites, crappy sites, duplicate sites, dead sites etc., and then compress it so it could be shipped all over the place. As Sean says in his interview, these days 'routine use' is dozens of petabytes of data that have to be randomly accessed - as in, the metadata has to stay in RAM.
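    The map/reduce shape valen describes - a pass over crawled pages that marks down spammy sites - can be sketched in a few lines of Python. This is a toy illustration only: the corpus, the spam-word list, and the function names are all made up for the example, and bear no relation to Google's actual pipeline.

    ```python
    from collections import defaultdict

    # Toy corpus standing in for crawled pages (hypothetical data).
    PAGES = {
        "http://a.example": "buy cheap pills buy now",
        "http://b.example": "distributed file systems at scale",
    }

    # Hypothetical spam vocabulary used for scoring.
    SPAM_WORDS = {"buy", "cheap", "pills"}

    def map_page(url, text):
        """Map phase: emit (url, 1) for each spammy token found."""
        for token in text.split():
            if token in SPAM_WORDS:
                yield url, 1

    def reduce_counts(pairs):
        """Reduce phase: sum the spam hits per URL."""
        totals = defaultdict(int)
        for url, count in pairs:
            totals[url] += count
        return dict(totals)

    pairs = [kv for url, text in PAGES.items() for kv in map_page(url, text)]
    print(reduce_counts(pairs))  # {'http://a.example': 4}
    ```

    The real system shards both phases across thousands of machines; the point here is only the two-phase structure of emit-then-aggregate.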

  13. jumblie

    Huge scale

    The scale of GFS is enormous, far beyond Lustre or QFS. Last I heard it was scaling to about 22,000 nodes.

  14. Neil Greatorex

    @ Boltar

    "Because - to use the same analogy - they want 21 inch low profile alloys, not steel rim 17 inchers."

    Nope, they want caterpillar tracks, but capable of 417.3 MPH :-)

  15. Anonymous Coward

    How much to develop from scratch?

    One of the other posters mentions commercial products - I don't know about the others, but the bloke who says GFS's scale is way beyond what's available is wrong. The PERCS project, paid for by the good ole US government, is scaling GPFS to...

    # 1 trillion files in a single file system

    # 32,000 file creates per second

    # 10,000 metadata operations per second

    # 6TB/s throughput

    # 100s of PBs of data

    # add more scary stats...

    All in a single filesystem by 2010 (not so far away)... Why not work with IBM? I'm sure they'd be so desperate to get some chocolate-factory kudos it would be practically given away :-D

    Since it's designed for HPC, it differs from the demands of MapReduce, but it would support any amount of real-time processing they fancy in the future.

  16. Munchausen's proxy

    Get off my lawn

    "Still, trying to build an interactive database on top of a file system that was designed from the start to support more batch-oriented operations has certainly proved to be a pain point."

    DEC was apparently 30 years ahead of its time, providing Datatrieve as a pain point.

    Mine's the one with the wombat in the pocket.

  17. Anonymous Coward
    Anonymous Coward

    RE: How much to develop from scratch?

    Willy Waving...

  18. David 81

    "So Google was building for the short term."

    No, they made a decision to simplify the design in order to deliver faster. When you design a new system, you will not be able to put everything you want into the first iteration.

