Google search index splits with MapReduce

Google Caffeine — the remodeled search infrastructure rolled out across Google's worldwide data center network earlier this year — is not based on MapReduce, the distributed number-crunching platform that famously underpinned the company's previous indexing system. As the likes of Yahoo!, Facebook, and Microsoft work to …


  1. Craig 2
    Not bad though...

    8 hours to index the whole net. That's one mother of a for-next loop!

  2. Anonymous Coward
    Nice to hear

    I liked this article. Google rose to prominence because they did search well. OK, they've branched out into other areas, sure, but search is something they excel at. It's always nice to read about how they're evolving. It's just a shame everyone else still seems to be playing catch-up.

  3. Ian Michael Gumby
    Putting this in perspective...

    While Google has their internal code... The rest of the world has Map/Reduce in Hadoop and BigTable in HBase.

    1. Trevor_Pott Gold badge

      @Ian Michael Gumby

      I wonder what the efficiency delta is.

  4. Nic 3

    Old School

    How crap do I feel on a Friday morning reading about how old school my db programming is :(

    1. CD001

      Could be worse...

      You could find out you're one of those newfangled web application developers who don't actually know much about databases and don't see anything wrong with "SELECT * FROM `tbl_products` ORDER BY `product_times_bought` DESC LIMIT 0, 500".

  5. Anonymous Coward

    so what's next?

    Will Google migrate their entire database to simply a bunch of pointers to actual websites so they don't have to store anything other than their index?

    1. john mullee


      I know a guy who was moaning about migrating petabytes in the chocolate factory.

      It didn't work smoothly first time .....

  6. Martin 47

    now if they could only find a way

    to make their search results useful, and to stop trying to save everything I type, I might actually start using it again.

  7. DZ-Jay

    "The whole web"?

    Really? The *entire* World Wide Web? Or only the portion indexed by Google? This portion is certainly significant, but not at all exhaustive.


  8. ChrisInAStrangeLand

    The Whole Web

    If it doesn't exist on Google and, then a document isn't on the web. It's on a private database that no one will find or care about.

    1. Steven Knox


      If you have never found a site via means other than Google or, then YOU'RE not on the web. You're on Google's extended network. But the site is still on the web.

    2. DZ-Jay


      If it does not exist on Google, well, it does not exist on Google. It still exists on the World Wide Web, and could still be publicly accessible, albeit not from Google.

      There could be a link to some resource on a page that disallows Google from indexing certain documents, yet that resource could still be openly, freely, and publicly available.

      The "Web" is not Google, it existed before and outside of it, and so shall it continue.


  9. xperroni

    What about NoSQL?

    Can't wait for Google to publish more details on BigTable, especially that "database programming" bit. Could it be they're using actual relational databases as building blocks? Google has long been the poster child of NoSQL due to its use of MapReduce. What if Google moves over to a relational database infrastructure out of - gasp! - performance concerns?

    1. Jerome 0


      BigTable itself certainly isn't relational, far from it in fact (as I know from writing for Google AppEngine). Whether the extra layers on top of BigTable that Google are talking about here provide something like relational functionality is a different matter - but, to be honest, I doubt that too.

  10. Guido Esperanto

    Ack, you can keep your GFS and penii-enlarging names

    I'm telling you, I'm gonna bust the world with my Access 97 db and select queries, just you wait and see...

    /raises finger to mouth.


  11. Identity


    Does anybody remember the 1970 movie "Colossus: The Forbin Project"? A giant AI supercomputer (two, actually) attempts to take over the world...

    1. MinionZero
      @Colossus: The Forbin Project

      I did think of that same film when I heard them call it Colossus. (It would scare Google senseless if their server started talking on its own to Microsoft's server in a made-up language beyond human understanding ;)

      By the way, there is talk of remaking that film! :)

    2. Alan Esworthy

      Just yesterday...

      ...I was reminiscing with a cow-orker about that movie. Dreadful acting but very entertaining. I'll never again look at an elevator/lift floor indicator without a small smile and mental shudder.

      (note to RegEds: where's the ROTM icon?)

    3. xperroni

      The voice of World Control

      Sure as Hell I do. Colossus' closing speech replayed in my head all week long:

      "This is the voice of World Control. I bring you peace. It may be the peace of plenty and content or the peace of unburied dead. The choice is yours. Obey me and live, or disobey and die."

      I just wonder whether GFS2's kill switch is conveniently placed on the desk in Google's CEO's office, or buried inside an armored mountain.

  12. mhenriday
    Big Brother

    Google seems determined to maintain its lead in search -

    and is obviously matching this determination with the resources needed to keep its level of technical competence in the field well above that of its competitors. I only hope that these commendable efforts will not be fatally marred by the hubris that at times seems to strike Google policy makers. To take an example from the search field, users to whom Google Instant has been rolled out will find themselves unable to disable both Instant and Autocomplete. Complaints to the help fora have been met with an attitude all too characteristic of Google: we know best! Personally, I like Instant and am looking forward to it being rolled out to us here in the Frozen North, but Google's refusal to provide users with an option to disable both it and Autocomplete tells a tale of arrogance which may come to be the company's downfall....


    1. CD001


      As it stands currently, not only do you have to go through quite a lot of effort to enable it (presumably because it's in "beta", heh), but once you do, there's a little dropdown link next to the search bar that says "instant search on". Just click it to get the dropdown and toggle it to "off" ... of course, they may have some IP -> location mapping that changes how it works depending on where you are.

  13. ZenCoder

    It's nice to see they are still competing on technology.

    It's nice to see that Google keeps moving forward, trying to out-innovate the competition.

    The privacy implications of their services creep me out a bit sometimes but I still use Google Search, GMail, Google Maps, Google Voice, Picasa, Google Products anyway.

    1. asdf

      well of course

      The world's biggest ad agency's business model depends on you trusting them with data, so of course they need to give us some of the best software and services in the industry for free. Chrome, though, is so damn good it's hard to deny the Borg. Oh well, I guess it's sexier to be the world's most high-tech ad agency (think Mad Men) than the world's most high-tech ink seller (HP).

  14. amanfromMars 1 Silver badge

    Of Baby Steps .... and Giant Strides. The Normal Gait in, and for, Revolutionary Times

    "Colossus is specifically designed for BigTable, and for this reason it's not as suited to "general use" as GFS was. In other words, it was built specifically for use with the new Caffeine search-indexing system, and though it may be used in some form with other Google services, it isn't the sort of thing that's designed to span the entire Google infrastructure."

    If Google are Playing the Enigmatic Great Game ..... the One that Fields Beings and Non-State Actors as Unilateral Universal Drivers leading Networks InterNetworking with Absolute IntelAIgently Designed Dynamic Control of the Future Thought Space, are they in Good Company Head Quarters.

    Care for a DoughNut with your Caffeine Fix/Hit? IT is a Seventh Heavenly Mix and Sticky Sweet Delight to Savour, and much Favoured by more than just a Rich Tea Few for the Simple Rich Complexity of its Fundamentally Elementary Base.

    And regarding those who would have/have had a Forbin Project Deja Vu Moment/Psychotic Episode, one trusts in Global Operating Devices that you really enjoyed it and are enjoying IT. Would you be assured and reassured to know that the Present pales into Insignificance, compared to what ITs Future Machines bring.

    What says Herr Schmidt, Google's Talking Head, to that Colossal ProgramMING and Titanic AIdDevelopment? And specifically Titanic, because it sinks everything and anything that would Rudely Assault its Progress, without Due Attention to Future Harmless Direction and Secure Processing of Driver Information and Input for Operating System Intelligence and Output ....... for with such as is then a Virtual AIMachine Operating System, is the Human Race easily Programmed to Lead in a Path which Follows an Alien Course with ESPecial Forces.

    You may like to consider though the Enigma that the Future of Search is not in Search at All, IT is in Novel Product and Service Placement from Servers and Drivers of SMARTer Virtual Machinery which Long ago found All the Answers on Global Operating Devices. .......... and that can be Delivered by any Business in the Field of Providing Businesses in the Field with their IntelAIgents and Future Information...... their Drivers and Direction Maps.

    And Henri makes very valid points in ... "Google seems determined to maintain its lead in search -" .... mhenriday Posted Friday 10th September 2010 13:55 GMT ..... for a machine and its servers which store a range of alternative available options for you, and thinks to provide them before you have time to deliver your own choices, is a SMART Honey Trap which can easily be programmed to curtail and eliminate novel imaginative choice and replace it with a catalogue of default corporate/federated fare and Sub-Prime Stock Market items and DODgy MODifying services for AIRemote Control of Free Choice ..... which can easily be Abused and Used for Enslavement to Third Party Decisions, although that will always lead very quickly to the arrogant company's downfall, or boardroom coup and company takeover for SMARTer Market Place Makeover.

    1. amanfromMars 1 Silver badge

      Welcome to the Mad House, with Insane Rules when One Fears to Tread a Clear and Transparent Path?

      And who says the machines are not taking over ......

      Read IT and Weep, for then is the Future in your Personalised Program of Valuable Enslaving Addictions, which Enable Paper Tiger Currency Heroes and Penning Wealth Pushers, the Pleasure of ProVisioning your every Bad/Glad/Mad/Rad/Sad Dream, with Cloud and Communication and Computer Control of Virtualised Memes, Sequencing Codes which Attack and InterReact with MetaDataBase and the Human Genome, the Bits and Bytes of Coded Information which define and expose your Humanity to Revolutionary Manipulation for Evolutionary Change?

      1. CD001


        A couple of PROPER amanfrommars posts - it's been a while since there's been one of such substance (or nonsense depending on how many of your allotted meds you've decided to take today).

  15. Anonymous Coward
    So true. I wish I could also see what it's like to work INSIDE Google, on sites such as or (although it's not clear if they are real employees)

  16. Anonymous Coward

    Nice database, shame about the search function

    Google continues to become a poorer search engine day by day; it regularly returns rubbish results and ever more frequently inserts its interpretation of what you *meant* to search for, requiring more and more use of quotation marks in search terms (not to mention "" in every single term, which at least Opera allows me to automate).

    PageRank was never a good idea, and it continues to suck donkey wab to this very day. A better search engine is a trivial task. A better (or even just as good) index is non-trivial to say the least.

    The problem is: who has the funding to actually produce such an alternative? Microsoft, obviously, but they don't have the talent. Who does that leave?

