NEWS: Google’s infrastructure

[via Web Rank Info – in French]

Web Rank Info, one of the most interesting and active French websites and forums specializing in search engines (Google, Yahoo!, etc.), has published a summary of a presentation by Jeff Dean, a Google engineer who shared some insights into Google’s infrastructure during a colloquium at the University of Washington. You can also watch the full video of the presentation (about 1 hour).

  • Across the 4 billion pages indexed by Google, the average page size is 10 KB.
  • That said, the index may well hold more than 4 billion pages. The Google query that seems to return the highest result count (a search for the word “the”) reports more than 6 billion today!
  • Based on the estimate of 4 billion pages at 10 KB each, Google has to store and fully index an incredible volume of raw data: about 40 TB.
  • About costs: Google chose to use a large number of low-end, low-cost servers instead of a few costly high-end servers. An example of how the business case works out:
    • High-end server: IBM eServer xSeries 440
    • 8 Xeon 2 GHz processors
    • 65 GB RAM
    • 8 TB HD
    • Estimated price: 758’000$

    • Rack of 88 smaller servers
    • 176 Xeon 2 GHz processors (88 x 2)
    • 176 GB RAM (88 x 2)
    • 7 TB HD
    • Estimated price: 278’000$

You got it? The high-end server costs about 2.7 times as much as the rack, yet the rack delivers far more computing power (22 times as many processors), provided the software architecture is able to use this distributed infrastructure correctly.
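The arithmetic behind these figures is easy to check. The following is a minimal Python sketch that only reuses the numbers quoted above; the variable and function names are our own, and decimal units (1 TB = 10^9 KB) are an assumption, since the summary does not say which convention was used.

```python
# Figures as quoted in this post; all names below are illustrative.
PAGES = 4_000_000_000   # indexed pages
AVG_PAGE_KB = 10        # average page size in KB (decimal units assumed)

HIGH_END = {"cpus": 8, "ram_gb": 65, "disk_tb": 8, "price_usd": 758_000}
RACK = {"cpus": 176, "ram_gb": 176, "disk_tb": 7, "price_usd": 278_000}


def corpus_size_tb(pages: int, avg_kb: int) -> float:
    """Raw corpus size in TB, assuming 1 TB = 10**9 KB."""
    return pages * avg_kb / 10**9


def price_ratio(a: dict, b: dict) -> float:
    """How many times more expensive configuration a is than b."""
    return a["price_usd"] / b["price_usd"]


def price_per_cpu(config: dict) -> float:
    """Purchase price divided by the number of processors."""
    return config["price_usd"] / config["cpus"]


if __name__ == "__main__":
    print(f"raw corpus size: {corpus_size_tb(PAGES, AVG_PAGE_KB):.0f} TB")
    print(f"price ratio (high-end / rack): {price_ratio(HIGH_END, RACK):.1f}")
    print(f"CPU ratio (rack / high-end): {RACK['cpus'] / HIGH_END['cpus']:.0f}")
    for name, config in (("high-end", HIGH_END), ("rack", RACK)):
        print(f"{name}: {price_per_cpu(config):,.0f} USD per processor")
```

Price per processor is only a rough proxy for delivered power: it ignores disk, memory, power consumption and the engineering effort needed to make software run well across 88 machines.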

  • Response time is very important for Google: it should not exceed 0.5s. For this reason, Google has to deploy servers all around the world, in order to be “nearer” to its users.
  • On average, each search request uses about 1’000 servers and gets its answer in 0.25s.
  • Google handles on average 250 million search requests per day.
  • Really impressive! And a good confirmation of the competitive advantage Google has built with its infrastructure and platform (see my article about Google’s platform).
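To put the traffic figures in perspective, here is a small Python sketch that converts them into per-second averages. It uses only the numbers quoted above; the names are our own, and the results are daily averages, so real peak load would be higher. The number of requests in flight is estimated with Little's law (average concurrency = arrival rate x average time in the system), which is our choice of model and not something taken from the presentation.

```python
# Traffic figures as quoted in this post; all names are illustrative.
SECONDS_PER_DAY = 24 * 60 * 60
QUERIES_PER_DAY = 250_000_000
SERVERS_PER_QUERY = 1_000
AVG_RESPONSE_S = 0.25


def average_qps(queries_per_day: int) -> float:
    """Average number of search requests per second over a day."""
    return queries_per_day / SECONDS_PER_DAY


def requests_in_flight(qps: float, response_s: float) -> float:
    """Little's law: average number of requests being processed at once."""
    return qps * response_s


def server_touches_per_second(qps: float, servers_per_query: int) -> float:
    """Average number of (request, server) interactions per second."""
    return qps * servers_per_query


if __name__ == "__main__":
    qps = average_qps(QUERIES_PER_DAY)
    print(f"average load: {qps:,.0f} requests/s")
    print(f"in flight: {requests_in_flight(qps, AVG_RESPONSE_S):,.0f} requests")
    touches = server_touches_per_second(qps, SERVERS_PER_QUERY)
    print(f"server interactions: {touches:,.0f} per second")
```

Even as a daily average, this works out to nearly 2’900 requests per second, each fanned out to about 1’000 machines, which gives a feel for why the distributed architecture matters so much.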
