[via Web Rank Info – in French]
Web Rank Info, one of the most interesting and active French websites/forums specializing in search engines (Google, Yahoo!, etc.), published a summary of a presentation by Jeff Dean, a Google engineer who gave some insights into Google’s infrastructure during a colloquium at the University of Washington. You can also watch the entire video presentation (about 1 hour).
- Of the roughly 4 billion pages indexed by Google, the average page size is about 10 KB
- There may in fact be more than 4 billion indexed pages: the Google query that seems to return the highest result count (a search for the word “the”) reports more than 6 billion results today!
- Based on the 4 billion page estimate and 10 KB per page, Google has to manage and index an incredible volume of raw data, about 40 TB, all of it fully indexed (a quick sanity check follows below)
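A back-of-the-envelope check of that figure, as a minimal Python sketch (the 4 billion and 10 KB values are the ones quoted above):

```python
# Sanity check: 4 billion pages at ~10 KB of raw text each.
pages = 4_000_000_000
avg_page_size_kb = 10

total_kb = pages * avg_page_size_kb
total_tb = total_kb / (1024 ** 3)  # KB -> TB in binary units

print(f"~{total_tb:.0f} TB of raw data")  # ~37 TB, i.e. roughly the 40 TB quoted
```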
- On costs: Google chose low-cost hardware, i.e. a large number of low-end servers instead of a few expensive high-end servers. An example of how the business case works out:
  - IBM eServer xSeries 440 server:
    - 8 Xeon 2 GHz processors
    - 64 GB RAM
    - 8 TB of disk
    - Estimated price: $758,000
  - Rack of 88 dual-processor commodity servers:
    - 176 Xeon 2 GHz processors (88 × 2)
    - 176 GB RAM (88 × 2 GB)
    - 7 TB of disk
    - Estimated price: $278,000
You got it? There is a factor of 2.7 between the two estimates, with a huge difference in delivered power, provided the software architecture is able to use this distributed infrastructure correctly.
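To make the ratios concrete, here is a minimal Python sketch with the presentation’s estimates plugged in (the variable names are mine, purely for illustration):

```python
# The presentation's hardware estimates (prices in USD).
high_end = {"cpus": 8, "ram_gb": 64, "disk_tb": 8, "price": 758_000}
rack     = {"cpus": 88 * 2, "ram_gb": 88 * 2, "disk_tb": 7, "price": 278_000}

print(f"Price factor: {high_end['price'] / rack['price']:.1f}")   # ~2.7 (the rack is that much cheaper)
print(f"CPU factor:   {rack['cpus'] / high_end['cpus']:.0f}")     # 22x the processors
print(f"RAM factor:   {rack['ram_gb'] / high_end['ram_gb']:.2f}") # 2.75x the RAM
```

The rack gives up one terabyte of disk but delivers 22 times the processors and almost three times the RAM for roughly a third of the price, which is exactly the trade-off that distributed, commodity-cluster software is designed to exploit.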
Really impressive! And a good confirmation of the competitive advantage Google has built with its infrastructure and platform (see my article about Google’s platform).