2/25/2004 The Google Cluster Architecture February 25, 2004.


1 2/25/2004 The Google Cluster Architecture February 25, 2004

2 2/25/2004 Assignments
  Work on Registrar Assignment
  Study for your quiz!

3 2/25/2004 Web Crawling
  Start with a seed URL
  Follow all URLs in the page, then in each page reached, etc.
  Store documents
  Create index
    – mapping from each word to the documents that contain it
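The crawl-and-index steps on this slide can be sketched in a few lines. This is only a toy illustration, not Google's crawler: the in-memory `PAGES` table, the `*.example` URLs, and the function names `crawl` and `build_index` are all assumptions made up for the example, and no network access is involved.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("google cluster architecture", ["b.example", "c.example"]),
    "b.example": ("cluster of cheap PCs", ["a.example"]),
    "c.example": ("web search architecture", []),
}

def crawl(seed):
    """Breadth-first crawl from the seed URL; returns {url: text}."""
    seen, queue, store = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        store[url] = text                 # store the document
        for link in links:                # follow every URL on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store

def build_index(store):
    """Inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in store.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

docs = crawl("a.example")
index = build_index(docs)
```

Looking up `index["cluster"]` then yields the set of pages that mention the word, which is the mapping the slide describes.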

4 2/25/2004 Web Search

5 2/25/2004 Properties of Web Search
  Embarrassingly parallel
    – Stateless
    – Read-only
  Requires lots of storage
  Requires lots of computation
  Requires low response time

6 2/25/2004 Google Design Goals
  Energy efficiency
  Price/performance ratio

7 2/25/2004 Software Architecture
  Reliability in software
    – Fault tolerance, not prevention
    – Cheap PCs
  High degree of replication

8 2/25/2004 Load Distribution/Balancing
  Geographically distributed clusters
    – Increased fault tolerance
  DNS-based load balancing
    – Select closest cluster to minimize RTT
  Hardware-based local load balancing
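The two levels of load balancing on this slide can be illustrated with a small sketch. Everything concrete here is an assumption for the example: the cluster names, the RTT figures, the `healthy` flag, and the use of round-robin for the local step (the slide only says the local balancer is hardware-based, not which policy it applies).

```python
import itertools

# Hypothetical clusters with RTT (ms) as seen from one client, plus health flags.
CLUSTERS = {
    "us-west": {"rtt_ms": 20, "healthy": True, "servers": ["w1", "w2"]},
    "us-east": {"rtt_ms": 70, "healthy": True, "servers": ["e1", "e2", "e3"]},
    "europe": {"rtt_ms": 150, "healthy": True, "servers": ["x1"]},
}

def pick_cluster(clusters):
    """Global step: choose the healthy cluster with the lowest RTT."""
    healthy = {name: c for name, c in clusters.items() if c["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy cluster available")
    return min(healthy, key=lambda name: healthy[name]["rtt_ms"])

def local_balancer(servers):
    """Local step: round-robin over a cluster's servers (assumed policy)."""
    return itertools.cycle(servers)

cluster = pick_cluster(CLUSTERS)
balancer = local_balancer(CLUSTERS[cluster]["servers"])
first_three = [next(balancer) for _ in range(3)]
```

If the closest cluster is marked unhealthy, `pick_cluster` falls back to the next-closest one, which is the fault-tolerance benefit of geographic distribution.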

9 2/25/2004 Query Execution

10 2/25/2004 Query Execution
  1. Look up each query term in the index
  2. Compute relevance score across results

11 2/25/2004 Index Shards
  A pool of machines serves a particular shard
  Each request goes to one machine in the pool
  If a machine goes down, capacity is only marginally reduced
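A minimal sketch of a shard pool, to make the "capacity marginally reduced" point concrete. The `ShardPool` class, its method names, the machine names, and the random routing policy are hypothetical choices for illustration; the slide does not say how a machine is picked within the pool.

```python
import random

class ShardPool:
    """Hypothetical pool of replica machines that all serve one index shard."""

    def __init__(self, shard_id, machines):
        self.shard_id = shard_id
        self.machines = list(machines)
        self.up = set(machines)

    def mark_down(self, machine):
        """Record a failed machine; the rest of the pool keeps serving."""
        self.up.discard(machine)

    def capacity(self):
        """Fraction of the pool that is still serving requests."""
        return len(self.up) / len(self.machines)

    def route(self, rng=random):
        """Send a request to one live machine (random choice is an assumption)."""
        if not self.up:
            raise RuntimeError(f"shard {self.shard_id} has no live replicas")
        return rng.choice(sorted(self.up))

pool = ShardPool(shard_id=0, machines=[f"m{i}" for i in range(10)])
pool.mark_down("m3")
```

With ten machines in the pool, losing one leaves `pool.capacity()` at 0.9: the shard stays available and only a tenth of its serving capacity is lost.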

12 2/25/2004 Query Execution
  1. Look up each query term in the index
  2. Compute relevance score across results
  3. Retrieve document
    – Highlight keywords
  4. Generate/return HTML
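The four steps above can be traced end to end on a toy corpus. This is a sketch under stated assumptions, not the actual ranking pipeline: the `DOCS` corpus, the AND semantics (intersecting the per-term hit lists), and the term-count score are all invented for the example, since the slides do not define how relevance is computed.

```python
import html
import re

# Hypothetical document store: doc id -> text.
DOCS = {
    1: "The Google cluster architecture uses cheap PCs",
    2: "Cluster computing for web search",
    3: "Cooking with cheap ingredients",
}

# Inverted index built up front: word -> set of doc ids.
INDEX = {}
for doc_id, text in DOCS.items():
    for word in re.findall(r"\w+", text.lower()):
        INDEX.setdefault(word, set()).add(doc_id)

def search(query):
    """Steps 1-2: look up each term, then rank the matching documents."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    # Step 1: one index lookup per term; keep docs matching every term (assumed AND).
    hits = set.intersection(*(INDEX.get(t, set()) for t in terms))

    # Step 2: toy relevance score = number of query-term occurrences in the doc.
    def score(doc_id):
        words = re.findall(r"\w+", DOCS[doc_id].lower())
        return sum(words.count(t) for t in terms)

    return sorted(hits, key=lambda d: (-score(d), d))

def render(query):
    """Steps 3-4: fetch each document, highlight keywords, return HTML."""
    terms = set(re.findall(r"\w+", query.lower()))
    items = []
    for doc_id in search(query):
        words = [
            f"<b>{html.escape(w)}</b>" if w.lower() in terms else html.escape(w)
            for w in DOCS[doc_id].split()
        ]
        items.append("<li>" + " ".join(words) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"

page = render("cheap cluster")
```

Steps 1 and 2 touch only the index, while steps 3 and 4 touch only the stored documents, which is why the two can be served by separate pools of machines.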

13 2/25/2004 Replication
  No consistency issues (the data is read-only)
  Nearly linear speedup
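One way to read "nearly linear speedup" is as a simple throughput model. The symbols below are assumptions introduced for illustration, not taken from the slides: x is the query rate one replica sustains, n is the number of replicas, and ε(n) is a small fractional overhead (load balancing, merging results) with ε(1) = 0.

```latex
% x: queries per second served by a single replica (hypothetical symbol)
% n: number of replicas
% \varepsilon(n): fractional coordination overhead, assumed small, \varepsilon(1) = 0
X(n) = n \, x \, \bigl(1 - \varepsilon(n)\bigr),
\qquad
S(n) = \frac{X(n)}{X(1)} = n \, \bigl(1 - \varepsilon(n)\bigr) \approx n
\quad \text{when } \varepsilon(n) \ll 1
```

Because queries are read-only and stateless, replicas never have to coordinate on writes, which is what keeps ε(n) small in this model.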

14 2/25/2004 Discussion
  For which other applications would this architecture be useful/not useful?

