1
The Google Cluster Architecture
February 25, 2004
2
Assignments
Work on Registrar Assignment
Study for your quiz!
3
Web Crawling
Start with a seed URL
Follow all URLs in each page, and so on
Store documents
Create index
–mapping from each word to the documents that contain it
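The crawl-and-index steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WEB` dictionary is a hypothetical in-memory stand-in for real pages (no network access), and `crawl` and `build_index` are names invented for this example, not anything from Google's system.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("google cluster architecture", ["b.example", "c.example"]),
    "b.example": ("cluster of cheap pcs", ["a.example"]),
    "c.example": ("web search index", []),
}

def crawl(seed, web):
    """Breadth-first crawl from the seed URL; returns {url: page text}."""
    seen = {seed}
    queue = deque([seed])
    store = {}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to store or follow
        text, links = web[url]
        store[url] = text  # "store documents"
        for link in links:
            if link not in seen:  # visit each URL at most once
                seen.add(link)
                queue.append(link)
    return store

def build_index(store):
    """Inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in store.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

store = crawl("a.example", WEB)
index = build_index(store)
print(sorted(index["cluster"]))
```

The `seen` set keeps the crawler from looping on pages that link back to each other, as `a.example` and `b.example` do here.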
4
Web Search
5
Properties of Web Search
Embarrassingly parallel
–Stateless
–Read-only
Requires lots of storage
Requires lots of computation
Requires low response time
6
Google Design Goals
Energy efficiency
Price/performance ratio
7
Software Architecture
Reliability in software
–Fault tolerance, not prevention
–Cheap PCs
High degree of replication
8
Load Distribution/Balancing
Geographically distributed clusters
–Increased fault tolerance
DNS-based load balancing
–Select closest cluster to minimize round-trip time (RTT)
Hardware-based local load balancing
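The "select closest cluster" step can be illustrated with a small Python sketch. The cluster names and RTT values are invented for the example, and `pick_cluster` is a hypothetical helper. A production DNS load balancer would presumably also weigh available capacity, which this sketch ignores.

```python
# Hypothetical clusters and round-trip times (ms) as measured from one client.
RTT_MS = {"us-west": 20.0, "us-east": 75.0, "europe": 140.0}

def pick_cluster(rtt_ms, healthy):
    """Return the healthy cluster with the smallest RTT to the client."""
    candidates = {name: rtt for name, rtt in rtt_ms.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy cluster available")
    return min(candidates, key=candidates.get)

# All clusters up: the nearest one is chosen.
print(pick_cluster(RTT_MS, {"us-west", "us-east", "europe"}))
# Nearest cluster down: traffic falls back to the next closest.
print(pick_cluster(RTT_MS, {"us-east", "europe"}))
```

The second call shows how geographic distribution adds fault tolerance: losing a whole cluster costs latency, not availability.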
9
Query Execution
10
Query Execution
1. Look up each query term in the index
2. Compute relevance score across results
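A minimal Python sketch of these two steps follows. It assumes a toy inverted index (`INDEX`, with made-up documents and counts) and uses summed term frequency as a stand-in score. The slides do not say how relevance is actually computed, so the scoring here is purely illustrative.

```python
# Hypothetical inverted index: term -> {doc_id: term frequency in that doc}.
INDEX = {
    "google": {"d1": 3, "d2": 1},
    "cluster": {"d1": 2, "d3": 4},
    "search": {"d2": 2, "d3": 1},
}

def search(query, index):
    """Return (doc_id, score) pairs for docs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: look up the posting list for each query term.
    postings = [index.get(term, {}) for term in terms]
    # Keep only documents that appear in every posting list.
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    # Step 2: score each match (assumed scoring: sum of term frequencies).
    scores = {doc: sum(p[doc] for p in postings) for doc in matching}
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(search("google cluster", INDEX))
print(search("cluster", INDEX))
```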
11
Index Shards
A pool of machines serves a particular shard
Request goes to one machine in the pool
If a machine goes down, capacity is marginally reduced
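The pool-per-shard idea can be sketched as follows. `ShardPool` and its methods are hypothetical names for this example, and random selection is an assumed routing policy. The slides only say that a request goes to one machine in the pool.

```python
import random

class ShardPool:
    """A pool of replica machines that all serve the same index shard."""

    def __init__(self, shard_id, machines):
        self.shard_id = shard_id
        self.total = len(machines)
        self.up = set(machines)

    def mark_down(self, machine):
        """Record a machine failure; the remaining replicas keep serving."""
        self.up.discard(machine)

    def route(self):
        """Pick one live machine for a request (assumed policy: random)."""
        if not self.up:
            raise RuntimeError(f"shard {self.shard_id} has no live machines")
        return random.choice(sorted(self.up))

    def capacity_fraction(self):
        """Share of the pool's original capacity that is still available."""
        return len(self.up) / self.total

pool = ShardPool(shard_id=0, machines=["m1", "m2", "m3", "m4"])
pool.mark_down("m2")
print(pool.route(), pool.capacity_fraction())
```

With four replicas, losing one leaves three quarters of the shard's capacity. The larger the pool, the smaller the dent a single failure makes.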
12
Query Execution
1. Look up each query term in the index
2. Compute relevance score across results
3. Retrieve document
–Highlight keywords
4. Generate/return HTML
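Steps 3 and 4 can be sketched in Python as below. The `DOCS` store, the `<b>` markup, and the list layout are assumptions made for illustration. The slides do not describe the actual HTML that is produced.

```python
import html
import re

# Hypothetical document store: doc_id -> (title, body text).
DOCS = {
    "d1": ("Google Cluster", "The Google cluster uses cheap PCs."),
    "d2": ("Web Search", "Search must return results quickly."),
}

def highlight(text, terms):
    """HTML-escape text and wrap whole-word matches of any term in <b>."""
    if not terms:
        return html.escape(text)
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    # With one capture group, re.split alternates: plain, match, plain, ...
    parts = pattern.split(text)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)
        out.append(f"<b>{escaped}</b>" if i % 2 == 1 else escaped)
    return "".join(out)

def render_results(doc_ids, terms, docs):
    """Step 3: retrieve each document; step 4: build the HTML result list."""
    items = []
    for doc_id in doc_ids:
        title, body = docs[doc_id]
        items.append(f"<li>{html.escape(title)}<br>{highlight(body, terms)}</li>")
    return "<ul>" + "".join(items) + "</ul>"

page = render_results(["d1"], ["cluster"], DOCS)
print(page)
```

Splitting on the matches before escaping means a query term can never be highlighted inside an HTML entity such as `&lt;`.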
13
Replication
No consistency issues
Nearly linear speedup
14
Discussion
For which other applications would this architecture be useful/not useful?