Cloud Computing
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
University of Washington
August 2012
6/24/2016
Personal computing: office applications, databases and storage, math and science, web browser
Cloud accessed through the browser: office applications, math and science, web browser, databases and storage
With the cloud provider's domain name …
Or with your own …
Why not office applications too? Math and science, office applications, web browser, databases and storage
© 2009 Gribble, Lazowska, Levy, Zahorjan
Why not everything else? Web browser, math and science, office applications, databases and storage
Consider …
- Sharing is easy
- Someone else does backup
- Someone else handles software updates
- There's 7x24x365 operations support, auxiliary power, redundant network connections, geographical diversity
- Scalability – both up and down – is instantaneous
- Many fewer demands on the local operating system and computer
- Incredibly disruptive to existing business models!
Amazon Elastic Compute Cloud (EC2)
- $0.68 per hour for
  - 8 cores of 4 GHz 64-bit Intel Xeon or AMD Opteron
  - 7 GB memory
  - 1.69 TB scratch storage
- Need it 24x7 for a year?
  - $3900
- $0.085 per hour for
  - 1 core of 1.2 GHz 32-bit Intel or AMD (1/20th the above)
  - 1.7 GB memory
  - 160 GB scratch storage
- Need it 24x7 for a year?
  - $490
- This includes
  - Purchase + replacement
  - Housing
  - Power
  - Operation
  - Reliability
  - Security
  - Instantaneous expansion and contraction
- 1000 processors for 1 day costs the same as 1 processor for 1000 days!
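The last point is simple arithmetic under per-hour pricing. A minimal sketch, assuming pure hourly billing with no volume discounts or minimum charges (an assumption, not something the slide states); the function name and the use of integer thousandths of a dollar are illustrative choices, and the rate is the $0.085 per hour small-instance price quoted above.

```python
HOURS_PER_DAY = 24


def rental_cost_mills(processors: int, days: int, rate_mills_per_hour: int) -> int:
    """Cost, in thousandths of a dollar, of renting `processors` for `days`.

    Integer arithmetic avoids floating-point rounding in the comparison.
    """
    return processors * days * HOURS_PER_DAY * rate_mills_per_hour


# $0.085 per hour expressed as 85 thousandths of a dollar.
RATE = 85

wide = rental_cost_mills(processors=1000, days=1, rate_mills_per_hour=RATE)
narrow = rental_cost_mills(processors=1, days=1000, rate_mills_per_hour=RATE)

print(wide / 1000, narrow / 1000)  # 2040.0 2040.0
```

Because cost depends only on the product of processors and time, a job that parallelizes well can finish a thousand times sooner for the same bill.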
Velcro
- A datacenter has containers
- A container has 1, ,000 servers
- A server has two processors, 2 disks, tons of memory, battery backup
- Processors are chosen for power efficiency, not performance
Slide courtesy of Werner Vogels
Isn't this just timesharing?
- Many hundreds of machines are involved in a single Google search request (remember, the web is 400+TB)
  - There are multiple clusters (of thousands of computers each) all over the world
  - DNS routes your search to a nearby cluster
  - A cluster consists of Google Web Servers, Index Servers, Doc Servers, and various other servers (ads, spell checking, etc.)
  - These are cheap standalone computers, rack-mounted, connected by commodity networking gear
  - Within the cluster, load-balancing routes your search to a lightly-loaded Google Web Server (GWS), which will coordinate the search and response
  - The index is partitioned into "shards." Each shard indexes a subset of the docs (web pages). Each shard is replicated, and can be searched by multiple computers: "index servers"
  - The GWS routes your search to one index server associated with each shard, through another load-balancer
  - When the dust has settled, the result is an ID for every doc satisfying your search, rank-ordered by relevance
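The index lookup described above is a scatter-gather pattern: query one replica of every shard in parallel, then merge and rank. A minimal sketch, assuming a toy in-memory index; the shard contents, relevance scores, replica count, and function names are all hypothetical, and picking a replica at random merely stands in for the load balancer. It is not a description of Google's actual implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy index: each shard maps a term to {doc_id: relevance score}.
SHARDS = [
    {"cloud": {1: 0.9, 4: 0.2}, "velcro": {4: 0.7}},
    {"cloud": {7: 0.5}},
    {"cloud": {12: 0.8}, "velcro": {15: 0.1}},
]
REPLICAS_PER_SHARD = 2  # assumed; every replica of a shard holds the same data


def search_shard(shard_id, replica, term):
    """One index server's work: look the term up in its shard.

    `replica` only identifies which copy was asked; all copies answer alike.
    """
    return list(SHARDS[shard_id].get(term, {}).items())


def search(term):
    """Fan out to one replica per shard, then merge and rank the hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        futures = [
            pool.submit(
                search_shard, shard_id, random.randrange(REPLICAS_PER_SHARD), term
            )
            for shard_id in range(len(SHARDS))
        ]
        hits = [hit for future in futures for hit in future.result()]
    # Rank-order by relevance, highest score first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [doc_id for doc_id, _score in hits]


print(search("cloud"))  # [1, 12, 7, 4]
```

Every shard must be consulted because any shard may hold matching docs; replication adds capacity and spare copies without changing the answer.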
  - The docs, too, are partitioned into "shards": the partitioning is a hash on the doc ID. Each shard contains the full text of a subset of the docs. Each shard can be searched by multiple computers: "doc servers"
  - The GWS sends appropriate doc IDs to one doc server associated with each relevant shard
  - When the dust has settled, the result is a URL, a title, and a summary for every relevant doc
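Partitioning by a hash on the doc ID means anyone holding a doc ID can compute which shard to ask, with no lookup table. A minimal sketch, assuming SHA-256 as the hash and four shards; both choices, and the function names, are hypothetical, since the slides do not say which hash or shard count is used.

```python
import hashlib
from collections import defaultdict

NUM_DOC_SHARDS = 4  # hypothetical shard count


def doc_shard(doc_id: int) -> int:
    """Map a doc ID to a shard using a stable hash (assumed: SHA-256)."""
    digest = hashlib.sha256(str(doc_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DOC_SHARDS


def group_by_shard(doc_ids):
    """Group ranked doc IDs by shard, so each relevant shard gets one request."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[doc_shard(doc_id)].append(doc_id)
    return dict(groups)


# Only shards that actually hold one of the result docs appear as keys.
print(group_by_shard([1, 12, 7, 4]))
```

A stable hash is used here rather than Python's built-in `hash`, so that every server computes the same shard for the same doc ID.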
  - Meanwhile, the ad server has done its thing, the spell checker has done its thing, etc.
  - The GWS builds an HTTP response to your search and ships it off
- Many hundreds of computers have enabled you to search 400+TB of web in ~100 ms.
- Enormous volumes of data
- Extreme parallelism
- The cheapest imaginable components
  - Failures occur all the time
  - You couldn't afford to prevent this in hardware
- Software makes it
  - Fault-Tolerant
  - Highly Available
  - Recoverable
  - Consistent
  - Scalable
  - Predictable
  - Secure
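One simple way software can tolerate failing components is to retry a request on another replica. A minimal sketch of that idea, assuming replicas are interchangeable and fail by raising an exception; `ReplicaDown`, `make_replica`, and `call_with_failover` are hypothetical names for illustration, and real systems layer timeouts, health checks, and recovery on top of this.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


def make_replica(name, healthy):
    """Build a stand-in server that either answers or fails."""
    def serve(request):
        if not healthy:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return serve


def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful answer."""
    failures = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as err:
            failures.append(str(err))
    raise RuntimeError(f"all replicas failed: {failures}")


replicas = [
    make_replica("r0", healthy=False),  # a cheap machine that has died
    make_replica("r1", healthy=True),
    make_replica("r2", healthy=True),
]
print(call_with_failover(replicas, "query"))  # r1 handled query
```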