Lecture 18 Page 1 CS 188,Winter 2015 Cloud Computing CS 188 Distributed Systems March 12, 2015.

2 Introduction
What is cloud computing?
Cloud computing and distributed systems
Important cloud computing tools
– MapReduce

3 What Is Cloud Computing?
Essentially, moving your computing into a vague “network cloud”
Not just putting your network transmissions there
But putting storage, computation, and other services into “the cloud”
Offloading complicated management issues to someone else

4 What Is It Really?
Somebody buys and runs a vast number of machines
They offer to rent use of their machines to essentially anyone
One or more of the machines host each client’s computations
Possibly also providing stable storage

5 A Cloud Computing Facility
A huge farm of machines
With a high-speed interconnect
And special software to help manage the machines
Clients’ jobs are placed on some subset of the machines

6 The Cloud Computing Concept
“I’ve got a big data job to get done”
“I need to run a small web server”
“I need nodes for a scientific computation”

7 Cloud Storage Systems
Systems that specialize in storing data for large numbers of users
Often don’t provide compute services
– Just storage services
They take care of backup, ensuring accessibility, etc.
Sometimes consumer-oriented
Sometimes big-data-oriented

8 Implications of Cloud Computing – For Clients
Computing becomes a commodity
– Spend X dollars, get Y amount of computing
– If you want 2Y computing, spend X more dollars
No worries about the complex issues of managing machines
No need to sink money into machines you only need some of the time
But a lot of details are out of your hands

9 Implications of Cloud Computing – For Providers
Need to make as efficient use of your resources as possible
– Which requires flexibly moving jobs around over time
– Nodes will be heavily reused
Must isolate each client from all others
Must handle all the grubby distributed system details

10 Who Does Cloud Computing?
Lots of people
Companies wanting to run web services
Parties with large quantities of data to analyze or store
Those who don’t want to pay for system administrators

11 Who Provides Cloud Computing?
Mostly large companies
Big costs involved in setting up and running a cloud environment
Huge hardware costs, electric bills, repair and maintenance costs, system and network admin salaries, etc.
Generally pays off best at high scale

12 Some Sample Cloud Services
Amazon Elastic Compute Cloud (EC2)
Google Cloud Platform
Microsoft Azure
Apple iCloud
– Primarily cloud storage
Dropbox
– Also primarily cloud storage

13 Cloud Computing and Distributed Systems
Cloud computing relates to distributed systems in two ways:
1. Cloud users often run distributed systems
– Since the cloud is most beneficial when your job needs many resources
2. Cloud computing facilities are inherently distributed systems
– Of a specialized type requiring special kinds of control software

14 Running Distributed Systems in the Cloud
What if your job is large and won’t fit on one computer?
Well, design it as a distributed system
Contract with the cloud provider to rent the right number of nodes
Configure those nodes as the distributed system you want

15 Advantages of This Approach
Beyond the basic advantages of the cloud
A relatively friendly distributed environment
Short, predictable delays
Generally homogeneous hardware
Nice recovery from failures
Easy expansion

16 An Example of Advantages
I need to run a small web server
I also need to run a backend database server
Business is good, so now I need a second web server
Business is very good, so now I need more web servers
And a load balancer, and a special firewall machine, and...
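
The growth story above ends with a load balancer spreading requests over several web servers. Here is a minimal sketch of one common policy, round-robin, assuming all requests cost roughly the same. The class `RoundRobinBalancer` and the server names are hypothetical and are not any cloud provider’s API.

```python
from typing import List


class RoundRobinBalancer:
    """Hypothetical round-robin balancer: hands out web servers in rotation."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.next_index = 0

    def add_server(self, server: str) -> None:
        # Scaling out: a newly rented web server simply joins the rotation.
        self.servers.append(server)

    def pick(self) -> str:
        """Return the server for the next request, wrapping around at the end."""
        if not self.servers:
            raise RuntimeError("no web servers registered")
        server = self.servers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.servers)
        return server


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2"])
    print([lb.pick() for _ in range(3)])  # ['web1', 'web2', 'web1']
    lb.add_server("web3")                 # business is very good
    print([lb.pick() for _ in range(3)])  # ['web2', 'web3', 'web1']
```

As business grows, adding capacity is just a call to `add_server`. A real balancer would also stop sending requests to servers that fail health checks.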

17 What Does This Look Like to the Client?
[Figure: the client’s system as it grows: web server; back-end database server; second web server; firewall; load balancer; web servers]

18 How Does the Client Connect His Cloud Distributed System?
Cloud offers tools for specifying connectivity
Client indicates what connects to what
Cloud uses various virtual networking software to arrange those connections
Flexible and easy to change
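
Here is one way to picture “client indicates what connects to what”. The client hands the cloud a declaration of allowed links, and the virtual network permits only those links. The sketch assumes a topology modeled on the web-server example (internet, firewall, load balancer, web servers, database). The names `TOPOLOGY`, `can_send`, and `reachable` are hypothetical; real clouds express this through their own virtual-network and firewall-rule tools.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical client declaration (an assumption modeled on the web-server example):
# internet -> firewall -> load balancer -> web servers -> database.
TOPOLOGY: Dict[str, List[str]] = {
    "internet": ["firewall"],
    "firewall": ["load_balancer"],
    "load_balancer": ["web1", "web2", "web3"],
    "web1": ["database"],
    "web2": ["database"],
    "web3": ["database"],
    "database": [],
}


def can_send(src: str, dst: str) -> bool:
    """A direct link is allowed only if the client declared it."""
    return dst in TOPOLOGY.get(src, [])


def reachable(src: str) -> Set[str]:
    """Every node reachable from src by following declared links (breadth-first)."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in TOPOLOGY.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Outside traffic can reach the database only by passing through the firewall, load balancer, and a web server, never directly. Changing the system means editing the declaration, which is what makes this approach flexible.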

19 The Cloud As A Distributed System
The cloud environment is a collection of computers
Connected by a local area network
Which must be flexibly set up in many different ways
Requires treating the environment as a distributed system

20 Challenges for the Cloud Distributed System
Sharing resources
– How to make sure a shared network is properly used by all
Enforcing topologies
Flexible remapping of services to different nodes
Security issues

21 Cloud Computing and Virtual Machines
Sometimes a client doesn’t need many resources
– Perhaps fewer than one machine provides
Wasteful to give him a whole machine
Why not give him a virtual machine hosted on a real machine?
Possibly shared with others
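
Placing small clients’ virtual machines onto shared physical machines is a bin-packing problem. The sketch below uses first-fit, which is one simple heuristic, and treats each VM request as just a number of cores. `first_fit` and the request format are hypothetical; real schedulers also weigh memory, network, and isolation.

```python
from typing import Dict, List, Tuple


def first_fit(vm_requests: List[Tuple[str, int]], host_capacity: int) -> Dict[int, List[str]]:
    """Place each VM (name, cores) on the first physical host with enough free cores.

    Hypothetical policy: a new physical host is used only when no existing one fits.
    """
    free_cores: List[int] = []              # free cores remaining on each host
    placement: Dict[int, List[str]] = {}    # host index -> VMs sharing it
    for name, cores in vm_requests:
        if cores > host_capacity:
            raise ValueError(f"{name} needs more than one whole machine")
        for i, free in enumerate(free_cores):
            if cores <= free:
                free_cores[i] -= cores
                placement[i].append(name)
                break
        else:
            # No existing host has room, so bring another physical machine into use.
            free_cores.append(host_capacity - cores)
            placement[len(free_cores) - 1] = [name]
    return placement
```

For example, with 4-core hosts, requests of 2, 4, 1, and 3 cores end up on three machines. The 2-core and 1-core VMs share the first machine.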

22 For Example
[Figure: “Instead of this” (each client on its own physical machine), “Do this” (clients’ virtual machines hosted on physical machines); note the sharing of physical machines]

23 Cloud Computing and Virtual Machines
Achieving this effect requires supporting virtual machines
Generally a good thing for cloud computing
Also provides security advantages
Often clouds treat all client machines as virtual
– Even if they have all of a physical machine’s resources

24 Failures in Cloud Computing
When you have a lot of nodes, you’ll have a lot of failures
A failed node will often belong to a client
Under many circumstances, you can simply give him another node
– Assuming you can recover state
– Sometimes done by saving VM state
– Sometimes handled by a cloud computing tool
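
Here is a sketch of the “give him another node” idea. It assumes the provider keeps a pool of spare nodes and a last-saved state (checkpoint) for each client. The `NodeManager` class and its methods are hypothetical, and how state is actually captured (VM snapshots, application checkpoints) varies from cloud to cloud.

```python
from typing import Dict, List


class NodeManager:
    """Hypothetical provider-side bookkeeping for replacing failed client nodes."""

    def __init__(self, spares: List[str]) -> None:
        self.spares = list(spares)              # idle nodes ready to hand out
        self.assignment: Dict[str, str] = {}    # client -> node currently serving it
        self.checkpoints: Dict[str, dict] = {}  # client -> last saved state

    def assign(self, client: str) -> str:
        if not self.spares:
            raise RuntimeError("no spare nodes left")
        node = self.spares.pop(0)
        self.assignment[client] = node
        return node

    def checkpoint(self, client: str, state: dict) -> None:
        # Store a copy so later changes on the live node don't alter the snapshot.
        self.checkpoints[client] = dict(state)

    def handle_failure(self, failed_node: str) -> Dict[str, dict]:
        """Give each client of failed_node a fresh node plus its last checkpoint.

        Returns new node -> state to restore on it. The failed node is not
        returned to the spare pool.
        """
        restored: Dict[str, dict] = {}
        for client, node in list(self.assignment.items()):
            if node == failed_node:
                new_node = self.assign(client)
                restored[new_node] = dict(self.checkpoints.get(client, {}))
        return restored
```

Any work the client did after its last checkpoint is lost. This is why the slide says replacement works only when you can recover state.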

25 Security in Cloud Computing
Everyone shares the same network links
Sometimes multiple virtual machines share one physical machine
Different clients live on the same physical machine as time passes
Must provide each client a totally clean and safe environment
– One client shouldn’t be able to affect any other client

26 Cloud Computing Tools
Clients can usually do anything they want on cloud machines
But there are classes of things many clients need
Cloud environments try to provide libraries or other tools to do them

27 MapReduce
Perhaps the most common cloud computing software tool/technique
A method of dividing large problems into compartmentalized pieces
Each of which can be performed on a separate node
With an eventual combined set of results

28 The Origin of MapReduce
Built by Google
In response to their internal needs
– They did lots of parallel-ish processing on lots of data
Observed common characteristics of many of their tasks
Built a framework to handle all of them

29 The Idea Behind MapReduce
There is a single function you want to perform on a lot of data
– Such as searching it for a string
Divide the data into disjoint pieces
Perform the function on each piece on a separate node (map)
Combine the results to obtain output (reduce)
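
The four steps above can be sketched for the slide’s string-search case. This is a single-machine illustration with three simplifications. Threads stand in for separate nodes. The input is a list of lines, so pieces are disjoint and no match is split between two pieces. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_pieces(lines: List[str], num_pieces: int) -> List[List[str]]:
    """Divide the input into disjoint pieces of whole lines."""
    size = max(1, -(-len(lines) // num_pieces))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_search(piece: List[str], target: str) -> List[str]:
    """Map: the same function, applied independently to one piece."""
    return [line for line in piece if target in line]


def reduce_concat(partials: List[List[str]]) -> List[str]:
    """Reduce: combine the per-piece results into one output."""
    return [line for part in partials for line in part]


def search(lines: List[str], target: str, num_nodes: int = 4) -> List[str]:
    pieces = split_into_pieces(lines, num_nodes)
    # Each thread plays the role of one map node working on its own piece.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: map_search(p, target), pieces))
    return reduce_concat(partials)
```

`ThreadPoolExecutor.map` returns results in input order, so the combined output keeps the lines in their original order.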

30 An Example
We have 64 megabytes of text data
Count how many times each word occurs in the text
Divide it into 4 chunks of 16 megabytes
Assign each chunk to one processor
Perform the map function of “count words” on each
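
Splitting 64 MB into 4 chunks gives 16 MB per chunk. However, a cut at an exact byte offset can land in the middle of a word and make both chunks miscount it. The sketch below assumes each chunk boundary is nudged forward to the next whitespace byte. It then applies a “count words” map function to each chunk. Both function names are hypothetical.

```python
from collections import Counter
from typing import List


def split_on_whitespace(data: bytes, num_chunks: int) -> List[bytes]:
    """Cut data into roughly equal byte ranges (e.g. 64 MB / 4 = 16 MB), moving
    each cut forward to the next whitespace byte so no word straddles two chunks."""
    target = -(-len(data) // num_chunks)  # ceiling division
    chunks: List[bytes] = []
    start = 0
    while start < len(data):
        end = min(start + target, len(data))
        while end < len(data) and not data[end:end + 1].isspace():
            end += 1  # don't cut in the middle of a word
        chunks.append(data[start:end])
        start = end
    return chunks


def map_count_words(chunk: bytes) -> Counter:
    """The map function “count words”, applied to one chunk on one node."""
    return Counter(chunk.decode("utf-8").split())
```

Each chunk’s `Counter` is one map node’s output. Adding all four gives the same counts as counting the whole text at once.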

31 The Example Continued
Word counts produced by the map function on each chunk:

Word   Chunk 1   Chunk 2   Chunk 3   Chunk 4
Foo       1         7         2         4
Bar       4         3         6         7
Baz       3         9         2         5
Zoo       6         1         2         9
Yes      12        17        10         3
Too       5         8         4         7

That’s the map stage

32 On To Reduce
We might have two more nodes assigned to doing the reduce operation
Each will receive a share of the data from every map node
The reduce node performs a reduce operation to “combine” the shares
Outputting its own result
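
Shares are usually routed to reduce nodes by a partition function on the key, so every count for a given word lands on the same reduce node. The sketch below assumes hash partitioning with `zlib.crc32` and two reduce nodes. `zlib.crc32` is chosen because Python’s built-in `hash()` on strings differs between processes. `partition`, `shuffle`, and `reduce_sum` are hypothetical names.

```python
import zlib
from collections import Counter
from typing import Dict, List


def partition(key: str, num_reducers: int) -> int:
    """Choose the reduce node for a key; crc32 gives the same answer on every node."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[Dict[str, int]], num_reducers: int) -> List[List[Dict[str, int]]]:
    """Each map node splits its output into one share per reduce node.

    Returns, for each reduce node, the list of shares it receives (one per map node).
    """
    shares: List[List[Dict[str, int]]] = [[] for _ in range(num_reducers)]
    for output in map_outputs:
        split: List[Dict[str, int]] = [{} for _ in range(num_reducers)]
        for key, count in output.items():
            split[partition(key, num_reducers)][key] = count
        for r in range(num_reducers):
            shares[r].append(split[r])
    return shares


def reduce_sum(shares: List[Dict[str, int]]) -> Dict[str, int]:
    """Reduce: combine the shares from all map nodes by adding counts per word."""
    total: Counter = Counter()
    for share in shares:
        total.update(share)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

Given the four map outputs from the example, the two reducers together produce Foo 14, Bar 20, Baz 19, Zoo 18, Yes 42, Too 24, and each word is handled by exactly one reducer.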

33 Continuing the Example
[Figure: each of the four map outputs from the previous slide is split into shares, which are sent to the two reduce nodes]

34 The Reduce Nodes Do Their Job
Foo 14
Bar 20
Baz 19
Zoo 18
Yes 42
Too 24
And MapReduce is done!
Write out the results to files

35 But I Wanted A Combined List
No problem
Run another (slightly different) MapReduce on the outputs
Have one reduce node that combines everything
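
Here is a sketch of that second, “slightly different” pass. The map step passes each (word, count) pair through unchanged, and a single reduce node merges everything into one list. Sorting by count is an assumption about what “combined” should look like. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def identity_map(reducer_output: Dict[str, int]) -> List[Tuple[str, int]]:
    """Second-pass map: pass each (word, count) pair through unchanged."""
    return list(reducer_output.items())


def single_reduce(mapped: List[List[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """Second-pass reduce on one node: merge everything into one sorted list.

    Each word normally appears in only one first-pass output, so the addition
    just copies counts; summing keeps the result correct even if it doesn't.
    """
    totals: Dict[str, int] = {}
    for part in mapped:
        for word, count in part:
            totals[word] = totals.get(word, 0) + count
    # Most frequent first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because only one node runs the reduce, this pass trades parallelism for a single output file. That tradeoff is acceptable here because the first pass has already shrunk the data to one count per word.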

36 Synchronization in MapReduce
Each map node produces an output file for each reduce node
It is produced atomically
The reduce node can’t work on this data until the whole file is written
Forcing a synchronization point between the map and reduce phases
Why can’t the reduce nodes start working on data as it’s produced?
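
One common way to get the “produced atomically” behavior on a single file system is to write to a temporary file and then rename it into place. The final name appears only once the contents are complete. This sketch assumes a POSIX-style file system, where `os.replace` within one directory is atomic, and uses JSON as the output format. `write_atomically` and `ready_for_reduce` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Dict


def write_atomically(path: str, data: Dict[str, int]) -> None:
    """Write one map node's share for one reduce node so that readers see
    either no file or the complete file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())      # push the bytes to disk before publishing
        os.replace(tmp_path, path)    # the final name appears all at once
    except BaseException:
        os.unlink(tmp_path)
        raise


def ready_for_reduce(path: str) -> bool:
    """A reduce node may start on this share only once the final name exists."""
    return os.path.exists(path)
```

The reduce node only has to check for the final name. It never needs to wonder whether a file it can see is still being written.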

37 Controlling the Synchronization
One node in the computation is the master
It assigns input pieces to the map nodes
And indicates which outputs go to which reduce nodes
Also keeps track of the health of participant nodes
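
Here is a sketch of the master’s bookkeeping. It assumes workers ask the master for work, contact it periodically, and are presumed dead after a fixed period of silence. The `Master` class, its timeout, and the file-naming scheme are hypothetical. Times are passed in explicitly so the logic is easy to test.

```python
from typing import Dict, List, Optional


class Master:
    """Hypothetical MapReduce master: assigns input pieces and tracks worker health."""

    def __init__(self, pieces: List[str], num_reducers: int, timeout: float = 10.0) -> None:
        self.pending = list(pieces)             # input pieces not yet handed out
        self.running: Dict[str, str] = {}       # piece -> worker currently mapping it
        self.num_reducers = num_reducers
        self.timeout = timeout                  # seconds of silence before presumed dead
        self.last_seen: Dict[str, float] = {}   # worker -> time of last contact

    def heartbeat(self, worker: str, now: float) -> None:
        self.last_seen[worker] = now

    def assign(self, worker: str, now: float) -> Optional[str]:
        """Give a worker the next input piece, or None if none are left."""
        self.heartbeat(worker, now)
        if not self.pending:
            return None
        piece = self.pending.pop(0)
        self.running[piece] = worker
        return piece

    def output_files(self, piece: str) -> List[str]:
        """One output file per reduce node; file r goes to reduce node r."""
        return [f"map-{piece}-reduce-{r}" for r in range(self.num_reducers)]

    def finished(self, piece: str, worker: str, now: float) -> None:
        self.heartbeat(worker, now)
        self.running.pop(piece, None)

    def dead_workers(self, now: float) -> List[str]:
        """Workers not heard from within the timeout, in sorted order."""
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)
```

A timeout cannot tell a crashed worker from a very slow one. This is one reason re-running a piece must be safe, which the determinism slide addresses.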

38 Handling Failures in MapReduce
Relatively simple
If a map node fails, redo its work on another node
Reduce nodes will need to wait for the new node’s results
But the result is correct

39 What If a Reduce Node Fails?
Pretty much the same answer
Choose a node to replace it
Send that node the input files that were destined for the failed node
Reduce results are also written atomically
– So none of this is necessary if the node failed after finishing everything
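
The map and reduce failure rules can be combined into one decision about what to re-run. The sketch assumes the master records which worker owns each map piece and each reduce task, which map pieces are finished, and which reduce outputs were committed. The function `tasks_to_redo` and its arguments are hypothetical.

```python
from typing import Dict, List, Set


def tasks_to_redo(dead: str,
                  map_owner: Dict[str, str],
                  map_done: Set[str],
                  reduce_owner: Dict[int, str],
                  reduce_committed: Set[int]) -> Dict[str, List]:
    """Decide what must be re-run on other nodes after worker `dead` fails.

    Assumption: a finished map piece's output survives the failure (e.g. it is in
    a distributed file system). If it lived only on the dead node's local disk,
    finished pieces would have to be redone too.
    """
    # Map pieces the dead node had not finished go to another node.
    maps = sorted(p for p, w in map_owner.items() if w == dead and p not in map_done)
    # A reduce task is redone from its input files unless its output was already
    # written atomically, in which case nothing was lost.
    reduces = sorted(r for r, w in reduce_owner.items()
                     if w == dead and r not in reduce_committed)
    return {"map": maps, "reduce": reduces}
```

For example, if worker `w1` dies after finishing map piece `p0` and committing reduce task 1, only map piece `p1` and reduce task 0 need a new home.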

40 MapReduce and Determinism
Most MapReduce applications are deterministic
So restarting a computation a second time produces the same results
Not an absolute requirement
But nondeterministic applications don’t get such clean semantics

41 MapReduce and Load Balancing
Depending on the map function and the data, some pieces may take longer to process
Leading to the possibility of poor assignments of work to nodes
In turn, leading to longer run times
Handled by dividing inputs into lots of pieces (~100 per worker machine)

42 How Does That Help?
First, better load balancing
Second, if a map node fails after completing a piece, there is no need to redo that piece
Just assign the incomplete pieces to another node
Also, if load balance is poor anyway, idle nodes can take on extra pieces
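
Here is a small simulation of why many small pieces help. It assumes each piece goes to whichever worker becomes idle first, which is also how idle nodes “take on extra pieces”. The piece costs are made-up illustrative numbers. The same 60 units of work are split once as 4 coarse pieces, one of them slow, and once as 40 finer pieces.

```python
import heapq
from typing import List


def makespan(piece_costs: List[float], num_workers: int) -> float:
    """Finish time when each piece, in order, goes to whichever worker frees up first."""
    workers = [(0.0, w) for w in range(num_workers)]  # (time worker becomes idle, id)
    heapq.heapify(workers)
    for cost in piece_costs:
        free_at, w = heapq.heappop(workers)            # earliest idle worker
        heapq.heappush(workers, (free_at + cost, w))
    return max(t for t, _ in workers)


coarse = [10, 10, 10, 30]         # one piece per worker; the last region is slow
fine = [1] * 30 + [3] * 10        # the same data cut into ten times as many pieces
```

With 4 workers, the coarse split finishes at time 30 because one worker is stuck with the slow piece. The fine split finishes at 16, close to the ideal 60 / 4 = 15.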

43 Using MapReduce
Designed as a library
– In C++
User defines the map and reduce functions
Links to the library
Provides the input and number of nodes
The library handles the details
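
The original library is in C++. As a language-neutral illustration, here is a Python sketch of the same division of labor. The user writes only a map function that emits (key, value) pairs and a reduce function over all values for one key. Everything else is the library’s job: running maps in parallel, grouping values by key, and running reduces. `run_mapreduce` is a hypothetical name, not the API of Google’s library or of Hadoop, and threads stand in for nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def run_mapreduce(inputs: List[str],
                  mapper: Callable[[str], Iterable[Tuple[K, V]]],
                  reducer: Callable[[K, List[V]], R],
                  num_workers: int) -> Dict[K, R]:
    """Hypothetical library entry point: the caller supplies only the two
    functions, the input pieces, and a worker count."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(lambda piece: list(mapper(piece)), inputs))
    groups: Dict[K, List[V]] = defaultdict(list)  # the shuffle: group values by key
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return dict(zip(keys, pool.map(lambda k: reducer(k, groups[k]), keys)))


# What the user writes: just the map and reduce functions.
def word_count_map(text: str) -> Iterator[Tuple[str, int]]:
    for word in text.split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)
```

Switching to a different job, such as the string search, means writing two new small functions. The framework code stays the same.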

44 MapReduce and Hadoop
The original MapReduce library was built by Google
Apache has built an open source version
– In Java
– As part of its Hadoop package
– Which also includes things like a distributed file system
There are other open source versions of MapReduce

45 Use of MapReduce
Extremely widespread
If you can define your task using MapReduce, other things become easy
High-quality open source libraries are available
MapReduce itself handles most tricky issues
Of course, not everything fits MapReduce

46 A Related Issue
What about those files?
They’re stored on different machines
How do we get them from one machine (say, a map node) to another (a reduce node)?
Probably we need a distributed file system...

47 Distributed File Systems for MapReduce
Google has its own distributed file system
– Not bundled with MapReduce
– But works well with it
The Hadoop package also has one
– Designed to work well there
Both are particularly intended for cluster or cloud environments

48 Conclusion
Cloud computing is an inherently distributed system
It avoids or hides many messy issues
Doesn’t solve everyone’s problems, but of very wide utility
Good cloud service requires handling many tricky distributed systems issues

