
1 Indranil Gupta (Indy), Lecture 3: Cloud Computing Continued. January 22, 2013. CS 525 Advanced Distributed Systems, Spring 2013. All Slides © IG

2 “A Cloudy History of Time” (© IG 2010), a timeline leading to clouds:
–First large datacenters: ENIAC, ORDVAC, ILLIAC; many used vacuum tubes and mechanical relays
–Data Processing Industry: $70 M in 1968, $3.15 Billion in 1978
–Timesharing Industry (1975): market share Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10%; machines included Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108
–Grids (1980s-2000s): GriPhyN (1970s-80s), Open Science Grid and Lambda Rail (2000s), Globus & other standards (1990s-2000s)
–P2P Systems (90s-00s): many millions of users, many GB per day
–Berkeley NOW Project, Supercomputers, Server Farms (e.g., Oceano)
–Clouds

3 What(’s new) in Today’s Clouds? Four major features:
I. Massive scale.
II. On-demand access: pay-as-you-go, no upfront commitment.
–Anyone can access it
III. Data-intensive nature: what was MBs has now become TBs, PBs and XBs.
–Daily logs, forensics, Web data, etc.
–Do you know the size of a Wikipedia dump?
IV. New cloud programming paradigms: MapReduce/Hadoop, NoSQL/Cassandra/MongoDB and many others.
–High in accessibility and ease of programmability
–Lots of open-source
Combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing.

4 IV. New Cloud Programming Paradigms
Easy to write and run highly parallel programs in new cloud programming paradigms:
–Google: MapReduce and Sawzall
–Amazon: Elastic MapReduce service (pay-as-you-go)
Google (MapReduce)
–Indexing: a chain of 24 MapReduce jobs
–~200K jobs processing 50 PB/month (in 2006)
Yahoo! (Hadoop + Pig)
–WebMap: a chain of 100 MapReduce jobs
–280 TB of data, 2500 nodes, 73 hours
Facebook (Hadoop + Hive)
–~300 TB total, adding 2 TB/day (in 2008)
–3K jobs processing 55 TB/day
Similar numbers from other companies, e.g., Yieldex, eharmony.com, etc.
NoSQL: MySQL is an industry standard, but Cassandra is 2400 times faster!

5 What is MapReduce?
Terms are borrowed from functional languages (e.g., Lisp).
Sum of squares:
–(map square ‘(1 2 3 4))
–Output: (1 4 9 16) [processes each record sequentially and independently]
–(reduce + ‘(1 4 9 16))
–(+ 16 (+ 9 (+ 4 1)))
–Output: 30 [processes set of all records in batches]
Let’s consider a sample application: Wordcount
–You are given a huge dataset (e.g., Wikipedia dump) and asked to list the count for each of the words in each of the documents therein.
(Acknowledgments: Md Yusuf Sarwar and Md Ahsan Arefin)
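The same map-then-reduce pattern can be sketched in plain Java streams; a minimal illustration of the sum-of-squares example above (an illustration only, not part of the original slides):

import java.util.List;

public class SumOfSquares {
    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        int result = input.stream()
                          .map(x -> x * x)            // map: square each record independently -> 1, 4, 9, 16
                          .reduce(0, Integer::sum);   // reduce: fold all intermediate values with + -> 30
        System.out.println(result);                   // prints 30
    }
}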

6 Map: process individual records to generate intermediate (key, value) pairs.
Input:
–Welcome Everyone
–Hello Everyone
Intermediate (key, value) pairs:
–(Welcome, 1)
–(Everyone, 1)
–(Hello, 1)
–(Everyone, 1)

7 Map: in parallel, process individual records to generate intermediate (key, value) pairs. In the figure, MAP TASK 1 handles “Welcome Everyone” and MAP TASK 2 handles “Hello Everyone”, producing (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1).

8 Map: in parallel, process a large number of individual records to generate intermediate (key, value) pairs.
Input lines: “Welcome Everyone”, “Hello Everyone”, “Why are you here”, “I am also here”, “They are also here”, “Yes, it’s THEM!”, “The same people we were thinking of”, …
Many MAP TASKS produce intermediate pairs: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1), (Why, 1), (Are, 1), (You, 1), (Here, 1), …

9 Reduce: processes and merges all intermediate values associated with each key.
Intermediate (key, value) pairs: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)
Output (key, value) pairs: (Everyone, 2), (Hello, 1), (Welcome, 1)

10 Reduce: in parallel, processes and merges all intermediate values by partitioning keys across reduce tasks (REDUCE TASK 1 and REDUCE TASK 2 in the figure).
Intermediate (key, value) pairs: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)
Output (key, value) pairs: (Everyone, 2), (Hello, 1), (Welcome, 1)

11 Hadoop Code - Map
public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    // Split the input line into tokens and emit (word, 1) for each token
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}
// Source: http://developer.yahoo.com/hadoop/tutorial/module4.html#wordcount

12 Hadoop Code - Reduce
public static class ReduceClass extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    // Sum all the counts emitted for this key and emit (word, total)
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

13 Hadoop Code - Driver
// Tells Hadoop how to run your Map-Reduce job
public void run(String inputPath, String outputPath) throws Exception {
  // The job. WordCount contains MapClass and ReduceClass.
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("mywordcount");
  // The keys are words (strings)
  conf.setOutputKeyClass(Text.class);
  // The values are counts (ints)
  conf.setOutputValueClass(IntWritable.class);
  conf.setMapperClass(MapClass.class);
  conf.setReducerClass(ReduceClass.class);
  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));
  JobClient.runJob(conf);
}
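A possible entry point, not shown on the slide, that would let the driver above be launched from the command line (the class name WordCount and the argument order are assumptions):

public static void main(String[] args) throws Exception {
  // Hypothetical launcher: expects the input and output paths as arguments
  if (args.length != 2) {
    System.err.println("Usage: WordCount <inputPath> <outputPath>");
    System.exit(1);
  }
  new WordCount().run(args[0], args[1]);
}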

14 Some Other Applications of MapReduce
Distributed Grep:
–Input: large set of files
–Output: lines that match pattern
–Map: emits a line if it matches the supplied pattern
–Reduce: copies the intermediate data to output
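A minimal sketch of the grep Map function, assuming the old Hadoop API used on slides 11-13 and a hard-coded pattern (both are assumptions, not part of the original example):

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GrepMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {
  // Hypothetical hard-coded pattern; a real job would pass it in via JobConf
  private static final Pattern PATTERN = Pattern.compile("cloud");

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, NullWritable> output,
                  Reporter reporter) throws IOException {
    // Emit the whole line (as the key) only if it matches the pattern
    if (PATTERN.matcher(line.toString()).find()) {
      output.collect(line, NullWritable.get());
    }
  }
}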

15 Some Other Applications of MapReduce (2)
Reverse Web-Link Graph
–Input: Web graph: tuples (a, b) where page a links to page b
–Output: for each page, the list of pages that link to it
–Map: processes the web graph and for each input tuple (a, b) outputs (b, a), i.e., (target, source)
–Reduce: emits (target, list(source)), i.e., for each page the list of pages that link to it
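A sketch of both functions in the same old Hadoop API (imports omitted as on the slides; the tab-separated "a TAB b" input line format and the class names are assumptions):

public static class ReverseLinkMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    String[] edge = value.toString().split("\t");           // edge[0] = a (source), edge[1] = b (target)
    output.collect(new Text(edge[1]), new Text(edge[0]));   // emit (target, source)
  }
}

public static class ReverseLinkReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text target, Iterator<Text> sources,
                     OutputCollector<Text, Text> output,
                     Reporter reporter) throws IOException {
    // Concatenate every source page that links to this target
    StringBuilder list = new StringBuilder();
    while (sources.hasNext()) {
      if (list.length() > 0) list.append(",");
      list.append(sources.next().toString());
    }
    output.collect(target, new Text(list.toString()));      // emit (target, list(sources))
  }
}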

16 Some Other Applications of MapReduce (3)
Count of URL access frequency
–Input: log of accessed URLs, e.g., from a proxy server
–Output: for each URL, the % of total accesses for that URL
–Map: processes the web log and outputs (URL, 1)
–Multiple Reducers: emit (URL, URL_count)
(So far, like Wordcount. But we still need the %)
–Chain another MapReduce job after the one above
–Map: processes (URL, URL_count) and outputs (1, (URL, URL_count))
–1 Reducer: sums up the URL_count’s to calculate overall_count; emits multiple (URL, URL_count/overall_count)
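A sketch of the single reducer in the second (chained) job, in the same old Hadoop API (imports omitted as on the slides; the tab-separated "URL TAB count" value encoding and the class name are assumptions):

public static class PercentReducer extends MapReduceBase
    implements Reducer<IntWritable, Text, Text, FloatWritable> {
  public void reduce(IntWritable one, Iterator<Text> values,
                     OutputCollector<Text, FloatWritable> output,
                     Reporter reporter) throws IOException {
    // All (URL, URL_count) pairs arrive under the single key "1":
    // buffer them, sum the counts to get overall_count, then emit fractions.
    List<String[]> pairs = new ArrayList<String[]>();
    long overallCount = 0;
    while (values.hasNext()) {
      String[] urlAndCount = values.next().toString().split("\t");
      pairs.add(urlAndCount);
      overallCount += Long.parseLong(urlAndCount[1]);
    }
    for (String[] p : pairs) {
      float fraction = Long.parseLong(p[1]) / (float) overallCount;
      output.collect(new Text(p[0]), new FloatWritable(fraction));
    }
  }
}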

17 Some Other Applications of MapReduce (4)
(A Map task’s output is sorted, e.g., with quicksort; a Reduce task’s input is sorted, e.g., with mergesort.)
Sort
–Input: series of (key, value) pairs
–Output: sorted (key, value) pairs
–Map: (key, value) -> (key, value) (identity)
–Reducer: (key, value) -> (key, value) (identity)
–Partitioning function: partition keys across reducers based on ranges, taking the data distribution into account to balance the reducer tasks (see the sketch below)
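A minimal sketch of such a range partitioner in the old Hadoop API used earlier (the split points here are hypothetical; a real job would choose them by sampling the key distribution, e.g., as Hadoop's TotalOrderPartitioner does):

public static class RangePartitioner implements Partitioner<Text, Text> {
  // Hypothetical split points dividing the key space into 4 ranges
  private final String[] splitPoints = { "g", "n", "t" };

  public void configure(JobConf job) { }

  public int getPartition(Text key, Text value, int numReduceTasks) {
    String k = key.toString();
    int range = 0;
    // Advance to the range that contains this key
    while (range < splitPoints.length && k.compareTo(splitPoints[range]) >= 0) {
      range++;
    }
    // Guard in case the job is configured with fewer reducers than ranges
    return range % numReduceTasks;
  }
}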

18 Programming MapReduce
Externally: for the user
1. Write a Map program (short); write a Reduce program (short)
2. Submit the job; wait for the result
3. Need to know nothing about parallel/distributed programming!
Internally: for the cloud (and for us distributed systems researchers)
1. Parallelize Map
2. Transfer data from Map to Reduce
3. Parallelize Reduce
4. Implement storage for Map input, Map output, Reduce input, and Reduce output

19 Inside MapReduce
For the cloud (and for us distributed systems researchers):
1. Parallelize Map: easy! Each map task is independent of the others.
2. Transfer data from Map to Reduce: all Map output records with the same key are assigned to the same Reduce task, using a partitioning function, e.g., hash(key) (see the sketch below).
3. Parallelize Reduce: easy! Each reduce task is independent of the others.
4. Implement storage for Map input, Map output, Reduce input, and Reduce output:
–Map input: from the distributed file system
–Map output: to local disk (at the Map node); uses the local file system
–Reduce input: from (multiple) remote disks; uses the local file systems
–Reduce output: to the distributed file system
(local file system = Linux FS, etc.; distributed file system = GFS (Google File System), HDFS (Hadoop Distributed File System))
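The partitioning function in step 2 is typically just a hash of the key modulo the number of reduce tasks; Hadoop's default HashPartitioner does essentially the following (method shown in isolation, as a sketch):

public int getPartition(Text key, IntWritable value, int numReduceTasks) {
  // Mask off the sign bit so the result is non-negative, then take the modulo
  return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}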

20 Internal Workings of MapReduce (architecture figure)

21 Fault Tolerance
Worker failure
–Master keeps 3 states for each worker task (idle, in-progress, completed)
–Master sends periodic pings to each worker to keep track of it (central failure detector); see the sketch below
–If a worker fails while a task is in-progress, the task is marked as idle (so it can be rescheduled)
A Reduce task does not start until all Map tasks are done and all of its (the Reduce’s) data has been fetched
–Barrier between the Map phase and the Reduce phase
Master failure
–Use old checkpoints and bring up a secondary master
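A toy, self-contained sketch of the master-side bookkeeping described above (all class, field, and method names here are assumptions, not Google's or Hadoop's actual code): the master records task states and the last ping time per worker, and resets in-progress tasks of unresponsive workers to idle so they get re-executed elsewhere.

import java.util.HashMap;
import java.util.Map;

public class MasterSketch {
  enum TaskState { IDLE, IN_PROGRESS, COMPLETED }

  // taskId -> state, and taskId -> worker currently running it
  final Map<String, TaskState> taskState = new HashMap<>();
  final Map<String, String> taskWorker = new HashMap<>();
  // worker -> time of last successful ping (ms)
  final Map<String, Long> lastPing = new HashMap<>();

  // Called periodically: any in-progress task on a worker that has not
  // answered a ping within timeoutMs is reset to IDLE for re-execution.
  void handleWorkerFailures(long now, long timeoutMs) {
    for (Map.Entry<String, String> e : taskWorker.entrySet()) {
      String task = e.getKey(), worker = e.getValue();
      long last = lastPing.getOrDefault(worker, 0L);
      if (taskState.get(task) == TaskState.IN_PROGRESS && now - last > timeoutMs) {
        taskState.put(task, TaskState.IDLE);   // mark idle => will be rescheduled
      }
    }
  }
}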

22 Locality and Backup Tasks
Locality
–The cloud has a hierarchical topology (e.g., racks)
–GFS stores 3 replicas of each 64 MB chunk, possibly on different racks, e.g., 2 on one rack, 1 on a different rack
–MapReduce attempts to schedule a map task on a machine that contains a replica of the corresponding input data: why?
Stragglers (slow nodes)
–The slowest machine slows the entire job down (why?)
–Due to a bad disk, network bandwidth, CPU, or memory
–Keep track of the “progress” of each task (% done)
–Perform backup (replicated) execution of straggler tasks: a task is considered done when its first replica completes. This is called Speculative Execution (see the sketch below).
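A toy sketch of one possible straggler heuristic (the 20% lag threshold and all names are assumptions; real systems use more refined progress estimates): tasks whose progress lags the average by more than the threshold are picked for backup execution.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SpeculationSketch {
  // progress maps each still-running task id to its fraction done (0.0 to 1.0)
  static List<String> pickStragglers(Map<String, Double> progress) {
    double avg = progress.values().stream()
                         .mapToDouble(Double::doubleValue)
                         .average().orElse(1.0);
    // Launch backup copies for tasks lagging the average by more than 20%
    return progress.entrySet().stream()
                   .filter(e -> e.getValue() < avg - 0.20)
                   .map(Map.Entry::getKey)
                   .collect(Collectors.toList());
  }
}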

23 Grep
Workload: 10^10 100-byte records; extract records matching a rare pattern (92K matching records)
Testbed: 1800 servers, each with 4 GB RAM, dual 2 GHz Xeons, dual 160 GB IDE disks, and Gigabit Ethernet per machine (~100 Gbps aggregate)
Locality optimization helps:
–1800 machines read 1 TB at a peak of ~31 GB/s
–Without this, rack switches would limit throughput to 10 GB/s
Startup overhead is significant for short jobs

24 Discussion Points
Hadoop always either outputs complete results, or none.
–Partial results? Can you characterize the partial results of a partial MapReduce run?
Storage: is the local-write, remote-read model good for Map output/Reduce input?
–What happens on node failure?
–Can you treat intermediate data separately, as a first-class citizen?
The entire Reduce phase needs to wait for all Map tasks to finish: in other words, a barrier.
–Why? What is the advantage? What is the disadvantage?
–Can you get around this?

25 10 Challenges [Above the Clouds]
(Index: Performance, Data-related, Scalability, Logistical)
1. Availability of Service: use multiple cloud providers; use elasticity; prevent DDoS
2. Data Lock-In: standardize APIs; enable surge computing
3. Data Confidentiality and Auditability: deploy encryption, VLANs, firewalls; geographical data storage
4. Data Transfer Bottlenecks: data backup/archival; higher-BW switches; new cloud topologies; FedExing disks
5. Performance Unpredictability: QoS; improved VM support; flash memory; schedule VMs
6. Scalable Storage: invent scalable stores
7. Bugs in Large Distributed Systems: invent debuggers; real-time debugging; predictable pre-run-time debugging
8. Scaling Quickly: invent good auto-scalers; snapshots for conservation
9. Reputation Fate Sharing
10. Software Licensing: pay-for-use licenses; bulk-use sales

26 A More Bottom-Up View of Open Research Directions
Myriad interesting problems that acknowledge the characteristics that make today’s cloud computing unique: massive scale + on-demand + data-intensive + new programmability + infrastructure- and application-specific details.
–Monitoring: of systems and applications; single-site and multi-site
–Storage: massive scale; global storage; for specific apps or classes
–Failures: what is their effect, what is their frequency, how do we achieve fault-tolerance?
–Scheduling: moving tasks to data, dealing with federation
–Communication bottlenecks: within applications, within a site
–Locality: within clouds, or across them
–Cloud topologies: non-hierarchical, other hierarchical
–Security: of data, of users, of applications; confidentiality, integrity
–Availability of data
–Seamless scalability: of applications, of clouds, of data, of everything
–Inter-cloud/multi-cloud computations
–Second generation of other programming models? Beyond MapReduce!
–Pricing models, SLAs, fairness
–Green cloud computing
–Explore the limits of today’s cloud computing

27 Hmm, CCT and OpenCirrus are new. But are there no older testbeds?

28 PlanetLab
A community resource open to researchers in academia and industry: http://www.planet-lab.org/
About 1077 nodes at 494 sites across the world
Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the 494 sites
–Node: a dedicated server that runs components of PlanetLab services
–Site: a location, e.g., UIUC, that hosts a number of nodes
–Sliver: a virtual division of each node. Currently uses VMs, but it could also use other technology. Needed for timesharing across users.
–Slice: a spatial cut-up of the PL nodes, per user. A slice is a way of giving each user (Unix-shell like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node.
Thus, PlanetLab allows you to run real world-wide experiments. Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CDN services.
If you need a PlanetLab account and slice for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
All images © PlanetLab

29 Emulab
A community resource open to researchers in academia and industry: https://www.emulab.net/
A cluster, with currently 475 nodes
Founded and owned by the University of Utah (led by Prof. Jay Lepreau)
As a user, you can:
–Grab a set of machines for your experiment
–Get root-level (sudo) access to these machines
–Specify a network topology for your cluster (ns file format)
–Thus, you are not limited to only single-cluster experiments; you can emulate any topology
If you need an Emulab account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
All images © Emulab

30 And then there were… Grids! What is it?

31 Example: Rapid Atmospheric Modeling System (RAMS), ColoState U
Hurricane Georges, 17 days in Sept 1998
–“RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data”
–Used 5 km spacing instead of the usual 10 km
–Ran on 256+ processors
Computation-intensive computing (or HPC = high performance computing)
Can one run such a program without access to a supercomputer?

32 Distributed Computing Resources (figure): sites at Wisconsin, MIT, and NCSA

33 An Application Coded by a Physicist
A DAG of four jobs: Job 0, Job 1, Job 2, Job 3.
–Output files of Job 0 are input to Job 2; output files of Job 2 are input to Job 3
–Jobs 1 and 2 can be concurrent

34 An Application Coded by a Physicist (2)
Focusing on Job 2: output files of Job 0 (several GBs) are its input; its output files are input to Job 3.
–A job may take several hours/days; it is computation intensive, so massively parallel
–4 stages of a job: Init, Stage in, Execute, Stage out, Publish

35 (Figure) The four jobs (Job 0 to Job 3) and the three sites: Wisconsin, MIT, NCSA

36 (Figure) The same jobs and sites, with the Condor protocol used within Wisconsin and the Globus protocol used across the sites (Wisconsin, MIT, NCSA)

37 Globus Protocol (across sites: Wisconsin, MIT, NCSA)
–The internal structure of the different sites is invisible to Globus
–External allocation & scheduling
–Stage in & stage out of files

38 Condor Protocol (within a site, e.g., Wisconsin, running Job 0 and Job 3)
–Internal allocation & scheduling
–Monitoring
–Distribution and publishing of files

39 Tiered Architecture (OSI 7-layer-like), from top to bottom:
–High energy physics apps
–Resource discovery, replication, brokering
–Globus, Condor
–Workstations, LANs

40 The Grid Recently
–“A parallel Internet”
–Some are 40 Gbps links! (the TeraGrid links)

41 Globus Alliance
The alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, and the Swedish Center for Parallel Computers.
Activities: research, testbeds, software tools, applications
Globus Toolkit (latest ver: GT3)
–“The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of new Open Grid Services Architecture (OGSA).”

42 Some Things Grid Researchers Consider Important
–Single sign-on: a collective job set should require once-only user authentication
–Mapping to local security mechanisms: some sites use Kerberos, others use Unix
–Delegation: credentials to access resources are inherited by subcomputations, e.g., from job 0 to job 1
–Community authorization: e.g., third-party authentication
For clouds, you additionally need to worry about failures, scale, the on-demand nature, and so on.

43 Discussion Points
–Cloud computing vs. Grid computing: what are the differences?
–National Lambda Rail: hot in the 2000s, funding pulled in 2009
–What has happened to the Grid computing community? See the Open Cloud Consortium; see the CCA conference.

44 (Entrepreneurial) Project: Where to Get Your Ideas From
Sometimes the best ideas come from something that you would rather use yourself (or someone close to you would):
–Adobe’s John Warnock developed Illustrator because his wife was a graphic designer and the LaserWriter was not enough
–Bluemercury.com was founded by a woman interested in high-end cosmetics herself
–Militaryadvantage.com was started by a young veteran who had had trouble finding out about benefits and connecting with other servicemen and women when he was in service
–Facebook was developed by Zuckerberg, who wanted to target exclusive clubs at Harvard for his own reasons
–Flickr developed its photo IM’ing and sharing service internally, for themselves
–Craigslist originally started as an email list for events (and then the cc feature broke, forcing Newmark to put it all on a website)
–Yahoo! was started by Yang and Filo, when they were grad students, as a collection of links to research papers in the EE field. Slowly they came to realize that this idea could be generalized to the entire web.
Sometimes the best ideas come bottom-up from what you are really passionate about:
–Gates and Microsoft, Wozniak and Apple, Zuckerberg and Facebook
The above two categories are not mutually exclusive!
(Geschke, Adobe) “If you aren't passionate about what you are going to do, don't do it. Work smart and not long, because you need to preserve all of your life, not just your work life.”

45 Announcements
–Please sign up for a presentation slot by this Thursday’s office hours
–Next up: peer-to-peer systems

