Lecture 2-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) August 26, 2010 Lecture 2  2010, I. Gupta.

Slides:



Advertisements
Similar presentations
University of Notre Dame
Advertisements

EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.
IT INFRASTRUCTURE AND EMERGING TECHNOLOGIES
Amazon. Cloud computing also known as on-demand computing or utility computing. Similar to other utility providers like electric, water, and natural gas,
Cloud Computing Imranul Hoque. Today’s Cloud Computing.
By Adam Balla & Wachiu Siu
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 6 2/13/2015.
Cloud Computing PRESENTED BY- Rajat Dixit (rd2392)
Emerging Platform#6: Cloud Computing B. Ramamurthy 6/20/20141 cse651, B. Ramamurthy.
Ken Birman. Massive data centers We’ve discussed the emergence of massive data centers associated with web applications and cloud computing Generally.
Public cloud definition Public cloud is a cloud in which Cloud infrastructure is available to the general public. Public cloud define cloud computing.
CS 525 Advanced Distributed Systems Spring 2015 Indranil Gupta (Indy) Lecture 2 What(’s in) the Cloud? January 22, 2015 All slides © IG.
11 Yeah! That’s what I’d like to know. Indranil Gupta (Indy) Lecture 2 What(’s in) the Cloud? January 20, 2011 CS 525 Advanced Distributed Systems Spring.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 2: Introduction to Cloud Computing All slides © IG.
Cloud Computing (101).
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 4.
1 Indranil Gupta (Indy) Lecture 2 What(’s in) the Cloud? January 23, 2014 CS 525 Advanced Distributed Systems Spring 2014 All Slides © IG 1.
INTRODUCTION TO CLOUD COMPUTING Cs 595 Lecture 5 2/11/2015.
Data-intensive Computing on the Cloud: Concepts, Technologies and Applications B. Ramamurthy This talks is partially supported by National.
Chapter-7 Introduction to Cloud Computing Cloud Computing.
Lecture 2-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) August 29, 2013 Lecture 2 What is Cloud Computing?
CLOUD COMPUTING. A general term for anything that involves delivering hosted services over the Internet. And Cloud is referred to the hardware and software.
Cloud Computing – The Cloud Dr. Jie Liu. Definition  Cloud computing is Web-based processing, whereby shared resources, software, and information are.
On Availability of Intermediate Data in Cloud Computations Steven Y. Ko, Imranul Hoque, Brian Cho, and Indranil Gupta Distributed Protocols Research Group.
Clouds on IT horizon Faculty of Maritime Studies University of Rijeka Sanja Mohorovičić INFuture 2009, Zagreb, 5 November 2009.
Cloud Computing Cloud Computing Class-1. Introduction to Cloud Computing In cloud computing, the word cloud (also phrased as "the cloud") is used as a.
For more notes and topics visit:
A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.
Lecture 3-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) August 31, 2010 Lecture 3  2010, I. Gupta.
PhD course - Milan, March /09/ Some additional words about cloud computing Lionel Brunie National Institute of Applied Science (INSA) LIRIS.
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over the Internet. Cloud is the metaphor for.
Cloud Computing Definitions Cloud The set of hardware, networks, storage, services and interfaces that combine to deliver computing as a service Cloud.
Cloud Computing. What is Cloud Computing? Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable.
Geographic Information Systems Cloud GIS. ► The use of computing resources (hardware and software) that are delivered as a service over the Internet ►
Software Architecture
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
Cloud Computing. Cloud Computing defined Dynamically scalable, device-independent and task-centric computing resources are provided online, with all charges.
Cloud Computing Dave Elliman 11/10/2015G53ELC 1. Source: NY Times (6/14/2006) The datacenter is the computer!
Cloud Architecture Chapter 2. SPI Model Cloud Computing Classification Model – SPI - SaaS: (Software as a Service) - PaaS (Platform as a Service) - IaaS.
Presented by: Mostafa Magdi. Contents Introduction. Cloud Computing Definition. Cloud Computing Characteristics. Cloud Computing Key features. Cost Virtualization.
1 © 2009 Cisco Systems, Inc. All rights reserved.Cisco Confidential Cloud Computing – The Value Proposition Wayne Clark Architect, Intelligent Network.
Distributed Systems Introduction 2 CS 403 Distributed Systems D.S. Theory Peer to peer systems Cloud Computing Sensor Networks.
Enterprise Cloud Computing
CLOUD COMPUTING
11 Indranil Gupta (Indy) Lecture 4 Cloud Computing: Older Testbeds January 28, 2010 CS 525 Advanced Distributed Systems Spring 2010 All Slides © IG.
CLOUD COMPUTING RICH SANGPROM. What is cloud computing? “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a.
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-1.
Web Technologies Lecture 13 Introduction to cloud computing.
Cloud Architecture. SPI Model Cloud Computing Classification Model – SPI Cloud Computing Classification Model – SPI - SaaS: (Software as a Service) -
1 TCS Confidential. 2 Objective : In this session we will be able to learn:  What is Cloud Computing?  Characteristics  Cloud Flavors  Cloud Deployment.
CS 525 Advanced Distributed Systems Spring 2016 Indranil Gupta (Indy) Lecture 2 What(’s in) the Cloud? January 21, 2016 All slides © IG.
RANDY MODOWSKI COSC Cloud Computing. Road Map What is Cloud Computing? History of “The Cloud” Cloud Milestones How Cloud Computing is being used.
Data Centers and Cloud Computing 1. 2 Data Centers 3.
KAASHIV INFOTECH – A SOFTWARE CUM RESEARCH COMPANY IN ELECTRONICS, ELECTRICAL, CIVIL AND MECHANICAL AREAS
© 2012 Eucalyptus Systems, Inc. Cloud Computing Introduction Eucalyptus Education Services 2.
Distributed Systems Lecture 2 Cloud computing 1. Previous lecture Overview of distributed systems Differences between parallel and distributed computing.
Lecture 2-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) August 30, 2012 Lecture 2 What is Cloud.
© 2007 IBM Corporation IBM Software Strategy Group IBM Google Announcement on Internet-Scale Computing (“Cloud Computing Model”) Oct 8, 2007 IBM Confidential.
What is Cloud Computing 1. Cloud computing is a service that helps you to perform the tasks over the Internet. The users can access resources as they.
Lecture 1 Book: Hadoop in Action by Chuck Lam Online course – “Cloud Computing Concepts” lecture notes by Indranil Gupta.
Lecture 2-3: Introduction to Cloud Computing
Avenues International Inc.
Amazon AWS fundamental Visite- Ravindra verma.
Trends: Technology Doubling Periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu compute capacity: 18 mos Then and Now Bandwidth 1985:
Introduction to Cloud Computing
Cloud Computing Cloud computing refers to “a model of computing that provides access to a shared pool of computing resources (computers, storage, applications,
Emerging technologies-
Presentation transcript:

Lecture 2-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) August 26, 2010 Lecture 2  2010, I. Gupta

Lecture 2-2 Clouds are Water Vapor Oracle has a Cloud Computing Center. And yet… Larry Ellison’s Rant on Cloud Computing

Lecture 2-3 The Hype! Gartner - Cloud computing revenue will soar faster than expected and will exceed $150 billion within five years. Forrester - Cloud-Based Is Often Cheaper Than On-Premise Vivek Kundra, CTO of Obama Government: “Growing adoption of cloud computing could improve data sharing and promote collaboration among federal, state and local governments.” E.g: fedbizopps.gov Merrill Lynch: “By 2011 the volume of cloud computing market opportunity would amount to $160bn, including $95bn in business and productivity apps ( , office, CRM, etc.) and $65bn in online advertising.” IDC: “Spending on IT cloud services will triple in the next 5 years, reaching $42 billion and capturing 25% of IT spending growth in 2012.” Sources: and

Lecture 2-4 $$$ Ingo Elfering, Vice President of Information Technology Strategy, GlaxoSmithKline: “With Online Services, we are able to reduce our IT operational costs by roughly 30% of what we’re spending now and introduce a variable cost subscription model for these technologies that allows us to more rapidly scale or divest our investment as necessary as we undergo a transformational change in the pharmaceutical industry” Jim Swartz, CIO, Sybase: “At Sybase, a private cloud of virtual servers inside its data centre has saved nearly $US2 million annually since 2006, Swartz says, because the company can share computing power and storage resources across servers.” Dave Power, Associate Information Consultant at Eli Lilly and Company: “With AWS, Powers said, a new server can be up and running in three minutes (it used to take Eli Lilly seven and a half weeks to deploy a server internally) and a 64- node Linux cluster can be online in five minutes (compared with three months internally). The deployment time is really what impressed us. It's just shy of instantaneous." Sources: and

Lecture 2-5 What is a Cloud? It’s a cluster! It’s a supercomputer! It’s a datastore! It’s superman! None of the above All of the above Cloud = Lots of storage + compute cycles nearby

Lecture 2-6 What is a Cloud? A single-site cloud (aka “Datacenter”) consists of –Compute nodes (split into racks) –Switches, connecting the racks –A network topology, e.g., hierarchical –Storage (backend) nodes connected to the network –Front-end for submitting jobs –Software Services: A geographically distributed cloud consists of –Multiple such sites –Each site perhaps with a different structure and services

Lecture 2-7 A Sample Cloud Topology Top of the Rack Switch Core Switch Servers Rack So then, what is a cluster?

Lecture 2-8 Scale of Industry Datacenters Microsoft [NYTimes, 2008] –150,000 machines –Growth rate of 10,000 per month –Largest datacenter: 48,000 machines –80,000 total running Bing Yahoo! [Hadoop Summit, 2009] –25,000 machines –Split into clusters of 4000 AWS EC2 (Oct 2009) –40,000 machines –8 cores/machine Google –(Rumored) several hundreds of thousands of machines

Lecture 2-9 What(’s new) in Today’s Clouds? Three major features: I.On-demand access: Pay-as-you-go, no upfront commitment. –Anyone can access it (e.g., Washington Post – Hillary Clinton example) II.Data-intensive Nature: What was MBs has now become TBs. –Daily logs, forensics, Web data, etc. –Do you know the size of Wikipedia dump? III.New Cloud Programming Paradigms: MapReduce/Hadoop, Pig Latin, DryadLinq, Swift, and many others. –High in accessibility and ease of programmability Combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing.

Lecture Timesharing Companies & Data Processing Industry 2010 Grids Peer to peer systems Clusters The first datacenters! PCs (not distributed!) Clouds and datacenters “A Cloudy History of Time” © IG 2010

Lecture 2-11 Timesharing Industry (1975): Market Share: Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10% Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108 Grids (1980s-2000s): GriPhyN (1970s-80s) Open Science Grid and Lambda Rail (2000s) Globus & other standards (1990s-2000s) First large datacenters: ENIAC, ORDVAC, ILLIAC Many used vacuum tubes and mechanical relays P2P Systems (90s-00s) Many Millions of users Many GB per day Data Processing Industry : $70 M. 1978: $3.15 Billion. Berkeley NOW Project Supercomputers Server Farms (e.g., Oceano) “A Cloudy History of Time” © IG 2010 Clouds

Lecture 2-12 Trends: Technology Doubling Periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu speed: 18 mos Then and Now Bandwidth –1985: mostly 56Kbps links nationwide –2004: 155 Mbps links widespread Disk capacity –Today’s PCs have 100GBs, same as a 1990 supercomputer

Lecture 2-13 Trends: Users Then and Now Biologists: –1990: were running small single-molecule simulations –2004: want to calculate structures of complex macromolecules, want to screen thousands of drug candidates, sequence very complex genomes Physicists –CERN’s Large Hadron Collider producing many PB/year

Lecture 2-14 Prophecies In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating “like a power company or water company”. Plug your thin client into the computing Utility and Play your favorite Intensive Compute & Communicate Application –[Have today’s clouds brought us closer to this reality?]

Lecture 2-15 I. On-demand access: *aaS Classification On-demand: renting a cab vs (previously) renting a car, or buying one. E.g.: –AWS Elastic Compute Cloud (EC2): $0.086-$1.16 per CPU hour –AWS Simple Storage Service (S3): $0.055-$0.15 per GB-month HaaS: Hardware as a Service –You get access to barebones hardware machines, do whatever you want with them, Ex: Your own cluster –Not always a good idea (why?) IaaS: Infrastructure as a Service –You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this. Often said to subsume HaaS. –Ex: Amazon Web Services (AWS: EC2 and S3), Eucalyptus, Rightscale.

Lecture 2-16 I. On-demand access: *aaS Classification PaaS: Platform as a Service –You get access to flexible computing and storage infrastructure, coupled with a software platform (often tightly) –Ex: Google’s AppEngine SaaS: Software as a Service –You get access to software services, when you need them. Often said to subsume SOA (Service Oriented Architectures). –Ex: Microsoft’s LiveMesh, MS Office on demand

Lecture 2-17 II. Data-intensive Computing Computation-Intensive Computing –Example areas: MPI-based, High-performance computing, Grids –Typically run on supercomputers (e.g., NCSA Blue Waters) Data-Intensive –Typically store data at datacenters –Use compute nodes nearby –Compute nodes run computation services In data-intensive computing, the focus shifts from computation to the data: CPU utilization no longer the most important resource metric

Lecture 2-18 III. New Cloud Programming Paradigms Easy to write and run highly parallel programs in new cloud programming paradigms: Google: MapReduce and Sawzall Yahoo: Hadoop and Pig Latin Microsoft: DryadLINQ Facebook: Hive Amazon: Elastic MapReduce service (pay-as-you-go) Google (MapReduce) –Indexing: a chain of 24 MapReduce jobs –~200K jobs processing 50PB/month (in 2006) Yahoo! (Hadoop + Pig) –WebMap: a chain of 100 MapReduce jobs –280 TB of data, 2500 nodes, 73 hours Facebook (Hadoop + Hive) –~300TB total, adding 2TB/day (in 2008) –3K jobs processing 55TB/day Similar numbers from other companies, e.g., Yieldex, eharmony.com, etc.

Lecture 2-19 Two Categories of Clouds Industrial Clouds –Can be either a (i) public cloud, or (ii) private cloud –Private clouds are accessible only to company employees –Public clouds provide service to any paying customer: »Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per GB-month stored »Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images, pay per CPU hour used »Google AppEngine: develop applications within their appengine framework, upload data that will be imported into their format, and run Academic Clouds –Allow researchers to innovate, deploy, and experiment –Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop –Cloud Computing Testbed UIUC): first cloud testbed to support systems research. Runs: (i) apps programmed atop Hadoop and Pig, (ii) systems-level research on this first generation of cloud computing models (~HaaS), and (iii) Eucalyptus services (~AWS EC2). »On the 4 th floor of Siebel Center (if you care to look) –OpenCirrus: first federated cloud testbed.

Lecture 2-20 Single site Cloud: to Outsource or Own? Medium-sized organization: wishes to run a service for M months –Service requires 128 servers (1024 cores) and 524 TB –Same as UIUC CCT cloud site Outsource (e.g., via AWS): monthly cost –S3 costs: $0.12 per GB month. EC2 costs: $0.10 per Cpu hour –Storage = $ 0.12 X 524 X 1000 ~ $62 K –Total = Storage + CPUs = $62 K + $0.10 X 1024 X 24 X 30 ~ $136 K Own: monthly cost –Storage ~ $349 K / M –Total ~ $ 1555 K / M K (includes 1 sysadmin / 100 nodes) »using 0.45:0.4:0.15 split for hardware:power:network and 3 year lifetime of hardware

Lecture 2-21 Breakeven analysis: more preferable to own if: - $349 K / M < $62 K (storage) - $ 1555 K / M K < $136 K (overall) Breakeven points - M > 5.55 months (storage) » Not surprising: Cloud providers benefit monetarily most from storage - M > 12 months (overall) Single site Cloud: to Outsource or Own?

Lecture Timesharing Companies & Data Processing Industry 2010 Grids Peer to peer systems Clusters The first datacenters! PCs (not distributed!) Clouds and datacenters “A Cloudy History of Time” © IG 2010

Lecture 2-23 But were there clouds before this? Yes!

Lecture 2-24 Emulab A community resource open to researchers in academia and industry A cluster, with currently 475 nodes Founded and owned by University of Utah (led by Prof. Jay Lepreau) As a user, you can: –Grab a set of machines for your experiment –You get root-level (sudo) access to these machines –You can specify a network topology for your cluster –You can emulate any topology All images © Emulab

Lecture 2-25 A community resource open to researchers in academia and industry Currently, about 1077 nodes at 494 sites across the world Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by 494 sites Node: Dedicated server that runs components of PlanetLab services. Site: A location, e.g., UIUC, that hosts a number of nodes. Sliver: Virtual division of each node. Currently, uses VMs, but it could also other technology. Needed for timesharing across users. Slice: A spatial cut-up of the PL nodes. Per user. A slice is a way of giving each user (Unix-shell like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node. Thus, PlanetLab allows you to run real world-wide experiments. Many services have been deployed atop it, used by millions (not just researchers): Application-level DNS services, Monitoring services, etc. All images © PlanetLab

Lecture 2-26 Next Week Tuesday –More cloud computing: quick intro to MapReduce, and Grids Thursday –Failure detection –Readings: Section 12.1, parts of Section –Relevant to MP1!