Distributed Systems
Lecture 2: Cloud computing
Previous lecture
– Overview of distributed systems
– Differences between parallel and distributed computing
– Challenges in distributed computing
– Distributed computing models
Motivation
Companies’ IT infrastructures are hosted on premise, which brings:
– Maintenance and operational costs
– Upgrade costs
– Personnel costs
– Limited space
Clouds offer a new way to provision IT infrastructure:
– Outsource the infrastructure: off premise, on demand
– No maintenance, upgrade or personnel costs
– Virtually unlimited capacity
Clouds vs. on premise
– Better resource planning
– Lower costs
  – 5-year infrastructure rule
  – Electricity bills
  – Downtime
Cost: a major player
Dave Powers, Associate Information Consultant at Eli Lilly and Company:
– “With AWS, Powers said, a new server can be up and running in three minutes (it used to take Eli Lilly seven and a half weeks to deploy a server internally) and a 64-node Linux cluster can be online in five minutes (compared with three months internally). … It's just shy of instantaneous.”
Ingo Elfering, Vice President of Information Technology Strategy, GlaxoSmithKline:
– “With Online Services, we are able to reduce our IT operational costs by roughly 30% of what we’re spending.”
Jim Swartz, CIO, Sybase:
– “At Sybase, a private cloud of virtual servers inside its data centre has saved nearly $US2 million annually since 2006, Swartz says, because the company can share computing power and storage resources across servers.”
Hundreds of startups in Silicon Valley can harness large computing resources without buying their own machines.
Definition
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” (NIST)
Key characteristics
1. On-demand access: storage, computation, network, applications
2. Broad network access
3. Pay-per-use policy: per hour, per minute, per GB, per request
4. Resource pooling: virtually unlimited resources
5. Rapid elasticity: add/remove VM cores and memory, add/remove VMs
6. New programming paradigms: MapReduce, Hadoop, NoSQL (Cassandra, MongoDB), …
7. Data-intensive nature: MBs have become TBs, PBs, …
   – Daily logs, web data, scientific data, …
Gartner hype cycle [figure]
What does a cloud look like?
A walk through Facebook’s datacenter in Prineville, Oregon
– Facebook OpenCompute
– One of the world’s most energy-efficient datacenters
– $210M investment
Source: Gigaom article from
But we also need energy and cooling
– Off-site and on-site power generation units
– Cooling units [figure: air is sucked in from the top; purified water is sprayed into the air]
On-demand, pay-per-use
On-demand self-service
– A consumer can provision computing services such as server resources, network and storage automatically, as required
Geographically distributed, broad network access
– The services are geographically distributed and broadly accessible over the Internet, and hence can be used with thick or thin clients (such as mobile phones)
Multi-tenant resource pooling
– The resources are shared among multiple clients using the cloud, for better utilization of the underlying infrastructure. The allocation of resources is transparent to the consumer
Pay per use
– Hourly (Amazon, Azure)
– Per minute (Google)
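Billing granularity matters because a partially used billing unit is normally charged in full. The sketch below compares per-hour and per-minute billing for the same job; the $0.10/hour rate and the helper name `billed_cost` are hypothetical, chosen only for illustration, and real provider price lists and rounding rules differ.

```python
import math


def billed_cost(runtime_minutes: float, hourly_rate: float, granularity_minutes: int) -> float:
    """Cost of one VM run when usage is rounded up to whole billing units.

    granularity_minutes=60 models per-hour billing, 1 models per-minute billing.
    The rate is an illustrative assumption, not a real price.
    """
    units = math.ceil(runtime_minutes / granularity_minutes)  # partial units are charged in full
    return units * granularity_minutes / 60 * hourly_rate


# A 61-minute job at a hypothetical $0.10/hour:
print(billed_cost(61, 0.10, 60))  # per-hour billing charges two full hours: 0.2
print(billed_cost(61, 0.10, 1))   # per-minute billing charges only the 61 minutes used
```

With hourly billing the job pays for 120 minutes; with per-minute billing it pays for 61, so finer granularity brings the bill closer to actual use.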
Resource elasticity
Rapid elasticity
– The consumer can rapidly request or release resources based on their requirements. This is useful for quickly scaling out in response to changing demand
Measured service (utility computing)
– Different metering mechanisms are in place to monitor compute, network, storage and any other specialized services
– The consumer pays only for the services actually used (pay-per-use model)
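Rapid elasticity is typically driven by the metered data itself: a monitored utilization figure decides how many VMs to request or release. The following is a minimal sketch of a target-tracking rule, assuming a 60% CPU target and a pool bounded between 1 and 20 VMs; the function `desired_vms` and all thresholds are hypothetical and do not correspond to any provider's API.

```python
def desired_vms(current_vms: int, avg_cpu_pct: int, target_cpu_pct: int = 60,
                min_vms: int = 1, max_vms: int = 20) -> int:
    """Return how many VMs a simple target-tracking rule would request.

    If the pool runs at avg_cpu_pct, the same total load spread at
    target_cpu_pct needs current_vms * avg / target VMs (rounded up).
    The 60% target and the 1..20 bounds are illustrative assumptions.
    """
    if current_vms < 1 or target_cpu_pct <= 0:
        raise ValueError("need at least one VM and a positive target")
    total_load = current_vms * avg_cpu_pct      # load measured in "percent of one VM"
    needed = -(-total_load // target_cpu_pct)    # integer ceiling division
    return max(min_vms, min(max_vms, needed))    # respect the pool bounds


# Scale out under load, scale in when idle:
print(desired_vms(4, 90))  # 6
print(desired_vms(4, 30))  # 2
```

Integer percentages keep the ceiling division exact; the clamping step reflects that even "virtually unlimited" resources are usually capped by a quota or budget.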
Scale out or scale up?
Scale out
– Add computing nodes
– Pros: unlimited extensibility
– Cons: hard to achieve
  – Requires a flexible structure of the software system
  – Replication of compute nodes and data has to be supported by the system
Scale up
– Add resources to existing nodes
– Pros: easy to achieve
– Cons: limited by the node’s maximum hardware capacity
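The trade-off can be made concrete with a toy capacity planner: scale up while one node can still hold the load, and switch to scaling out once the node’s maximum hardware capacity is reached. This sketch assumes capacity is measured in cores and that the workload can be split across nodes (the hard part the slide lists under "Cons"); `plan_capacity` and `ScalingPlan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScalingPlan:
    strategy: str        # "no change", "scale up" or "scale out"
    nodes: int           # number of nodes after the change
    cores_per_node: int  # cores on each node after the change


def plan_capacity(required_cores: int, current_node_cores: int,
                  max_node_cores: int) -> ScalingPlan:
    """Prefer scaling up a single node until its hardware limit is reached,
    then fall back to scaling out with nodes of the maximum size.
    Assumes the workload can be split across nodes when scaling out."""
    if required_cores <= current_node_cores:
        return ScalingPlan("no change", 1, current_node_cores)
    if required_cores <= max_node_cores:
        # Scale up: one bigger node is enough.
        return ScalingPlan("scale up", 1, required_cores)
    # Scale out: ceiling division gives the number of maximum-size nodes.
    nodes = -(-required_cores // max_node_cores)
    return ScalingPlan("scale out", nodes, max_node_cores)


print(plan_capacity(32, 8, 64))   # ScalingPlan(strategy='scale up', nodes=1, cores_per_node=32)
print(plan_capacity(200, 8, 64))  # ScalingPlan(strategy='scale out', nodes=4, cores_per_node=64)
```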
Service models
[Figure: IaaS, PaaS and SaaS stacks (Hardware, Virtualization Software, OS, Application Platform (distributed), Applications), with the layers split between the Cloud Provider and the Developer/User]
Infrastructure as a Service (IaaS): utility computing
– Datacenter as a service
– The client provisions processing, storage and networks, on which she/he can run arbitrary software such as an OS or applications
– Full control of the infrastructure through virtualized resources
– Example: Amazon EC2
Service models (2)
Platform as a Service (PaaS)
– Developer point of view
– Deploy applications on the cloud using programming languages, libraries and tools provided by the cloud provider
– No management control of the infrastructure
– Examples: Google App Engine, Microsoft Azure
Software as a Service (SaaS)
– End-user point of view
– Use existing applications deployed on the cloud
– Software experiences are delivered through the Internet
– Examples: Google Drive, Flickr, Gmail
Cloud stack [figure]
Deployment models
Private clouds
– Cloud infrastructure owned, managed and operated by a single organization
Community clouds
– Infrastructure provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns
– Owned, managed and operated by one or more organizations in the community
Deployment models (2)
Public clouds
– Cloud infrastructure provisioned for public use
– Owned, managed and operated by a business, academic or government institution, or a combination of them
– Usually accessible following a pay-per-use billing model
Hybrid clouds
– Composition of two or more private/community/public clouds that remain unique entities but are bound together by standards or proprietary technology
Advantages of each model [figure]
Two types of clouds
Industrial clouds
– Can be either public or private
– Private clouds are accessible only to company employees, e.g., EWS, or Yahoo’s private clusters for its employees
– Public clouds provide service to any paying customer:
  – Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per GB-month stored
  – Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images, pay per CPU hour used
  – Google AppEngine: develop applications within the AppEngine framework, upload data that will be imported into their format, and run them
Academic clouds
– Allow researchers to innovate, deploy and experiment
– Cloud Computing Testbed (UIUC): first cloud testbed to support Hadoop and HaaS
– OpenCirrus: first federated cloud testbed
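"Pay per GB-month stored" means storage is charged by its time-weighted average over the month, not by the peak amount. The sketch below assumes a 30-day month and a made-up price per GB-month; the helpers `gb_months` and `storage_bill` are hypothetical, and actual S3 billing details (metering interval, tiers, request charges) are not modeled.

```python
def gb_months(usage: list[tuple[float, float]], days_in_month: int = 30) -> float:
    """Convert (gigabytes stored, days stored) periods into GB-months.

    A GB-month is the time-weighted average storage over the month:
    100 GB kept for half a month counts as 50 GB-months.
    The 30-day month is an assumption.
    """
    return sum(gb * days for gb, days in usage) / days_in_month


def storage_bill(usage: list[tuple[float, float]], price_per_gb_month: float,
                 days_in_month: int = 30) -> float:
    """Pay-per-use storage charge; the price is a hypothetical input, not a real rate."""
    return gb_months(usage, days_in_month) * price_per_gb_month


# 100 GB for the whole month, plus another 50 GB uploaded halfway through:
usage = [(100, 30), (50, 15)]
print(gb_months(usage))  # 125.0
```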
Control vs. productivity
Different service models (IaaS, PaaS) provide different levels of control and productivity, in terms of management overhead and administration requirements
– IaaS provides greater control, since everything from the OS to the platform to the application is under the developer’s control
– PaaS gives higher productivity, since the details of the underlying platform (e.g., handling scalability or VM lifecycle management) are hidden from the user
Public vs. private vs. hybrid cloud deployments also imply different levels of control over the infrastructure, as well as over data and computation
Control vs. productivity [figure]
Cloud concepts: virtualization
The main aspect of cloud computing: elastic, on demand
– Pay as you go
– Use as much as you want, whenever you want
These notions are practical only if we have
– A lot of flexibility
– Efficiency in the back end
These are readily available in virtualized environments and machines
Virtualization
– The creation of a virtual (rather than actual) version of something, such as an operating system, a server, a storage device or network resources
– Allows sharing of physical resources among multiple users (tenants)
– Allows deployment of hardware-agnostic software
– Allows easy configuration of virtual machine images and quick deployment of a large number of services
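Sharing physical resources among tenants means someone must decide which VMs go on which physical host. One classic heuristic is first-fit bin packing; the sketch below uses it only as an illustration, assuming identical hosts and a single resource dimension (e.g., GB of RAM). The function `first_fit` is hypothetical, and real cloud schedulers weigh many more factors (CPU, network, affinity, failure domains).

```python
def first_fit(vm_sizes: list[int], host_capacity: int) -> list[list[int]]:
    """Place VMs onto identical hosts using first-fit bin packing.

    Each VM goes on the first host with enough free capacity; if none fits,
    a new host is opened. Sizes are in one abstract unit (e.g., GB of RAM).
    """
    hosts: list[list[int]] = []  # VMs placed on each host
    free: list[int] = []         # remaining capacity of each host
    for size in vm_sizes:
        if size > host_capacity:
            raise ValueError(f"VM of size {size} exceeds host capacity {host_capacity}")
        for i, room in enumerate(free):
            if size <= room:
                hosts[i].append(size)
                free[i] -= size
                break
        else:  # no existing host had room: power on a new one
            hosts.append([size])
            free.append(host_capacity - size)
    return hosts


print(first_fit([8, 4, 16, 4, 8], 16))  # [[8, 4, 4], [16], [8]]
```

Five tenant VMs end up on three physical hosts instead of five, which is the utilization gain that multi-tenant pooling aims for.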
Virtualized vs. traditional computing
[Figure: traditional computing stack vs. virtualized computing stack]
Hypervisor/VMM
A software layer that:
– Allows multiple guest OSs (virtual machines) to run simultaneously on a single physical host
– Provides a hardware abstraction to the running guest OSs and efficiently multiplexes the underlying hardware resources
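Multiplexing a physical CPU among guest OSs can be pictured as time slicing: each VM runs for a fixed quantum, then yields to the next. The sketch below simulates plain round-robin scheduling under that assumption; the function `round_robin` and the VM names are hypothetical, and production hypervisor schedulers are considerably more elaborate (weights, caps, multiple physical cores).

```python
from collections import deque


def round_robin(vm_work: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Simulate a hypervisor time-slicing one physical CPU among guest VMs.

    vm_work maps VM name -> remaining CPU time units. Returns the schedule
    as (vm, units_run) slices in the order they execute.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(vm_work.items())
    schedule: list[tuple[str, int]] = []
    while queue:
        vm, remaining = queue.popleft()
        ran = min(quantum, remaining)      # run for at most one quantum
        schedule.append((vm, ran))
        if remaining > ran:                # unfinished VMs rejoin the back of the queue
            queue.append((vm, remaining - ran))
    return schedule


print(round_robin({"vm1": 5, "vm2": 2}, quantum=2))
# [('vm1', 2), ('vm2', 2), ('vm1', 2), ('vm1', 1)]
```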
Multiprogramming vs. virtualization
Multiprogramming
– Each process thinks it has complete control over all of the resources
– Virtual memory
– CPU sharing
Virtualization
– Each guest OS assumes it controls the entire underlying infrastructure, which is actually multiplexed by a hypervisor/VMM
Multiprogramming vs. virtualization (2)
Multiprogramming
1. CPU shared among processes
2. Memory shared using page tables
3. A process knows it is being managed (system calls)
Virtualization
1. CPU shared among OSs
2. Memory shared using more indirections: multiple page tables
3. An OS may or may not know it is being managed by a hypervisor
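The "more indirections" for memory can be shown as two chained lookups: the guest OS’s page table maps a guest virtual page to a guest physical frame, and a second table kept by the hypervisor maps that guest frame to a real host frame. This is a simplified sketch, assuming 4 KiB pages and single-level tables stored as dicts; real hardware uses multi-level tables and TLBs, and `translate` and `guest_virtual_to_host_physical` are hypothetical helpers.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(addr: int, page_table: dict[int, int]) -> int:
    """Translate an address through one single-level page table (page -> frame)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault at page {page}")
    return page_table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(gva: int, guest_pt: dict[int, int],
                                   host_pt: dict[int, int]) -> int:
    gpa = translate(gva, guest_pt)  # 1st indirection: page table managed by the guest OS
    return translate(gpa, host_pt)  # 2nd indirection: table managed by the hypervisor


guest_pt = {0: 5, 1: 2}   # guest virtual page -> guest physical frame
host_pt = {5: 40, 2: 7}   # guest physical frame -> host physical frame
print(guest_virtual_to_host_physical(4106, guest_pt, host_pt))  # 28682
```

Under multiprogramming only the first lookup exists; virtualization adds the second one, which is why memory access is costlier unless the hardware assists.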
Amazon Web Services (AWS)
– First public cloud (launched in 2006)
– Collection of on-demand, pay-per-use computing services
– Solutions in various service models:
  – IaaS: EC2, S3, ELB, Auto Scaling
  – PaaS: Elastic Beanstalk, EMR
  – SaaS: CloudSearch, Elastic Transcoder
– Other services:
  – Networking: DNS, CDN
  – Databases: relational, NoSQL, memcache
  – Scripted deployment
AWS interface [screenshot]
Next lecture
– Big Data and Hadoop/MapReduce