COSC6376 Cloud Computing. Lecture 2: Ecosystem, CAP, and Challenges. Instructor: Weidong Shi (Larry), PhD, Computer Science Department, University of Houston
Outline: Ecosystem; CAP; Challenges
Reading Assignment: summary due next Tuesday in class
NIST: Interactions between Actors in Cloud Computing. Actors: Cloud Consumer, Cloud Provider, Cloud Broker, Cloud Auditor, Cloud Carrier. Figure legend: the communication path between a cloud provider and a cloud consumer; the communication paths for a cloud auditor to collect auditing information; the communication paths for a cloud broker to provide service to a cloud consumer.
Classes of Cloud Players
Six Layers of Cloud Services (top to bottom, with example providers): Salesforce.com, Webex, …; App Engine, Microsoft Azure; Amazon AWS, Rackspace, IBM Ensembles; Savvis, Intermap, Digital Realty Trust; AT&T; VMware, IBM, Xen
Spectrum of Clouds: Instruction Set VM (Amazon EC2, 3Tera); Bytecode VM (Microsoft Azure); Framework VM (Google AppEngine, Force.com). From lower-level, less management to higher-level, more management: EC2, Azure, AppEngine, Force.com.
Amazon EC2: Much like physical hardware, users can control nearly the entire software stack, from the kernel upwards. A few API calls request and configure the virtualized hardware. No limit on the kinds of applications that can be hosted. The low level of virtualization (raw CPU cycles, IP connectivity) allows developers to code whatever they want. Hard for the provider to offer automatic scalability and failover.
Google AppEngine and Force.com: Does one thing well: running web apps. App Engine handles HTTP(S) requests, nothing else. Think RPC: request in, processing, response out. Works well for the web and AJAX, and for other request-reply services. Not suitable for general-purpose computing. Applications are severely rationed in how much CPU time they can use per request. Automatic scaling and high availability.
Microsoft’s Azure: Applications are written using the .NET libraries and compiled to a language-independent managed environment. General-purpose computing. Users get a choice of language but cannot control the operating system or runtime. Libraries provide automatic network configuration and failover/scalability, but need the user's cooperation.
Spectrum. Azure: general-purpose; cannot control the OS; a degree of scalability. Google AppEngine/Force.com: highly scalable, yet not general-purpose. Amazon EC2: general-purpose; hard to offer scalability.
Major Cloud Providers and Service Offerings
Public, Private, and Hybrid Clouds
Hybrid Clouds: Using multiple clouds for different applications to match needs. Moving an application to meet requirements at specific stages in its lifecycle, from early development through unit test, scale testing, pre-production, and ultimately full production scenarios. Moving workloads closer to end users across geographic locations, including user groups within the enterprise, partners, and external customers. Meeting peak demands efficiently in the cloud while the low steady state is handled internally. Maintaining confidential data on better-protected clouds while allowing distributed computation on more computationally efficient ones.
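The "peak demand in the cloud, steady state internally" scenario above can be illustrated with a small sketch. It is only an illustration: the function name `split_load`, the demand trace, and the internal capacity of 100 requests/s are assumptions, not figures from the lecture.

```python
def split_load(demand, internal_capacity):
    """Split demand (requests/s) between the internal data center and a public cloud."""
    internal = min(demand, internal_capacity)
    cloud = demand - internal  # overflow; zero when the demand fits internally
    return internal, cloud


# Hypothetical demand trace in requests/s; 100 is an assumed internal capacity.
for demand in [40, 60, 180, 320, 90]:
    internal, cloud = split_load(demand, internal_capacity=100)
    print(f"demand={demand:3d}  internal={internal:3d}  cloud={cloud:3d}")
```

Only the samples above the internal capacity send work to the public cloud, so the enterprise pays for cloud capacity only during peaks.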
Deployment: Generic Scenario Perspective (figure labels: Enterprise Systems, Clouds). 1. Deploy to a Cloud; 2. Manage a Single Cloud; 3. Interface to a Cloud; 4. Migrate to a Cloud; 5. Migrate Between Clouds; 6. Interface Clouds; 7. Work with a Selected Cloud; 8. Operate across Clouds.
The Combined Conceptual Reference Diagram. Cloud Consumer. Cloud Auditor: Security Audit, Privacy Impact Audit, Performance Audit. Cloud Broker: Service Intermediation, Service Aggregation, Service Arbitrage. Cloud Service Management: Business Support, Provisioning/Configuration, Portability/Interoperability. Service Layer: SaaS, PaaS, IaaS. Resource Abstraction and Control Layer. Physical Resource Layer: Hardware, Facility. Cloud Carrier.
Cloud Interoperability Standards: Open Cloud Computing Interface – Infrastructure; EC2 API; Simple Storage Service (S3) API; Windows Azure Storage Service REST APIs; Windows Azure Service Management REST APIs; Deltacloud API; Rackspace Cloud Servers API; Rackspace Cloud Files API; Cloud Data Management Interface; vCloud API; GlobusOnline REST API
CAP
The CAP Theorem: Three properties of a system: consistency, availability, and partition tolerance.
The CAP Theorem. Consistency: once a writer has written, all readers will see that write.
Consistency Model: A consistency model determines the rules for visibility and apparent order of updates. For example: row X is replicated on nodes M and N; client A writes row X to node N; some period of time t elapses; client B reads row X from node M. Does client B see the write from client A? Consistency is a continuum with tradeoffs. For NoSQL, the answer would be: maybe. The CAP theorem states that strict consistency cannot be achieved at the same time as availability and partition tolerance.
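The M/N example above can be played through in a toy model. This is a minimal sketch, assuming asynchronous replication with a fixed delay; the class `ReplicatedRow`, the helper `client_b_sees_write`, and the delay of 5 time units are all hypothetical and do not describe any particular NoSQL store.

```python
class ReplicatedRow:
    """Row X on nodes M and N with a fixed asynchronous replication delay."""

    def __init__(self, replication_delay):
        self.replication_delay = replication_delay
        self.now = 0.0
        self.copies = {"M": None, "N": None}
        self.in_flight = []  # (deliver_at, target_node, value)

    def write(self, node, value):
        # The write is visible on the contacted node immediately ...
        self.copies[node] = value
        # ... and reaches every other node only after the delay.
        for other in self.copies:
            if other != node:
                self.in_flight.append((self.now + self.replication_delay, other, value))

    def advance(self, t):
        """Let t time units elapse and deliver the replication messages that are due."""
        self.now += t
        due = [m for m in self.in_flight if m[0] <= self.now]
        self.in_flight = [m for m in self.in_flight if m[0] > self.now]
        for _, target, value in sorted(due, key=lambda m: m[0]):
            self.copies[target] = value

    def read(self, node):
        return self.copies[node]


def client_b_sees_write(t, replication_delay=5.0):
    row = ReplicatedRow(replication_delay)
    row.write("N", "v1")          # client A writes row X to node N
    row.advance(t)                # some period of time t elapses
    return row.read("M") == "v1"  # client B reads row X from node M


print(client_b_sees_write(t=1.0))   # False: replication has not arrived yet
print(client_b_sees_write(t=10.0))  # True: the update has propagated
```

Whether client B sees the write depends on how t compares with the replication delay, which is the "maybe" in the slide.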
Eventual Consistency: When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent. For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service.
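One common way to get this behavior is periodic state exchange between replicas with a last-writer-wins rule. The sketch below assumes that technique; it is not taken from the lecture, and the names `merge` and `sync_until_consistent`, the ring exchange order, and the integer version numbers (starting at 1) are illustrative assumptions.

```python
def merge(a, b):
    """Anti-entropy exchange: both replicas keep the highest-versioned entry per key."""
    for key in set(a) | set(b):
        newest = max(a.get(key, (0, None)), b.get(key, (0, None)), key=lambda e: e[0])
        a[key] = newest
        b[key] = newest


def sync_until_consistent(replicas, max_passes=10):
    """Exchange state around a ring until all replicas agree; return the passes used."""
    passes = 0
    while any(r != replicas[0] for r in replicas) and passes < max_passes:
        for i in range(len(replicas)):
            merge(replicas[i], replicas[(i + 1) % len(replicas)])
        passes += 1
    return passes


# Each replica maps key -> (version, value). One replica holds a newer update
# and one has not seen row X at all; no further updates arrive.
replicas = [{"X": (1, "old")}, {"X": (1, "old")}, {"X": (2, "new")}, {}]
sync_until_consistent(replicas)
print(replicas)
```

Once updates stop, repeated exchanges leave every replica with the newest version of each key, which is the convergence property stated above.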
The CAP Theorem. Availability: every request received by a non-failing node in the system must result in a response (must terminate). The system remains available during software and hardware upgrades and node failures.
Availability: Traditionally thought of as the server or process being available 99.999% of the time (five 9s). However, in a system with many nodes, at almost any point in time there is a good chance that a node is down or that there is a network disruption among the nodes. We want a system that is resilient in the face of network disruption.
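A quick calculation shows why per-node five 9s does not carry over to a large system. The sketch assumes node failures are independent, which is a simplification; the function name and the node counts are illustrative.

```python
def prob_some_node_down(per_node_availability, node_count):
    """P(at least one of node_count independent nodes is down at a given instant)."""
    return 1.0 - per_node_availability ** node_count


for nodes in (1, 1_000, 100_000):
    p = prob_some_node_down(0.99999, nodes)
    print(f"{nodes:>7} nodes at five 9s: P(some node down) = {p:.3f}")
```

Under this assumption the probability that every node is up is a^n for per-node availability a and n nodes, so it shrinks exponentially as n grows.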
The CAP Theorem. Partition tolerance: a system can continue to operate in the presence of network partitions.
The CAP Theorem: You can have at most two of these three properties for any shared-data system. To scale out, you have to partition. That leaves either consistency or availability to choose from. In almost all cases, you would choose availability over consistency. Claim: every distributed system is on one side of the C-A-P triangle.
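The choice between consistency and availability only becomes visible during a partition. The toy model below is a sketch under strong simplifications: two replicas of one value, and a consistency-first ("CP") mode that refuses all requests while partitioned. The names `ReplicaPair` and `Unavailable` are hypothetical.

```python
class Unavailable(Exception):
    """Raised when the system refuses a request to protect consistency."""


class ReplicaPair:
    """Two replicas of a single value; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            self.values = [value, value]  # both replicas updated together
        elif self.mode == "CP":
            raise Unavailable("cannot reach the other replica")
        else:
            self.values[replica] = value  # AP: accept locally, replicas diverge

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value")
        return self.values[replica]


ap = ReplicaPair("AP")
ap.write(0, "a")
ap.partitioned = True
ap.write(0, "b")
print(ap.read(0), ap.read(1))  # replica 1 still answers, but with a stale value

cp = ReplicaPair("CP")
cp.partitioned = True
try:
    cp.write(0, "a")
except Unavailable as exc:
    print("CP refused the write:", exc)
```

The AP pair always answers but may return stale data; the CP pair never returns stale data but stops answering while the partition lasts.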
Challenges
Adoption Challenges
Challenge | Opportunity
Availability | Multiple providers & DCs
Data lock-in | Standardization
Data confidentiality, auditability, and privacy | Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing
Challenges and Opportunities: Availability of Service
Service and outage | Duration | Date
S3 outage: authentication service overload leading to unavailability | 2 hours | 2/15/08
S3 outage: single-bit error leading to gossip protocol blowup | 6-8 hours | 7/20/08
AppEngine partial outage: programming error | 5 hours | 6/17/08
Gmail | 1.5 hours | 08/11/08
Challenges of Datasets over Multiple Clouds: Interesting datasets might be available in different clouds (different cloud providers; private or public clouds). Services mashing up datasets inevitably cross clouds. Federated cloud architectures.
Growth Challenges
Challenge | Opportunity
Data transfer bottlenecks | FedEx-ing disks; data backup/archival
Performance unpredictability | Improved VM support; flash memory; scheduling VMs
Scalable storage | Invent scalable store
Bugs in large distributed systems | Invent debugger that relies on distributed VMs
Scaling quickly | Invent auto-scaler that relies on ML; snapshots
Challenges and Opportunities: Data Transfer Bottlenecks. Obstacle: transferring large data sets is expensive, e.g. sending 10 TB from UC Berkeley to Amazon over a WAN link. Bandwidth: 20 Mbit/s. Time: more than 45 days. Cost: $1000. Opportunities: ship disks; make it attractive to keep data in the cloud; reduce the cost of WAN bandwidth.
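The transfer-time figure can be checked with a few lines of arithmetic. This sketch reads the slide's bandwidth as 20 megabits per second and assumes decimal units (1 TB = 10^12 bytes); the price of $100 per TB is inferred from the slide's $1000 for 10 TB and is an assumption, as are the function names.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes, bandwidth_bits_per_s):
    """Days needed to push size_bytes through a link of the given bit rate."""
    return size_bytes * 8 / bandwidth_bits_per_s / SECONDS_PER_DAY


def transfer_cost(size_bytes, dollars_per_tb):
    """Network transfer fee at a flat per-terabyte price (assumed pricing model)."""
    return size_bytes / 1e12 * dollars_per_tb


size = 10e12  # 10 TB
print(f"{transfer_days(size, 20e6):.1f} days")
print(f"${transfer_cost(size, 100):.0f}")
```

At 20 Mbit/s the transfer takes 4,000,000 seconds, a little over 46 days, which is why shipping disks overnight is competitive for data sets of this size.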
Algorithms on Big Data: Working on “Big Data”: data mining, machine learning, visualization. These traditionally assume the data is in flat files or relational databases. Distributed data organization poses new challenges: redesign algorithms; redesign frameworks.
Policy and Business Challenges
Challenge | Opportunity
Reputation fate sharing | Offer reputation-guarding services like those for email
Software licensing | Pay-for-use licenses; bulk use sales
Come to the Dark Side: Spam as a service; crimeware as a service; password-cracking cloud; DoS attack as a service. How likely is the risk? Buy services using stolen credit card numbers; create EC2 instances using stolen keys; attack authentication (SOAP, XML, XML wrapping attacks).