1 Cloud Computing, CS596-015 Data in the Cloud: Data-as-a-Service for the Cloud


2 Outline
• Motivation & Challenges
• Data in the Cloud
• Transactions in the Cloud: RDBMS vs K/V Store
• Data Scalability, Elasticity, and Autonomy in the Cloud
• Multi-tenant Data Platforms
• Privacy
• Amazon Relational Database Service: RDS
• RDBMS on Amazon Simple Storage Service: S3
• Microsoft SQL Azure
• Summary and Conclusions

3 Motivation & Challenges
Motivation:
• Economies of scale in hardware and licensing cost
• Pay per use and lower administrative cost
• Relational cloud mainly targets OLTP workloads and Direct-Attached Storage (DAS) architectures with consistency guarantees
Challenges:
• Efficient multi-tenancy (provider)
• Elastic scalability (provider)
• Privacy (user)

4 Data in the Cloud: Design Principles
Separate system and application state:
• System metadata is critical but small
• Application data has varying needs
• Separation allows use of different protocols for each
Limit interactions to a single node:
• Allows systems to scale horizontally
• Graceful degradation during failures
• Obviates the need for distributed synchronization
• Non-distributed transaction execution is efficient

5 Data in the Cloud: Design Principles
Decouple access control from data storage:
• Access control refers to read/write access to the data
• Partitioning ownership effectively partitions the data
• Decoupling allows lightweight ownership transfer
Limited distributed synchronization is practical:
• Maintenance of metadata
• Provide strong guarantees only for data that needs it

6 Transactions in the Cloud: RDBMS vs K/V Store
• Low consistency considerably increases complexity
• Consistency logic is duplicated in all applications!
• Often leads to performance inefficiencies
There are two candidates for transaction support in the cloud:
• Cloudify RDBMS (Data Fission – split atoms)
• Enrich Key/Value stores (Data Fusion – combine atoms)

7 [Figure: fusion of the RDBMS and Key Value Store architectures]
Systems shown: MegaStore [CIDR ’11], G-Store [SoCC ’10], Vo et al. [VLDB ’10], Rao et al. [VLDB ’11], Deuteronomy [CIDR ’09, ’11], ElasTraS [HotCloud ’09, TR ’10], DB on S3 [SIGMOD ’08], RelationalCloud [CIDR ’11], SQL Azure [ICDE ’11]
Two approaches: Cloudify RDBMSs; Enrich Key Value Stores

8 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Add more resources, get more performance:
• Handle more requests/sec
• Store more data
Scaling is achievable in two dimensions:
• Scale-up
• Scale-out, which is the main paradigm for the cloud

9 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Finding the right design point: what is the right consistency / programming model?
• Pure key-value stores are too weak
  • Transactions only on a single record
• Traditional RDBMSs are too strong
  • Can’t just run MySQL at scale!
• Instead, provide strong consistency within a portion of the data
  • Megastore
  • Vertica, Aster, Teradata, Greenplum, etc.

10 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
[Figure: consistency spectrum from Weak to Strong: Dynamo; BigTable, PNUTS; Megastore, G-Store; Azure, ElasTraS, Rel Cloud; MySQL]

11 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion:
• Start with a key-value store
• Partition records into groups
• Provide multi-record updates within a group
• Cross-group operations are handled separately
• Assumes that cross-group ops are rare
Data Fission:
• Start with a relational database
• Partition tables into shards
• Provide ACID within each shard
• Cross-shard ops are expensive
• Assumes that cross-shard ops are rare
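
The fission recipe on this slide (partition into shards, ACID within a shard, cross-shard operations as the expensive exception) can be sketched in a few lines of Python. This is a toy in-memory illustration, not the design of any system named in these slides: `ShardedStore`, `shard_of`, and `atomic_update` are hypothetical names, and hash partitioning is an assumed placement policy.

```python
import hashlib


class ShardedStore:
    """Toy 'data fission' store: keys are hash-partitioned into shards."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_of(self, key):
        # Assumed placement policy: stable hash of the key modulo shard count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_of(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_of(key)].get(key)

    def atomic_update(self, updates):
        """Apply a multi-key update only if all keys live in one shard."""
        shard_ids = {self.shard_of(k) for k in updates}
        if len(shard_ids) != 1:
            # A real system would fall back to a costlier cross-shard protocol.
            raise ValueError("update must touch exactly one shard")
        # Validation is done; the single-shard write is applied in one step.
        self.shards[shard_ids.pop()].update(updates)
```

The design choice mirrors the slide's assumption: the cheap path covers single-shard work, and anything crossing shards is rejected here rather than coordinated.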

12 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Atomic Multi-key Access:
• G-Store: Efficient Transactional Multi-key Access [ACM SoCC 2010]
• Key-value stores:
  • Atomicity guarantees on single keys
  • Suitable for the majority of current web applications
• Many other applications need multi-key accesses:
  • Online multi-player games
  • Collaborative applications
• Enrich the functionality of the key-value store

13 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Group Abstraction:
• Define a granule of on-demand transactional access
• Applications select any set of keys to form a group
• The data store provides transactional access to the group
• Groups are non-overlapping

14 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability

15 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Grouping Protocol:
• Conceptually similar to locking
• Allows collocation of ownership at the leader
• The leader is the gateway for group accesses
• “Safe” ownership transfer: deals with the dynamics of the underlying key-value store
  • Data dynamics of the key-value store
  • Various failure scenarios
• Hides complexity from the applications while exposing richer functionality
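
The locking analogy on this slide can be made concrete with a small single-process sketch. It is only an illustration under strong assumptions (no network, no failures, no concurrency), and `GroupingStore` and its methods are hypothetical names rather than G-Store's actual interface: creating a group transfers ownership of each key to the group's leader, group updates go through the leader's copy, and deleting the group writes values back and releases ownership.

```python
class GroupingStore:
    """Toy key-value store with leader-owned, non-overlapping key groups."""

    def __init__(self):
        self.data = {}     # key -> value held by the underlying store
        self.owner = {}    # key -> id of the group that currently owns it
        self.groups = {}   # group id -> leader's working copy of the group

    def put(self, key, value):
        # Single-key writes are blocked while a group owns the key.
        if key in self.owner:
            raise ValueError("key is owned by group %r" % self.owner[key])
        self.data[key] = value

    def create_group(self, group_id, keys):
        # All-or-nothing: refuse the request if any key is already owned.
        if group_id in self.groups:
            raise ValueError("group id already in use")
        taken = [k for k in keys if k in self.owner]
        if taken:
            raise ValueError("keys already grouped: %r" % taken)
        for k in keys:
            self.owner[k] = group_id
        self.groups[group_id] = {k: self.data.get(k) for k in keys}

    def group_update(self, group_id, updates):
        # Multi-key update applied at the leader, which gates group access.
        group = self.groups[group_id]
        outside = [k for k in updates if k not in group]
        if outside:
            raise KeyError("keys not in group: %r" % outside)
        group.update(updates)

    def delete_group(self, group_id):
        # Write the leader's copy back and return ownership to the store.
        for k, v in self.groups.pop(group_id).items():
            self.data[k] = v
            del self.owner[k]
```

A multi-player game session, for example, could group the players' profile keys for the duration of the game and dissolve the group afterwards.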

16 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Implementing G-Store

17 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Elastic Transaction Management:
• ElasTraS: designed to make RDBMS cloud-friendly
• Database viewed as a collection of partitions
• Suitable for standard OLTP workloads:
  • Large single-tenant database instance: database partitioned at the schema level
  • Multi-tenant with a large number of small databases: each partition is a self-contained database
• Elastic to deal with workload changes
• Dynamic load balancing of partitions
• Automatic recovery from node failures
• Transactional access to database partitions

18 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Effective Resource Sharing:
• Multiple database partitions hosted within the same database process
  • Good consolidation
• Independent transaction and data managers
  • Good performance isolation
• Lightweight live database migration
  • Elastic scaling

19 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
What is the difference?
• Is Fusion vs. Fission a worthwhile distinction?
• Seems like they both arrive at the same place
• Megastore “Fusion” vs. ElasTraS “Fission”:
  • Shard tables based on the table’s primary key
  • Shard is co-located on the same machine
  • ACID transactions within a shard
  • Primary and secondary indexes
• All Megastore is missing is a SQL interface!

20 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
The difference:
• Different targeted users
  • Fusion is for people who own datacenters
  • Fission is for people who want SQL in the cloud
• Different exposed API
  • Fusion is more explicit about performance
  • Fission tries to hide partitioning from the user
• Anything else?

21 Data Scalability, Elasticity, and Autonomy in the Cloud: Elasticity
• Dynamically scaling up and down on demand
• Important with pay-as-you-go cloud pricing
• Consolidate to reduce costs
• Expand to increase performance
• Need to move state and processing duties around within the system
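
One hypothetical way to turn "consolidate to reduce costs, expand to increase performance" into a sizing rule is to pick the smallest node count that keeps utilization under a target. The function name, the requests-per-second load model, and the 0.5 default target are assumptions made for illustration; the slides do not prescribe a policy.

```python
import math


def nodes_needed(load_rps, node_capacity_rps, target_utilization=0.5):
    """Smallest node count that keeps per-node utilization at or below target."""
    if node_capacity_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and target in (0, 1]")
    if load_rps <= 0:
        return 1  # keep one node alive even when idle
    usable_per_node = node_capacity_rps * target_utilization
    return max(1, math.ceil(load_rps / usable_per_node))
```

Calling this periodically with the observed load gives a node count to expand toward during peaks and to consolidate toward when demand drops; moving state between nodes is the part the slide flags as the hard problem.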

22 Data Scalability, Elasticity, and Autonomy in the Cloud: Autonomy
• Management needs to be more automatic
• Elasticity and load balancing based on usage and Machine Learning (ML) predictions
Performance modeling:
• Migration costs (availability, performance, $$)
• Resource isolation (consolidated services)
• SLAs

23 Multi-tenant Data Platforms
• Problem definition: consolidate databases onto a smaller number of servers, balancing load without affecting performance or security
• Multi-tenancy is a paradigm in which a service provider hosts multiple clients (tenants) on a single shared stack of software and hardware
• Virtualization: multi-tenancy in the hardware layer
  • Major enabling technology for cloud infrastructure
• Virtualization in the database tier

24 Multi-tenant Data Platforms: Capturing the “Long Tail” in Multitenant Apps

25 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi Application Scenario:
• Support a very large number of database applications (with different schemas)

26 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Challenges: Isolation, Scalability, Performance, Customization, Resource Utilization, Metering …

27 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs

28 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Resource Sharing and Isolation

29 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs

30 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs

31 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Force.com Architecture:
• Metadata-driven architecture
• Tenant-specific customization information is stored as metadata
• The engine uses metadata to generate virtual application components at runtime
• Metadata is key, so cache the metadata
• Application data is stored in a large shared table, referred to as the heap
• Materialize some virtual tables
• Pivot tables are used for indexing and for maintaining relationships and uniqueness constraints
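
The idea of a shared heap interpreted through per-tenant metadata can be illustrated with a minimal Python sketch. This is not Force.com's actual schema or API: `SharedHeap`, the generic value slots, and the tenant and object names are all hypothetical, and pivot tables and caching are left out.

```python
class SharedHeap:
    """Toy metadata-driven store: one shared table, per-tenant virtual tables."""

    def __init__(self):
        self.metadata = {}  # (tenant, object) -> ordered list of field names
        self.heap = []      # shared rows: (tenant, object, [generic slot values])

    def define_object(self, tenant, obj, fields):
        # Tenant-specific customization lives only in metadata.
        self.metadata[(tenant, obj)] = list(fields)

    def insert(self, tenant, obj, record):
        # Map named fields onto generic slots in the shared heap.
        fields = self.metadata[(tenant, obj)]
        self.heap.append((tenant, obj, [record.get(f) for f in fields]))

    def select(self, tenant, obj):
        # Rebuild the tenant's virtual table at read time from the metadata.
        fields = self.metadata[(tenant, obj)]
        return [dict(zip(fields, slots))
                for t, o, slots in self.heap if (t, o) == (tenant, obj)]
```

Two tenants with different schemas end up in the same `heap` list, and each sees only its own rows, shaped by its own metadata.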

32 Privacy
• Prevent the DBA from snooping on data
• Ensure data security when the application or DBMS server is compromised

33 Privacy: Problem: Confidential Data Leaks
[Diagram: User 1, User 2, and User 3 connect to the Application, which sends SQL to the DB Server]
• Curious DB administrators
• Hackers
• Curious cloud/employees
• Physical attacks
• Both on private clouds and public clouds
• Regulatory laws

34 Privacy: CryptDB
Goal: protect confidentiality of data
1. Process SQL queries on encrypted data
2. Capture and cryptographically enforce access control in SQL: chain keys from user passwords to data items
[Diagram: User 1, User 2, User 3 (user password), Application, Proxy, SQL, DB Server]
Threat 1: passive attacks on the DB server
Threat 2: active/passive attacks on all servers

35 Privacy: CryptDB – Threat Model
• Consider attacks on any part of the servers
• We do not consider integrity attacks
  • Attackers can affect data integrity, but not confidentiality

36 Privacy: CryptDB – Two Techniques
SQL-aware encryption strategy:
• Observation: the set of SQL operators is limited
• Different encryption schemes provide different functionality
Adjustable query-based encryption:
• Adapt the encryption of data based on user queries

37 Privacy: CryptDB – (1) SQL-aware encryption
Security: highest at the top
Scheme   Operation   Details
RND      None        AES in UFE
HOM      +, *        e.g., Paillier
DET      equality    AES in CTR
JOIN     join        new
SEARCH   ILIKE       Amanatidis et al. ’07
OPE      order       Boldyreva et al. ’09
Equality: e.g., =, !=, GROUP BY, IN, COUNT, DISTINCT
Order: e.g., >, <, ORDER BY, SORT, MAX, MIN
Slide annotation: first practical implementation

38 Privacy: CryptDB – Onions of encryption
Layers of each onion (innermost first):
• Onion 1: any value, JOIN, SEARCH, DET, RND
• Onion 2: any value, OPE-JOIN, OPE, RND
• Onion 3: int value, HOM
• Each column has the same key in a given layer of an onion
• Significant confidentiality and space savings

39 Privacy: CryptDB – (2) Adjustable query-based encryption
• Start out the database with the most secure encryption scheme
• Adjust encryption dynamically
  • Strip off levels of the onions: the proxy gives the key to the server using a UDF

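40 Privacy: CryptDB – Example
Table emp: rank, name, salary
Onion on the queried column: any value, JOIN, SEARCH, DET, RND; after the RND layer is stripped, DET is outermost
Application query:
  SELECT * FROM emp WHERE salary = 100
Queries issued to the DB server:
  UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
  SELECT * FROM table1 WHERE col3onion1 = x5a8c34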
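
The onion adjustment in this example can be mimicked with a toy two-layer scheme. The sketch below is NOT CryptDB's cryptography and is not secure: as stand-ins it uses an HMAC token for the DET layer and a hash-derived XOR keystream with a random nonce for the RND layer, and every function name is hypothetical. It only shows why equal salaries look different under RND and become comparable once that layer is stripped.

```python
import hashlib
import hmac
import os


def det_encrypt(key, value):
    """DET stand-in: equal plaintexts always map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def _keystream(key, nonce, length):
    # Toy keystream built from SHA-256 in counter fashion (illustration only).
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def rnd_encrypt(key, inner):
    """RND stand-in: a fresh nonce makes equal inputs encrypt differently."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(inner))
    return nonce + bytes(a ^ b for a, b in zip(inner, stream))


def rnd_decrypt(key, outer):
    """Strip the RND layer, exposing the DET token underneath."""
    nonce, body = outer[:16], outer[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def equality_query(column, k_rnd, k_det, value):
    """Peel RND from every cell, then match DET tokens as the server would."""
    stripped = [rnd_decrypt(k_rnd, cell) for cell in column]
    needle = det_encrypt(k_det, value)
    return [i for i, cell in enumerate(stripped)
            if hmac.compare_digest(cell, needle)]
```

In this toy, the proxy would hold `k_det` and `k_rnd`, hand `k_rnd` to the server only when a query needs equality on that column, and send the DET token of the constant in place of the plaintext `100`.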

41 Amazon Relational Database Service: RDS
• RDS is a web service that makes it easy to set up, operate, and scale an RDBMS in the cloud: http://aws.amazon.com/rds/
• RDS provides cost-efficient and resizable capacity while managing time-consuming DB administration tasks
• RDS supports the MySQL, Oracle, and SQL Server RDBMS engines
• Code, applications, and tools that you already use with your existing RDBMS can be used with Amazon RDS
• RDS automatically patches the DBMS and backs up your database, stores the backups for a user-defined retention period, and enables point-in-time recovery

42 RDBMS on Amazon Simple Storage Service: S3
• Highly scalable data storage in the cloud
• Programmatic access via a web services API
• Simple to get going and simple to use
• Highly available and durable
Pay-as-you-go:
• Storage: $0.15/GB/month
• Data transfer: starts at $0.18/GB
• Requests: nominal charges
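
As a worked example of the pay-as-you-go model, the sketch below estimates a monthly bill from the two rates quoted on this slide. It assumes a flat transfer rate (the slide only says the rate "starts at" $0.18/GB), ignores request charges, and uses a hypothetical function name; the rates are those shown on the slide and may not match current pricing.

```python
STORAGE_USD_PER_GB_MONTH = 0.15  # storage rate quoted on the slide
TRANSFER_USD_PER_GB = 0.18       # slide's starting transfer rate, assumed flat


def monthly_cost_usd(stored_gb, transferred_gb):
    """Estimate one month's storage plus data-transfer charges in USD."""
    if stored_gb < 0 or transferred_gb < 0:
        raise ValueError("volumes must be non-negative")
    total = (stored_gb * STORAGE_USD_PER_GB_MONTH
             + transferred_gb * TRANSFER_USD_PER_GB)
    return round(total, 2)
```

For instance, storing 100 GB and transferring 20 GB in a month comes to 100 × 0.15 + 20 × 0.18 = 18.60 dollars under these assumptions.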

43 RDBMS on Amazon Simple Storage Service: S3
[Figure: S3 Name Space. Amazon S3 contains the buckets mculver-images, media.mydomain.com, and public.blueorigin.com; object keys shown include Beach.jpg, img1.jpg, img2.jpg, 2005/party/hat.jpg, index.html, and img/pic1.jpg]

44 RDBMS on Amazon Simple Storage Service: S3
Object-Based Storage:
• 1 B – 5 GB / object
• Fast, Reliable, Scalable
• Redundant, Dispersed
• 99.99% Availability Goal
• Private or Public
• Per-object URLs & ACLs
• BitTorrent Support
Pricing:
• $.15 per GB per month storage
• $.10 - $.18 per GB data transfer
• $.01 for 1000 to 10000 requests

45 Microsoft SQL Azure
• Azure Services Platform supports applications running in the cloud or on local systems

46 Microsoft SQL Azure
• Windows Azure provides Windows-based compute and storage services for cloud applications

47 Summary and Conclusion
Data management for cloud computing poses a fundamental challenge to database researchers:
• Scalability
• Reliability
• Data Consistency
• Elasticity
• Differential Pricing
Radically different approaches and solutions are warranted to overcome this challenge:
• Need to understand the nature of new applications
Novel data management challenges are coupled with distributed and parallel computing issues

48 END

