1
Data in the Cloud: Data-as-a-Service for the Cloud
Cloud Computing, CS Data in the Cloud: Data-as-a-Service for the Cloud
2
Outline
Motivation & Challenges
Data in the Cloud
Transactions in the Cloud: RDBMS vs K/V Store
Data Scalability, Elasticity, and Autonomy in the Cloud
Multi-tenant Data Platforms
Privacy
Amazon Relational Database Service: RDS
RDBMS on Amazon Simple Storage Service: S3
Microsoft SQL Azure
Summary and Conclusions
I can pretty much read this one straight through and provide details in following slides.
3
Motivation & Challenges
Economies of scale: hardware and licensing cost
Pay per use & lower administrative cost
Relational cloud is mainly for OLTP workloads & Direct-Attached-Storage (DAS) architectures with consistency guarantees
Challenges:
  Efficient multi-tenancy (Provider)
  Elastic scalability (Provider)
  Privacy (User)
Since economies of scale are in play, larger DC is in favor. So the HW blocks can be enumerated. How do you design for best performance?
4
Data in the Cloud: Design Principles
Separate system and application state
  System metadata is critical but small
  Application data has varying needs
  Separation allows use of different protocols for each
Limit interactions to a single node
  Allows systems to scale horizontally
  Graceful degradation during failures
  Obviates the need for distributed synchronization
  Non-distributed transaction execution is efficient
5
Data in the Cloud: Design Principles
Decouple access control from data storage
  Access control refers to R/W access to the data
  Partition ownership – effectively partition data
  Decoupling allows lightweight ownership transfer
Limited distributed synchronization is practical
  Maintenance of metadata
  Provide strong guarantees only for data that needs it
6
Transactions in the Cloud: RDBMS vs K/V Store
High consistency considerably increases complexity
Consistency logic duplicated in all applications!
Often leads to performance inefficiencies
There are two candidates for transaction support in the cloud:
  Cloudify RDBMS (Data Fission – split atoms)
  Enrich Key/Value stores (Data Fusion – combine atoms)
7
Transactions in the Cloud: RDBMS vs K/V Store
Fusion of the architectures
Key Value Stores
MegaStore [CIDR '11]
G-Store [SoCC '10]
Vo et al. [VLDB '10]
Rao et al. [VLDB '11]
Deuteronomy [CIDR '09, '11]
ElasTraS [HotCloud '09, TR '10]
DB on S3 [SIGMOD '08]
RelationalCloud [CIDR '11]
SQL Azure [ICDE '11]
Cloudify RDBMSs
Enrich Key Value Stores
8
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Add more resources, get more performance:
  Handle more requests/sec
  Store more data
Scaling is achievable in two dimensions:
  Scale-up
  Scale-out (the main paradigm for the cloud)
9
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Finding the right design point?
What is the right consistency / programming model?
Pure key-value stores are too weak
  Only having transactions on single record
Traditional RDBMS are too strong
  Can't just run MySQL at scale!
Instead, provide strong consistency within a portion of the data
  Megastore
  Vertica, Aster, Teradata, Greenplum, etc.
10
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Weak to strong: Dynamo | BigTable, PNUTS | Megastore, G-Store | Azure, ElasTraS, Rel Cloud | MySQL
11
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion:
  Start with key-value store
  Partition records into groups
  Provide multi-record updates within a group
  Cross-group operations handled separately
  Assumes that cross-group ops are rare
Data Fission:
  Start with relational database
  Partition tables into shards
  Provide ACID within each shard
  Cross-shard ops are expensive
  Assumes that cross-shard ops are rare
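Both approaches rest on the assumption that most operations touch a single partition. A minimal sketch of that check follows, assuming keys of the hypothetical form "partitionkey:rest" and CRC32 hashing; neither the key format nor the hash function comes from the slides.

```python
import zlib


def shard_of(key: str, num_shards: int) -> int:
    """Map a key to a shard using only its partition prefix (text before ':')."""
    partition_key = key.split(":")[0]
    # CRC32 is used here only because it is stable across runs.
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def classify(keys, num_shards: int) -> str:
    """Return 'single-shard' if all keys map to one shard, else 'cross-shard'."""
    shards = {shard_of(k, num_shards) for k in keys}
    return "single-shard" if len(shards) <= 1 else "cross-shard"
```

An operation classified as single-shard can run as a local transaction; a cross-shard one needs the separate, more expensive path the slide mentions.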
12
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Atomic Multi-key Access:
G-Store: Efficient Transactional Multi-key Access [ACM SoCC 2010]
Key Value Stores:
  Atomicity guarantees on single keys
  Suitable for majority of current web applications
Many other applications need multi-key accesses
  Online multi-player games
  Collaborative applications
Enrich functionality of the key-value store
13
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Group Abstraction:
Define a granule of on-demand transactional access
Applications select any set of keys to form a group
Data store provides transactional access to the group
Non-overlapping groups
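The key group abstraction can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not G-Store's implementation: the class and method names are hypothetical, everything runs in one process, and "transactional" is reduced to all-or-nothing updates.

```python
class KeyGroupStore:
    """Toy key-value store with non-overlapping, on-demand key groups."""

    def __init__(self):
        self.data = {}      # key -> value
        self.group_of = {}  # key -> id of the group that currently holds it
        self.groups = {}    # group id -> set of member keys

    def create_group(self, group_id, keys):
        """Form a group from any set of keys; groups must not overlap."""
        keys = set(keys)
        if group_id in self.groups:
            raise ValueError(f"group {group_id!r} already exists")
        taken = {k for k in keys if k in self.group_of}
        if taken:
            raise ValueError(f"keys already grouped: {sorted(taken)}")
        self.groups[group_id] = keys
        for k in keys:
            self.group_of[k] = group_id

    def update_group(self, group_id, updates):
        """Apply every update or none; all keys must belong to the group."""
        outside = set(updates) - self.groups[group_id]
        if outside:
            raise KeyError(f"keys not in group: {sorted(outside)}")
        self.data.update(updates)

    def delete_group(self, group_id):
        """Dissolve the group so its keys can join other groups."""
        for k in self.groups.pop(group_id):
            del self.group_of[k]
```

For example, a game session could group the keys of its players, update their scores together, and delete the group when the session ends.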
14
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
15
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Grouping Protocol:
Conceptually similar to locking
Allows collocation of ownership at the leader
Leader is the gateway for group accesses
“Safe” ownership transfer: deal with dynamics of the underlying key-value store
  Data dynamics of the key-value store
  Various failure scenarios
Hides complexity from the applications while exposing a richer functionality
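The ownership-transfer idea can be sketched as follows. This is a simplified, single-process illustration: the names Node, form_group and dissolve_group are hypothetical, and the failure handling and logging that a real protocol needs for "safe" transfer are left out.

```python
class Node:
    """A storage node that owns some keys and may yield them to a group leader."""

    def __init__(self, name, keys=()):
        self.name = name
        self.owned = set(keys)   # keys this node may currently read and write
        self.yielded = {}        # key -> name of the leader it was yielded to
        self.group_keys = set()  # keys this node holds as a group leader

    def handle_join(self, key, leader_name):
        """Yield ownership of key; refuse if it is absent or already yielded."""
        if key not in self.owned:
            return False
        self.owned.remove(key)
        self.yielded[key] = leader_name
        return True

    def handle_release(self, key):
        """Take ownership back when the group is dissolved or formation fails."""
        if key in self.yielded:
            del self.yielded[key]
            self.owned.add(key)


def form_group(leader, owners, keys):
    """Collect ownership of all keys at the leader, or roll back and return False.

    owners maps each key to the Node responsible for it.
    """
    acquired = []
    for key in keys:
        if owners[key].handle_join(key, leader.name):
            acquired.append(key)
        else:
            for k in acquired:  # undo the partial acquisition
                owners[k].handle_release(k)
            return False
    leader.group_keys = set(acquired)
    return True


def dissolve_group(leader, owners):
    """Return ownership of every grouped key to its original node."""
    for key in leader.group_keys:
        owners[key].handle_release(key)
    leader.group_keys = set()
```

As with locking, a second leader asking for a key that is already yielded is refused, and its partial acquisitions are undone.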
16
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Implementing G-Store:
17
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Elastic Transaction Management:
ElasTraS: designed to make RDBMS cloud-friendly
Database viewed as a collection of partitions
Suitable for standard OLTP workloads:
  Large single-tenant database instance
    Database partitioned at the schema level
  Multi-tenant with large number of small databases
    Each partition is a self-contained database
Elastic to deal with workload changes
Dynamic load balancing of partitions
Automatic recovery from node failures
Transactional access to database partitions
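Treating the database as a collection of partitions makes recovery a matter of reassigning partitions. The sketch below shows one possible policy (move each orphaned partition to the least-loaded survivor); the policy and the function name are assumptions for illustration, not ElasTraS's actual algorithm.

```python
def reassign_on_failure(assignment, failed):
    """Return a new node -> partitions mapping after `failed` goes down.

    assignment maps node names to lists of partition ids. Each partition of
    the failed node moves to the survivor holding the fewest partitions,
    with ties broken by node name.
    """
    survivors = {n: list(p) for n, p in assignment.items() if n != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over partitions")
    for part in assignment.get(failed, []):
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(part)
    return survivors
```

The same mapping could drive dynamic load balancing by weighting nodes by measured load rather than by partition count.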
18
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Effective Resource Sharing:
Multiple database partitions hosted within the same database process
  Good consolidation
Independent transaction and data managers
  Good performance isolation
Lightweight live database migration
  Elastic scaling
19
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
What is the difference:
Is Fusion vs. Fission a worthwhile distinction?
Seems like they both eventually will arrive at the same place
Megastore “Fusion” vs. ElasTraS “Fission”
  Shard tables based on table's primary key
  Shard is co-located on the same machine
  ACID transactions within a shard
  Primary and secondary indexes
All Megastore is missing is a SQL interface!
20
Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
The difference:
Different targeted users
  Fusion is for people who own datacenters
  Fission is for people who want SQL in the Cloud
Different exposed API:
  Fusion is more explicit about performance
  Fission tries to hide partitioning from user
Anything else?
21
Data Scalability, Elasticity, and Autonomy in the Cloud: Elasticity
Dynamically scaling up and down on-demand
Important with pay-as-you-go cloud pricing
  Consolidate to reduce costs
  Expand to increase performance
Need to move state and processing duties around within the system
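A scale-up or scale-down decision can be as simple as sizing the cluster to a target utilization. The sketch below is an illustration only: the 70% target, the parameter names and the requests-per-second load metric are assumptions, and integer arithmetic is used to avoid rounding surprises.

```python
def target_nodes(load_rps, per_node_capacity_rps, target_utilization_pct=70,
                 min_nodes=1):
    """Return how many nodes keep each node at or below the target utilization."""
    usable_per_node = per_node_capacity_rps * target_utilization_pct
    # Integer ceiling of (load * 100) / usable_per_node.
    needed = -(-(load_rps * 100) // usable_per_node)
    return max(min_nodes, needed)
```

Comparing the result with the current node count tells the system whether to consolidate (fewer nodes, lower cost) or expand (more nodes, more performance); the hard part, moving state between nodes, is not shown.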
22
Data Scalability, Elasticity, and Autonomy in the Cloud: Autonomy
Need management to be more automatic
Elasticity and load balancing based on usage and Machine Learning (ML) predictions
Performance modeling:
  Migration costs (availability, performance, $$)
  Resource isolation (consolidated services)
  SLAs
23
Multi-tenant Data Platforms
Problem definition: consolidate databases onto a smaller number of servers, balancing load without affecting performance or security
Multi-tenancy is a paradigm in which a service provider hosts multiple clients (tenants) on a single shared stack of software and hardware
Virtualization – multi-tenancy in the hardware layer
  Major enabling technology for cloud infrastructure
Virtualization in the database tier
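The consolidation problem resembles bin packing. As a rough sketch, the code below applies the first-fit-decreasing heuristic to tenant loads; this heuristic is swapped in for illustration and is not a method from the slides. It models each tenant by a single hypothetical load number and ignores the performance-isolation and security constraints in the problem definition.

```python
def consolidate(tenant_loads, server_capacity):
    """Pack tenants onto servers with first-fit decreasing.

    tenant_loads maps tenant name -> load, in the same units as
    server_capacity. Returns one list of tenant names per server.
    """
    servers = []  # each entry: {"free": remaining capacity, "tenants": [...]}
    ordered = sorted(tenant_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for tenant, load in ordered:
        if load > server_capacity:
            raise ValueError(f"tenant {tenant!r} does not fit on any server")
        for server in servers:
            if server["free"] >= load:
                server["free"] -= load
                server["tenants"].append(tenant)
                break
        else:
            # No existing server has room, so open a new one.
            servers.append({"free": server_capacity - load, "tenants": [tenant]})
    return [server["tenants"] for server in servers]
```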
24
Multi-tenant Data Platforms: Capturing the “Long Tail” in Multitenant Apps
25
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi Application Scenario: Support very large number of database applications (with different schemas
26
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Challenges: Isolation, Scalability, Performance, Customization, Resource Utilization, Metering …
27
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs:
28
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Resource Sharing and Isolation:
29
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs:
30
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs:
31
Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Force.com Architecture:
Metadata-driven architecture
  Tenant-specific customization information stored as metadata
  Engine uses metadata to generate virtual application components at runtime
  Metadata is key – cache metadata
Application data stored in large shared table – referred to as the heap
  Materialize some virtual tables
Pivot tables used for indexing, maintaining relationships, uniqueness constraints
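The metadata-plus-heap idea can be sketched in a few lines. This is a toy model rather than Force.com's schema: the slot names (val0, val1), the tenant and object names, and the functions insert and select are all hypothetical.

```python
# Metadata: per tenant and virtual object, map logical fields to generic slots.
metadata = {
    ("tenant1", "Account"): {"name": "val0", "industry": "val1"},
    ("tenant2", "Invoice"): {"amount": "val0"},
}

# The heap: one shared table holding every tenant's rows in generic columns.
heap = []


def insert(tenant, obj, record):
    """Store a logical record in the shared heap using the tenant's metadata."""
    slots = metadata[(tenant, obj)]
    row = {"tenant": tenant, "object": obj}
    for field, value in record.items():
        row[slots[field]] = value
    heap.append(row)


def select(tenant, obj):
    """Rebuild the tenant's virtual table from heap rows at query time."""
    slots = metadata[(tenant, obj)]
    return [
        {field: row.get(slot) for field, slot in slots.items()}
        for row in heap
        if row["tenant"] == tenant and row["object"] == obj
    ]
```

Because every query consults the metadata first, caching it matters, which is the point of "metadata is key". The sketch omits the pivot tables used for indexes, relationships and uniqueness.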
32
Privacy
Prevent DBA from snooping on data
Ensure data security during application & DBMS server compromise
33
Privacy: Problem: Confidential Data Leaks
Diagram: User 1, User 2, User 3; Application; SQL; DB Server
Threats: curious DB administrators, hackers, curious cloud employees, physical attacks
Both on private clouds and public clouds
Regulatory laws
34
Privacy: CryptDB
Goal: protect confidentiality of data
Diagram: User 1, User 2, User 3 (user password); Application; Proxy; SQL; DB Server
Threat 1: passive attacks on DB server
Threat 2: active/passive attacks on all servers
Process SQL queries on encrypted data
Capture and cryptographically enforce access control in SQL: chain keys from user passwords to data items
35
Privacy: CryptDB – Threat Model
Consider attacks on any part of the servers
We do not consider integrity attacks
  An attacker can affect data integrity, but not confidentiality
36
Privacy: CryptDB – Two Techniques
SQL-aware encryption strategy
  Observation: the set of SQL operators is limited
  Different encryption schemes provide different functionality
Adjustable query-based encryption
  Adapt encryption of data based on user queries
37
Privacy: CryptDB – (1) SQL-aware encryption
Scheme   Operation   Details
RND      None        AES in UFE
HOM      +, *        e.g., Paillier
DET      equality    AES in CTR (e.g., =, !=, GROUP BY, IN, COUNT, DISTINCT)
JOIN     join        new
SEARCH   ILIKE       Amanatidis et al. '07
OPE      order       Boldyreva et al. '09; first practical implementation (e.g., >, <, ORDER BY, SORT, MAX, MIN)
Highest security: RND
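The scheme table can be read as a lookup from SQL operator to the encryption scheme that supports it. The sketch below encodes one interpretation of the slide (equality operators need DET, order operators need OPE, + and * need HOM, ILIKE needs SEARCH, joins need JOIN); the dictionary and function names are hypothetical.

```python
# Operator -> scheme able to evaluate it over ciphertext, as read from the slide.
REQUIRED_SCHEME = {
    "=": "DET", "!=": "DET", "GROUP BY": "DET", "IN": "DET",
    "COUNT": "DET", "DISTINCT": "DET",
    ">": "OPE", "<": "OPE", "ORDER BY": "OPE", "SORT": "OPE",
    "MAX": "OPE", "MIN": "OPE",
    "+": "HOM", "*": "HOM",
    "ILIKE": "SEARCH",
    "JOIN": "JOIN",
}


def schemes_needed(operators):
    """Return the set of schemes a query's operators require.

    An empty result means the column can stay under RND, which supports
    no operations.
    """
    return {REQUIRED_SCHEME[op] for op in operators}
```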
38
Privacy: CryptDB – Onions of encryption
Significant confidentiality and space savings
Figure: three onions (Onion 1, Onion 2, Onion 3) built from the layers RND, DET, SEARCH, JOIN, OPE, OPE-JOIN and HOM; two onions wrap any value, one wraps an int value
Each column has the same key in a given layer of an onion
OPE: Order-Preserving symmetric Encryption
39
Privacy: CryptDB – (2) Adjustable query-based encryption
Start out the database with the most secure encryption scheme
Adjust encryption dynamically
Strip off levels of the onions: proxy gives key to server using a UDF
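The adjustment step can be modeled without any real cryptography by tracking which layers still wrap a column. In this sketch the class name, the method name and the layer order (RND outermost, then DET, then JOIN) are assumptions for illustration; a real proxy would hand the server a decryption key rather than edit a list.

```python
class OnionColumn:
    """Tracks the encryption layers currently wrapped around one column."""

    def __init__(self, layers):
        self.layers = list(layers)  # outermost layer first

    def adjust_to(self, scheme):
        """Strip outer layers until `scheme` is outermost.

        Returns the layers that were removed. Layers are never re-added in
        this sketch, so asking for an already stripped layer is an error.
        """
        if scheme not in self.layers:
            raise ValueError(f"layer {scheme!r} is not available")
        idx = self.layers.index(scheme)
        removed = self.layers[:idx]
        self.layers = self.layers[idx:]
        return removed
```

An equality query on a column that starts under RND would trigger adjust_to("DET") once; later equality queries find DET already exposed and strip nothing.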
40
Privacy: CryptDB – Example
Onion layers shown: Any value, JOIN, SEARCH, DET, RND
Table emp: rank, name, salary
Application query:
  SELECT * FROM emp WHERE salary = 100
Queries sent to the DB server:
  UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
  SELECT * FROM table1 WHERE col3onion1 = x5a8c34
41
Amazon Relational Database Service: RDS
RDS is a web service that makes it easy to set up, operate, and scale an RDBMS in the cloud:
  RDS provides cost-efficient and resizable capacity while managing time-consuming DB administration tasks
  RDS supports the MySQL, Oracle and SQL Server RDBMS engines
  Code, applications and tools you already use with your existing RDBMS can be used with Amazon RDS
  RDS automatically patches the DBMS and backs up your database, storing the backups for a user-defined retention period and enabling point-in-time recovery
42
RDBMS on Amazon Simple Storage Service: S3
Highly scalable data storage in-the-cloud
Programmatic access via web services API
Simple to get going and simple to use
Highly available and durable
Pay-as-you-go:
  Storage: $0.15/GB/month
  Data transfer: starts at $0.18/GB
  Requests: nominal charges
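As a worked example of pay-as-you-go pricing, the sketch below computes a monthly bill from the two rates quoted on the slide. These are the slide's historical figures, not current prices; the function name is hypothetical, the transfer rate is taken as a flat $0.18/GB although the slide says it only starts there, and request charges are left out.

```python
from decimal import Decimal

STORAGE_PER_GB_MONTH = Decimal("0.15")  # from the slide
TRANSFER_PER_GB = Decimal("0.18")       # starting rate from the slide


def monthly_cost(stored_gb, transferred_gb):
    """Return the monthly charge in dollars for storage plus data transfer."""
    return STORAGE_PER_GB_MONTH * stored_gb + TRANSFER_PER_GB * transferred_gb
```

Storing 100 GB and transferring 50 GB in a month comes to 100 × 0.15 + 50 × 0.18 = 15 + 9 = 24 dollars under these assumptions.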
43
RDBMS on Amazon Simple Storage Service: S3 S3 Name Space
Buckets: mculver-images, media.mydomain.com, public.blueorigin.com
Object keys: Beach.jpg, img1.jpg, img2.jpg, 2005/party/hat.jpg, index.html, img/pic1.jpg
44
RDBMS on Amazon Simple Storage Service: S3
$.15 per GB per month storage
Object-based storage
1 B – 5 GB / object
Fast, reliable, scalable
Redundant, dispersed
99.99% availability goal
Private or public
Per-object URLs & ACLs
BitTorrent support
$.10 - $.18 per GB data transfer
$.01 for 1000 to requests
45
Microsoft SQL Azure
Azure Services Platform supports applications running in the cloud or on local systems
46
Microsoft SQL Azure
Windows Azure provides Windows-based compute and storage services for cloud applications
47
Summary and Conclusion
Data Management for Cloud Computing poses a fundamental challenge to database researchers:
  Scalability (Horizontal)
  Reliability
  Data Consistency
  Elasticity
  Differential Pricing
Radically different approaches and solutions are warranted to overcome this challenge
  Need to understand the nature of new applications
Novel Data Management Challenges coupled with Distributed and Parallel Computing issues
48
END