Data in the Cloud: Data-as-a-Service for the Cloud


Data in the Cloud: Data-as-a-Service for the Cloud
Cloud Computing, CS596-015

Outline
- Motivation & Challenges
- Data in the Cloud
- Transactions in the Cloud: RDBMS vs K/V Store
- Data Scalability, Elasticity, and Autonomy in the Cloud
- Multi-tenant Data Platforms
- Privacy
- Amazon Relational Database Service: RDS
- RDBMS on Amazon Simple Storage Service: S3
- Microsoft SQL Azure
- Summary and Conclusions
Speaker note: I can pretty much read this one straight through and provide details in the following slides.

Motivation & Challenges
- Economies of scale: hardware and licensing costs
- Pay per use and lower administrative cost
- Relational Cloud mainly targets OLTP workloads on Direct-Attached Storage (DAS) architectures, with consistency guarantees
- Challenges:
  - Efficient multi-tenancy (provider)
  - Elastic scalability (provider)
  - Privacy (user)
Speaker note: Since economies of scale are in play, larger datacenters (DCs) are favored, so the hardware (HW) building blocks can be enumerated. How do you design for best performance?

Data in the Cloud: Design Principles
- Separate system and application state
  - System metadata is critical but small
  - Application data has varying needs
  - Separation allows the use of different protocols for each
- Limit interactions to a single node
  - Allows systems to scale horizontally
  - Graceful degradation during failures
  - Obviates the need for distributed synchronization
  - Non-distributed transaction execution is efficient

Data in the Cloud: Design Principles
- Decouple access control from data storage
  - Access control refers to read/write access to the data
  - Partition ownership effectively partitions the data
  - Decoupling allows lightweight ownership transfer
- Limited distributed synchronization is practical
  - Maintenance of metadata
  - Provide strong guarantees only for data that needs them

Transactions in the Cloud: RDBMS vs K/V Store
- High consistency considerably increases complexity
  - Consistency logic duplicated in all applications!
  - Often leads to performance inefficiencies
- There are two candidates for transaction support in the cloud:
  - Cloudify RDBMSs (Data Fission: split atoms)
  - Enrich key/value stores (Data Fusion: combine atoms)

Transactions in the Cloud: RDBMS vs K/V Store
Fusion of the architectures
[Figure: systems placed between "Enrich Key Value Stores" and "Cloudify RDBMSs": MegaStore [CIDR '11], G-Store [SoCC '10], Vo et al. [VLDB '10], Rao et al. [VLDB '11], Deuteronomy [CIDR '09, '11], ElasTraS [HotCloud '09, TR '10], DB on S3 [SIGMOD '08], Relational Cloud [CIDR '11], SQL Azure [ICDE '11]]

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
- Add more resources, get more performance:
  - Handle more requests/sec
  - Store more data
- Scaling is achievable in two dimensions:
  - Scale-up
  - Scale-out, the main paradigm for the cloud

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
- Finding the right design point?
- What is the right consistency/programming model?
  - Pure key-value stores are too weak: transactions only on a single record
  - Traditional RDBMSs are too strong: you can't just run MySQL at scale!
- Instead, provide strong consistency within a portion of the data
  - Megastore
  - Vertica, Aster, Teradata, Greenplum, etc.

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
[Figure: consistency spectrum from weak to strong: Dynamo; BigTable, PNUTS; Megastore, G-Store; Azure, ElasTraS, Relational Cloud; MySQL]

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
- Data Fusion:
  - Start with a key-value store
  - Partition records into groups
  - Provide multi-record updates within a group
  - Cross-group operations handled separately
  - Assumes that cross-group ops are rare
- Data Fission:
  - Start with a relational database
  - Partition tables into shards
  - Provide ACID within each shard
  - Cross-shard ops are expensive
  - Assumes that cross-shard ops are rare
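
A minimal sketch of the Data Fission side, assuming a hypothetical row-key layout customer_id:row_id and a fixed count of four shards (neither comes from the slides). Rows are routed by hashing the partition key, so every row of one customer lands on the same shard; a transaction whose keys all map to one shard can run with local ACID, while one that spans shards needs distributed coordination, which is why the design assumes cross-shard operations are rare.

```python
import hashlib
from typing import Dict, Iterable, List

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def partition_key(row_key: str) -> str:
    """Hypothetical layout 'customer_id:row_id': one customer's rows share a partition key."""
    return row_key.split(":", 1)[0]


def shard_of(row_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of the partition key (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(partition_key(row_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(row_keys: Iterable[str]) -> Dict[int, List[str]]:
    """Group the keys a transaction touches by the shard that owns them."""
    touched: Dict[int, List[str]] = {}
    for key in row_keys:
        touched.setdefault(shard_of(key), []).append(key)
    return touched


def classify_transaction(row_keys: Iterable[str]) -> str:
    """One shard: local ACID is enough. Several shards: expensive distributed coordination."""
    return "single-shard" if len(shards_touched(row_keys)) <= 1 else "cross-shard"
```

For example, classify_transaction(["cust42:order1", "cust42:order2"]) returns "single-shard" because both keys share the partition key cust42. Choosing the partition key so that common transactions stay inside one shard is the practical meaning of "assumes that cross-shard ops are rare".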

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Atomic Multi-key Access
- G-Store: Efficient Transactional Multi-key Access [ACM SoCC 2010]
- Key-value stores:
  - Atomicity guarantees on single keys
  - Suitable for the majority of current web applications
- Many other applications need multi-key accesses:
  - Online multi-player games
  - Collaborative applications
- Enrich the functionality of the key-value store

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Group Abstraction
- Defines a granule of on-demand transactional access
- Applications select any set of keys to form a group
- The data store provides transactional access to the group
- Groups do not overlap
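
The key group abstraction can be illustrated with a toy, single-process store. This is a hypothetical sketch (the class and method names are invented here), not G-Store's API: create_group refuses any group that would overlap an existing one, and multi_put updates several keys as a unit only when every key belongs to the named group.

```python
from typing import Dict, Iterable, Set


class KeyGroupStore:
    """Toy key-value store with non-overlapping key groups (illustration only)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.group_of: Dict[str, str] = {}      # key -> id of the group that holds it
        self.members: Dict[str, Set[str]] = {}  # group id -> keys in the group

    def create_group(self, group_id: str, keys: Iterable[str]) -> bool:
        """Form a group on demand; fails if the id exists or any key is already grouped."""
        key_set = set(keys)
        if group_id in self.members or any(k in self.group_of for k in key_set):
            return False  # groups must not overlap
        self.members[group_id] = key_set
        for k in key_set:
            self.group_of[k] = group_id
        return True

    def delete_group(self, group_id: str) -> None:
        """Disband a group so its keys can join other groups."""
        for k in self.members.pop(group_id, set()):
            del self.group_of[k]

    def multi_put(self, group_id: str, updates: Dict[str, int]) -> None:
        """All-or-nothing multi-key update, allowed only within a single group."""
        group = self.members.get(group_id)
        if group is None or not set(updates) <= group:
            raise ValueError("every key must belong to the named group")
        self.data.update(updates)  # validated first, then applied together
```

For a multi-player game, two players' records could form one group so that moving an item between them is a single atomic multi_put; a key outside the group causes the whole update to be rejected before anything is written.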

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Grouping Protocol
- Conceptually similar to locking
- Allows collocation of ownership at the leader
- The leader is the gateway for group accesses
- "Safe" ownership transfer: deals with the dynamics of the underlying key-value store
  - Data dynamics of the key-value store
  - Various failure scenarios
- Hides complexity from the applications while exposing richer functionality
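
The lock-like flavour of the grouping protocol can be simulated in a few lines. This sketch assumes keys are spread over nodes by a placement map and ignores messages, failures, and the data dynamics the slide mentions; Node, form_group, and disband_group are hypothetical names. The leader asks each node to yield ownership of a key, much like taking a lock, and if any node refuses it releases whatever it already holds.

```python
from typing import Dict, List, Optional


class Node:
    """A storage node that records which group, if any, currently owns each of its keys."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Dict[str, Optional[str]] = {}

    def yield_ownership(self, key: str, group_id: str) -> bool:
        """Like acquiring a lock: succeeds only if no other group owns the key."""
        if self.owner.get(key) not in (None, group_id):
            return False
        self.owner[key] = group_id
        return True

    def reclaim(self, key: str, group_id: str) -> None:
        """Return ownership of the key to the node if this group holds it."""
        if self.owner.get(key) == group_id:
            self.owner[key] = None


def form_group(group_id: str, keys: List[str], placement: Dict[str, Node]) -> bool:
    """Leader collects ownership of every key; on any refusal it rolls back and fails."""
    acquired: List[str] = []
    for key in keys:
        if placement[key].yield_ownership(key, group_id):
            acquired.append(key)
        else:
            for k in acquired:
                placement[k].reclaim(k, group_id)
            return False
    return True  # all accesses to these keys now go through the leader


def disband_group(group_id: str, keys: List[str], placement: Dict[str, Node]) -> None:
    """Leader hands ownership of each key back to the node that stores it."""
    for key in keys:
        placement[key].reclaim(key, group_id)
```

Collecting ownership at one place is what lets later multi-key transactions on the group run on a single node; the hard parts the slide lists (failures, data movement in the underlying store) are exactly what this sketch leaves out.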

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Implementing G-Store

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Elastic Transaction Management
- ElasTraS: designed to make RDBMSs cloud-friendly
- Database viewed as a collection of partitions
- Suitable for standard OLTP workloads:
  - Large single-tenant database instance, with the database partitioned at the schema level
  - Multi-tenant, with a large number of small databases; each partition is a self-contained database
- Elastic to deal with workload changes:
  - Dynamic load balancing of partitions
  - Automatic recovery from node failures
- Transactional access to database partitions

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Effective Resource Sharing
- Multiple database partitions hosted within the same database process: good consolidation
- Independent transaction and data managers: good performance isolation
- Lightweight live database migration: elastic scaling

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
What is the difference?
- Is Fusion vs. Fission a worthwhile distinction?
- They both seem to be heading toward the same place
- Megastore "Fusion" vs. ElasTraS "Fission":
  - Shard tables based on the table's primary key
  - Each shard is co-located on the same machine
  - ACID transactions within a shard
  - Primary and secondary indexes
- All Megastore is missing is a SQL interface!

Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
The difference:
- Different targeted users
  - Fusion is for people who own datacenters
  - Fission is for people who want SQL in the cloud
- Different exposed APIs
  - Fusion is more explicit about performance
  - Fission tries to hide partitioning from the user
- Anything else?

Data Scalability, Elasticity, and Autonomy in the Cloud: Elasticity
- Dynamically scaling up and down on demand
- Important with pay-as-you-go cloud pricing
  - Consolidate to reduce costs
  - Expand to increase performance
- Need to move state and processing duties around within the system

Data Scalability, Elasticity, and Autonomy in the Cloud: Autonomy
- Management needs to be more automatic
- Elasticity and load balancing based on usage and machine learning (ML) predictions
- Performance modeling:
  - Migration costs (availability, performance, $$)
  - Resource isolation (consolidated services)
  - SLAs

Multi-tenant Data Platforms
- Problem definition: consolidate databases onto a smaller number of servers, balancing load without affecting performance or security
- Multi-tenancy is a paradigm in which a service provider hosts multiple clients (tenants) on a single shared stack of software and hardware
- Virtualization: multi-tenancy in the hardware layer
  - Major enabling technology for cloud infrastructure
- Virtualization in the database tier
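
The consolidation problem can be approximated with a simple greedy bin-packing heuristic. The sketch below uses first-fit decreasing, a technique swapped in here for illustration rather than the method of any system in the slides; loads are in arbitrary units, and the 80% headroom cap is an assumption meant to keep consolidation from hurting performance.

```python
from typing import Dict, List


def consolidate(loads: Dict[str, float], capacity: float, headroom: float = 0.8) -> List[List[str]]:
    """Pack tenant databases onto as few servers as possible with first-fit decreasing.

    headroom (hypothetical default 0.8) caps each server at 80% of its capacity.
    """
    limit = capacity * headroom
    servers: List[List[str]] = []  # tenant databases placed on each server
    used: List[float] = []         # load already placed on each server
    # Largest tenants first; each goes to the first server that still has room.
    for db, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > limit:
            raise ValueError(f"{db} alone exceeds one server's limit")
        for i, placed in enumerate(used):
            if placed + load <= limit:
                servers[i].append(db)
                used[i] += load
                break
        else:  # no existing server had room: open a new one
            servers.append([db])
            used.append(load)
    return servers
```

With loads {"a": 50, "b": 30, "c": 30, "d": 20, "e": 10} and capacity 100, the five databases fit on two servers: [["a", "b"], ["c", "d", "e"]]. First-fit decreasing minimizes the server count rather than balancing load, so a balancing pass or a different objective would be needed for the "balancing load" half of the problem.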

Multi-tenant Data Platforms: Capturing the "Long Tail" in Multitenant Apps

Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
- Multi-application scenario: support a very large number of database applications (with different schemas)

Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy challenges: isolation, scalability, performance, customization, resource utilization, metering …

Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Trade-offs

Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy Resource Sharing and Isolation

Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Force.com Architecture
- Metadata-driven architecture
  - Tenant-specific customization information is stored as metadata
  - The engine uses metadata to generate virtual application components at runtime
  - Metadata is key: cache the metadata
- Application data is stored in a large shared table, referred to as the heap
  - Some virtual tables are materialized
  - Pivot tables are used for indexing, maintaining relationships, and enforcing uniqueness constraints
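
A toy model of the metadata-driven design: one shared heap table with generic slots, plus metadata that tells the engine what each slot means for each tenant. The tenants, objects, and slot names (value0, value1) are invented for illustration, and pivot tables and metadata caching are left out; this is not Force.com's actual schema.

```python
from typing import Dict, List, Tuple

# Metadata: (tenant, object, field) -> generic slot in the shared heap (hypothetical layout).
METADATA: Dict[Tuple[str, str, str], str] = {
    ("acme", "Invoice", "amount"): "value0",
    ("acme", "Invoice", "customer"): "value1",
    ("globex", "Ticket", "priority"): "value0",
}

# The heap: one table shared by every tenant, with generic string-typed slots.
HEAP: List[Dict[str, str]] = [
    {"tenant": "acme", "object": "Invoice", "value0": "120.50", "value1": "Initech"},
    {"tenant": "globex", "object": "Ticket", "value0": "high"},
    {"tenant": "acme", "object": "Invoice", "value0": "99.00", "value1": "Hooli"},
]


def virtual_table(tenant: str, obj: str) -> List[Dict[str, str]]:
    """Build a tenant's virtual table at runtime: read the metadata, then scan the heap."""
    fields = {slot: name for (t, o, name), slot in METADATA.items() if t == tenant and o == obj}
    return [
        {name: row.get(slot, "") for slot, name in fields.items()}
        for row in HEAP
        if row["tenant"] == tenant and row["object"] == obj
    ]
```

The same slot value0 holds an invoice amount for one tenant and a ticket priority for another; only the metadata gives it meaning, which is why the slide stresses caching metadata.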

Privacy
- Prevent the DBA from snooping on data
- Ensure data security even when the application and DBMS servers are compromised

Privacy: Problem: Confidential Data Leaks
[Figure: Users 1–3 access an application, which sends SQL to a DB server; threats shown: curious DB administrators, hackers, curious cloud employees, physical attacks]
- Both on private clouds and public clouds
- Regulatory laws

Privacy: CryptDB
- Goal: protect the confidentiality of data
[Figure: Users 1–3, an application, a proxy holding user passwords, and a DB server receiving SQL; Threat 1: passive attacks on the DB server; Threat 2: active/passive attacks on all servers]
- Process SQL queries on encrypted data
- Capture and cryptographically enforce access control in SQL: chain keys from user passwords to data items

Privacy: CryptDB – Threat Model
- Consider attacks on any part of the servers
- We do not consider integrity attacks: they can affect data integrity, but not confidentiality

Privacy: CryptDB – Two Techniques
1. SQL-aware encryption strategy
   - Observation: the set of SQL operators is limited
   - Different encryption schemes provide different functionality
2. Adjustable query-based encryption
   - Adapt the encryption of data based on user queries

Privacy: CryptDB – (1) SQL-aware encryption
Scheme | Operation | Example SQL | Details
RND | None | | AES in UFE
HOM | +, * | | Paillier
DET | equality | =, !=, GROUP BY, IN, COUNT, DISTINCT | AES in CTR
SEARCH | ILIKE | | Amanatidis et al. '07
JOIN | join | | new
OPE | order | >, <, ORDER BY, SORT, MAX, MIN | Boldyreva et al. '09 (first practical implementation)
Security is highest for RND at the top of the table.
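
The HOM row can be made concrete with a toy version of Paillier, the scheme the slide names. This sketch uses tiny fixed primes, so it is insecure and meant only for illustration, and it reads the slide's "*" as multiplication by a plaintext constant, which is what Paillier supports. Multiplying two ciphertexts adds the underlying plaintexts, so a server could compute a SUM over an encrypted column without decrypting it. The function names are hypothetical.

```python
import math
import secrets
from typing import Tuple

PublicKey = Tuple[int, int]   # (n, g)
PrivateKey = Tuple[int, int]  # (lambda, mu)


def keygen(p: int = 293, q: int = 433) -> Tuple[PublicKey, PrivateKey]:
    """Toy Paillier keys from tiny fixed primes (insecure; real keys use large random primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut for mu holds because g = n + 1 (Python 3.8+)
    return (n, n + 1), (lam, mu)


def encrypt(pub: PublicKey, m: int) -> int:
    """Probabilistic encryption: a fresh random r makes equal plaintexts look different."""
    n, g = pub
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pub: PublicKey, priv: PrivateKey, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n  # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(pub: PublicKey, c1: int, c2: int) -> int:
    """HOM '+': the product of two ciphertexts decrypts to the sum of the plaintexts (mod n)."""
    n, _ = pub
    return c1 * c2 % (n * n)


def scale_encrypted(pub: PublicKey, c: int, k: int) -> int:
    """HOM '*' by a plaintext constant k: c**k decrypts to k times the plaintext (mod n)."""
    n, _ = pub
    return pow(c, k, n * n)
```

For example, with pub, priv = keygen(), decrypting add_encrypted(pub, encrypt(pub, 40), encrypt(pub, 2)) yields 42, even though the code that combined the ciphertexts never held the private key.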

Privacy: CryptDB – Onions of encryption
- Significant confidentiality and space savings
[Figure: three onions of encryption layers; Onions 1 and 2 wrap any value and Onion 3 wraps int values; the layers shown are RND, DET, SEARCH, JOIN, OPE, OPE-JOIN, and HOM]
- Each column has the same key in a given layer of an onion
- OPE: Order-Preserving symmetric Encryption

Privacy: CryptDB – (2) Adjustable query-based encryption
- Start out the database with the most secure encryption scheme
- Adjust encryption dynamically
- Strip off levels of the onions: the proxy gives the key to the server, which decrypts using a UDF
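
The adjustment logic (not the cryptography) can be simulated as follows. The sketch assumes a simplified equality onion with layers RND, DET, and JOIN, outermost first, and assumes an inner layer still supports the operations of the layers peeled above it; ColumnOnion and REQUIRED_LAYER are hypothetical names. Layers only ever come off, matching the idea of starting at the most secure scheme and stripping levels only when a query needs them.

```python
from typing import Dict, List

# Simplified equality onion, outermost (most secure) layer first: an assumption for illustration.
EQ_ONION: List[str] = ["RND", "DET", "JOIN"]

# Hypothetical mapping: the outermost layer at which each operation can run on ciphertexts.
REQUIRED_LAYER: Dict[str, str] = {"=": "DET", "GROUP BY": "DET", "JOIN": "JOIN"}


class ColumnOnion:
    """Tracks how far one column's onion has been peeled; no real encryption is done."""

    def __init__(self, layers: List[str]) -> None:
        self.layers = list(layers)
        self.level = 0  # index of the current outermost layer; 0 is the most secure

    @property
    def current(self) -> str:
        return self.layers[self.level]

    def adjust_for(self, op: str) -> List[str]:
        """Peel outer layers until op can run; returns the layers removed (possibly none)."""
        target = self.layers.index(REQUIRED_LAYER[op])
        peeled: List[str] = []
        # Assumes inner layers still support the outer layers' operations, so never re-wrap.
        while self.level < target:
            # Here the proxy would give this layer's key to the server, which decrypts via a UDF.
            peeled.append(self.layers[self.level])
            self.level += 1
        return peeled
```

A first equality query on a fresh column peels RND and exposes DET; repeating equality queries costs nothing further, and only a later join would peel the column down to JOIN.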

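Privacy: CryptDB – Example
Application query: SELECT * FROM emp WHERE salary = 100
[Figure: onion for the salary column with layers, from the inside out, any value, JOIN, SEARCH, DET, RND; table emp with columns rank, name, salary]
The proxy strips the RND layer of the salary column's first onion:
UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
and then runs the rewritten query on the encrypted table:
SELECT * FROM table1 WHERE col3onion1 = x5a8c34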

Amazon Relational Database Service: RDS
- RDS is a web service that makes it easy to set up, operate, and scale an RDBMS in the cloud: http://aws.amazon.com/rds/
- RDS provides cost-efficient, resizable capacity while managing time-consuming DB administration tasks
- RDS supports the MySQL, Oracle, and SQL Server RDBMS engines
- Code, applications, and tools you already use with your existing RDBMS can be used with Amazon RDS
- RDS automatically patches the DBMS and backs up the database, storing the backups for a user-defined retention period, and enables point-in-time recovery

RDBMS on Amazon Simple Storage Service: S3
- Highly scalable data storage in the cloud
- Programmatic access via a web services API
- Simple to get going and simple to use
- Highly available and durable
- Pay-as-you-go:
  - Storage: $0.15/GB/month
  - Data transfer: starts at $0.18/GB
  - Requests: nominal charges
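
The pay-as-you-go pricing reduces to simple arithmetic. This sketch uses only the slide's historical rates, treats the $0.18/GB "starts at" transfer rate as flat, and leaves out the nominal request charges; current S3 prices differ, and monthly_cost is a hypothetical helper.

```python
# Rates quoted on the slide (historical; current S3 pricing differs).
STORAGE_PER_GB_MONTH = 0.15  # $ per GB per month
TRANSFER_PER_GB = 0.18       # $ per GB: the "starts at" rate, treated here as flat


def monthly_cost(stored_gb: float, transferred_gb: float) -> float:
    """Rough monthly estimate in dollars; the slide's "nominal" request charges are omitted."""
    return stored_gb * STORAGE_PER_GB_MONTH + transferred_gb * TRANSFER_PER_GB
```

Storing 100 GB for a month and transferring 50 GB comes to 100 × 0.15 + 50 × 0.18 = 15 + 9 = $24, with no up-front hardware or licensing cost.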

RDBMS on Amazon Simple Storage Service: S3
[Figure: S3 namespace of buckets (mculver-images, media.mydomain.com, public.blueorigin.com) holding objects such as Beach.jpg, img1.jpg, img2.jpg, 2005/party/hat.jpg, index.html, and img/pic1.jpg]

RDBMS on Amazon Simple Storage Service: S3
- $0.15 per GB per month storage
- Object-based storage: 1 B – 5 GB per object
- Fast, reliable, scalable
- Redundant, dispersed
- 99.99% availability goal
- Private or public
- Per-object URLs and ACLs
- BitTorrent support
- $0.10–$0.18 per GB data transfer
- $0.01 for 1,000 to 10,000 requests

Microsoft SQL Azure
- The Azure Services Platform supports applications running in the cloud or on local systems

Microsoft SQL Azure
- Windows Azure provides Windows-based compute and storage services for cloud applications

Summary and Conclusion
- Data management for cloud computing poses a fundamental challenge to database researchers:
  - Scalability (horizontal)
  - Reliability
  - Data consistency
  - Elasticity
  - Differential pricing
- Radically different approaches and solutions are warranted to overcome this challenge
  - Need to understand the nature of new applications
- Novel data management challenges are coupled with distributed and parallel computing issues

END