1 Cloud Computing, CS596-015. Data in the Cloud: Data-as-a-Service for the Cloud.


2 Outline
  • Motivation & Challenges
  • Data in the Cloud
  • Transactions in the Cloud: RDBMS vs K/V Store
  • Data Scalability, Elasticity, and Autonomy in the Cloud
  • Multi-tenant Data Platforms
  • Privacy
  • Amazon Relational Database Service: RDS
  • RDBMS on Amazon Simple Storage Service: S3
  • Microsoft SQL Azure
  • Summary and Conclusions

3 Motivation & Challenges
Motivation:
  • Economies of scale: hardware and licensing costs
  • Pay per use and lower administrative cost
  • Relational Cloud mainly targets OLTP workloads and direct-attached storage (DAS) architectures with consistency guarantees
Challenges:
  • Efficient multi-tenancy (provider)
  • Elastic scalability (provider)
  • Privacy (user)

4 Data in the Cloud: Design Principles
Separate system and application state
  • System metadata is critical but small
  • Application data has varying needs
  • Separation allows use of different protocols for each
Limit interactions to a single node
  • Allows systems to scale horizontally
  • Graceful degradation during failures
  • Obviates the need for distributed synchronization
  • Non-distributed transaction execution is efficient

5 Data in the Cloud: Design Principles
Decouple access control from data storage
  • Access control refers to read/write access to the data
  • Partitioning ownership effectively partitions the data
  • Decoupling allows lightweight ownership transfer
Limited distributed synchronization is practical
  • Maintenance of metadata
  • Provide strong guarantees only for data that needs it

6 Transactions in the Cloud: RDBMS vs K/V Store
Low consistency considerably increases complexity
Consistency logic is duplicated in all applications!
Often leads to performance inefficiencies
There are two candidates for transaction support in the cloud:
  • Cloudify RDBMS (Data Fission: split atoms)
  • Enrich key/value stores (Data Fusion: combine atoms)

7 [Figure: fusion of the architectures, with RDBMS on one side and key-value stores on the other]
Enrich key-value stores:
  • MegaStore [CIDR '11]
  • G-Store [SoCC '10]
  • Vo et al. [VLDB '10]
  • Rao et al. [VLDB '11]
Cloudify RDBMSs:
  • Deuteronomy [CIDR '09, '11]
  • ElasTraS [HotCloud '09, TR '10]
  • DB on S3 [SIGMOD '08]
  • RelationalCloud [CIDR '11]
  • SQL Azure [ICDE '11]

8 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Add more resources, get more performance:
  • Handle more requests/sec
  • Store more data
Scaling is achievable in two dimensions:
  • Scale-up
  • Scale-out, which is the main paradigm for the cloud

9 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Finding the right design point: what is the right consistency / programming model?
  • Pure key-value stores are too weak: transactions only on a single record
  • Traditional RDBMSs are too strong: you can't just run MySQL at scale!
  • Instead, provide strong consistency within a portion of the data
      - Megastore
      - Vertica, Aster, Teradata, Greenplum, etc.

10 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
[Figure: systems placed on a spectrum from Weak to Strong]
Weak → Dynamo → BigTable, PNUTS → Megastore, G-Store → Azure, ElasTraS, Rel Cloud → MySQL → Strong

11 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion:
  • Start with a key-value store
  • Partition records into groups
  • Provide multi-record updates within a group
  • Cross-group operations are handled separately
  • Assumes that cross-group ops are rare
Data Fission:
  • Start with a relational database
  • Partition tables into shards
  • Provide ACID within each shard
  • Cross-shard ops are expensive
  • Assumes that cross-shard ops are rare
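
The Fission side of this slide can be sketched as a routing step: rows are assigned to shards by a partition key, and an operation is cheap only if every key it touches maps to the same shard. The shard count, the hash-based placement, and the names `shard_for` and `plan` are assumptions made for illustration; none of the systems named in the slides is claimed to work exactly this way.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard id with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def plan(partition_keys):
    """Classify an operation by the set of shards its partition keys touch."""
    if not partition_keys:
        raise ValueError("an operation must touch at least one key")
    shards = sorted({shard_for(k) for k in partition_keys})
    # One shard: plain local ACID transaction. Several shards: the expensive case.
    kind = "single-shard" if len(shards) == 1 else "cross-shard"
    return kind, shards


if __name__ == "__main__":
    print(plan(["customer:42"]))
    print(plan([f"customer:{i}" for i in range(10)]))
```

Choosing the partition key so that rows used together share a key (for example, all rows of one customer) is what keeps cross-shard operations rare.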

12 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Atomic Multi-key Access:
  • G-Store: Efficient Transactional Multi-key Access [ACM SoCC 2010]
  • Key-value stores:
      - Atomicity guarantees on single keys
      - Suitable for the majority of current web applications
  • Many other applications need multi-key accesses:
      - Online multi-player games
      - Collaborative applications
  • Enrich the functionality of the key-value store

13 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Group Abstraction:
  • Define a granule of on-demand transactional access
  • Applications select any set of keys to form a group
  • The data store provides transactional access to the group
  • Groups are non-overlapping
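
A minimal single-process sketch of the key group abstraction follows. It only illustrates the three properties listed on the slide (application-chosen groups, all-or-nothing updates within a group, no overlap between groups). The class `KeyGroupStore` and its methods are hypothetical names, not G-Store's API, and a real system would distribute this state across nodes.

```python
class KeyGroupStore:
    """Toy in-memory key-value store with non-overlapping key groups."""

    def __init__(self):
        self.data = {}      # key -> value
        self.group_of = {}  # key -> group id; a key belongs to at most one group
        self.groups = {}    # group id -> frozenset of member keys

    def create_group(self, group_id, keys):
        """Form a group from any set of keys, rejecting overlap with live groups."""
        keys = frozenset(keys)
        if group_id in self.groups:
            raise ValueError(f"group {group_id!r} already exists")
        taken = sorted(k for k in keys if k in self.group_of)
        if taken:
            raise ValueError(f"keys already grouped: {taken}")
        self.groups[group_id] = keys
        for k in keys:
            self.group_of[k] = group_id

    def delete_group(self, group_id):
        """Dissolve a group so its keys can join other groups later."""
        for k in self.groups.pop(group_id):
            del self.group_of[k]

    def transact(self, group_id, updates):
        """Apply every update or none; all keys must belong to the group."""
        members = self.groups[group_id]
        outside = sorted(k for k in updates if k not in members)
        if outside:
            raise KeyError(f"keys outside group {group_id!r}: {outside}")
        self.data.update(updates)  # validation is done, so this step cannot half-fail
```

For example, a game session could call `create_group("game-1", ["alice", "bob"])`, update both players' state in one `transact` call, and call `delete_group` when the session ends.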

14 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability

15 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fusion – Key Grouping Protocol:
  • Conceptually similar to locking
  • Allows collocation of ownership at the leader
  • The leader is the gateway for group accesses
  • "Safe" ownership transfer: deals with the dynamics of the underlying key-value store
      - Data dynamics of the key-value store
      - Various failure scenarios
  • Hides complexity from the applications while exposing richer functionality
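
The locking analogy can be made concrete with a small sketch of ownership transfer. Here a leader asks each node holding a member key to yield ownership; if any node refuses because the key is already yielded to another leader, the ownership collected so far is handed back. The `Node` class, the message names (`handle_join`, `handle_release`), and the rollback policy are assumptions for illustration; failures, timeouts, and logging, which the real protocol must handle, are left out.

```python
class Node:
    """Stand-in for a key-value server that owns some keys (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.yielded = {}  # key -> leader currently holding ownership

    def handle_join(self, key, leader):
        """Yield ownership of `key` to `leader` unless it is already yielded."""
        if key in self.yielded:
            return False
        self.yielded[key] = leader
        return True

    def handle_release(self, key, leader):
        """Take ownership back, but only from the leader that holds it."""
        if self.yielded.get(key) == leader:
            del self.yielded[key]


def form_group(leader, placement, keys):
    """Collect ownership of all keys at `leader`; undo everything on refusal."""
    acquired = []
    for key in keys:
        if placement[key].handle_join(key, leader):
            acquired.append(key)
        else:
            for k in acquired:
                placement[k].handle_release(k, leader)
            return False
    return True


def dissolve_group(leader, placement, keys):
    """Return ownership of every key to the node that stores it."""
    for key in keys:
        placement[key].handle_release(key, leader)
```

Here `placement` is an assumed dictionary from key to the `Node` that stores it; in a real key-value store that mapping can change underneath the protocol, which is part of what makes the transfer hard to do safely.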

16 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
[Figure: Data Fusion – Implementing G-Store]

17 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Elastic Transaction Management:
  • ElasTraS: designed to make RDBMSs cloud-friendly
  • Database viewed as a collection of partitions
  • Suitable for standard OLTP workloads:
      - Large single-tenant database instance: database partitioned at the schema level
      - Multi-tenant with a large number of small databases: each partition is a self-contained database
  • Elastic to deal with workload changes
  • Dynamic load balancing of partitions
  • Automatic recovery from node failures
  • Transactional access to database partitions
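
Dynamic load balancing of partitions can be illustrated with a greedy sketch: repeatedly move one partition from the busiest node to the idlest node until the gap is small. This is an assumed heuristic for illustration, not the ElasTraS algorithm; the function name `rebalance`, the `tolerance` parameter, and the load units are all hypothetical.

```python
def rebalance(assignment, load, tolerance=0.2):
    """Greedy balancing sketch.

    assignment: dict node -> set of partition ids (mutated in place)
    load:       dict partition id -> positive load estimate
    Returns the list of (partition, source, target) moves performed.
    """
    moves = []
    average = sum(load.values()) / len(assignment)
    while True:
        node_load = {n: sum(load[p] for p in parts)
                     for n, parts in assignment.items()}
        hot = max(node_load, key=lambda n: (node_load[n], n))
        cold = min(node_load, key=lambda n: (node_load[n], n))
        gap = node_load[hot] - node_load[cold]
        if gap <= tolerance * average:
            return moves
        # Only moves that strictly shrink the imbalance are considered,
        # which guarantees the loop terminates.
        candidates = [p for p in assignment[hot] if 0 < load[p] < gap]
        if not candidates:
            return moves
        best = min(candidates, key=lambda p: (abs(load[p] - gap / 2), p))
        assignment[hot].remove(best)
        assignment[cold].add(best)
        moves.append((best, hot, cold))


if __name__ == "__main__":
    nodes = {"n1": {"p1", "p2", "p3"}, "n2": set()}
    print(rebalance(nodes, {"p1": 5, "p2": 3, "p3": 2}))
```

Each move here stands for a live migration of a self-contained partition, so in practice the migration cost would also enter the decision.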

18 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
Data Fission – Effective Resource Sharing:
  • Multiple database partitions hosted within the same database process
      - Good consolidation
  • Independent transaction and data managers
      - Good performance isolation
  • Lightweight live database migration
      - Elastic scaling

19 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
What is the difference?
  • Is Fusion vs. Fission a worthwhile distinction?
  • Seems like they both arrive at the same place
  • Megastore ("Fusion") vs. ElasTraS ("Fission"):
      - Shard tables based on the table's primary key
      - A shard is co-located on the same machine
      - ACID transactions within a shard
      - Primary and secondary indexes
  • All Megastore is missing is a SQL interface!

20 Data Scalability, Elasticity, and Autonomy in the Cloud: Scalability
The difference:
Different targeted users:
  • Fusion is for people who own datacenters
  • Fission is for people who want SQL in the cloud
Different exposed API:
  • Fusion is more explicit about performance
  • Fission tries to hide partitioning from the user
Anything else?

21 Data Scalability, Elasticity, and Autonomy in the Cloud: Elasticity
Dynamically scaling up and down on demand
Important with pay-as-you-go cloud pricing
Consolidate to reduce costs
Expand to increase performance
Need to move state and processing duties around within the system
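
The consolidate/expand decision can be sketched as a utilization rule: leave the cluster alone while utilization is inside a band, otherwise resize so that utilization lands at the upper bound. The thresholds (30% and 70%), the function name `target_nodes`, and the notion of a single scalar load are assumptions for illustration only.

```python
import math


def target_nodes(total_load, capacity_per_node, current_nodes, low=0.3, high=0.7):
    """Return how many nodes to run for the given load (illustrative rule)."""
    utilization = total_load / (current_nodes * capacity_per_node)
    if low <= utilization <= high:
        return current_nodes  # inside the band: avoid needless migrations
    # Outside the band: expand or consolidate so utilization is at most `high`.
    return max(1, math.ceil(total_load / (high * capacity_per_node)))


if __name__ == "__main__":
    print(target_nodes(900, 100, 10))  # overloaded cluster expands
    print(target_nodes(200, 100, 10))  # underused cluster consolidates
```

The band between `low` and `high` is a design choice: it keeps the system from paying migration costs for every small fluctuation in load.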

22 Data Scalability, Elasticity, and Autonomy in the Cloud: Autonomy
Need management to be more automatic
Elasticity and load balancing based on usage and machine learning (ML) predictions
Performance modeling:
  • Migration costs (availability, performance, $$)
  • Resource isolation (consolidated services)
  • SLAs

23 Multi-tenant Data Platforms
Problem definition: consolidate databases onto a smaller number of servers, balancing load without affecting performance or security
Multi-tenancy is a paradigm in which a service provider hosts multiple clients (tenants) on a single shared stack of software and hardware
Virtualization: multi-tenancy in the hardware layer
  • Major enabling technology for cloud infrastructure
Virtualization in the database tier
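
The consolidation problem resembles bin packing, so one way to picture it is the classic first-fit-decreasing heuristic: sort tenant databases by load and place each on the first server with room. This is a swapped-in textbook technique, not the method of any system in these slides, and it models load as a single number, ignoring the performance and security constraints the slide mentions. The names `consolidate` and `server_capacity` are hypothetical.

```python
def consolidate(db_loads, server_capacity):
    """First-fit decreasing: pack databases onto as few servers as the heuristic finds.

    db_loads: dict database name -> load (same unit as server_capacity)
    Returns a list of servers, each a list of database names.
    """
    servers = []  # each entry: [remaining capacity, [database names]]
    ordered = sorted(db_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, load in ordered:
        if load > server_capacity:
            raise ValueError(f"{name} does not fit on any single server")
        for server in servers:
            if server[0] >= load:
                server[0] -= load
                server[1].append(name)
                break
        else:
            # No existing server has room, so open a new one.
            servers.append([server_capacity - load, [name]])
    return [names for _, names in servers]


if __name__ == "__main__":
    print(consolidate({"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}, 100))
```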

24 Multi-tenant Data Platforms: Capturing the “Long Tail” in Multitenant Apps

25 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-application scenario:
  • Support a very large number of database applications (with different schemas)

26 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Multi-tenancy challenges: isolation, scalability, performance, customization, resource utilization, metering, …

27 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
[Figure: Multi-tenancy Trade-offs]

28 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
[Figure: Multi-tenancy Resource Sharing and Isolation]

29 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
[Figure: Multi-tenancy Trade-offs]

30 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
[Figure: Multi-tenancy Trade-offs]

31 Multi-tenant Data Platforms: Multi Apps vs. Multi-tenant Apps Scenario
Force.com Architecture:
  • Metadata-driven architecture
  • Tenant-specific customization information is stored as metadata
  • The engine uses metadata to generate virtual application components at runtime
  • Metadata is key, so metadata is cached
  • Application data is stored in a large shared table, referred to as the heap
  • Some virtual tables are materialized
  • Pivot tables are used for indexing, maintaining relationships, and uniqueness constraints
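
The metadata-plus-heap idea can be sketched with SQLite: one metadata table records which generic slot holds each tenant-defined field, and one shared heap table stores every tenant's rows in those generic slots. The table names (`field_meta`, `heap`), the three-slot limit, and the helper functions are all hypothetical and much simpler than the real Force.com schema; pivot tables and metadata caching are not shown.

```python
import sqlite3

MAX_SLOTS = 3  # assumed number of generic value columns in the heap


def open_store():
    """Create the metadata table and the shared heap in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE field_meta (tenant TEXT, object TEXT, field TEXT, slot INTEGER);
        CREATE TABLE heap (tenant TEXT, object TEXT, row_id INTEGER,
                           val0 TEXT, val1 TEXT, val2 TEXT);
    """)
    return conn


def define_object(conn, tenant, obj, fields):
    """Register a tenant-specific virtual table purely as metadata."""
    if len(fields) > MAX_SLOTS:
        raise ValueError("too many fields for this toy heap")
    conn.executemany("INSERT INTO field_meta VALUES (?, ?, ?, ?)",
                     [(tenant, obj, f, i) for i, f in enumerate(fields)])


def insert_row(conn, tenant, obj, row_id, record):
    """Store a record in the shared heap, using metadata to pick the slots."""
    slots = dict(conn.execute(
        "SELECT field, slot FROM field_meta WHERE tenant = ? AND object = ?",
        (tenant, obj)))
    vals = [None] * MAX_SLOTS
    for field, value in record.items():
        vals[slots[field]] = str(value)  # heap slots are untyped text here
    conn.execute("INSERT INTO heap VALUES (?, ?, ?, ?, ?, ?)",
                 (tenant, obj, row_id, *vals))


def read_rows(conn, tenant, obj):
    """Rebuild the virtual table at runtime from heap rows plus metadata."""
    meta = conn.execute(
        "SELECT field, slot FROM field_meta WHERE tenant = ? AND object = ? "
        "ORDER BY slot", (tenant, obj)).fetchall()
    rows = conn.execute(
        "SELECT row_id, val0, val1, val2 FROM heap "
        "WHERE tenant = ? AND object = ? ORDER BY row_id", (tenant, obj)).fetchall()
    return [{field: row[1 + slot] for field, slot in meta} for row in rows]
```

Adding a field or a whole new object for one tenant only inserts metadata rows, which is why no schema change is needed when tenants customize their applications.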

32 Privacy
Prevent the DBA from snooping on data
Ensure data security when the application and DBMS servers are compromised

33 Privacy: Problem: Confidential Data Leaks
  • Curious DB administrators
  • Hackers
  • Curious cloud/employees
  • Physical attacks
Both on private clouds and public clouds
Regulatory laws
[Figure labels: User 1, User 2, User 3, Application, SQL, DB Server]

34 Privacy: CryptDB
Goal: protect confidentiality of data
  1. Process SQL queries on encrypted data
  2. Capture and cryptographically enforce access control in SQL: chain keys from user passwords to data items
Threat 1: passive attacks on the DB server
Threat 2: active/passive attacks on all servers
[Figure labels: User 1, User 2, User 3, user password, Application, Proxy, SQL, DB Server]

35 Privacy: CryptDB – Threat Model
Consider attacks on any part of the servers
We do not consider integrity attacks
  • Can affect data integrity, but not confidentiality

36 Privacy: CryptDB – Two Techniques
SQL-aware encryption strategy
  • Observation: the set of SQL operators is limited
  • Different encryption schemes provide different functionality
Adjustable query-based encryption
  • Adapt encryption of data based on user queries

37 Privacy: CryptDB – (1) SQL-aware encryption
Security: highest at the top
Scheme   Operation   Details
RND      None        AES in UFE
HOM      +, *        e.g., Paillier
DET      equality    AES in CTR
JOIN     join        new
SEARCH   ILIKE       Amanatidis et al. ’07
OPE      order       Boldyreva et al. ’09
Equality: e.g., =, !=, GROUP BY, IN, COUNT, DISTINCT
Order: e.g., >, <, ORDER BY, SORT, MAX, MIN
First practical implementation
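
To show what the HOM row means, here is a toy Paillier sketch: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to a constant yields an encryption of the plaintext times that constant. The primes are tiny assumed values picked for readability, so this offers no security; it is not CryptDB's implementation, and the function names are hypothetical.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy key generation from two small primes (assumed; far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n, valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n with fresh randomness r, using g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private, c):
    """Recover m via L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    n, private = keygen()
    c1, c2 = encrypt(n, 100), encrypt(n, 250)
    total = (c1 * c2) % (n * n)   # server-side addition on ciphertexts
    tripled = pow(c1, 3, n * n)   # server-side multiplication by a constant
    print(decrypt(n, private, total), decrypt(n, private, tripled))
```

A server holding only ciphertexts could compute a SUM over a column this way without ever seeing a salary, which is the functionality the HOM scheme contributes.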

38 Privacy: CryptDB – Onions of encryption
Onion 1 (innermost to outermost): any value, JOIN, SEARCH, DET, RND
Onion 2 (innermost to outermost): any value, OPE-JOIN, OPE, RND
Onion 3 (innermost to outermost): int value, HOM
  • Each column has the same key in a given layer of an onion
Significant confidentiality and space savings

39 Privacy: CryptDB – (2) Adjustable query-based encryption
Start out the database with the most secure encryption scheme
Adjust encryption dynamically
  • Strip off levels of the onions: the proxy gives the key to the server using a UDF

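40 Privacy: CryptDB – Example
Table emp with columns: rank, name, salary
Onion 1 of the salary column (any value, JOIN, SEARCH, DET, RND) is stripped from RND down to DET
Application query:
  SELECT * FROM emp WHERE salary = 100
Queries sent to the DB server:
  UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
  SELECT * FROM table1 WHERE col3onion1 = x5a8c34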
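
The example can be mimicked end to end with a toy two-layer onion. The sketch uses a SHA-256-derived XOR keystream as a stand-in cipher: the DET layer uses a fixed nonce so equal values encrypt equally, and the RND layer uses a fresh random IV so they do not. This construction is an assumption for illustration and is not secure; CryptDB uses real block-cipher modes, and the function names are hypothetical.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key: bytes, value: bytes) -> bytes:
    """Deterministic layer: equal plaintexts give equal ciphertexts."""
    return _xor(value, _keystream(key, b"det", len(value)))


def rnd_encrypt(key: bytes, inner: bytes) -> bytes:
    """Randomized layer: a fresh 16-byte IV hides equality of the inner values."""
    iv = os.urandom(16)
    return iv + _xor(inner, _keystream(key, iv, len(inner)))


def rnd_decrypt(key: bytes, outer: bytes) -> bytes:
    """Strip the RND layer, exposing the DET ciphertext underneath."""
    iv, body = outer[:16], outer[16:]
    return _xor(body, _keystream(key, iv, len(body)))


if __name__ == "__main__":
    det_key, rnd_key = b"det-key", b"rnd-key"  # hypothetical keys held by the proxy
    salaries = [100, 250, 100]
    column = [rnd_encrypt(rnd_key, det_encrypt(det_key, str(s).encode()))
              for s in salaries]
    # Proxy releases rnd_key; the server peels RND (the UPDATE in the slide).
    peeled = [rnd_decrypt(rnd_key, cell) for cell in column]
    # Proxy rewrites `salary = 100` into a comparison against a DET ciphertext.
    token = det_encrypt(det_key, b"100")
    print([i for i, cell in enumerate(peeled) if cell == token])
```

The server never receives `det_key`, so after the peel it learns which rows share a salary but not the salary values themselves.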

41 Amazon Relational Database Service: RDS
RDS is a web service that makes it easy to set up, operate, and scale an RDBMS in the cloud:
RDS provides cost-efficient and resizable capacity while managing time-consuming DB administration tasks
RDS supports the MySQL, Oracle, and SQL Server RDBMS engines
Code, applications, and tools already used with your existing RDBMS can be used with Amazon RDS
RDS automatically patches the DBMS and backs up your database, storing the backups for a user-defined retention period and enabling point-in-time recovery

42 RDBMS on Amazon Simple Storage Service: S3
Highly scalable data storage in the cloud
Programmatic access via a web services API
Simple to get going and simple to use
Highly available and durable
Pay-as-you-go:
  • Storage: $0.15/GB/month
  • Data transfer: starts at $0.18/GB
  • Requests: nominal charges
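
As a worked example of the pay-as-you-go model, the sketch below estimates a monthly bill from the two rates quoted on the slide. Request charges are left out because the slide only calls them nominal, and the transfer rate is treated as flat even though the slide says it "starts at" $0.18/GB. The function name `monthly_cost` is hypothetical, and these are the slide's historical prices, not current ones.

```python
from decimal import Decimal

STORAGE_PER_GB_MONTH = Decimal("0.15")  # from the slide
TRANSFER_PER_GB = Decimal("0.18")       # slide: "starts at" this rate; assumed flat here


def monthly_cost(stored_gb: int, transferred_gb: int) -> Decimal:
    """Estimate one month's charge in dollars, excluding request fees."""
    storage = Decimal(stored_gb) * STORAGE_PER_GB_MONTH
    transfer = Decimal(transferred_gb) * TRANSFER_PER_GB
    return storage + transfer


if __name__ == "__main__":
    # 100 GB stored and 50 GB transferred: 100 * 0.15 + 50 * 0.18
    print(monthly_cost(100, 50))
```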

43 RDBMS on Amazon Simple Storage Service: S3
[Figure: S3 Name Space]
Amazon S3 buckets: mculver-images, media.mydomain.com, public.blueorigin.com
Objects: Beach.jpg, img1.jpg, img2.jpg, 2005/party/hat.jpg, index.html, img/pic1.jpg

44 RDBMS on Amazon Simple Storage Service: S3
Object-Based Storage
  • 1 B – 5 GB / object
  • Fast, Reliable, Scalable
  • Redundant, Dispersed
  • 99.99% Availability Goal
  • Private or Public
  • Per-object URLs & ACLs
  • BitTorrent Support
Pricing
  • $.15 per GB per month storage
  • $.10 - $.18 per GB data transfer
  • $.01 for 1000 to requests

45 Microsoft SQL Azure
  • Azure Services Platform supports applications running in the cloud or on local systems

46 Microsoft SQL Azure
  • Windows Azure provides Windows-based compute and storage services for cloud applications

47 Summary and Conclusion
Data management for cloud computing poses a fundamental challenge to database researchers:
  • Scalability
  • Reliability
  • Data consistency
  • Elasticity
  • Differential pricing
Radically different approaches and solutions are warranted to overcome this challenge
  • Need to understand the nature of new applications
Novel data management challenges coupled with distributed and parallel computing issues

48 END