Research Case in Cloud Computing IST 501 Fall 2014 Dongwon Lee, Ph.D.

Slides:



Advertisements
Similar presentations
A Privacy Preserving Index for Range Queries
Advertisements

Overcoming Limitations of Sampling for Agrregation Queries Surajit ChaudhuriMicrosoft Research Gautam DasMicrosoft Research Mayur DatarStanford University.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
A N I MPROVED I NDEXING S CHEME FOR R ANGE Q UERIES Yvonne Yao Adviser: Professor Huiping Guo.
Relational Algebra Ch. 7.4 – 7.6 John Ortiz. Lecture 4Relational Algebra2 Relational Query Languages  Query languages: allow manipulation and retrieval.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A Modified by Donghui Zhang.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
CryptDB: A Practical Encrypted Relational DBMS Raluca Ada Popa, Nickolai Zeldovich, and Hari Balakrishnan MIT CSAIL New England Database Summit 2011.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
Physical Database Monitoring and Tuning the Operational System.
Privacy and Integrity Preserving in Distributed Systems Presented for Ph.D. Qualifying Examination Fei Chen Michigan State University August 25 th, 2009.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
1 Privacy, Authenticity and Integrity in Outsourced Databases Gene Tsudik Computer Science Department School of Information & Computer Science University.
Distributed Databases
CSCD343- Introduction to databases- A. Vaisman1 Relational Algebra.
Protecting data privacy and integrity in clouds By Jyh-haw Yeh Computer Science Boise state University.
Privacy Preserving Query Processing in Cloud Computing Wen Jie
Database Design – Lecture 16
Secure Cloud Database using Multiparty Computation.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database Management 9. course. Execution of queries.
Relational Model Concepts. The relational model represents the database as a collection of relations. Each relation resembles a table of values. A table.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 2: Intro to Relational.
Wai Kit Wong 1, Ben Kao 2, David W. Cheung 2, Rongbin Li 2, Siu Ming Yiu 2 1 Hang Seng Management College, Hong Kong 2 University of Hong Kong.
Wai Kit Wong, Ben Kao, David W. Cheung, Rongbin Li, Siu Ming Yiu.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
Managing and querying encrypted data Trần Mỹ Giao Huỳnh Mai Thúy.
Executing SQL over Encrypted Data in Database-Service-Provider Model Hakan Hacigumus University of California, Irvine Bala Iyer IBM Silicon Valley Lab.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra.
ICS 321 Fall 2011 The Relational Model of Data (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 8/29/20111Lipyeow.
Histograms for Selectivity Estimation
INTRODUCTION TO DBS Database: a collection of data describing the activities of one or more related organizations DBMS: software designed to assist in.
Chapter 9 Database Systems © 2007 Pearson Addison-Wesley. All rights reserved.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 4 Relational Algebra.
CSCD34-Data Management Systems - A. Vaisman1 Relational Algebra.
CMPT 258 Database Systems SQL Queries (Chapter 5).
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
Query Processing – Implementing Set Operations and Joins Chap. 19.
Secure Data Outsourcing
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
October 15-18, 2013 Charlotte, NC Accelerating Database Performance Using Compression Joseph D’Antoni, Solutions Architect Anexinet.
1 VLDB, Background What is important for the user.
IST VLabs Tutorial Fall 2010 Dongwon Lee, Ph.D..
Table General Guidelines for Better System Performance
Indexes By Adrienne Watt.
Parallel Databases.
Database Performance Tuning and Query Optimization
Relational Algebra Chapter 4, Part A
RELATIONAL ALGEBRA (Chapter 2)
Examples of Physical Query Plan Alternatives
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Table General Guidelines for Better System Performance
(A Research Proposal for Optimizing DBMS on CMP)
Advance Database Systems
Chapter 2: Intro to Relational Model
Overview of Query Evaluation
Chapter 11 Database Performance Tuning and Query Optimization
CS639: Data Management for Data Science
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Database Administration
PSoup: A System for streaming queries over streaming data
A Privacy – Preserving Index
Presentation transcript:

Research Case in Cloud Computing IST 501 Fall 2014 Dongwon Lee, Ph.D.

2 Learning Objectives Understand an important research case in cloud computing “Database as a Service” Identify research methods used in the research case cloudtimes.org

Research Case: The DAS Project Database as a Service UC Irvine Pioneering work of SaaS type research in data management Recipient of SIGMOD 10-year best paper Two representative works: “Providing Database as a Service,” ICDE 2002 “Executing SQL over Encrypted Data in Database Service Provider Model,” SIGMOD HERE

4 Objective Want to store the data on “a server” User Encrypted User Database Server User Data But the problem is we do not trust “ the server ” for sensitive information! encrypt the data and store it but still be able to run queries over the encrypted data do most of the work at the server If the server is trusted, ICDE 2002, else SIGMOD 2002 Distrusted

5 Database as a Service SaaS model for Database DB management transferred to service provider for o backup, administration, restoration, space management, upgrades etc. use the database “as a service” provided by Cloud computing o use SW, HW, human resources of CC, instead of your own User Encrypted User Database (Distrusted) Application Service Provider User Data Distrusted Server

6 Service Provider Architecture Encrypted User Database Query Translator Server Site Temporary Results Query Executer Metadata Original Query Server Side Query Encrypted Results Actual Results Service Provider User Client Site Client Side Query ? ? ?

7 Relational Encryption NAMESALARYPID John Marry James Lisa etupleN_IDS_IDP_ID fErf!$Q!!vddf>></|50110 F%3w&%gfErf!$65210 &%gfsdf$%343v<l50220 %33w&%gfs##!65220 Server Site Store an encrypted string – etuple – for each tuple in the original table This is called “ row level encryption ” Any kind of encryption technique can be used Create an index for each (or selected) attribute(s) in the original table

8 Building the Index Partition function divides domain values into partitions (= buckets) Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] } partitioning function has an impact on performance as well as privacy Domain Values Partition (Bucket) ids Identification function assigns a partition id to each partition of attribute A e.g.ident R.A ( (200,400] ) = 7 Any function can be use as identification function, e.g., hash functions

9 Mapping Functions Mapping function maps a value v in the domain of attribute A to the id of the partition which value v belongs to e.g.Map R.A ( 250 ) = 7, Map R.A ( 620 ) = Domain Values Partition (Bucket) ids

10 Storing Encrypted Data R=  R S = etuple = encrypt ( A | B | C ) A_id = Map R.A ( A ), B_id = Map R.B ( B ), C_id = Map R.C ( C ) NAMESALARYPID John Marry James Lisa EtupleN_IDS_IDP_ID fErf!$Q!!vddf>></|50110 F%3w&%gfErf!$65210 &%gfsdf$%343v<l50220 %33w&%gfs##!65220 Table: EMPLOYEE Table: EMPLOYEE S

11 Mapping Conditions Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k Server stores attribute indices determined by mapping functions Client stores metadata and utilizes that to translate the query Conditions: Condition  Attribute op Value Condition  Attribute op Attribute Condition  (Condition or Condition) | (Condition and Condition) | (not Condition)

12 Mapping Conditions (2) Example: Attribute = Value Map cond ( A = v )  A S = Map A ( v ) Map cond ( A = 250 )  A S = Domain Values Partition Ids

13 Mapping Conditions (3) Attribute1 = Attribute2 Map cond ( A = B )  N (A S = ident A ( p k ) B S = ident B ( p l )) where N is p k  partition (A), p l  partition (B), p k  p l   PartitionsA_id [0,100]2 (100,200]4 (200,300]3 PartitionsB_id [0,200]9 (200,400]8 C : A = B  C’ : (A S = 2 and B S = 9) or (A S = 4 and B S = 9) or (A S = 3 and B S = 8)

14 Relational Operators over Encrypted Relations Partition the computation of the operators across client and server Compute (possibly) superset of answers at the server Filter the answers at the client Objective : minimize the work at the client and process the answers as soon as they arrive without requiring storage at the client Operators studied: Selection Join Grouping and Aggregation Sorting Duplicate Elimination Set Difference Union Projection

15 Selection Operator  A=250 TABLE Example:  A=250 D E_TABLE  A_id = 7 Client Query Server Query  c ( R ) =  c ( D (  S Mapcond(c) ( R S ) )

16 Join Operator C EMPPROJ C : A = B  C’ : (A_id = 2 and B_id = 9) Or (A_id = 4 and B_id = 9) Or (A_id = 3 and B_id = 8) PartitionsA_id [0,100]2 (100,200]4 (200,300]3 PartitionsB_id [0,200]9 (200,400]8 R c T =  c ( D ( R S S Mapcond(c) T S ) Example: C’C’ E_EMPE_PROJ  A=B D Client Query Server Query

17 Query Decomposition Client Query Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k Server Query Encrypted (EMP) Encrypted (PROJ)  salary >100k  name,pname D D e.pid = p.pid EMP PROJ  salary >100k  name,pname e.pid = p.pid

18 Query Decomposition (2) E_EMP E_PROJ  salary >100k D D E_EMP E_PROJ  salary >100k D D  s_id = 1 v s_id = 2 e.pid = p.pid  name,pname Client Query Server Query Client Query Server Query

19 Query Decomposition (3) e.p_id = p.p_id E_EMP E_PROJ  salary >100k and e.pid = p.pid D  s_id = 1 v s_id = 2 e.pid = p.pid E_EMP E_PROJ  salary >100k D D  s_id = 1 v s_id = 2  name,pname Client Query Server Query

20 Query Decomposition (4) Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k Q S : SELECT e_emp.etuple, e_proj.etuple FROM e_emp, e_proj WHERE e.p_id=p.p_id AND s_id = 1 OR s_id = 2 Q C : SELECT name, pname FROM temp WHERE emp.pid=proj.pid AND salary > 100k e.p_id = p.p_id E_EMP E_PROJ  salary >100k and e.pid = p.pid D  s_id = 1 v s_id = 2  name,pname Client Query Server Query

21 Experimental Evaluation Data TPC-H database Queries TPC-H Queries, versions of Q#6 and Q#3 Partitioning Strategy Equi-depth histograms for the first set of experiments Equi-width histograms for the second set of experiments

22 Effect of # of Buckets in Non-Join Query Client and communications costs decreases with increasing number of buckets due to better filtering at the server

23 Effect of # of Buckets in Non-Join Query Single Server: Server is trusted and performs all operations including decryption on site Shows that proposed query execution protocol doesn’t introduce significant overhead

24 Effect of # of Buckets in Join Query Single Server: Server is trusted and performs all operations including decryption on site Consistent with the previous results showing proposed communication protocol doesn’t introduce significant overhead

25 Findings Database-as-a-Service model is a promising solution Proposed algebraic solution to provide data privacy when cloud computing host is not trusted encrypts data, creates “coarse indexes” and stores the data at the server allows only data owner to decrypt the data With query decomposition most of query execution performed at the server client only performs filtering

CI Research Methods Used 26 MethodSIGMOD 02 Lit-Survey Formal Solution Experimental Build Process Model

Reference “Executing SQL over Encrypted Data in Database Service Provider Model,” SIGMOD