An Improved Indexing Scheme for Range Queries
Yvonne Yao
Adviser: Professor Huiping Guo

Database-as-a-Service
Business organizations handle large amounts of data (terabytes), and the cost of managing and maintaining these data on site is high. In the database-as-a-service (DAS) model, the DBMS is outsourced: clients rely on service providers for data management and maintenance, which lowers cost considerably. But...

Database-as-a-Service
Security of the data is not guaranteed, because service providers are untrusted. The remedy is to store only an encrypted form of the data on the remote server, so that only users holding the correct key(s) can access it. How, then, can we query the encrypted data? One option is to retrieve and decrypt the entire table and apply the SQL statement to it locally, but that is too expensive. A more practical approach has been proposed.

Database-as-a-Service

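Bucketization
There are various approaches to building the meta-data used to query encrypted data: B+-tree based, hash based, and bucket based. What is bucketization? The values of an attribute are partitioned into several buckets, each identified by a bucket ID. The bucket IDs are stored, along with the encrypted data, on the remote server, while the client keeps the partition information as meta-data. General bucketization approaches are equi-width (every bucket covers a value range of the same width) and equi-depth (every bucket holds roughly the same number of values).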
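The two general schemes can be sketched in Python as follows. Each function returns the inclusive upper bound of every bucket, and `bucket_id` maps a plaintext value to its ID using the `Bucket_k` naming of the examples. The function names are hypothetical, and values are assumed to lie inside the domain [lo, hi].

```python
from bisect import bisect_left


def equi_width(lo, hi, m):
    """Upper bounds of m buckets of equal width covering [lo, hi]."""
    width = (hi - lo) / m
    return [lo + width * (k + 1) for k in range(m)]


def equi_depth(values, m):
    """Upper bounds of m buckets holding roughly equal numbers of values."""
    data = sorted(values)
    n = len(data)
    if n < m:
        raise ValueError("need at least m values")
    # bucket k ends just before index (k + 1) * n // m of the sorted data
    return [data[(k + 1) * n // m - 1] for k in range(m)]


def bucket_id(value, bounds):
    """ID of the first bucket whose inclusive upper bound is >= value.

    Assumes value <= bounds[-1], i.e. the value lies inside the domain.
    """
    return f"Bucket_{bisect_left(bounds, value) + 1}"
```

With a GPA domain of [0.0, 4.0] and M = 4, `equi_width(0.0, 4.0, 4)` gives the boundaries 1.0, 2.0, 3.0 and 4.0, which are the buckets of Example 1. When many values repeat, equi-depth boundaries can coincide, so a real implementation would need a tie-breaking policy.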

Example 1

Partition     ID
[0.0 ~ 1.0]   Bucket_1
[1.1 ~ 2.0]   Bucket_2
[2.1 ~ 3.0]   Bucket_3
[3.1 ~ 4.0]   Bucket_4

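Example 1
User query: SELECT * FROM grades WHERE gpa < 3.0
Query sent to the server (Q_server): SELECT * FROM egrades WHERE gpaID = 'Bucket_1' OR gpaID = 'Bucket_2' OR gpaID = 'Bucket_3'
Here egrades is the encrypted table on the server and gpaID holds each tuple's bucket ID. The returned superset contains 29 tuples, 7 of which are false positives (tuples that do not satisfy gpa < 3.0 and must be discarded by the client after decryption).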
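The client-side steps of Example 1 can be sketched as follows: map the predicate gpa < 3.0 to the IDs of every bucket whose range can hold a qualifying value, build Q_server from those IDs, and, once the returned superset has been decrypted, discard the false positives. The function names are hypothetical, and decryption itself is left out (the values passed to `filter_results` are assumed to be already decrypted).

```python
def translate_less_than(c, buckets):
    """IDs of every bucket that may hold a value < c.

    buckets: list of (low, high, bucket_id) with inclusive bounds.
    """
    return [bid for low, high, bid in buckets if low < c]


def to_server_sql(ids, table="egrades", column="gpaID"):
    """Build the server-side query over bucket IDs (IDs are client-generated)."""
    predicate = " OR ".join(f"{column} = '{bid}'" for bid in ids)
    return f"SELECT * FROM {table} WHERE {predicate}"


def filter_results(decrypted_values, c):
    """Client-side post-filter: keep true matches and count false positives."""
    matches = [v for v in decrypted_values if v < c]
    return matches, len(decrypted_values) - len(matches)
```

With the Example 1 partitions and c = 3.0, `translate_less_than` selects Bucket_1, Bucket_2 and Bucket_3, and `to_server_sql` reproduces Q_server. Bucket_3 is requested even though it contains the value 3.0, which is where the false positives come from.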

Query-Optimal Bucketization (QOB)
General idea: choose the bucket boundaries so that the total bucket cost (the sum of the costs of the individual buckets) is minimized.
Input:
V = {v_1, v_2, v_3, ..., v_n}, where v_1 < v_2 < v_3 < ... < v_n
F = the frequency of each value
M = the number of buckets to fill
Output: a matrix indicating the boundary of each bucket
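Stated as an optimization problem, with BC(...) standing for the bucket cost (the slides do not spell out its formula; it may depend on the frequencies F of the values it covers), QOB picks cut points b_1 < ... < b_{M-1} in the sorted value list so that bucket k holds v_{b_{k-1}+1}, ..., v_{b_k}:

```latex
\min_{0 = b_0 < b_1 < \cdots < b_{M-1} < b_M = n}
\;\sum_{k=1}^{M} \mathrm{BC}\bigl(v_{b_{k-1}+1}, \ldots, v_{b_k}\bigr)
```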

Query-Optimal Bucketization (QOB)
QOB finds the optimal solution by combining the optimal solutions to two smaller sub-problems: one places the leftmost M - 1 buckets over the n - i smallest points, and the other places the rightmost single bucket over the remaining i points. The best split is kept over all possible values of i.
V = {v_1, v_2, ..., v_{n-i} | v_{n-i+1}, ..., v_n}: the first n - i points go to the M - 1 buckets, and the last i points go to the last bucket.
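The decomposition above yields a dynamic program. Writing OPT(j, k) for the lowest cost of covering the j smallest points with k buckets, OPT(j, k) = min over i of [OPT(j - i, k - 1) + BC(v_{j-i+1}, ..., v_j)], and recording the chosen i for every (k, j) gives the boundary matrix. Below is a minimal Python sketch. The slides do not define the bucket cost BC, so it assumes a hypothetical proxy: (number of tuples in the bucket) x (largest value - smallest value), which is zero for a single-value bucket and grows with both the bucket's population and its width. The name `qob` is illustrative.

```python
import math
from itertools import accumulate


def qob(values, freqs, m):
    """Query-optimal bucketization by dynamic programming (sketch).

    values: sorted distinct values v_1 < ... < v_n
    freqs:  frequency of each value
    m:      number of buckets, 1 <= m <= n
    Returns (total_cost, [(low_value, high_value), ...]) left to right.
    """
    n = len(values)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    prefix = [0, *accumulate(freqs)]  # prefix[k] = f_1 + ... + f_k

    def bc(lo, hi):
        # ASSUMED bucket cost for points lo..hi (0-based, inclusive):
        # tuples in the bucket times the value range it spans.
        return (prefix[hi + 1] - prefix[lo]) * (values[hi] - values[lo])

    # best[k][j]: lowest cost of covering the j smallest points with k buckets
    best = [[math.inf] * (n + 1) for _ in range(m + 1)]
    # last[k][j]: size i of the last bucket in that optimum (boundary matrix)
    last = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            # last bucket takes the i largest of the j points; the other
            # k - 1 buckets cover the j - i smallest ones (j - i >= k - 1)
            for i in range(1, j - k + 2):
                cand = best[k - 1][j - i] + bc(j - i, j - 1)
                if cand < best[k][j]:
                    best[k][j], last[k][j] = cand, i
    # walk the boundary matrix back from (m, n) to recover the buckets
    buckets, j = [], n
    for k in range(m, 0, -1):
        i = last[k][j]
        buckets.append((values[j - i], values[j - 1]))
        j -= i
    return best[m][n], buckets[::-1]
```

For example, on values 1, 2, 3, 10, 11, 12 with unit frequencies and M = 2, it separates the two clusters and returns (12, [(1, 3), (10, 12)]). The triple loop costs O(M n^2) time, which is fine for a sketch but matters for large domains.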

Example 2

Partition     ID
[0.7 ~ 1.2]   Bucket_1
[1.5 ~ 2.5]   Bucket_2
[2.8 ~ 3.0]   Bucket_3
[3.5 ~ 4.0]   Bucket_4

Example 2
Q_server: SELECT * FROM egrades WHERE gpaID = 'Bucket_1' OR gpaID = 'Bucket_2' OR gpaID = 'Bucket_3'
This is the same server query as under the general bucketization method of Example 1. In most cases QOB outperforms the conventional bucketization strategy, but not always.

Deviation Bucketization (DB)
DB is built upon QOB and takes the same parameters. It has two levels of buckets:
First level: the same buckets as those produced by QOB.
Second level: a bucketization of deviation values, i.e., the difference between each value and the average of its bucket. Each first-level bucket has at most M second-level buckets.
Hence QOB produces at most M buckets, while DB produces at most M^2 (second-level) buckets.
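Written as formulas, assuming avg() is the frequency-weighted mean of the bucket B (an inference from Example 3, where bucket [2.8 ~ 3.0] has average 2.93 although the unweighted mean of 2.8, 2.9 and 3.0 is 2.9), the deviation of a value v_j in B and the bound on the number of second-level buckets are:

```latex
d_j = v_j - \bar{v}_B,
\qquad
\bar{v}_B = \frac{\sum_{v_k \in B} f_k \, v_k}{\sum_{v_k \in B} f_k},
\qquad
\#\{\text{second-level buckets}\} \le M \cdot M = M^2
```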

Deviation Bucketization (DB)
Run QOB(D, M), where D = (V, F) is the input dataset
Construct the first-level buckets from the boundary matrix
For each first-level bucket:
    Initialize empty datasets v' and f'
    For each v_i in the bucket:
        v' = v' ∪ {v_i - avg()}      (avg() = the average of the bucket)
        f' = f' ∪ {1}
    Create a new dataset d = (v', f')
    Run QOB(d, M)
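A runnable sketch of the DB pseudocode in Python follows. To be self-contained it carries a compact QOB that returns the index lists of its buckets and uses a hypothetical bucket cost (tuples in the bucket x value range), since the slides do not define one. Two further assumptions are flagged in the comments: avg() is taken as the frequency-weighted mean, and every deviation value gets frequency 1, exactly as the pseudocode's f' = f' ∪ {1} states. Function names are illustrative.

```python
import math
from itertools import accumulate


def qob(values, freqs, m):
    """Compact QOB sketch: split sorted values into at most m contiguous
    buckets minimising an ASSUMED cost (tuples in bucket x value range).
    Returns a list of index lists, one per bucket, left to right."""
    n = len(values)
    m = min(m, n)  # a bucket with fewer than m values gets fewer buckets
    pre = [0, *accumulate(freqs)]

    def bc(lo, hi):
        return (pre[hi + 1] - pre[lo]) * (values[hi] - values[lo])

    best = [[math.inf] * (n + 1) for _ in range(m + 1)]
    last = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(1, j - k + 2):
                cand = best[k - 1][j - i] + bc(j - i, j - 1)
                if cand < best[k][j]:
                    best[k][j], last[k][j] = cand, i
    buckets, j = [], n
    for k in range(m, 0, -1):
        i = last[k][j]
        buckets.append(list(range(j - i, j)))
        j -= i
    return buckets[::-1]


def deviation_bucketization(values, freqs, m):
    """Two-level DB sketch following the slide's pseudocode."""
    result = []
    for idx in qob(values, freqs, m):                 # first-level buckets
        vs = [values[t] for t in idx]
        fs = [freqs[t] for t in idx]
        # ASSUMED: avg() is the frequency-weighted mean of the bucket
        avg = sum(v * f for v, f in zip(vs, fs)) / sum(fs)
        devs = [v - avg for v in vs]                  # v' = v' ∪ {v_i - avg()}
        ones = [1] * len(devs)                        # f' = f' ∪ {1}, as on the slide
        # second-level buckets, reported as value ranges (deviation + avg)
        second = [(vs[s[0]], vs[s[-1]]) for s in qob(devs, ones, m)]
        result.append({"range": (vs[0], vs[-1]), "avg": avg, "second_level": second})
    return result
```

On values 1, 2, 4, 10, 11, 13 with unit frequencies and M = 2, the first level splits into [1 ~ 4] and [10 ~ 13], and the second level refines them into [(1, 2), (4, 4)] and [(10, 11), (13, 13)]. Subtracting the bucket average shifts all values of a bucket by the same constant, so the deviations stay sorted and can be fed straight back into QOB.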

Example 3

First level:
Partition     ID           Avg
[0.7 ~ 1.2]   Bucket_1     0.93
[1.5 ~ 2.5]   Bucket_2     1.84
[2.8 ~ 3.0]   Bucket_3     2.93
[3.5 ~ 4.0]   Bucket_4     3.67

Second level (excerpt):
Partition     ID           Avg
…             …            …
[2.8 ~ 2.8]   Bucket_3_1   2.8
[2.9 ~ 2.9]   Bucket_3_2   2.9
[3.0 ~ 3.0]   Bucket_3_3   3.0
…             …            …

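Example 3
Q_server: SELECT * FROM egrades WHERE gpaID = 'Bucket_1' OR gpaID = 'Bucket_2' OR gpaID = 'Bucket_3_1' OR gpaID = 'Bucket_3_2'
Because Bucket_3 straddles the query bound 3.0, the client requests only its second-level buckets Bucket_3_1 and Bucket_3_2, which exclude the value 3.0. In this case no false positives are returned. In general, false positives will still be returned, but their number is greatly reduced.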
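The client-side translation of Example 3 can be sketched as follows: a first-level bucket lying entirely below the bound is requested by its own ID, a bucket that straddles the bound is replaced by those of its second-level buckets that can hold qualifying values, and a bucket entirely above the bound is skipped. The function name and data layout are hypothetical; second-level ranges are kept in value space, as in the Example 3 table, rather than as deviations.

```python
def translate_two_level(c, first_level, second_level):
    """Bucket IDs to request for the range query  value < c  (sketch).

    first_level:  list of (low, high, bucket_id), inclusive value bounds
    second_level: dict bucket_id -> list of (low, high, sub_bucket_id)
    """
    ids = []
    for low, high, bid in first_level:
        if high < c:                  # whole bucket satisfies the predicate
            ids.append(bid)
        elif low < c:                 # bucket straddles c: refine with level 2
            subs = second_level.get(bid)
            if subs is None:          # no second level known: fall back
                ids.append(bid)
            else:
                ids.extend(sid for s_low, _, sid in subs if s_low < c)
        # otherwise every value in the bucket is >= c, so skip it
    return ids
```

With the Example 3 tables and c = 3.0 it returns Bucket_1, Bucket_2, Bucket_3_1 and Bucket_3_2, the IDs in Q_server above.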

Experiments
Two datasets:
Synthetic dataset: 10^5 integers from [0, 999]
Real dataset: 10^3 data points from the Aspect column of the Forest CoverType database in the UCI KDD Archive
Two sets of queries: Q_syn and Q_real

Experiment 1

Experiment 2

Thank You