A Privacy-Preserving Index

A Privacy-Preserving Index for Range Queries
Paper by: Bijit Hore, Sharad Mehrotra, Gene Tsudik
Presented by: Akshay Phadke

What this paper is about
- Database as a Service (DAS)
- Improving the existing bucketization technique
- Identification of privacy measures in DAS
- Development of a novel privacy-preserving re-bucketization technique

DAS and its implications
- Database as a Service: organizations outsource data management to a service provider.
- Privacy is a concern because the data is stored at the service provider.
- One possible solution: Q = Qsec + Qunsec

Previous Solutions
- Bucketization for range queries: the attribute domain is partitioned into a set of buckets, each identified by a tag.
- Deterministic encryption for join queries.
- Drawbacks: privacy is not analyzed in depth, and it remains subjective, with no clear specification.

Before we proceed
- Etuple: the tuple stored in encrypted form.
- Crypto-indices: indices created on sensitive attributes.
- Bucket_id: each bucket (set) created is assigned a unique random tag.
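To make these terms concrete, here is a minimal Python sketch of bucketization-based range queries, assuming an integer attribute domain [0, 100) split into four hypothetical equal-width buckets. The names `BOUNDS`, `TAGS`, `bucket_index` and `run_range_query` are illustrative, and the attribute value itself stands in for the etuple: no real encryption is performed.

```python
import bisect
import secrets

# Hypothetical bucket boundaries over the domain [0, 100):
# bucket i covers BOUNDS[i] <= value < BOUNDS[i + 1].
BOUNDS = [0, 25, 50, 75, 100]

# Each bucket is assigned a unique random tag (its Bucket_id).
TAGS = [secrets.token_hex(8) for _ in range(len(BOUNDS) - 1)]


def bucket_index(value):
    """Return i such that BOUNDS[i] <= value < BOUNDS[i + 1]."""
    return bisect.bisect_right(BOUNDS, value) - 1


def run_range_query(values, lo, hi):
    """Answer 'lo <= x <= hi' (with 0 <= lo <= hi < 100) the DAS way.

    Returns (answer, number_of_false_positives)."""
    # Server side: each row is (etuple, bucket tag). The plain value is
    # only a placeholder for the etuple; a real system stores ciphertext.
    server_rows = [(v, TAGS[bucket_index(v)]) for v in values]

    # Client side: translate the range into the set of overlapping tags.
    wanted = {TAGS[i] for i in range(bucket_index(lo), bucket_index(hi) + 1)}

    # The server returns every row whose tag matches; it never sees lo or hi.
    returned = [etuple for etuple, tag in server_rows if tag in wanted]

    # The client "decrypts" and filters; the extra rows are false positives.
    answer = [v for v in returned if lo <= v <= hi]
    return answer, len(returned) - len(answer)


answer, false_positives = run_range_query([3, 10, 26, 30, 49, 51, 80], 20, 30)
print(answer, false_positives)  # [26, 30] 3
```

The query 20 <= x <= 30 touches buckets [0, 25) and [25, 50), so the server ships back five rows and the client discards three of them.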

Example
Allocating a large number of buckets to crypto-indices increases query precision but reduces privacy. Conversely, a small number of buckets increases privacy but adversely affects performance.
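The trade-off can be seen in a small Python sketch. It assumes equal-width buckets and a workload of one point query per domain value (both assumptions chosen for illustration, not taken from the paper), and it uses the bucket width as a rough stand-in for privacy: the wider the bucket, the less an adversary learns from a tuple's bucket tag. The function name `tradeoff` is hypothetical.

```python
from collections import Counter


def tradeoff(data, lo, hi, num_buckets):
    """Split the domain [lo, hi] into num_buckets equal-width buckets and
    return (total false positives over all point queries, bucket width).

    Assumption for illustration: one point query per domain value, and the
    server answers a point query with the whole bucket containing it."""
    width = (hi - lo + 1) / num_buckets

    def bucket(v):
        return int((v - lo) // width)

    value_freq = Counter(data)                      # f_v
    bucket_freq = Counter(bucket(v) for v in data)  # F_B
    # A point query on v returns F_B tuples, of which f_v are correct.
    tfp = sum(bucket_freq[bucket(v)] - value_freq[v] for v in range(lo, hi + 1))
    return tfp, width


data = [1, 1, 2, 5, 6, 9]
for m in (1, 5, 10):
    print(m, tradeoff(data, 0, 9, m))
# 1 (54, 10.0)
# 5 (6, 2.0)
# 10 (0, 1.0)
```

Going from 1 to 5 to 10 buckets drops the total false positives from 54 to 6 to 0, while the range an adversary can narrow a value down to shrinks from 10 to 2 to 1.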

Uniform Query Distribution
- Total False Positives:
- Average Query Precision:
- Goal: minimize the total number of false positives.
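As an illustrative derivation under a simplifying assumption (one point query per domain value, which may differ from the paper's exact query model), let bucket B cover the values L..H, let f_v be the frequency of value v, and let F_B be the total frequency in B. A point query on v returns all F_B tuples of its bucket, of which f_v are correct. Summing over all M buckets, with N the total number of tuples, every query together retrieves N + TFP tuples, so the ratio-of-sums precision below rises as TFP falls:

```latex
\begin{aligned}
F_B &= \sum_{v=L}^{H} f_v, \\
\mathrm{TFP}(B) &= \sum_{v=L}^{H} \left(F_B - f_v\right) = (H - L + 1)\,F_B - F_B = (H - L)\,F_B, \\
\mathrm{TFP} &= \sum_{i=1}^{M} (H_i - L_i)\,F_{B_i}, \\
\text{Precision} &= \frac{N}{\sum_{i=1}^{M} (H_i - L_i + 1)\,F_{B_i}} = \frac{N}{N + \mathrm{TFP}},
\qquad N = \sum_{i=1}^{M} F_{B_i}.
\end{aligned}
```

Under this assumption each bucket's cost depends only on its width H - L and its total frequency F_B.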

Algorithm Basics
- The number of false positives depends on the width of the bucket (i.e. its minimum and maximum values) and on the sum of the frequencies in it.
- The problem has the optimal substructure property, so it can be solved by splitting it into two smaller subproblems.
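The optimal-substructure idea can be sketched as a dynamic program: the best way to cover the first j distinct values with m buckets is the best way to cover a shorter prefix with m - 1 buckets, plus one final bucket. This Python sketch assumes a bucket cost of (H - L) * F, i.e. width times total frequency (an illustrative point-query cost that may not be the paper's exact objective); `optimal_buckets` is a hypothetical name. It runs in O(M * n^2) time for n distinct values.

```python
def optimal_buckets(values, freqs, M):
    """Split sorted distinct values (with frequencies) into at most M
    contiguous buckets, minimising the sum over buckets of (H - L) * F.

    Returns (total_cost, buckets)."""
    n = len(values)
    prefix = [0]
    for f in freqs:
        prefix.append(prefix[-1] + f)

    def cost(i, j):
        # Bucket holding values[i..j] inclusive: width (H - L) times frequency F.
        return (values[j] - values[i]) * (prefix[j + 1] - prefix[i])

    INF = float("inf")
    # best[m][j]: minimum cost of covering values[0..j-1] with exactly m buckets.
    best = [[INF] * (n + 1) for _ in range(M + 1)]
    cut = [[0] * (n + 1) for _ in range(M + 1)]
    best[0][0] = 0
    for m in range(1, M + 1):
        for j in range(1, n + 1):
            # Optimal substructure: the last bucket is values[i..j-1] and the
            # prefix values[0..i-1] is covered optimally with m - 1 buckets.
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j - 1)
                if c < best[m][j]:
                    best[m][j], cut[m][j] = c, i

    # Using fewer than M buckets is allowed; keep the cheapest bucket count.
    m = min(range(1, M + 1), key=lambda k: best[k][n])
    total, buckets, j = best[m][n], [], n
    while m > 0:
        i = cut[m][j]
        buckets.append(values[i:j])
        j, m = i, m - 1
    return total, buckets[::-1]


print(optimal_buckets([1, 2, 3, 10, 11, 12], [1] * 6, 2))
# (12, [[1, 2, 3], [10, 11, 12]])
```

With two buckets the split falls in the gap between 3 and 10, giving a cost of 2 * 3 + 2 * 3 = 12, whereas a single bucket would cost 11 * 6 = 66.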

Algorithm

Variance, ASEE (average squared error of estimation) and Entropy
Goal: maximize Var(X).
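A small Python sketch of the three measures for a single bucket, given the frequency of each value in it. The ASEE line assumes an adversary who guesses a value by sampling X' independently from the bucket's own distribution, in which case E[(X - X')^2] = Var(X) + Var(X') = 2 * Var(X); this is an illustrative assumption, and `bucket_privacy` is a hypothetical name.

```python
import math


def bucket_privacy(freqs):
    """freqs maps each value in a bucket to its frequency.

    Returns (variance, entropy in bits, ASEE) of the bucket's value distribution."""
    n = sum(freqs.values())
    probs = {v: f / n for v, f in freqs.items() if f > 0}
    mean = sum(v * p for v, p in probs.items())
    variance = sum(p * (v - mean) ** 2 for v, p in probs.items())
    entropy = -sum(p * math.log2(p) for p in probs.values())
    # Assumed adversary: guesses X' drawn independently from the same
    # distribution, so E[(X - X')^2] = 2 * Var(X).
    asee = 2 * variance
    return variance, entropy, asee


print(bucket_privacy({10: 1, 20: 1}))  # (25.0, 1.0, 50.0)
print(bucket_privacy({10: 1, 11: 1}))  # (0.25, 1.0, 0.5)
```

Both buckets have the same entropy, but the first has a hundred times the variance, which is why the two measures can move at very different rates.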

Controlled Diffusion (CDf)
- The QoS requirement is the maximum allowed performance degradation factor, K.
- The CDf algorithm increases the privacy of the buckets.
- Diffusion is carried out in a controlled manner: elements are diffused into composite buckets.
- $d = K \cdot |B_i| / f_{CB}$
- Composite buckets overlap, whereas optimal buckets do not.
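The diffusion step can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: it assumes f_CB is the even share total / num_composite, computes d from the formula d = K * |B_i| / f_CB (rounded down and clamped), and deals each optimal bucket's elements round-robin into d randomly chosen composite buckets. It does not enforce composite-bucket capacities, so the K bound on query cost is not guaranteed by this sketch. `controlled_diffusion` and its parameters are hypothetical names.

```python
import math
import random


def controlled_diffusion(opt_buckets, K, num_composite, seed=0):
    """Spread the elements of each optimal bucket B_i over d composite buckets.

    Simplifications (assumptions, not the paper's exact algorithm):
    - f_CB, the composite-bucket size, is taken as total / num_composite;
    - d = floor(K * |B_i| / f_CB), clamped to [1, min(num_composite, |B_i|)];
    - composite-bucket capacity is NOT enforced.
    Returns (composite_buckets, targets_per_optimal_bucket)."""
    rng = random.Random(seed)
    total = sum(len(b) for b in opt_buckets)
    f_cb = max(1, math.ceil(total / num_composite))
    composite = [[] for _ in range(num_composite)]
    targets_used = []
    for b in opt_buckets:
        d = math.floor(K * len(b) / f_cb)
        d = max(1, min(d, num_composite, len(b)))
        targets = rng.sample(range(num_composite), d)
        for k, value in enumerate(b):
            # Round-robin over the d chosen composite buckets.
            composite[targets[k % d]].append(value)
        targets_used.append(sorted(targets))
    return composite, targets_used


opt = [[1, 2, 3, 4], [10, 11, 12, 13], [20, 21, 22, 23]]
composite, targets = controlled_diffusion(opt, K=2, num_composite=3)
print(targets)
print([(min(c), max(c)) for c in composite if c])
```

With K = 2, each optimal bucket of four elements is spread over d = 2 composite buckets, so a range query that used to fetch one optimal bucket now fetches two composite buckets, whose value ranges can overlap.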

Experiments
Data sets:
- Synthetic data set
- Real data set
- Benchmark query set
Measurements:
- Decrease in precision
- Privacy measure
- Performance-privacy trade-off
- Time taken

Results
- The observed decrease in query precision was less than 3.
- Privacy measure: the standard deviation increases by a large factor, while the entropy grows more slowly.

Critique
- Although it starts promisingly, the paper turns into a mathematics paper and seems to lose focus on its actual intent.
- The examples show only the first step and the final solution, with no intermediate steps.
- The paper does not explain the results.

Thank you