Published by Meredith Curtis. Modified over 9 years ago.
1
Auditing Compliance with a Hippocratic Database
Rakesh Agrawal, Roberto Bayardo, Christos Faloutsos, Jerry Kiernan, Ralf Rantzau, Ramakrishnan Srikant
Intelligent Information Systems Research, IBM Almaden Research Center
2
Outline
– Introduction and motivation
– Problem statement
– Foundations
– System organization and algorithms
– Performance
– Summary
3
Motivation
– Hippocratic databases advocate policy-directed data management for privacy-sensitive data
  – The need is reinforced by legislation and regulations:
    Health Insurance Portability & Accountability Act
    Gramm-Leach-Bliley Act: Consumer Privacy Rule
– Goal
  – Build a system to assist with auditing compliance with the stated policy:
    Event-driven: in response to a privacy complaint
    Periodic: to monitor exposure to privacy violations
4
Audit Scenario
Jane has not been feeling well and decides to consult her doctor. The doctor uncovers that Jane's blood sugar level is high and suspects diabetes. Sometime later, Jane receives promotional literature from a pharmaceutical company, proposing over-the-counter diabetes tests. Jane complains to the Department of Health and Human Services, saying that she had opted out of the doctor sharing her medical information with pharmaceutical companies for marketing purposes. The doctor must now review disclosures of Jane's information in order to understand the circumstances of the disclosure, and take appropriate action.
5
Audit Expression
audit T.disease
from Customer C, Treatment T
where C.cid = T.pcid and C.name = 'Jane'
Who has accessed Jane's disease information?
7
Problem Statement
– Given
  – A log of queries executed over a database
  – An audit expression specifying sensitive data
– Precisely identify
  – Those queries that accessed the data specified by the audit expression
8
“Suspicious” Queries
Customer table
cid | name | address | zip | …
1 | Jane | 1234 … | 95120 | …
… | … | … | … | …
A query Qi has accessed information contained in the Customer table. The audit expression A specifies the data to be audited. If query Qi accesses all the cells specified by the audit expression A for any row, Qi is suspicious.
9
Issues
– Convenient language
  – Audit expression (essentially an SPJ query)
– Fast and precise on audits
– Non-disruptive
  – Minimal performance impact on normal database operation
– Fine-grained
10
Assumptions
– Disclosures stemming from multiple query executions are not considered
– No use of outside knowledge to deduce information without detection
– Queries considered include
  – Joins and aggregation, but not nested subqueries
  – Note that existential subqueries can be converted into joins [SIGMOD92]
12
Informal Definitions
– “Candidate” query
  – A logged query that accesses all columns specified by the audit expression
– “Indispensable” tuple (for a query)
  – A tuple whose omission makes a difference to the result of a query
– “Suspicious” query
  – A candidate query that shares an indispensable tuple with the audit expression
13
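Indispensable Tuple
The SPJ query Q and the audit expression A are of the form:
$Q = \pi_{C_{OQ}}(\sigma_{P_Q}(T \times R)) \qquad A = \pi_{C_{OA}}(\sigma_{P_A}(T \times S))$
where $T$ is the cross product of the tables common to Q and A, $R$ and $S$ are the cross products of the remaining tables of Q and of A, $P_Q$ is the predicate in Q, $C_{OQ}$ is the set of output columns in Q, $C_Q$ is the set of columns appearing anywhere in Q, and $\pi$ is the duplicate-preserving projection operator ($P_A$ and $C_{OA}$ are the corresponding parts of A).
Definition 1: A virtual tuple $v \in T$ is indispensable for an SPJ query Q if the result of Q changes when we delete v:
$\pi_{C_{OQ}}(\sigma_{P_Q}(T \times R)) \neq \pi_{C_{OQ}}(\sigma_{P_Q}((T - \{v\}) \times R))$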
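Definition 1 can be checked by brute force on a tiny instance. The following is a minimal sketch, assuming tables are small in-memory lists of dicts with disjoint column names and predicates are plain Python functions; the helpers `evaluate` and `is_indispensable` and the toy Customer/Treatment rows are hypothetical. T holds the virtual tuples (the cross product of the common tables), R = [{}] stands for "Q has no other tables", and comparing multisets mirrors the duplicate-preserving projection.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]

def evaluate(T: List[Row], R: List[Row], pred: Callable[[Row], bool],
             out_cols: Sequence[str]) -> Counter:
    """pi_{C_OQ}(sigma_{P_Q}(T x R)) with a duplicate-preserving projection,
    returned as a multiset (Counter) of output tuples."""
    result: Counter = Counter()
    for t in T:
        for r in R:
            row = {**t, **r}  # one tuple of the cross product T x R
            if pred(row):
                result[tuple(row[c] for c in out_cols)] += 1
    return result

def is_indispensable(v: Row, T: List[Row], R: List[Row],
                     pred: Callable[[Row], bool], out_cols: Sequence[str]) -> bool:
    """Definition 1: v is indispensable for Q if deleting v from T changes Q's result."""
    without_v = [t for t in T if t is not v]
    return evaluate(T, R, pred, out_cols) != evaluate(without_v, R, pred, out_cols)

# Hypothetical toy instance modeled on the Jane example.
customers = [{"cid": 1, "name": "Jane", "zip": "95120"},
             {"cid": 2, "name": "Alice", "zip": "95112"}]
treatments = [{"pcid": 1, "disease": "diabetes"},
              {"pcid": 2, "disease": "flu"}]
# Virtual tuples: the cross product of the common tables Customer x Treatment.
T = [{**c, **t} for c in customers for t in treatments]
R = [{}]  # no tables outside T; [{}] is the identity for the cross product

# Simplified Query 1: select name, zip ... where disease = 'diabetes' and cid = pcid
def q_pred(row: Row) -> bool:
    return row["disease"] == "diabetes" and row["cid"] == row["pcid"]

q_out = ("name", "zip")

print([is_indispensable(v, T, R, q_pred, q_out) for v in T])
# [True, False, False, False]
```

Only the virtual tuple pairing Jane with her diabetes treatment changes the result when deleted, so it is the only indispensable tuple for this simplified query.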
14
“Candidate” Query
Definition 6: Q is a candidate query with respect to A if $C_Q \supseteq C_{OA}$, i.e., every column audited by A appears somewhere in Q.
Only candidate queries can be suspicious queries.
15
“Suspicious” Query
Definition 5 (maximal virtual tuple, MVT): A tuple v is an MVT for queries Q1 and Q2 if it belongs to the cross product of the common tables in their from clauses.
Definition 7: Q is suspicious with respect to A if they share an indispensable MVT v.
For example, Query Q: addresses of people with diabetes. Audit A: Jane's diagnosis. Jane's tuple is indispensable for both; hence query Q is “suspicious” with respect to A.
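Definitions 5 and 7 can be checked mechanically on a toy instance. This sketch assumes in-memory lists of dicts and Python predicates, and uses a shortcut: under a duplicate-preserving projection, deleting v changes the result exactly when v combines with some tuple of the remaining tables to satisfy the predicate, so the projection itself can be skipped. The function names, the Alice query, and all rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Pred = Callable[[Row], bool]

def indispensable(v: Row, others: List[Row], pred: Pred) -> bool:
    """v contributes at least one row to sigma_pred({v} x others)."""
    return any(pred({**v, **o}) for o in others)

def suspicious(T: List[Row], R: List[Row], p_q: Pred,
               S: List[Row], p_a: Pred) -> bool:
    """Definition 7: some MVT v in T is indispensable for both Q and A."""
    return any(indispensable(v, R, p_q) and indispensable(v, S, p_a) for v in T)

# Hypothetical toy data: T is Customer x Treatment, the tables common to Q and A.
customers = [{"cid": 1, "name": "Jane", "zip": "95120"},
             {"cid": 2, "name": "Alice", "zip": "95112"}]
treatments = [{"pcid": 1, "disease": "diabetes"}, {"pcid": 2, "disease": "flu"}]
T = [{**c, **t} for c in customers for t in treatments]

def q_pred(row: Row) -> bool:        # Q: people with diabetes
    return row["disease"] == "diabetes" and row["cid"] == row["pcid"]

def q_alice_pred(row: Row) -> bool:  # hypothetical query about Alice's record
    return row["cid"] == row["pcid"] and row["name"] == "Alice"

def a_pred(row: Row) -> bool:        # A: Jane's diagnosis
    return row["cid"] == row["pcid"] and row["name"] == "Jane"

print(suspicious(T, [{}], q_pred, [{}], a_pred))        # True
print(suspicious(T, [{}], q_alice_pred, [{}], a_pred))  # False
```

The Alice query ranges over the same tables, but its only indispensable tuple is Alice's, which A does not need; hence it is not suspicious with respect to Jane's audit.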
17
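System Overview
Normal operation (database layer):
– Queries, submitted with their purpose and recipient, run against the data tables and are recorded in the query log
– Updates, inserts, and deletes are applied to the data tables; database triggers track these updates in backlog tables
Auditing (database layer):
– Static analysis of the query log against the audit expression, followed by generation of an audit query
– The audit query returns the IDs of logged queries that accessed the data specified by the audit expression
Example query log:
ID | Timestamp | Query | User | Purpose | Recipient
1 | 2004-02 … | Select … | James | Current | Ours
2 | 2004-02 … | Select … | John | Telemarketing | public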
18
Static Analysis
Input: the query log (ID | Timestamp | Query | User | Purpose | Recipient) and the audit expression. Output: the candidate queries.
– Filters out queries that could not possibly have violated the audit expression
– Accomplished by examining only the queries themselves (i.e., without running the queries)
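Static analysis can be sketched as a pure set test based on Definition 6, assuming each logged query has already been parsed into the set of columns appearing anywhere in it ($C_Q$) and the audit expression into its audited columns ($C_{OA}$). SQL parsing is out of scope here; the `LoggedQuery` type and `candidate_queries` function are hypothetical, and the column sets mirror the two logged queries of the Jane example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class LoggedQuery:
    qid: int
    columns: FrozenSet[str]  # C_Q: columns appearing anywhere in the query

def candidate_queries(log: List[LoggedQuery], audited: FrozenSet[str]) -> List[int]:
    """Definition 6: keep Q only if C_Q contains every audited column C_OA.
    Only the query text (here, its column set) is examined; nothing is executed."""
    return [q.qid for q in log if audited <= q.columns]

log = [
    # 1: select name, address, zip from Customer, Treatment
    #    where disease = 'diabetes' and cid = pcid
    LoggedQuery(1, frozenset({"Customer.name", "Customer.address", "Customer.zip",
                              "Customer.cid", "Treatment.pcid", "Treatment.disease"})),
    # 2: select name, address from Customer where zip = '95112'
    LoggedQuery(2, frozenset({"Customer.name", "Customer.address", "Customer.zip"})),
]
audited = frozenset({"Treatment.disease"})  # C_OA of "audit T.disease ..."

print(candidate_queries(log, audited))  # [1]
```

Query 2 never touches Treatment.disease, so it is discarded without being run; Query 1 uses disease only in its predicate, which still counts as accessing it.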
19
Audit Query Generation
– Goal
  – Build a query which, when run, returns the IDs of suspicious queries with respect to an audit expression A
20
Generating the Audit Query
[Query graph: Candidate Query 1, Candidate Query 2, Audit Expression, Union; tables T1, T2]
– Combine individual candidate queries and the audit expression into a single query graph
– Combine the audit expression with individual candidate queries to identify suspicious queries
– Replace each table with its backlog to restore the version of the table at the time of each query
QGM (Query Graph Model) is a graphical representation of a query:
– Boxes represent operators, such as select
– Lines represent input/output relationships between operators
– Boxes with no inputs are tables
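The combination step can be approximated as SQL text generation. This rough sketch assumes, hypothetically, that a view `<table>_asof_<qid>` exposes each table's backlog as of the execution time of query `qid`, that the audit expression refers to the common tables through the same aliases as the logged query, and that A has no tables outside those of Q. Each candidate contributes one block that returns its ID when its predicate and the audit predicate hold together; the blocks are combined with UNION.

```python
from typing import Dict, List, Tuple

# (query id, {alias: base table}, WHERE predicate of the logged query)
Candidate = Tuple[str, Dict[str, str], str]

def audit_sql(candidates: List[Candidate], audit_pred: str) -> str:
    blocks = []
    for qid, tables, query_pred in candidates:
        # Replace each base table by its (hypothetical) backlog view as of qid's time.
        from_clause = ", ".join(f"{table}_asof_{qid} {alias}"
                                for alias, table in tables.items())
        # Emit qid if the query predicate and the audit predicate hold together.
        blocks.append(f"SELECT '{qid}' FROM {from_clause} "
                      f"WHERE ({query_pred}) AND ({audit_pred})")
    return "\nUNION\n".join(blocks)

jane = "C.cid = T.pcid AND C.name = 'Jane'"
q1 = ("Q1", {"C": "Customer", "T": "Treatment"},
      "T.disease = 'diabetes' AND C.cid = T.pcid")
q3 = ("Q3", {"C": "Customer", "T": "Treatment"},  # hypothetical second candidate
      "C.zip = '95120' AND C.cid = T.pcid")

sql_one = audit_sql([q1], jane)
sql_two = audit_sql([q1, q3], jane)
print(sql_two)
```

UNION also removes duplicate IDs when a suspicious query matches the audit expression through several rows.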
21
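Suspicious SPJ Query
Theorem 2: A candidate SPJ query Q is suspicious with respect to an audit expression A if and only if:
$\sigma_{P_Q \wedge P_A}(T \times R \times S) \neq \emptyset$
where the candidate SPJ query Q and the audit expression A are of the form $Q = \pi_{C_{OQ}}(\sigma_{P_Q}(T \times R))$ and $A = \pi_{C_{OA}}(\sigma_{P_A}(T \times S))$, with $T$ the cross product of the tables common to Q and A.
The QGM rewrites shown on the previous slide transform Q and A into a single audit query that evaluates this condition.
The proof of correctness is based upon Definition 7 (suspicious query) and is given in the paper.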
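Theorem 2 replaces the search for a shared indispensable tuple with a single emptiness test. The sketch below evaluates that test by brute force over in-memory lists, assuming disjoint column names across T, R and S. The extra `Mailing` table in R and all rows are hypothetical, added only to show how tables outside T affect the outcome.

```python
from itertools import product
from typing import Callable, Dict, List

Row = Dict[str, object]
Pred = Callable[[Row], bool]

def suspicious_by_theorem2(T: List[Row], R: List[Row], S: List[Row],
                           p_q: Pred, p_a: Pred) -> bool:
    """Theorem 2: Q is suspicious iff sigma_{P_Q and P_A}(T x R x S) is non-empty."""
    for t, r, s in product(T, R, S):
        row = {**t, **r, **s}  # assumes disjoint column names
        if p_q(row) and p_a(row):
            return True
    return False

customers = [{"cid": 1, "name": "Jane"}, {"cid": 2, "name": "Alice"}]
treatments = [{"pcid": 1, "disease": "diabetes"}, {"pcid": 2, "disease": "diabetes"}]
T = [{**c, **t} for c in customers for t in treatments]  # common tables of Q and A

def q_pred(row: Row) -> bool:  # Q also joins a hypothetical Mailing table (R)
    return (row["disease"] == "diabetes" and row["cid"] == row["pcid"]
            and row["cid"] == row["mcid"])

def a_pred(row: Row) -> bool:  # A: Jane's diagnosis; S is empty
    return row["cid"] == row["pcid"] and row["name"] == "Jane"

mailing = [{"mcid": 1}, {"mcid": 2}]
mailing_alice_only = [{"mcid": 2}]

print(suspicious_by_theorem2(T, mailing, [{}], q_pred, a_pred))             # True
print(suspicious_by_theorem2(T, mailing_alice_only, [{}], q_pred, a_pred))  # False
```

When the mailing list contains only Alice, Jane's tuple no longer contributes a row to Q, so Q is not suspicious with respect to Jane's audit even though it is still a candidate query.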
22
Suspicious Aggregate Query (Including Having)
Solution in the paper
23
Example Jane’s audit
24
Audit Expression
audit T.disease
from Customer C, Treatment T
where C.cid = T.pcid and C.name = 'Jane'
Who has accessed Jane's disease information?
25
Query Log
ID | Query | TS | User | Purpose | Recipient
1 | select name, address, zip from Customer, Treatment where disease = 'diabetes' and cid = pcid | T3 | james | marketing | others
2 | select name, address from Customer where zip = '95112' | T3 | john | contact | others
Query 1 was executed at time T3.
26
Backlog Table (Time Stamp)
Name | Address | … | OPR | TS
Jane | 1234 … | … | I | T2
Jane | 1234 … | … | U | T4
Alice | … | … | I | T1
– Name, Address, …: attributes also in the source table; OPR and TS: attributes only in the backlog table
– OPR: operation on a tuple (Insert, Update, or Delete); TS: time stamp of the operation
– Jane's record was inserted at time T2 and updated at time T4. The backlog table records both versions of her information
C. S. Jensen, L. Mark, and N. Roussopoulos [TKDE 1991]
27
Merge Logged Queries and Audit Expression
Merge the logged queries and the audit expression into a single query graph:
– Audit expression box: T.p = C.c and C.n = 'Jane', output T.s
– Select box (Query 1): T.s = 'diabetes' and T.p = C.c, output C.n, C.a, C.z
– Both boxes read the tables Customer C (c, n, …, t) and Treatment T (p, r, …, t)
28
Transform Query Graph into an Audit Query
– Select box (Query 1): T.s = 'diabetes' and C.c = T.p over Customer C (c, n, …, t) and Treatment T (p, r, …, t), output C.n as X
– Audit expression box: X.n = 'Jane' over the output X of the select box, output 'Q1'
– The views of Customer and Treatment are temporal views as of the time the query was executed
The audit expression now ranges over the logged query. If the logged query is suspicious, the audit query outputs the ID of the logged query.
29
Scenario Outcome
The audit uncovers that Query 1 in the query log accessed Jane's information.
31
Empirical Evaluation: Goals
– Cost of maintaining backlog tables
  – Understand the impact of maintaining backlog tables on ongoing database operations
– Cost of running audits
  – Understand whether audits can run in reasonable time
32
Experimental Setup
– IBM M Pro 6868 Intellistation
  – 800 MHz Pentium III processor
  – 512 MB of memory
  – 16.9 GB disk drive
– Windows 2000 Version 5, SP 4
– DB2 v7 with default settings
– TPC-H database
  – Supplier table: 100,000 tuples
33
System Structures
– Indexing
  – Eager indexing: maintain an index over the backlog table, maintained during ongoing database operations
  – Lazy indexing: no index over the backlog table; create indices at the time of audit
– Choice of index
  – Simple index: primary key of source table
  – Composite index: primary key of source table and time stamp
34
Impact on Ongoing Operations
– Queries
  – Additionally log the query string (already performed in many application environments)
– Updates
  – For each updated tuple, insert a tuple into the backlog table
  – Inserts and deletes are handled similarly
– In the majority of environments, queries are much more frequent than updates
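To illustrate the update path, here is a sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for DB2. Row-level triggers append one backlog row per inserted, updated, or deleted tuple. The schema, the `Clock` table used as a logical timestamp, and the trigger names are assumptions for illustration; the history replays the backlog example from the Jane walkthrough (Alice inserted at T1, Jane inserted at T2 and updated at T4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Customer_backlog (cid INTEGER, name TEXT, address TEXT,
                               opr TEXT, ts INTEGER);
-- Hypothetical logical clock standing in for a real transaction timestamp.
CREATE TABLE Clock (tick INTEGER);
INSERT INTO Clock VALUES (0);

-- One backlog row per inserted, updated, or deleted tuple.
CREATE TRIGGER customer_ins AFTER INSERT ON Customer BEGIN
  INSERT INTO Customer_backlog
    SELECT NEW.cid, NEW.name, NEW.address, 'I', tick FROM Clock;
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON Customer BEGIN
  INSERT INTO Customer_backlog
    SELECT NEW.cid, NEW.name, NEW.address, 'U', tick FROM Clock;
END;
CREATE TRIGGER customer_del AFTER DELETE ON Customer BEGIN
  INSERT INTO Customer_backlog
    SELECT OLD.cid, OLD.name, OLD.address, 'D', tick FROM Clock;
END;
""")

def set_clock(t: int) -> None:
    conn.execute("UPDATE Clock SET tick = ?", (t,))

set_clock(1)
conn.execute("INSERT INTO Customer VALUES (2, 'Alice', 'addr-A')")
set_clock(2)
conn.execute("INSERT INTO Customer VALUES (1, 'Jane', 'addr-J1')")
set_clock(4)
conn.execute("UPDATE Customer SET address = 'addr-J2' WHERE cid = 1")

for row in conn.execute("SELECT name, address, opr, ts FROM Customer_backlog ORDER BY ts"):
    print(row)
```

The base table keeps only Jane's current address, while the backlog keeps both of her versions, which is what later lets an audit reconstruct the data as of each logged query's time stamp.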
35
Update Performance
– 100,000 tuples in the Supplier table
– The update statement updates all tuples
– Each update statement fires triggers that insert an additional 100,000 tuples into the backlog
– Evaluate the impact of multiple versions on performance
36
Overhead on Updates
[Chart: update overhead vs. number of versions of each tuple in the Supplier backlog table]
– Simple wins over composite: 7x if all tuples are updated, 3x if a single tuple is updated
– Eager indexing doesn't add much cost
37
Audit Query Performance
Audit query: select 'Q' from Supplier where skey = k
Experiment: evaluate the impact of the number of versions of tuples in the backlog table on performance
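A sketch of this audit query evaluated against a backlog table, again using SQLite as a stand-in. It assumes a hypothetical `Supplier_backlog(skey, name, opr, ts)` table carrying the composite index (primary key of the source table plus time stamp), and looks up the latest version whose time stamp is at or before the logged query's time stamp. With only a simple index on skey, the engine typically has to examine every version of the key; the composite index lets it seek on (skey, ts).

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier_backlog (skey INTEGER, name TEXT, opr TEXT, ts INTEGER)")
# Composite index: primary key of the source table + time stamp.
conn.execute("CREATE INDEX supplier_backlog_key_ts ON Supplier_backlog (skey, ts)")
# Hypothetical history: supplier 7 inserted at ts=1, updated at ts=3 and ts=5.
conn.executemany("INSERT INTO Supplier_backlog VALUES (?, ?, ?, ?)",
                 [(7, "v1", "I", 1), (7, "v2", "U", 3), (7, "v3", "U", 5)])

def version_at(skey: int, t: int) -> Optional[str]:
    """Latest version of tuple skey with ts <= t; None if absent or deleted."""
    row = conn.execute(
        "SELECT name, opr FROM Supplier_backlog WHERE skey = ? AND ts <= ? "
        "ORDER BY ts DESC LIMIT 1", (skey, t)).fetchone()
    if row is None or row[1] == "D":
        return None
    return row[0]

def audit(qid: str, skey: int, query_ts: int) -> List[str]:
    """select 'Q' from Supplier where skey = k, over Supplier as of query_ts."""
    return [qid] if version_at(skey, query_ts) is not None else []

print(version_at(7, 4))   # v2
print(audit("Q", 7, 0))   # []
```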
38
Audit Query Execution Time
– Composite wins over simple if the initial version is selected
– Simple wins over composite if the current version is selected
39
Takeaways
– The composite index
  – Enhances the performance of audits, but
  – Additionally burdens updates when using eager indexing
– The system supports
  – Efficient auditing
  – Without substantially burdening normal query processing
40
Related Work
– Oracle Privacy Security Auditing
  – Facility for logging queries with timestamp
  – Flashback queries: restore the version of the data at the time of the query
  – No support for automated auditing: the user manually selects queries from the log, runs them, and decides whether each query is suspicious
– G. Miklau, D. Suciu [SIGMOD 2004]
  – Formal analysis of information disclosure in data exchange: is information about a secret query S revealed by views V1, …, Vn?
  – Considers all possible instances of a database schema and assumes tuple independence
  – We're interested in given instances (temporal versions)
  – Nonetheless, it will be interesting to explore the connection between the two works
– Active enforcement of policies by limiting disclosure [VLDB'04]
– Literature on multi-query optimization
41
Summary
– In light of new privacy legislation
  – The problem of auditing usage of information represents an important opportunity for database research
– Formalized the problem through the fundamental concepts of indispensable tuples and suspicious queries
– Achieved our design goals:
42
Design Goals
– Convenient language
– Fast and precise on audits
– Non-disruptive
  – Minimal performance impact on normal database operation
– Fine-grained
43
Backup
44
Multiple Candidate Queries
– One audit expression box per candidate query, each with predicate C.n = 'Jane', outputting 'Q1' and 'Q2' respectively
– The outputs are combined by a Union
45
Aggregate Queries with Having
[Query graph: the logged query as boxes Q_s (select := …, output c_1, …, c_i), Q_g (group := c_1, …, c_i, output c_1, …, c_i, agg_1, …, agg_n) and Q_h; two audit expression boxes (audit expression := …, output c_1, …, c_k); a select box joining their results q1 and q2 (select := q1.c_1 = q2.c_1 and … and q1.c_i = q2.c_i) and outputting 'Q1']
The join on the grouping columns c_1, …, c_i ensures that the group being tracked by the audit has not been eliminated by the having clause.
46
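Dynamic Temporal Views
View of the Customer table at time t (the time stamp of the logged query), built over Customer_backlog (c, n, a, h, z, o, t, ts, op):
– Select: keep backlog versions C3 with C3.ts ≤ t whose operation is not 'delete' and for which not(C5) holds
– Exists (C5): a newer version C4 of the same tuple with C3.ts < C4.ts ≤ t
– Output columns: c, n, a, h, z, o, t
Column legend: c = id, n = name, a = address, h = phone, z = zip, o = contact, t = marketing, ts = ts, op = opr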
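The temporal view can be sketched in SQL with a NOT EXISTS subquery, executed here through Python's `sqlite3` module. The sketch assumes the comparison operators lost from the diagram are C3.ts ≤ t and C3.ts < C4.ts ≤ t, keeps only three of the Customer columns, and uses hypothetical rows: a version is visible at time t if it is not a delete and no newer version of the same tuple exists at or before t.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_backlog (c INTEGER, n TEXT, z TEXT, op TEXT, ts INTEGER)")
# Hypothetical history: Jane inserted at 2 and updated at 4; Alice inserted at 1, deleted at 3.
conn.executemany("INSERT INTO Customer_backlog VALUES (?, ?, ?, ?, ?)", [
    (1, "Jane", "95120", "insert", 2),
    (1, "Jane", "95112", "update", 4),
    (2, "Alice", "95120", "insert", 1),
    (2, "Alice", "95120", "delete", 3),
])

# C3 is a candidate version; C4 would be a newer version of the same tuple
# that is still no later than :t, in which case C3 is no longer current.
CUSTOMER_AT = """
SELECT C3.c, C3.n, C3.z
FROM Customer_backlog C3
WHERE C3.ts <= :t
  AND C3.op <> 'delete'
  AND NOT EXISTS (SELECT 1 FROM Customer_backlog C4
                  WHERE C4.c = C3.c AND C4.ts > C3.ts AND C4.ts <= :t)
ORDER BY C3.c
"""

def customer_at(t: int) -> List[Tuple[int, str, str]]:
    return conn.execute(CUSTOMER_AT, {"t": t}).fetchall()

for t in (2, 3, 4):
    print(t, customer_at(t))
```

At t = 3 Alice's delete is the newest version, so her insert is hidden and the delete row itself is filtered out; at t = 4 only Jane's updated version remains.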
47
Cost of Building Indices over Backlog Tables