Auditing Compliance with a Hippocratic Database Javier Salinas Martín
Outline Introduction System architecture: – Logs – Audits – Audit queries Performance
Introduction Responsibly managing privacy sensitive data is mandatory Approaches: – Physically logging the results of each query – New system to audit whether the database executed a query in the past that accessed private data
System properties Non-disruptive Fast and precise Fine-grained Convenient
System architecture
Logs Query log: timestamp, user ID Temporal extensions: for each table T, a backlog table T b is created – Time stamped – Interval stamped
Time stamped organization A tuple in T b has two additional columns: – TS: time of storage – OP: operation {‘insert’, ‘delete’, ‘update’} Triggers are used to capture updates Recover state of T at time τ: take a snapshot
Interval stamped organization Period of time for wich each tuple was alive: – TS: time of storage – TE: end time Insert trigger adds t to T b, setting TE to null Update trigger searches for tuple b such that b.P=t.P and b.TE=null and sets b.TE to the current time and inserts new tuple t Delete trigger searches for tuple b such that b.P=t.P and b.TE=null and sets b.TE to the current time
Audit expressions Identical to that of a select query No disctinct in the select list “Audit” replaces “Select” U: cross product of all the base tables in the database Cells that satisfy the expression are marked in U
Schema used for examples
Example of audit expression Audit if the disease information of anybody living in the ZIP code was diclosed Cells corresponding to the disease column of those tuples in the Customer x Treatment table that have c.cid=t.pcid and c.zip = are marked
Some definitions Tuple t, Query Q, Audit A Indispensable tuple: omitting t makes a difference on Q Candidate query: Q accesses all columns A specifies in its audit list Suspicious query: Q and A share an indispensable tuple
Example 1 Q is a candidate query with respect to A Q is suspicious with respect to A if there is a customer who lived in the ZIP code and was treated for diabetes
Example 2 Q is not suspicious with respect to A Anyone who looks at the output of the query will not learn that Alice has cancer
System architecture
Audit query generation Full audit expression Two steps: – Static analysis: select candidate queries from the query log – Audit query generation: augment every candidate query with information from the audit expression and combine them into an audit query that unions their output
Static analysis Select candidate queries Four steps: – Check whether Q is a candidate query – Check whether timestamp of Q isout of range – Check whether the purpose-recipient pair of Q matches any of the purpose-recipient specified in the otherthan clause of A – Check for contradictions between predicates Set of candidate queries Q= {Q 1,…,Q n }
Audit Query Generation Augment every Q i with A Result is another query AQ i, defined against the backlog database at time τ i τ i is the timestamp of Q i as recorded in the query log All AQ i are combined into one AQ audit query whose output is the union of the output of the individual AQ i AQ is executed against the backlog database
Audit Query Generation example Example:
Audit Query Generation example
Performance Cost of maintaining backlog tables
Performance Execution time of an audit query
Questions?