DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Piotr (Peter) Mardziel, Stephen Magill, Michael Hicks, and Mudhakar Srivatsa
Your information
Bad + Good
Want: protect personal information; preserve beneficial uses.
Take back control. (Figure: the querier and you.)
Useful and non-revealing: out = 24 ≤ Age ≤ 30 ∧ Female? ∧ Engaged? → true (a real query used by a Facebook advertiser). (Figure: the querier and you.)
out = (age, zip-code, birth-date) → reject. (zip-code: postal code in the USA; age, zip-code, and birth-date can be used to uniquely identify 63% of Americans.)
Benefits: Can provide the exact answer to a query. Need not know ahead of time how your information can be useful. Need not preemptively hide information. Doesn't restrict uses.
The problem: how to decide whether to run a query or not? Q1: out := (age, zip-code, birth-date). Q2: out := age. Q3: out := 24 ≤ age ≤ 30 ∧ female? ∧ engaged?
Setting: past answers, unknown future queries. Cannot view queries in isolation; cannot use various static QIF measures. Rejection: cannot let rejection defeat our protection. (Figure: a timeline of queries Q41, Q42, Q43, … with one rejected.)
Goal: restrict querier knowledge. Define knowledge. Define learning.
Knowledge model: knowledge = a belief (probability distribution) over the secret data. Given a belief and a query, determine the revised belief given the output of the query (Bayesian revision), then evaluate whether the revised belief is acceptable. The process can be repeated on future queries, each time starting with the revised belief from the previous one.
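To make the revision step concrete, here is a minimal enumeration-based sketch in Python. It is not the paper's implementation; the function name revise_belief, the (bday, byear) state encoding, and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction

def revise_belief(belief, query, observed_out):
    """Bayesian revision by enumeration: keep only the secret states whose
    query output matches the observation, then renormalize."""
    consistent = {state: p for state, p in belief.items()
                  if query(state) == observed_out}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observation is impossible under this belief")
    return {state: p / total for state, p in consistent.items()}

# bday-query1 from the slides, written as a Python function over a secret state
def bday_query1(state):
    bday, _byear = state
    today = 260
    return 1 if today <= bday < today + 7 else 0

# uniform prior: bday in 0..364, byear in 1956..1992, each equally likely
prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}

posterior = revise_belief(prior, bday_query1, observed_out=0)
print(max(posterior.values()))   # 1/13246, i.e. 1/(358*37)
```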
Meet Bob (born September 24, 1980). Secret: bday = 267, byear = 1980. Querier's belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each value equally likely. Assumption: this belief is accurate.
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. The belief is revised with the observation out = 0. Problem: is the revised belief acceptable under a policy?
Probabilistic interpretation: we can use general-purpose probabilistic languages (prob-scheme, IBAL, etc.) to model attacker knowledge and how it changes due to query results. Problem: standard enumeration-based probabilistic interpretation and revision is too slow.
Outline of our results. Problem 1: a suitable policy definition. Problem 2: approximate (but sound in terms of policy) probabilistic computation. Implementation. Experimental results (compared to prob-scheme, an enumeration-based probabilistic interpreter).
Problem 1: Policy. Let us consider a policy of the form Pr[my secret] < t. The choice of threshold t might depend on the risks involved in revelation of the secret.
Bob's policy. Bob (born September 24, 1980); secret: bday = 267, byear = 1980. Belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely. Policy: Pr[bday] < 0.2 and Pr[bday, byear] < 0.05. Currently: Pr[bday = 267] = 1/365 and Pr[bday = 267, byear = 1980] = 1/(365*37).
Bob's policies. Policy Pr[bday] < 0.2: bday specifically should never be known with (near-)certainty.
Bob's policies. Policy Pr[bday, byear] < 0.05: the pair (bday, byear) should never be known with (near-)certainty. This doesn't commit to protecting bday or byear alone, and it allows queries that need high precision in one of them, or low precision in both.
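A small sketch of checking Bob's two thresholds against a belief over (bday, byear) pairs. The function name violates_policy and the dictionary representation of beliefs are assumptions made for illustration, not the system's actual interface.

```python
from fractions import Fraction

def violates_policy(belief, t_bday, t_pair):
    """Bob's two policies: the marginal Pr[bday] must stay below t_bday and
    the joint Pr[bday, byear] must stay below t_pair."""
    marginal = {}
    for (bday, _byear), p in belief.items():
        marginal[bday] = marginal.get(bday, Fraction(0)) + p
    return max(marginal.values()) >= t_bday or max(belief.values()) >= t_pair

# the initial uniform belief satisfies both policies:
# Pr[bday] = 1/365 < 0.2 and Pr[bday, byear] = 1/(365*37) < 0.05
prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
print(violates_policy(prior, Fraction(1, 5), Fraction(1, 20)))   # False
```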
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0, revised with out = 0. Potentially: Pr[bday] = 1/358 < 0.2 and Pr[bday, byear] = 1/(358*37) < 0.05.
bday-query2 (the next day): today := 261; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Revised with out = 1: combined with query 1's answer, this leaves only bday = 267, so Pr[bday] = 1. So reject?
Querier's perspective (assume the querier knows the policy): if bday ≠ 267, they will get an answer; if bday = 267, they will get a reject.
Rejection problem. Revising on the reject itself, under the policy Pr[bday = 267 | out = o] < t: rejection, intended to protect the secret, reveals the secret.
Rejection, revised. Old policy: Pr[bday = 267 | out = o] < t. Solution? Decide the policy independently of the secret. Revised policy: for every possible output o and for every possible bday b, Pr[bday = b | out = o] < t, and so for the real bday in particular.
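A sketch of the revised decision procedure: the query is vetted against every possible output and every possible value of the protected part of the secret, so the accept/reject decision never consults the actual secret. The names acceptable and project and the dictionary-based belief are illustrative assumptions.

```python
from fractions import Fraction

def acceptable(belief, query, project, threshold):
    """Accept only if, for every possible output o and every value v of the
    protected part of the secret, Pr[project(secret) = v | out = o] < threshold.
    Because all outputs are checked, rejection itself reveals nothing."""
    outputs = {query(s) for s in belief}
    for o in outputs:
        consistent = {s: p for s, p in belief.items() if query(s) == o}
        mass = sum(consistent.values())
        marginal = {}
        for s, p in consistent.items():
            v = project(s)
            marginal[v] = marginal.get(v, Fraction(0)) + p
        if max(marginal.values()) / mass >= threshold:
            return False
    return True

def bday_query1(state):
    bday, _byear = state
    return 1 if 260 <= bday < 267 else 0

prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
# protect bday (the first component) with threshold 0.2: query 1 is accepted
print(acceptable(prior, bday_query1, lambda s: s[0], Fraction(1, 5)))   # True
```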
bday-query1 (from the initial belief): today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Accept.
bday-query2 (after bday-query1): today := 261; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Reject (regardless of what bday actually is).
bday-query3 (after bday-query1): today := 266; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Accept.
Problem 2: Slowness. We can use general-purpose probabilistic languages (prob-scheme, IBAL, etc.) to model attacker knowledge and how it changes due to query results, but standard enumeration-based probabilistic interpretation and revision is too slow. Let us look at why.
Enumeration. bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. The query maps each input state to an output state; revision is δ | (out = 0) = normalize(δ ∧ (out = 0)).
Enumeration: a lot of states = slow. One can over-estimate probabilities after only partial enumeration: assume the unseen mass is in the worst possible spot.
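One way such a sound over-estimate can be produced is sketched below; this illustrates the idea only and is not necessarily prob-scheme's actual algorithm. Any prior mass that was never examined is assumed to pile onto the single most probable state consistent with the observed output.

```python
from fractions import Fraction
from itertools import islice

def bounded_max_posterior(prior, query, observed_out, budget):
    """Enumerate at most `budget` prior states (prior assumed to sum to 1),
    then return an upper bound on the posterior probability of the most
    probable state given the observed output."""
    seen_consistent = Fraction(0)   # examined mass that matches the output
    best_point = Fraction(0)        # largest single-point mass matching the output
    explored = Fraction(0)          # total mass examined so far
    for state, p in islice(prior.items(), budget):
        explored += p
        if query(state) == observed_out:
            seen_consistent += p
            best_point = max(best_point, p)
    unexplored = 1 - explored
    # worst case: all unexplored mass sits on one consistent point, inflating
    # the numerator; the denominator grows by at most that same amount
    return (best_point + unexplored) / (seen_consistent + unexplored)

def bday_query1(state):
    bday, _byear = state
    return 1 if 260 <= bday < 267 else 0

prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
print(float(bounded_max_posterior(prior, bday_query1, 0, budget=5000)))        # loose but sound
print(float(bounded_max_posterior(prior, bday_query1, 0, budget=len(prior))))  # exact: 1/(358*37)
```

The bound tightens as the budget grows, reaching the exact value once every state has been enumerated.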
Problem 2: Slowness. Let us do away with enumeration.
Representation. Manipulate regions of states rather than individual states. Input: a single region P1: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each point with probability p. Output: the region splits along the query's branches, e.g. P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, out = 0; P2: 260 ≤ bday ≤ 266, 1956 ≤ byear ≤ 1992, out = 1; P3: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, out = 0; each point keeping probability p.
Regions. Enumeration of states → manipulation of regions of states. Problems: too many regions; non-uniform regions.
nasty-query1: … disjuncts … more disjuncts … disjuncts everywhere … Too many regions.
Approximation. Many regions produced by the query (P1: 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992; P2: 0 ≤ bday ≤ 259, 1982 ≤ byear ≤ 1990; …; P10) are abstracted into two regions with size bounds: P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, with s ≤ 8580 (the size of P1); P2: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, with s ≤ 3234 (the size of P2). Here p and s refer only to the possible (non-zero probability) points in a region.
Abstraction imprecision. (Figure: regions P1, P2 before and after abstraction.)
nasty-query2: … disjuncts … more disjuncts … probabilistic choice … Non-uniform regions. (Figure: regions P1, P2.)
Approximation. Many non-uniform regions (P1: 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992; P2: 0 ≤ bday ≤ 259, 1991 ≤ byear ≤ 1991; …; P18) are approximated by two regions with upper bounds on p and s: P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, p ≤ …, s ≤ 9620; P2: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, p ≤ …, s ≤ …. These bounds suffice for the policy check: ∀ o, ∀ b, Pr[bday = b | out = o] < t.
Abstraction: the probabilistic polyhedron. For each Pi, store: the region (a polyhedron); an upper bound on the probability of each possible point; an upper bound on the number of possible points. Also store lower bounds on the above: conditioning uses Pr[A | B] = Pr[A ∧ B] / Pr[B], and bounding this ratio from above needs a lower bound on the denominator. (A sketch in code follows below.)
Abstract conditioning. (Figure: regions P1, P2, approximated.)
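A simplified sketch of the probabilistic-polyhedron idea and of the conditioning bound. Axis-aligned boxes stand in for the general polyhedra (and the PPL/LattE machinery) of the actual implementation, and the names ProbBox and max_conditional_prob are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class ProbBox:
    """One abstract region: a box over (bday, byear) plus interval bounds on
    the per-point probability and on the number of possible points."""
    lo: tuple        # inclusive lower corner, e.g. (0, 1956)
    hi: tuple        # inclusive upper corner, e.g. (259, 1992)
    p_min: Fraction  # lower bound on the probability of each possible point
    p_max: Fraction  # upper bound on the probability of each possible point
    s_min: int       # lower bound on the number of possible points
    s_max: int       # upper bound on the number of possible points

    def mass_bounds(self):
        """Bounds on the total probability mass carried by this region."""
        return self.p_min * self.s_min, self.p_max * self.s_max

def max_conditional_prob(regions):
    """Upper bound on Pr[single point | out = o], where `regions` describes
    the belief already restricted to out = o: Pr[A | B] = Pr[A and B] / Pr[B]
    is bounded by the largest per-point upper bound over the smallest possible
    total mass (this is why the lower bounds are stored)."""
    numerator = max(r.p_max for r in regions)
    denominator = sum(r.mass_bounds()[0] for r in regions)
    return numerator / denominator   # assumes the mass lower bound is non-zero

# bday-query1 with out = 0, from the exact uniform prior: two regions remain
u = Fraction(1, 365 * 37)
left  = ProbBox((0, 1956),   (259, 1992), u, u, 260 * 37, 260 * 37)
right = ProbBox((267, 1956), (364, 1992), u, u,  98 * 37,  98 * 37)
print(max_conditional_prob([left, right]))   # 1/13246 = 1/(358*37)
```

When the interval bounds are tight, as here, the abstraction reproduces the exact 1/(358*37) figure; wider intervals give a sound but looser bound.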
Approximation benefit. Recall Bob (born September 24, 1980): secret bday = 267, byear = 1980; belief 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely; assumption: this is accurate.
Initial distribution (one region P1 over bday, byear): 1/(37*365) - ε ≤ probability ≤ 1/(37*365) + ε and 37*365 ≤ # of points ≤ 37*365. We only need the attacker's actual belief to be represented by this P1, a much easier task than knowing the exact querier belief.
Abstraction summary. An approximate representation of a set of probability distributions; abstract operations for Clarkson's probabilistic semantics; sound in terms of policy, even after belief revision. Restricted arithmetic expressions (linear only). No state enumeration, and a way to restrict the number of regions, making the approach (more) computationally feasible. Implementation: uses the Parma Polyhedra Library (PPL) for polyhedra operations and LattE to determine the size (number of integer points) of a polyhedron.
Results: statespace-size resistance. See the technical report for more queries and for precision/performance tradeoff issues.
Statespace size resistance. Belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely.
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0, run from the belief 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely. Compared: 1. prob-poly-set (our implementation); 2. prob-scheme (sampling/enumeration, which provides a sound estimate after partial enumeration). We measure the running time and the probability bound of the most probable state (the pair bday, byear).
Statespace size resistance. Compare the original belief (0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely) against beliefs whose ranges are scaled up by a factor pp > 1.
Conclusions. Knowledge modeling: probabilistic computation and revision of knowledge due to query outputs. Our results: a suitable policy definition; an approximate (but sound in terms of policy) probabilistic interpretation; experimental results showing statespace-size resistance. Drawbacks: a restricted language (integer linear expressions only); still too slow for practical use. The future: performance/accuracy tradeoffs; simpler domains (octagons) could replace polyhedra (and the LattE counting tool, whose use accounts for over 95% of the running time in most tests we ran); various other technicalities; a less restricted language and/or state space.
Thank you!
Creating dependency. Dependencies between the secrets can be created. strange-query: if bday - 1956 ≥ byear - 5 ∧ bday - 1956 ≤ byear + 5 then out := 1 else out := 0. (Figure: region P1, not to scale and/or shape.)
Differential privacy. Encourages participation in a database by protecting inference about a record; adds noise to the query result; requires a trusted curator. Suitable for aggregate queries where exact results are not required; not suitable for a single user protecting him- or herself. Using differential privacy on a query with 2 output values results in the incorrect query output almost half the time (for reasonable ε).
Composability / collusion. If collusion is suspected among queriers, one can make them share a belief: query outputs are assumed to have been learned by all of them (deterministic queries only). Confusion problem: non-deterministic queries can result in a revised belief that is less sure of the true (or some) secret value than the initial belief. If collusion is suspected but not actually present, this can result in an incorrect belief. Abstract solution: take the abstract union of the revised belief (colluded) and the pre-belief (not colluded), P1 ∪ P2: if δi ∈ γ(Pi) for i = 1, 2 then δ1, δ2 ∈ γ(P1 ∪ P2).
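A sketch of the abstract union used for the "collusion suspected but absent" case, again with boxes standing in for polyhedra; the join widens every bound so that any distribution represented by either argument is represented by the result, matching the soundness property stated above. The names ProbBox and join are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class ProbBox:
    """Region plus interval bounds, as in the earlier sketch."""
    lo: tuple
    hi: tuple
    p_min: Fraction
    p_max: Fraction
    s_min: int
    s_max: int

def join(a: ProbBox, b: ProbBox) -> ProbBox:
    """Abstract union: take the box hull of the regions and widen every
    interval bound, so that any distribution represented by either P1 or P2
    is represented by join(P1, P2)."""
    return ProbBox(
        lo=tuple(map(min, a.lo, b.lo)),   # hull of the two boxes
        hi=tuple(map(max, a.hi, b.hi)),
        p_min=min(a.p_min, b.p_min),      # widen all interval bounds
        p_max=max(a.p_max, b.p_max),
        s_min=min(a.s_min, b.s_min),
        s_max=max(a.s_max, b.s_max),
    )

# mix one region of a colluded (revised) belief with the pre-belief region
u = Fraction(1, 365 * 37)
pre_belief = ProbBox((0, 1956), (364, 1992), u, u, 365 * 37, 365 * 37)
colluded   = ProbBox((0, 1956), (259, 1992), Fraction(1, 358 * 37),
                     Fraction(1, 358 * 37), 260 * 37, 260 * 37)
print(join(pre_belief, colluded))
```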
Setting: known secret. Measures of badness that weight their values by the distribution of the secret (or by the distribution of the query's outputs) can be problematic: an unlikely user could conclude a query is safe while knowing it is not safe for them. Consider conditional min-entropy (see "Information-theoretic Bounds for Differentially Private Mechanisms"): H∞(X | Y) = -log2 Σ_y P_Y(y) · max_x P_{X|Y}(x, y) > -log2 t, where P_Y(y) is the probability of a particular query output. Our policy instead requires -log2 max_y max_x P_{X|Y}(x, y) > -log2 t, so that in particular -log2 max_x P_{X|Y}(x, y_real) > -log2 t.