DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Piotr (Peter) Mardziel, Stephen Magill, Michael Hicks, and Mudhakar Srivatsa
Your information
Bad + Good
Want: protect personal information; preserve beneficial uses.
Take back control. (Figure: the querier and you.)
Useful and non-revealing: out = 24 ≤ Age ≤ 30 ∧ Female? ∧ Engaged? → true (a real query used by a Facebook advertiser). (Figure: the querier and you.)
out = (age, zip-code, birth-date) → reject. (zip-code: postal code in the USA; age, zip-code, and birth-date can be used to uniquely identify 63% of Americans.)
Benefits: Can provide the exact answer to a query. Need not know ahead of time how your information can be useful. Need not preemptively hide information. Doesn't restrict uses.
The problem: how to decide whether to run a query or not? Q1: out := (age, zip-code, birth-date). Q2: out := age. Q3: out := 24 ≤ age ≤ 30 ∧ female? ∧ engaged?
Setting: past answers, unknown future queries. Cannot view queries in isolation; cannot use various static QIF measures. Rejection: cannot let rejection defeat our protection. (Figure: a timeline of queries Q41, Q42, Q43, … with one rejected.)
Goal: restrict querier knowledge. Define knowledge. Define learning.
Knowledge model: knowledge = a belief (probability distribution) over the secret data. Given a belief and a query, determine the revised belief given the output of the query (Bayesian revision), then evaluate whether the revised belief is acceptable. The process can be repeated on future queries, each time starting with the revised belief from the previous one.
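To make the revision step concrete, here is a minimal enumeration-based sketch in Python. It is not the paper's implementation; the function name revise_belief, the (bday, byear) state encoding, and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction

def revise_belief(belief, query, observed_out):
    """Bayesian revision by enumeration: keep only the secret states whose
    query output matches the observation, then renormalize."""
    consistent = {state: p for state, p in belief.items()
                  if query(state) == observed_out}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observation is impossible under this belief")
    return {state: p / total for state, p in consistent.items()}

# bday-query1 from the slides, written as a Python function over a secret state
def bday_query1(state):
    bday, _byear = state
    today = 260
    return 1 if today <= bday < today + 7 else 0

# uniform prior: bday in 0..364, byear in 1956..1992, each equally likely
prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}

posterior = revise_belief(prior, bday_query1, observed_out=0)
print(max(posterior.values()))   # 1/13246, i.e. 1/(358*37)
```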
Meet Bob (born September 24, 1980). Secret: bday = 267, byear = 1980. Querier's belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each value equally likely. Assumption: this belief is accurate.
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. The belief is revised with the observation out = 0. Problem: is the revised belief acceptable under a policy?
Probabilistic interpretation: we can use general-purpose probabilistic languages (prob-scheme, IBAL, etc.) to model attacker knowledge and how it changes due to query results. Problem: standard enumeration-based probabilistic interpretation and revision is too slow.
Outline of our results. Problem 1: a suitable policy definition. Problem 2: approximate (but sound in terms of policy) probabilistic computation. Implementation. Experimental results (compared to prob-scheme, an enumeration-based probabilistic interpreter).
Problem 1: Policy. Let us consider a policy of the form Pr[my secret] < t. The choice of threshold t might depend on the risks involved in revelation of the secret.
Bob's policy. Bob (born September 24, 1980); secret: bday = 267, byear = 1980. Belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely. Policy: Pr[bday] < 0.2 and Pr[bday, byear] < 0.05. Currently: Pr[bday = 267] = 1/365 and Pr[bday = 267, byear = 1980] = 1/(365*37).
Bob's policies. Policy Pr[bday] < 0.2: bday specifically should never be known with (near-)certainty.
Bob's policies. Policy Pr[bday, byear] < 0.05: the pair (bday, byear) should never be known with (near-)certainty. This doesn't commit to protecting bday or byear alone, and it allows queries that need high precision in one of them, or low precision in both.
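A small sketch of checking Bob's two thresholds against a belief over (bday, byear) pairs. The function name violates_policy and the dictionary representation of beliefs are assumptions made for illustration, not the system's actual interface.

```python
from fractions import Fraction

def violates_policy(belief, t_bday, t_pair):
    """Bob's two policies: the marginal Pr[bday] must stay below t_bday and
    the joint Pr[bday, byear] must stay below t_pair."""
    marginal = {}
    for (bday, _byear), p in belief.items():
        marginal[bday] = marginal.get(bday, Fraction(0)) + p
    return max(marginal.values()) >= t_bday or max(belief.values()) >= t_pair

# the initial uniform belief satisfies both policies:
# Pr[bday] = 1/365 < 0.2 and Pr[bday, byear] = 1/(365*37) < 0.05
prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
print(violates_policy(prior, Fraction(1, 5), Fraction(1, 20)))   # False
```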
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0, revised with out = 0. Potentially: Pr[bday] = 1/358 < 0.2 and Pr[bday, byear] = 1/(358*37) < 0.05.
bday-query2 (the next day): today := 261; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Revised with out = 1: combined with query 1's answer, this leaves only bday = 267, so Pr[bday] = 1. So reject?
Querier's perspective (assume the querier knows the policy): if bday ≠ 267, they will get an answer; if bday = 267, they will get a reject.
Rejection problem. Revising on the reject itself, under the policy Pr[bday = 267 | out = o] < t: rejection, intended to protect the secret, reveals the secret.
Rejection, revised. Old policy: Pr[bday = 267 | out = o] < t. Solution? Decide the policy independently of the secret. Revised policy: for every possible output o and for every possible bday b, Pr[bday = b | out = o] < t, and so for the real bday in particular.
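A sketch of the revised decision procedure: the query is vetted against every possible output and every possible value of the protected part of the secret, so the accept/reject decision never consults the actual secret. The names acceptable and project and the dictionary-based belief are illustrative assumptions.

```python
from fractions import Fraction

def acceptable(belief, query, project, threshold):
    """Accept only if, for every possible output o and every value v of the
    protected part of the secret, Pr[project(secret) = v | out = o] < threshold.
    Because all outputs are checked, rejection itself reveals nothing."""
    outputs = {query(s) for s in belief}
    for o in outputs:
        consistent = {s: p for s, p in belief.items() if query(s) == o}
        mass = sum(consistent.values())
        marginal = {}
        for s, p in consistent.items():
            v = project(s)
            marginal[v] = marginal.get(v, Fraction(0)) + p
        if max(marginal.values()) / mass >= threshold:
            return False
    return True

def bday_query1(state):
    bday, _byear = state
    return 1 if 260 <= bday < 267 else 0

prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
# protect bday (the first component) with threshold 0.2: query 1 is accepted
print(acceptable(prior, bday_query1, lambda s: s[0], Fraction(1, 5)))   # True
```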
bday-query1 (from the initial belief): today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Accept.
bday-query2 (after bday-query1): today := 261; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Reject (regardless of what bday actually is).
bday-query3 (after bday-query1): today := 266; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. Accept.
Problem 2: Slowness. We can use general-purpose probabilistic languages (prob-scheme, IBAL, etc.) to model attacker knowledge and how it changes due to query results, but standard enumeration-based probabilistic interpretation and revision is too slow. Let us look at why.
Enumeration. bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0. The query maps each input state to an output state; revision is δ | (out = 0) = normalize(δ ∧ (out = 0)).
Enumeration: a lot of states = slow. One can over-estimate probabilities after only partial enumeration: assume the unseen mass is in the worst possible spot.
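One way such a sound over-estimate can be produced is sketched below; this illustrates the idea only and is not necessarily prob-scheme's actual algorithm. Any prior mass that was never examined is assumed to pile onto the single most probable state consistent with the observed output.

```python
from fractions import Fraction
from itertools import islice

def bounded_max_posterior(prior, query, observed_out, budget):
    """Enumerate at most `budget` prior states (prior assumed to sum to 1),
    then return an upper bound on the posterior probability of the most
    probable state given the observed output."""
    seen_consistent = Fraction(0)   # examined mass that matches the output
    best_point = Fraction(0)        # largest single-point mass matching the output
    explored = Fraction(0)          # total mass examined so far
    for state, p in islice(prior.items(), budget):
        explored += p
        if query(state) == observed_out:
            seen_consistent += p
            best_point = max(best_point, p)
    unexplored = 1 - explored
    # worst case: all unexplored mass sits on one consistent point, inflating
    # the numerator; the denominator grows by at most that same amount
    return (best_point + unexplored) / (seen_consistent + unexplored)

def bday_query1(state):
    bday, _byear = state
    return 1 if 260 <= bday < 267 else 0

prior = {(d, y): Fraction(1, 365 * 37)
         for d in range(365) for y in range(1956, 1993)}
print(float(bounded_max_posterior(prior, bday_query1, 0, budget=5000)))        # loose but sound
print(float(bounded_max_posterior(prior, bday_query1, 0, budget=len(prior))))  # exact: 1/(358*37)
```

The bound tightens as the budget grows, reaching the exact value once every state has been enumerated.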
Problem 2: Slowness. Let us do away with enumeration.
Representation. Manipulate regions of states rather than individual states. Input: a single region P1: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each point with probability p. Output: the region splits along the query's branches, e.g. P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, out = 0; P2: 260 ≤ bday ≤ 266, 1956 ≤ byear ≤ 1992, out = 1; P3: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, out = 0; each point keeping probability p.
Regions. Enumeration of states → manipulation of regions of states. Problems: too many regions; non-uniform regions.
nasty-query1: … disjuncts … more disjuncts … disjuncts everywhere … Too many regions.
Approximation. Many regions produced by the query (P1: 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992; P2: 0 ≤ bday ≤ 259, 1982 ≤ byear ≤ 1990; …; P10) are abstracted into two regions with size bounds: P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, with s ≤ 8580 (the size of P1); P2: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, with s ≤ 3234 (the size of P2). Here p and s refer only to the possible (non-zero probability) points in a region.
Abstraction imprecision. (Figure: regions P1, P2 before and after abstraction.)
nasty-query2: … disjuncts … more disjuncts … probabilistic choice … Non-uniform regions. (Figure: regions P1, P2.)
Approximation. Many non-uniform regions (P1: 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992; P2: 0 ≤ bday ≤ 259, 1991 ≤ byear ≤ 1991; …; P18) are approximated by two regions with upper bounds on p and s: P1: 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, p ≤ …, s ≤ 9620; P2: 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, p ≤ …, s ≤ …. These bounds suffice for the policy check: ∀ o, ∀ b, Pr[bday = b | out = o] < t.
Abstraction: the probabilistic polyhedron. For each Pi, store: the region (a polyhedron); an upper bound on the probability of each possible point; an upper bound on the number of possible points. Also store lower bounds on the above: conditioning uses Pr[A | B] = Pr[A ∧ B] / Pr[B], and bounding this ratio from above needs a lower bound on the denominator. (A sketch in code follows below.)
Abstract conditioning. (Figure: regions P1, P2, approximated.)
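A simplified sketch of the probabilistic-polyhedron idea and of the conditioning bound. Axis-aligned boxes stand in for the general polyhedra (and the PPL/LattE machinery) of the actual implementation, and the names ProbBox and max_conditional_prob are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class ProbBox:
    """One abstract region: a box over (bday, byear) plus interval bounds on
    the per-point probability and on the number of possible points."""
    lo: tuple        # inclusive lower corner, e.g. (0, 1956)
    hi: tuple        # inclusive upper corner, e.g. (259, 1992)
    p_min: Fraction  # lower bound on the probability of each possible point
    p_max: Fraction  # upper bound on the probability of each possible point
    s_min: int       # lower bound on the number of possible points
    s_max: int       # upper bound on the number of possible points

    def mass_bounds(self):
        """Bounds on the total probability mass carried by this region."""
        return self.p_min * self.s_min, self.p_max * self.s_max

def max_conditional_prob(regions):
    """Upper bound on Pr[single point | out = o], where `regions` describes
    the belief already restricted to out = o: Pr[A | B] = Pr[A and B] / Pr[B]
    is bounded by the largest per-point upper bound over the smallest possible
    total mass (this is why the lower bounds are stored)."""
    numerator = max(r.p_max for r in regions)
    denominator = sum(r.mass_bounds()[0] for r in regions)
    return numerator / denominator   # assumes the mass lower bound is non-zero

# bday-query1 with out = 0, from the exact uniform prior: two regions remain
u = Fraction(1, 365 * 37)
left  = ProbBox((0, 1956),   (259, 1992), u, u, 260 * 37, 260 * 37)
right = ProbBox((267, 1956), (364, 1992), u, u,  98 * 37,  98 * 37)
print(max_conditional_prob([left, right]))   # 1/13246 = 1/(358*37)
```

When the interval bounds are tight, as here, the abstraction reproduces the exact 1/(358*37) figure; wider intervals give a sound but looser bound.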
Approximation benefit. Recall Bob (born September 24, 1980): secret bday = 267, byear = 1980; belief 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely; assumption: this is accurate.
Initial distribution (one region P1 over bday, byear): 1/(37*365) - ε ≤ probability ≤ 1/(37*365) + ε and 37*365 ≤ # of points ≤ 37*365. We only need the attacker's actual belief to be represented by this P1, a much easier task than knowing the exact querier belief.
Abstraction summary. An approximate representation of a set of probability distributions; abstract operations for Clarkson's probabilistic semantics; sound in terms of policy, even after belief revision. Restricted arithmetic expressions (linear only). No state enumeration, and a way to restrict the number of regions, making the approach (more) computationally feasible. Implementation: uses the Parma Polyhedra Library (PPL) for polyhedra operations and LattE to determine the size (number of integer points) of a polyhedron.
Results: statespace-size resistance. See the technical report for more queries and for precision/performance tradeoff issues.
Statespace size resistance. Belief: 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely.
bday-query1: today := 260; if bday ≥ today ∧ bday < (today + 7) then out := 1 else out := 0, run from the belief 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely. Compared: 1. prob-poly-set (our implementation); 2. prob-scheme (sampling/enumeration, which provides a sound estimate after partial enumeration). We measure the running time and the probability bound of the most probable state (the pair bday, byear).
Statespace size resistance. Compare the original belief (0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, each equally likely) against beliefs whose ranges are scaled up by a factor pp > 1.
Conclusions. Knowledge modeling: probabilistic computation and revision of knowledge due to query outputs. Our results: a suitable policy definition; an approximate (but sound in terms of policy) probabilistic interpretation; experimental results showing statespace-size resistance. Drawbacks: a restricted language (integer linear expressions only); still too slow for practical use. The future: performance/accuracy tradeoffs; simpler domains (octagons) could replace polyhedra (and the LattE counting tool, whose use accounts for over 95% of the running time in most tests we ran); various other technicalities; a less restricted language and/or state space.
Thank you!
Creating dependency. Dependencies between the secrets can be created. strange-query: if bday - 1956 ≥ byear - 5 ∧ bday - 1956 ≤ byear + 5 then out := 1 else out := 0. (Figure: region P1, not to scale and/or shape.)
Differential privacy. Encourages participation in a database by protecting inference about a record; adds noise to the query result; requires a trusted curator. Suitable for aggregate queries where exact results are not required; not suitable for a single user protecting him- or herself. Using differential privacy on a query with 2 output values results in the incorrect query output almost half the time (for reasonable ε).
Composability / collusion. If collusion is suspected among queriers, one can make them share a belief: query outputs are assumed to have been learned by all of them (deterministic queries only). Confusion problem: non-deterministic queries can result in a revised belief that is less sure of the true (or some) secret value than the initial belief. If collusion is suspected but not actually present, this can result in an incorrect belief. Abstract solution: take the abstract union of the revised belief (colluded) and the pre-belief (not colluded), P1 ∪ P2: if δi ∈ γ(Pi) for i = 1, 2 then δ1, δ2 ∈ γ(P1 ∪ P2).
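A sketch of the abstract union used for the "collusion suspected but absent" case, again with boxes standing in for polyhedra; the join widens every bound so that any distribution represented by either argument is represented by the result, matching the soundness property stated above. The names ProbBox and join are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class ProbBox:
    """Region plus interval bounds, as in the earlier sketch."""
    lo: tuple
    hi: tuple
    p_min: Fraction
    p_max: Fraction
    s_min: int
    s_max: int

def join(a: ProbBox, b: ProbBox) -> ProbBox:
    """Abstract union: take the box hull of the regions and widen every
    interval bound, so that any distribution represented by either P1 or P2
    is represented by join(P1, P2)."""
    return ProbBox(
        lo=tuple(map(min, a.lo, b.lo)),   # hull of the two boxes
        hi=tuple(map(max, a.hi, b.hi)),
        p_min=min(a.p_min, b.p_min),      # widen all interval bounds
        p_max=max(a.p_max, b.p_max),
        s_min=min(a.s_min, b.s_min),
        s_max=max(a.s_max, b.s_max),
    )

# mix one region of a colluded (revised) belief with the pre-belief region
u = Fraction(1, 365 * 37)
pre_belief = ProbBox((0, 1956), (364, 1992), u, u, 365 * 37, 365 * 37)
colluded   = ProbBox((0, 1956), (259, 1992), Fraction(1, 358 * 37),
                     Fraction(1, 358 * 37), 260 * 37, 260 * 37)
print(join(pre_belief, colluded))
```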
Setting: known secret. Measures of badness that weight their values by the distribution of the secret (or by the distribution of the query's outputs) can be problematic: an unlikely user could conclude a query is safe while knowing it is not safe for them. Consider conditional min-entropy (see "Information-theoretic Bounds for Differentially Private Mechanisms"): H∞(X | Y) = -log2 Σ_y P_Y(y) · max_x P_{X|Y}(x, y) > -log2 t, where P_Y(y) is the probability of a particular query output. Our policy instead requires -log2 max_y max_x P_{X|Y}(x, y) > -log2 t, so that in particular -log2 max_x P_{X|Y}(x, y_real) > -log2 t.