DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill,

DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill, and Mudhakar Srivatsa (IBM)

My research Programming languages-based techniques for more reliable, available, and secure software 2

My research Programming languages-based techniques for more reliable, available, and secure software Recent projects Dynamic software updating (for C and Java) Blending PL and Crypto Adapton: Demand-driven Incremental Computation Druby: Static types for dynamic languages 3

Why am I here? 4 http://www.pl- enthusiast.net/2014/09/08/probabilistic- programming/ 21,858 pageviews, as of yesterday 21,858 pageviews, as of yesterday

Today’s talk Using probabilistic programming to enforce knowledge-based security policies 5 1.The setting, the problem, and our solution 2.A technique for probabilistic inference that is sound with respect to security Based on abstract interpretation

“Free” sites Good: Useful service, no direct charge Bad: Free use paid for by use of personal information How to balance the two? 6

Status quo: they have your data 7 your data you query/filter page Photography advertisers + ads

Take back control 8 Photography advertisers query/filter page + query/filter query responses your data you

Simpler model (for exposition) response 9 querier you query Photography

Useful and non-revealing Photography out = 24 ≤ Age ≤ 30 & Female? & Engaged? * true 10 * real query used by a facebook advertiser querier you

Reveals too much! out = (age, zip-code, birth-date) * reject 11 * - age, zip-code, birth-date can be used to uniquely identify 63% of Americans

The problem: how to decide to answer based on the information revealed? out := (age, zip-code, birth-date) out := age out := 24 ≤ age ≤ 30 & female? & engaged? Q1 Q2 Q3 ? 12

Approach Maintain a representation of the querier’s belief about secret’s possible values Each query result revises the belief; reject if actual secret becomes too likely Cannot let rejection defeat our protection. time Q1 Q3 … … Q2 Reject 13 Belief ≜ probability distribution Bayesian reasoning to revise belief OK (answer)

Contributions 1. Knowledge-based security policies Dynamic enforcement (CSF’11, JCS’13) Threshold to decide when an answer reveals too much Query rejection does not reveal information Risk prediction (IEEE S&P’14, FCS’14) Model series of interactions between adaptive adversary and a defender able to change the secret What is risk of certain change strategies? 14 All work collected in Piotr Mardizel’s dissertation

Contributions 2. Implementing checking and belief revision efficiently Treat queries as probabilistic programs, where inputs are distributions that represent beliefs Perform probabilistic inference via abstract interpretation, using a novel probabilistic polyhedra Domain associates polyhedron with overlapping descriptions of probability mass, for improved precision We track upper and lower bounds for sound conditioning and normalization 15 All work collected in Piotr Mardizel’s dissertation

Meet Bob = 0 ≤ bday ≤ 364 1956 ≤ byear ≤ 1992 Bob (born September 24, 1980) bday = 267 byear = 1980 Secret bday byear 1956 1992 0364 16 Assumption: this is accurate each equally likely

bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 | = (out = 0) 1956 1992 0364 259 267 17 1956 1992 0364

bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 | = (out = 0) 1956 1992 0364 259 267 = 18 Problem Policy: Is this acceptable?

Idea: policy as knowledge threshold Answer a query if, for querier’s revised belief, Pr[my secret] < t Call t the knowledge threshold Choice of t depends on the risk of revelation 19

Bob’s policy = 0 ≤ bday ≤ 364 1956 ≤ byear ≤ 1992 Bob (born September 24, 1980) bday = 267 byear = 1980 Secret Policy Pr[bday] < 0.2 Pr[bday,byear] < 0.05 bday byear 1956 1992 0364 Currently Pr[bday] = 1/365 Pr[bday,byear] = 1/(365*37) Pr[bday = 267] … 20 each equally likely P

Bob’s policies Policy Pr[bday] < 0.2 bday specifically should never be certainly known 21

Bob’s policies 22 Policy Pr[bday,byear] < 0.05 the pair of bday,byear should never be certainly known doesn’t commit to protecting bday or byear alone allows queries that need high precision in one, or low precision in both

bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 | = (out = 0) 1956 1992 0364 259 267 23 1956 1992 0364

bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 | = (out = 0) 1956 1992 0364 259 267 Potentially Pr[bday] = 1/358 < 0.2 Pr[bday,byear] = 1/(358*37) < 0.05 = 24

bday-query2 today := 261; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 Next day … 1956 1992 0364 259 267 | = (out = 1) 25 1956 1992 0 267 So reject? Pr[bday] = 1 P

1956 1992 0 267 if bday ≠ 267if bday = 267 1956 1992 0364 259 268 Querier’s perspective will get answer will get reject Assume querier knows policy 26

Rejection problem | =reject Policy: Pr[bday = 267 | out = o] < t Rejection, intended to protect secret, reveals secret! 27

Rejection revised Policy: Pr[bday = 267 | out = o] < t Solution? Decide policy independently of secret Revised procedure for every possible output o, for every possible bday b, Pr[bday = b | out = o] < t So the real bday in particular 28

bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 accept 29 initial belief

bday-query2 today := 261; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 reject (regardless of what bday actually is) 30 (after bday-query1)

bday-query3 today := 266; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 31 (after bday-query1) accept

Implementing belief revision Idea: Use probabilistic programming! Query is a probabilistic program Inputs are distributions representing adversary beliefs Observations are actual/possible outputs Posterior is conditioned on these 32

BUT: Enumeration/sampling-based probabilistic interpretation is problematic In the limit, time taken is proportional to the running time of the query and the size of the state space Approximations used may not be sound (may underestimate risk) Can address the latter problem using abstract interpretation Sound approximation of enumeration-based semantics 33 Implementing belief revision

Three slides on abstract interpretation Given a programming language e ::= x | n | e * e The concrete interpretation is its actual semantics Concrete domain is natural numbers Concrete interpretation Eval(e) is multiplication 1 * (2 * 3) = 6, etc. An abstract interpretation approximates the semantics Abstract domain may be intervals of natural numbers [n 1,n 2 ] Abstract interpretation Aeval(e): [n 1,n 2 ] * [n 3,n 4 ] = [n 1 *n 3, n 2 *n 4 ] [1,4] * ([2,2] * [3,6]) = [1,4] * [6,12] = [6,48] 34

Relating abstract and concrete Concretization function γ relates the abstract domain A to the concrete one C γ : A C γ ([n 1,n 2 ]) = { n | n 1 ≤ n ≤ n 2 } – here, C is the power-set of naturals Abstraction function α relates the concrete domain C to the abstract one A α : C A α(S) = [n 1,n 2 ] where n ∈ S implies n 1 ≤ n and n ≤ n 2 Galois insertion: S ⊆ γ(α(S)) and α(γ(A)) = A 35

Soundness Aeval(e) (where x appears in e) is sound iff Aeval(e) [x=A] = A’ forall n ∈ γ(A), Eval(e) [x=n] = n’ implies n’ ∈ γ(A’) (With obvious generalization to multiple variables) Suppose e is x * 5 Abstract interpretation, assuming x=[1,3] Aeval(e) [x=[1,3]] = [5,15] Concrete interpretation, assuming x=2 ∈ γ([1,3]) Eval(e) [x=2] = 10 Notice that 10 ∈ γ([5,15]) 36

Concrete domain: probability distributions Distributions δ : States Real numbers A state σ is a map from variables to integers Semantics of a statement S under distribution δ is written Sδ. Informally, concrete interpretation is: For each state σ in the support of δ, let p be its probability Execute S in that state, producing σ’ with probability mass p Add to it the probability mass of all other input states that produce σ’ Note that distributions’ probabilities may not sum to 1 Frequency distributions are more efficient, simplify development 37 Back to our setting of probabilistic inference:

1956 1992 0364 bday-query1 today := 260; if bday ≤ today && bday < (today + 7) then out := 1 else out := 0 38 Implementation by enumeration 1956 1992 0364 259 267 | (out = 0) * * δ | (out = 0) = normalize(δ | (out = 0)) input state output state =

Probabilistic Interpretation Semantics (cf. Kozen) skipδ = δ S 1 ; S 2 δ = S 2 (S 1 δ) if B then S 1 else S 2 δ = S 1 (δ | B) + S 2 (δ | ¬B) pif p then S 1 else S 2 δ = S 1 (p*δ) + S 2 ((1-p)*δ) x := Eδ = δ[x E] while B do S = lfp (λF. λδ. F(S(δ | B)) + (δ | ¬B)) Operations p*δ – scale probabilities by p δ | B – remove mass inconsistent with B δ 1 + δ 2 – combine mass from both δ[x E] – transform mass

Abstract domain: probabilistic polyhedra P 1 : 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992, out = 0 p = 0.000074 P 2 : 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992, out = 1 p = 0.000074 … 1956 1992 0364 P1P1 P 1 : 0 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992 p = 0.000074 41 1956 1992 0364 259 267 P1P1 P2P2 P3P3 Key elements of Prob. Poly. P:  Constraints describing polygons C i  Probability p per point within the polygon

42 Performance / precision tradeoff (Sets of) polyhedra are a standard abstract domain, so we can reuse (parts of) existing tools to operate on them directly Can end up with many small regions May be expensive to manipulate Approach Collapse regions: specify the “boundary” of a region and bounds on the number and probability of the points within it Nonuniform regions hurt precision but fewer regions improve performance

nasty-query1 … many disjuncts … age := 2011 – byear; if age = 20 || … || age = 60 then out := true else out := false; 1991 1956 0364 259 267 1961 1971 1981 1992 43 Too many regions Let = belief after bday-query1 | (out = false)

Approximation P 1 : 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992 p = 0.000067 s = 8580 (  true size = 9620) P 2 : 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992 p = 0.000067 s = 3234 (  true size = 3626) 1956 0364 259 267 1961 1971 1981 1992 P1P1 P2P2 P 1 : 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992 p = 0.000067 P 2 : 0 ≤ bday ≤ 259, 1982 ≤ byear ≤ 1990 p = 0.000067 … (ten total) P 10 p,s refer to possible (non-zero probability) points in region 44 1956 0364 259 267 1961 1971 1981 1992 1991 approximate P1P1 P2P2 exact

Abstraction imprecision 1956 0364 259 267 1961 1971 1981 1992 1991 abstract P1P1 P2P2 1956 0364 259 267 1961 1971 1981 1992 45 1956 0364 259 267 1961 1971 1981 1992 1956 0364 259 267 1961 1971 1981 1992 exact

nasty-query-2 … many disjuncts … age := 2011 – byear; if age = 20 || … || age = 60 then out := true else out := false; 46 1956 0364 259 267 1961 1971 1981 1992 Non-uniform regions Takes the true branch one time out of ten Thus, out = true implies age is a decade or the coin flipped in our favor; former more likely | out = true pif 0.1 then out := true … probabilistic choice:

Approximation 1956 0364 259 267 1961 1971 1981 1992 1991 approximate P1P1 P2P2 P 1 : 0 ≤ bday ≤ 259, 1956 ≤ byear ≤ 1992 p ≤ 0.000074 (less-equal) s = 9620 P 2 : 267 ≤ bday ≤ 364, 1956 ≤ byear ≤ 1992 p ≤ 0.000074 s = 3626 1956 0364 259 267 1961 1971 1981 1992 P1P1 P2P2 P 1 : 0 ≤ bday ≤ 259, 1992 ≤ byear ≤ 1992 p = 0.0000074 P 2 : 0 ≤ bday ≤ 259, 1991 ≤ byear ≤ 1991 p = 0.000074 … (18 total) P 18 for policy check Pr[bday = b | out = o] < t 47

Final abstraction 1956 0364 259 267 1961 1971 1981 1992 1991 P1P1 P2P2 For each P i, store region (polyhedron) p: upper bound on probability of each possible point s: upper bound on the number of (possible) points m: upper bound on the total probability mass (useful) Also store lower bounds on the above Pr[A | B] = Pr[A ∩ B] / Pr[B] Probabilistic Polyhedron 48

Relating concrete and abstract CiCi δ ∈ γ(P) iff support(δ) ⊆ γ(C) p min ≤ δ(σ) ≤ p max for every σ ∈ support(δ) s min ≤ |support(δ)| ≤ s max m min ≤ mass(δ) ≤ m max support(δ) = {σ | δ(σ) > 0} p i min, p i max s i min, s i max m i min, m i max ∈ γ( ) … δ1δ1 δ2δ2 δ3δ3 δ ∈ γ({P 1,P 2 }) iff δ = δ 1 + δ 2 with δ 1 ∈ γ(P 1 ) and δ 2 ∈ γ(P 2 ) similarly for more than two

Abstract Semantics Define SP Soundness: if δ ∈ γ(P) then Sδ ∈ γ (SP) Need Abstract operations P 1 + P 2 if δ i ∈ γ(P i ) for i = 1,2 then δ 1 + δ 2 ∈ γ(P 1 + P 2 ) P | B if δ ∈ γ(P) then δ | B ∈ γ(P | B) p*P if δ ∈ γ(P) then p * δ ∈ γ(p*P) … (and likewise for sets of P)

Abstract operation example δ 1 + δ 2 – combine mass from both δ 1 + δ 2 = λσ. δ 1 (σ) + δ 2 (σ) δ1δ1 δ2δ2 δ1+δ2δ1+δ2 s 1 max = 100 s 2 max = 30 P1P1 C1C1 P2P2 C2C2 P 3 = P 1 + P 2 s 3 max ≤ s 1 max + s 2 max = 130 | C 1 ⋂ C 2 | = 20 20 80 | C 1 | = 100 | C 2 | = 40 s 3 max = 120 What is the maximum number of possible points in the sum? determine minimum overlap (20)

Operations P 3 = P 1 + P 2 C 3 – convex hull of C 1, C 2 s 3 max – what is the smallest overlap? s 3 min – what is the largest overlap? p 3 max – is overlap possible? p 3 min – is overlap impossible? m 3 max – simple sum m 1 max + m 2 max m 3 min – simple sum m 1 min + m 2 min Other operations, similar, complicated formulas abound Need to count number of integer points in a convex polyhedra Latte maximize a linear function over integer points in a polyhedron Latte convex hull, intersection, affine transform Parma ±P PPL

Implementation and experiments Interpreter written in Objective Caml for a simple imperative language Calling out to LattE and Parma, as mentioned In addition to full-precision polyhedra, implemented probabilistic versions of two other domains Intervals – constraints have form c 1 ≤ x ≤ c 2 Can implement counting without LattE Octagons – constraints have form ax + by ≤ c with a,b ∈ {-1,0,1} Experiments on several queries Results for a Mac Pro with two 2.26 GHz quad-core Xeon processors using 16 GB of RAM 53

Queries Birthday query 1 Birthday query 1+2 Birthday query 1+2+special Pizza User in particular age range, in college, lives within 2 mile square Photo User in particular age range, female, engaged Travel Lives in particular country, speaks English, over 21, completed high school 54

Scales better than enumeration = 0 ≤ bday ≤ 364 1956 ≤ byear ≤ 1992 = 0 ≤ bday ≤ 364 1910 ≤ byear ≤ 2010 1 pp > 1 pp 55 each equally likely bday1 small bday 1 large

Performance/precision tradeoff 56 Birthday query 1+2+special

Intervals very fast generally QueryIntervalsOctagonsPolyhedra Bday1 (small)0.011.872.81 Bday1+2 (small)0.012.95.25 Bday1+2+spec0.4717.823.0 Bday1 (large)0.012.12.48 Bday1+2 (large)0.023.024.52 Bday1+2+spec0.5833.646.5 Pizza0.2692.7125.5 Photo0.025.477.98 Travel0.48126.9154.5 57 Times in seconds All achieve maximum precision when given unlimited polyhedra

LattE is the bottleneck 58

Limitations: Prototype impoverished No real or rational numbers No probabilistic choice where weight is unknown No widening/narrowing operator for better handling loops Little support for structured data These things should all be possible, starting from standard domains for abstract interpretation 59

Ongoing work, other applications Knowledge-based security policies in Symmetric setting A and B are private inputs to Alice and Bob, and Charlie computes f(A,B) and returns the result to both What can Alice and/or Bob learn about the other’s input? See our paper in PLAS’12 Location privacy Apply knowledge policies to locations as secrets Must consider time-varying nature of current location Ongoing work with Neil Toronto and David Van Horn More … ! 60

Summary of contributions 61 Knowledge-based security policies Implementation of belief tracking using probabilistic abstract interpretation Performance results using intervals are promising

BACKUP 62

Creating dependency Dependencies can be created. 63 1956 1992 0364 P1P1 1956 1992 0364 P1P1 strange-query if bday – 1956 ¸ byear - 5 Æ bday – 1956 · byear + 5 then out := 1 else out := 0 * not to scale and/or shape

Differential privacy Encourage participation in a database by protecting inference of a record. Add noise to query result. Requires trusted curator. Suitable for aggregate queries where exact results are not required. Not suitable to a single user protecting him(her)self. Using differential privacy on a query with 2 output values results in the incorrect query output almost half the time (for reasonable ²) 64

Composability / Collusion If collusion is suspected among queries, one can make them share belief query outputs are assumed to have been learned by all of them assumes deterministic queries only Confusion problem: non-deterministic queries can result in a revised belief which is less sure of the true (or some) secret value than the initial belief If collusion is suspected but not present, this can result in incorrect belief Abstract solution: abstract union of the revised belief (colluded) and the prebelief (not colluded) P 1 [ P 2 if ± i 2 °(P i ) for i = 1,2 then ± 1, ± 2 2 °(P 1 [ P 2 ) 65

Setting, Known Secret Measures of badness that weigh their values by the distribution of the secret (or the distribution of the outputs of a query) can be problematic. An unlikely user could theorize a query is safe, but know it is not safe for them Consider conditional min-entropy (see Information-theoretic Bounds for Differentially Private Mechanisms) H 1 (X | Y) = -log 2  y P Y (y) max x P X|Y (x,y) > - log 2 t probability of certain query output Our policy: -log 2 max y max x P X|Y (x,y) > - log 2 t so that -log 2 max x P X|Y (x,yreal) > - log 2 t 66

Abstract Conditioning 67 1956 0364 259 267 1961 1971 1981 1992 1956 0364 259 267 1961 1971 1981 1992

Abstract Conditioning 1956 0364 259 267 1961 1971 1981 1992 1991 approximate P1P1 P2P2 68 1956 0364 259 267 1961 1971 1981 1992 1956 0364 259 267 1961 1971 1981 1992 1956 0364 259 267 1961 1971 1981 1992

Initial distribution bday byear 1956 1992 0364 69 P1P1 1/(37*365) - ² · probability · 1/(37*365) + ² 37*365 · # of points · 37*365 Need attacker’s actual belief to be represented by this P 1 Much easier task than knowing the exact querier belief.

DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill,

Similar presentations

Presentation on theme: "DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill,

Similar presentations

Presentation on theme: "DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Michael Hicks University of Maryland, College Park Joint work with Piotr Mardziel, Stephen Magill,"— Presentation transcript:

Similar presentations

About project

Feedback