PROBABILISTIC PROGRAMMING FOR SECURITY
Michael Hicks and Piotr (Peter) Mardziel (University of Maryland, College Park), Stephen Magill (Galois)


PROBABILISTIC PROGRAMMING FOR SECURITY
Michael Hicks (UMD), Piotr (Peter) Mardziel (University of Maryland, College Park), Stephen Magill (Galois), Mudhakar Srivatsa (IBM TJ Watson), Jonathan Katz (UMD), Mário Alvim (UFMG), Michael Clarkson (Cornell), Arman Khouzani (Royal Holloway), Carlos Cid (Royal Holloway)

Part 1: Machine learning ≈ Adversary learning
Part 2: Probabilistic Abstract Interpretation
Part 3: ~1 minute summary of our other work

“Machine Learning”
“Forward” model: Today = not-raining → weather → 0.55 : Outlook = sunny, 0.45 : Outlook = overcast

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model)

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model) → Observation: Outlook = sunny → “backward” inference → Posterior: 0.82 : Today = not-raining, 0.18 : Today = raining
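The prior-to-posterior step on this slide is Bayes' rule. A minimal sketch: the prior (0.5/0.5) and P(Outlook = sunny | not-raining) = 0.55 come from the slides; the likelihood P(Outlook = sunny | raining) = 0.12 is a hypothetical value we chose so that the posterior lands near the slide's 0.82.

```python
# Bayes' rule for the rain example.
# Prior and P(sunny | not-raining) are from the slides;
# P(sunny | raining) = 0.12 is an assumed (hypothetical) likelihood.
prior = {"not-raining": 0.5, "raining": 0.5}
likelihood_sunny = {"not-raining": 0.55, "raining": 0.12}  # P(Outlook=sunny | Today)

# "Backward" inference: condition the prior on the observation Outlook = sunny.
unnorm = {t: prior[t] * likelihood_sunny[t] for t in prior}
total = sum(unnorm.values())
posterior = {t: p / total for t, p in unnorm.items()}

print(round(posterior["not-raining"], 2))  # prints 0.82
```

With these numbers, 0.5·0.55 / (0.5·0.55 + 0.5·0.12) ≈ 0.82, matching the slide.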

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model) → Observation: Outlook = sunny → inference* (“backward”) → Posterior samples: Today = not-raining, Today = raining, …

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model) → Observation: Outlook = sunny → inference* (“backward”) → Posterior: 0.82 : Today = not-raining, 0.18 : Today = raining

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model) → Observation: Outlook = sunny → inference* (“backward”) → Posterior: 0.82 : Today = not-raining, 0.18 : Today = raining → Classification: Today = not-raining

“Machine Learning”
Prior: 0.5 : Today = not-raining, 0.5 : Today = raining → weather (“forward” model) → Observation: Outlook = sunny → inference* (“backward”) → Posterior: 0.82 : Today = not-raining, 0.18 : Today = raining → Classification: Today = not-raining → compared against Reality → Accuracy/Error

Adversary learning
Prior: Pass = “password”, Pass = “12345”, Pass = … → Auth(“password”) (“forward” model) → Observation: Login = failed → “backward” inference → Posterior: Pass = “12345” → Exploitation: Pass = “12345” ($$) → compared against Reality → Vulnerability
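The adversary's update is the same Bayesian conditioning, with a deterministic observation function. A sketch under assumed numbers: the prior probabilities below are hypothetical (the slide elides them); `auth` plays the role of the slide's Auth(“password”) forward model.

```python
# Adversary inference sketch. The prior over secrets is hypothetical;
# only the shape of the computation mirrors the slide.
prior = {"password": 0.5, "12345": 0.3, "123456": 0.2}  # assumed prior

def auth(guess, secret):
    """The 'forward' model: what the adversary observes after a login attempt."""
    return "success" if guess == secret else "failed"

# Observation: Auth("password") returned Login = failed.
# Conditioning removes every secret inconsistent with the observation.
unnorm = {s: p for s, p in prior.items() if auth("password", s) == "failed"}
total = sum(unnorm.values())
posterior = {s: p / total for s, p in unnorm.items()}

# Exploitation: guess the most probable remaining secret.
best_guess = max(posterior, key=posterior.get)
# Vulnerability: probability that this single guess is correct.
vulnerability = posterior[best_guess]
```

With the assumed prior, ruling out “password” leaves “12345” as the adversary's best guess, with posterior probability 0.3/0.5 = 0.6 — the vulnerability measure.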

Different but Same

  PPL for machine learning           | PPL for security
  -----------------------------------+---------------------------------------------
  Model/program of prior             | Model/program of prior
  Model/program of observation       | Model/program of observation
  Inference:                         | Inference:
    + can be approximate             |   - cannot be approximate (+ can be sound)
    + can be a sampler               |   - cannot be a sampler
  Classification                     | Exploitation
  Accuracy/Error:                    | Vulnerability measures:
    + compare inference algorithms   |   + compare observation functions
                                     |     (with/without obfuscation, …)
  Deploy classifier                  | Deploy protection mechanism

Inference visualized
Distributions δ : S → [0,1]
(figure: the space of all distributions over S; the prior δ steps to δ', δ'', δ''' under inference; Accuracy)

Inference visualized
Distributions δ : S → [0,1]
(figure: the space of all distributions over S; the prior δ steps to δ', δ'', δ''' under inference; Vulnerability)

Vulnerability scale
(figure: prior δ, inference steps δ', δ'', δ''', plotted against a vulnerability scale)

Information flow
(figure: the change in vulnerability from the prior δ to the posteriors δ', δ'', δ''' under inference is the information “flow”)

Issue: Approximate inference
(figure: from the prior δ, exact inference reaches one posterior, while approximate inference may land elsewhere on the vulnerability scale)

Sound inference
(figure: approximate, but sound, inference over-approximates the result of exact inference on the vulnerability scale)

Issue: Complexity
(figure: prior δ, inference to δ', δ'', δ'''; Vulnerability)

Issue: Prior
(figure: prior δ; Vulnerability)

Worst-case prior
(figure: the actual prior δ flows under inference to δ', giving the information “flow”; the worst-case prior δ_wc flows to δ'_wc, giving the worst-case information “flow”)

Differential Privacy
(figure: prior δ; Vulnerability)

Part 1: Machine learning ≈ Adversary learning
Part 2: Probabilistic Abstract Interpretation
Part 3: ~1 minute summary of our other work

Probabilistic Abstract Interpretation
(figure: in the space of all distributions over S, the concrete prior δ steps to δ', δ'', δ''' under inference, while an abstract prior steps under abstract inference; Vulnerability)

Part 2: Probabilistic Abstract Interpretation
Standard PL lingo:
  Concrete Semantics
  Abstract Semantics
  Concrete Probabilistic Semantics
  Abstract Probabilistic Semantics

Concrete Interpretation
(Program) states σ : Variables → Integers
Concrete semantics: [[ Stmt ]] : States → States

Example over states with variables {x, y}:
  {x ↦ 1, y ↦ 1}  --[[ y := x + y ]]-->  {x ↦ 1, y ↦ 2}  --[[ if y >= 2 then x := x + 1 ]]-->  {x ↦ 2, y ↦ 2}
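The concrete semantics can be sketched as a tiny interpreter: states are dicts from variables to integers, and each statement maps a state to a state. The tuple encoding of statements is our own; only the slide's example program is covered.

```python
# Minimal concrete interpreter: [[ Stmt ]] : States -> States.
# Statements are tuples of our own devising; states are dicts.

def eval_exp(e, sigma):
    """Expressions: a variable name, an int literal, or ('+', e1, e2) etc."""
    if isinstance(e, str):
        return sigma[e]
    if isinstance(e, int):
        return e
    op, e1, e2 = e
    v1, v2 = eval_exp(e1, sigma), eval_exp(e2, sigma)
    return {"+": v1 + v2, "-": v1 - v2, "*": v1 * v2}[op]

def eval_bool(b, sigma):
    op, e1, e2 = b
    v1, v2 = eval_exp(e1, sigma), eval_exp(e2, sigma)
    return {">=": v1 >= v2, "<=": v1 <= v2, ">": v1 > v2}[op]

def run(stmt, sigma):
    """[[ stmt ]] applied to state sigma; states are copied, not mutated."""
    kind = stmt[0]
    if kind == "assign":                 # x := E
        _, x, e = stmt
        return {**sigma, x: eval_exp(e, sigma)}
    if kind == "if":                     # if B then S (one-armed, as on the slide)
        _, b, s = stmt
        return run(s, sigma) if eval_bool(b, sigma) else sigma
    if kind == "seq":                    # S1 ; S2
        _, s1, s2 = stmt
        return run(s2, run(s1, sigma))
    raise ValueError(kind)

# The slide's example:
#   {x:1, y:1} --[[ y := x + y ]]--> {x:1, y:2}
#              --[[ if y >= 2 then x := x + 1 ]]--> {x:2, y:2}
prog = ("seq",
        ("assign", "y", ("+", "x", "y")),
        ("if", (">=", "y", 2), ("assign", "x", ("+", "x", 1))))
final = run(prog, {"x": 1, "y": 1})
```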

Abstract Interpretation
Abstract program states AbsStates
Concretization: γ(P) := { σ s.t. P(σ) }
Abstract semantics: <<Stmt>> : AbsStates → AbsStates

Example: intervals. The predicate P is a closed interval on each variable;
γ(1≤x≤2, 1≤y≤1) = all states that assign x between 1 and 2, and y = 1.

  (1≤x≤2, 1≤y≤1)  --<< y := x + 2*y >>-->  (1≤x≤2, 3≤y≤4)  --<< if y >= 4 then x := x + 1 >>-->  (1≤x≤3, 3≤y≤4)

Abstract Interpretation (soundness)
Same example; in addition, any concrete state σ ∈ γ(1≤x≤2, 1≤y≤1) steps under [[ y := x + 2*y ]] to a state σ' ∈ γ(1≤x≤2, 3≤y≤4) — the abstract step over-approximates every concrete step.

  (1≤x≤2, 1≤y≤1)  --<< y := x + 2*y >>-->  (1≤x≤2, 3≤y≤4)  --<< if y >= 4 then x := x + 1 >>-->  (1≤x≤3, 3≤y≤4)
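The interval transfer functions behind this example can be sketched directly. Representation and function names are our own; the branch transfer below is deliberately coarse (it joins the two branches without refining y by the guard), which is sound and reproduces the slide's result.

```python
# Interval abstract domain sketch: an abstract state maps each variable
# to a closed interval (lo, hi).

def add_iv(a, b):
    return (a[0] + b[0], a[1] + b[1])

def scale_iv(k, a):                     # assumes k > 0
    return (k * a[0], k * a[1])

def join_iv(a, b):                      # least upper bound: the interval hull
    return (min(a[0], b[0]), max(a[1], b[1]))

# << y := x + 2*y >> : y's interval becomes x + 2*y; x is unchanged.
def assign_y_x_plus_2y(P):
    return {**P, "y": add_iv(P["x"], scale_iv(2, P["y"]))}

# << if y >= 4 then x := x + 1 >> : join then-branch and fall-through
# (sound but imprecise: the guard is not used to refine y).
def if_y_ge_4_then_inc_x(P):
    then_P = {**P, "x": add_iv(P["x"], (1, 1))}
    return {v: join_iv(then_P[v], P[v]) for v in P}

P0 = {"x": (1, 2), "y": (1, 1)}
P1 = assign_y_x_plus_2y(P0)     # expect (1<=x<=2, 3<=y<=4)
P2 = if_y_ge_4_then_inc_x(P1)   # expect (1<=x<=3, 3<=y<=4)
```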

Probabilistic Interpretation — plan:
  Concrete semantics
  Abstraction
  Abstract semantics

Concrete Probabilistic Semantics
(Sub)distributions δ : States → [0,1]

Semantics:
  [[ skip ]]δ = δ
  [[ S1 ; S2 ]]δ = [[ S2 ]]([[ S1 ]]δ)
  [[ if B then S1 else S2 ]]δ = [[ S1 ]](δ ∧ B) + [[ S2 ]](δ ∧ ¬B)
  [[ pif p then S1 else S2 ]]δ = [[ S1 ]](p*δ) + [[ S2 ]]((1-p)*δ)
  [[ x := E ]]δ = δ[x → E]
  [[ while B do S ]] = lfp (λF. λδ. F([[ S ]](δ ∧ B)) + (δ ∧ ¬B))

Subdistribution operations:
  p*δ — scale probabilities by p:           p*δ := λσ. p*δ(σ)
  δ ∧ B — remove mass inconsistent with B:  δ ∧ B := λσ. if [[ B ]]σ = true then δ(σ) else 0
  δ1 + δ2 — combine mass from both:         δ1 + δ2 := λσ. δ1(σ) + δ2(σ)
  δ[x → E] — transform mass (push each state's mass onto the state where x is updated to E)
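These equations run almost verbatim over finite state spaces. A sketch: subdistributions are dicts from (frozen) states to mass, and each clause below is a direct transcription of the semantics above; the tuple statement encoding is our own, and `while`/lfp is omitted for brevity. Note that `restrict` (δ ∧ B) simply drops mass — no normalization happens.

```python
# Executable sketch of the concrete probabilistic semantics.
# A subdistribution maps frozenset-of-(var, value) states to mass.

def scale(p, delta):                        # p*delta
    return {s: p * m for s, m in delta.items()}

def restrict(delta, b):                     # delta ∧ B (b: dict-state -> bool)
    return {s: m for s, m in delta.items() if b(dict(s))}

def combine(d1, d2):                        # d1 + d2
    out = dict(d1)
    for s, m in d2.items():
        out[s] = out.get(s, 0.0) + m
    return out

def assign(delta, x, e):                    # delta[x -> E]: transform mass
    out = {}
    for s, m in delta.items():
        sigma = dict(s)
        sigma[x] = e(sigma)
        key = frozenset(sigma.items())
        out[key] = out.get(key, 0.0) + m    # masses merge if states collide
    return out

def run(stmt, delta):
    kind = stmt[0]
    if kind == "skip":
        return delta
    if kind == "seq":
        return run(stmt[2], run(stmt[1], delta))
    if kind == "if":                        # [[S1]](δ∧B) + [[S2]](δ∧¬B)
        _, b, s1, s2 = stmt
        return combine(run(s1, restrict(delta, b)),
                       run(s2, restrict(delta, lambda s: not b(s))))
    if kind == "pif":                       # [[S1]](p*δ) + [[S2]]((1-p)*δ)
        _, p, s1, s2 = stmt
        return combine(run(s1, scale(p, delta)),
                       run(s2, scale(1 - p, delta)))
    if kind == "assign":
        _, x, e = stmt
        return assign(delta, x, e)
    raise ValueError(kind)

# pif 0.5 then x := 1 else x := 2, starting from the point mass on {x: 0}:
delta0 = {frozenset([("x", 0)]): 1.0}
prog = ("pif", 0.5, ("assign", "x", lambda s: 1), ("assign", "x", lambda s: 2))
delta1 = run(prog, delta0)                  # 0.5 on {x:1}, 0.5 on {x:2}
```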

Subdistribution operations, pictured

  δ ∧ B — remove mass inconsistent with B:  δ ∧ B = λσ. if [[ B ]]σ = true then δ(σ) else 0   (figure: δ restricted by B = x ≥ y)
  δ1 + δ2 — combine mass from both:         δ1 + δ2 = λσ. δ1(σ) + δ2(σ)                       (figure: δ1, δ2, and δ1 + δ2)

Example:
  [[ if x ≤ 5 then y := y + 3 else y := y - 3 ]]δ
    = [[ y := y + 3 ]](δ ∧ x ≤ 5) + [[ y := y - 3 ]](δ ∧ x > 5)

Subdistribution Abstraction

Subdistribution Abstraction: Probabilistic Polyhedra

A probabilistic polyhedron P is a region of program states (a polyhedron), plus:
  + an upper bound on the probability of each possible state in the region
  + an upper bound on the number of (possible) states
  + an upper bound on the total probability mass (useful for conditioning: Pr[A | B] = Pr[A ∩ B] / Pr[B])
  + lower bounds on all of the above

Vulnerability: V(δ) = max_σ δ(σ)
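A minimal sketch of this data structure, using a 1-D integer interval as the "region" (real probabilistic polyhedra need a polyhedra library); the field names are our own, but the bounds are exactly the ones listed above. The per-state probability upper bound immediately gives a sound bound on the vulnerability V(δ).

```python
# Probabilistic-polyhedron sketch with a 1-D interval region.
from dataclasses import dataclass

@dataclass
class ProbInterval:
    lo: int                 # region of program states: lo <= x <= hi
    hi: int
    p_min: float            # lower/upper bound on each state's probability
    p_max: float
    s_min: int              # lower/upper bound on the number of possible states
    s_max: int
    m_min: float            # lower/upper bound on the total probability mass
    m_max: float

    def size(self):
        """Number of integer points in the region."""
        return self.hi - self.lo + 1

    def vulnerability_bound(self):
        """Sound upper bound on V(delta) = max_sigma delta(sigma)
        for every concrete delta this abstraction represents."""
        return self.p_max

# A uniform prior on x in [0, 9]: each state carries exactly mass 0.1.
P = ProbInterval(lo=0, hi=9, p_min=0.1, p_max=0.1,
                 s_min=10, s_max=10, m_min=1.0, m_max=1.0)
```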

Abstraction imprecision
(figure: the exact distribution vs. its abstraction by probabilistic polyhedra P1, P2)

Probabilistic Abstract Interpretation
(figure: in the space of all distributions over S, the prior δ steps to δ', δ'', δ''' under inference, while the abstract prior P steps under abstract inference)

Define <<S>>P, with soundness: if δ ∈ γ(P) then [[ S ]]δ ∈ γ(<<S>>P)
Built from abstract versions of the subdistribution operations: P1 + P2, P ∧ B, p*P

Example abstract operation: addition
(figure: δ1, with per-state mass between p1^min and p1^max, plus δ2, with mass between p2^min and p2^max, gives δ3 := δ1 + δ2)

Abstractly: {P3, P4, P5} = {P1} + {P2} — the sum of two probabilistic polyhedra can require more than one polyhedron to represent.

Conditioning
(figures for the concrete and the abstract case)

The abstract version requires a lower bound on the total probability mass, to soundly bound the division in Pr[A | B] = Pr[A ∩ B] / Pr[B].

Simplify the representation
Limit the number of probabilistic polyhedra.
P1 ± P2 — merge two probabilistic polyhedra into one: take the convex hull of the regions, and combine the bounds via various counting arguments.

Add and simplify
(figure: as before, δ3 := δ1 + δ2, with each δi's per-state mass bounded by pi^min and pi^max)

Abstractly, with merging: {P3} = {P1} ± {P2} — a single resulting polyhedron.

Primitives for the abstract operations
Needed:
  Linear model counting — count the number of integer points in a convex polyhedron
  Integer linear programming — maximize a linear function over the integer points of a polyhedron
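Both primitives can be sketched by brute-force enumeration over a small bounding box (a real implementation would use a LattE-style model counter and an ILP solver). The encoding of a polyhedron as a list of constraints a1·x + a2·y ≤ b is our own.

```python
# Toy versions of the two primitives, by enumeration over a bounding box.

def integer_points(constraints, box):
    """Integer points of { (x, y) : a1*x + a2*y <= b for all constraints }
    within box = (lo, hi) on each coordinate."""
    lo, hi = box
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            if all(a1 * x + a2 * y <= b for (a1, a2, b) in constraints):
                yield (x, y)

def count(constraints, box):                # linear model counting
    return sum(1 for _ in integer_points(constraints, box))

def maximize(c, constraints, box):          # toy integer linear programming
    c1, c2 = c
    return max(c1 * x + c2 * y for (x, y) in integer_points(constraints, box))

# Triangle x >= 0, y >= 0, x + y <= 3, encoded as a1*x + a2*y <= b:
tri = [(-1, 0, 0), (0, -1, 0), (1, 1, 3)]
n = count(tri, (-1, 5))                # 10 integer points in the triangle
best = maximize((1, 2), tri, (-1, 5))  # max of x + 2y over those points: 6
```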

Probabilistic Abstract Interpretation, end to end
(figure: in the space of all distributions over S, the concrete prior δ steps to δ', δ'', δ''' under inference, while the abstract prior P steps to P', P'', P''' under abstract inference)

Result: conservative (sound) vulnerability bounds.

Part 3: ~1 minute summary of our other work
  [CSF11, JCS13] — Limit vulnerability; computational aspects of probabilistic semantics
  [PLAS12] — Limit vulnerability for symmetric cases
  [S&P14, FCS14] — Measure vulnerability when secrets change over time
  [CSF15] onwards — Active defense → game theory
See

Abstract Conditioning

Abstract Conditioning
(figure: the approximate result, represented by regions P1 and P2)