Aspects of Bayesian Inference and Statistical Disclosure Control in Python Duncan Smith Confidentiality and Privacy Group CCSR University of Manchester

Introduction
Bayesian Belief Networks (BBNs): probabilistic inference
Statistical Disclosure Control (SDC): deterministic inference (attribution)

Bayesian Belief Networks
Decision-making in complex domains
Hard and soft evidence
Correlated variables
Many variables

Bayes’ Rule
A prior belief and evidence are combined to give a posterior belief: P(A|B) = P(B|A) P(A) / P(B).

Venn Diagram
Event A and Event B, with regions for ‘A only’, ‘B only’, ‘both A and B’ and ‘neither A nor B’.

Inference
1. Prior probability table P(A):

        a       ¬a
        0.7     0.3

2. Conditional probability table P(B|A):

                a       ¬a
        b       3/7     2/3
        ¬b      4/7     1/3
                1       1

3. Produce the joint probability table by multiplication:

                a       ¬a
        b       0.3     0.2
        ¬b      0.4     0.1

4. Condition on evidence (here B is observed to be b):

                a       ¬a
        b       0.3     0.2

5. Normalise table probabilities to sum to 1:

                a       ¬a
        b       0.6     0.4

def Bayes(prior, conditional, obs_level):
    """Simple Bayes for two categorical variables.

    'prior' is a Python list.
    'conditional' is a list of lists ('column' variable conditional
    on 'row' variable).
    'obs_level' is the index of the observed level of the row variable."""
    levels = len(prior)
    # condition on observed level
    result = conditional[obs_level]
    # multiply values by prior probabilities
    result = [result[i] * prior[i] for i in range(levels)]
    # get marginal probability of observed level
    marg_prob = sum(result)
    # normalise the current values to sum to 1
    posterior = [value / marg_prob for value in result]
    return posterior

Note: conditioning can be carried out before calculating the joint probabilities, reducing the cost of inference.

>>> A = [0.7, 0.3]
>>> B_given_A = [[3.0/7, 2.0/3], [4.0/7, 1.0/3]]
>>> Bayes(A, B_given_A, 0)
[0.6, 0.4]

The posterior distribution can be used as a new prior and combined with evidence from further observed variables.
Although computationally efficient, this ‘naïve’ approach implies assumptions that can lead to problems.
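As a usage sketch of that sequential updating (C_given_A below is a hypothetical second conditional table, not one from the talk; outputs here and above are rounded), the posterior from the first observation is simply passed back in as the prior for the next call:

>>> C_given_A = [[0.5, 0.25], [0.5, 0.75]]   # hypothetical P(C|A), same layout as B_given_A
>>> posterior_A = Bayes(A, B_given_A, 0)     # observe B at level 0
>>> Bayes(posterior_A, C_given_A, 1)         # then observe C at level 1
[0.5, 0.5]

This chaining is only valid if B and C are conditionally independent given A, which is the ‘naïve’ assumption examined on the next slides.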

Naive Bayes

A ‘correct’ factorisation

Conditional independence
The Naive Bayes example assumes that the observed variables are conditionally independent given A, i.e. P(B, C | A) = P(B | A) P(C | A).
If the assumption is valid, the calculation is easier and fewer probabilities need to be specified.

The conditional independence implies that if A is observed, then evidence on B is irrelevant in calculating the posterior of C

A Bayesian Belief Network
R and S are independent until H is observed.

A Markov Graph
The conditional independence structure is found by marrying parents with common children (moralisation).

Factoring
The following factorisation is implied.
So P(S) can be calculated as follows (although there is little point, yet).

If H and W are observed to be in states h and w, then the posterior of S can be expressed as follows (where epsilon denotes ‘the evidence’).
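A hedged reconstruction of the calculation: assuming the structure suggested by the surrounding slides (H a common child of R and S, and W a child of R), the implied factorisation would be P(R, S, H, W) = P(R) P(S) P(H | R, S) P(W | R), and the posterior of S given the evidence can be sketched as below (the conditional probability table arguments are hypothetical inputs, not values from the talk):

import numpy as np

def posterior_S(p_r, p_s, p_h_given_rs, p_w_given_r, h, w):
    """P(S | H=h, W=w), assuming P(R,S,H,W) = P(R) P(S) P(H|R,S) P(W|R).

    p_r          : shape (nR,)        prior on R
    p_s          : shape (nS,)        prior on S
    p_h_given_rs : shape (nR, nS, nH) conditional table for H
    p_w_given_r  : shape (nR, nW)     conditional table for W
    h, w         : observed state indices of H and W
    """
    # instantiate the evidence, multiply the tables and sum R out
    unnorm = np.einsum('r,s,rs,r->s', p_r, p_s,
                       p_h_given_rs[:, :, h], p_w_given_r[:, w])
    # normalise over the states of S
    return unnorm / unnorm.sum()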

Graph Triangulation

Belief Propagation
Message passing in a Clique Tree

Message passing in a Directed Junction Tree

A Typical BBN

Belief Network Summary
Inference requires a decomposable graph.
Efficient inference requires a good decomposition.
Inference involves evidence instantiation, table combination and variable marginalisation.

Statistical Disclosure Control
Releases of small area population (census) data.
Attribution occurs when a data intruder can make inferences (with probability 1) about a member of the population.

Negative attribution: an individual who is an accountant does not work for Department C.
Positive attribution: an individual who works in Department C is a lawyer.

Release of the full table is not safe from an attribute disclosure perspective (it contains a zero).
Each of the two marginal tables is safe (neither contains a zero).
Is the release of the two marginal tables ‘jointly’ safe?

The Bounds Problem
Given a set of released tables (relating to the same population), what inferences about the counts in the ‘full’ table can be made?
Can a data intruder derive an upper bound of zero for any cell count?

A non-graphical case
All 2×2 marginals of a 2×2×2 table.
The margins induce a maximal complete subgraph (clique) without an individual corresponding table.

                Var1
Var2            A       B
C               3       9
D               2       2

                Var1
Var3            A       B
E               1       10
F               4       1

                Var2
Var3            C       D
E               8       3
F               4       1

                Var1 and Var2
Var3            A, C    A, D    B, C    B, D
E               0       1       8       2
F               3       1       1       0

Original cell counts can be recovered from the marginal tables.

Each cell’s upper bound is the minimum of its relevant margins (Dobra and Fienberg).
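A minimal NumPy sketch of that bound, applied to the three margins above (the axis ordering is assumed for the illustration, and modern NumPy stands in for the Numeric module used in the talk):

import numpy as np

# the three released 2-way margins, as reconstructed above
v12 = np.array([[3, 2],      # Var1 = A : counts for Var2 = C, D
                [9, 2]])     # Var1 = B
v13 = np.array([[1, 4],      # Var1 = A : counts for Var3 = E, F
                [10, 1]])    # Var1 = B
v23 = np.array([[8, 4],      # Var2 = C : counts for Var3 = E, F
                [3, 1]])     # Var2 = D

# broadcast each margin over the axes of the full (Var1, Var2, Var3) table
# and take the pointwise minimum of the margins relevant to each cell
upper = np.minimum(np.minimum(v12[:, :, None], v13[:, None, :]),
                   v23[None, :, :])
print(upper[0, 0, 0])   # bound for the (A, C, E) cell: min(3, 1, 8) = 1

Here the bound for the (A, C, E) cell is 1 even though the true count is 0, which is why the formula gives sharp bounds only when the released margins are decomposable.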

SDC Summary
A set of released tables relating to a given population.
If the released tables correspond to a graphical and decomposable model, then the upper bounds can be derived efficiently.

Common aspects
Graphical representations: graphs / cliques / nodes / trees
Combination of tables: pointwise operations

BBNs: pointwise multiplication
SDC: pointwise minimum, plus pointwise addition and pointwise subtraction (for calculating exact lower bounds)

Coercing Numeric built-ins
A table is a numeric array with an associated list of variables.
Marginalisation is trivial, using the built-in Numeric.add.reduce() function and removing the relevant variable from the list.
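A minimal sketch of that design, written against modern NumPy (numpy.add.reduce is the direct descendant of the Numeric call named above); the Table class here is a simplified stand-in, not the author's actual class:

import numpy as np

class Table:
    """A numeric array together with an ordered list of variable names."""

    def __init__(self, values, variables):
        self.values = np.asarray(values)
        self.variables = list(variables)

    def marginalise(self, variable):
        # sum over the variable's axis, then drop it from the variable list
        axis = self.variables.index(variable)
        values = np.add.reduce(self.values, axis=axis)
        variables = [v for v in self.variables if v != variable]
        return Table(values, variables)

For example:

>>> t = Table([[3, 9], [2, 2]], ['Var2', 'Var1'])
>>> t.marginalise('Var2').values
array([ 5, 11])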

Conditioning is easily achieved using a Numeric.take() slice, appropriately reshaping the array with Numeric.reshape() and removing the variable from the list.
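Continuing the hypothetical Table sketch, the conditioning step might look like this (np.take and np.reshape mirror the Numeric calls named above):

    # (method of the hypothetical Table class sketched earlier)
    def condition(self, variable, level):
        # take the slice for the observed level of the variable
        axis = self.variables.index(variable)
        values = np.take(self.values, [level], axis=axis)
        # reshape to drop the now length-1 axis and remove the variable
        new_shape = [n for i, n in enumerate(self.values.shape) if i != axis]
        values = np.reshape(values, new_shape)
        variables = [v for v in self.variables if v != variable]
        return Table(values, variables)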

Pointwise multiplication
Numeric.multiply() generates the appropriate table IF the two tables have identical ranks and variable lists.
This is ensured by adding new axes (Numeric.NewAxis) for the ‘missing’ axes and transposing one of the tables (Numeric.transpose()) so that the variable lists match.
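On the same hypothetical Table sketch, this alignment can be reproduced by padding each table with length-1 axes for its ‘missing’ variables and transposing so that the variable lists agree; NumPy broadcasting then performs the pointwise multiplication:

    # (methods of the hypothetical Table class sketched earlier)
    def _expand_to(self, variables):
        # add length-1 axes for missing variables, then transpose so the
        # axis order follows the supplied variable list
        values = self.values
        current = list(self.variables)
        for v in variables:
            if v not in current:
                values = values[..., np.newaxis]
                current.append(v)
        order = [current.index(v) for v in variables]
        return np.transpose(values, order)

    def __mul__(self, other):
        # pointwise multiplication over the union of the variable lists
        variables = self.variables + [v for v in other.variables
                                      if v not in self.variables]
        return Table(np.multiply(self._expand_to(variables),
                                 other._expand_to(variables)), variables)

With prof = Table([24, 5], ['profession']) and dept = Table([20, 7, 2], ['department']), this reproduces the (2, 1) and (1, 3) alignment shown on the next slides.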

array([24, 5])        ['profession']    (2,)
array([20, 7, 2])     ['department']    (3,)

array([[24],
       [ 5]])         ['profession', 'department']    (2, 1)

array([[20, 7, 2]])   ['profession', 'department']    (1, 3)

>>> prof * dept
array([[480, 168,  48],
       [100,  35,  10]])
['profession', 'department']

>>> (prof * dept).normalise(29)
array([[ 16.552,   5.793,   1.655],
       [  3.448,   1.206,   0.344]])
['profession', 'department']
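The normalise() method itself is not shown in the transcript; on the hypothetical Table sketch it could simply rescale the table so that its entries sum to the given total (29 being the margin total here):

    # (method of the hypothetical Table class sketched earlier)
    def normalise(self, total=1.0):
        # rescale so that the table's entries sum to 'total'
        return Table(self.values * (total / self.values.sum()), self.variables)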

Pointwise minimum / addition / subtraction
Numeric.minimum(), Numeric.add() and Numeric.subtract() generate the appropriate tables IF the two tables have identical ranks and variable lists AND the two tables also have identical shapes.
This is ensured by a secondary preprocessing stage where the tables from the first preprocessing stage are multiplied by a ‘correctly’ shaped table of ones (this is actually quicker than using Numeric.concatenate()).
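A sketch of that second stage on the same hypothetical Table class: after the first-stage alignment, each array is multiplied by a table of ones of the common shape before the pointwise operation is applied (modern NumPy broadcasting would handle this on its own, but the sketch mirrors the approach described):

    # (method of the hypothetical Table class sketched earlier)
    def minimum(self, other):
        variables = self.variables + [v for v in other.variables
                                      if v not in self.variables]
        a = self._expand_to(variables)
        b = other._expand_to(variables)
        # second preprocessing stage: multiply by a 'correctly' shaped
        # table of ones so both arrays take the full common shape
        ones = np.ones(np.broadcast(a, b).shape, dtype=a.dtype)
        return Table(np.minimum(a * ones, b * ones), variables)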

array([[24],
       [ 5]])           ['profession', 'department']    (2, 1)

array([[20, 7, 2]])     ['profession', 'department']    (1, 3)

array([[20, 7, 2],
       [20, 7, 2]])     ['profession', 'department']    (2, 3)

(2nd stage preprocessing)

>>> prof.minimum(dept)
array([[20,  7,  2],
       [ 5,  5,  2]])
['profession', 'department']

Summary
The Bayesian Belief Network software was originally implemented in Python for two reasons:
1. The author was, at the time, a relatively inexperienced programmer
2. Self-learning (albeit with some help) was the only option

The SDC software was implemented in Python because:
1. Python + Numeric turned out to be a wholly appropriate solution for BBNs (Python is powerful, Numeric is fast)
2. Existing code could be reused