Learning Bayesian Network Models from Data

Learning Bayesian Network Models from Data Emad Alsuwat

Bayesian Networks. A Bayesian network specifies a joint distribution in a structured form: it represents dependence/independence via a directed graph, where nodes = random variables and edges = direct dependence. The structure of the graph encodes the conditional independence relations, and the graph is required to be acyclic (no directed cycles). There are 2 components to a Bayesian network: the graph structure (the conditional independence assumptions) and the numerical probabilities (for each variable given its parents). In general, p(X1, X2, ..., XN) = ∏i p(Xi | parents(Xi)): the full joint distribution is replaced by this graph-structured approximation.
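
To make the factorization concrete, here is a minimal pure-Python sketch (all variable names and numbers are hypothetical, not from the slides): the joint probability of a full assignment is the product of one CPT entry per node, looked up at the values of that node's parents.

    # A minimal sketch (hypothetical names and numbers): the joint probability of
    # a full assignment is the product of one CPT entry per node, each looked up
    # at the values of that node's parents.
    parents = {"A": [], "B": [], "C": ["A", "B"]}   # structure: A -> C <- B
    cpt = {
        "A": {(): {True: 0.1, False: 0.9}},
        "B": {(): {True: 0.2, False: 0.8}},
        "C": {(True, True):   {True: 0.95, False: 0.05},
              (True, False):  {True: 0.70, False: 0.30},
              (False, True):  {True: 0.60, False: 0.40},
              (False, False): {True: 0.05, False: 0.95}},
    }

    def joint(assignment):
        """p(assignment) = product over nodes of p(node | its parents)."""
        p = 1.0
        for x, pa in parents.items():
            key = tuple(assignment[u] for u in pa)
            p *= cpt[x][key][assignment[x]]
        return p

    print(joint({"A": True, "B": False, "C": True}))   # 0.1 * 0.8 * 0.70 = 0.056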

Example of a simple Bayesian network: p(A,B,C) = p(C|A,B) p(A) p(B), i.e., A → C ← B. The probability model has a simple factored form: directed edges => direct dependence, absence of an edge => conditional independence. Also known as belief networks, graphical models, or causal networks; other formulations exist, e.g., undirected graphical models.

Examples of 3-way Bayesian Networks. Marginal independence: p(A,B,C) = p(A) p(B) p(C) (no edges; A, B, and C are disconnected nodes).

Examples of 3-way Bayesian Networks. Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A), i.e., B ← A → C. B and C are conditionally independent given A; e.g., A is a disease, and we model B and C as conditionally independent symptoms given A.

Examples of 3-way Bayesian Networks. Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B), i.e., A → C ← B. A and B are (marginally) independent but become dependent once C is known. This gives the "explaining away" effect: given C, observing A makes B less likely (e.g., the earthquake/burglary/alarm example). A numeric sketch follows.
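
The following pure-Python sketch quantifies explaining away on this structure; the priors and CPT numbers are made up for illustration (A plays the role of earthquake, B burglary, C alarm). It computes P(B | C) and P(B | C, A) by brute-force summation over the joint.

    # Hedged sketch with made-up numbers: A and B are independent causes of C
    # (think earthquake, burglary, alarm). Conditioning on C couples them.
    from itertools import product

    pA = {True: 0.01, False: 0.99}                     # prior P(A)
    pB = {True: 0.02, False: 0.98}                     # prior P(B)
    pC = {(True, True): 0.97, (True, False): 0.30,
          (False, True): 0.90, (False, False): 0.01}   # P(C=T | A, B)

    def joint(a, b, c):
        pc = pC[(a, b)] if c else 1.0 - pC[(a, b)]
        return pA[a] * pB[b] * pc

    def p_B_given(c, a=None):
        """P(B=T | C=c [, A=a]) by brute-force summation over the joint."""
        num = den = 0.0
        for A, B in product([True, False], repeat=2):
            if a is not None and A != a:
                continue
            w = joint(A, B, c)
            den += w
            if B:
                num += w
        return num / den

    print(p_B_given(c=True))           # ~0.59: observing C raises P(B) from 0.02
    print(p_B_given(c=True, a=True))   # ~0.06: also observing A "explains C away"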

Examples of 3-way Bayesian Networks. Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A), i.e., a chain A → B → C.

Example. Consider the following 5 binary variables: B = a burglary occurs at your house; E = an earthquake occurs at your house; A = the alarm goes off; J = John calls to report the alarm; M = Mary calls to report the alarm. What is P(B | M, J), for example? We can use the full joint distribution to answer this question, but it requires 2^5 = 32 probabilities. Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?

The Resulting Bayesian Network

Constructing this Bayesian Network. P(J, M, A, E, B) = P(J | A) P(M | A) P(A | E, B) P(E) P(B). There are 3 conditional probability tables (CPTs) to be determined, P(J | A), P(M | A), and P(A | E, B), requiring 2 + 2 + 4 = 8 probabilities, plus 2 marginal probabilities P(E) and P(B): 10 probabilities in total. Where do these probabilities come from? Expert knowledge, data (relative frequency estimates), or a combination of both. A parameter-count sketch follows.
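
A tiny sketch of the parameter count (binary variables assumed; the node names are the slide's): a binary node with k binary parents contributes 2^k free numbers, one per row of its CPT, versus 2^5 - 1 free numbers for the unrestricted joint.

    # Sketch of the parameter count: a binary node with k binary parents needs
    # 2**k free numbers (one per parent configuration), so this network needs
    # 1 + 1 + 4 + 2 + 2 = 10, versus 2**5 - 1 = 31 for the unrestricted joint.
    n_parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}
    bn_params = sum(2 ** k for k in n_parents.values())
    full_joint_params = 2 ** len(n_parents) - 1
    print(bn_params, full_joint_params)   # 10 31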

The Bayesian network

Inference (Reasoning) in Bayesian Networks. Consider answering a query in a Bayesian network: Q = the set of query variables and e = the evidence (a set of instantiated variable-value pairs); inference is the computation of the conditional distribution P(Q | e). Examples: P(burglary | alarm), P(earthquake | JCalls, MCalls), P(JCalls, MCalls | burglary, earthquake). Can we use the structure of the Bayesian network to answer such queries efficiently? Yes: roughly speaking, the sparser the graph, the cheaper inference is (a brute-force enumeration sketch follows).
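
As a baseline, here is inference by enumeration for P(B | J=true, M=true): sum the factored joint over the unobserved variables E and A, then normalize. The CPT numbers below are the familiar ones from Russell & Norvig's version of this example; with them the answer comes out near 0.284.

    # Inference by enumeration for P(B=T | J=T, M=T): sum the factored joint over
    # the unobserved variables E and A, then normalize.
    from itertools import product

    P_B, P_E = 0.001, 0.002
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
    P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
    P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

    def joint(b, e, a, j, m):
        f = lambda p, v: p if v else 1.0 - p
        return (f(P_B, b) * f(P_E, e) * f(P_A[(b, e)], a)
                * f(P_J[a], j) * f(P_M[a], m))

    num = sum(joint(True, e, a, True, True)
              for e, a in product([True, False], repeat=2))
    den = num + sum(joint(False, e, a, True, True)
                    for e, a in product([True, False], repeat=2))
    print(num / den)   # about 0.284 with these numbers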

Learning Bayesian Networks from Data

Why learning? The knowledge acquisition bottleneck: knowledge acquisition is an expensive process, and often we don't have an expert. Data, by contrast, is cheap, and the amount of available information is growing rapidly. Learning allows us to construct models from raw data.

Learning Bayesian networks. [Figure: data plus prior information go into a Learner, which outputs a network structure over E, R, B, A, C together with its numerical CPTs, e.g. the table P(A | E, B).]

Known Structure, Complete Data. [Figure: complete records over (E, B, A), e.g. <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., together with the fixed structure E, B → A, go into the Learner, which fills in the CPT P(A | E, B).] The network structure is specified, the data contains no missing values, and the inducer only needs to estimate the parameters (see the sketch below).
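
With complete data and a known structure, each CPT entry is just a relative frequency (the maximum likelihood estimate). A minimal sketch with a hypothetical toy sample of (E, B, A) records:

    # MLE from complete data: every CPT entry is a relative frequency.
    from collections import Counter

    data = [("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"),
            ("N", "Y", "Y"), ("N", "N", "N"), ("Y", "Y", "Y")]   # toy (E, B, A)
    counts = Counter(data)

    def p_A_given(e, b):
        """MLE of P(A=Y | E=e, B=b) = N(e, b, A=Y) / N(e, b)."""
        n_eb = counts[(e, b, "Y")] + counts[(e, b, "N")]
        return counts[(e, b, "Y")] / n_eb if n_eb else None   # unseen row: undefined

    print(p_A_given("Y", "N"))   # 1 of the 2 records with E=Y, B=N has A=Y -> 0.5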

Unknown Structure, Complete Data. [Figure: the same complete records go into the Learner with no structure given; it must now also choose the arcs.] The network structure is not specified, so the inducer needs to select arcs as well as estimate parameters; the data contains no missing values (a score-based sketch follows).
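
One standard way to select arcs is score-based search: score each candidate structure on the data and keep the best. The toy sketch below (hypothetical two-variable data) compares the BIC score of "B and A independent" against "B → A"; real learners search over many structures, but each scoring step looks like this.

    # Score-based structure selection on toy (B, A) data: compare the BIC of
    # "A independent of B" against "B -> A"; the higher score wins.
    import math
    from collections import Counter

    data = [("Y", "Y"), ("Y", "Y"), ("Y", "N"),
            ("N", "N"), ("N", "N"), ("N", "N")]   # hypothetical (B, A) records
    n = len(data)
    cB = Counter(b for b, _ in data)
    cA = Counter(a for _, a in data)
    cBA = Counter(data)

    def loglik_indep():                # model 1: P(B) P(A)
        return sum(math.log(cB[b] / n) + math.log(cA[a] / n) for b, a in data)

    def loglik_edge():                 # model 2: P(B) P(A | B)
        return sum(math.log(cB[b] / n) + math.log(cBA[(b, a)] / cB[b])
                   for b, a in data)

    def bic(ll, k):                    # k = number of free parameters
        return ll - 0.5 * k * math.log(n)

    print(bic(loglik_indep(), k=2), bic(loglik_edge(), k=3))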

Known Structure, Incomplete Data. [Figure: records with missing entries, e.g. <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>, together with the fixed structure, go into the Learner.] The network structure is specified, but the data contains missing values, so the learner needs to consider the possible assignments to the missing values (the EM sketch below shows the core idea).
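
The usual tool here is expectation-maximization (EM): a record with a missing value contributes expected counts, weighted by the current model's posterior over that value. The sketch below boils this down to the simplest possible case, estimating a single probability theta = P(A = Y) from partially observed data (everything hypothetical):

    # EM in miniature: estimate theta = P(A=Y) when some observations of A are
    # missing (None). A missing entry contributes an *expected* count of theta.
    data = ["Y", "N", "Y", None, "Y", None, "N"]   # toy observations of A

    theta = 0.5                                    # arbitrary starting point
    for _ in range(20):
        # E-step: expected number of A=Y, counting each missing entry as theta
        expected_yes = sum(1.0 if x == "Y" else 0.0 if x == "N" else theta
                           for x in data)
        # M-step: re-estimate theta from the expected counts
        theta = expected_yes / len(data)
    print(theta)   # converges to the observed-data MLE, 3/5 here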

Unknown Structure, Incomplete Data. [Figure: the same incomplete records go into the Learner with no structure given.] The network structure is not specified and the data contains missing values, so the learner must both select arcs and consider assignments to the missing values; this is the hardest of the four settings.

What software do we use for BN structure learning?

Idea: we work in the Known Structure, Incomplete Data setting. First, we use Hugin to generate 10,000 cases from a known model; then we use this data to learn the original BN model back (a forward-sampling sketch of case generation follows).
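
Hugin's case generator is a GUI feature, but what it does can be sketched in code: forward sampling draws each node given its already-sampled parents, in topological order. The sketch below reuses the small alarm fragment from earlier with the same illustrative numbers (not the actual Chest Clinic model):

    # Forward sampling: draw each node given its already-sampled parents, in
    # topological order. Numbers are the illustrative alarm CPTs used earlier.
    import random

    def sample_case(rng):
        e = rng.random() < 0.002                   # E ~ P(E)
        b = rng.random() < 0.001                   # B ~ P(B)
        p_a = {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}[(b, e)]
        a = rng.random() < p_a                     # A ~ P(A | B, E)
        return e, b, a

    rng = random.Random(0)
    cases = [sample_case(rng) for _ in range(10000)]
    print(sum(a for _, _, a in cases), "alarms in 10,000 sampled cases")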

Example: Chest Clinic Model

Generate 10,000 cases

Now we use the Learning Wizard in Hugin and try to learn the model back from the generated cases.

Using BN learning algorithms to detect offensive (data-corrupting) and defensive (data-quality) information operations

Offensive Information Operation (Corrupting Data). In an offensive information operation, the dataset is corrupted, and the corrupted data is then used to learn the BN model with the PC algorithm. What is the minimal amount of data that must be modified to change the learned model? The difficulty of such an attack ranges from easy (e.g., masking the dependence between smoking and cancer) to hard (e.g., steering the learner to a specific desired model). A small illustration follows.
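
PC-style structure learners decide which arcs to keep using (conditional) independence tests on the data, so an attacker who can flip enough values can push a real dependence below the detection threshold. The hypothetical sketch below measures the empirical mutual information between a "smoking" and a "cancer" variable before and after randomly flipping a fraction of the cancer labels; it illustrates the attack surface, not the PC algorithm itself.

    # Hypothetical illustration: empirical mutual information I(S; C) between a
    # "smoking" flag S and a "cancer" flag C, before and after an attacker flips
    # a fraction of the C labels. The dependence shrinks toward zero, which can
    # push an independence test past its threshold and delete the arc.
    import math
    import random
    from collections import Counter

    rng = random.Random(1)
    data = [(s, s if rng.random() < 0.8 else not s)          # real S -> C link
            for s in (rng.random() < 0.3 for _ in range(5000))]

    def mutual_info(pairs):
        n = len(pairs)
        cS, cC, cSC = Counter(), Counter(), Counter()
        for s, c in pairs:
            cS[s] += 1
            cC[c] += 1
            cSC[(s, c)] += 1
        return sum((k / n) * math.log((k / n) / ((cS[s] / n) * (cC[c] / n)))
                   for (s, c), k in cSC.items())

    print(mutual_info(data))                                 # clearly positive
    corrupted = [(s, (not c) if rng.random() < 0.4 else c) for s, c in data]
    print(mutual_info(corrupted))                            # much closer to zero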

Defensive Information Operation (Data Quality). In a defensive information operation, new data is introduced into the dataset, and the PC algorithm then uses this dataset to learn the BN model. How much incorrect data can be introduced into the database without changing the learned model? How can we use Bayesian networks to detect unauthorized data manipulation or incorrect data entries? Can we use Bayesian networks for data quality assessment with respect to integrity?

Questions?

References
Bayesian Networks, slides by Padhraic Smyth.
Learning Bayesian Networks from Data, slides by Nir Friedman (Hebrew U.) and Daphne Koller (Stanford).