1
Bayesian Networks I: Static Models & Multinomial Distributions. By Peter Woolf (pwoolf@umich.edu), University of Michigan. Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0, Creative Commons.
2
The overall workflow:
- Existing plant measurements
- Physics, chemistry, and chemical engineering knowledge & intuition
- Bayesian network models to establish connections
- Patterns of likely causes & influences
- Efficient experimental design to test combinations of causes
- ANOVA & probabilistic models to eliminate irrelevant or uninteresting relationships
- Process optimization (e.g. controllers, architecture, unit optimization, sequencing, and utilization)
- Dynamical process modeling
3
More scenarios where Bayesian networks can help:
- Inferential sensing: how do you sense the state of something you don't see?
- Sensor redundancy: if multiple sensors disagree, what can you say about the state of the system?
- Noisy systems: if your system is highly variable, how can you model it?
4
Stages of knowing a model (moving down the list, the situation becomes more realistic):
1. Topology and parameters are known (e.g. solve a given ODE).
2. Topology is known and we have data to learn parameters (e.g. fit parameters to an ODE using optimization).
3. Only data are known; must learn topology and parameters.
4. Only partial data are known; must learn topology and parameters.
5. Model is unknown and nonstationary (more research needed).
Bayesian networks address the middle stages, where topology and/or parameters must be learned from data.
5
Probability tables for the network A → B:

P(A = high) = θ01 = 0.21, P(A = medium) = θ02 = 0.45, P(A = low) = θ03 = 0.34

A        P(B = on | A)   P(B = off | A)
high     θ11 = 0.30      θ12 = 0.70
medium   θ21 = 0.99      θ22 = 0.01
low      θ31 = 0.46      θ32 = 0.54

Note: rows sum to 1, but columns don't.
6
Bayesian networks illustrate several ideas:
- A graphical form of Bayes' rule
- Conditional independence
- Decomposition of the joint probability: P(C+, S-, R+, W+) = P(C+) P(S-|C+) P(R+|C+) P(W+|S-, R+)
- Causal networks
- Inference on a network vs. inference of a network

Probability tables for the C → {S, R} → W example:

P(C-) = 0.5, P(C+) = 0.5

C    P(S-)   P(S+)
-    0.5     0.5
+    0.9     0.1

C    P(R-)   P(R+)
-    0.8     0.2
+    0.2     0.8

S  R   P(W-)   P(W+)
-  -   1.0     0.0
+  -   0.1     0.9
-  +   0.1     0.9
+  +   0.01    0.99
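To make the decomposition concrete, here is a minimal Python sketch (not from the original slides) that evaluates P(C+, S-, R+, W+) from the tables above; the dictionaries simply restate the conditional probability tables as reconstructed here.

```python
# Joint probability of one full assignment in the C -> {S, R} -> W network,
# using the chain-rule decomposition P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R).
p_C = {'+': 0.5, '-': 0.5}
p_S_given_C = {'-': {'-': 0.5, '+': 0.5}, '+': {'-': 0.9, '+': 0.1}}   # p_S_given_C[c][s]
p_R_given_C = {'-': {'-': 0.8, '+': 0.2}, '+': {'-': 0.2, '+': 0.8}}   # p_R_given_C[c][r]
p_W_given_SR = {('-', '-'): {'-': 1.0, '+': 0.0},
                ('+', '-'): {'-': 0.1, '+': 0.9},
                ('-', '+'): {'-': 0.1, '+': 0.9},
                ('+', '+'): {'-': 0.01, '+': 0.99}}                    # p_W_given_SR[(s, r)][w]

def joint(c, s, r, w):
    return p_C[c] * p_S_given_C[c][s] * p_R_given_C[c][r] * p_W_given_SR[(s, r)][w]

print(joint('+', '-', '+', '+'))  # P(C+, S-, R+, W+) = 0.5 * 0.9 * 0.8 * 0.9 = 0.324
```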
7
Inference on a network A → B (using the probability tables above). Exact vs. approximate calculation: in some cases you can exactly calculate probabilities on a BN given some data. This can be done directly, or with more complex algorithms for faster execution. For large networks, exact inference is impractical.
8
Inference on a network A → B (tables as above). Given a value of A, say A = high, what is B? P(B = on) = 0.3, P(B = off) = 0.7. The answer is a probability!
9
Inference on a network A → B (tables as above). Given a value of B, say B = on, what is A? This is what Genie is doing in the wiki examples.
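A minimal sketch (not from the slides) of this inversion using Bayes' rule, P(A | B = on) ∝ P(B = on | A) P(A), with the table values from slide 5:

```python
# Posterior over A given the observation B = on, via Bayes' rule.
p_A = {'high': 0.21, 'medium': 0.45, 'low': 0.34}
p_Bon_given_A = {'high': 0.3, 'medium': 0.99, 'low': 0.46}

unnormalized = {a: p_Bon_given_A[a] * p_A[a] for a in p_A}
evidence = sum(unnormalized.values())               # P(B = on) = 0.6649
posterior = {a: v / evidence for a, v in unnormalized.items()}
print(posterior)  # roughly {'high': 0.095, 'medium': 0.670, 'low': 0.235}
```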
10
Inference on a network A → B (tables as above). Approximate inference via Markov chain Monte Carlo (MCMC) sampling:
- Given partial data, use your conditional probabilities to sample a value around the observed values and head nodes.
- Repeat sampling outward until you fill the network.
- Start over and gather averages.
11
(Same MCMC procedure, shown on the example network. Figure legend: * marks observed data; e1 is the sample estimate in round 1.)
12
(Second round of sampling. Figure legend: * marks observed data; e1 and e2 are the sample estimates in rounds 1 and 2.) The method always works in the limit of infinite samples…
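The slides describe MCMC; as the simplest stand-in for sampling-based approximate inference, here is a rejection-sampling sketch (my construction, not from the slides) that estimates P(A | B = on) on the A → B network and converges to the exact posterior as the number of samples grows.

```python
import random

# Approximate P(A | B = on) by rejection sampling on the A -> B network.
p_A = {'high': 0.21, 'medium': 0.45, 'low': 0.34}
p_Bon_given_A = {'high': 0.3, 'medium': 0.99, 'low': 0.46}

counts = {a: 0 for a in p_A}
n_kept = 0
for _ in range(100_000):
    a = random.choices(list(p_A), weights=list(p_A.values()))[0]  # sample A from its prior
    b_on = random.random() < p_Bon_given_A[a]                     # sample B given A
    if b_on:                                                      # keep only samples where B = on
        counts[a] += 1
        n_kept += 1

print({a: c / n_kept for a, c in counts.items()})  # approaches the exact posterior
```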
13
Example scenario: this can be interpreted as a Bayesian network! The network is the same as saying: (the factored joint probability expression shown on the slide).
15
Recall: these are equivalence classes, and they are a fundamental property of observed data. Causality can only be determined from observational data to a limited extent! The network A → B ← C is fundamentally different (prove it to yourself with Bayes' rule) and can be distinguished using observational data.
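A sketch of that Bayes'-rule argument (my notation, not from the slides): the chain A → B → C factors as P(A) P(B|A) P(C|B), and using Bayes' rule P(B|A) P(A) = P(A|B) P(B) this rewrites to P(B) P(A|B) P(C|B), which is exactly the fork A ← B → C. The collider A → B ← C instead factors as P(A) P(C) P(B|A, C), which cannot in general be rearranged into the chain/fork form: in the collider, A and C are marginally independent but become dependent once B is observed.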
16
FUNDAMENTAL PROPERTY! These are equivalent models if we just observe A, B, and C. If we intervene and change A, B, or C, we can distinguish between them, OR we can use our knowledge to choose the direction. No arrangement of this last model (the collider) will produce the upper three models.
17
Example scenario
18
(1) Given these data, what is the probability of observing a set of 9 temperature readings of which 4 are high, 2 are medium, and 3 are low? Note that these are independent readings and we don't care about the ordering of the readings, just the probability of observing a set of 9 readings with this property. Here we can use the multinomial distribution and the probabilities in the table above. Compare to the binomial distribution we discussed previously (k = 2).
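The multinomial formula itself appeared as an image on the slide; in standard notation it is

P(n_H, n_M, n_L) = N! / (n_H! n_M! n_L!) * p_H^n_H * p_M^n_M * p_L^n_L,  where N = n_H + n_M + n_L.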
19
(1) Given these data, what is the probability of observing a set of 9 temperature readings of which 4 are high, 2 are medium, and 3 are low? These are independent readings and we don't care about their ordering, just the probability of observing a set of 9 readings with this property. Using the multinomial distribution and the probabilities in the table above, for this problem we find:
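The evaluated expression was an image on the slide; with the temperature probabilities from the table (call them p_H, p_M, p_L) it takes the form

P(4H, 2M, 3L) = 9! / (4! 2! 3!) * p_H^4 * p_M^2 * p_L^3 = 1260 * p_H^4 * p_M^2 * p_L^3,

which, per the joint-probability slide later in the lecture, evaluates to p(temperature) ≈ 0.047.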
20
(2) After gathering these 9 temperature readings, what is the most likely next temperature reading you will see? Why? The most likely next reading is medium, because it has the highest probability, 0.4. The previous sequence of temperature readings does not matter, assuming these are independent readings, as mentioned above.
21
(3) What is the probability of sampling a set of 9 observations with 7 of them catalyst A and 2 of them catalyst B? Here again, order does not matter. Here we can use the two-state case of the multinomial distribution (the binomial distribution):
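The expression itself was an image; with the catalyst probabilities p_A and p_B from the table it is

P(7A, 2B) = 9! / (7! 2!) * p_A^7 * p_B^2 = 36 * p_A^7 * p_B^2,

which, per the joint-probability slide later in the lecture, evaluates to p(catalyst) ≈ 0.0212.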
22
(4) What is the probability of observing the following yield values? Note that here we have the temperature and catalyst values, so we can use the conditional probability values. As before, the order of observations does not matter, but the association of temperature and catalyst with yield does matter. For this part, just write down the expression you would use; you don't need to do the full calculation.

Number of times observed   Temperature   Catalyst   Yield
4x                         H             A          H
2x                         M             B          L
3x                         L             A          H
23
Calculation method 1: First we calculate the probability of this set for a particular ordering (expression shown on the slide). The number of orderings of identical items is the factorial term in the multinomial: 9! / (4! 2! 3!) = 1260. Thus the total probability is 0.00071048. (Observations as in the table above: 4x H/A/H, 2x M/B/L, 3x L/A/H.)
24
Calculation method 2: The probabilities can be interpreted here as another multinomial term. For example, for the first observation, we could ask: what is the probability of observing 4 high, 0 medium, and 0 low yields for a system with a high temperature and catalyst A? Using the multinomial distribution we would find the expression shown on the slide. We can repeat this for the second case to find p(0H, 0M, 2L | T = med, Cat = B) = 0.03², which is again the same as above. The combination term is the same, 1260. Taking the product of the combinations and probabilities, we find the same total probability of 0.00071048. Note that this matches the result in calculation method 1 exactly. (Observations as in the table above.)
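A minimal Python sketch (not from the slides) of calculation method 1: multiply the probability of one particular ordering by the number of orderings. The conditional yield probabilities below are placeholders to be read off the lecture's truth table; only the structure of the calculation is the point.

```python
from math import factorial

def multinomial_coef(*counts):
    """Number of distinct orderings of a set with the given group counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# Observed set: 4x (T=H, Cat=A, Y=H), 2x (T=M, Cat=B, Y=L), 3x (T=L, Cat=A, Y=H)
counts = (4, 2, 3)

# Placeholder conditional probabilities p(yield | temperature, catalyst);
# substitute the values from the lecture's probability table.
p_yield = {('H', 'A', 'H'): 0.0,   # p(Y=H | T=H, Cat=A)  <- fill in from the table
           ('M', 'B', 'L'): 0.0,   # p(Y=L | T=M, Cat=B)  <- fill in from the table
           ('L', 'A', 'H'): 0.0}   # p(Y=H | T=L, Cat=A)  <- fill in from the table

p_one_ordering = (p_yield[('H', 'A', 'H')] ** 4 *
                  p_yield[('M', 'B', 'L')] ** 2 *
                  p_yield[('L', 'A', 'H')] ** 3)
total = multinomial_coef(*counts) * p_one_ordering   # 1260 * p_one_ordering
print(multinomial_coef(*counts), total)              # slide reports the total as 0.00071048
```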
25
This term is the probability of the data given a model and parameters: P(data | model, parameters). The absolute value of this probability is not very informative by itself, but it becomes informative when compared to something else. Note that the joint probability model here is p(temperature, catalyst, yield) = p(temperature) * p(catalyst) * p(yield | temperature, catalyst) = 0.047 * 0.0212 * 0.00071 = 7.07e-7. (Note: p(temp) and p(cat) were calculated earlier in the lecture.)
26
As an example, let's say you try another model in which yield depends only on temperature. This model is shown graphically below. What is the conditional probability model? P(temperature, cat, yield) = p(temp) p(cat) p(yield | temp) (call this model 2).
27
P(temperature, cat, yield) = p(temp) p(cat) p(yield | temp) (call this model 2). How do we change this table to get p(yield | temp)?
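One way to answer that (my note, not stated explicitly on the slide): since catalyst has no parent and is independent of temperature in this model, average the original conditional table over catalyst, weighting by p(cat):

p(yield | temp) = Σ_cat p(yield | temp, cat) * p(cat)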
28
Now what?
29
So which model is better?
30
A Bayes factor (BF) is like a p-value, but in probabilistic or Bayesian terms. BF near 1: both models are nearly equal. BF far from 1: the models are different.
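For two candidate models, the Bayes factor is the ratio of how well each explains the data (the standard definition; it was not spelled out in the extracted text):

BF = P(data | model 1) / P(data | model 2)

So with the joint probabilities computed above, the BF comparing model 1 to model 2 is simply the ratio of their P(data | model, parameters) values.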
31
Limitations: the analysis is based on only 9 data points, and we don't always have parameters like the truth table to start with. Even so, this approach is useful for identifying unusual behavior. For example, in this case we might conclude that catalysts A and B still have distinct properties, even though, say, they have been recycled many times.
33
Constraints: there are a total of 100 samples drawn, thus 100 = H + M + L. For the maximum likelihood case, H = 51, so the relationship between M and L is 100 = 51 + M + L → M = 49 - L. At some lower value of H we get the expression M = (100 - H) - L. Integrate by summing! (Figure: the M-L plane showing these constraint lines; the observed counts were 51 H, 8 M, and 41 L.)
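A minimal sketch (my construction, not from the slides) of "integrate by summing": for a fixed H, sum the multinomial probability over every (M, L) pair satisfying the constraint M + L = 100 - H. The category probabilities p below are placeholders for the lecture's values.

```python
from math import factorial

def multinomial_prob(counts, probs):
    """Multinomial probability of the given category counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = coef
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

# Placeholder category probabilities (p_H, p_M, p_L); substitute the lecture's values.
p = (1/3, 1/3, 1/3)

# "Integrate by summing": total probability that H = 51 out of 100 samples,
# summed over all ways the remaining 49 samples can split between M and L.
H = 51
total = sum(multinomial_prob((H, M, 100 - H - M), p) for M in range(100 - H + 1))
print(total)
```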
34
Take-home messages:
- Using a Bayesian network, you can describe complex relationships between variables.
- Multinomial distributions allow you to handle variables with more than 2 states.
- Using the rules of probability (Bayes' rule, marginalization, and independence), you can infer states on a Bayesian network.