Brad Legault Soft Computing CONDITIONAL DEPENDENCE & INDEPENDENCE
Dependence, in the probabilistic sense, means that the outcome of one event affects the probability of the other: P(A|B) ≠ P(A|¬B). Independence, therefore, is when the outcome of one event has no effect on the probability of the other: P(A|B) = P(A|¬B). Very simple and straightforward! Does A affect B? If yes, dependent. If no, independent. Right? DEPENDENCE VS. INDEPENDENCE
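The definition can be checked numerically. A minimal sketch, using a hypothetical joint distribution over two binary events (all numbers invented for illustration):

```python
# Hypothetical joint distribution: joint[(a, b)] = P(A=a, B=b).
joint = {
    (True, True): 0.12, (True, False): 0.28,
    (False, True): 0.18, (False, False): 0.42,
}

def cond_prob_a_given_b(joint, b):
    """P(A=True | B=b) = P(A=True, B=b) / P(B=b)."""
    p_b = joint[(True, b)] + joint[(False, b)]
    return joint[(True, b)] / p_b

p_a_given_b = cond_prob_a_given_b(joint, True)       # P(A | B)
p_a_given_not_b = cond_prob_a_given_b(joint, False)  # P(A | not B)
# The two values are equal, so A and B are independent in this joint.
```

With these numbers both conditionals come out to 0.4, so A and B are independent; perturbing any entry breaks the equality and makes them dependent.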
Your math teachers have been lying to you all these years. Dependencies and independencies can be conditional on related facts. For example, let's consider two events: Event A: It is cold outside. Event B: A billing mistake has occurred at Hydro. At a glance, these events are apparently independent: the temperature should not change the likelihood of billing mistakes, nor should billing mistakes affect the temperature. WRONG! IT'S NOT THAT SIMPLE!
Now imagine that we introduce another event, C, which is dependent on both events A and B listed above. Event C: Your household's reported energy consumption has drastically increased. Knowing the outcome of event C can affect our probabilities for events A and B. This isn't that intuitive, since we think of C as being derived from A and B, not the other way around. NOW ADD A RELATED EVENT…
As was demonstrated in my previous presentation, we can create a simple Bayesian Network to represent the relationships and probabilities of the scenario. BAYESIAN REPRESENTATION As previously explained, the Conditional Probability Tables (CPTs) are condensed, but all information is represented.
Now imagine that we know that event C, the reported energy consumption increase, is found to be true (i.e., more electricity was used). Suddenly, the possible outcomes of the scenario have changed drastically. This can best be illustrated with a tree diagram. SPECIFIED OUTCOME OF DEPENDENT EVENT
This illustrates all the probabilities given no restrictions on events A, B, and C. Notice how, regardless of the outcome of A, the probability of B remains constant, while C is dependent on the outcomes of A and B. Also notice that each column sums to 1. TREE DIAGRAM
Given that event C is true, all the paths where event C is false are no longer possible. Now we have a problem: the last column no longer totals to a probability of 1. We need to use Bayes' Rule to compute the new weights for each of our four possible outcomes. TREE DIAGRAM
Bayes' theorem is really just a ratio: the probability of a given path in the tree equals that path's weight divided by the sum of all viable path weights. If we divide each viable path's weight by that sum, we have our adjusted probability for each path. BAYES THEOREM (AKA BAYES RULE)
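The renormalization can be sketched in a few lines. The CPT values below are invented stand-ins for the slide's actual numbers; only the shape of the calculation matters:

```python
from itertools import product

# Hypothetical stand-ins for the slide's CPTs:
# A: it is cold, B: billing mistake, C: reported consumption increased.
p_a = 0.30
p_b = 0.05
p_c_given = {  # P(C=True | A, B)
    (True, True): 0.95, (True, False): 0.80,
    (False, True): 0.70, (False, False): 0.05,
}

# Weight of each path that survives pruning (C = True).
weights = {}
for a, b in product([True, False], repeat=2):
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    weights[(a, b)] = pa * pb * p_c_given[(a, b)]

# Bayes' rule as a ratio: each viable path weight over the sum of all.
total = sum(weights.values())
posterior = {outcome: w / total for outcome, w in weights.items()}
```

After dividing by the total, the four surviving paths again sum to 1, which is exactly the "adjusted probability" the slide describes.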
If we remove the paths where C is false, we use the final weights with Bayes' Theorem to get the adjusted probabilities. TREE DIAGRAM
When examining the probabilities of events A and B given event C, we notice immediately that the two subtrees are NOT identical. In other words, the probability of A now affects the probability of B, on condition of event C. TREE DIAGRAM
This sort of calculation is done very easily using the initial Bayesian Network given; it's just a matter of multiplying entries in the conditional probability tables. The trees were shown just to illustrate the paths that could be removed. The example illustrated that event B is dependent on event A given event C. We could have demonstrated that A is dependent on B in the same fashion just by swapping the positions of the A and B events in the tree. This sort of relationship in Bayesian Networks is sometimes informally called a "head-to-head" relationship. A COUPLE NOTES
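The head-to-head ("explaining away") effect can be demonstrated with the same kind of CPT multiplication described above. All numbers here are hypothetical:

```python
from itertools import product

# A and B are marginally independent causes of C; the CPT values are
# invented for illustration.
p_a, p_b = 0.30, 0.05
p_c = {(True, True): 0.95, (True, False): 0.80,
       (False, True): 0.70, (False, False): 0.05}

def joint(a, b):
    """P(A=a, B=b, C=True), multiplying entries of the CPTs."""
    return (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b) * p_c[(a, b)]

p_c_true = sum(joint(a, b) for a, b in product([True, False], repeat=2))

# P(B | C): belief in a billing mistake after seeing consumption rise.
p_b_given_c = (joint(True, True) + joint(False, True)) / p_c_true

# P(B | A, C): the same belief once we also learn it is cold outside.
p_b_given_a_c = joint(True, True) / (joint(True, True) + joint(True, False))

# The two values differ: the cold weather "explains away" the consumption
# increase, lowering the probability of a billing mistake.
```

With these numbers, learning that it is cold drops the probability of a billing mistake from roughly 0.13 to roughly 0.06, even though A and B were independent before C was observed.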
What we're seeing is that when two independent events both cause the same derived event, the two independent events are actually conditionally dependent given that event. Conditional dependence is relatively straightforward, and applies to everything that fits that shape in Bayesian Networks (assuming there's no interconnectivity above that relates them). SO WHAT DOES THIS MEAN?
This is the opposite of the example we saw before (conditional dependence). This concerns events which are initially considered dependent on each other, but which become independent given a third related event. There are two simple structures that occur commonly in Bayesian Networks and illustrate conditional independence. CONDITIONAL INDEPENDENCE
This type of conditional independence looks like the following graph: effectively, a single event on which two other events depend. Scenario: Imagine an old public school building uses a water boiler to provide heat via radiators throughout the classrooms. When the boiler turns on, it occasionally makes a noise which can be heard throughout the building. It can also, albeit less commonly, set off the fire alarm in the building. "TAIL-TO-TAIL"
The Directed Acyclic Graph of the Bayesian Network for the preceding scenario could be drawn as below. Consider that we do not know if the boiler is on or off. We hear the sound that we tend to relate to the boiler. We also know that the boiler can set the alarm off, so we perhaps brace ourselves for the alarm (just in case), all based on the noise (i.e., they are dependent). BAYESIAN NETWORK REPRESENTATION
Now imagine that we know for a fact the boiler is turning on (perhaps we fiddled with the thermostat to force it to start). This time, we hear the noise, but it doesn't affect whether we brace ourselves for the alarm, as we already know the boiler has turned on, and the noise tells us nothing further about the alarm. Thus, if we know the boiler has just started, the noise and the alarm become independent. Therefore, we can say that the alarm and the noise are conditionally independent given the boiler starting. INTRODUCE KNOWN EVENT
THE MATH BEHIND THIS
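The result follows directly from how the joint distribution factors over the tail-to-tail DAG. Writing B for the boiler starting, N for the noise, and A for the alarm (labels introduced here for illustration), the network asserts P(B, N, A) = P(B) P(N|B) P(A|B), so:

```latex
P(N, A \mid B)
  = \frac{P(B, N, A)}{P(B)}
  = \frac{P(B)\, P(N \mid B)\, P(A \mid B)}{P(B)}
  = P(N \mid B)\, P(A \mid B)
```

The conditional joint factors into the product of the two individual conditionals, which is exactly the definition of N and A being independent given B. Without conditioning on B, we must sum the factorization over both values of B, and the result generally does not factor, so N and A remain unconditionally dependent.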
Another type of conditional independence is informally called "head-to-tail". Consider a modification to the previous scenario: the same building with the old boiler has problems with old pipes. When the boiler turns on, sometimes the pipes give out and a particular room will fill with steam. The steam has a good chance of setting off the fire alarm if it isn't dealt with right away. This time, what we have is a chain of three dependent events. "HEAD-TO-TAIL"
The Directed Acyclic Graph for the Bayesian Network could look like this: HEAD-TO-TAIL DAG Without knowing anything about the events in the diagram, it would seem as though they are all dependent on each other, directly or indirectly. Imagine, however, that we know the middle event (the room full of steam) has occurred. We would brace ourselves for the possibility of the alarm, and finding out about the boiler would be redundant. The events at the top and bottom of the chain have become conditionally independent given the pipe leak.
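As with the tail-to-tail case, the factorization makes the claim precise. Writing B for the boiler, S for the steam-filled room, and A for the alarm (labels chosen here for illustration), the chain asserts P(B, S, A) = P(B) P(S|B) P(A|S), so:

```latex
P(A \mid B, S)
  = \frac{P(B, S, A)}{P(B, S)}
  = \frac{P(B)\, P(S \mid B)\, P(A \mid S)}{P(B)\, P(S \mid B)}
  = P(A \mid S)
```

Once S is known, B cancels out of the expression entirely: the alarm depends on the boiler only through the steam.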
One of the reasons we use Bayesian Networks is to be able to break them down into components and identify structures such as those outlined earlier in this presentation (and thus identify conditional dependence or independence). To break up Bayesian Networks, we can use a technique called D-Separation. The technique involves searching for paths (direction unimportant) and analysing the directionality of the edges to see whether a particular path qualifies. D-SEPARATION
When looking at the DAG, we choose to condition upon one event, which we will denote C, at a time. Once that event is chosen, we analyse the DAG under that assumption and look for patterns satisfying the rules below: a) We find a head-to-tail or tail-to-tail connection where C is the middle event. b) We find a head-to-head connection where C is neither the middle event nor one of the middle event's descendants in the DAG. D-SEPARATION BLOCK RULES
Any spot in the DAG that satisfies these rules is "blocked" at that connection. If all routes between two vertices are blocked, then the vertices are conditionally independent given whatever event was chosen. To see the conditional independence, it is easiest to mark where the "blocks" are right on the graph itself, then reset when a new conditioning event is chosen. D-SEPARATION CONTINUED
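The two block rules can be turned into a short path-checking routine. This is a sketch only: the edge set below is a hypothetical fragment containing just the edges the example mentions (1→3, 3→4, 3→8, 4→6, 5→6), not the full eight-node graph, so it will not reproduce the whole table that follows.

```python
# Minimal d-separation sketch over a hypothetical fragment of the
# example DAG; substitute the real edge list to check other pairs.
edges = {(1, 3), (3, 4), (3, 8), (4, 6), (5, 6)}

def parents(n):
    return {u for (u, v) in edges if v == n}

def children(n):
    return {v for (u, v) in edges if u == n}

def descendants(n):
    out, frontier = set(), {n}
    while frontier:
        frontier = {c for f in frontier for c in children(f)} - out
        out |= frontier
    return out

def simple_paths(a, b, path=None):
    """All simple paths from a to b in the undirected skeleton."""
    path = path or [a]
    if a == b:
        yield path
        return
    for n in (children(a) | parents(a)) - set(path):
        yield from simple_paths(n, b, path + [n])

def blocked(path, z):
    """Does conditioning set z block this path?"""
    for u, m, v in zip(path, path[1:], path[2:]):
        if u in parents(m) and v in parents(m):       # head-to-head at m
            if m not in z and not (descendants(m) & z):
                return True                           # rule (b)
        elif m in z:                                  # chain or fork at m
            return True                               # rule (a)
    return False

def d_separated(a, b, z):
    """a and b are d-separated given z iff every path is blocked."""
    return all(blocked(p, z) for p in simple_paths(a, b))
```

For instance, `d_separated(4, 8, {3})` is true in this fragment because the only 4–8 path runs through the tail-to-tail connection at node 3, matching rule (a); `d_separated(4, 5, {6})` is false because conditioning on the head-to-head node 6 opens that path.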
Consider the following DAG: We will use node 3 as our initial conditioning event. Now we identify all locations where the graph is blocked by finding the patterns where the two rules stated earlier are satisfied. EXAMPLE
We can see here that the path {4,3,8} has the conditioning event in the middle and is a tail-to-tail connection, so by rule (a), those edges are blocked. BLOCKING EDGES 1
Next, we notice that {4,6,5} forms a head-to-head relationship, and the conditioning event is neither the middle node nor a descendant of it. Therefore, by rule (b), those edges are blocked. BLOCKING EDGES 2
There are other edges we can block here (e.g., {1,3,8}), but given the layout of this graph, it won't make a difference at this point. Now we can identify node pairs which are blocked under condition of event 3. IDENTIFY BLOCKED PATHS
A FEW PATHS
v1   v2   D-separated?
1    2    no
4    8    yes
4    5    yes
4    7    no
2    7    yes
A FEW PATHS All pairs which are D-separated are conditionally independent given event C (in this case, event 3). So given the table on the previous page, a few things we could write would be: 4 ⊥ 8 | 3, 4 ⊥ 5 | 3, 2 ⊥ 7 | 3. Repeating the process with different conditioning variables will give completely different results.
Thanks for listening to my presentation. IT’S OVER!