Introduction to Bayesian Networks Instructor: Dan Geiger


Introduction to Bayesian Networks. Instructor: Dan Geiger.
What is all the fuss about? A happy marriage between probability theory and graph theory. Its successful children: fault-diagnosis algorithms, error-detecting codes, and models of complex systems. Applications in a wide range of fields. A course from a rather personal viewpoint on a topic I have researched for many years.
Web page: www.cs.technion.ac.il/~dang/courseBN
http://webcourse.cs.technion.ac.il/236372
Email: dang@cs.technion.ac.il
Phone: 829 4339
Office: Taub 616.

What is it all about?
How to use graphs to represent probability distributions over thousands of random variables?
How to encode conditional independence in directed and undirected graphs?
How to use such representations for efficient computation of the probability of events of interest?
How to learn such models from data?

Course Information
Meetings:
Lecture: Mondays 10:30–12:30
Tutorial: Mondays 13:30–14:30
Grade:
40%: four question sets. These question sets are obligatory. Each contains mostly theoretical problems. Submit in pairs before the due time (three weeks).
50%: midterm quiz (bochan) on January 9, based on the first nine weeks (including Hanukkah) and on understanding Chapters 3 & 4 of Pearl's book. Passing the test is obligatory in order to receive a grade, but the test grade is fully replaceable by a programming project using BN software. Students on reserve duty (miluim) must send me advance notice two weeks before the quiz.
10%: attending all lectures and recitation classes except one; obligatory for a passing grade. (In very special circumstances, 2% per missed lecture or recitation.)
Prerequisites:
Data Structures 1 (cs234218)
Algorithms 1 (cs234247)
Probability (any course)
Information and handouts:
http://webcourse.cs.technion.ac.il/236372
http://www.cs.technion.ac.il/~dang/courseBN/ (lecture slides only)

Relations to Some Other Courses
Tell me who your friends are and I will tell you who you are.
Introduction to Artificial Intelligence (cs236501)
Introduction to Machine Learning (cs236756)
Introduction to Neural Networks (cs236950)
Algorithms in Computational Biology (cs236522)
Error-correcting codes
Data mining

Mathematical Foundations
Inference with Bayesian Networks
Learning Bayesian Networks
Applications

Homework
HMW #1: Read Chapters 3.1 & 3.2.1. Answer Questions 3.1 and 3.2, prove Eq. 3.5b, and fully expand the proof details of Theorem 2. Submit in pairs no later than noon on 14/11/12 (two weeks).
Pearl's book contains all the notations that I happen not to define in these slides; consult it often. It is also a unique and interesting classic textbook.

The Traditional View of Probability in Textbooks
Probability theory, as usually presented, gives the impression that we must literally represent a joint distribution P(x1,…,xn) explicitly over all propositions and their combinations. Such a representation is consistent and exhaustive, but it stands in sharp contrast to human reasoning: it requires exponential computation to obtain marginal probabilities like P(x1) or conditionals like P(x1|x2), whereas humans judge pairwise conditionals swiftly and conjunctions only hesitantly. Numerical calculations do not reflect simple reasoning tasks.

The Qualitative Notion of Dependence
Marginal independence is defined numerically as P(x,y) = P(x) P(y). The truth of this equation is hard for humans to judge, while judging whether X and Y are dependent is often easy: consider "burglary within a day" and "nuclear war within five years". Likewise, the three-place relationship (X influences Y, given Z) is judged easily. For example: X = time of the last pickup from a bus station and Y = time until the next bus are dependent, but are conditionally independent given Z = the whereabouts of the next bus.

The notions of relevance and dependence are far more basic than the numerical values. In a reasoning system they should be asserted once and should not be sensitive to numerical changes. Acquisition of new facts may destroy existing dependencies as well as create new ones. Learning a child's age Z destroys the dependency between height X and reading ability Y. Learning the symptoms Z of a patient creates dependencies between the diseases X and Y that could account for them. Probability theory provides such a device in principle, via P(X | Y, K) = P(X | K). But can we model the dynamics of dependency changes based on logic alone, without reference to numerical quantities?

Definition of Marginal Independence
Definition: I_P(X,Y) iff for all x ∈ D_X and y ∈ D_Y, Pr(X=x, Y=y) = Pr(X=x) Pr(Y=y).
Comments:
Each variable X has a domain D_X with value (or state) x in D_X.
We often abbreviate via P(x, y) = P(x) P(y).
When Y is the empty set, we get the trivial statement Pr(X=x) = Pr(X=x).
Alternative notations to I_P(X,Y) include I(X,Y) and X ⊥ Y.
The next few slides on properties of marginal independence are based on "Axioms and algorithms for inferences involving probabilistic independence."
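A minimal Python sketch (my own illustration, not part of the course material) that checks the definition on a small made-up joint table: I_P(X,Y) holds iff P(X=x, Y=y) = P(X=x) P(Y=y) for every pair of values.

```python
from itertools import product

# Hypothetical joint distribution P(X, Y) over binary domains D_X = D_Y = {0, 1};
# the numbers are invented so that X and Y happen to be independent.
joint = {
    (0, 0): 0.24, (0, 1): 0.36,
    (1, 0): 0.16, (1, 1): 0.24,
}

def marginal_x(x):
    # P(X = x), obtained by summing the joint over Y
    return sum(p for (xx, _), p in joint.items() if xx == x)

def marginal_y(y):
    # P(Y = y), obtained by summing the joint over X
    return sum(p for (_, yy), p in joint.items() if yy == y)

def marginally_independent(tol=1e-12):
    # I_P(X, Y): P(x, y) = P(x) P(y) for every x in D_X and y in D_Y
    return all(abs(joint[(x, y)] - marginal_x(x) * marginal_y(y)) <= tol
               for x, y in product((0, 1), repeat=2))

print(marginally_independent())   # True for this particular table
```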

Properties of Marginal Independence
Trivial Independence: I(X, ∅).
Symmetry: I(X,Y) ⟹ I(Y,X).
Decomposition: I(X, YW) ⟹ I(X, Y).
Mixing: I(X,Y) & I(XY, W) ⟹ I(X, YW).
Proof (Soundness). Trivial Independence and Symmetry follow directly from the definition. Decomposition: given P(x,y,w) = P(x) P(y,w), simply sum over w on both sides of the equation. Mixing: given P(x,y) = P(x) P(y) and P(x,y,w) = P(x,y) P(w); summing the second equation over x gives P(y,w) = P(y) P(w), hence P(x,y,w) = P(x) P(y) P(w) = P(x) P(y,w).
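To make Decomposition concrete, here is a small numeric check (my own example, with invented tables): build a joint that satisfies the premise P(x,y,w) = P(x) P(y,w) and verify that summing out W leaves P(x,y) = P(x) P(y).

```python
from itertools import product

# Hypothetical ingredients: a marginal of X and a joint of (Y, W).
px = {0: 0.3, 1: 0.7}
pyw = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.3}

# Build a joint satisfying the premise of Decomposition: P(x, y, w) = P(x) P(y, w).
joint = {(x, y, w): px[x] * pyw[(y, w)] for x in px for (y, w) in pyw}

# Conclusion: summing out W leaves P(x, y) = P(x) P(y).
for x, y in product((0, 1), repeat=2):
    p_xy = sum(joint[(x, y, w)] for w in (0, 1))
    p_y = sum(pyw[(y, w)] for w in (0, 1))
    assert abs(p_xy - px[x] * p_y) < 1e-12
print("Decomposition verified on this example.")
```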

Properties of Marginal Independence
Are there more such independent properties of independence? No, there are none!
1. There are no more independent axioms of the form σ1 & … & σn ⟹ σ, where each statement σi stands for an independence statement.
2. We use the symbol Σ for a set of independence statements. In this notation: σ is derivable from Σ via these properties if and only if σ is entailed by Σ (i.e., σ holds in all probability distributions that satisfy Σ).
3. For every set Σ and a statement σ not derivable from Σ, there exists a probability distribution P that satisfies Σ and not σ.

Properties of Marginal Independence
Can we use these properties to infer a new independence statement σ from a set of given independence statements Σ in polynomial time? YES: the "membership algorithm" and a completeness proof are covered in today's recitation class (Paper P2).
Comment: the question "does Σ entail σ?" could in principle be undecidable; it drops to being decidable via a complete set of axioms, and then drops to polynomial time with the above claim.

Definition of Conditional Independence
I_P(X,Z,Y) if and only if, whenever P(Y=y, Z=z) > 0,
Pr(X=x | Y=y, Z=z) = Pr(X=x | Z=z).    (3.1)
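A sketch of a checker for Definition (3.1); the function name and the representation of the joint as a Python dict keyed by (x, y, z) are my own choices, not from the slides.

```python
def conditionally_independent(joint, xs, ys, zs, tol=1e-12):
    """Check I_P(X, Z, Y) for a joint given as a dict {(x, y, z): probability}."""
    def p(pred):
        return sum(pr for v, pr in joint.items() if pred(*v))

    for z in zs:
        p_z = p(lambda x_, y_, z_: z_ == z)
        for y in ys:
            p_yz = p(lambda x_, y_, z_: y_ == y and z_ == z)
            if p_yz == 0:
                continue  # the definition only constrains cases with Pr(Y=y, Z=z) > 0
            for x in xs:
                p_xyz = p(lambda x_, y_, z_: x_ == x and y_ == y and z_ == z)
                p_xz = p(lambda x_, y_, z_: x_ == x and z_ == z)
                # Pr(X=x | Y=y, Z=z) must equal Pr(X=x | Z=z)
                if abs(p_xyz / p_yz - p_xz / p_z) > tol:
                    return False
    return True
```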

Properties of Conditional Independence
Symmetry: I(X,Z,Y) ⟹ I(Y,Z,X).
Decomposition: I(X,Z,YW) ⟹ I(X,Z,Y).
Weak Union: I(X,Z,YW) ⟹ I(X,ZW,Y).
Contraction: I(X,Z,Y) & I(X,ZY,W) ⟹ I(X,Z,YW).
Intersection (for strictly positive distributions): I(X,ZY,W) & I(X,ZW,Y) ⟹ I(X,Z,YW).

Important Properties of Conditional Independence
Recall that there are some additional notations for these statements.

Graphical Interpretation

Use graphs and not pure logic. Variables are represented by nodes and dependencies by edges. This is common in our language: "threads of thought", "lines of reasoning", "connected ideas", "far-fetched arguments". Still, capturing the essence of dependence is not an easy task. When modeling causation, association, and relevance, it is hard to distinguish between direct and indirect neighbors. If we just connect "dependent variables" we will get cliques.

Markov Network Example
M = { I_G(M1,{F1,F2},M2), I_G(F1,{M1,M2},F2) } + symmetry
Other semantics: the color of each pixel depends on its neighbors.

Markov Networks
1. Define for each (maximal) clique Ci a non-negative function g(Ci) called the compatibility function.
2. Take the product ∏i g(Ci) over all cliques.
3. Define P(X1,…,Xn) = K · ∏i g(Ci), where K is a normalizing factor (the inverse of the sum of the product).
M_P = { I_P(M1,{F1,F2},M2), I_P(F1,{M1,M2},F2) } + symmetry
SOUNDNESS: I_G(X,Z,Y) ⟹ I_P(X,Z,Y) for all sets of nodes X, Y, Z.

The two males and two females example: P(M1,M2,F1,F2) = K · g(M1,F1) g(M1,F2) g(M2,F1) g(M2,F2), where K is a normalizing constant (so that the probabilities sum to 1).
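The recipe from the previous slide can be made concrete with a short sketch; the compatibility values below are invented (only their non-negativity matters), and the variable domains are assumed binary.

```python
from itertools import product

def g(a, b):
    # Hypothetical compatibility function; any non-negative table would do.
    return 2.0 if a == b else 1.0

# Unnormalized product over the four edge cliques (M1,F1), (M1,F2), (M2,F1), (M2,F2).
unnormalized = {
    (m1, m2, f1, f2): g(m1, f1) * g(m1, f2) * g(m2, f1) * g(m2, f2)
    for m1, m2, f1, f2 in product((0, 1), repeat=4)
}
K = 1.0 / sum(unnormalized.values())              # normalizing constant
P = {s: K * v for s, v in unnormalized.items()}   # P(M1, M2, F1, F2)

print(abs(sum(P.values()) - 1.0) < 1e-12)         # the distribution sums to 1
```

On the resulting distribution, a check such as the conditionally_independent sketch above confirms I_P(M1,{F1,F2},M2), in line with the soundness claim I_G ⟹ I_P.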

Bayesian Networks (Directed Acyclic Graphical Models)
Coin1 → Bell ← Coin2
The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models: a clique would be formed because of the dependency between the two coins induced by observing the bell.
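A small sketch of the directed factorization P(c1,c2,b) = P(c1) P(c2) P(b | c1,c2) for this story, with fair coins assumed for illustration; it shows numerically that the coins are marginally independent yet become perfectly dependent once the bell is observed.

```python
from itertools import product

def p_coin(c):
    return 0.5                       # fair coin, assumed for illustration

def p_bell(b, c1, c2):
    rings = 1 if c1 == c2 else 0     # the bell rings exactly when the outcomes agree
    return 1.0 if b == rings else 0.0

# Directed factorization P(c1, c2, b) = P(c1) P(c2) P(b | c1, c2)
joint = {(c1, c2, b): p_coin(c1) * p_coin(c2) * p_bell(b, c1, c2)
         for c1, c2, b in product((0, 1), repeat=3)}

p_b1 = sum(p for (c1, c2, b), p in joint.items() if b == 1)
p_c1_given_bell = sum(p for (c1, c2, b), p in joint.items()
                      if b == 1 and c1 == 1) / p_b1
p_c1_given_bell_and_c2 = joint[(1, 1, 1)] / sum(
    p for (c1, c2, b), p in joint.items() if b == 1 and c2 == 1)

# Marginally the coins are independent, but given the bell they are not:
print(p_c1_given_bell, p_c1_given_bell_and_c2)   # 0.5 versus 1.0
```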

Bayesian Networks (BNs)
Examples of models for diseases, symptoms, and risk factors:
One variable for all diseases (values are diseases) versus one variable per disease (values are True/False).
Naïve Bayesian networks versus bipartite BNs.

Natural Direction of Information
Even simple tasks are not natural to phrase in terms of joint distributions. Consider a hypothesis H=h indicating a rare disease and a set of symptoms such as fever, blood pressure, and pain, marked by E1=e1, E2=e2, E3=e3, or in short E=e. We need P(h | e). Can we naturally assume P(H,E) is given and compute P(h | e) from it?
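A sketch of the more natural specification: the prior P(h) and the likelihoods P(e | h) and P(e | not h) are invented numbers for illustration; Bayes' rule then yields the quantity we actually need, P(h | e).

```python
# Invented numbers for illustration only.
p_h = 1e-4              # prior probability of the rare disease, P(h)
p_e_given_h = 0.9       # P(e | h): probability of these symptoms given the disease
p_e_given_not_h = 0.01  # P(e | not h): probability of the same symptoms otherwise

# Bayes' rule: P(h | e) = P(e | h) P(h) / [ P(e | h) P(h) + P(e | not h) P(not h) ]
p_h_given_e = (p_e_given_h * p_h) / (
    p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h))

print(round(p_h_given_e, 4))   # roughly 0.0089: the disease stays unlikely
```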