Introduction to Bayesian Networks Instructor: Dan Geiger Web page: Phone: Office: Taub 616. What is all the fuss about? A happy marriage between probability theory and graph theory. Its successful offspring: fault-diagnosis algorithms, error-detecting codes, and models of complex systems. Applications in a wide variety of fields. A course from a rather personal viewpoint, on a topic I have researched for many years.

2 What is it all about? How to use graphs to represent probability distributions over thousands of random variables? How to encode conditional independence in directed and undirected graphs? How to use such representations for efficient computation of the probabilities of events of interest? How to learn such models from data?

3 Course Information
Meetings:
- Lecture: Wednesdays 10:30 – 12:30
- Tutorial: Wednesdays 16:30 – 17:30
Grade:
- 30% for 5 question sets. These question sets are obligatory. Each contains mostly theoretical problems. Submit in pairs before the due time (three weeks).
- 40% for a one- or two-hour lecture (priority to graduate students).
- 10% for attending at least 12 lectures and recitation classes, required for a passing grade. (In very special circumstances, 2% per missing item.)
- 20% for checking homeworks and presenting homework solutions.
Prerequisites:
- Data Structures 1 (cs234218)
- Algorithms 1 (cs234247)
- Probability (any course)
Information and handouts: (lecture slides only)

4 Relations to Some Other Courses
- Introduction to Artificial Intelligence (cs236501)
- Introduction to Machine Learning (cs236756)
- Introduction to Neural Networks (cs236950)
- Algorithms in Computational Biology (cs236522)
- Error correcting codes
- Data mining
Tell me who your friends are and I will tell you who you are.

5 [Schedule slide; legend only: student lectures (8), tentative student lectures (7).]

6 Mathematical Foundations (4 weeks including students' lectures, based on Pearl's Chapter 3 + papers):
1. Properties of conditional independence (soundness and completeness of marginal independence, graphoid axioms and their interpretation as "irrelevance", incompleteness of conditional independence, no disjunctive axioms possible).
2. Properties of graph separation (Paz and Pearl 85, Theorem 3), soundness and completeness of saturated independence statements. Undirected graphs as I-maps of probability distributions. Markov blankets, pairwise independence basis. Representation theorems (Pearl and Paz, from each basis to I-maps). Markov networks, HC representation theorem, completeness theorem. Markov chains.
3. Bayesian networks, d-separation, soundness, completeness.
4. Chordal graphs as the intersection of Bayesian networks and Markov networks. Equivalence of their four definitions.
Combinatorial Optimization of Exact Inference in Graphical Models (3 weeks including students' lectures):
1. HMMs.
2. Exact inference and its combinatorial optimization.
3. Clique tree algorithm. Conditioning.
4. Tree-width. Feedback vertex set.
Learning (5 weeks including students' lectures):
1. Introduction to Bayesian statistics.
2. Learning Bayesian networks.
3. Chow and Liu's algorithm; the TAN model.
4. Structural EM.
5. Searching for Bayesian networks.
Applications (2 weeks including student lectures).

7 Homeworks
HMW #1. Read Chapter 3.1, answer Questions 3.1 and 3.2, prove Eq. 3.5b, and fully expand/fix the proof of Theorem 2. Submit in pairs no later than 28/10/09 (two weeks from now).
HMW #2. Read Chapter 2 in Pearl's book and answer 6 questions of your choice at the end. Submit in pairs no later than 11/11/09.
HMW #3. Read Chapter 3 in full. Answer an additional 5 questions of your choice at the end. Submit in pairs no later than 2/12/09.
HMW #4. Submit in pairs by 23/12/09.
HMW #5. Submit in pairs by 6/1/10.
Pearl's book contains all the notations that I happen not to define in these slides – consult it often – it is also a unique and interesting classic textbook.

8 The Traditional View of Probability in Text Books
Probability theory, as presented in textbooks, gives the impression that we must literally represent a joint distribution P(x1,…,xn) explicitly over all propositions and their combinations. Such a representation is consistent and exhaustive, but it stands in sharp contrast to human reasoning: it requires exponential computation to obtain marginals such as P(x1) or conditionals such as P(x1 | x2), whereas humans judge pairwise conditionals swiftly and conjunctions hesitantly. Numerical calculation over the full joint does not reflect simple reasoning tasks.
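As a rough illustration (a minimal sketch with made-up variable names and a random joint, not taken from the slides), the explicit joint over n binary variables has 2^n entries, and even a single marginal requires summing half of them:

```python
# Minimal sketch: the explicit joint over n binary variables has 2**n entries,
# and computing one marginal P(X1 = 1) sums over 2**(n-1) of them.
import itertools
import random

n = 16                                               # already 65,536 table entries
joint = {assignment: random.random()
         for assignment in itertools.product([0, 1], repeat=n)}
total = sum(joint.values())
joint = {a: p / total for a, p in joint.items()}     # normalize to a distribution

# Marginal P(X1 = 1): sum over every assignment whose first coordinate is 1.
p_x1 = sum(p for a, p in joint.items() if a[0] == 1)
print(f"P(X1=1) = {p_x1:.4f}, computed from {2 ** (n - 1)} table entries")
```

Doubling n squares the table size, which is exactly the blow-up that graphical representations are meant to avoid.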

9 The Traditional View of Probability in Text Books
[Diagram: which quantities are given, which are estimated, and which are computed? P(e | h) appears in the diagram. Think of h as a rare viral disease and of e as high fever.]
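For concreteness, here is one worked instance of the diagnostic computation the diagram alludes to; the numbers are illustrative only (a rare disease h with prior 0.001, P(e|h)=0.9, P(e|¬h)=0.05) and are not taken from the slides:

```latex
% Illustrative numbers only: P(h)=0.001,\; P(e\mid h)=0.9,\; P(e\mid\neg h)=0.05.
\[
P(h \mid e)
  = \frac{P(e \mid h)\,P(h)}{P(e \mid h)\,P(h) + P(e \mid \neg h)\,P(\neg h)}
  = \frac{0.9 \times 0.001}{0.9 \times 0.001 + 0.05 \times 0.999}
  \approx 0.018 .
\]
```

Even a strong symptom raises the posterior of the rare disease only to about 1.8%; the point of the slide is that such diagnostic quantities should be obtainable without manipulating the full joint.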

10 The Qualitative Notion of Dependence
Marginal independence is defined numerically as P(x,y) = P(x) P(y). The truth of this equation is hard for humans to judge, whereas judging whether X and Y are dependent is often easy: "burglary within a day" and "nuclear war within five years" are readily judged independent. Likewise, people tend to judge three-place (conditional) relations among variables X, Y, Z directly. [Figure: nodes X, Y, Z]

11 The notions of relevance and dependence are far more basic than the numerical values. In a reasoning system, a dependence should be asserted once and should not be sensitive to numerical changes. Acquiring new facts may destroy existing dependencies as well as create new ones. Learning a child's age Z destroys the dependency between height X and reading ability Y. Learning the symptoms Z of a patient creates dependencies between the diseases X and Y that could account for them. Probability theory provides such a device in principle, via P(X | Y, K) = P(X | K). But can we model the dynamics of dependency changes based on logic alone, without reference to numerical quantities?
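A small numeric sketch of the second example (illustrative priors and a deterministic symptom for simplicity; none of these numbers are from the slides): two diseases that are independent a priori become dependent once a shared symptom is observed.

```python
# Illustrative sketch of "explaining away": observing a symptom Z creates a
# dependency between two a-priori independent diseases X and Y.
import itertools

p_x, p_y = 0.01, 0.02             # hypothetical disease priors, independent
joint = {}                         # joint distribution over (x, y, z)
for x, y in itertools.product([0, 1], repeat=2):
    p_xy = (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
    z = 1 if (x or y) else 0       # deterministic symptom: present iff some disease is
    joint[(x, y, z)] = p_xy

def prob(pred):
    """Total probability of all assignments (x, y, z) satisfying pred."""
    return sum(p for a, p in joint.items() if pred(*a))

# Marginally, X and Y are independent by construction:
print(prob(lambda x, y, z: x and y), p_x * p_y)      # equal

# Conditioned on the symptom Z = 1, they are not:
pz = prob(lambda x, y, z: z == 1)
p_x_z  = prob(lambda x, y, z: x and z) / pz
p_y_z  = prob(lambda x, y, z: y and z) / pz
p_xy_z = prob(lambda x, y, z: x and y and z) / pz
print(p_xy_z, p_x_z * p_y_z)                         # differ: dependency created
```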

12 Definition of Marginal Independence
Definition: I_P(X,Y) iff for all x ∈ D_X and y ∈ D_Y, Pr(X=x, Y=y) = Pr(X=x) Pr(Y=y).
Comments:
- Each variable X has a domain D_X, with values (or states) x in D_X.
- We often abbreviate via P(x, y) = P(x) P(y).
- When Y is the empty set, we get the trivial statement Pr(X=x) = Pr(X=x).
- Alternative notations to I_P(X,Y): I(X,Y) or X ⊥ Y.
- The next few slides on properties of marginal independence are based on "Axioms and algorithms for inferences involving probabilistic independence."
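A minimal executable sketch of this definition (the helper names are mine, not from the course): the joint is a dict from full assignments to probabilities, and I_P(X,Y) is checked value by value.

```python
# Sketch: check I_P(X, Y) directly from a joint table.
from collections import defaultdict
from math import isclose

def marginal(joint, idxs):
    """Project the joint onto the variables at positions idxs."""
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[tuple(assignment[i] for i in idxs)] += p
    return out

def independent(joint, xs, ys, tol=1e-9):
    """True iff P(X=x, Y=y) = P(X=x) P(Y=y) for every x, y (the definition above)."""
    pxy = marginal(joint, xs + ys)
    px, py = marginal(joint, xs), marginal(joint, ys)
    return all(isclose(pxy.get(x + y, 0.0), px[x] * py[y], abs_tol=tol)
               for x in px for y in py)

# Example: two fair coins and their XOR -- every pair is marginally independent.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(independent(joint, [0], [1]))   # True: coin1 independent of coin2
print(independent(joint, [0], [2]))   # True: coin1 independent of the XOR bit
```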

13 Properties of Marginal Independence
Trivial Independence: I_P(X, ∅)
Symmetry: I_P(X,Y) ⟹ I_P(Y,X)
Decomposition: I_P(X,YW) ⟹ I_P(X,Y)
Mixing: I_P(X,Y) and I_P(XY,W) ⟹ I_P(X,YW)
Proof (Soundness). Trivial independence and Symmetry follow from the definition. Decomposition: given P(x,y,w) = P(x) P(y,w), simply sum over w on both sides of the given equation. Mixing: given P(x,y) = P(x) P(y) and P(x,y,w) = P(x,y) P(w), we get P(x,y,w) = P(x) P(y) P(w) = P(x) P(y,w).
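As a quick numeric sanity check of Decomposition (a sketch reusing the `independent` helper above; the distributions are arbitrary and not from the slides), build a joint in which I_P(X,YW) holds by construction and confirm that I_P(X,Y) follows:

```python
# Decomposition check: P(x, y, w) = P(x) P(y, w), i.e. I_P(X, YW) by construction.
import random

random.seed(0)
px = {0: 0.3, 1: 0.7}                               # arbitrary marginal for X
pyw = {k: random.random() for k in [(0, 0), (0, 1), (1, 0), (1, 1)]}
s = sum(pyw.values())
pyw = {k: v / s for k, v in pyw.items()}            # arbitrary joint for (Y, W)

joint = {(x, y, w): px[x] * pyw[(y, w)] for x in px for (y, w) in pyw}

print(independent(joint, [0], [1, 2]))   # I_P(X, YW): True by construction
print(independent(joint, [0], [1]))      # I_P(X, Y):  True, as Decomposition predicts
```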

14 Properties of Marginal Independence
Are there more such independent properties of independence? No, there are none!
Horn axioms are of the form σ_1 & … & σ_n ⟹ σ, where each statement σ stands for an independence statement. We use the symbol Σ for a set of independence statements. Namely: σ is derivable from Σ via these properties if and only if σ is entailed by Σ (i.e., σ holds in all probability distributions that satisfy Σ). Put differently: for every set Σ and statement σ not derivable from Σ, there exists a probability distribution P_σ that satisfies Σ and not σ.

15 Properties of Marginal Independence
Can we use these properties to infer a new independence statement σ from a set of given independence statements Σ in polynomial time? YES. The "membership algorithm" and the completeness proof are covered in the recitation class (paper P2).
Comment. The question "does Σ entail σ?" could in principle be undecidable; it drops to being decidable via a complete set of axioms, and then drops to polynomial time with this claim.

16 Properties of Marginal Independence
Can we check the consistency of a set Σ+ of independence statements plus a set Σ− of negated independence statements? The membership algorithm of the previous slide applies only when Σ− contains a single negated statement – simply use the algorithm to check that it is not entailed by Σ+. But another property of independence, called an "Armstrong relation", guarantees that consistency is indeed verified by checking separately (in isolation) that each statement in Σ− is not entailed by Σ+.

17 Properties of Marginal Independence
We define a "product" P* of two distributions P_1 and P_2 over a set of n variables X_1,…,X_n as follows:
P*(a_1 b_1, …, a_n b_n) = P_1(a_1,…,a_n) · P_2(b_1,…,b_n)
Note that the domains of the X_i change in this definition: each X_i now takes pairs of values, one coordinate from P_1 and one from P_2. This definition implies that, for every independence statement σ: σ holds in P_1 and σ holds in P_2 if and only if σ holds in P*. Hence, consistency can be checked statement by statement.
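A sketch of this construction in code (again reusing the `independent` checker from the earlier block; the example factor distributions are mine): each variable of P* takes pairs of values, and an independence statement holds in P* exactly when it holds in both factors.

```python
def product(p1, p2):
    """P*((a1,b1),...,(an,bn)) = P1(a1,...,an) * P2(b1,...,bn)."""
    return {tuple(zip(a, b)): pa * pb
            for a, pa in p1.items() for b, pb in p2.items()}

# P1: two independent fair bits.  P2: two perfectly correlated fair bits.
p1 = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
p2 = {(0, 0): 0.5, (1, 1): 0.5}
p_star = product(p1, p2)

print(independent(p1, [0], [1]))       # True
print(independent(p2, [0], [1]))       # False
print(independent(p_star, [0], [1]))   # False: fails in P2, hence fails in P*
```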

18 Definitions of Conditional Independence
I_p(X,Z,Y) if and only if, whenever Pr(Y=y, Z=z) > 0,
Pr(X=x | Y=y, Z=z) = Pr(X=x | Z=z).     (3.1)
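A minimal executable version of definition (3.1), reusing the `marginal` helper from the earlier block (helper names and the example are mine, not the course's):

```python
# Sketch: check I_p(X, Z, Y) from a joint table, per definition (3.1).
from math import isclose

def cond_independent(joint, xs, zs, ys, tol=1e-9):
    """I_p(X,Z,Y): Pr(x | y, z) = Pr(x | z) whenever Pr(y, z) > 0."""
    pxyz = marginal(joint, xs + ys + zs)
    pyz  = marginal(joint, ys + zs)
    pxz  = marginal(joint, xs + zs)
    pz   = marginal(joint, zs)
    for key, p in pxyz.items():
        x = key[:len(xs)]
        y = key[len(xs):len(xs) + len(ys)]
        z = key[len(xs) + len(ys):]
        lhs = p / pyz[y + z]              # Pr(x | y, z); positive since p > 0
        rhs = pxz[x + z] / pz[z]          # Pr(x | z)
        if not isclose(lhs, rhs, abs_tol=tol):
            return False
    return True

# Example: two fair coins and a bell that rings iff they agree.
joint = {(c1, c2, int(c1 == c2)): 0.25 for c1 in (0, 1) for c2 in (0, 1)}
print(cond_independent(joint, [0], [], [1]))    # True:  coins marginally independent
print(cond_independent(joint, [0], [2], [1]))   # False: dependent given the bell
```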

19 Properties of Conditional Independence
The same properties hold for conditional independence:
Symmetry: I(X,Z,Y) ⟹ I(Y,Z,X)
Decomposition: I(X,Z,YW) ⟹ I(X,Z,Y)
Mixing: I(X,Z,Y) and I(XY,Z,W) ⟹ I(X,Z,YW)
BAD NEWS. Are there more properties of independence? Yes, infinitely many independent Horn axioms. There is no answer to the membership problem, nor to the consistency problem.

20 Important Properties of Conditional Independence Recall there are some more notations.

21 Dependency Models – abstraction of Probability distributions

22 Graphical Interpretation

23 The Intersection Property Revisited
Intersection: I(X,ZW,Y) and I(X,ZY,W) ⟹ I(X,Z,YW)
This property holds for strictly positive distributions. Counterexample: X=Y=W, with Z the empty set.
A set Σ of independence statements I(X,Z,Y) (namely, a dependency model) that satisfies Symmetry, Decomposition, Weak Union, Contraction, and Intersection is called a graphoid; without Intersection it is called a semi-graphoid.
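The counterexample can be run through the `cond_independent` sketch from the earlier slide: with X = Y = W a single fair bit copied three times and Z empty, both antecedents of Intersection hold but the conclusion fails, because the distribution is not strictly positive.

```python
# Counterexample to Intersection for a non-positive distribution:
# X = Y = W is one fair bit copied three times, Z is the empty set.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}             # assignments (x, y, w)

print(cond_independent(joint, [0], [2], [1]))        # I(X, W, Y):   True
print(cond_independent(joint, [0], [1], [2]))        # I(X, Y, W):   True
print(cond_independent(joint, [0], [], [1, 2]))      # I(X, ∅, YW): False
```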