Today’s Topics: Bayesian Networks (BNs), used a lot in medical diagnosis; M-estimates; searching for good BNs; the Markov blanket (what is conditionally independent in a BN).


Today’s Topics
– Bayesian Networks (BNs): used a lot in medical diagnosis
– M-estimates
– Searching for good BNs
– Markov blanket: what is conditionally independent in a BN
10/13/15, CS 540 Fall 2015 (Shavlik©), Lecture 15, Week 6
(Portrait: Thomas Bayes)

Approximating the Full Joint Prob Table

Bayesian networks are one way to ‘compactly’ represent a full joint probability table. A Bayes net can fill in every cell of a full joint prob table, but is (usually) much smaller.
– The trick? The cells are no longer filled in independently of one another.

An analogy: we could have (a) a big table that holds all products of two 32-bit ints, or (b) a more compact way to compute ‘cells’ of that table when needed.

Bayesian Networks (BNs)

Example

Network (from the diagram): five Boolean random variables A, B, C, D, E, with arcs A → C, B → C, C → E, D → E.

P(A) = 0.7   (implicit: P(¬A) = 0.3)
P(B) = 0.4
P(D) = 0.2

  A  B | P(C | A = ? ∧ B = ?)
  F  F | 0.9
  F  T | 0.3
  T  F | 0.6
  T  T | 0.1

  C  D | P(E | C = ? ∧ D = ?)
  F  F | 0.8
  F  T | 0.7
  T  F | 0.4
  T  T | 0.6

What are
  P(A ∧ B ∧ C ∧ D ∧ E) = ?
  P(A ∧ ¬B ∧ C ∧ D ∧ ¬E) = ?

Note that we have several SMALL tables rather than one BIG table; they are called CONDITIONAL PROBABILITY TABLES (CPTs).

Solutions for the Previous Page

P(A ∧ B ∧ C ∧ D ∧ E)
  = P(A) × P(B) × P(C | A ∧ B) × P(D) × P(E | C ∧ D)
  = 0.7 × 0.4 × 0.1 × 0.2 × 0.6 = 0.00336

P(A ∧ ¬B ∧ C ∧ D ∧ ¬E)
  = P(A) × P(¬B) × P(C | A ∧ ¬B) × P(D) × P(¬E | C ∧ D)
  = P(A) × (1 − P(B)) × P(C | A ∧ ¬B) × P(D) × (1 − P(E | C ∧ D))
  = 0.7 × (1 − 0.4) × 0.6 × 0.2 × (1 − 0.6) = 0.02016
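To make the arithmetic concrete, here is a minimal Python sketch (my own code, not from the lecture) that stores the example’s CPTs as dictionaries and multiplies out the BN factorization; the probabilities come from the slides, while the helper names are assumptions.

```python
P_A, P_B, P_D = 0.7, 0.4, 0.2                      # root-node priors from the slide

P_C = {(False, False): 0.9, (False, True): 0.3,    # P(C = T | A, B)
       (True, False): 0.6, (True, True): 0.1}
P_E = {(False, False): 0.8, (False, True): 0.7,    # P(E = T | C, D)
       (True, False): 0.4, (True, True): 0.6}

def prob(p_true, value):
    """P(var = value), given P(var = True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) via the BN factorization."""
    return (prob(P_A, a) * prob(P_B, b) * prob(P_C[(a, b)], c) *
            prob(P_D, d) * prob(P_E[(c, d)], e))

print(joint(True, True, True, True, True))    # 0.7*0.4*0.1*0.2*0.6 = 0.00336
print(joint(True, False, True, True, False))  # 0.7*0.6*0.6*0.2*0.4 = 0.02016
```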

Using BNs to Answer ‘Partial’ Queries

Basic idea (repeated from the previous lecture):
1) create probs that only involve AND and NOT
2) “AND in” the remaining vars in all possible (conjunctive) ways
3) look up fully specified ‘world states’
4) do the arithmetic

We still follow these steps from the previous lecture to convert partial world-state queries into full world-state queries. But instead of doing a LOOKUP in a table, we use the BN to calculate the probability of each complete world state.

(Diagram: complete world state → black box (full joint table? BN?) → Prob(complete world state))
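As an illustration of this recipe, the following sketch (again my own code, reusing the joint() function from the previous sketch on the A–E network) answers a partial query by summing the joint probability over every setting of the variables the query does not mention.

```python
from itertools import product

def marginal(**fixed):
    """P(fixed variables) = sum of joint() over all values of the unmentioned ones."""
    names = ['a', 'b', 'c', 'd', 'e']
    free = [n for n in names if n not in fixed]
    total = 0.0
    for values in product([False, True], repeat=len(free)):
        total += joint(**dict(fixed, **dict(zip(free, values))))
    return total

print(marginal(a=True, c=True))                      # P(A and C): sums 8 world states
print(marginal(e=True, a=True) / marginal(a=True))   # a conditional query, P(E | A)
```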

Filling the BN’s Tables: look at the data and simply count!

Data set (Over60, Smokes, Had_HeartAttack):
  F T F
  T T T
  T F T
  F T T
  F F F
  T T F
  T F T
  F T T
  F T F
  T F F
  F F T
  F F F

Resulting CPT:
  Over60  Smokes | P(Had_HeartAttack | Over60 = ? ∧ Smokes = ?)
  F       F      | 1/3
  F       T      | 2/4
  T       F      | 2/3
  T       T      | 1/2
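A small sketch (my own helper code) of ‘just count’: it tallies the 12 training examples above and reproduces the four CPT entries.

```python
from collections import Counter

data = [  # (Over60, Smokes, Had_HeartAttack), copied from the slide
    ('F','T','F'), ('T','T','T'), ('T','F','T'), ('F','T','T'),
    ('F','F','F'), ('T','T','F'), ('T','F','T'), ('F','T','T'),
    ('F','T','F'), ('T','F','F'), ('F','F','T'), ('F','F','F'),
]

totals, heart_attacks = Counter(), Counter()
for over60, smokes, had_ha in data:
    totals[(over60, smokes)] += 1
    if had_ha == 'T':
        heart_attacks[(over60, smokes)] += 1

for parents in [('F','F'), ('F','T'), ('T','F'), ('T','T')]:
    print(parents, heart_attacks[parents], '/', totals[parents])
# (F,F) 1/3   (F,T) 2/4   (T,F) 2/3   (T,T) 1/2 -- matching the CPT above
```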

Filling the Full Joint Table from a BN

Successively fill each row by applying the BN formula, using that row’s values for all of the random variables. (The CPTs are not shown, but need to be there.)

  Season  SprinklerOn?  Raining?  GrassWet?  Slippery? | Prob
  Spring  F             F         F          F         |
          F             F         F          T         |
          F             F         T          F         |
          F             F         T          T         |
          F             T         F          F         |
          F             T         F          T         |
  etc.    …             …         …          …         | …
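A sketch of the same idea in code (assuming the earlier A–E network, since this slide’s sprinkler CPTs are not shown): enumerate every assignment of the variables and compute that row’s probability with the BN formula, reusing joint() from the earlier sketch.

```python
from itertools import product

rows = []
for a, b, c, d, e in product([False, True], repeat=5):
    rows.append(((a, b, c, d, e), joint(a, b, c, d, e)))   # one table row per assignment

for assignment, p in rows[:4]:
    print(assignment, p)
print('total probability:', sum(p for _, p in rows))       # sums to 1, up to floating-point rounding
```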

(This slide repeats the five-variable network and CPTs from the earlier Example slide.)

We have FIVE Boolean-valued random variables, so the full joint table would have 2^5 − 1 = 31 independent numbers (the ‘− 1’ because the probabilities must sum to 1).

In this BN we have only 11 independent numbers: one each for P(A), P(B), and P(D), plus four rows in each of the two CPTs. (The difference is more striking on large tasks.)

We can use the BN to fill the joint table with non-zero values, but we are likely approximating the ‘true’ full joint table. (Approximating might lead to better generalization if we don’t have a lot of data.)

M-Estimates: What if There Is NO Data for Some Cell in a CPT?

Assuming prob = 0 has a major impact:
– zero is very different from ‘a very small probability’.

Solution: assume we have m additional examples for each possible value of each random variable.
– Often we use m = 1; this is called ‘Laplace smoothing’ (en.wikipedia.org/wiki/Additive_smoothing).

(Portrait: Pierre-Simon Laplace)

Example of M-Estimates

Assume size has values S, M, L, and we have 3 Small, 2 Medium, and 0 Large examples.
Use m = 1, so ‘imagine’ 1 S, 1 M, and 1 L additional ‘pseudo’ examples:

  P(size = S) = (3 + 1) / (5 + 3) = 0.500
  P(size = M) = (2 + 1) / (5 + 3) = 0.375
  P(size = L) = (0 + 1) / (5 + 3) = 0.125

(Each numerator is the count of ACTUAL examples plus the count of PSEUDO examples.)

Programming trick: start all NUMERATOR counters at m rather than 0 (then sum the numerators to set the initial DENOMINATOR).

Aside: we could also do this in the decision-tree calculations!
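A minimal sketch of the m-estimate as a function (the helper name and dict layout are my own); with m = 1 it reproduces the numbers above.

```python
def m_estimate(counts, m=1):
    """counts: dict mapping each possible value to its number of ACTUAL examples."""
    n = sum(counts.values())              # total number of actual examples
    k = len(counts)                       # number of possible values
    return {v: (c + m) / (n + m * k) for v, c in counts.items()}

print(m_estimate({'S': 3, 'M': 2, 'L': 0}))
# {'S': 0.5, 'M': 0.375, 'L': 0.125}  -- i.e. 4/8, 3/8, 1/8 as above
```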

NB Technical Detail: Underflow

If we have, say, 100 features, we are multiplying 100 numbers in [0, 1]. If many of the probabilities are small, we could ‘underflow’ the minimum positive double our computer can represent.

Trick: sum the logs of the probabilities. Often we only need to compare the log-probabilities (exponents) of two calculations.
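A short sketch (with assumed example numbers, not from the slides) of the trick: multiplying many small probabilities underflows a 64-bit double, while summing their logs stays comfortably in range.

```python
import math

probs = [1e-5] * 80          # 80 small per-feature probabilities (true product is 1e-400)

direct = 1.0
for p in probs:
    direct *= p
print(direct)                # 0.0 -- the product has underflowed

log_total = sum(math.log(p) for p in probs)
print(log_total)             # about -921.03, easily representable; compare these across hypotheses
```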

From Where Does the BN’s GRAPH Come?

Knowledge provided by a domain expert, or via AI SEARCH!
– State: a BN using all the random variables
– Start: NO arcs
– Action: add a directed arc, if no cycle is created
– Heuristic: accuracy on the training set, plus a penalty proportional to network size (to reduce overfitting)
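A rough sketch of that search loop (all names are mine; score() is a stand-in for ‘training-set accuracy minus a size penalty’): greedily add the single arc that most improves the score, rejecting arcs that would create a cycle.

```python
def creates_cycle(arcs, new_arc):
    """True if adding new_arc = (parent, child) would create a directed cycle."""
    parent, child = new_arc
    stack, seen = [child], set()          # cycle iff parent is already reachable from child
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(c for (p, c) in arcs if p == node)
    return False

def greedy_structure_search(variables, score):
    """Hill-climb: start with no arcs, repeatedly add the best-scoring legal arc."""
    arcs = set()
    while True:
        candidates = [(p, c) for p in variables for c in variables
                      if p != c and (p, c) not in arcs
                      and not creates_cycle(arcs, (p, c))]
        best = max(candidates, key=lambda a: score(arcs | {a}), default=None)
        if best is None or score(arcs | {best}) <= score(arcs):
            return arcs                   # no single arc improves the score
        arcs.add(best)
```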

Searching for a Good BN Structure

(Figure: a search tree that starts with the no-arc network and repeatedly applies ‘Add Arc’ to generate successor networks.)

Structure search is computationally expensive, so specialized algorithms are often used, e.g. TAN (Tree-Augmented Naïve Bayes), which allows at most two parents per node and finds the optimal such network in polynomial time.

Terminology: Bayes Nets and ML

Parameter (aka weight) learning
– Fill the conditional probability tables (CPTs) stored at each node (look at the training data and count things).

Structure learning (‘learn the graph’)
– SKIP if an expert provides the structure.
– Can cast as an AI search task:
   – add/subtract nodes (a broader search than on the earlier slide)
   – add/subtract arcs (the graph must remain acyclic)
   – change the direction of an arc
– Need to score candidate graphs; usually we penalize graph complexity/size (we’ll see analogs later in SVMs and neural networks).

The Full Joint Table as a BN

Usually a BN approximates the full joint table, but the BN notation can also represent it exactly. For clarity, we’ll use only four random variables:

  P(A ∧ B ∧ C ∧ D)
    = P(A ∧ B ∧ C | D) × P(D)
    = P(A ∧ B | C ∧ D) × P(C | D) × P(D)
    = P(A | B ∧ C ∧ D) × P(B | C ∧ D) × P(C | D) × P(D)

(Diagram: a ‘fully connected’ acyclic graph over D, A, B, and C.)

Markov Blanket

If the random variables in a node’s Markov blanket are set, then that node is conditionally independent of all other nodes in the BN.

A node’s Markov blanket (pg 517) consists of:
– its parent nodes (this one is obvious)
– its children nodes (less obvious)
– its children’s other parent nodes (much less obvious)

(Portrait: Andrei Andreyevich Markov)
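A tiny sketch (my own representation: a dict mapping each node to the list of its parents) that reads the three bullets above literally and collects a node’s Markov blanket; applied to the earlier A–E network, it gives C’s blanket as {A, B, D, E}.

```python
def markov_blanket(parents, node):
    """parents: dict mapping each node to the list of its parents in the BN."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, []))                 # 1) the node's parents
    blanket.update(children)                             # 2) its children
    for c in children:                                   # 3) its children's other parents
        blanket.update(p for p in parents[c] if p != node)
    return blanket

# The earlier A-E network: C has parents A and B, E has parents C and D.
parents = {'A': [], 'B': [], 'D': [], 'C': ['A', 'B'], 'E': ['C', 'D']}
print(markov_blanket(parents, 'C'))   # {'A', 'B', 'D', 'E'}
```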

Markov Blanket Illustrated

(Figure: node X with parents P_1 … P_N, children C_1 … C_M, and each child’s other parents CP_1,1 … CP_1,k, …, CP_M,1 … CP_M,k.)

We can compute P(X | the P’s, C’s, and CP’s) regardless of the settings of the rest of the BN.

More Markov-Blanket Experience

– If we know the value of EngineCranks, then whether or not the car Starts tells us nothing about BatteryPower.
– Knowing GasInTank = 0 can ‘explain away’ why GasGauge = empty.
– ‘Leak’ nodes are often used to model ‘all other causes’.