cs540- Fall 2016 (Shavlik©), Lecture 15, Week 9


Today's Topics
- Bayesian Networks (BNs) - used a lot in medical diagnosis
- M-estimates
- Searching for good BNs
- Markov Blanket - what is conditionally independent in a BN
- Next: Artificial Neural Networks (ANNs); read Sections 18.7 and 18.9 of the text, skim Section 18.6

Some Exam Solutions
- Gain(F1) = 0.57, Gain(F2) = 0.42
- ID3({ex4}, {F2}, +), ID3({ex1, ex3}, {F2}, +), ID3({ex2, ex5}, {F2}, +)
- S1, A, D, F, FAIL
- S1, A, D, S2, C, E, G1
- Regrade requests must be in writing - indicate the request on the FRONT PAGE (I reserve the right to regrade the entire exam)

Some Exam Solutions (continued)
- P(¬A, B, ¬C) / [ P(¬A, ¬B, ¬C) + P(¬A, B, ¬C) ]
- P(A, ¬B, ¬C) + P(A, B, ¬C) + P(¬A, ¬B, C) + P(A, ¬B, C) - P(A, ¬B, ¬C, C)
- The -9 node can be skipped by alpha-beta
- -(1/8) log2(1/8) - (1/4) log2(1/4) - (1/2) log2(1/2) - (1/8) log2(1/8)
  = (-1/8)(-3) - (1/4)(-2) - (1/2)(-1) - (1/8)(-3) = 1.75

Approximating the Full Joint Prob Table
- Bayesian Networks are one way to 'compactly' represent a full joint prob table
- A Bayes net can fill every cell in a full joint prob table, but is (usually) much smaller
- The trick? All the cells are no longer independent
- An analogy: we could have (a) a big table that holds all products of two 32-bit ints, or (b) a more compact way to compute 'cells' in this table when needed

Bayesian Networks (BNs)
- BNs are directed, acyclic graphs
- Each random variable is a node
- Arcs indicate direct dependence
- P(A1 ˄ A2 ˄ … ˄ An) = Π P(Ai | immediate parents of Ai)

Example
Structure: A and B are parents of C; C and D are parents of E.

P(A) = 0.7 (implicit: P(¬A) = 0.3)
P(B) = 0.4
P(D) = 0.2

A  B  | P(C | A ˄ B)          C  D  | P(E | C ˄ D)
F  F  | 0.9                   F  F  | 0.8
F  T  | 0.3                   F  T  | 0.7
T  F  | 0.6                   T  F  | 0.4
T  T  | 0.1                   T  T  | 0.6

What are
P(A ˄ B ˄ C ˄ D ˄ E) = ?
P(A ˄ ¬B ˄ C ˄ D ˄ ¬E) = ?

We have several SMALL tables rather than one BIG table - they are called CONDITIONAL PROBABILITY TABLES (CPTs).

Solutions for Prev Page
P(A ˄ B ˄ C ˄ D ˄ E)
 = P(A) × P(B) × P(C | A ˄ B) × P(D) × P(E | C ˄ D)
 = 0.7 × 0.4 × 0.1 × 0.2 × 0.6

P(A ˄ ¬B ˄ C ˄ D ˄ ¬E)
 = P(A) × P(¬B) × P(C | A ˄ ¬B) × P(D) × P(¬E | C ˄ D)
 = P(A) × (1 - P(B)) × P(C | A ˄ ¬B) × P(D) × (1 - P(E | C ˄ D))
 = 0.7 × (1 - 0.4) × 0.6 × 0.2 × (1 - 0.6)
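To make the arithmetic concrete, here is a minimal Python sketch (my own variable names and table encoding, not code from the lecture; the F/T row ordering of the CPTs is assumed from the slide layout) that stores the example CPTs and multiplies the relevant entries:

```python
# CPTs of the example BN (all variables Boolean); row order assumed from the slide.
P_A, P_B, P_D = 0.7, 0.4, 0.2      # priors for the parentless nodes
P_C = {(False, False): 0.9, (False, True): 0.3,   # P(C=true | A, B)
       (True, False): 0.6, (True, True): 0.1}
P_E = {(False, False): 0.8, (False, True): 0.7,   # P(E=true | C, D)
       (True, False): 0.4, (True, True): 0.6}

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) as a product of CPT entries."""
    p = (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B) * (P_D if d else 1 - P_D)
    p *= P_C[(a, b)] if c else 1 - P_C[(a, b)]
    p *= P_E[(c, d)] if e else 1 - P_E[(c, d)]
    return p

print(joint(True, True, True, True, True))    # 0.7 * 0.4 * 0.1 * 0.2 * 0.6  ~ 0.00336
print(joint(True, False, True, True, False))  # 0.7 * 0.6 * 0.6 * 0.2 * 0.4  ~ 0.02016
```

Running it gives roughly 0.00336 and 0.02016, matching the products above.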

Using BNs to Answer 'Partial' Queries
Black box (full joint? BN?): complete world state → Prob(complete world state)

We still do the following from the prev lec to convert partial world-state queries to full world-state queries, but instead of doing a LOOKUP in a table, we use the BN to calc the prob of a complete world state.

Basic idea (repeated from previous lecture):
- create probs that only involve AND (NEGATED single vars OK)
- 'AND in' the remaining vars in all possible (conjunctive) ways
- look up fully specified 'world states'
- do the arithmetic
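As an illustration of the "AND in the remaining vars" step, here is a small sketch (reusing the hypothetical joint() helper from the previous sketch; none of this is lecture code) that sums the BN-computed joint over all completions of the unspecified variables:

```python
from itertools import product

VARS = ["A", "B", "C", "D", "E"]   # variables of the example BN

def prob_partial(evidence):
    """P(evidence), where evidence is a dict such as {"A": True, "E": False};
    sums joint() over every completion of the unspecified variables."""
    hidden = [v for v in VARS if v not in evidence]
    total = 0.0
    for values in product([False, True], repeat=len(hidden)):
        world = dict(evidence)
        world.update(zip(hidden, values))
        total += joint(world["A"], world["B"], world["C"], world["D"], world["E"])
    return total

# e.g. a conditional query: P(A | ¬E) = P(A ∧ ¬E) / P(¬E)
p = prob_partial({"A": True, "E": False}) / prob_partial({"E": False})
```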

Filling the BN's Tables - look at data and simply count!
[Slide shows a small data set with columns Over60, Smokes, Had_HeartAttack, and the CPT P(Had_HeartAttack | Over60 ˄ Smokes) estimated from it by counting; the counted entries are 1/3, 2/4, 2/3, and 1/2 for the four settings of the parents.]
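A minimal counting sketch of this idea, using made-up examples (the slide's actual data set is not fully legible in this transcript):

```python
from collections import Counter

# Made-up examples: (Over60, Smokes, Had_HeartAttack).
data = [(False, True, False), (False, True, True), (True, False, True),
        (True, True, False), (True, True, True), (False, False, False)]

hits, totals = Counter(), Counter()
for over60, smokes, attack in data:
    totals[(over60, smokes)] += 1        # examples with this parent setting
    if attack:
        hits[(over60, smokes)] += 1      # ... in which the child is true

# Each CPT entry is just a ratio of counts.
cpt = {parents: hits[parents] / totals[parents] for parents in totals}
print(cpt)
```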

(Same example BN and CPTs as in the earlier Example slide.)

We have FIVE Boolean-valued random vars, so a full joint table would have 2^5 - 1 = 31 independent numbers ('-1' since the probs sum to 1). In this BN we have 11 independent numbers (the difference is more striking on large tasks).

We can use the BN to fill the joint table with non-zero values, but we are likely approximating the 'true' full joint table (approximating might lead to better generalization if we don't have a lot of data).
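The two counts can be reproduced in a couple of lines (the parent lists below are read off the example BN; a sketch, not lecture code). Each Boolean node with k Boolean parents needs 2^k independent numbers, one per parent setting:

```python
# Independent numbers needed for the 5-variable Boolean example.
parents = {"A": [], "B": [], "D": [], "C": ["A", "B"], "E": ["C", "D"]}

full_joint = 2 ** len(parents) - 1                        # 31 ('-1' since probs sum to 1)
bn_numbers = sum(2 ** len(p) for p in parents.values())   # 1 + 1 + 1 + 4 + 4 = 11
print(full_joint, bn_numbers)
```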

Filling the Full Joint Table from a BN
(CPTs not shown, but they need to be there.)

[Slide shows a joint table over Season, SprinklerOn?, Raining?, GrassWet?, Slippery? with a Prob column and one row per complete assignment (Spring, F, T, ..., etc.).]

Successively fill each row by applying the BN formula using that row's values for all the random variables.
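For the five-variable Boolean example, the same row-by-row filling can be sketched as follows (again reusing the hypothetical joint() helper from earlier):

```python
from itertools import product

# One row per complete assignment of the five Boolean variables; each row's
# probability comes from the BN formula via joint().
full_table = {assignment: joint(*assignment)
              for assignment in product([False, True], repeat=5)}

assert abs(sum(full_table.values()) - 1.0) < 1e-9   # a proper joint sums to 1
```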

M-Estimates
What if there is NO data for some cell in a CPT? Assuming prob = 0 has a major impact: zero is very different from 'very small prob'.

Soln: assume we have m examples for each possible value of each random variable. Often we use m = 1 (called 'Laplace Smoothing', en.wikipedia.org/wiki/Additive_smoothing).

(Portrait: Pierre-Simon Laplace, 1749-1827)

Example of M-estimates
Assume size has values S, M, L, and we have 3 Small, 2 Medium, and 0 Large examples.
Use m = 1, so 'imagine' 1 S, 1 M, and 1 L additional 'pseudo' examples.

P(size = S) = (3 + 1) / (5 + 3) = 0.500
P(size = M) = (2 + 1) / (5 + 3) = 0.375
P(size = L) = (0 + 1) / (5 + 3) = 0.125
(numerator: count of ACTUAL examples + count of PSEUDO examples)

Programming trick: start all NUMERATOR counters at m rather than 0 (then sum the numerators to set the initial DENOMINATOR).

Aside: could also do this in the d-tree calculations!
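A one-screen sketch of the same computation (Laplace smoothing with m = 1; the variable names are my own):

```python
# Laplace smoothing (m = 1), reproducing the slide's numbers.
counts = {"S": 3, "M": 2, "L": 0}                 # actual example counts
m = 1                                             # pseudo-examples per value
denom = sum(counts.values()) + m * len(counts)    # 5 + 3 = 8
probs = {v: (c + m) / denom for v, c in counts.items()}
print(probs)   # {'S': 0.5, 'M': 0.375, 'L': 0.125}
```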

From Where Does the BN's GRAPH Come?
- Knowledge provided by a domain expert
- Or via AI SEARCH (a minimal code sketch follows below this slide):
  - State: a BN using all the random vars
  - Start: NO arcs
  - Action: add a directed arc if no cycle is created
  - Heuristic: accuracy on the train set plus a penalty proportional to network size (to reduce overfitting)
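Below is a minimal, hypothetical sketch of that greedy search skeleton. The score function is assumed to implement the heuristic above (train-set accuracy minus a size penalty) and is supplied by the caller rather than implemented here:

```python
from itertools import permutations

def creates_cycle(arcs, new_arc):
    """Would adding new_arc = (parent, child) create a directed cycle?
    It would exactly when `parent` is already reachable from `child`."""
    src, dst = new_arc
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for (p, child) in arcs if p == node)
    return False

def greedy_structure_search(variables, score):
    """Hill-climb by repeatedly adding the single arc that most improves
    score(arcs); `score` is assumed to be the penalized train-set accuracy
    described on the slide (not implemented here)."""
    arcs = set()                                    # Start: NO arcs
    while True:
        candidates = [(a, b) for a, b in permutations(variables, 2)
                      if (a, b) not in arcs and not creates_cycle(arcs, (a, b))]
        if not candidates:
            return arcs
        current = score(arcs)
        gains = [(score(arcs | {arc}) - current, arc) for arc in candidates]
        best_gain, best_arc = max(gains, key=lambda g: g[0])
        if best_gain <= 0:                          # no single addition helps
            return arcs
        arcs.add(best_arc)                          # Action: add arc (no cycle)
```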

Searching for a Good BN Structure
[Slide shows a search tree: each 'Add Arc' action applied to the current network produces a child network.]

Computationally expensive, so specialized algorithms are often used, e.g. TAN (Tree-Augmented Naive Bayes), which allows at most two parents per node and finds the optimal such network in polynomial time.

Terminology: Bayes Nets and ML
Parameter (aka Weight) Learning
- Fill the conditional probability tables (CPTs) stored at each node (look at training data and count things)

Structure Learning ('learn the graph')
- SKIP if an expert provides the structure
- Can cast as an AI search task:
  - add/subtract nodes (broader search than on the earlier slide)
  - add/subtract arcs (graph must be acyclic)
  - change the direction of an arc
- Need to score candidate graphs
- Usually need to penalize graph complexity/size (we'll see analogs later in SVMs and neural networks)

The Full Joint Table as a BN
Usually a BN approximates the full joint table, but the BN notation can also exactly represent it.

P(A=? ˄ B=? ˄ C=? ˄ D=?)                               // For clarity, we'll only use 4 random vars
 = P(A ˄ B ˄ C | D) × P(D)                             // Dropping =? for clarity
 = P(A ˄ B | C ˄ D) × P(C | D) × P(D)
 = P(A | B ˄ C ˄ D) × P(B | C ˄ D) × P(C | D) × P(D)

[Figure: D has no parents, C's parent is D, B's parents are C and D, A's parents are B, C, and D.] This is a 'fully connected' acyclic graph.
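To see that the chain-rule expansion is exact, here is a small self-contained check (my own construction, not from the lecture): build an arbitrary joint distribution over four Boolean variables, compute the conditional factors from it, and confirm the product reproduces every cell:

```python
import random
from itertools import product

random.seed(0)

# An arbitrary joint distribution over four Boolean variables, keyed (D, C, B, A).
weights = {assign: random.random() for assign in product([False, True], repeat=4)}
Z = sum(weights.values())
joint4 = {assign: w / Z for assign, w in weights.items()}

def p(**fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {"D": 0, "C": 1, "B": 2, "A": 3}
    return sum(pr for assign, pr in joint4.items()
               if all(assign[idx[name]] == val for name, val in fixed.items()))

# Check P(A,B,C,D) = P(A | B,C,D) * P(B | C,D) * P(C | D) * P(D) on every cell.
for d, c, b, a in product([False, True], repeat=4):
    lhs = joint4[(d, c, b, a)]
    rhs = (p(A=a, B=b, C=c, D=d) / p(B=b, C=c, D=d)) \
        * (p(B=b, C=c, D=d) / p(C=c, D=d)) \
        * (p(C=c, D=d) / p(D=d)) \
        * p(D=d)
    assert abs(lhs - rhs) < 1e-9
```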

A Major Weakness of BNs
If there are many 'hidden' random vars (N binary vars, say), then the marginalization formula leads to many calls to the BN (2^N in our example; for N = 20, 2^N = 1,048,576).

Using uniform-random sampling to estimate the result is too inaccurate, since most of the probability might be concentrated in only a few 'complete world states'.

Hence, much research (beyond cs540's scope) on scaling up inference in BNs and other graphical models, e.g. via more sophisticated sampling (e.g., MCMC).
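For reference, the simplest sampling scheme is ancestral (forward) sampling from the BN; the sketch below (reusing the hypothetical CPT dictionaries from the earlier example) estimates a marginal by counting samples. The slide's point is that when the interesting probability mass is rare, such naive estimates are inaccurate and smarter schemes like MCMC are needed:

```python
import random

def sample_world():
    """Ancestral sampling: draw each variable given its already-sampled parents,
    using the CPT dictionaries P_A, P_B, P_D, P_C, P_E from the earlier sketch."""
    a = random.random() < P_A
    b = random.random() < P_B
    d = random.random() < P_D
    c = random.random() < P_C[(a, b)]
    e = random.random() < P_E[(c, d)]
    return a, b, c, d, e

samples = [sample_world() for _ in range(100_000)]
p_e = sum(e for (_, _, _, _, e) in samples) / len(samples)   # estimate of P(E)
```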

Markov Blanket
(Portrait: Andrei Andreyevich Markov, 1856-1922)

If the random variables in a node's Markov Blanket are set, then that node is conditionally independent of all other nodes in the BN.

The Markov Blanket consists of:
- Parent nodes (this one is obvious)
- Children nodes (less obvious)
- Children's other parent nodes (much less obvious)

Markov Blanket Illustrated
[Figure: node X with parents P1 … PN, children C1 … CM, and the children's other parents CP1,1 … CPM,k.]

We can compute P(X | P's, C's, and CP's) regardless of the settings of the rest of the BN.
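Given only the graph structure, the blanket can be read off mechanically; a small sketch (my own helper, applied to the five-variable example from earlier in the lecture) is:

```python
def markov_blanket(node, parents):
    """parents maps each node to the list of its parents in the BN."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)      # parents + children
    for child in children:                            # + children's other parents
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket

# The five-variable example BN from earlier in the lecture:
example = {"A": [], "B": [], "D": [], "C": ["A", "B"], "E": ["C", "D"]}
print(markov_blanket("C", example))   # {'A', 'B', 'D', 'E'}
```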

More Markov-Blanket Experience
- Knowing GasInTank = 0 can 'explain away' why GasGauge = empty.
- If we know the value of EngineCranks, whether or not the car Starts tells us nothing more about BatteryPower.
- 'Leak' nodes are often used to model 'all other causes'.