Kansas State University, Department of Computing and Information Sciences, Laboratory for Knowledge Discovery in Databases (KDD): KDD Group Research Seminar


Adaptive Importance Sampling on Bayesian Networks (AIS-BN)
KDD Group Research Seminar, Fall 2001 – Presentation 2b of 11
Friday, 05 October 2001
Julie A. Stilson

Reference: Cheng, J. and Druzdzel, M. J. (2000). "AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks." Journal of Artificial Intelligence Research, 13.

Outline

Basic Algorithm
– Definitions
– Updating the importance function
– Example using Sprinkler-Rain

Why Adaptive Importance Sampling?
– Heuristic initialization
– Sampling with unlikely evidence

Different Importance Sampling Algorithms
– Forward Sampling (FS)
– Logic Sampling (LS)
– Self-Importance Sampling (SIS)
– Differences between SIS and AIS-BN

Gathering Results
– How RMSE values are collected
– Sample results for FS and AIS-BN

Definitions

Importance Conditional Probability Tables (ICPTs)
– Probability tables that represent the learned importance function
– Initially equal to the CPTs
– Updated after each updating interval (see below)

Learning Rate
– The rate at which the true importance function is being learned
– Learning rate = a * (b / a) ^ (k / kmax)
– a = initial learning rate, b = learning rate in the last step, k = number of updates made so far, kmax = total number of updates that will be made

Frequency Table
– Stores how often each instantiation of each query node occurs in the samples
– Used to update the importance function

Updating Interval
– AIS-BN updates the importance function after this many samples
– If 1000 total samples are to be taken and the updating interval is 100, then 10 updates will be made
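To make the schedule concrete, here is a minimal Python sketch of the learning-rate formula above (the function and variable names are ours, and the values 0.4 and 0.1 are purely illustrative, not the paper's settings):

def learning_rate(a, b, k, k_max):
    # Learning rate after k of k_max updates: a * (b / a) ** (k / k_max)
    return a * (b / a) ** (k / k_max)

# Illustrative values only: decay from a = 0.4 to b = 0.1 over k_max = 10 updates.
rates = [learning_rate(0.4, 0.1, k, 10) for k in range(11)]
print([round(r, 3) for r in rates])   # starts at 0.4 (k = 0) and ends at 0.1 (k = 10)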

Kansas State University Department of Computing and Information Sciences Laboratory for Knowledge Discovery in Databases (KDD) k := number of updates so far, m := desired number of samples, l := updating interval for (int i = 1, i <= m, i++) { if (i mod l == 0) { k++; Update importance function Pr^k(X\E) based on total samples } generate a sample according to Pr^k(X\E), add to total samples totalweight += Pr(s,e) / Pr^k(s) } totalweight = 0; T = null; for (int i = 1; i <= m, i++) generate a sample according to Pr^kmax(X\E), add to total samples totalweight += Pr(s,e) / Pr^kmax(s) compute RMSE value of s using totalweight } Basic Algorithm

Updating the Importance Function

Theorem: if Xi is in X and Xi is not in Anc(E), then Pr(Xi | Pa(Xi), E) = Pr(Xi | Pa(Xi))
– Proved using d-connectivity
– Only ancestors of evidence nodes need to have their importance function learned
– The ICPT tables of all other nodes do not change throughout sampling

Algorithm for updating the importance function:
– Sample l points independently according to the current importance function Pr^k(X\E)
– For every query node Xi that is an ancestor of evidence, estimate Pr'(xi | Pa(Xi), e) from those samples
– Update Pr^k(X\E) according to the following formula:
  Pr^(k+1)(xi | Pa(Xi), e) = Pr^k(xi | Pa(Xi), e) + LRate * (Pr'(xi | Pa(Xi), e) - Pr^k(xi | Pa(Xi), e))
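The update formula is a convex-combination step that moves each ICPT entry toward the frequency estimated from the most recent batch of samples. A hedged Python sketch of one such step for a single parent configuration (the table representation and names are our own):

def update_icpt(icpt_row, observed_freq, lrate):
    # One AIS-BN update step for the ICPT row of a node given one parent configuration.
    # icpt_row      : dict value -> current Pr^k(x | pa, e)
    # observed_freq : dict value -> Pr'(x | pa, e) estimated from the last l samples
    # lrate         : current learning rate
    # Returns the new row Pr^(k+1)(x | pa, e).
    return {x: icpt_row[x] + lrate * (observed_freq[x] - icpt_row[x])
            for x in icpt_row}

# Example: move the Rain row a small step toward the sampled frequencies.
row = {"rain": 0.8, "no_rain": 0.2}
freq = {"rain": 0.6, "no_rain": 0.4}
print(update_icpt(row, freq, lrate=0.25))   # {'rain': 0.75, 'no_rain': 0.25}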

Example Using Sprinkler-Rain

Network: Cloudy (C) -> Sprinkler (S), Cloudy (C) -> Rain (R), Sprinkler (S) -> Ground (G), Rain (R) -> Ground (G)
– Cloudy: Yes, No
– Sprinkler: On, Off
– Rain: Yes, No
– Ground: Wet, Dry

– Imagine Ground is evidence, instantiated to Wet
– It then becomes more probable that the Sprinkler is on and that it is raining
– The ICPT tables update the probabilities of the ancestors of the evidence node to reflect this

CPTs:
– P(Cloudy): Cloudy .5, Clear .5
– P(Sprinkler | C): Cloudy: On .1, Off .9; Clear: On .5, Off .5
– P(Rain | C): Cloudy: Rain .8, No rain .2; Clear: Rain .2, No rain .8
– P(Ground | S, R): On & Rain: (values not shown); On & No rain: Wet .9, Dry .1; Off & Rain: Wet .9, Dry .1; Off & No rain: Wet 0, Dry 1
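To make the effect of the evidence concrete, the sketch below encodes these CPTs in Python and computes the exact posteriors of the two ancestor nodes by brute-force enumeration. The P(Wet | On, Rain) entry is not readable on the slide, so the 0.99 used here is an assumed placeholder rather than a value from the original.

from itertools import product

# CPTs keyed by parent value(s); True stands for Cloudy / On / Rain / Wet respectively.
p_cloudy = {True: 0.5, False: 0.5}
p_sprinkler = {True: {True: 0.1, False: 0.9},    # P(Sprinkler | Cloudy)
               False: {True: 0.5, False: 0.5}}   # P(Sprinkler | Clear)
p_rain = {True: {True: 0.8, False: 0.2},         # P(Rain | Cloudy)
          False: {True: 0.2, False: 0.8}}        # P(Rain | Clear)
# P(Ground = Wet | Sprinkler, Rain); the (On, Rain) = 0.99 entry is an assumed placeholder.
p_wet = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.9, (False, False): 0.0}

def joint_wet(c, s, r):
    # Joint probability of one full instantiation with Ground clamped to Wet.
    return p_cloudy[c] * p_sprinkler[c][s] * p_rain[c][r] * p_wet[(s, r)]

p_evidence = sum(joint_wet(c, s, r) for c, s, r in product([True, False], repeat=3))
p_sprinkler_on = sum(joint_wet(c, True, r) for c, r in product([True, False], repeat=2)) / p_evidence
p_rain_yes = sum(joint_wet(c, s, True) for c, s in product([True, False], repeat=2)) / p_evidence
print(f"P(Sprinkler=On | Wet) = {p_sprinkler_on:.3f}")   # higher than the prior P(On) = 0.3
print(f"P(Rain=Yes | Wet) = {p_rain_yes:.3f}")           # higher than the prior P(Rain) = 0.5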

Why Adaptive Importance Sampling?

Heuristic Initialization: Parents of Evidence Nodes
– Changes the probabilities of the parents of an evidence node to a uniform distribution when the probability of that evidence is sufficiently small
– Parents of evidence nodes are the nodes most affected by the instantiation of evidence
– A uniform starting point helps the importance function be learned faster

Heuristic Initialization: Extremely Small Probabilities
– Extremely low probabilities would usually not be sampled much, so the true importance function is learned slowly
– AIS-BN raises extremely low probabilities to a set threshold and lowers extremely high probabilities accordingly (see the sketch below)

Sampling with Unlikely Evidence
– With unlikely evidence, the importance function is very different from the CPTs
– It is difficult to sample accurately without changing the probability distributions
– AIS-BN performs better than other sampling algorithms when evidence is unlikely
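A minimal sketch of the small-probability heuristic, assuming a threshold of our own choosing (0.04 is illustrative, not the paper's value): entries below the threshold are raised to it and the row is renormalized, which lowers the large entries accordingly.

def adjust_small_probabilities(row, theta=0.04):
    # Raise entries below theta up to theta, then renormalize so the row sums to 1.
    # theta = 0.04 is an assumed illustrative threshold, not a value from the paper.
    raised = {x: max(p, theta) for x, p in row.items()}
    total = sum(raised.values())
    return {x: p / total for x, p in raised.items()}

print(adjust_small_probabilities({"a": 0.001, "b": 0.999}))
# -> roughly {'a': 0.038, 'b': 0.962}: 'a' is raised, 'b' is lowered accordingly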

Different Importance Sampling Algorithms

Forward Sampling / Likelihood Weighting (FS)
– Similar to AIS-BN, but the importance function is not learned
– Performs well under most circumstances
– Does not do well when evidence is unlikely

Logic Sampling (LS)
– The network is sampled randomly without regard to evidence; samples that do not match the evidence are then discarded
– The simplest importance sampling algorithm
– Also performs poorly with unlikely evidence, and is inefficient when many nodes are evidence

Self-Importance Sampling (SIS)
– Also updates an importance function
– Does not obtain samples from the learned importance function
– Updates to the importance function do not use sampling information
– For large numbers of samples, performs worse than FS
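For contrast with AIS-BN, here is a hedged Python sketch of plain forward sampling / likelihood weighting: nothing is learned, non-evidence nodes are sampled from the original CPTs, and each sample is weighted by the likelihood of the evidence (the function and argument names are ours).

import random

def likelihood_weighting_sample(order, parents, cpt, evidence):
    # Draw one weighted sample: visit nodes in topological `order`, clamp evidence nodes,
    # sample the rest from their original CPTs, and weight by the likelihood of the evidence.
    # cpt[node] maps a tuple of parent values to a dict {value: probability}.
    sample, weight = dict(evidence), 1.0
    for node in order:
        pa = tuple(sample[p] for p in parents[node])
        dist = cpt[node][pa]
        if node in evidence:
            weight *= dist[evidence[node]]
        else:
            values, probs = zip(*dist.items())
            sample[node] = random.choices(values, weights=probs)[0]
    return sample, weight

Logic sampling differs only in that evidence nodes are sampled too and any sample contradicting the evidence is discarded, while AIS-BN replaces the original CPTs in the sampling step with the learned ICPTs.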

Gathering Results

Relative Root Mean Square Error
– RMSE = sqrt( (1 / M) * sum_i ( P^(x_i) - P(x_i) )^2 )
– P(x_i) is the exact posterior probability of state x_i
– P^(x_i) is the estimated probability of x_i from the frequency table
– M := total arity (number of query-node states), T := number of samples

RMSE Collection
– The relative RMSE is computed for each sample
– Each RMSE value is stored in an output file: printings.txt

Graphing Results
– Open the output file in Excel
– Graph the results using "Chart"

Example Chart
– ALARM network, samples
– Compares FS and AIS-BN
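A small sketch of how the error value could be computed once the exact and estimated posteriors are available as parallel dictionaries over query-node states (the names are ours); the normalization by the total number of states follows the formula above.

from math import sqrt

def relative_rmse(exact, estimated):
    # Root mean squared error over all (query node, state) pairs.
    # exact, estimated : dict (node, state) -> posterior probability
    diffs = [(estimated[k] - exact[k]) ** 2 for k in exact]
    return sqrt(sum(diffs) / len(diffs))

exact = {("Rain", "yes"): 0.7, ("Rain", "no"): 0.3}
estimated = {("Rain", "yes"): 0.68, ("Rain", "no"): 0.32}
print(round(relative_rmse(exact, estimated), 4))   # 0.02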