Lecture 25: Uncertain Reasoning Discussion (3 of 4): Bayesian Network Applications
CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University, Department of Computing and Information Sciences
Wednesday, March 15, 2000
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings:
– "The Lumière Project: Inferring the Goals and Needs of Software Users", Horvitz et al.
– (Reference) Chapter 15, Russell and Norvig

Lecture Outline
Readings
– Chapter 15, Mitchell
– References: Pearl and Verma; tutorials (Heckerman; Friedman and Goldszmidt)
More Bayesian Belief Networks (BBNs)
– Inference: applying CPTs
– Learning: CPTs from data, elicitation
– In-class demo: Hugin (CPT elicitation, application)
Learning BBN Structure
– K2 algorithm
– Other probabilistic scores and search algorithms
Causal Discovery: Learning Causality from Observations
Next Class: Last BBN Presentation (Yue Jiao: Causality)
After Spring Break
– KDD
– Genetic Algorithms (GAs) / Genetic Programming (GP)

Bayesian Networks: Quick Review
Bayesian Network: Digraph Model
– Vertices (nodes): denote events (each a random variable)
– Edges (arcs, links): denote conditional dependencies
Recall: Conditional Independence (CI) Assumptions
Example ("Sprinkler" BBN)
– X1 = Season: {Spring, Summer, Fall, Winter}
– X2 = Sprinkler: {On, Off}
– X3 = Rain: {None, Drizzle, Steady, Downpour}
– X4 = Ground-Moisture: {Wet, Dry}
– X5 = Ground-Slipperiness: {Slippery, Not-Slippery}
– P(Summer, Off, Drizzle, Wet, Not-Slippery) = P(S) · P(O | S) · P(D | S) · P(W | O, D) · P(N | W)
Chain Rule for (Exact) Inference in BBNs
– Arbitrary Bayesian networks: NP-complete
– Polytrees: linear time
MAP, ML Estimation over BBNs
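A minimal Python sketch of this chain-rule factorization (the CPT entries below are invented placeholders, not values from the lecture):

```python
# Chain-rule joint for the "Sprinkler" BBN:
# P(X1..X5) = P(S) * P(O|S) * P(D|S) * P(W|O,D) * P(N|W).
P_season = {"Summer": 0.25}
P_sprinkler = {("Off", "Summer"): 0.4}          # P(Sprinkler | Season)
P_rain = {("Drizzle", "Summer"): 0.1}           # P(Rain | Season)
P_moist = {("Wet", "Off", "Drizzle"): 0.8}      # P(Moisture | Sprinkler, Rain)
P_slip = {("Not-Slippery", "Wet"): 0.1}         # P(Slipperiness | Moisture)

def joint(season, sprinkler, rain, moisture, slip):
    """Joint probability via the network's CI factorization."""
    return (P_season[season]
            * P_sprinkler[(sprinkler, season)]
            * P_rain[(rain, season)]
            * P_moist[(moisture, sprinkler, rain)]
            * P_slip[(slip, moisture)])

print(joint("Summer", "Off", "Drizzle", "Wet", "Not-Slippery"))  # ≈ 0.0008
```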

Learning Distributions in BBNs: Quick Review
Learning Distributions
– Shortcomings of Naïve Bayes
– Making judicious CI assumptions
– Scaling up to BBNs: need to learn a CPT for each parent set
– Goal: generalization
  – Given D (e.g., {1011, 1001, 0100})
  – Would like to know P(schema): e.g., P(11**) ≡ P(x1 = 1, x2 = 1)
Variants
– Known or unknown structure
– Training examples may have missing values
Gradient Learning Algorithm
– Weight update rule
– Learns CPTs given data points D
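When the structure is known and the data are complete, the "learn a CPT for each parent set" step reduces to conditional frequency counting; the gradient rule is what handles missing values. A minimal counting sketch (toy data, made up here):

```python
# MLE CPT estimation by counting: P(x3 | x1, x2) = N(x1, x2, x3) / N(x1, x2).
from collections import Counter

# Rows are (x1, x2, x3), with Parents(x3) = {x1, x2}; illustrative data.
D = [(1, 0, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 1, 0)]

joint_counts = Counter(((x1, x2), x3) for x1, x2, x3 in D)
parent_counts = Counter((x1, x2) for x1, x2, _ in D)

def cpt(x3, pa):
    """MLE estimate of P(x3 | parents = pa)."""
    return joint_counts[(pa, x3)] / parent_counts[pa]

print(cpt(1, (1, 0)))  # 2/2 = 1.0
print(cpt(1, (0, 1)))  # 1/2 = 0.5
```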

Learning Structure
Problem Definition
– Given: data D (tuples or vectors containing observed values of variables)
– Return: directed graph (V, E) expressing target CPTs (or a commitment to acquire them)
Benefits
– Efficient learning: more accurate models with less data; estimate P(A), P(B) rather than P(A, B)
– Discover structural properties of the domain (causal relationships)
Accurate Structure Learning: Issues
– Superfluous arcs: more parameters to fit; wrong assumptions about causality
– Missing arcs: cannot compensate using CPT learning; ignorance about causality
Solution Approaches
– Constraint-based: enforce consistency of network with observations
– Score-based: optimize degree of match between network and observations
Overview: Tutorials
– [Friedman and Goldszmidt, 1998]
– [Heckerman, 1999]

Learning Structure: Constraints Versus Scores
Constraint-Based
– Perform tests of conditional independence
– Search for a network consistent with observed dependencies (or lack thereof)
– Intuitive; closely follows the definition of BBNs
– Separates construction from the form of CI tests
– Sensitive to errors in individual tests
Score-Based
– Define a scoring function (aka score) that evaluates how well (in)dependencies in a structure match observations
– Search for a structure that maximizes the score
– Statistically and information-theoretically motivated
– Can make compromises
Common Properties
– Soundness: with sufficient data and computation, both learn the correct structure
– Both learn structure from observations and can incorporate knowledge
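For a flavor of the constraint-based side, here is a minimal sketch of a single independence test, a chi-square test on a 2x2 contingency table (synthetic data; real constraint-based learners run many such tests with conditioning sets):

```python
# Chi-square test of X independent of Y; a small p-value rejects
# independence, so a constraint-based learner would keep the edge X-Y.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 500)
flip = rng.random(500) < 0.1
y = np.where(flip, 1 - x, x)        # y is a noisy copy of x: dependent

table = np.zeros((2, 2))
np.add.at(table, (x, y), 1)         # contingency counts N(x, y)

chi2, p, dof, expected = chi2_contingency(table)
print(f"p = {p:.2g}")               # tiny p: reject independence
```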

Learning Structure: Maximum Weight Spanning Tree (Chow-Liu)
Algorithm Learn-Tree-Structure-I (D)
– Estimate P(x) and P(x, y) for all single RVs and pairs; I(X; Y) = D(P(X, Y) || P(X) · P(Y))
– Build complete undirected graph: variables as vertices, I(X; Y) as edge weights
– T ← Build-MWST (V × V, Weights)  // Chow-Liu algorithm: weight function ≡ I
– Set directional flow on T and place the CPTs on its edges (gradient learning)
– RETURN: tree-structured BBN with CPT values
Algorithm Build-MWST-Kruskal (E ⊆ V × V, Weights: E → R+)
– H ← Build-Heap (E, Weights)  // aka priority queue; Θ(|E|)
– E′ ← Ø; Forest ← {{v} | v ∈ V}  // E′: edge set; Forest: union-find; Θ(|V|)
– WHILE Forest.Size > 1 DO  // Θ(|E|) iterations
  – e ← H.Delete-Max()  // e ≡ next edge from H; Θ(lg |E|)
  – IF (T_S ← Forest.Find(e.Start)) ≠ (T_E ← Forest.Find(e.End)) THEN  // Θ(lg* |E|)
    – E′.Union(e)  // append edge e; E′.Size++; Θ(1)
    – Forest.Union (T_S, T_E)  // Forest.Size--; Θ(1)
– RETURN E′  // Θ(1)
Running Time: Θ(|E| lg |E|) = Θ(|V|² lg |V|²) = Θ(|V|² lg |V|) = Θ(n² lg n)
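To make both algorithms concrete, a small Python sketch under simplifying assumptions (binary variables, plug-in MI estimates, and a tiny sort-based Kruskal with union-find rather than a heap); the toy data are made up:

```python
# Chow-Liu: pairwise mutual information as edge weights, then a MWST.
import numpy as np
from itertools import combinations

def mutual_info(a, b):
    """Plug-in estimate of I(A; B) for two binary columns."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu(X):
    """Return MWST edges (i, j, weight) over the columns of X."""
    n_vars = X.shape[1]
    edges = sorted(((mutual_info(X[:, i], X[:, j]), i, j)
                    for i, j in combinations(range(n_vars), 2)),
                   reverse=True)              # heaviest edges first
    parent = list(range(n_vars))              # union-find forest
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                          # distinct components: keep edge
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, 1000)
x1 = np.where(rng.random(1000) < 0.9, x0, 1 - x0)  # x1 depends on x0
x2 = rng.integers(0, 2, 1000)                      # independent of both
print(chow_liu(np.column_stack([x0, x1, x2])))     # (0, 1) carries the heavy weight
```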

General-Case BBN Structure Learning: Use Inference to Compute Scores
Recall: Bayesian Inference aka Bayesian Reasoning
– Assumption: h ∈ H are mutually exclusive and exhaustive
– Optimal strategy: combine predictions of hypotheses in proportion to likelihood
  – Compute the conditional probability of hypothesis h given observed data D
  – i.e., compute the expectation over unknown h for unseen cases
  – Let h ≡ structure, parameters Θ ≡ CPTs
Scores for Learning Structure: The Role of Inference
– Posterior score ∝ prior over structures × marginal likelihood, where the marginal likelihood integrates the likelihood against the prior over parameters (see the equation below)
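Spelling out how those labeled quantities fit together (the standard Bayesian-score decomposition, as in the Heckerman tutorial):

```latex
\underbrace{P(h \mid D)}_{\text{posterior score}}
\;\propto\;
\underbrace{P(h)}_{\text{prior over structures}}\,
\underbrace{P(D \mid h)}_{\text{marginal likelihood}}
\;=\;
P(h)\int
\underbrace{P(D \mid \Theta, h)}_{\text{likelihood}}\;
\underbrace{P(\Theta \mid h)}_{\text{prior over parameters}}
\,d\Theta
```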

Scores for Learning Structure: Prior over Parameters
Likelihood L(Θ : D)
– Definition: L(Θ : D) ≡ P(D | Θ) = ∏_{x ∈ D} P(x | Θ)
– General BBN (i.i.d. data x): L(Θ : D) = ∏_{x ∈ D} ∏_i P(x_i | Parents(x_i) ~ Θ) = ∏_i L(Θ_i : D)
  – NB: Θ_i specifies the CPT for x_i given Parents(x_i)
  – The likelihood decomposes according to the structure of the BBN
Estimating the Prior over Parameters: P(Θ | D) ∝ P(Θ) · P(D | Θ) = P(Θ) · L(Θ : D)
– Example: Sprinkler scenarios D = {(Season(i), Sprinkler(i), Rain(i), Moisture(i), Slipperiness(i))}
  – P(Su, Off, Dr, Wet, NS) = P(S) · P(O | S) · P(D | S) · P(W | O, D) · P(N | W)
– Likelihood and MLE for multinomial distributions (e.g., Season ∈ {Spring, Summer, Fall, Winter}): see below
– Binomial case: N_1 = # heads, N_2 = # tails ("frequency is the ML estimator")
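For reference, the multinomial likelihood the slide invokes and its maximizer (a standard result):

```latex
L(\Theta : D) \;=\; \prod_{k} \theta_k^{\,N_k}
\quad \text{with} \quad \sum_k \theta_k = 1
\qquad\Longrightarrow\qquad
\hat{\theta}_k \;=\; \frac{N_k}{\sum_j N_j}
% Binomial case: \hat{\theta} = N_1 / (N_1 + N_2),
% i.e., "frequency is the ML estimator".
```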

Learning Structure: K2 Algorithm and ALARM
Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
– FOR i ← 1 to n DO  // fixed (arbitrary) ordering of variables {x_1, x_2, …, x_n}
  – WHILE (Parents[x_i].Size < Max-Parents) DO  // find best candidate parent
    – Best ← argmax_{j<i} P(D | x_j ∈ Parents[x_i])  // max Dirichlet score; candidates precede x_i in the ordering
    – IF ((Parents[x_i] + Best).Score > Parents[x_i].Score) THEN Parents[x_i] += Best
    – ELSE BREAK  // no candidate improves the score
– RETURN ({Parents[x_i] | i ∈ {1, 2, …, n}})
ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al, 1989]
– BBN model for patient monitoring in surgical anesthesia
– Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
– K2: found a BBN differing in only one edge from the gold standard (elicited from experts)
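A compact sketch of K2 with the Cooper-Herskovits (Dirichlet) score, under stated assumptions: binary variables, complete data, and a toy two-variable ordering (an illustration, not the lecture's implementation):

```python
# K2: greedy parent selection under a fixed ordering, scored by the
# Cooper-Herskovits marginal likelihood (uniform Dirichlet prior).
from math import lgamma
from collections import Counter
from itertools import product
import numpy as np

def ch_score(X, i, parents):
    """log marginal-likelihood contribution of node i (binary vars)."""
    r = 2
    score = 0.0
    for pa_vals in product((0, 1), repeat=len(parents)):
        mask = (np.all(X[:, parents] == pa_vals, axis=1)
                if parents else np.ones(len(X), bool))
        n_jk = Counter(X[mask, i])            # N_ijk for this parent config
        score += lgamma(r) - lgamma(mask.sum() + r)
        score += sum(lgamma(n_jk[k] + 1) for k in (0, 1))
    return score

def k2(X, order, max_parents=2):
    parents = {i: [] for i in order}
    for pos, i in enumerate(order):
        old = ch_score(X, i, parents[i])
        while len(parents[i]) < max_parents:
            cands = [j for j in order[:pos] if j not in parents[i]]
            if not cands:
                break
            new, best = max((ch_score(X, i, parents[i] + [j]), j) for j in cands)
            if new <= old:                    # no candidate improves the score
                break
            parents[i].append(best)
            old = new
    return parents

rng = np.random.default_rng(2)
x0 = rng.integers(0, 2, 500)
x1 = np.where(rng.random(500) < 0.85, x0, 1 - x0)   # x1 depends on x0
print(k2(np.column_stack([x0, x1]), order=[0, 1]))  # expect {0: [], 1: [0]}
```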

Learning Structure: State Space Search and Causal Discovery
Learning Structure: Beyond Trees
– Problem not as easy for more complex networks
  – Example: allow two parents (even the singly-connected case, aka polytree)
  – Greedy algorithms are no longer guaranteed to find the optimal network
  – In fact, no efficient algorithm exists
– Theorem: finding the network structure with maximal score, where H is restricted to BBNs with at most k parents per variable, is NP-hard for k > 1
Heuristic (Score-Based) Search of Hypothesis Space H
– Define H: elements denote possible structures; the adjacency relation denotes transformations (e.g., arc addition, deletion, reversal)
– Traverse this space looking for high-scoring structures
– Algorithms: greedy hill-climbing, best-first search, simulated annealing (see the sketch below)
Causal Discovery: Inferring Existence, Direction of Causal Relationships
– Want: "no unexplained correlations; no accidental independencies" (cause ⇒ CI)
– Can causality be discovered from observational data alone?
– What is causality, anyway?
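A sketch of the greedy hill-climbing search just described, over arc addition, deletion, and reversal with an acyclicity check; the scoring function here is a toy stand-in (any structure score, such as the Dirichlet score above, can be plugged in):

```python
# Greedy hill-climbing over DAG structures (edge sets of (i, j) pairs).
from itertools import permutations

def neighbors(edges, n_vars):
    """Structures one arc addition, deletion, or reversal away."""
    for i, j in permutations(range(n_vars), 2):
        if (i, j) in edges:
            yield edges - {(i, j)}                   # deletion
            yield (edges - {(i, j)}) | {(j, i)}      # reversal
        else:
            yield edges | {(i, j)}                   # addition

def is_acyclic(edges, n_vars):
    """DFS check that the directed graph has no cycle."""
    adj = {v: [] for v in range(n_vars)}
    for i, j in edges:
        adj[i].append(j)
    color = {}                                       # 1 = in progress, 2 = done
    def dfs(v):
        color[v] = 1
        for w in adj[v]:
            if color.get(w) == 1 or (w not in color and not dfs(w)):
                return False                         # back edge: cycle found
        color[v] = 2
        return True
    return all(color.get(v) == 2 or dfs(v) for v in range(n_vars))

def hill_climb(score, n_vars, max_rounds=100):
    current = frozenset()
    best = score(current)
    for _ in range(max_rounds):
        improved = False
        for cand in map(frozenset, neighbors(current, n_vars)):
            if is_acyclic(cand, n_vars) and score(cand) > best:
                current, best, improved = cand, score(cand), True
        if not improved:
            break                                    # local optimum reached
    return current, best

# Toy stand-in score: reward the arc 0 -> 1, penalize extra arcs.
toy = lambda e: (1.0 if (0, 1) in e else 0.0) - 0.1 * len(e)
print(hill_climb(toy, n_vars=3))                     # ({(0, 1)}, 0.9)
```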

In-Class Exercise: Hugin Demo
Hugin
– Commercial product for BBN inference
– First developed at Aalborg University, Denmark
Applications
– Popular research tool for inference and learning
– Used for real-world decision support applications
  – Safety and risk evaluation
  – Diagnosis and control in unmanned subs
  – Customer support automation
Capabilities
– Lauritzen-Spiegelhalter algorithm for inference (clustering, aka clique reduction)
– Object-Oriented Bayesian Networks (OOBNs): structured learning and inference
– Influence diagrams for decision-theoretic inference (utility + probability)

In-Class Exercise: Hugin and CPT Elicitation
Hugin Tutorials
– Introduction: causal reasoning for diagnosis in decision support (toy problem)
  – Example domain: explaining low yield (drought versus disease)
– Tutorial 1: constructing a simple BBN in Hugin
  – Eliciting CPTs (or collecting them from data) and entering them
– Tutorial 2: constructing a simple influence diagram (decision network) in Hugin
  – Eliciting utilities (or collecting them from data) and entering them
Other Important BBN Resources
– Microsoft Bayesian Networks
– XML BN (Interchange Format)
– BBN Repository (more data sets)

In-Class Exercise: Bayesian Knowledge Discoverer (BKD) Demo
Bayesian Knowledge Discoverer (BKD)
– Research product for BBN structure learning
– Bayesian Knowledge Discovery Project [Ramoni and Sebastiani, 1997], Knowledge Media Institute (KMI), Open University, United Kingdom
– Closed source; beta freely available for educational use
– Handles missing data
– Uses Bound and Collapse, a Dirichlet score-based approximation algorithm
Sister Product: Robust Bayesian Classifier (RoC)
– Research product for BBN-based classification with missing data
– Uses the Robust Bayesian Estimator, a deterministic approximation algorithm

Learning Structure: Conclusions
Key Issues
– Finding a criterion for inclusion or exclusion of an edge in the BBN
– Each edge represents a "slice" (axis) of a CPT, or a commitment to acquire one: a positive statement of conditional dependency
Other Techniques
– Focus today: constructive (score-based) view of BBN structure learning
– Other score-based algorithms
  – Heuristic search over the space of addition, deletion, reversal operations
  – Other criteria (information-theoretic, coding-theoretic)
– Constraint-based algorithms: incorporating knowledge into causal discovery
Augmented Techniques
– Model averaging: optimal Bayesian inference (integrate over structures)
– Hybrid BBN/DT models: use a decision tree to record P(x | Parents(x))
Other Structures: e.g., Belief Propagation with Cycles

Continuing Research and Discussion Issues
Advanced Topics (Suggested Projects)
– Continuous variables and hybrid (discrete/continuous) BBNs
– Induction of hidden variables
– Local structure: localized constraints and assumptions, e.g., Noisy-OR BBNs
– Online learning and incrementality (aka lifelong, situated, in vivo learning): the ability to change network structure during the inferential process
– Hybrid quantitative and qualitative inference ("simulation")
Other Topics (Beyond the Scope of CIS 830 / 864)
– Structural EM
– Polytree structure learning (tree decomposition): alternatives to Chow-Liu MWST
– Complexity of learning and inference in restricted classes of BBNs
– BBN structure learning tools: combining elicitation and learning from data
Turn to a Partner Exercise
– How might the Lumière methodology be incorporated into a web search agent?
– Discuss briefly (3 minutes)

Terminology
Bayesian Networks: Quick Review on Learning, Inference
– Structure learning: determining the best topology for a graphical model from data
  – Constraint-based methods
  – Score-based methods: statistical or information-theoretic degree of match
  – Both can be global or local, exact or approximate
– Elicitation of subjective probabilities
Causal Modeling
– Causality: "direction" from cause to effect among events (observable or not)
– Causal discovery: learning causality from observations
Incomplete Data: Learning and Inference
– Missing values: to be filled in given partial observations
– Expectation-Maximization (EM): iterative refinement clustering algorithm
  – Estimation step: use current parameters Θ to estimate the missing {N_i}
  – Maximization (re-estimation) step: update Θ to maximize P(N_i, E_j | D)
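A minimal EM loop matching the E/M steps above, on a toy model where the missing value is a component label (the model and all numbers are illustrative assumptions, not the lecture's example):

```python
# EM for one parameter: the mixing weight pi of a two-component
# binary mixture with known component biases p_a and p_b.
import numpy as np

rng = np.random.default_rng(3)
true_pi, p_a, p_b = 0.7, 0.9, 0.2
z = rng.random(2000) < true_pi                  # missing labels (never observed)
x = np.where(z, rng.random(2000) < p_a, rng.random(2000) < p_b)

pi = 0.5                                        # initial parameter guess
for _ in range(50):
    # E-step: expected missing labels under the current parameters
    lik_a = np.where(x, p_a, 1 - p_a) * pi
    lik_b = np.where(x, p_b, 1 - p_b) * (1 - pi)
    resp = lik_a / (lik_a + lik_b)              # P(z_i = A | x_i, pi)
    # M-step: re-estimate pi to maximize the expected complete-data likelihood
    pi = resp.mean()

print(round(pi, 2))                             # ≈ 0.7 on this draw
```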

Summary Points
Bayesian Networks: Quick Review on Learning, Inference
– Learning, eliciting, and applying CPTs
– In-class exercise: Hugin demo; CPT elicitation, application
– Learning BBN structure: constraint-based versus score-based approaches
– K2, other scores and search algorithms
Causal Modeling and Discovery: Learning Causality from Observations
Incomplete Data: Learning and Inference (Expectation-Maximization)
Tutorials on Bayesian Networks
– Breese and Koller (AAAI '97, BBN intro)
– Friedman and Goldszmidt (AAAI '98, learning BBNs from data)
– Heckerman (various UAI/IJCAI/ICML tutorials, learning BBNs from data)
Next Class: BBNs and Causality
Later: UAI Concluded; KDD, Web Mining; GAs, Optimization