A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
Jing Xiang & Seyoung Kim

Presentation transcript:

Bayesian Network Structure Learning

We observe n samples (Sample 1, Sample 2, ..., Sample n) of the variables X1, ..., X|V|. A Bayesian network for continuous variables is defined over a DAG G with node set V = {X1, ..., X|V|}, and its probability model factorizes into a product of conditional distributions, one for each variable given its parents Pa(Xj) in G.

Linear Regression Model and Optimization Problem for Learning

Each conditional distribution p(Xj | Pa(Xj)) is modeled as a linear regression of Xj on its parents, and each node's parent set is scored with an L1-penalized (lasso) regression objective, the LassoScore. Learning the structure then amounts to choosing a sparse parent set for every node so that the total LassoScore is minimized, subject to the constraint that the combined graph is a DAG. A reconstruction of these equations is given below.
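The equations the poster displays at this point did not survive extraction; the following LaTeX block is a hedged reconstruction of the model and objective as described in the surrounding text (the factorization over the DAG, the per-node linear regression, and the lasso score used as the search cost). The symbols beta_j, lambda, and sigma_j are notational assumptions; LassoScore is the name used later in the transcript.

% Hedged reconstruction of the model and learning objective (notation assumed).
p(X_1, \ldots, X_{|V|}) \;=\; \prod_{j=1}^{|V|} p\big(X_j \mid \mathrm{Pa}(X_j)\big)

X_j \;=\; \sum_{X_k \in \mathrm{Pa}(X_j)} \beta_{jk} X_k \;+\; \epsilon_j, \qquad \epsilon_j \sim \mathcal{N}(0, \sigma_j^2)

\mathrm{LassoScore}\big(X_j \mid S\big) \;=\; \min_{\beta_j} \; \lVert \mathbf{x}_j - \mathbf{X}_S \beta_j \rVert_2^2 \;+\; \lambda \lVert \beta_j \rVert_1

\min_{G} \; \sum_{j=1}^{|V|} \mathrm{LassoScore}\big(X_j \mid \mathrm{Pa}(X_j)\big) \quad \text{subject to } G \text{ being a DAG}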
Contributions

We address the problem of learning a sparse Bayesian network structure for continuous variables in a high-dimensional space.
1. We present the single-stage methods A* lasso and dynamic programming (DP) lasso.
2. Both A* lasso and DP lasso guarantee optimality of the learned structure for continuous variables.
3. A* lasso gives a huge speed-up over DP lasso: it improves on the exponential time required by DP lasso and by previous optimal methods for discrete variables.

Previous Methods and Our Methods

Two-stage methods perform Stage 1 (parent selection) and Stage 2 (search for a DAG) separately, e.g. L1MB, and DP and A* for discrete variables [2,3,4]. Single-stage methods combine parent selection and DAG search, e.g. SBN [1] and the methods proposed here.

Method             | 1-Stage | Optimal | Allows Sparse Parent Set | Computational Time
DP [3]             | No      | Yes     | No                       | Exp.
A* [4]             | No      | Yes     | No                       | ≤ Exp.
L1MB [2]           | No      | No      | Yes                      | Fast
SBN [1]            | Yes     | No      | Yes                      | Fast
DP Lasso           | Yes     | Yes     | Yes                      | Exp.
A* Lasso           | Yes     | Yes     | Yes                      | ≤ Exp.
A* Lasso + Qlimit  | Yes     | No      | Yes                      | Fast

Dynamic Programming (DP) with Lasso

Learning a Bayes net under the DAG constraint is equivalent to learning an optimal ordering of the variables: given an ordering, the candidate parents Pa(Xj) are the variables that precede Xj in the ordering. DP constructs the ordering by decomposing the problem: find the optimal score for the first node Xj, then find the optimal score for the remaining nodes excluding Xj. Finding the optimal ordering is therefore equivalent to finding the shortest path from the start state (the empty set) to the goal state (all of V) in the lattice of variable subsets. DP must consider all possible paths in this search space and visits 2^|V| states, which is not practical for more than about 20 nodes. We need to prune the search space: use A* search!

A* Lasso for Pruning the Search Space

A* evaluates each intermediate state Sk (the set of variables already placed in the ordering) with f(Sk) = g(Sk) + h(Sk):
- g(Sk) is the cost incurred so far, the LassoScore accumulated along the path from the start state to Sk. Using g(Sk) only is greedy search: fast but suboptimal.
- h(Sk) is a heuristic estimate of the future cost, the LassoScore from Sk to the goal state computed while ignoring the DAG constraint.
The heuristic is admissible: h(Sk) is always an underestimate of the true cost to the goal. It is also consistent: h(Sk) never exceeds the cost of moving to a successor state plus that successor's heuristic value. With an admissible and consistent heuristic, A* is guaranteed to find the optimal solution, and the first path to reach a state is guaranteed to be the shortest path to that state, so all other paths to it can be pruned. (A code sketch of this search procedure is given at the end of the transcript.)

Example of A* Search with an Admissible and Consistent Heuristic

States: S0 = {}, S1 = {X1}, S2 = {X2}, S3 = {X3}, S4 = {X1,X2}, S5 = {X1,X3}, S6 = {X2,X3}, S7 = {X1,X2,X3}.
Heuristic values: h(S1) = 4, h(S2) = 5, h(S3) = 10, h(S4) = 9, h(S5) = 5, h(S6) = 6.

Expand S0. Queue:
  {S0,S1}: f = 1 + 4 = 5
  {S0,S2}: f = 2 + 5 = 7
  {S0,S3}: f = 3 + 10 = 13
Expand S1. Queue:
  {S0,S2}: f = 2 + 5 = 7
  {S0,S1,S5}: f = (1+4) + 5 = 10
  {S0,S3}: f = 3 + 10 = 13
  {S0,S1,S4}: f = (1+5) + 9 = 15
Expand S2. Queue:
  {S0,S1,S5}: f = (1+4) + 5 = 10
  {S0,S3}: f = 3 + 10 = 13
  {S0,S2,S6}: f = (2+5) + 6 = 13
  {S0,S1,S4}: f = (1+5) + 9 = 15
  {S0,S2,S4}: f = (2+6) + 9 = 17
Expand S5. Queue:
  {S0,S1,S5,S7}: f = (1+4) + 7 = 12
  {S0,S3}: f = 3 + 10 = 13
  {S0,S2,S6}: f = (2+5) + 6 = 13
  {S0,S1,S4}: f = (1+5) + 9 = 15
Goal reached!

Improving Scalability

To scale further, we limit the size of the queue (A* lasso + Qlimit). We do NOT naively limit the queue; that would reduce the quality of the solutions dramatically. The best intermediate results occupy the shallow part of the search space, so we distribute the results to be discarded across different depths: to discard k results, given depth |V|, we discard k/|V| intermediate results at each depth.

Experiments

[Figure panels: Recovery of V-structures; Recovery of Skeleton; Prediction Error for Benchmark Networks; Comparing Computation Time of Different Methods.]

Prediction Error for S&P Stock Price Data: daily stock price data of 125 S&P companies over 1500 time points (1/3/07-12/17/12). We estimated the Bayes net using the first 1000 time points, then computed prediction errors on the remaining 500 time points.

References

1. Huang et al. A sparse structure learning algorithm for Gaussian Bayesian network identification from high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 2013.
2. Schmidt et al. Learning graphical model structure using L1-regularization paths. In Proceedings of AAAI, volume 22, 2007.
3. Singh and Moore. Finding optimal Bayesian networks by dynamic programming. Technical Report, School of Computer Science, Carnegie Mellon University, 2005.
4. Yuan et al. Learning optimal Bayesian networks using A* search. In Proceedings of AAAI, 2011.

Conclusions

We proposed A* lasso for Bayesian network structure learning with continuous variables; it guarantees optimality while reducing computation time compared to the previous optimal algorithm, DP. We also presented a heuristic scheme (limiting the queue) that further improves speed without significantly sacrificing the quality of the solution. Efficient + optimal!
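To make the search procedure above concrete, here is a minimal Python sketch of the A* lasso loop, written under stated assumptions rather than taken from the authors' code: the data matrix is assumed centered, scikit-learn's Lasso is used as a stand-in lasso solver (its objective is scaled slightly differently from the score above), and the helper names lasso_score and astar_lasso are hypothetical. The depth-distributed queue limiting (Qlimit) from Improving Scalability and the caching of lasso scores are omitted for brevity.

# A* lasso sketch (illustrative; helper names and defaults are assumptions).
import heapq
from itertools import count

import numpy as np
from sklearn.linear_model import Lasso


def lasso_score(X, j, parents, lam):
    """Lasso objective for regressing variable j on a candidate parent set.

    Assumes the columns of X are centered; with an empty parent set the score
    reduces to ||x_j||^2. scikit-learn minimizes (1/2n)||y - Pw||^2 + alpha||w||_1,
    so the value returned here is a sketch-level approximation of the poster's
    LassoScore rather than its exact minimizer."""
    y = X[:, j]
    if not parents:
        return float(y @ y)
    P = X[:, sorted(parents)]
    w = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(P, y).coef_
    resid = y - P @ w
    return float(resid @ resid + lam * np.abs(w).sum())


def astar_lasso(X, lam=0.1):
    """Return (ordering, total score) found by A* over variable subsets.

    A state is the set of variables already placed in the ordering; the next
    variable may use exactly the variables in the state as candidate parents."""
    p = X.shape[1]
    all_vars = frozenset(range(p))
    # Heuristic: per-variable lasso score with every other variable allowed as a
    # parent, i.e. the DAG constraint is ignored, so h underestimates the true
    # remaining cost (the admissibility argument in the poster).
    h_var = [lasso_score(X, j, all_vars - {j}, lam) for j in range(p)]
    h = lambda state: sum(h_var[j] for j in all_vars - state)

    tie = count()  # tie-breaker so the heap never compares frozensets
    open_list = [(h(frozenset()), next(tie), 0.0, frozenset(), [])]
    closed = set()
    while open_list:
        f, _, g, state, order = heapq.heappop(open_list)
        if state == all_vars:
            return order, g
        if state in closed:
            # With a consistent heuristic the first pop of a state is via its
            # shortest path, so later paths to the same state can be pruned.
            continue
        closed.add(state)
        for j in all_vars - state:
            g_new = g + lasso_score(X, j, state, lam)
            nxt = state | {j}
            heapq.heappush(open_list, (g_new + h(nxt), next(tie), g_new, nxt, order + [j]))
    raise RuntimeError("search exhausted without reaching the goal state")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    X[:, 3] += 0.8 * X[:, 0]      # plant a simple dependency for illustration
    X -= X.mean(axis=0)           # center the columns, as the sketch assumes
    order, score = astar_lasso(X, lam=0.1)
    print("ordering:", order, "total lasso score:", round(score, 2))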