Solving MAP Exactly by Searching on Compiled Arithmetic Circuits


Jinbo Huang, Logic and Computation, National ICT Australia
Mark Chavira and Adnan Darwiche, Computer Science Department, University of California, Los Angeles

Outline
  Background and previous work
  New algorithm for exact MAP
  Experimental results
  Conclusion and future work

Maximum A Posteriori Hypothesis (MAP)
  [Figure: network with nodes D1, D2, D3, …, Dm and F1, F2, …, Fn]
  Evidence Variables E
  MAP Variables M
  Sum Variables S

Special Case: Most Probable Explanation (MPE)
  [Figure: same network, labeled with Evidence Variables E, MAP Variables M, Sum Variables S]
  Every non-evidence variable is a MAP variable, so there are no sum variables.

Special Case: Probability of Evidence
  [Figure: same network, labeled with Evidence Variables E, MAP Variables M, Sum Variables S]
  Every non-evidence variable is a sum variable, so there are no MAP variables.

Why is MAP Hard?
  $p(e) = \sum \prod_{\phi} \phi(e)$
  $MPE(e) = \max_{M} \prod_{\phi} \phi(e)$
  $MAP(M, e) = \max_{M} \sum \prod_{\phi} \phi(e)$
  Best orders may be ruled out! Maximization and summation do not commute, so the sum variables must be eliminated before the MAP variables.

Why is MAP Hard?
  Solving MPE and P(e) using VE is exponential in treewidth.
  Solving MAP using VE is exponential in constrained treewidth!
  Columns: Network, TW, Cons. TW
    blockmap-10-3 50 122
    mastermind-3-8-3 21 33
    students-3-12 53 55
    grid-75-12-3 18 31
    grid-75-14-3 84
    grid-75-16-3 25 101
    grid-90-14-3 22 92
    grid-90-16-3 103
    grid-90-22-3 36 115
    grid-90-26-3 42 120
    alarm 7 16
    hailfinder 12
    pigs
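
The gap between treewidth and constrained treewidth can be illustrated with a small Python sketch. It computes the width of a given elimination order (the largest number of neighbors a variable has at the moment it is eliminated). The graph used here, a chain S1 … S4 with one MAP variable Mi attached to each Si, and the names `induced_width`, `FREE_ORDER`, and `CONSTRAINED_ORDER` are assumptions made for illustration; they do not come from the slides or from the benchmark networks.

```python
def induced_width(edges, order):
    """Width of an elimination order: max neighbor count at elimination time."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    width = 0
    for node in order:
        nbrs = adj.pop(node)
        width = max(width, len(nbrs))
        # Eliminating a node connects all of its remaining neighbors.
        for u in nbrs:
            adj[u].discard(node)
            adj[u] |= nbrs - {u}
    return width


N = 4  # hypothetical size
S = [f"S{i}" for i in range(1, N + 1)]  # sum variables
M = [f"M{i}" for i in range(1, N + 1)]  # MAP variables
EDGES = list(zip(S, S[1:])) + list(zip(S, M))

FREE_ORDER = M + S          # allowed for MPE and P(e)
CONSTRAINED_ORDER = S + M   # MAP: all sum variables first

print(induced_width(EDGES, FREE_ORDER))         # width of the free order
print(induced_width(EDGES, CONSTRAINED_ORDER))  # width of the constrained order
```

On this toy graph the free order has width 1, while the order that eliminates every sum variable first has width 4, because eliminating the chain gradually connects all the MAP variables to each other.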

Approximate Algorithms
  Compute MPE and project answer onto MAP variables
  Genetic algorithm [de Campos et al. 1999]
  Hill climbing & taboo search [Park & Darwiche 2001]
  Simulated annealing [Yuan et al. 2004]

Exact MAP [Park & Darwiche 2003] (PD03)
  Search for solution by branch-and-bound
  Space exponential in treewidth only
  Can solve many problems where constrained TW is too large but treewidth is manageable!

New Algorithm for Exact MAP (Acemap)
  Continue to use B&B search
  New methods for:
    computing upper bound
    ordering variables
    seeding the search
  Key: compute upper bound using arithmetic circuit, which exploits local structure
  Resulting algorithm not necessarily limited by constrained treewidth or even treewidth!

MAP Search
  [Figure: search tree branching on T, then C, then B, each with values t/f; animated over several slides]
  Leaf scores: tcb .009, tcb’ .079, tc’b .117, tc’b’ .053, t’cb .023, t’cb’ .051, t’c’b .031, t’c’b’ .055
  Steps shown:
    Best score 0; upper bound at root T is .151
    Upper bound at C (under t) is .135
    Upper bound at B (under t, c) is .127
    Leaf tcb: best score becomes .009
    Leaf tcb’: best score becomes .079
    Upper bound at B (under t, c’) is .120
    Leaf tc’b: best score becomes .117
    Upper bound at C (under t’) is .101, which is below the best score .117, so the t’ subtree can be pruned
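
The search trace on these slides can be replayed with a minimal depth-first branch-and-bound sketch in Python. The leaf scores and the five upper bounds are the numbers shown on the slides; everything else (the names `LEAF`, `BOUND`, `branch_and_bound`, the fixed T, C, B ordering, and the use of infinity for bounds the slides never show) is an assumption for illustration. In Acemap the bounds would come from evaluating the compiled circuit rather than from a lookup table.

```python
import math

# Leaf scores from the slides, keyed by the values of T, C, B.
LEAF = {
    ("t", "c", "b"): 0.009, ("t", "c", "b'"): 0.079,
    ("t", "c'", "b"): 0.117, ("t", "c'", "b'"): 0.053,
    ("t'", "c", "b"): 0.023, ("t'", "c", "b'"): 0.051,
    ("t'", "c'", "b"): 0.031, ("t'", "c'", "b'"): 0.055,
}

# Upper bounds shown on the slides for partial assignments.
BOUND = {
    (): 0.151,
    ("t",): 0.135,
    ("t", "c"): 0.127,
    ("t", "c'"): 0.120,
    ("t'",): 0.101,
}

VALUES = [("t", "t'"), ("c", "c'"), ("b", "b'")]


def branch_and_bound():
    best_score, best_leaf, pruned = 0.0, None, []

    def visit(prefix):
        nonlocal best_score, best_leaf
        if len(prefix) == len(VALUES):          # complete assignment
            if LEAF[prefix] > best_score:
                best_score, best_leaf = LEAF[prefix], prefix
            return
        # Bounds not shown on the slides default to +inf (never prune).
        if BOUND.get(prefix, math.inf) <= best_score:
            pruned.append(prefix)               # cannot beat the incumbent
            return
        for value in VALUES[len(prefix)]:
            visit(prefix + (value,))

    visit(())
    return best_score, best_leaf, pruned


score, leaf, pruned = branch_and_bound()
print(score, leaf, pruned)
```

Run as written, the search visits the four leaves under t, ends with tc’b as the incumbent, and prunes the whole t’ subtree at the point where its bound (.101) falls below the best score (.117).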

Factoring a Multi-Linear Function
  [Figure: network over X, Y, Z]
  \[
  \begin{aligned}
  P(e) ={} & \lambda_{x_1}\lambda_{y_1}\lambda_{z_1}\theta_{x_1}\theta_{y_1}\theta_{z_1|x_1 y_1}
           + \lambda_{x_1}\lambda_{y_1}\lambda_{z_2}\theta_{x_1}\theta_{y_1}\theta_{z_2|x_1 y_1}
           + \lambda_{x_1}\lambda_{y_1}\lambda_{z_3}\theta_{x_1}\theta_{y_1}\theta_{z_3|x_1 y_1} \\
         + {} & \lambda_{x_1}\lambda_{y_2}\lambda_{z_1}\theta_{x_1}\theta_{y_2}\theta_{z_1|x_1 y_2}
           + \lambda_{x_1}\lambda_{y_2}\lambda_{z_2}\theta_{x_1}\theta_{y_2}\theta_{z_2|x_1 y_2}
           + \lambda_{x_1}\lambda_{y_2}\lambda_{z_3}\theta_{x_1}\theta_{y_2}\theta_{z_3|x_1 y_2} \\
         + {} & \lambda_{x_2}\lambda_{y_1}\lambda_{z_1}\theta_{x_2}\theta_{y_1}\theta_{z_1|x_2 y_1}
           + \lambda_{x_2}\lambda_{y_1}\lambda_{z_2}\theta_{x_2}\theta_{y_1}\theta_{z_2|x_2 y_1}
           + \lambda_{x_2}\lambda_{y_1}\lambda_{z_3}\theta_{x_2}\theta_{y_1}\theta_{z_3|x_2 y_1} \\
         + {} & \lambda_{x_2}\lambda_{y_2}\lambda_{z_1}\theta_{x_2}\theta_{y_2}\theta_{z_1|x_2 y_2}
           + \lambda_{x_2}\lambda_{y_2}\lambda_{z_2}\theta_{x_2}\theta_{y_2}\theta_{z_2|x_2 y_2}
           + \lambda_{x_2}\lambda_{y_2}\lambda_{z_3}\theta_{x_2}\theta_{y_2}\theta_{z_3|x_2 y_2}
  \end{aligned}
  \]
  Compile with Ace
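
As a rough illustration of what factoring buys, the Python sketch below evaluates the twelve-term polynomial from this slide in two ways: term by term, and in one possible factored form that pulls shared factors out of the inner sums. The parameter values in `THETA_X`, `THETA_Y`, and `THETA_Z` are made up for the example, and the hand-written factorization is only one option; it is not claimed to be the circuit that Ace produces.

```python
from itertools import product

# Hypothetical parameters: X and Y are binary roots, Z has three values
# and parents X and Y.
THETA_X = {1: 0.3, 2: 0.7}
THETA_Y = {1: 0.6, 2: 0.4}
THETA_Z = {
    (1, 1): {1: 0.2, 2: 0.5, 3: 0.3},
    (1, 2): {1: 0.1, 2: 0.1, 3: 0.8},
    (2, 1): {1: 0.6, 2: 0.3, 3: 0.1},
    (2, 2): {1: 0.25, 2: 0.25, 3: 0.5},
}


def expanded(lam_x, lam_y, lam_z):
    """Sum the twelve terms exactly as they are written on the slide."""
    return sum(
        lam_x[x] * lam_y[y] * lam_z[z]
        * THETA_X[x] * THETA_Y[y] * THETA_Z[(x, y)][z]
        for x, y, z in product((1, 2), (1, 2), (1, 2, 3))
    )


def factored(lam_x, lam_y, lam_z):
    """Same polynomial with the x and y factors pulled out of the inner sums."""
    total = 0.0
    for x in (1, 2):
        inner_y = 0.0
        for y in (1, 2):
            inner_z = sum(lam_z[z] * THETA_Z[(x, y)][z] for z in (1, 2, 3))
            inner_y += lam_y[y] * THETA_Y[y] * inner_z
        total += lam_x[x] * THETA_X[x] * inner_y
    return total


ones_x, ones_y = {1: 1, 2: 1}, {1: 1, 2: 1}
evidence_z2 = {1: 0, 2: 1, 3: 0}  # indicators encoding evidence Z = z2

print(expanded(ones_x, ones_y, evidence_z2))
print(factored(ones_x, ones_y, evidence_z2))
```

Both functions compute the same value for any setting of the indicators; setting every indicator to 1 gives 1 (up to rounding), and zeroing the indicators that contradict the evidence gives the probability of that evidence.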

Circuit Evaluation for P(e)
  [Figure: arithmetic circuit with evaluated node values, listed from root to leaves]
  Root + node: .3
  * nodes: .3, 0
  + nodes: 1, 1
  * nodes: .3, .1, .9, .8, .2, 0
  Leaves: .3, 1, .1, 1, .9, .8, 1, .2, 0, .7

Circuit Evaluation for MPE
  [Figure: same circuit with every + node replaced by a max node]
  Root max node: .27
  * nodes: .27, 0
  max nodes: .9, .8
  * nodes: .3, .1, .9, .8, .2, 0
  Leaves: .3, 1, .1, 1, .9, .8, 1, .2, 0, .7

Circuit Evaluation for MAP
  [Figure: same circuit with a max node at the root and + nodes below]
  Root max node: .7
  * nodes: .3, .7
  + nodes: 1, 1
  * nodes: .3, .1, .9, .8, .2, .7
  Leaves: .3, 1, .1, 1, .9, .8, 1, .2, 1, .7
  Correct answer!

Circuit Evaluation for MAP
  [Figure: same circuit with a + node at the root and max nodes below]
  Root + node: .83
  * nodes: .27, .56
  max nodes: .9, .8
  * nodes: .3, .1, .9, .8, .2, .7
  Leaves: .3, 1, .1, 1, .9, .8, 1, .2, 1, .7
  May not be correct, but is upper bound!
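
The four evaluations on these slides can be reproduced with a short Python sketch. It assumes the circuit encodes a two-variable network, here called A and B, with parameters read off the leaf values (.3 and .7 for A; .1, .9 and .8, .2 for B given A). The variable names, the `evaluate` helper, and that reading of the figure are assumptions; the slides show only the node values.

```python
# Parameters read off the circuit leaves; the names A and B are assumed.
THETA_A = {"a": 0.3, "a'": 0.7}
THETA_B = {"a": {"b": 0.1, "b'": 0.9}, "a'": {"b": 0.8, "b'": 0.2}}


def evaluate(lam_a, lam_b, root_op, inner_op):
    """Evaluate the circuit bottom-up.

    root_op combines the two branches (one per value of A); inner_op
    combines the two products below each branch (one per value of B).
    Passing sum gives a + node, passing max gives a max node.
    """
    branches = []
    for a in THETA_A:
        inner = inner_op([lam_b[b] * THETA_B[a][b] for b in THETA_B[a]])
        branches.append(lam_a[a] * THETA_A[a] * inner)
    return root_op(branches)


ONES_A = {"a": 1, "a'": 1}
ONES_B = {"b": 1, "b'": 1}
EVIDENCE_A = {"a": 1, "a'": 0}  # evidence A = a, as on the first two slides

p_e = evaluate(EVIDENCE_A, ONES_B, sum, sum)    # all + nodes
mpe = evaluate(EVIDENCE_A, ONES_B, max, max)    # all max nodes
map_a = evaluate(ONES_A, ONES_B, max, sum)      # max over A above sum over B
bound_b = evaluate(ONES_A, ONES_B, sum, max)    # sum over A above max over B

# Exact MAP over B for comparison: sum out A separately for each value of B.
exact_b = max(
    evaluate(ONES_A, {"b": 1, "b'": 0}, sum, sum),
    evaluate(ONES_A, {"b": 0, "b'": 1}, sum, sum),
)

print(p_e, mpe, map_a, bound_b, exact_b)
```

The first four results match the slide values (.3, .27, .7, .83, up to floating-point rounding). When the max nodes sit below the + node, the circuit computes a sum of maxima rather than a maximum of sums, which is why .83 only bounds the exact value (about .59 under these assumed parameters) from above.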

Techniques
  Core Technique
    Compile network into AC once
    Evaluate AC at each node in search to compute bound
  Other Techniques
    Compute static variable ordering at beginning of search
    Seed the search with a good approximate solution

Networks for Experiments
  Relational Bayesian networks [Jaeger 1997]
    Many binary variables, small CPTs
    Large amounts of determinism, large treewidth
  Grid networks [Sang et al. 2005]
    Various degrees of determinism
  More standard benchmarks
    alarm, hailfinder, pathfinder, pigs, water
    Not necessarily much determinism

Results (Relational Nets)
  Columns: Network, TW, Const. TW, PD03 Time (s), Acemap Time (s), Improv.
    blockmap-5-1 19 36 186.42 0.34 548.29
    blockmap-5-2 37 306.25 0.60 510.42
    blockmap-5-3 22 38 357.18 0.78 457.92
    blockmap-10-1 51 122 - 14.56
    blockmap-10-2 48 18.57
    blockmap-10-3 50 24.56
    mastermind-3-8-3 21 33 2088.19 46.15 45.25
    students-3-2 23 29 212.79 0.37 575.11
    students-3-6 40 45 14.47
    students-3-12 53 55 496.26

Results (Grid Nets)
  Columns: Network, TW, Const. TW, PD03 Time (s), Acemap Time (s), Improv.
    grid-50-12-3 18 27 1511.30 18500.95 0.08
    grid-75-12-3 31 595.36 3.63 164.01
    grid-75-14-3 21 84 - 33.61
    grid-75-16-3 25 101 1536.21
    grid-90-12-3 29 203.22 0.16 1270.13
    grid-90-14-3 22 92 1422.71 0.33
    grid-90-16-3 103 0.32
    grid-90-18-3 28 106 9.56
    grid-90-22-3 36 115 1.99
    grid-90-26-3 42 120 369.94

Results (Other Nets)
  Columns: Network, TW, Const. TW, PD03 Time (s), Acemap Time (s), Improv.
    alarm 7 16 5.84 0.25 23.36
    hailfinder 12 36 41.67 681.93 0.06
    pathfinder 15 22.97 8.83 2.60
    pigs 33 12.90 7935.91 0.0016
    tcc4f.obfuscated 10 7.67 1.10 6.97
    water 21 324.13 8.41 38.54

Conclusion and Future Work
  New algorithm for exact MAP
  Significantly expands range of MAP problems that can be solved exactly
  Further improvements possible
    Efficient dynamic ordering
    Compiling with evidence
  Advances in compilation automatically carry over

A Few Applications
  Diagnosis
    Given observations, what is the most likely state of the system?
  Music Transcription
    Given performance input (e.g., MIDI file), what is the most likely musical score?
  Network Monitoring
    Given the result of diagnostic trace routes, what is the most likely state of the network components?
  Machine Translation
    Given a sentence in one language, what is the most likely sentence in another?

Complexity Results
  MPE is effectively an optimization problem
    MPE is NP-complete (Shimony 94)
  Pr is effectively a counting problem
    Pr is PP-complete (Roth 96)
  MAP requires both optimization and counting
    MAP is NP^PP-complete (Park 02)

Generating MAP Problems
  Ten problems per network
  Select MAP vars randomly from roots
  Select evidence vars randomly from leaves
  Set evidence randomly, ensure nonzero prob.