Approximation Techniques for Automated Reasoning Irina Rish IBM T.J.Watson Research Center Rina Dechter University of California, Irvine

SP22 Outline Introduction Reasoning tasks Reasoning approaches: elimination and conditioning CSPs: exact inference and approximations Belief networks: exact inference and approximations MDPs: decision-theoretic planning Conclusions

SP23 Automated reasoning tasks Propositional satisfiability Constraint satisfaction Planning and scheduling Probabilistic inference Decision-theoretic planning Etc. Reasoning is NP-hard => approximations are needed.

SP24 Graphical Frameworks Our focus - graphical frameworks: constraint and belief networks. Nodes correspond to variables; edges to dependencies (constraints, probabilities, utilities); reasoning to graph transformations.

SP25 Propositional Satisfiability Example: party problem. If Alex goes, then Becky goes (A -> B). If Chris goes, then Alex goes (C -> A). Query: Is it possible that Chris goes to the party but Becky does not (is C and not-B consistent with the two rules)?

SP26 Constraint Satisfaction Example: map coloring. Variables - countries (A, B, C, etc.) Values - colors (e.g., red, green, yellow) Constraints: adjacent countries must be assigned different colors.

SP27 Constrained Optimization Example: power plant scheduling

SP28 Probabilistic Inference Example: medical diagnosis (the "Asia" network). Nodes: visit to Asia, smoking, tuberculosis, lung cancer, bronchitis, abnormality in lungs, X-ray, dyspnoea (shortness of breath). Query: P(T = yes | S = no, D = yes) = ?

SP29 Decision-Theoretic Planning State = {X, Y, Battery_Level} Actions = {Go_North, Go_South, Go_West, Go_East} Probability of success = P Task: reach the goal location ASAP Example: robot navigation

SP210 Reasoning Methods Our focus - conditioning and elimination. Conditioning ("guessing" assignments, reasoning by assumptions): branch-and-bound (optimization), backtracking search (CSPs), cycle-cutset (CSPs, belief nets). Variable elimination (inference, "propagation" of constraints, probabilities, cost functions): dynamic programming (optimization), adaptive consistency (CSPs), join-tree propagation (CSPs, belief nets).

SP211 Conditioning: Backtracking Search

SP212 Bucket Elimination = Adaptive Consistency (Dechter & Pearl, 1987). Example buckets along the ordering, processed from E down to A: bucket E: constraints between E and D, and between E and C; bucket D: a constraint between D and A; bucket C: a constraint between C and B; bucket B: a constraint between B and A; bucket A: the constraints derived into A's bucket are contradictory.

SP213 Bucket-elimination and conditioning: a uniform framework Unifying approach to different reasoning tasks Understanding: commonality and differences “Technology transfer” Ease of implementation Extensions to hybrids: conditioning+elimination Approximations

SP214 Exact CSP techniques: complexity

SP215 Approximations Exact approaches can be intractable Approximate conditioning Local search, gradient descent (optimization, CSPs, SAT) Stochastic simulations (belief nets) Approximate elimination Local consistency enforcing (CSPs), local probability propagation (belief nets) Bounded resolution (SAT) Mini-bucket approach (belief nets) Hybrids (conditioning+elimination) Other approximations (e.g., variational)

SP216 “Road map” CSPs: complete algorithms Variable Elimination Conditioning (Search) CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs

SP217 Constraint Satisfaction Applications: planning and scheduling, configuration and design problems, circuit diagnosis, scene labeling, temporal reasoning, natural language processing.

SP218 Constraint Satisfaction Example: map coloring (regions A-G). Variables - countries (A, B, C, etc.) Values - colors (e.g., red, green, yellow) Constraints - adjacent countries differ; e.g., the constraint between A and B written as an explicit relation:
A       B
red     green
red     yellow
green   red
green   yellow
yellow  green
yellow  red

SP219 Constraint Networks

SP220 The Idea of Elimination: eliminating variable E records a new constraint R_DBC over its neighbors D, B, and C (figure: the value assignments produced by the elimination step).

SP221 Variable Elimination Eliminate variables one by one: “constraint propagation” Solution generation after elimination is backtrack-free

SP222 Elimination Operation: join followed by projection. The join operation over A finds all solutions satisfying the constraints that involve A; projecting A out then records the resulting relation over the remaining variables.

SP223 Bucket Elimination = Adaptive Consistency (Dechter and Pearl, 1987). (Figure: buckets along an ordering over A, E, D, C, B; processing each bucket records new relations such as R_DBE, R_DB, R_DCB, R_ACB, R_AB, R_CBE, and R_A in lower buckets.)

SP224 Induced Width Width along ordering d: the maximum number of a node's earlier neighbors ("parents"). Induced width w*(d): the width of the ordered induced graph, obtained by recursively connecting the "parents" of each node, processing nodes from i = n down to 1.

SP225 Induced width (continued) Finding a minimum-w* ordering is NP-complete (Arnborg, 1985). Greedy ordering heuristics: min-width, min-degree, max-cardinality (Bertele and Brioschi, 1972; Freuder, 1982). Tractable classes: trees have w* = 1. The induced width w*(d) of a given ordering d is computed in O(n) time, i.e., the complexity of elimination is easy to predict.

SP226 Example: crossword puzzle

SP227 Crossword Puzzle: Adaptive consistency

SP228 Adaptive Consistency as "bucket-elimination" Initialize: partition the constraints into buckets, one per variable, along the ordering d (each constraint goes into the bucket of its latest variable). For i = n down to 1 // process buckets in reverse order: join all relations in the bucket of X_i and "project out" X_i. If the resulting relation is not empty, add it to bucket k, where k is the largest variable index in its scope; else the problem is unsatisfiable. Return the set of all relations (old and new) in the buckets.
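
To make the bucket loop above concrete, here is a minimal executable sketch in Python; the tuple-set relation representation and the helper names (join_project, adaptive_consistency) are illustrative choices for this sketch, not notation from the slides.

from itertools import product

# A relation is a pair (scope, tuples): scope is a tuple of variable names,
# tuples is the set of allowed value combinations over that scope.

def join_project(relations, var, domains):
    # Join all relations in a bucket, then project out `var`.
    scope = sorted({v for sc, _ in relations for v in sc})
    out_scope = tuple(v for v in scope if v != var)
    out_tuples = set()
    for values in product(*(domains[v] for v in scope)):   # fine for tiny examples only
        asg = dict(zip(scope, values))
        if all(tuple(asg[v] for v in sc) in tups for sc, tups in relations):
            out_tuples.add(tuple(asg[v] for v in out_scope))
    return out_scope, out_tuples

def adaptive_consistency(constraints, domains, ordering):
    # Partition constraints into buckets, then process buckets from last to first.
    buckets = {v: [] for v in ordering}
    for sc, tups in constraints:
        buckets[max(sc, key=ordering.index)].append((sc, tups))
    for var in reversed(ordering):
        if not buckets[var]:
            continue
        new_scope, new_tuples = join_project(buckets[var], var, domains)
        if not new_scope:                       # everything eliminated, nothing to record
            continue
        if not new_tuples:                      # empty relation => unsatisfiable
            return False
        buckets[max(new_scope, key=ordering.index)].append((new_scope, new_tuples))
    return True

# Tiny example: three mutually-different variables but only two colors.
doms = {v: {'red', 'green'} for v in 'ABC'}
neq = lambda x, y: ((x, y), {(a, b) for a in doms[x] for b in doms[y] if a != b})
print(adaptive_consistency([neq('A', 'B'), neq('B', 'C'), neq('A', 'C')], doms, ['A', 'B', 'C']))
# -> False: the empty relation derived in a bucket triggers the "unsatisfiable" branch.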

SP229 Solving Trees (Mackworth and Freuder, 1985) Adaptive consistency is linear for trees and equivalent to enforcing directional arc-consistency (recording only unary constraints)

SP230 Properties of bucket-elimination (adaptive consistency) Adaptive consistency generates a constraint network that is backtrack-free (can be solved without deadends). The time and space complexity of adaptive consistency along ordering d is O(n exp(w*(d))), where w*(d) is the induced width. Therefore, problems having bounded induced width are tractable (solved in polynomial time). Examples of tractable problem classes: trees (w* = 1), series-parallel networks (w* <= 2), and in general k-trees (w* <= k).

SP231 “Road map” CSPs: complete algorithms Variable Elimination Conditioning (Search) CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs

SP232 The Idea of Conditioning

SP233 Backtracking Search + Heuristics Look-ahead schemes: forward checking (Haralick and Elliott, 1980), MAC (full arc-consistency at each node) (Gaschnig 1977). Look-back schemes: backjumping (Gaschnig 1977, Dechter 1990, Prosser 1993), backmarking (Gaschnig 1977), BJ+DVO (Frost and Dechter, 1994), constraint learning (Dechter 1990, Frost and Dechter 1994, Bayardo and Miranker 1996). "Vanilla" backtracking + variable/value ordering heuristics + constraint propagation + learning + ...

SP234 Search complexity distributions Complexity histograms (deadends, time) => continuous distributions (Frost, Rish, and Vila 1997; Selman and Gomes 1997; Hoos 1998). (Figure: frequency/probability as a function of the number of nodes explored in the search space.)

SP235 Constraint Programming Constraint solving embedded in programming languages. Allows flexible modeling combined with constraint-solving algorithms: logic programs + forward checking; ECLiPSe, ILOG, OPL. Typically uses only look-ahead schemes.

SP236 Complete CSP algorithms: summary Bucket elimination: adaptive consistency (CSP), directional resolution (SAT) elimination operation: join-project (CSP), resolution (SAT) Time and space exponential in the induced width (given a variable ordering) Conditioning: Backtracking search+heuristics Time complexity: worst-case O(exp(n)), but average-case is often much better. Space complexity: linear.

SP237 “Road map” CSPs: complete algorithms CSPs: approximations Approximating elimination Approximating conditioning Belief nets: complete algorithms Belief nets: approximations MDPs

SP238 Approximating Elimination: Local Constraint Propagation Problem: bucket-elimination algorithms are intractable when the induced width is large. Approximation: bound the size of recorded dependencies, i.e., perform local constraint propagation (local inference). Advantages: efficiency; may discover inconsistencies by deducing new constraints. Disadvantages: does not guarantee that a solution exists (the method is incomplete).

SP239 From Global to Local Consistency

SP240 Constraint Propagation Arc-consistency, unit resolution, i-consistency. Example: variables X, Y, Z, T with 1 <= X, Y, Z, T <= 3 and binary constraints over the pairs (X,Y), (Y,Z) (with Y = Z), (Z,T), and (X,T). (Figure: the constraint graph with the domains {1,2,3} pruned by propagation.)

SP241 Constraint Propagation (continued, same example) Constraint propagation is incorporated into backtracking search; with constraint programming languages this is a powerful approach for modeling and solving combinatorial optimization problems.

SP242 Arc-consistency Only domain constraints are recorded: Example:

SP243 Local consistency: i-consistency i-consistency: any consistent assignment to any i-1 variables is consistent with at least one value of any i-th variable. Strong i-consistency: k-consistency for every k <= i. Directional i-consistency: given an ordering, each variable is i-consistent with any i-1 preceding variables. Strong directional i-consistency: given an ordering, each variable is strongly i-consistent with any i-1 preceding variables.

SP244 Directional i-consistency (Figure: the graph over A, E, C, D, B and the constraints recorded along the ordering by adaptive consistency, directional arc-consistency (d-arc), and directional path-consistency (d-path).)

SP245 Enforcing Directional i-consistency Directional i-consistency bounds the size of recorded constraints by i: i = 1 gives arc-consistency, i = 2 gives path-consistency. For i equal to the induced width w*, directional i-consistency is equivalent to adaptive consistency.

SP246 Example: SAT Elimination operation – resolution Directional Resolution – adaptive consistency (Davis and Putnam, 1960; Dechter and Rish, 1994) Bounded resolution – bounds the resolvent size BDR(i) – directional i-consistency (Dechter and Rish, 1994) k-closure – full k-consistency (van Gelder and Tsuji, 1996) In general: bounded induced-width resolution DCDR(b) – generalizes cycle-cutset idea: limits induced width by conditioning on cutset variables (Rish and Dechter 1996, Rish and Dechter 2000)
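
To illustrate the bucket view of resolution, here is a small Python sketch of directional resolution (bucket elimination over clauses); the clause encoding as frozensets of signed integers and the function name are assumptions made for this sketch.

def directional_resolution(clauses, ordering):
    # Bucket elimination for SAT in the style of the original Davis-Putnam procedure.
    # clauses: iterable of frozensets of nonzero ints (positive literal / negated literal).
    # ordering: list of variables (positive ints) covering every variable that occurs.
    # Returns all original and derived clauses, or None if the empty clause is derived.
    pos = {v: i for i, v in enumerate(ordering)}
    buckets = {v: set() for v in ordering}
    for c in clauses:
        latest = max(c, key=lambda lit: pos[abs(lit)])      # highest variable in the ordering
        buckets[abs(latest)].add(frozenset(c))
    all_clauses = set()
    for v in reversed(ordering):
        bucket = buckets[v]
        all_clauses |= bucket
        with_pos = [c for c in bucket if v in c]
        with_neg = [c for c in bucket if -v in c]
        for c1 in with_pos:
            for c2 in with_neg:
                resolvent = (c1 - {v}) | (c2 - {-v})
                if not resolvent:
                    return None                             # empty clause: unsatisfiable
                if any(-lit in resolvent for lit in resolvent):
                    continue                                # tautology, skip
                latest = max(resolvent, key=lambda lit: pos[abs(lit)])
                buckets[abs(latest)].add(resolvent)
    return all_clauses

# Example: (A or B), (not A or B), (not B) is unsatisfiable.
cnf = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
print(directional_resolution(cnf, [1, 2]))                  # -> None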

SP247 Directional Resolution = Adaptive Consistency for SAT

SP248 DR complexity

SP249 History 1960 – the resolution-based Davis-Putnam algorithm. 1962 – the resolution step is replaced by conditioning (Davis, Logemann and Loveland, 1962) to avoid memory explosion, resulting in the backtracking search algorithm known as Davis-Putnam (DP), or the DPLL procedure; the dependency on induced width was not known then. 1994 – Directional Resolution (DR), a rediscovery of the original Davis-Putnam, with identification of tractable classes (Dechter and Rish, 1994).

SP250 DR versus DPLL: complementary properties. Uniform random 3-CNFs (large induced width) versus (k,m)-tree 3-CNFs (bounded induced width).

SP251 Complementary properties => hybrids

SP252 BDR-DP(i): bounded resolution + backtracking Complete algorithm: run BDR(i) as preprocessing before the Davis-Putnam backtracking algorithm. Empirical results: random vs. structured (low-w*) problems:

SP253 DCDR(b) Conditioning+DR

SP254

SP255 DCDR(b): empirical results

SP256 Approximating Elimination: Summary Key idea: local propagation, restricting the number of variables involved in recorded constraints. Examples: arc-, path-, and i-consistency (CSPs); bounded resolution, k-closure (SAT). For SAT: bucket-elimination = directional resolution (the original resolution-based Davis-Putnam); conditioning = DPLL (backtracking search). Hybrids: bounded resolution + search = complete algorithms (BDR-DP(i), DCDR(b)).

SP257 “Road map” CSPs: complete algorithms CSPs: approximations Approximating elimination Approximating conditioning Belief nets: complete algorithms Belief nets: approximations MDPs

SP258 Approximating Conditioning: Local Search Problem: complete (systematic, exhaustive) search can be intractable (O(exp(n)) worst-case). Approximation idea: explore only parts of the search space. Advantages: anytime answer; may "run into" a solution quicker than systematic approaches. Disadvantages: may not find an exact solution even if one exists; cannot detect that a problem is unsatisfiable.

SP259 Simple “greedy” search 1. Generate a random assignment to all variables 2. Repeat until no improvement made or solution found: // hill-climbing step 3. flip a variable (change its value) that increases the number of satisfied constraints Easily gets stuck at local maxima

SP260 GSAT – local search for SAT (Selman, Levesque and Mitchell, 1992) 1. For i = 1 to MaxTries 2. Select a random assignment A 3. For j = 1 to MaxFlips 4. if A satisfies all constraints, return A 5. else flip a variable to maximize the score 6. (the number of satisfied constraints); if no variable 7. assignment increases the score, flip at random 8. end 9. end Greatly improves hill-climbing by adding restarts and sideways moves

SP261 WalkSAT (Selman, Kautz and Cohen, 1994) With probability p random walk – flip a variable in some unsatisfied constraint With probability 1-p perform a hill-climbing step Adds random walk to GSAT: Randomized hill-climbing often solves large and hard satisfiable problems
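
A compact Python sketch of this style of randomized local search follows (in the spirit of GSAT/WalkSAT above; the scoring rule, restart scheme, and parameter defaults are simplifications for illustration, not the published procedures).

import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000, max_tries=10):
    # clauses: list of lists of nonzero ints; returns a satisfying assignment dict or None.
    def satisfied(clause, asg):
        return any(asg[abs(l)] == (l > 0) for l in clause)
    for _ in range(max_tries):                              # random restarts
        asg = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, asg)]
            if not unsat:
                return asg
            clause = random.choice(unsat)
            if random.random() < p:                         # random-walk step
                var = abs(random.choice(clause))
            else:                                           # greedy (hill-climbing) step
                def score(v):
                    asg[v] = not asg[v]
                    s = sum(satisfied(c, asg) for c in clauses)
                    asg[v] = not asg[v]
                    return s
                var = max((abs(l) for l in clause), key=score)
            asg[var] = not asg[var]
    return None

# (A or B) and (not A or B) and (A or not B) is satisfied by A = B = True.
print(walksat([[1, 2], [-1, 2], [1, -2]], n_vars=2))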

SP262 Other approaches Different flavors of GSAT with randomization (GenSAT by Gent and Walsh, 1993; Novelty by McAllester, Kautz and Selman, 1997) Simulated annealing Tabu search Genetic algorithms Hybrid approximations: elimination+conditioning

SP263 Approximating conditioning with elimination Energy minimization in neural networks (Pinkas and Dechter, 1995). For cycle-cutset nodes, use the greedy update function (relative to neighbors); for the rest of the nodes, run the arc-consistency algorithm followed by value assignment.

SP264 GSAT with Cycle-Cutset (Kask and Dechter, 1996) Input: a CSP and a partition of the variables into cycle-cutset and tree variables. Output: an assignment to all the variables. Within each try: generate a random initial assignment, and then alternate between the two steps: 1. Run the tree algorithm (arc-consistency + assignment) on the problem with fixed values of the cutset variables. 2. Run GSAT on the problem with fixed values of the tree variables.

SP265 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)

SP266 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)

SP267 “Road map” CSPs: complete algorithms CSPs: approximations Bayesian belief nets: complete algorithms Bucket-elimination Relation to: join-tree, Pearl’s poly-tree algorithm, conditioning Belief nets: approximations MDPs

SP268 Belief Networks P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B). Nodes: Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D). Conditional independencies yield an efficient representation: each node stores a conditional probability distribution (CPD), e.g. P(D|C,B), a table over C and B with entries for D=0 and D=1.

SP269 Example: Printer Troubleshooting

SP270 Example: Car Diagnosis

SP271 What are they good for? Diagnosis: P(cause | symptom) = ? Prediction: P(symptom | cause) = ? Classification: P(class | data). Decision-making (given a cost function). Application areas: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.

SP272 Probabilistic Inference Tasks Belief updating: compute P(X_i | evidence). Finding the most probable explanation (MPE): the complete assignment x* maximizing P(x, e). Finding the maximum a posteriori (MAP) hypothesis: the most likely assignment to a subset of hypothesis variables. Finding the maximum-expected-utility (MEU) decision.

SP273 Belief Updating lung Cancer Smoking X-ray Bronchitis Dyspnoea P (lung cancer=yes | smoking=no, dyspnoea=yes ) = ?

SP274 "Moral" Graph Each conditional probability distribution (CPD) corresponds to a clique in the moral graph (a "family": a node together with its parents).

SP275 Belief updating: P(X | evidence) = ? Example ("moral" graph over A, B, C, D, E): P(a | e=0) is proportional to P(a, e=0) = sum over b, c, d of P(a) P(b|a) P(c|a) P(d|b,a) P(e=0|b,c) = P(a) sum_c P(c|a) sum_b P(b|a) P(e=0|b,c) sum_d P(d|b,a). Variable elimination pushes the summations inside the product.

SP276 Bucket elimination: algorithm elim-bel (Dechter 1996). Buckets along the ordering A, E, D, C, B (B is processed first): bucket B: P(b|a), P(d|b,a), P(e|b,c); bucket C: P(c|a); bucket D: initially empty; bucket E: e=0; bucket A: P(a). The elimination operator sums out the bucket's variable and places the resulting function in a lower bucket; the answer P(a|e=0) is obtained in A's bucket. w* = 4 is the "induced width" (max clique size).
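
A runnable toy version of this sum-product bucket elimination may help; the factor representation, the three-variable chain used as the example, and the function names are illustrative assumptions, not the network on the slide.

from itertools import product

# A factor is (scope, table): scope is a tuple of variable names and table maps
# value tuples over that scope to numbers.

def combine(factors, domains, eliminate=None):
    # Multiply the given factors and (optionally) sum out one variable.
    scope = sorted({v for sc, _ in factors for v in sc})
    out_scope = tuple(v for v in scope if v != eliminate)
    table = {}
    for values in product(*(domains[v] for v in scope)):
        asg = dict(zip(scope, values))
        p = 1.0
        for sc, t in factors:
            p *= t[tuple(asg[v] for v in sc)]
        key = tuple(asg[v] for v in out_scope)
        table[key] = table.get(key, 0.0) + p
    return out_scope, table

def elim_bel(factors, domains, ordering, query):
    # Bucket elimination for P(query | evidence), with evidence already clamped into the factors.
    buckets = {v: [] for v in ordering}
    for f in factors:
        buckets[max(f[0], key=ordering.index)].append(f)
    for var in reversed(ordering):                          # process buckets last-to-first
        if var == query or not buckets[var]:
            continue
        sc, t = combine(buckets[var], domains, eliminate=var)
        target = max(sc, key=ordering.index) if sc else query
        buckets[target].append((sc, t))
    sc, t = combine(buckets[query], domains)                # joint over the query variable
    z = sum(t.values())
    return {k[0]: v / z for k, v in t.items()}              # normalized belief

# Toy chain A -> B -> C with evidence C = 1 clamped by restricting P(C|B)'s table.
doms = {'A': [0, 1], 'B': [0, 1], 'C': [1]}
pa  = (('A',),     {(0,): 0.6, (1,): 0.4})
pba = (('A', 'B'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
pcb = (('B', 'C'), {(0, 1): 0.3, (1, 1): 0.7})              # only the C = 1 entries kept
print(elim_bel([pa, pba, pcb], doms, ['A', 'B', 'C'], query='A'))   # P(A | C = 1)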

SP277 Finding the MPE: algorithm elim-mpe (Dechter 1996). The buckets are the same as for elim-bel (bucket B: P(b|a), P(d|b,a), P(e|b,c); bucket C: P(c|a); bucket E: e=0; bucket A: P(a)), but the elimination operator is maximization instead of summation: MPE = max over a, b, c, d, e of P(a) P(b|a) P(c|a) P(d|b,a) P(e|b,c). w* = 4 is the "induced width" (max clique size).

SP278 Generating the MPE-tuple Once all buckets are processed, the MPE assignment is generated in the forward direction: assign the variables in the order A, E, D, C, B, each time choosing the value that maximizes the product of the functions in that variable's bucket (the original CPDs P(a), P(c|a), P(b|a), P(d|b,a), P(e|b,c), the evidence e=0, and the recorded h-functions).

SP279 Complexity of elimination: the effect of the ordering. (Figure: the same "moral" graph over A, B, C, D, E induced along two different orderings; the induced width, and hence the complexity, depends on the ordering chosen.)

SP280 Other tasks and algorithms MAP and MEU tasks: Similar bucket-elimination algorithms - elim-map, elim-meu (Dechter 1996) Elimination operation: either summation or maximization Restriction on variable ordering: summation must precede maximization (i.e. hypothesis or decision variables are eliminated last) Other inference algorithms: Join-tree clustering Pearl’s poly-tree propagation Conditioning, etc.

SP281 Relationship with join-tree clustering A cluster is a set of buckets (a "super-bucket"). (Figure: a join-tree with clusters ABC, BCE, ADB.)

SP282 Relationship with Pearl's belief propagation in poly-trees Pearl's belief propagation for a single-root query = elim-bel using a topological ordering and super-buckets for families. Elim-bel, elim-mpe, and elim-map are linear for poly-trees. (Figure: "diagnostic support" and "causal support" messages.)

SP283 Conditioning generates the probability tree Complexity of conditioning: exponential time, linear space

SP284 Conditioning+Elimination Idea: condition (split into subproblems) until the induced width of a (sub)problem gets small.

SP285 Super-bucket elimination (Dechter and El Fattah, 1996) Eliminating several variables ‘at once’ Conditioning is done only in super-buckets

SP286 The idea of super-buckets Larger super-buckets (cliques) => more time but less space. Complexity: 1. Time: exponential in the clique (super-bucket) size. 2. Space: exponential in the separator size.

SP287 Application: circuit diagnosis Problem: Given a circuit and its unexpected output, identify faulty components. The problem can be modeled as a constraint optimization problem and solved by bucket elimination.

SP288 Time-Space Tradeoff

SP289 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

SP290 Mini-buckets: "local inference" The idea is similar to i-consistency: bound the size of recorded dependencies. Computation in a bucket is time and space exponential in the number of variables involved; therefore, partition the functions in a bucket into "mini-buckets" over smaller numbers of variables.

SP291 Mini-bucket approximation: MPE task Split a bucket into mini-buckets => bound complexity. Maximizing each mini-bucket separately and multiplying the results, max_X prod_i h_i <= (max_X prod over mini-bucket 1) * (max_X prod over mini-bucket 2), yields an upper bound on the exact maximization.

SP292 Approx-mpe(i) Input: i – the maximum number of variables allowed in a mini-bucket. Output: [lower bound (the probability of a suboptimal solution), upper bound]. Example: approx-mpe(3) versus elim-mpe.
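
The partitioning step itself is easy to sketch in Python; the greedy placement rule and the function name below are illustrative, not the exact heuristic used by approx-mpe(i).

def partition_into_minibuckets(bucket_scopes, i_bound):
    # Greedily group a bucket's function scopes so that each group (mini-bucket)
    # mentions at most i_bound variables.  Maximizing each mini-bucket separately
    # and multiplying the results upper-bounds the exact maximum, because
    # max_x f(x) * g(x) <= (max_x f(x)) * (max_x g(x)).
    minibuckets = []                    # each entry: (set_of_vars, list_of_scopes)
    for scope in bucket_scopes:
        for vars_, scopes in minibuckets:
            if len(vars_ | set(scope)) <= i_bound:
                vars_ |= set(scope)
                scopes.append(scope)
                break
        else:
            minibuckets.append((set(scope), [scope]))
    return minibuckets

# Bucket of B holding functions on (B,A), (B,C), (B,D,E), with i-bound 3:
print(partition_into_minibuckets([('B', 'A'), ('B', 'C'), ('B', 'D', 'E')], i_bound=3))
# e.g. [({'A', 'B', 'C'}, [('B', 'A'), ('B', 'C')]), ({'B', 'D', 'E'}, [('B', 'D', 'E')])]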

SP293 Properties of approx-mpe(i) Complexity: O(exp(2i)) time and O(exp(i)) space. Accuracy: determined by the upper/lower (U/L) bound ratio. As i increases, both accuracy and complexity increase. Possible uses of mini-bucket approximations: as anytime algorithms (Dechter and Rish, 1997); as heuristics in best-first search (Kask and Dechter, 1999). Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997).

SP294 Anytime Approximation

SP295 Empirical Evaluation (Dechter and Rish, 1997; Rish, 1999) Randomly generated networks Uniform random probabilities Random noisy-OR CPCS networks Probabilistic decoding Comparing approx-mpe and anytime-mpe versus elim-mpe

SP296 Random networks Uniform random: 60 nodes, 90 edges (200 instances). In 80% of cases, a speed-up while U/L < 2. Noisy-OR – even better results: exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

SP297 CPCS networks – medical diagnosis (noisy-OR model). Test case: no evidence. (Table: run time in seconds of anytime-mpe at two settings and of elim-mpe on the cpcs360 and cpcs422 networks.)

SP298 The effect of evidence More likely evidence => higher MPE => higher accuracy (why?). Likely evidence versus random (unlikely) evidence.

SP299 Probabilistic decoding Error-correcting linear block code State-of-the-art: approximate algorithm – iterative belief propagation (IBP) (Pearl’s poly-tree algorithm applied to loopy networks)

SP2100 Iterative Belief Propagation Belief propagation is exact for poly-trees. IBP - applying BP iteratively to cyclic networks. No guarantees for convergence. Works well for many coding networks.

SP2101 approx-mpe vs. IBP Bit error rate (BER) as a function of noise (sigma):

SP2102 Mini-buckets: summary Mini-buckets – a local inference approximation. Idea: bound the size of recorded functions. Approx-mpe(i) - the mini-bucket algorithm for MPE. Better results for noisy-OR than for random problems. Accuracy increases with decreasing noise. Accuracy increases for likely evidence. Sparser graphs -> higher accuracy. Coding networks: approx-mpe outperforms IBP on low induced-width codes.

SP2103 Heuristic search Mini-buckets record upper-bound heuristics. The evaluation function over a partial assignment combines the cost accumulated so far with the recorded upper bound, f = g * H. Best-first: expand the node with the maximal evaluation function. Branch-and-bound: prune a node if its evaluation function (an upper bound) cannot improve on the best solution found so far. Properties: these are exact algorithms; better heuristics lead to more pruning.
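
A minimal depth-first branch-and-bound sketch for MPE, assuming callbacks cost (the product g of instantiated factors) and heuristic (an upper bound H on the best extension, e.g. compiled from mini-buckets); these callbacks, and the two-variable chain in the usage lines, are assumptions for illustration only.

def branch_and_bound_mpe(variables, domains, cost, heuristic):
    # Depth-first search over assignments; a branch is pruned when its upper bound
    # g * H cannot improve on the best complete assignment found so far.
    best = {'p': 0.0, 'asg': None}
    def dfs(i, asg, g):
        if i == len(variables):
            if g > best['p']:
                best['p'], best['asg'] = g, dict(asg)
            return
        var = variables[i]
        for val in domains[var]:
            asg[var] = val
            g_new = cost(asg)                       # probability of the partial assignment
            if g_new * heuristic(asg) > best['p']:  # otherwise prune
                dfs(i + 1, asg, g_new)
            del asg[var]
    dfs(0, {}, 1.0)
    return best

# Toy chain P(A)P(B|A); heuristic == 1 is a weak but admissible upper bound.
pa = {0: 0.3, 1: 0.7}
pba = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}
def cost(asg):
    p = pa[asg['A']] if 'A' in asg else 1.0
    if 'A' in asg and 'B' in asg:
        p *= pba[(asg['A'], asg['B'])]
    return p
print(branch_and_bound_mpe(['A', 'B'], {'A': [0, 1], 'B': [0, 1]}, cost, lambda asg: 1.0))
# -> MPE is A=1, B=0 with probability 0.63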

SP2104 Heuristic Function Given a cost function P(a,b,c,d,e) = P(a) P(b|a) P(c|a) P(e|b,c) P(d|a,b), define an evaluation function over a partial assignment as the probability of its best extension: f*(a,e,d) = max_{b,c} P(a,b,c,d,e) = P(a) max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b) = g(a,e,d) * H*(a,e,d).

SP2105 Heuristic Function H*(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b) = max_c P(c|a) max_b P(e|b,c) P(b|a) P(d|a,b) <= max_c P(c|a) [max_b P(e|b,c)] [max_b P(b|a) P(d|a,b)] = H(a,e,d). Hence f(a,e,d) = g(a,e,d) * H(a,e,d) >= f*(a,e,d). The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.

SP2106 Heuristic Function The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme and used to estimate the probability of the best extension of a partial assignment x^p = {x_1, ..., x_p}: f(x^p) = g(x^p) * H(x^p). In the example, the mini-buckets of B record h^B(e,c) (from P(e|b,c)) and h^B(d,a) (from P(d|a,b) P(b|a)); bucket C records h^C(e,a) = max_c P(c|a) h^B(e,c); bucket D records h^D(a) = max_d h^B(d,a); bucket E records h^E(a) = max_e h^C(e,a); bucket A holds P(a) h^E(a) h^D(a). For example, H(a,e,d) = h^B(d,a) * h^C(e,a) and g(a,e,d) = P(a).

SP2107 Properties The heuristic is monotone. The heuristic is admissible. The heuristic is computed in linear time. IMPORTANT: mini-buckets generate heuristics of varying strength using a control parameter, the i-bound. Higher bound -> more preprocessing -> stronger heuristics -> less search. Allows a controlled trade-off between preprocessing and search.

SP2108 Empirical Evaluation of mini-bucket heuristics

SP2109 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

SP2110 Stochastic Simulation Forward sampling (logic sampling). Likelihood weighting. Markov Chain Monte Carlo (MCMC): Gibbs sampling.

SP2111 Approximation via Sampling

SP2112 Forward Sampling (logic sampling (Henrion, 1988))

SP2113 Forward sampling (example) Drawback: high rejection rate!

SP2114 Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990) "Clamping" evidence + forward sampling + weighting samples by the evidence likelihood. Works well for likely evidence!
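
A small Python sketch of this clamp-and-weight loop on a toy two-node chain; the data structures (cpts, parents, domains dictionaries) and the example network are assumptions made for illustration.

import random

def likelihood_weighting(cpts, parents, order, evidence, query, domains, n=10_000):
    # cpts[X] maps (parent_value_tuple, x) -> probability; order must be topological.
    weights = {val: 0.0 for val in domains[query]}
    for _ in range(n):
        sample, w = {}, 1.0
        for var in order:
            pav = tuple(sample[p] for p in parents[var])
            if var in evidence:                     # clamp evidence, weight by its likelihood
                sample[var] = evidence[var]
                w *= cpts[var][(pav, evidence[var])]
            else:                                   # sample the variable given its parents
                r, acc = random.random(), 0.0
                for val in domains[var]:
                    acc += cpts[var][(pav, val)]
                    if r <= acc:
                        sample[var] = val
                        break
                else:
                    sample[var] = domains[var][-1]  # guard against rounding
        weights[sample[query]] += w
    z = sum(weights.values())
    return {val: w / z for val, w in weights.items()}

# Toy chain A -> B: estimate P(A | B = 1); the exact answer is about (0.16, 0.84).
doms = {'A': [0, 1], 'B': [0, 1]}
parents = {'A': [], 'B': ['A']}
cpts = {'A': {((), 0): 0.6, ((), 1): 0.4},
        'B': {((0,), 0): 0.9, ((0,), 1): 0.1, ((1,), 0): 0.2, ((1,), 1): 0.8}}
print(likelihood_weighting(cpts, parents, ['A', 'B'], {'B': 1}, 'A', doms))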

SP2115 Gibbs Sampling (Geman and Geman, 1984) Markov Chain Monte Carlo (MCMC): create a Markov chain of samples Advantage: guaranteed to converge to P(X) Disadvantage: convergence may be slow

SP2116 Gibbs Sampling (cont'd) (Pearl, 1988) Markov blanket: a node's parents, children, and the children's other parents. Resampling X_i given all other variables only requires its Markov blanket: P(x_i | all others) is proportional to P(x_i | pa_i) times the product over the children X_j of P(x_j | pa_j).
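
One Gibbs step can be sketched directly from the Markov-blanket formula above; the data structures mirror the likelihood-weighting sketch and the toy chain, and are all illustrative assumptions.

import random

def gibbs_resample(var, state, cpts, parents, children, domains):
    # Resample `var` from P(var | its Markov blanket), which is proportional to
    # P(var | pa(var)) times the product over its children c of P(c | pa(c)).
    scores = []
    for val in domains[var]:
        trial = dict(state, **{var: val})
        p = cpts[var][(tuple(trial[u] for u in parents[var]), val)]
        for c in children[var]:
            p *= cpts[c][(tuple(trial[u] for u in parents[c]), trial[c])]
        scores.append(p)
    r, acc = random.random() * sum(scores), 0.0
    for val, s in zip(domains[var], scores):
        acc += s
        if r <= acc:
            state[var] = val
            return state
    state[var] = domains[var][-1]
    return state

# Toy chain A -> B (same CPTs as in the likelihood-weighting sketch), with B clamped to 1:
doms = {'A': [0, 1], 'B': [0, 1]}
parents, children = {'A': [], 'B': ['A']}, {'A': ['B'], 'B': []}
cpts = {'A': {((), 0): 0.6, ((), 1): 0.4},
        'B': {((0,), 0): 0.9, ((0,), 1): 0.1, ((1,), 0): 0.2, ((1,), 1): 0.8}}
state, counts = {'A': 0, 'B': 1}, {0: 0, 1: 0}
for _ in range(10_000):
    gibbs_resample('A', state, cpts, parents, children, doms)
    counts[state['A']] += 1
print(counts)     # roughly 16% / 84%, i.e. approximately P(A | B = 1)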

SP2117 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

SP2118 Variational Approximations Idea: variational transformation of CPDs simplifies inference Advantages: Compute upper and lower bounds on P(Y) Usually faster than sampling techniques Disadvantages: More complex and less general: re-derived for each particular form of CPD functions

SP2119 Variational bounds: example log (x) This approach can be generalized for any concave (convex) function in order to compute its upper (lower) bounds
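
The example can be written out explicitly; the algebra below is the standard conjugate-duality bound for the concave function log x (supplied here for completeness, not transcribed from the slide):

\log x \;=\; \min_{\lambda > 0}\,\bigl\{\lambda x - \log\lambda - 1\bigr\}
\quad\Longrightarrow\quad
\log x \;\le\; \lambda x - \log\lambda - 1 \quad \text{for every } \lambda > 0,

with equality at \lambda = 1/x. Each choice of the variational parameter \lambda replaces the logarithm by a linear upper bound that touches the curve at one point, and optimizing over \lambda recovers log x exactly; this is the sense in which the transformation generalizes to any concave (convex) function.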

SP2120 Convex duality approach (Jaakkola and Jordan, 1997)

SP2121 Example: QMR-DT network (Quick Medical Reference – Decision-Theoretic (Shwe et al., 1991)) Noisy-OR model: 600 diseases 4000 findings

SP2122 Inference in QMR-DT Inference complexity: O(exp(min{p, k})), where p = number of positive findings and k = max family size (Heckerman, 1989 ("Quickscore"); Rish and Dechter, 1998). Positive evidence "couples" the disease nodes, while negative evidence keeps the model factorized.

SP2123 Variational approach to QMR-DT (Jaakkola and Jordan, 1997) The effect of positive evidence is now factorized (diseases are “decoupled”)

SP2124 Variational approach (cont.) Bounds on local CPDs yield a bound on posterior Two approaches: sequential and block Sequential: applies variational transformation to (a subset of) nodes sequentially during inference using a heuristic node ordering; then optimizes across variational parameters Block: selects in advance nodes to be transformed, then selects variational parameters minimizing the KL-distance between true and approximate posteriors

SP2125 Block approach

SP2126 Variational approach: summary Variational approximations were successfully applied to inference in QMR-DT and neural networks (logistic functions), and to learning (approximate E step in EM-algorithm) For more details, see: Saul, Jaakkola, and Jordan, 1996 Jaakkola and Jordan, 1997 Neal and Hinton, 1998 Jordan, 1999

SP2127 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs: Elimination and Conditioning

SP2128 Decision-Theoretic Planning State = {X, Y, Battery_Level} Actions = {Go_North, Go_South, Go_West, Go_East} Probability of success = P Task: reach the goal location ASAP Example: robot navigation

SP2129 Dynamic Belief Networks (DBNs) Two-stage influence diagram Interaction graph

SP2130 Markov Decision Process

SP2131 Dynamic Programming: Elimination

SP2132 Bucket Elimination Complexity: O(exp(w*))

SP2133 MDPs: Elimination and Conditioning Finite-horizon MDPs: dynamic programming=elimination along temporal ordering (N slices) Infinite-horizon MDPs: Value Iteration (VI) = elimination along temporal ordering (iterative) Policy Iteration (PI) = conditioning on Aj, elimination on Xj (iterative) Bucket elimination: “non-temporal” orderings Complexity:
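
As a concrete instance of the elimination view of dynamic programming, here is a generic value-iteration sketch in Python (the transition/reward data structures and the two-state toy problem are assumptions for illustration, not taken from the slides):

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    # P[s][a] is a list of (next_state, probability) pairs; R[s][a] is an immediate reward.
    # Each sweep "eliminates" the next-stage state by summing it out of the expectation.
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(pr * V[s2] for s2, pr in P[s][a])
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

# Two-state toy problem: "move" switches the state and pays 1, "stay" pays 0.
states, actions = ['s0', 's1'], ['stay', 'move']
P = {'s0': {'stay': [('s0', 1.0)], 'move': [('s1', 1.0)]},
     's1': {'stay': [('s1', 1.0)], 'move': [('s0', 1.0)]}}
R = {'s0': {'stay': 0.0, 'move': 1.0}, 's1': {'stay': 0.0, 'move': 1.0}}
print(value_iteration(states, actions, P, R))   # both values approach 1 / (1 - gamma) = 20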

SP2134 MDPs: approximations Open directions for further research: Applying probabilistic inference approximations to DBNs Handling actions (rewards) Approximating elimination, heuristic search, etc.

SP2135 Conclusions Common reasoning approaches: elimination and conditioning Exact reasoning is often intractable => need approximations Approximation principles: Approximating elimination – local inference, bounding size of dependencies among variables (cliques in a problem’s graph). Mini-buckets, IBP, i-consistency enforcing Approximating conditioning – local search, stochastic simulations Other approximations: variational techniques, etc. Further research: Combining “orthogonal” approximation approaches Better understanding of “what works well where”: which approximation suits which problem structure Other approximation paradigms (e.g., other ways of approximating probabilities, constraints, cost functions)