Approximation Techniques for Automated Reasoning Irina Rish IBM T.J.Watson Research Center Rina Dechter University of California, Irvine.


1 Approximation Techniques for Automated Reasoning Irina Rish IBM T.J.Watson Research Center rish@us.ibm.com Rina Dechter University of California, Irvine dechter@ics.uci.edu

2 SP22 Outline Introduction Reasoning tasks Reasoning approaches: elimination and conditioning CSPs: exact inference and approximations Belief networks: exact inference and approximations MDPs: decision-theoretic planning Conclusions

3 SP23 Automated reasoning tasks Propositional satisfiability Constraint satisfaction Planning and scheduling Probabilistic inference Decision-theoretic planning Etc. Reasoning is NP-hard => approximations are needed

4 SP24 Graphical Frameworks Our focus - graphical frameworks: constraint and belief networks. Nodes = variables; edges = dependencies (constraints, probabilities, utilities); reasoning = graph transformations

5 SP25 Propositional Satisfiability Example: party problem. If Alex goes, then Becky goes (A -> B); if Chris goes, then Alex goes (C -> A). Query: Is it possible that Chris goes to the party but Becky does not?
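
A minimal brute-force check of this query (my own illustration, not part of the original slides): enumerate all truth assignments, keep those satisfying both implications, and look for one with C true and B false.

```python
from itertools import product

# Party problem: A = "Alex goes", B = "Becky goes", C = "Chris goes".
# Constraints: A -> B and C -> A.  Query: can C be true while B is false?
def satisfies(a, b, c):
    return (not a or b) and (not c or a)

witnesses = [(a, b, c) for a, b, c in product([False, True], repeat=3)
             if satisfies(a, b, c) and c and not b]
print(witnesses)   # [] -- no satisfying assignment exists, so the answer is "no"
```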

6 SP26 Constraint Satisfaction Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (e.g., red, green, yellow) Constraints:

7 SP27 Constrained Optimization Example: power plant scheduling

8 SP28 Probabilistic Inference Example: medical diagnosis (the "Asia" network). Nodes: smoking (S), visit to Asia (V), tuberculosis (T), lung cancer (C), bronchitis (B), abnormality in lungs (A), X-ray (X), dyspnoea, i.e. shortness of breath (D). Query: P(T = yes | S = no, D = yes) = ?

9 SP29 Decision-Theoretic Planning State = {X, Y, Battery_Level} Actions = {Go_North, Go_South, Go_West, Go_East} Probability of success = P Task: reach the goal location ASAP Example: robot navigation

10 SP210 Reasoning Methods Our focus - conditioning and elimination Conditioning (“guessing” assignments, reasoning by assumptions) Branch-and-bound (optimization) Backtracking search (CSPs) Cycle-cutset (CSPs, belief nets) Variable elimination (inference, “propagation” of constraints, probabilities, cost functions) Dynamic programming (optimization) Adaptive consistency (CSPs) Joint-tree propagation (CSPs, belief nets)

11 SP211 Conditioning: Backtracking Search

12 SP212 Bucket Elimination = Adaptive Consistency (Dechter & Pearl, 1987). Example buckets: Bucket E: E ≠ D, E ≠ C; Bucket D: D ≠ A; Bucket C: C ≠ B; Bucket B: B ≠ A; Bucket A: constraints derived on A (an empty relation here signals a contradiction). Processing the buckets from E down to A records new constraints; the variables are then assigned backtrack-free.

13 SP213 Bucket-elimination and conditioning: a uniform framework Unifying approach to different reasoning tasks Understanding: commonality and differences “Technology transfer” Ease of implementation Extensions to hybrids: conditioning+elimination Approximations

14 SP214 Exact CSP techniques: complexity

15 SP215 Approximations Exact approaches can be intractable Approximate conditioning Local search, gradient descent (optimization, CSPs, SAT) Stochastic simulations (belief nets) Approximate elimination Local consistency enforcing (CSPs), local probability propagation (belief nets) Bounded resolution (SAT) Mini-bucket approach (belief nets) Hybrids (conditioning+elimination) Other approximations (e.g., variational)

16 SP216 “Road map” CSPs: complete algorithms Variable Elimination Conditioning (Search) CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs

17 SP217 Constraint Satisfaction Planning and scheduling Configuration and design problems Circuit diagnosis Scene labeling Temporal reasoning Natural language processing Applications:

18 SP218 Constraint Satisfaction Example: map coloring. Variables - countries (A, B, C, D, E, F, G); values - colors (e.g., red, green, yellow); constraints, e.g. the allowed (A, B) pairs: (red, green), (red, yellow), (green, red), (green, yellow), (yellow, green), (yellow, red)

19 SP219 Constraint Networks

20 SP220 The Idea of Elimination: eliminating variable E records a new relation R_DBC over its neighbors D, B, C (a 3-value assignment in the example figure)

21 SP221 Variable Elimination Eliminate variables one by one: “constraint propagation” Solution generation after elimination is backtrack-free

22 SP222 Elimination Operation: join followed by projection Join operation over A finds all solutions satisfying constraints that involve A
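
A sketch of the join-followed-by-projection step, under an assumed representation of my own (not the tutorial's) in which a constraint is a (scope, set-of-allowed-tuples) pair:

```python
# A constraint is (scope, tuples): scope is a tuple of variable names,
# tuples is a set of allowed value tuples.
def join(relations):
    """Natural join of a list of constraints."""
    scope, tuples = relations[0]
    result_scope, result = list(scope), {tuple(t) for t in tuples}
    for s, ts in relations[1:]:
        new_vars = [v for v in s if v not in result_scope]
        shared = [(result_scope.index(v), s.index(v)) for v in s if v in result_scope]
        joined = set()
        for r in result:
            for t in ts:
                if all(r[i] == t[j] for i, j in shared):
                    joined.add(r + tuple(t[s.index(v)] for v in new_vars))
        result_scope += new_vars
        result = joined
    return tuple(result_scope), result

def project_out(var, relation):
    """Project a constraint onto all of its variables except var."""
    scope, tuples = relation
    keep = [i for i, v in enumerate(scope) if v != var]
    return tuple(scope[i] for i in keep), {tuple(t[i] for i in keep) for t in tuples}

# Eliminating A from two constraints that mention it:
r1 = (("A", "B"), {("red", "green"), ("green", "red")})
r2 = (("A", "C"), {("red", "blue"), ("green", "blue")})
print(project_out("A", join([r1, r2])))   # (('B', 'C'), {('green','blue'), ('red','blue')}), set order may vary
```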

23 SP223 Bucket Elimination = Adaptive Consistency (Dechter and Pearl, 1987). (Figure: the buckets along a variable ordering over A, B, C, D, E and the intermediate relations, such as R_DBE, R_DB, R_DCB, R_ACB, R_AB, R_A, recorded during elimination.)

24 SP224 Induced Width Width along ordering d: max # of previous neighbors ("parents"). Induced width w*(d): the width of the ordered induced graph, obtained by connecting the "parents" of each node recursively, from i = n down to 1.

25 SP225 Induced width (continued) Finding a minimum-w* ordering is NP-complete (Arnborg, 1985). Greedy ordering heuristics: min-width, min-degree, max-cardinality (Bertele and Brioschi, 1972; Freuder 1982). Tractable classes: trees have w* = 1. The induced width w* of a given ordering is computed in O(n) time, i.e. the complexity of elimination is easy to predict.

26 SP226 Example: crossword puzzle

27 SP227 Crossword Puzzle: Adaptive consistency

28 SP228 Adaptive Consistency as "bucket-elimination" Initialize: partition the constraints into buckets along the ordering (each constraint goes into the bucket of its latest variable). For i = n down to 1 (process buckets in reverse order): join all relations in bucket_i and "project out" X_i. If the resulting relation is not empty, add it to bucket_k, where k is the largest variable index in its scope; else the problem is unsatisfiable. Return the set of all relations (old and new) in the buckets.
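
A compact, self-contained sketch of this procedure (my own representation and naming, assuming constraints are given as sets of allowed tuples):

```python
from itertools import product

def adaptive_consistency(variables, domains, constraints):
    """variables: list giving the ordering (buckets are processed last-to-first);
    domains: dict var -> list of values;
    constraints: list of (scope, allowed) where scope is a tuple of variable names
    and allowed is a set of value tuples."""
    buckets = {v: [] for v in variables}

    def place(constraint):
        scope, _ = constraint
        latest = max(scope, key=variables.index)      # latest variable in the ordering
        buckets[latest].append(constraint)

    for c in constraints:
        place(c)

    for v in reversed(variables):                     # process buckets in reverse order
        bucket = buckets[v]
        if not bucket:
            continue
        # scope of the derived constraint: all bucket variables except v
        joint_scope = sorted({u for scope, _ in bucket for u in scope if u != v},
                             key=variables.index)
        allowed = set()
        for assignment in product(*(domains[u] for u in joint_scope)):
            env = dict(zip(joint_scope, assignment))
            # keep the tuple if SOME value of v satisfies every constraint in the bucket
            for val in domains[v]:
                env[v] = val
                if all(tuple(env[u] for u in scope) in rel for scope, rel in bucket):
                    allowed.add(assignment)
                    break
        if not allowed:
            return None                               # empty constraint derived: unsatisfiable
        if joint_scope:
            place((tuple(joint_scope), allowed))      # record the new constraint in a lower bucket
    return buckets                                    # buckets now hold old and derived constraints
```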

29 SP229 Solving Trees (Mackworth and Freuder, 1985) Adaptive consistency is linear for trees and equivalent to enforcing directional arc-consistency (recording only unary constraints)

30 SP230 Properties of bucket-elimination (adaptive consistency) Adaptive consistency generates a constraint network that is backtrack-free (can be solved without deadends). The time and space complexity of adaptive consistency along ordering d is exponential in the induced width, O(n exp(w*(d))). Therefore, problems having bounded induced width are tractable (solved in polynomial time). Examples of tractable problem classes: trees (w* = 1), series-parallel networks (w* = 2), and in general k-trees (w* = k).

31 SP231 “Road map” CSPs: complete algorithms Variable Elimination Conditioning (Search) CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs

32 SP232 The Idea of Conditioning

33 SP233 Backtracking Search + Heuristics Look-ahead schemes Forward checking (Haralick and Elliott, 1980) MAC (full arc-consistency at each node) (Gaschnig 1977) Look-back schemes Backjumping (Gaschnig 1977, Dechter 1990, Prosser 1993) Backmarking (Gaschnig 1977) BJ+DVO (Frost and Dechter, 1994) Constraint learning (Dechter 1990, Frost and Dechter 1994, Bayardo and Miranker 1996) "Vanilla" backtracking + variable/value ordering Heuristics + constraint propagation + learning + ...

34 SP234 Search complexity distributions Complexity histograms (deadends, time) => continuous distributions of frequency (probability) over the number of nodes explored in the search space (Frost, Rish, and Vila 1997; Selman and Gomes 1997; Hoos 1998)

35 SP235 Constraint Programming Constraint solving embedded in programming languages Allows flexible modeling combined with constraint-solving algorithms Logic programs + forward checking Eclipse, Ilog, OPL Using only look-ahead schemes.

36 SP236 Complete CSP algorithms: summary Bucket elimination: adaptive consistency (CSP), directional resolution (SAT) elimination operation: join-project (CSP), resolution (SAT) Time and space exponential in the induced width (given a variable ordering) Conditioning: Backtracking search+heuristics Time complexity: worst-case O(exp(n)), but average-case is often much better. Space complexity: linear.

37 SP237 “Road map” CSPs: complete algorithms CSPs: approximations Approximating elimination Approximating conditioning Belief nets: complete algorithms Belief nets: approximations MDPs

38 SP238 Approximating Elimination: Local Constraint Propagation Problem: bucket-elimination algorithms are intractable when the induced width is large Approximation: bound the size of recorded dependencies, i.e. perform local constraint propagation (local inference) Advantages: efficiency; may discover inconsistencies by deducing new constraints Disadvantages: does not guarantee that a solution exists (local consistency is weaker than global consistency)

39 SP239 From Global to Local Consistency

40 SP240 Constraint Propagation Arc-consistency, unit resolution, i-consistency. Example: variables X, Y, Z, T with domains {1, 2, 3} (1 ≤ X, Y, Z, T ≤ 3) and binary constraints over the pairs (X, Y), (Y, Z) with Y = Z, (T, Z), and (X, T).

41 SP241 Constraint Propagation Arc-consistency, unit resolution, i-consistency (continued). Enforcing arc-consistency prunes the variable domains in the example above. Propagation is incorporated into backtracking search; constraint programming languages build on it as a powerful approach for modeling and solving combinatorial optimization problems.

42 SP242 Arc-consistency Only domain constraints are recorded: Example:
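
A minimal AC-3-style sketch of arc-consistency enforcement (an assumed formulation and toy example of my own, not taken from the slide):

```python
from collections import deque

def ac3(domains, constraints):
    """domains: dict var -> set of values; constraints: dict (x, y) -> predicate(vx, vy)."""
    domains = {v: set(vals) for v, vals in domains.items()}   # work on copies
    arcs = deque(constraints.keys())
    while arcs:
        x, y = arcs.popleft()
        pred = constraints[(x, y)]
        # values of x with no supporting value of y are removed
        revised = {vx for vx in domains[x]
                   if not any(pred(vx, vy) for vy in domains[y])}
        if revised:
            domains[x] -= revised
            if not domains[x]:
                return None                                   # a domain wiped out: inconsistency detected
            arcs.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return domains

# Toy use: X < Y over domains {1,2,3}.
doms = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
cons = {("X", "Y"): lambda a, b: a < b, ("Y", "X"): lambda a, b: b < a}
print(ac3(doms, cons))   # X loses 3, Y loses 1
```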

43 SP243 Local consistency: i-consistency i-consistency: any consistent assignment to any i-1 variables is consistent with at least one value of any i-th variable strong i-consistency: k-consistency for every k ≤ i directional i-consistency: given an ordering, each variable is i-consistent with any i-1 preceding variables strong directional i-consistency: given an ordering, each variable is strongly i-consistent with any i-1 preceding variables

44 SP244 Directional i-consistency (Figure: the constraint graphs over A, B, C, D, E recorded by adaptive consistency, directional arc-consistency (d-arc), and directional path-consistency (d-path).)

45 SP245 Enforcing Directional i-consistency Directional i-consistency bounds the size of recorded constraints by i. i=1 - arc-consistency i=2 - path-consistency For i equal to the induced width w*(d), directional i-consistency is equivalent to adaptive consistency

46 SP246 Example: SAT Elimination operation – resolution Directional Resolution – adaptive consistency (Davis and Putnam, 1960; Dechter and Rish, 1994) Bounded resolution – bounds the resolvent size BDR(i) – directional i-consistency (Dechter and Rish, 1994) k-closure – full k-consistency (van Gelder and Tsuji, 1996) In general: bounded induced-width resolution DCDR(b) – generalizes cycle-cutset idea: limits induced width by conditioning on cutset variables (Rish and Dechter 1996, Rish and Dechter 2000)
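
A sketch of the bucket-processing step behind directional and bounded resolution; the clause encoding (frozensets of signed integers) and function names are my own choices, not from the slides:

```python
def resolve(c1, c2, var):
    """Resolve two clauses (frozensets of DIMACS-style literals) on variable var,
    assuming var occurs positively in c1 and negatively in c2."""
    resolvent = (c1 - {var}) | (c2 - {-var})
    # discard tautologies (a literal and its negation in the same clause)
    return None if any(-l in resolvent for l in resolvent) else frozenset(resolvent)

def directional_resolution_bucket(bucket, var, bound=None):
    """Process one bucket: resolve every positive/negative clause pair on var,
    optionally keeping only resolvents with at most `bound` literals (bounded resolution, BDR(i))."""
    pos = [c for c in bucket if var in c]
    neg = [c for c in bucket if -var in c]
    out = set()
    for c1 in pos:
        for c2 in neg:
            r = resolve(c1, c2, var)
            if r is not None and (bound is None or len(r) <= bound):
                out.add(r)
    return out   # an empty clause (frozenset()) here means the theory is unsatisfiable
```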

47 SP247 Directional Resolution ≡ Adaptive Consistency

48 SP248 DR complexity

49 SP249 History 1960 – resolution-based Davis-Putnam algorithm 1962 – the resolution step was replaced by conditioning (Davis, Logemann and Loveland, 1962) to avoid memory explosion, resulting in a backtracking search algorithm known as Davis-Putnam (DP), or the DPLL procedure. The dependency on induced width was not known in 1960. 1994 – Directional Resolution (DR), a rediscovery of the original Davis-Putnam, identification of tractable classes (Dechter and Rish, 1994).

50 SP250 DR versus DPLL: complementary properties Uniform random 3-CNFs (large induced width) (k,m)-tree 3-CNFs (bounded induced width )

51 SP251 Complementary properties => hybrids

52 SP252 BDR-DP(i): bounded resolution + backtracking Complete algorithm: run BDR(i) as preprocessing before the Davis-Putnam backtracking algorithm. Empirical results: random vs. structured (low-w*) problems:

53 SP253 DCDR(b) Conditioning+DR

54 SP254

55 SP255 DCDR(b): empirical results

56 SP256 Approximating Elimination: Summary Key idea: local propagation, restricting the number of variables involved in recorded constraints Examples: arc-, path-, and i-consistency (CSPs), bounded resolution, k-closure (SAT) For SAT: bucket-elimination = directional resolution (the original resolution-based Davis-Putnam); conditioning = DPLL (backtracking search) Hybrids: bounded resolution + search = complete algorithms (BDR-DP(i), DCDR(b))

57 SP257 “Road map” CSPs: complete algorithms CSPs: approximations Approximating elimination Approximating conditioning Belief nets: complete algorithms Belief nets: approximations MDPs

58 SP258 Approximating Conditioning: Local Search Problem: complete (systematic, exhaustive) search can be intractable (O(exp(n)) worst-case) Approximation idea: explore only parts of the search space Advantages: anytime answer; may "run into" a solution quicker than systematic approaches Disadvantages: may not find an exact solution even if there is one; cannot detect that a problem is unsatisfiable

59 SP259 Simple “greedy” search 1. Generate a random assignment to all variables 2. Repeat until no improvement made or solution found: // hill-climbing step 3. flip a variable (change its value) that increases the number of satisfied constraints Easily gets stuck at local maxima

60 SP260 GSAT – local search for SAT (Selman, Levesque and Mitchell, 1992) 1. For i=1 to MaxTries 2. Select a random assignment A 3. For j=1 to MaxFlips 4. if A satisfies all constraints, return A 5. else flip a variable to maximize the score 6. (number of satisfied constraints; if no variable 7. assignment increases the score, flip at random) 8. end 9. end Greatly improves hill-climbing by adding restarts and sideways moves

61 SP261 WalkSAT (Selman, Kautz and Cohen, 1994) With probability p random walk – flip a variable in some unsatisfied constraint With probability 1-p perform a hill-climbing step Adds random walk to GSAT: Randomized hill-climbing often solves large and hard satisfiable problems
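
A minimal WalkSAT-style sketch (my own encoding: clauses as lists of DIMACS-style literals); with p = 0 it degenerates to a purely greedy flip inside a random unsatisfied clause:

```python
import random

def walksat(clauses, n_vars, max_flips=10_000, p=0.5, rng=random):
    """clauses: list of clauses, each a list of nonzero ints (positive = var, negative = negation)."""
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                          # all clauses satisfied
        clause = rng.choice(unsat)
        if rng.random() < p:                       # random-walk step
            var = abs(rng.choice(clause))
        else:                                      # greedy step: flip the var giving the best score
            def score(v):
                assign[v] = not assign[v]
                s = sum(any(sat(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return s
            var = max((abs(l) for l in clause), key=score)
        assign[var] = not assign[var]
    return None                                    # no solution found within max_flips

# Party problem clauses (A->B) & (C->A) & C & ~B over A=1, B=2, C=3:
print(walksat([[-1, 2], [-3, 1], [3], [-2]], 3))   # None: the formula is unsatisfiable
```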

62 SP262 Other approaches Different flavors of GSAT with randomization (GenSAT by Gent and Walsh, 1993; Novelty by McAllester, Kautz and Selman, 1997) Simulated annealing Tabu search Genetic algorithms Hybrid approximations: elimination+conditioning

63 SP263 Approximating conditioning with elimination Energy minimization in neural networks (Pinkas and Dechter, 1995) For cycle-cutset nodes, use the greedy update function (relative to neighbors). For the rest of the nodes, run the arc-consistency algorithm followed by value assignment.

64 SP264 GSAT with Cycle-Cutset (Kask and Dechter, 1996) Input: a CSP, a partition of the variables into cycle-cutset and tree variables Output: an assignment to all the variables Within each try: generate a random initial assignment, and then alternate between the two steps: 1. Run the Tree algorithm (arc-consistency + assignment) on the problem with fixed values of cutset variables. 2. Run GSAT on the problem with fixed values of tree variables.

65 SP265 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)

66 SP266 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)

67 SP267 “Road map” CSPs: complete algorithms CSPs: approximations Bayesian belief nets: complete algorithms Bucket-elimination Relation to: join-tree, Pearl’s poly-tree algorithm, conditioning Belief nets: approximations MDPs

68 SP268 Belief Networks Nodes: Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D). The joint distribution factorizes according to the graph: P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B). Conditional independencies give an efficient representation. Example CPD P(D|C,B): C=0, B=0: D=0 0.1, D=1 0.9; C=0, B=1: 0.7, 0.3; C=1, B=0: 0.8, 0.2; C=1, B=1: 0.9, 0.1.

69 SP269 Example: Printer Troubleshooting

70 SP270 Example: Car Diagnosis

71 SP271 What are they good for? Diagnosis: P(cause|symptom)=? Prediction: P(symptom|cause)=? Classification: P(class|data) Decision-making (given a cost function) Application areas: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition

72 SP272 Probabilistic Inference Tasks Belief updating: compute P(X_i | evidence e). Finding the most probable explanation (MPE): x* = argmax_x P(x, e) over all variables. Finding the maximum a posteriori hypothesis (MAP): a* = argmax_a of the sum over the remaining variables of P(x, e), for a subset A of hypothesis variables. Finding the maximum-expected-utility (MEU) decision: d* = argmax_d of the sum over x of P(x | e, d) U(x, d), where U is a utility function over the decision variables D.

73 SP273 Belief Updating lung Cancer Smoking X-ray Bronchitis Dyspnoea P (lung cancer=yes | smoking=no, dyspnoea=yes ) = ?

74 SP274 “Moral” Graph Conditional Probability Distribution (CPD) Clique in moral graph (“family”)

75 SP275 Belief updating: P(X | evidence) = ? Example on the "moral" graph over A, B, C, D, E: P(a | e=0) is proportional to P(a, e=0) = Σ_{b,c,d} P(a) P(b|a) P(c|a) P(d|b,a) P(e=0|b,c) = P(a) Σ_c P(c|a) Σ_d Σ_b P(b|a) P(d|b,a) P(e=0|b,c) -- pushing the summations inside the product is the idea behind Variable Elimination.

76 SP276 Bucket elimination Algorithm elim-bel (Dechter 1996). Elimination operator: summation. Ordering A, E, D, C, B; initial buckets: bucket B: P(b|a), P(d|b,a), P(e|b,c); bucket C: P(c|a); bucket D: (empty); bucket E: e=0; bucket A: P(a). Processing the buckets from B down to A yields P(a|e=0); W* = 4 is the "induced width" (max clique size).
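
A minimal sketch of the summation (elimination) operator over factors, with an assumed dictionary-based factor representation of my own; elim-bel applies it bucket by bucket along the ordering:

```python
from itertools import product

def eliminate(factors, var, domains):
    """Sum a variable out of the product of the factors that mention it.
    A factor is (scope_tuple, dict mapping value tuples to numbers)."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted({v for s, _ in touching for v in s if v != var})
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        env = dict(zip(scope, vals))
        total = 0.0
        for x in domains[var]:
            env[var] = x
            prod = 1.0
            for s, t in touching:
                prod *= t[tuple(env[v] for v in s)]
            total += prod
        table[vals] = total
    return rest + [(tuple(scope), table)]

# Toy chain A -> B with P(A) and P(B|A); query P(B) by eliminating A.
domains = {"A": [0, 1], "B": [0, 1]}
pA  = (("A",), {(0,): 0.6, (1,): 0.4})
pBA = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(eliminate([pA, pBA], "A", domains))   # single factor on B: P(B=0)=0.62, P(B=1)=0.38
```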

77 SP277 Algorithm elim-mpe (Dechter 1996): finding the MPE value, max over a,b,c,d,e of P(a,b,c,d,e). Same bucket structure as elim-bel (bucket B: P(b|a), P(d|b,a), P(e|b,c); bucket C: P(c|a); bucket D: (empty); bucket E: e=0; bucket A: P(a)), but the elimination operator is maximization instead of summation. W* = 4 is the "induced width" (max clique size).

78 SP278 Generating the MPE-tuple: after all buckets have been processed, the variables are assigned in the order A, E, D, C, B, each taking the value that maximizes the product of the functions in its bucket given the previously assigned values.

79 SP279 Complexity of elimination: the effect of the ordering. (Figure: the same "moral" graph over A, B, C, D, E under two different orderings, which can yield different induced widths.)

80 SP280 Other tasks and algorithms MAP and MEU tasks: Similar bucket-elimination algorithms - elim-map, elim-meu (Dechter 1996) Elimination operation: either summation or maximization Restriction on variable ordering: summation must precede maximization (i.e. hypothesis or decision variables are eliminated last) Other inference algorithms: Join-tree clustering Pearl’s poly-tree propagation Conditioning, etc.

81 SP281 Relationship with join-tree clustering: a cluster is a set of buckets (a "super-bucket"). (Figure: a join tree with clusters ABC, BCE, ADB.)

82 SP282 Relationship with Pearl's belief propagation in poly-trees Pearl's belief propagation for a single-root query = elim-bel using a topological ordering and super-buckets for families ("diagnostic support" and "causal support" messages). Elim-bel, elim-mpe, and elim-map are linear for poly-trees.

83 SP283 Conditioning generates the probability tree Complexity of conditioning: exponential time, linear space

84 SP284 Conditioning+Elimination Idea: condition (split on variables) until the induced width w* of a (sub)problem gets small enough for elimination

85 SP285 Super-bucket elimination (Dechter and El Fattah, 1996) Eliminating several variables ‘at once’ Conditioning is done only in super-buckets

86 SP286 The idea of super-buckets Larger super-buckets (cliques) =>more time but less space Complexity: 1.Time: exponential in clique (super-bucket) size 2.Space: exponential in separator size

87 SP287 Application: circuit diagnosis Problem: Given a circuit and its unexpected output, identify faulty components. The problem can be modeled as a constraint optimization problem and solved by bucket elimination.

88 SP288 Time-Space Tradeoff

89 SP289 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

90 SP290 Mini-buckets: "local inference" The idea is similar to i-consistency: bound the size of recorded dependencies Computation in a bucket is time and space exponential in the number of variables involved Therefore, partition the functions in a bucket into "mini-buckets" over smaller numbers of variables

91 SP291 Mini-bucket approximation: MPE task Split a bucket into mini-buckets: maximizing each mini-bucket separately and multiplying the results upper-bounds the exact maximum over the whole bucket => bounded complexity at the price of an approximation

92 SP292 Approx-mpe(i) Input: i – max number of variables allowed in a mini-bucket Output: [lower bound (P of a sub-optimal solution), upper bound] Example: approx-mpe(3) versus elim-mpe
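
A sketch of the scope-partitioning step only; the greedy grouping rule and names are my own, and the real approx-mpe(i) then processes each mini-bucket with max separately:

```python
def partition_into_minibuckets(scopes, i):
    """Group the functions of one bucket so that each group (mini-bucket)
    mentions at most i variables. scopes: list of sets of variable names."""
    minibuckets = []                                  # each entry: [combined_scope, list_of_scopes]
    for scope in sorted(scopes, key=len, reverse=True):
        for combined, members in minibuckets:
            if len(combined | scope) <= i:            # scope fits into an existing mini-bucket
                combined.update(scope)
                members.append(scope)
                break
        else:
            minibuckets.append([set(scope), [scope]])
    return [members for _, members in minibuckets]

bucket_B = [{"B", "E", "C"}, {"B", "D", "A"}, {"B", "A"}]
print(partition_into_minibuckets(bucket_B, i=3))
# e.g. [[{'B','E','C'}], [{'B','D','A'}, {'B','A'}]] -- each group mentions at most 3 variables
```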

93 SP293 Properties of approx-mpe(i) Complexity: O(exp(2i)) time and O(exp(i)) space. Accuracy: determined by the upper/lower (U/L) bound ratio. As i increases, both accuracy and complexity increase. Possible uses of mini-bucket approximations: as anytime algorithms (Dechter and Rish, 1997); as heuristics in best-first search (Kask and Dechter, 1999). Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997)

94 SP294 Anytime Approximation

95 SP295 Empirical Evaluation (Dechter and Rish, 1997; Rish, 1999) Randomly generated networks Uniform random probabilities Random noisy-OR CPCS networks Probabilistic decoding Comparing approx-mpe and anytime-mpe versus elim-mpe

96 SP296 Random networks Uniform random: 60 nodes, 90 edges (200 instances) In 80% of cases, 10-100 times speed-up while U/L < 2 Noisy-OR – even better results: exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

97 SP297 CPCS networks – medical diagnosis (noisy-OR model) Test case: no evidence. Time (sec): elim-mpe – 115.8 on cpcs360, 1697.6 on cpcs422; anytime-mpe (one epsilon setting) – 70.3 on cpcs360, 505.2 on cpcs422; anytime-mpe (another epsilon setting) – 70.3 on cpcs360, 110.5 on cpcs422.

98 SP298 The effect of evidence More likely evidence=>higher MPE => higher accuracy (why?) Likely evidence versus random (unlikely) evidence

99 SP299 Probabilistic decoding Error-correcting linear block code State-of-the-art: approximate algorithm – iterative belief propagation (IBP) (Pearl’s poly-tree algorithm applied to loopy networks)

100 SP2100 Iterative Belief Propagation Belief propagation is exact for poly-trees IBP - applying BP iteratively to cyclic networks No guarantees of convergence Works well for many coding networks

101 SP2101 approx-mpe vs. IBP Bit error rate (BER) as a function of noise (sigma):

102 SP2102 Mini-buckets: summary Mini-buckets – local inference approximation Idea: bound the size of recorded functions Approx-mpe(i) - mini-bucket algorithm for MPE Better results for noisy-OR than for random problems Accuracy increases with decreasing noise in the noisy-OR CPTs Accuracy increases for likely evidence Sparser graphs -> higher accuracy Coding networks: approx-mpe outperforms IBP on low-induced-width codes

103 SP2103 Heuristic search Mini-buckets record upper-bound heuristics The evaluation function f over a partial assignment combines its exact cost g with the mini-bucket heuristic H Best-first: expand the node with the maximal evaluation function Branch and Bound: prune a node if its evaluation function (an upper bound) does not exceed the best solution found so far Properties: an exact algorithm; better heuristics lead to more pruning

104 SP2104 Heuristic Function Given a cost function P(a,b,c,d,e) = P(a) P(b|a) P(c|a) P(e|b,c) P(d|b,a), define an evaluation function over a partial assignment as the probability of its best extension: f*(a,e,d) = max_{b,c} P(a,b,c,d,e) = P(a) max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b) = g(a,e,d) * H*(a,e,d)

105 SP2105 Heuristic Function H*(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b) = max_c P(c|a) max_b P(e|b,c) P(b|a) P(d|a,b) <= max_c P(c|a) [max_b P(e|b,c)] [max_b P(b|a) P(d|a,b)] = H(a,e,d). Hence f(a,e,d) = g(a,e,d) H(a,e,d) >= f*(a,e,d). The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.

106 SP2106 Heuristic Function The mini-bucket preprocessing records functions bucket by bucket: bucket B (max_B): P(e|b,c), P(d|a,b), P(b|a); bucket C (max_C): P(c|a), h_B(e,c); bucket D (max_D): h_B(d,a); bucket E (max_E): h_C(e,a); bucket A (max_A): P(a), h_E(a), h_D(a). The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme and used to estimate the probability of the best extension of the partial assignment x^p = {x_1, ..., x_p}: f(x^p) = g(x^p) * H(x^p). For example, H(a,e,d) = h_B(d,a) * h_C(e,a) and g(a,e,d) = P(a).

107 SP2107 Properties Heuristic is monotone Heuristic is admissible Heuristic is computed in linear time IMPORTANT: Mini-buckets generate heuristics of varying strength using the control parameter (bound) i Higher bound -> more preprocessing -> stronger heuristics -> less search Allows a controlled trade-off between preprocessing and search

108 SP2108 Empirical Evaluation of mini-bucket heuristics

109 SP2109 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

110 SP2110 Stochastic Simulation Forward sampling (logic sampling) Likelihood weighting Markov Chain Monte Carlo (MCMC): Gibbs sampling

111 SP2111 Approximation via Sampling

112 SP2112 Forward Sampling (logic sampling (Henrion, 1988))

113 SP2113 Forward sampling (example) Drawback: high rejection rate!
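
A minimal forward (logic) sampling sketch with rejection, on a toy two-node network A -> B that is my own assumed example, not from the slides:

```python
import random

pA1 = 0.3                                     # P(A=1)
pB1_given_A = {0: 0.2, 1: 0.9}                # P(B=1 | A)

def forward_sample(rng=random):
    a = int(rng.random() < pA1)               # sample parents before children (topological order)
    b = int(rng.random() < pB1_given_A[a])
    return a, b

# Rejection (logic) sampling for P(A=1 | B=1): discard samples inconsistent with the evidence.
samples = [forward_sample() for _ in range(100_000)]
accepted = [a for a, b in samples if b == 1]
print(sum(accepted) / len(accepted))          # approx 0.66 = 0.3*0.9 / (0.3*0.9 + 0.7*0.2)
```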

114 SP2114 Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990) Works well for likely evidence! "Clamping" the evidence + forward sampling + weighting samples by the evidence likelihood
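
A likelihood-weighting sketch on the same assumed toy network: the evidence node B is never sampled, and each sample is weighted by the probability of the evidence given its parents.

```python
import random

pA1 = 0.3
pB1_given_A = {0: 0.2, 1: 0.9}

def likelihood_weighting(evidence_b, n=100_000, rng=random):
    """Clamp B = evidence_b, sample only A, weight each sample by P(B = evidence_b | A)."""
    num = den = 0.0
    for _ in range(n):
        a = int(rng.random() < pA1)
        w = pB1_given_A[a] if evidence_b == 1 else 1 - pB1_given_A[a]
        num += w * a
        den += w
    return num / den

print(likelihood_weighting(1))   # approx 0.66, with no rejected samples
```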

115 SP2115 Gibbs Sampling (Geman and Geman, 1984) Markov Chain Monte Carlo (MCMC): create a Markov chain of samples Advantage: guaranteed to converge to P(X) Disadvantage: convergence may be slow

116 SP2116 Gibbs Sampling (cont'd) (Pearl, 1988) Each variable is resampled from its conditional given its Markov blanket (its parents, children, and the children's other parents): P(x_i | markov_i) is proportional to P(x_i | pa_i) times the product of P(x_j | pa_j) over the children X_j of X_i.
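
A Gibbs-sampling sketch on an assumed toy chain A -> B -> C with evidence C = 1 (my own example): the two non-evidence variables are resampled in turn from their Markov-blanket conditionals.

```python
import random

pA1 = 0.3
pB1_given_A = {0: 0.2, 1: 0.9}
pC1_given_B = {0: 0.1, 1: 0.8}

def gibbs(n=200_000, burn_in=1_000, rng=random):
    a, b = 0, 0                                   # arbitrary initial state; C is clamped to 1
    hits = kept = 0
    for t in range(n):
        # Resample A from P(A | B=b): proportional to P(A) * P(b | A)   (B is A's Markov blanket)
        w1 = pA1 * (pB1_given_A[1] if b else 1 - pB1_given_A[1])
        w0 = (1 - pA1) * (pB1_given_A[0] if b else 1 - pB1_given_A[0])
        a = int(rng.random() < w1 / (w0 + w1))
        # Resample B from P(B | A=a, C=1): proportional to P(B | a) * P(C=1 | B)
        w1 = pB1_given_A[a] * pC1_given_B[1]
        w0 = (1 - pB1_given_A[a]) * pC1_given_B[0]
        b = int(rng.random() < w1 / (w0 + w1))
        if t >= burn_in:
            hits += a
            kept += 1
    return hits / kept

print(gibbs())   # approx 0.566 (exact: 0.219 / 0.387 by enumeration)
```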

117 SP2117 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations Local inference: mini-buckets Stochastic simulations Variational techniques MDPs

118 SP2118 Variational Approximations Idea: variational transformation of CPDs simplifies inference Advantages: Compute upper and lower bounds on P(Y) Usually faster than sampling techniques Disadvantages: More complex and less general: re-derived for each particular form of CPD functions

119 SP2119 Variational bounds: example log(x). Since log is concave, log(x) <= λx - log(λ) - 1 for every λ > 0, with equality at λ = 1/x. This approach can be generalized to any concave (convex) function in order to compute its upper (lower) bounds.

120 SP2120 Convex duality approach (Jaakkola and Jordan, 1997)
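
For a concave function f, convex (conjugate) duality expresses f as a minimum over linear upper bounds; the log example above is the special case shown on the right. This is the standard textbook form, stated here for concreteness rather than copied from the slide:

```latex
f(x) = \min_{\lambda}\bigl(\lambda^{T}x - f^{*}(\lambda)\bigr)
\quad\Longrightarrow\quad
f(x) \le \lambda^{T}x - f^{*}(\lambda)\ \text{for every } \lambda,
\qquad\text{e.g.}\qquad
\log x = \min_{\lambda>0}\bigl(\lambda x - \log\lambda - 1\bigr).
```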

121 SP2121 Example: QMR-DT network (Quick Medical Reference – Decision-Theoretic (Shwe et al., 1991)) Noisy-OR model: 600 diseases 4000 findings

122 SP2122 Inference in QMR-DT Inference complexity: O(exp(min{p, k})), where p = # of positive findings and k = max family size (Heckerman, 1989 ("Quickscore"); Rish and Dechter, 1998). Negative evidence remains factorized, while positive evidence "couples" the disease nodes.

123 SP2123 Variational approach to QMR-DT (Jaakkola and Jordan, 1997) The effect of positive evidence is now factorized (diseases are “decoupled”)

124 SP2124 Variational approach (cont.) Bounds on local CPDs yield a bound on posterior Two approaches: sequential and block Sequential: applies variational transformation to (a subset of) nodes sequentially during inference using a heuristic node ordering; then optimizes across variational parameters Block: selects in advance nodes to be transformed, then selects variational parameters minimizing the KL-distance between true and approximate posteriors

125 SP2125 Block approach

126 SP2126 Variational approach: summary Variational approximations were successfully applied to inference in QMR-DT and neural networks (logistic functions), and to learning (approximate E step in EM-algorithm) For more details, see: Saul, Jaakkola, and Jordan, 1996 Jaakkola and Jordan, 1997 Neal and Hinton, 1998 Jordan, 1999

127 SP2127 “Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations MDPs: Elimination and Conditioning

128 SP2128 Decision-Theoretic Planning State = {X, Y, Battery_Level} Actions = {Go_North, Go_South, Go_West, Go_East} Probability of success = P Task: reach the goal location ASAP Example: robot navigation

129 SP2129 Dynamic Belief Networks (DBNs) Two-stage influence diagram Interaction graph

130 SP2130 Markov Decision Process

131 SP2131 Dynamic Programming: Elimination

132 SP2132 Bucket Elimination Complexity: O(exp(w*))

133 SP2133 MDPs: Elimination and Conditioning Finite-horizon MDPs: dynamic programming = elimination along the temporal ordering (N slices) Infinite-horizon MDPs: Value Iteration (VI) = elimination along the temporal ordering (iterative); Policy Iteration (PI) = conditioning on the action variables A_j and elimination of the state variables X_j (iterative) Bucket elimination also allows "non-temporal" orderings Complexity: exponential in the induced width of the chosen ordering (O(exp(w*)))
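
A compact value-iteration sketch (standard formulation, with data structures and the toy example chosen by me): each sweep of the loop plays the role of eliminating one time slice of the DBN.

```python
def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a]: list of (probability, next_state); R[s][a]: immediate reward; gamma: discount."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

# Two-state toy example: "go" reaches the goal with probability 0.8, the goal yields reward 1.
states, actions = ["start", "goal"], ["stay", "go"]
P = {"start": {"stay": [(1.0, "start")], "go": [(0.8, "goal"), (0.2, "start")]},
     "goal":  {"stay": [(1.0, "goal")],  "go": [(1.0, "goal")]}}
R = {"start": {"stay": 0.0, "go": 0.0}, "goal": {"stay": 1.0, "go": 1.0}}
print(value_iteration(states, actions, P, R))
```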

134 SP2134 MDPs: approximations Open directions for further research: Applying probabilistic inference approximations to DBNs Handling actions (rewards) Approximating elimination, heuristic search, etc.

135 SP2135 Conclusions Common reasoning approaches: elimination and conditioning Exact reasoning is often intractable => need approximations Approximation principles: Approximating elimination – local inference, bounding size of dependencies among variables (cliques in a problem’s graph). Mini-buckets, IBP, i-consistency enforcing Approximating conditioning – local search, stochastic simulations Other approximations: variational techniques, etc. Further research: Combining “orthogonal” approximation approaches Better understanding of “what works well where”: which approximation suits which problem structure Other approximation paradigms (e.g., other ways of approximating probabilities, constraints, cost functions)

