Published by Clifton McDowell. Modified over 9 years ago.

Automated Planning and Decision Making 2007
Prof. Ronen Brafman
Various Subjects

Influence Diagrams

Example Problem
Decide where to have the party.
○ Influencing variable: Weather, either Rainy (R) or Sunny (S). Its value is unknown.
○ Decision: Location, one of Out (O), Porch (P), In (I). Its value is under "my" control.
Weather prior: P(S) = 0.4, P(R) = 0.6
[Diagram nodes: Location, Weather, Value]
Value node:
  Location, Weather   Value
  O, S                1
  P, S                0.95
  I, R                0.67
  I, S                0.57
  P, R                0.32
  O, R                0

Assigning Values
Assign values to the outcomes in the following manner:
○ Best gets 1.
○ Worst gets 0.
○ Every other outcome X gets the p such that X = p Best + (1-p) Worst, i.e. "the p for which we don't mind whether we get X or a lottery between Best and Worst" (Best with probability p, Worst with probability 1-p).
  O,S > P,S  > I,R  > I,S  > P,R  > O,R
  1     0.95   0.67   0.57   0.32   0

Influence Diagram
A directed graph with 3 node types:
○ Chance node (circle): a variable with a value "we don't control".
○ Decision node (square): a variable with a value "we control".
○ Value node (diamond): describes the value function.
A chance node has a conditional probability distribution over its values given its parents' values.
A value node has a value function given its parents' values.
A valid diagram must contain at least one value node and one decision node.

Influence Diagram: Parent-Child Relations
○ Parent of a chance node: a conditioning variable (as in a BN).
○ Parent of a decision node: a variable whose value is known at decision time.
○ Parent of a value node: a variable whose value directly influences the value/utility.
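
Example (cont.): Influence Diagrams
Let's see how to solve this ID. Find the average value of each location under the weather distribution P(S) = 0.4, P(R) = 0.6:
○ Given o (Outside): So: 1, Ro: 0. Average: 0.4*1 + 0.6*0 = 0.4
○ Given p (Porch): Sp: 0.95, Rp: 0.32. Average: 0.4*0.95 + 0.6*0.32 = 0.572, about 0.57
○ Given i (In): Si: 0.57, Ri: 0.67. Average: 0.4*0.57 + 0.6*0.67 = 0.63
The solution is In.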
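
The averages above can be checked with a short script. This is a minimal sketch, not part of the original slides; the names `P_WEATHER`, `VALUE`, `expected_values` and `best_location` are made up for illustration, and the table uses the values of this worked example (In is worth 0.57 if sunny and 0.67 if rainy).

```python
# Prior over the weather and the value table of the party example.
P_WEATHER = {"S": 0.4, "R": 0.6}
VALUE = {
    ("O", "S"): 1.0, ("O", "R"): 0.0,
    ("P", "S"): 0.95, ("P", "R"): 0.32,
    ("I", "S"): 0.57, ("I", "R"): 0.67,
}


def expected_values(p_weather, value):
    """Return {location: average value} for every location in the table."""
    locations = sorted({loc for loc, _ in value})
    return {
        loc: sum(p * value[(loc, w)] for w, p in p_weather.items())
        for loc in locations
    }


def best_location(p_weather, value):
    """Return the (location, average value) pair with the highest average."""
    averages = expected_values(p_weather, value)
    return max(averages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for loc, avg in expected_values(P_WEATHER, VALUE).items():
        print(loc, round(avg, 3))
    print("best:", best_location(P_WEATHER, VALUE))
```

Passing a different distribution over the weather to the same functions gives the averages for any other state of knowledge.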

Influence Diagram
Assume we have a weather detector. What will the diagram look like?
[Diagram nodes: Location, Weather, Value, Detector]
What would it mean to have an arrow in the opposite direction?
Detector reading given the weather:
  Weather   "s"    "r"
  s         0.7    0.3
  r         0.4    0.6

Example (cont.): Influence Diagrams
Let's see how to solve this ID. By Bayes:
P(S|"S") = (P("S"|S)*P(S))/P("S") = (0.7*0.4)/P("S") = 0.28/P("S")
P(R|"S") = (P("S"|R)*P(R))/P("S") = (0.4*0.6)/P("S") = 0.24/P("S")

Example (cont.): Influence Diagrams
By Bayes:
P(S|"S") = 0.28/P("S")
P(R|"S") = 0.24/P("S")
The two posteriors must sum to 1:
1 = (0.24+0.28)/P("S")
1 = 0.52/P("S")
0.52 = P("S")

Example (cont.): Influence Diagrams
By Bayes, with P("S") = 0.52:
P(S|"S") = 0.28/0.52 = 0.54
P(R|"S") = 0.24/0.52 = 0.46

Example (cont.): Influence Diagrams
Now we solve. For forecast "S", with P(S|"S") = 0.54 and P(R|"S") = 0.46:
○ Given o (Outside): So: 1, Ro: 0. Average: 0.54
○ Given p (Porch): Sp: 0.95, Rp: 0.32. Average: 0.66
○ Given i (In): Si: 0.57, Ri: 0.67. Average: 0.62
The solution is Porch.
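
The Bayes step and the choice for a given forecast can be combined in one small routine. The sketch below is illustrative only: the function names are hypothetical, the detector table is read as P(reading | weather), and the value table uses In = 0.57 if sunny and 0.67 if rainy, as in this worked example. It also handles the "r" forecast, which the slides do not work through, in exactly the same way.

```python
P_WEATHER = {"S": 0.4, "R": 0.6}
# P(reading | weather): outer key is the true weather, inner key the reading.
P_READING = {"S": {"s": 0.7, "r": 0.3}, "R": {"s": 0.4, "r": 0.6}}
VALUE = {
    ("O", "S"): 1.0, ("O", "R"): 0.0,
    ("P", "S"): 0.95, ("P", "R"): 0.32,
    ("I", "S"): 0.57, ("I", "R"): 0.67,
}
LOCATIONS = ("O", "P", "I")


def posterior(reading):
    """Return (P(reading), {weather: P(weather | reading)}) by Bayes' rule."""
    joint = {w: P_READING[w][reading] * P_WEATHER[w] for w in P_WEATHER}
    evidence = sum(joint.values())
    return evidence, {w: j / evidence for w, j in joint.items()}


def policy():
    """Map each reading to (best location, its average value, P(reading))."""
    result = {}
    for reading in ("s", "r"):
        evidence, post = posterior(reading)
        averages = {
            loc: sum(post[w] * VALUE[(loc, w)] for w in post)
            for loc in LOCATIONS
        }
        best = max(averages, key=averages.get)
        result[reading] = (best, averages[best], evidence)
    return result


if __name__ == "__main__":
    for reading, (loc, avg, p) in policy().items():
        print(reading, loc, round(avg, 3), round(p, 3))
```

Weighting each reading's best average by the probability of that reading gives the overall value of deciding with the detector.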

Influence Diagram
It is possible to have more than one decision node, in which case we require a total order over the decision nodes.
[Diagram nodes: Location, Weather, Value, Detector, Use Detector?]
Assume using the detector costs money.
Can we drop one?

Solving Influence Diagrams: Decision Tree
Create a decision tree following some topological order.
Evaluate the values bottom-up:
○ For a chance node, use the weighted average.
○ For a decision node, choose the maximum.
[Figure: decision tree for the party problem over Location (O, I, P) and Weather (S with probability 0.4, R with probability 0.6), with leaf values 1, 0, 0.57, 0.67, 0.95, 0.32.]
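
The bottom-up rule (weighted average at chance nodes, maximum at decision nodes) can be written as a short recursive function. The following is a sketch under assumptions: the classes `Leaf`, `Chance`, `Decision` and the function `evaluate` are hypothetical names, and the tree encodes the party problem with the values used in the worked example (In: 0.57 if sunny, 0.67 if rainy).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Chance:
    # outcome label -> (probability, subtree)
    branches: Dict[str, Tuple[float, "Node"]]


@dataclass
class Decision:
    # option label -> subtree
    options: Dict[str, "Node"]


Node = Union[Leaf, Chance, Decision]


def evaluate(node, plan=None, path=()):
    """Evaluate a tree bottom-up.

    If `plan` is a dict, the option chosen at each decision node is stored
    in it, keyed by the path of labels leading to that node.
    """
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Chance):
        # Weighted average over the outcomes.
        return sum(
            p * evaluate(child, plan, path + (label,))
            for label, (p, child) in node.branches.items()
        )
    # Decision node: take the maximum and remember which option achieved it.
    values = {
        label: evaluate(child, plan, path + (label,))
        for label, child in node.options.items()
    }
    best = max(values, key=values.get)
    if plan is not None:
        plan[path] = best
    return values[best]


def weather(sunny_value, rainy_value):
    """Chance node over the weather with P(S) = 0.4 and P(R) = 0.6."""
    return Chance({"S": (0.4, Leaf(sunny_value)), "R": (0.6, Leaf(rainy_value))})


party = Decision({
    "O": weather(1.0, 0.0),
    "P": weather(0.95, 0.32),
    "I": weather(0.57, 0.67),
})

if __name__ == "__main__":
    plan = {}
    print(round(evaluate(party, plan), 3), plan)
```

Storing the chosen option next to the maximum is what turns the evaluation into a policy rather than just a number.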

Solving Influence Diagrams: Variable Elimination
Create an expression describing the maximal average value.
For chance nodes use Σ (a sum). For decision nodes use (arg)max.
○ For example: [The expression was lost in extraction. Its annotations read: "For c.n." (chance node), "For d.n." (decision node), "Average value", "For c.n.: from the table", "For d.n.: 1".]
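
As an illustration of such an expression (this is not the formula from the original slide, which did not survive), the party problem can be written with L for the location decision, W for the weather chance node and V for the value function. The symbol D for the detector reading is notation introduced here.

```latex
% No detector: one decision node (max) outside one chance node (sum).
\max_{L} \sum_{W} P(W)\, V(L, W)
  = \max\{0.4,\; 0.572,\; 0.63\} = 0.63

% With a detector reading D known at decision time:
% sum over the reading, max over the decision, sum over the weather.
\sum_{D} \max_{L} \sum_{W} P(W)\, P(D \mid W)\, V(L, W)
```

The operators appear in the order in which information becomes available: a variable observed before the decision sits outside the max, a variable unknown at decision time sits inside it.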

Solving Influence Diagrams: Variable Elimination
The value of the expression is the average value of the optimal behavior for the decision problem.
The calculation is similar to the one for a BN, with the following differences:
○ Decision nodes are eliminated with max, where chance nodes are eliminated with a sum.
○ When evaluating max, remember the argmax.

Solving Influence Diagrams: Reduction to Bayesian Net
For a single decision node:
○ V is transformed into a chance node with 2 values: "0" and "1". The probability of "1" given its parents is the value of the parents according to the value function.
○ Assume we observed V's value as "1".
○ The decision node turns into a chance node in which all possible decisions have equal probability.
○ To find the average value of each decision, calculate the probability of V=1 given the decision.
○ Choose the maximizing decision.
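
The reduction can be tried on the party problem. The sketch below is an illustration with hypothetical function names; it relies on the values being normalized to the range 0 to 1 (Best = 1, Worst = 0) so that they can be read as probabilities, and it uses In = 0.57 if sunny and 0.67 if rainy as in the worked example.

```python
P_WEATHER = {"S": 0.4, "R": 0.6}
VALUE = {
    ("O", "S"): 1.0, ("O", "R"): 0.0,
    ("P", "S"): 0.95, ("P", "R"): 0.32,
    ("I", "S"): 0.57, ("I", "R"): 0.67,
}
LOCATIONS = ("O", "P", "I")


def prob_v1_given_location(loc):
    """P(V=1 | L=loc), where P(V=1 | loc, w) is defined as VALUE[(loc, w)]."""
    return sum(P_WEATHER[w] * VALUE[(loc, w)] for w in P_WEATHER)


def posterior_over_locations():
    """P(L | V=1) when L is a chance node with a uniform prior."""
    prior = 1.0 / len(LOCATIONS)
    joint = {loc: prior * prob_v1_given_location(loc) for loc in LOCATIONS}
    normalizer = sum(joint.values())
    return {loc: j / normalizer for loc, j in joint.items()}


if __name__ == "__main__":
    post = posterior_over_locations()
    print({loc: round(p, 3) for loc, p in post.items()})
    print("decision:", max(post, key=post.get))
```

Because the prior over decisions is uniform, the posterior P(L | V=1) is proportional to P(V=1 | L), so the most probable decision after observing V=1 is also the one with the highest average value.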

MDP as Influence Diagrams
Assume Horizon = 3.
[Diagram: decision nodes d1, d2, d3 and a Value node.]
What's missing? State!
[Diagram: decision nodes d1, d2, d3, state nodes s1, s2, s3 and a Value node.]

Value of Information

Calculating the Value of Information
For each possible value x of X, calculate V(X=x), the value of the optimal decision given that we know X=x.
Calculate the average value: Σ_{x ∈ X} P(X=x) V(X=x)
The value of information of X is the difference between:
○ the value of the optimal decision under the knowledge of X (the average above), and
○ the value of the optimal decision under the lack of knowledge of X.
The value of information will be 0 if:
○ No dependency.
○ The value is the same for any possible decision.

VoI for Party Problem
[Diagram nodes: Location {O,P,I}, Weather (S: 0.4, R: 0.6), Value]
Without knowing the weather:
○ Best decision: Indoor
○ Expected value: 0.63
If we know the weather:
○ If Sunny: outdoors (1)
○ If Rainy: indoors (0.67)
○ Expected value: 0.4*1 + 0.6*0.67 = 0.802
Value of perfect information on W: 0.802 - 0.63 = 0.172
Value node:
  Location, Weather   Value
  O, S                1
  P, S                0.95
  I, R                0.67
  I, S                0.57
  P, R                0.32
  O, R                0
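
The same calculation can be phrased as two small functions, one for deciding before X is known and one for deciding after. This is a sketch with hypothetical names; the value table passed in below is an assumption (In is worth 0.57 if sunny and 0.67 if rainy, matching the ordering on the Assigning Values slide and the 0.63 expected value), and a different table gives different numbers.

```python
def value_without_information(p_x, value, decisions):
    """Best average value when the decision is made before X is known."""
    return max(
        sum(p * value[(d, x)] for x, p in p_x.items())
        for d in decisions
    )


def value_with_information(p_x, value, decisions):
    """Average, over X, of the best value once X = x is known."""
    return sum(
        p * max(value[(d, x)] for d in decisions)
        for x, p in p_x.items()
    )


def value_of_information(p_x, value, decisions):
    """Value with knowledge of X minus value without it (never negative)."""
    return (value_with_information(p_x, value, decisions)
            - value_without_information(p_x, value, decisions))


P_WEATHER = {"S": 0.4, "R": 0.6}
VALUE = {
    ("O", "S"): 1.0, ("O", "R"): 0.0,
    ("P", "S"): 0.95, ("P", "R"): 0.32,
    ("I", "S"): 0.57, ("I", "R"): 0.67,
}

if __name__ == "__main__":
    print(round(value_of_information(P_WEATHER, VALUE, ("O", "P", "I")), 3))
```

The difference is never negative because the maximum of an average cannot exceed the average of the maxima; it is zero exactly when one decision is best for every value of X.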

Factored MDPs

MDP as Variables Assignment
○ V: state variables
○ A: actions
○ Tr: transition function, Tr: S×A×S → [0,1] or Tr: S×A → Π(S)
○ R: reward function
Main issue: representation
○ We don't want to represent Tr and R as tables (exponential in the number of variables in V).

Dynamic Bayesian Net
Replaces the transition matrix with a variant of Bayes nets.
One net per action.
Main assumption: each action affects only a few variables, and their values depend on only a few variables.
[Diagram: pre-action variables x1 ... xn and post-action variables x'1 ... x'n.]

Dynamic Bayesian Net
2-layer graph:
○ 1st layer: pre-action state variables (x)
○ 2nd layer: post-action state variables (x')
Dashed edge: the action does not affect xi.
Parents of x'i: the variables whose values influence the value of x'i.
The table associated with x'i quantifies this influence.
Tables can be described compactly.
No priors on xi.
[Diagram: pre-action variables x1, x2, x3, ..., xn-2, xn-1, xn and post-action variables x'1, x'2, x'3, ..., x'n-2, x'n-1, x'n.]
In practice, this representation is quite efficient.

The Reward Function
Use a decision tree. For example:
[Figure: decision tree over x and y with F/T branches; leaf values as extracted: 5 4-17]
Decision trees can be used to describe probability tables as well.
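
A tree-structured reward can be stored as nested nodes and evaluated by walking from the root to a leaf. The sketch below is only an illustration: the classes `Leaf` and `Split` are hypothetical, and the leaf values are made up, since the values in the original figure cannot be recovered reliably.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    reward: float


@dataclass
class Split:
    variable: str          # the boolean state variable tested at this node
    if_false: "Tree"
    if_true: "Tree"


Tree = Union[Leaf, Split]


def reward(tree: Tree, state: Dict[str, bool]) -> float:
    """Follow the branch matching `state` until a leaf is reached."""
    while isinstance(tree, Split):
        tree = tree.if_true if state[tree.variable] else tree.if_false
    return tree.reward


def count_leaves(tree: Tree) -> int:
    """Number of distinct reward entries the tree stores."""
    if isinstance(tree, Leaf):
        return 1
    return count_leaves(tree.if_false) + count_leaves(tree.if_true)


# Hypothetical tree: the reward depends on y only when x is true.
example_tree = Split("x", if_false=Leaf(5.0),
                     if_true=Split("y", if_false=Leaf(4.0), if_true=Leaf(7.0)))

if __name__ == "__main__":
    print(reward(example_tree, {"x": True, "y": False, "w": True}))
    print(count_leaves(example_tree))
```

The tree stores one entry per leaf, whereas a flat table over n boolean variables needs 2^n entries, even for variables the reward never looks at.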

The Percolation Phenomenon
Effects percolate.
This usually implies that the optimal policy cannot be described (and computed) compactly.
[Diagram: three layers of variables, x1 ... xn, x'1 ... x'n and x''1 ... x''n.]
○ After one step: each variable is affected by 2 others. Simple.
○ After two steps: each variable is affected by 3 others. Simple.
○ Eventually: each variable is affected by all others!
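
The growth of the dependency sets can be simulated directly. The sketch below assumes a hypothetical DBN in which each x'i has two parents, xi and its left neighbor (wrapping around at the ends); the structure and the function name are made up for illustration.

```python
def influence_after(parents, steps):
    """For each variable, the set of initial variables that can affect it
    after `steps` transitions of the same two-layer network."""
    influence = {v: {v} for v in parents}
    for _ in range(steps):
        # A variable inherits everything that influenced its parents
        # one step earlier.
        influence = {
            v: set().union(*(influence[p] for p in parents[v]))
            for v in parents
        }
    return influence


N = 6
# Hypothetical structure: x'_i depends on x_i and x_(i-1), indices modulo N.
PARENTS = {i: {i, (i - 1) % N} for i in range(N)}

if __name__ == "__main__":
    for steps in range(N):
        sizes = sorted(len(s) for s in influence_after(PARENTS, steps).values())
        print(steps, sizes)
```

Even though every table in the network mentions only two variables, after N-1 steps every variable depends on the whole initial state, which is why a value function over a long horizon tends to lose its compact structure.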
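
Solving Factored MDP
Example:
[Diagram: two-layer DBN over the variables w, x, y, z and w', x', y', z'.]
The reward function (tree over z):
○ z = T: 10
○ z = F: 0
Tree for y' (probability that y is true after the action):
○ y = T: 1.0
○ y = F, x = T: 0.9
○ y = F, x = F: 0.0
Tree for z' (probability that z is true after the action):
○ z = T: 1.0
○ z = F, y = T: 0.9
○ z = F, y = F: 0.0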

Solving Factored MDP
V0 (no steps left), by the "previous" reward, i.e. the tree for z:
○ z = T: 10
○ z = F: 0
V1 (one step left):
○ z = T: 1.0 (10) + 10 = 20
○ z = F, y = T: 0.9 (9) + 0 = 9
○ z = F, y = F: 0.0 (0) + 0 = 0
V2 (two steps left), with expansions according to the tree of y:
○ z = T: 30
○ z = F, y = T: 0.9*20 + 0.1*9 = 18.9
○ z = F, y = F, x = T: 0.9*9 + 0.1*0 = 8.1
○ z = F, y = F, x = F: 0
…
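
The values 20, 9, 18.9, 8.1 and 30 can be reproduced by brute force. The sketch below enumerates all states instead of merging trees, so it is a check of the numbers and not the structured procedure the slides describe. It assumes a single action, no discounting, the trees for y' and z' from the example, and (since the slides give no tree for x') that x keeps its value; all function names are hypothetical.

```python
from itertools import product

VARS = ("x", "y", "z")


def p_true_next(var, s):
    """P(var' = True | state s), following the trees of the example."""
    if var == "y":
        return 1.0 if s["y"] else (0.9 if s["x"] else 0.0)
    if var == "z":
        return 1.0 if s["z"] else (0.9 if s["y"] else 0.0)
    # Assumption: x simply keeps its value.
    return 1.0 if s[var] else 0.0


def reward(s):
    return 10.0 if s["z"] else 0.0


def states():
    for bits in product((False, True), repeat=len(VARS)):
        yield dict(zip(VARS, bits))


def key(s):
    return tuple(s[v] for v in VARS)


def transition_prob(s, t):
    """P(t | s): the post-action variables are independent given s."""
    prob = 1.0
    for v in VARS:
        p = p_true_next(v, s)
        prob *= p if t[v] else 1.0 - p
    return prob


def value_functions(horizon):
    """Return [V0, ..., V_horizon]; each maps (x, y, z) to a value."""
    values = [{key(s): reward(s) for s in states()}]
    for _ in range(horizon):
        prev = values[-1]
        values.append({
            key(s): reward(s) + sum(
                transition_prob(s, t) * prev[key(t)] for t in states()
            )
            for s in states()
        })
    return values


if __name__ == "__main__":
    v = value_functions(2)
    # State order in the key is (x, y, z).
    print(round(v[2][(False, True, False)], 3))
    print(round(v[2][(True, False, False)], 3))
```

With only three variables the enumeration is cheap; the point of the tree-based method is to avoid exactly this enumeration when the number of variables is large.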

Solving Factored MDP
○ In each step we get an explanation.
○ Need to explain all the variables.
○ Combine the explaining trees.
○ Once we know the values, calculate the average.