Planning With Agents: An Efficient Approach Using Hierarchical Dynamic Decision Networks
Engineering Societies in the Agents World Workshop 2003
William H. Turkett, Jr. and John R. Rose [turkett,
Department of Computer Science and Engineering, University of South Carolina
Problem Description
Increasingly complex domains for intelligent systems:
Large numbers of agents
Complex goals that require good long-term strategies to achieve
Dynamic and partially observable environments
Enabling tool: multi-agent systems using efficient, high-quality, and realistic planning
A Vision of Social Planning
Quality planning for locally decomposed tasks
Multi-agent interaction to achieve global goals
Improving global interaction by taking advantage of planning:
Knowledge about others
Knowledge from others
Social structure
Current focus: quality local planning, and directing local planning to facilitate global interactions
Facilitating Global Interactions
Agents should be able to take advantage of knowledge of what other agents are doing:
Discover enabling/hindering actions
Literature suggests sharing abstracted plans: conflict resolution, high-level commitments
Agents should be able to respond to shared knowledge of environmental dynamics:
Reprioritized goals
Changing costs
Action success and failure rates
Exploit organizational structure in distributing planning tasks
Quality Local Planning
Markov Decision Processes (MDPs): the most realistic means of posing the planning problem
Probabilistic action effects
Partial observability (POMDP)
Prioritized goals
Solution: a policy defining what action to take in a given state
Complexity: fully observable, polynomial; partially observable, exponential
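For reference, these are the standard Bellman optimality equations behind such a policy, written with the MDP tuple <S,A,T,R,D> notation used later in these slides (D is the discount factor); this is textbook background, not material from the paper itself:

$$
V^*(s) = R(s) + D \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s'),
\qquad
\pi^*(s) = \operatorname{arg\,max}_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s')
$$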
Planning Architecture
Run-time planning algorithm
Use dynamic decision networks (DDNs) to represent POMDPs
Take advantage of hierarchical abstractions
Use approximations in lookahead depth and in observability assumptions to reduce complexity
Modeling POMDPs as DDNs
Nodes in each timeslice of the network: Actions, World State, Observations, Costs and Rewards
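A minimal sketch of that node layout, as one might hand-roll it in Python; the class and function names here are illustrative, not from the paper:

```python
# One timeslice of a DDN modeling a POMDP: a decision node for the action,
# chance nodes for the world state and the observation, and a utility node
# for costs and rewards. The representation is a hand-rolled sketch.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                                    # "decision", "chance", or "utility"
    parents: list = field(default_factory=list)  # parent node names

def make_ddn_slice(t):
    action = Node(f"A{t}", "decision")
    state = Node(f"S{t}", "chance", parents=[f"A{t}", f"S{t-1}"])
    obs = Node(f"O{t}", "chance", parents=[f"S{t}"])
    reward = Node(f"R{t}", "utility", parents=[f"S{t}", f"A{t}"])
    return [action, state, obs, reward]
```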
HDDN Hierarchy: Taxi Domain
[Hierarchy diagram: Get and Put are abstract actions; Get decomposes into Pickup and Navigate, Put into Putdown and Navigate; Navigate decomposes into the primitives North, South, East, West. State variables: TaxiLocation, DestinationLocation, PassengerLocation.]
HDDN Planning: Taxi Domain
Algorithm: if the action is abstract, plan its implementation; else, if primitive, execute the action.
[Example trace through the hierarchy: the root (Taxi) selects Get over Put; Get selects Navigate over Pickup; Navigate selects North, East; the first primitive, North, is executed.]
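A minimal sketch of this recursion, assuming a plan_with_ddn routine that runs lookahead in an abstract action's DDN and returns its best action sequence; all names are illustrative, not the paper's implementation:

```python
def hddn_step(action, belief, hierarchy, plan_with_ddn):
    """Recurse down the HDDN hierarchy until a primitive action is reached.

    hierarchy maps each abstract action name to its DDN; plan_with_ddn
    (passed in, assumed) returns the best action sequence for a DDN
    given the current belief state.
    """
    if action not in hierarchy:      # primitive: nothing left to plan
        return action                # ready to execute
    sequence = plan_with_ddn(hierarchy[action], belief)
    # Only the first scheduled action is planned down to a primitive;
    # later steps are left abstract for now.
    return hddn_step(sequence[0], belief, hierarchy, plan_with_ddn)

# e.g. hddn_step("Taxi", belief, hierarchy, plan_with_ddn) might descend
# Taxi -> Get -> Navigate and return the primitive "North" for execution.
```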
Minimizing Observation Dependence
Limiting horizon:
The further the agent looks ahead, the better its knowledge about the true value of taking actions, but the more complex the computation
Approximation: estimate the true value function with a FOMDP solution and look ahead only a small number of steps
Limiting observations:
The more observations the agent takes into account, the better its knowledge of the uncertainty it will face in the future
Approximation: consider only one observation when planning
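A sketch of how these two approximations combine in a depth-limited lookahead; the model object and its methods are assumptions for illustration, not the paper's code:

```python
def lookahead_value(belief, depth, model, mdp_value):
    """Depth-limited lookahead over belief states.

    model is an assumed object exposing .actions, .expected_reward(belief, a),
    and .step(belief, a) -> the successor belief under the single most
    likely observation (the one-observation approximation above). At the
    horizon cutoff, the belief's value is bootstrapped from a precomputed
    fully observable MDP value function (the FOMDP estimate above).
    """
    if depth == 0:
        # FOMDP estimate: expected MDP value under the current belief.
        return sum(p * mdp_value[s] for s, p in belief.items())
    return max(
        model.expected_reward(belief, a)
        + lookahead_value(model.step(belief, a), depth - 1, model, mdp_value)
        for a in model.actions)
```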
Compared Approaches
HDDN (our approach)
POMDP (Cassandra): exact flat POMDP solver
HPOMDP (Pineau and Thrun): define a hierarchical structure of smaller POMDPs, and solve each one with an exact solver
Comparison of Approaches
HDDN incurs runtime planning costs instead of using pre-execution policy generation and table lookup, and produces approximate solutions
However: POMDP/HPOMDP may incur no runtime planning cost!
Initial tests demonstrate that our approximate solutions are as good as HPOMDP's, and a fair approximation of optimal POMDP solutions
Integrating into Social Planning
Abstract plans, social structure, and shared knowledge, approach by approach:
HDDN
Abstract plans: datastructure generated during the planning cycle
Social structure: distribute DDNs to a hierarchical organization; estimates of low-level implementation
Shared knowledge: DDNs are easy to update at runtime; fully observable MDP estimates are fast
HPOMDP
Abstract plans: requires extra selection and prediction steps
Social structure: distribute POMDPs to a hierarchical organization; complete knowledge of low-level implementation
Shared knowledge: regenerate all POMDP policies
POMDP
Abstract plans: no abstraction used
Shared knowledge: regenerate the complete POMDP policy
Test Domains: Parts Problem
[Policy diagrams. Optimal policy (POMDP): Inspect; on a BL (blemished) observation, Reject; on NBL, Paint then Ship. HPOMDP/HDDN policy: the same, except Paint is taken twice before Ship.]
Test Domains: Cheese Taxi and Large Cheese Taxi
Cheese Taxi: 33 states. Large Cheese Taxi: 300 states.
Test Domains: Cheese Taxi
Test Domains: Large Cheese Taxi
No Solution!
Test Domains: Twenty Questions
Overview
Allow for quality local planning:
An algorithm for runtime POMDP planning using hierarchical dynamic decision networks
Relatively efficient and high-quality
Direct local planning to facilitate global interactions:
Ability to directly share the abstracted plan datastructure generated by HDDNs
Ability to easily integrate shared knowledge of the environment
Flexibly fits a common hierarchical organizational structure
Extra Slides Follow
Towards Social Planning…
Incorporating social laws and philosophies into planning (through utility valuations)
How do local decisions affect the society as a whole?
Hierarchical distributed POMDPs
Taking into account joint actions/joint rewards in the DDNs
Markov Decision Problems
(The Russell and Norvig grid world.)
Transition model: the agent moves in the intended direction with probability 0.8, slips to each perpendicular direction with probability 0.1, and never moves backward (0.0)
Actions: North, South, East, West
R(s) = -0.4, discount D < 1
MDP: <S,A,T,R,D>
Solution: optimal policy
POMDP variant, observing just walls: Left, Right, Neither, Both, Good, Bad
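As a concrete illustration of that transition model, a small Python sketch (the wall-handling convention and helper names are assumptions):

```python
# Stochastic grid-world transitions: 0.8 in the intended direction,
# 0.1 to each perpendicular direction, never backward. Bumping into a
# wall leaves the agent in place (the usual convention for this example).
MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

def perpendicular(action):
    return ("East", "West") if action in ("North", "South") else ("North", "South")

def transition(state, action, walls):
    """Return a dict mapping successor states to their probabilities."""
    outcomes = {}
    left, right = perpendicular(action)
    for direction, prob in ((action, 0.8), (left, 0.1), (right, 0.1)):
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        if nxt in walls:
            nxt = state
        outcomes[nxt] = outcomes.get(nxt, 0.0) + prob
    return outcomes
```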
MDP Complexity: Difficult!
Consider all possible states
Consider all possible actions
Consider all possible transitions on those actions
Consider all possible observations on those actions (POMDP)
Do this for multiple timesteps
Complexity: FOMDP, polynomial; POMDP, exponential in the horizon
Reducing Complexity
Collapsing the state/action space through hierarchical abstraction
Limiting the horizon
Minimizing observation dependence
But how well will our agent perform? How close is our approximate solution to the optimal policy?
Hierarchical Domain Model
Process is an abstract action that represents the part being both painted and shipped.
Defining state and observation distributions: in the input model, after the Ship action is taken, the system enters the new-part state, so the transition on Process is defined as a transition to the new-part state. The observation for Process is the observation seen that corresponds to the new-part state.
Hierarchical Domain Model
Reward/cost estimation:
Solve the subproblem as a FOMDP (polynomial time), ignoring pure query actions
This returns a fully observable policy and value function for the Process abstract action in each state
Reviewing the policy diagram: if an initial state reaches the Process final state (painted and shipped), use the MDP value function for that state as the estimate; if an initial state never reaches the Process final state, use a large cost overestimate instead. Anything blemished would never be shipped under the optimal Process MDP policy, so the agent should never choose this action when blemished.
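The FOMDP estimation step is plain value iteration; a minimal sketch under assumed dictionary-shaped model inputs (T[s][a] maps successor states to probabilities, R[s][a] is a reward; the layout and the discount value are assumptions):

```python
def fomdp_value(states, actions, T, R, discount=0.95, eps=1e-6):
    """Value iteration for the fully observable subproblem: returns the
    value function used to estimate the reward/cost of an abstract action."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + discount * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V
```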
Analysis of HDDN
The complexity of each DDN is related to the size of the largest table required by the variable elimination algorithm
Maximum table for an exact solution: ~O(|A|^t * |O|^t), with a decision node and an observation node in each timeslice
HDDN algorithm: our approach reduces |A| in each network through abstraction, considers only |O| observations (rather than |O|^t) due to the myopic assumption, and uses small values of t
Hierarchical Task Network
Delaying Decisions
Future plans are often subject to change as the world changes and the agent gathers more information. This is very true for detailed plans, and somewhat less true for high-level plans. Don't waste time now developing plans that are likely to change.
Our approach: only find a complete method of implementing the first scheduled action from each DDN; plan steps further away are planned at proportionally shallower levels.
Planning in Agent Societies
Distribute decisions to appropriate agents:
Managers: high-level goals; a high-level view of the information; implementation details not important
Low-level agents: have the detailed knowledge for determining how to implement the tasks they are given; primarily independent tasks
HDDN Planning
Define a DDN for each abstract action
Given the hierarchy of DDNs, start with the most abstract level
Generate the optimal sequence of actions given current beliefs for abstract level i
If the first selected action is abstract, select the DDN corresponding to that action and generate the optimal sequence of actions for level i+1
If the first selected action is primitive, execute that action
HDDN Planning: Taxi Domain
[Plan trace: the abstract plan is Get, Put; Get expands to Navigate, Pickup and Put expands to Navigate, Putdown; the two Navigate steps expand to the primitives North, East and South, East respectively.]
Hierarchical Domain Model
[Diagrams: the input problem, with primitive actions Inspect, Reject, Paint, and Ship over part states; and the hierarchical abstraction, in which Paint and Ship are replaced by the abstract action Process.]
Need to define for Process: a state transition model, an observation model, and rewards
Hierarchical Domain Model
Transition model: effects are the desired results of abstract action completion
Observation model: the observation seen for the desired completion state
Rewards/costs: FOMDP estimation
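A speculative sketch of wiring those three pieces together for an abstract action such as Process, reusing the dictionary conventions of the earlier sketches; the deterministic jump to the completion state is a simplification of what the slides describe:

```python
def abstract_action_model(states, final_state, final_obs, mdp_value):
    """Build illustrative model entries for one abstract action:
    transitions jump to the desired completion state, the observation is
    the one corresponding to that state, and rewards/costs come from the
    FOMDP value estimate. All names and shapes are assumptions."""
    T = {s: {final_state: 1.0} for s in states}   # desired effect of completion
    O = {final_state: {final_obs: 1.0}}           # completion observation
    R = {s: mdp_value[s] for s in states}         # FOMDP reward/cost estimate
    return T, O, R
```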
Planning in Agent Societies
Sharing abstract plans:
POMDP, HPOMDP: impossible (POMDP) or requires an extra step (HPOMDP) to generate abstractions from the policy
HDDN: the abstracted plan datastructure is always available, and is what the agent is designed to work on
Handling shared knowledge of the environment:
POMDP, HPOMDP: policies are built before execution, so a change in environmental dynamics requires regenerating the policy
HDDN: runtime planning, so changes to environmental dynamics can be easily absorbed into the DDNs; recomputing the MDP estimates is relatively inexpensive compared to regenerating an exact POMDP policy
Test Domain: Parts Problem
States: 4 (FL = flawed, BL = blemished, PA = painted, ! = not):
!FL-!BL-!PA, !FL-!BL-PA, FL-!BL-PA, FL-BL-!PA
Actions: Inspect, Reject, Paint, Ship
Rewards/Costs:
+1 for (Reject, FL-BL-!PA) and (Ship, !FL-!BL-PA)
-1 for Reject or Ship in any other state
0 for Inspect or Paint
Abstractions: Process
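For concreteness, the reward model above written out as plain Python data (state names abbreviated exactly as on the slide; the encoding itself is illustrative):

```python
STATES = ["!FL-!BL-!PA", "!FL-!BL-PA", "FL-!BL-PA", "FL-BL-!PA"]
ACTIONS = ["Inspect", "Reject", "Paint", "Ship"]

def reward(state, action):
    # +1 only for rejecting the flawed-blemished part or shipping the
    # good painted part; other Reject/Ship choices are penalized.
    if (action, state) in {("Reject", "FL-BL-!PA"), ("Ship", "!FL-!BL-PA")}:
        return 1
    if action in ("Reject", "Ship"):
        return -1
    return 0  # Inspect and Paint carry no direct reward or cost
```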
Test Domains: Twenty Questions
Information-intensive domain
States (12): Tomato, Apple, Banana, Mushroom, Potato, Carrot, Monkey, Rabbit, Robin, Marble, Ruby, Coal
Actions (20): GuessState for each state, plus AskAnimal, AskVegetable, AskMineral, AskFruit, AskWhite, AskBrown, AskRed, AskHard
Rewards/Costs:
-1 for each question
-20 for an incorrect guess
+5 for a correct guess
Abstractions: subtaskVegetable, subtaskAnimal, subtaskMineral, subtaskFruit, subtaskRealVegetable
Uncertainty:
Answers: correct 85%, incorrect 10%, noise 5%
An outsider can change the object with 11% probability
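The noisy answers are what makes this a POMDP: after each question the agent must update its belief over the 12 objects. A generic Bayes belief update sketch, with assumed dictionary-shaped models (T[s][a] maps successor states to probabilities, O[s2][a][o] is the probability of the observation):

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update: push the belief through the
    transition model, weight by the observation likelihood, renormalize."""
    predicted = {}
    for s, p in belief.items():
        for s2, pt in T[s][action].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * pt
    weighted = {s2: p * O[s2][action][observation] for s2, p in predicted.items()}
    total = sum(weighted.values())
    return {s2: p / total for s2, p in weighted.items()}
```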
Test Domains: Cheese Taxi
States: 30 (10 locations × 3 destinations)
Actions (7): North, South, East, West, Pickup, Putdown, Query
Rewards/Costs:
-1 for every move
-10 for an incorrect pickup or putdown
+20 for a correct putdown
Abstractions: Get, Put, Navigate
Task: the agent has to pick up the passenger from S10 and deliver them to S0 or S4
Uncertainty:
Initial location (walls observed only)
Requested passenger destination
The passenger can switch destination in S2 or S6