Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University.

Slides:

Advertisements

Similar presentations

Heuristic Search techniques

Advertisements

Planning with Non-Deterministic Uncertainty (Where failure is not an option) R&N: Chap. 12, Sect (+ Chap. 10, Sect 10.7)

Planning Module THREE: Planning, Production Systems,Expert Systems, Uncertainty Dr M M Awais.

A Hierarchical Multiple Target Tracking Algorithm for Sensor Networks Songhwai Oh and Shankar Sastry EECS, Berkeley Nest Retreat, Jan

Planning Module THREE: Planning, Production Systems,Expert Systems, Uncertainty Dr M M Awais.

MBD and CSP Meir Kalech Partially based on slides of Jia You and Brian Williams.

Probabilistic Planning (goal-oriented) Action Probabilistic Outcome Time 1 Time 2 Goal State 1 Action State Maximize Goal Achievement Dead End A1A2 I A1.

CS 484. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.

Planning Graphs * Based on slides by Alan Fern, Berthe Choueiry and Sungwook Yoon.

Dynamic Bayesian Networks (DBNs)

Planning with Loops Some New Results Yuxiao (Toby) Hu and Hector Levesque University of Toronto.

Best-First Search: Agendas

Algorithm Strategies Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.

1 J.Y Greff, L. Idoumghar and R. Schott TDF && IECN / LORIA - INRIA July 2002 Using Markov Decision Processes to the Frequency Assignment Problem.

Termination Detection. Goal Study the development of a protocol for termination detection with the help of invariants.

Concurrent Markov Decision Processes Mausam, Daniel S. Weld University of Washington Seattle.

Solving Probabilistic Combinatorial Games Ling Zhao & Martin Mueller University of Alberta September 7, 2005 Paper link:

Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 1 Process Scheduling for Performance Estimation and Synthesis.

1 Heuristic Search 4 4.0Introduction 4.1An Algorithm for Heuristic Search 4.2Admissibility, Monotonicity, and Informedness 4.3Using Heuristics in Games.

Constraint Satisfaction Problems

Concurrent Probabilistic Temporal Planning (CPTP) Mausam Joint work with Daniel S. Weld University of Washington Seattle.

Probabilistic Temporal Planning with Uncertain Durations Mausam Joint work with Daniel S. Weld University of Washington Seattle.

CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.

1 Structures and Strategies for State Space Search 3 3.0Introduction 3.1Graph Theory 3.2Strategies for State Space Search 3.3Using the State Space to Represent.

CS121 Heuristic Search Planning CSPs Adversarial Search Probabilistic Reasoning Probabilistic Belief Learning.

On the Task Assignment Problem : Two New Efficient Heuristic Algorithms.

Towards Scalable Critical Alert Mining Bo Zong 1 with Yinghui Wu 1, Jie Song 2, Ambuj K. Singh 1, Hasan Cam 3, Jiawei Han 4, and Xifeng Yan 1 1 UCSB, 2.

Automated Planning and HTNs Planning – A brief intro Planning – A brief intro Classical Planning – The STRIPS Language Classical Planning – The STRIPS.

1 Heuristic Search 4 4.0Introduction 4.1An Algorithm for Heuristic Search 4.2Admissibility, Monotonicity, and Informedness 4.3Using Heuristics in Games.

Distributed Scheduling. What is Distributed Scheduling? Scheduling: –A resource allocation problem –Often very complex set of constraints –Tied directly.

Factor Graphs Young Ki Baik Computer Vision Lab. Seoul National University.

By: Jeffrey Dean & Sanjay Ghemawat Presented by: Warunika Ranaweera Supervised by: Dr. Nalin Ranasinghe.

A Genetic Algorithms Approach to Feature Subset Selection Problem by Hasan Doğu TAŞKIRAN CS 550 – Machine Learning Workshop Department of Computer Engineering.

AMOST Experimental Comparison of Code-Based and Model-Based Test Prioritization Bogdan Korel Computer Science Department Illinois Institute of Technology.

Cristian Urs and Ben Riveira. Introduction The article we chose focuses on improving the performance of Genetic Algorithms by: Use of predictive models.

Using Abstraction in Multi-Rover Scheduling Bradley J. Clement and Anthony C. Barrett Artificial Intelligence Group Jet Propulsion Laboratory {bclement,

ROBUST RESOURCE ALLOCATION OF DAGS IN A HETEROGENEOUS MULTI-CORE SYSTEM Luis Diego Briceño, Jay Smith, H. J. Siegel, Anthony A. Maciejewski, Paul Maxwell,

May 2004 Department of Electrical and Computer Engineering 1 ANEW GRAPH STRUCTURE FOR HARDWARE- SOFTWARE PARTITIONING OF HETEROGENEOUS SYSTEMS A NEW GRAPH.

Study on Genetic Network Programming (GNP) with Learning and Evolution Hirasawa laboratory, Artificial Intelligence section Information architecture field.

Dr.Abeer Mahmoud ARTIFICIAL INTELLIGENCE (CS 461D) Dr. Abeer Mahmoud Computer science Department Princess Nora University Faculty of Computer & Information.

Stochastic DAG Scheduling using Monte Carlo Approach Heterogeneous Computing Workshop (at IPDPS) 2012 Extended version: Elsevier JPDC (accepted July 2013,

Computer Science CPSC 322 Lecture 13 Arc Consistency (4.5, 4.6 ) Slide 1.

Heuristics in Search-Space CSE 574 April 11, 2003 Dan Weld.

1 CSE 4705 Artificial Intelligence Jinbo Bi Department of Computer Science & Engineering

External Memory Value Iteration Stefan Edelkamp, Shahid Jabbar Chair for Programming Systems, University of Dortmund, Germany Blai Bonet Departamento de.

Problem Reduction So far we have considered search strategies for OR graph. In OR graph, several arcs indicate a variety of ways in which the original.

CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.

Artificial Intelligence Lecture No. 6 Dr. Asad Ali Safi Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology.

Ames Research Center Planning with Uncertainty in Continuous Domains Richard Dearden No fixed abode Joint work with: Zhengzhu Feng U. Mass Amherst Nicolas.

Scheduling MPI Workflow Applications on Computing Grids Juemin Zhang, Waleed Meleis, and David Kaeli Electrical and Computer Engineering Department, Northeastern.

Robust Planning using Constraint Satisfaction Techniques Daniel Buettner and Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science.

Automated Planning and Decision Making Prof. Ronen Brafman Automated Planning and Decision Making Fully Observable MDP.

Hypergraph Partitioning With Fixed Vertices Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov UCLA Computer Science Department

By J. Hoffmann and B. Nebel

Heuristic Search Planners. 2 USC INFORMATION SCIENCES INSTITUTE Planning as heuristic search Use standard search techniques, e.g. A*, best-first, hill-climbing.

Multiple-goal Search Algorithms and their Application to Web Crawling Dmitry Davidov and Shaul Markovitch Computer Science Department Technion, Haifa 32000,

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Achieving Goals in Decentralized POMDPs Christopher Amato Shlomo Zilberstein UMass.

CS 326A: Motion Planning Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces (1996) L. Kavraki, P. Švestka, J.-C. Latombe,

Relating Reinforcement Learning Performance to Classification performance Presenter: Hui Li Sept.11, 2006.

Class #17 – Thursday, October 27

Graphplan/ SATPlan Chapter

What to do when you don’t know anything know nothing

An Adaptive Middleware for Supporting Time-Critical Event Response

Class #19 – Monday, November 3

Resource Allocation for Distributed Streaming Applications

Graphplan/ SATPlan Chapter

Graphplan/ SATPlan Chapter

Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.

Depth-First Searches.

Presentation transcript:

Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University (Presented by: Li Li) AAAI 08 Chicago July 16, 2008

Outline Example domain Two usages of concurrent actions AO* and CPOAO* algorithms Heuristics used in CPOAO* Experiment results Conclusion and future work

A simple Mars rover domain Locations A, B, C and D on Mars: A B C D

Main features Aspects of complex domains Deadlines, limited resources Failures Oversubscription Concurrency Two types of parallel actions Different goals ( “ all finish ” ) Redundant ( “ early finish ” ) Aborting actions When they succeed When they fail

The actions ActionSuccess probability Description Move(L1,L2) 100%Move the rover from Location L1 to location L2 Sample (L) 70%Collect a soil sample at location L Camera (L) 60%Take a picture at location L

Problem 1 Initial state: The rover is at location A No other rewards have been achieved Rewards: r1 = 10: Get back to location A r2 = 2: Take a picture at location B r3 = 1: Collect a soil sample at location B r4 = 3: Take a picture at location C

Problem 1 Time Limit: The rover is only allowed to operate for 3 time units Actions: Each action takes 1 time unit to finish Actions can be executed in parallel if they are compatible

A solution to problem 1 (1) Move (A, B) (2) Camera (B) Sample (B) (3) Move (B, A) A B D C R4=3 R1=10 R2=2 R3=1

Add redundant actions Actions Camera0 (60%) and Camera1 (50%) can be executed concurrently. There are two rewards : R1: Take a picture P1 at location A R2: Take a picture P2 at location A

Two ways of using concurrent actions All finish: Speed up the execution Use concurrent actions to achieve different goals. Early finish: Redundancy for critical tasks Use concurrent actions to achieve the same goal.

Example of all finish actions If R1=10 and R2=10, execute Camera0 to achieve one reward and execute Camera1 to achieve another. (All finish) The expected total rewards = 10*60% + 10*50% = 11

Example of early finish actions If R1=100 and R2=10, Use both Camera0 and Camera1 to achieve R1. (Early finish) The expected total rewards = 100*50% + ( *50%)*60% = = 80

The AO* algorithm AO* searches in an and-or graph (hypergraph) OR AND Hyperarcs (compact way)

Concurrent Probabilistic Over- subscription AO* (CPOAO*) Concurrent action set Represent parallel actions rather than individual actions Use hyperarcs to represent them State Space Resource levels are part of a state Unfinished actions are part of a state

CPOAO* search Example A B C D A Mars Rover problem Map: Actions:  Move (Location, Location)  Image_S (Target) 50%, T= 4  Image_L (Target) 60%, T= 5  Sample (Target) 70%, T= 6 Targets:  I1 – Image at location B  I2 – Image at location C  S1 – Sample at location B  S2 – Sample at location D Rewards:  Have_Picture(I1) = 3  Have_Picture(I2) = 4  Have_Sample(S1) = 4  Have_Sample(S2) = 5  At_location(A) = 10;

CPOAO* search Example 18.4 S0 A B C D Expected reward calculated using the heuristics T=10

CPOAO* Search Example 15.2S0T=10 {Move(A,B)} {Move(A,C)} Do-nothing {Move(A,D)} S1S2S3 S T=7T=4T=6 T=10 A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action

CPOAO* search Example S115.2 {Move(B,D)} Do-nothing {Move(A,B)} {a1,a2,a3} a1: Sample(T2) a3: Image_L(T1) a2: Image_S(T1) S5S6S7 S T=3 T=7 50% Sample(T2)=2 Image_L(T1)=1 A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action S0 15.2

CPOAO* search Example S115.2 {Move(B,D)} Do-nothing {Move(A,B)} {a1,a2,a3} a1: Sample(T2) a3: Image_L(T1) a2: Image_S(T1) S5S6S7 S T=3 T=7 50% Sample(T2)=2 Image_L(T1)=1 A C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action S0 15.2

CPOAO* search Example S111.5 {Move(B,D)} Do-nothing {Move(A,B)} {a1,a2,a3} a1: Sample(T2) a3: Image_L(T1) a2: Image_S(T1) S5 S6 S7 S T=3 T=2 T=7 50% Sample(T2)=2 Image_L(T1)=1 S10S9S11 S12S13S14 S15S T=1 T=2T=3 T=0 T=3 a4: Move(B,A) {a1} 2 1 {a1,a2} a5: Do-nothing {a4} {a5} {a4}{a5} A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action S S2 13.2

CPOAO* search Example 13.2S0T=10 {Move(A,B)} {Move(A,C)} Do-nothing {Move(A,D)} S1 S2S3 S T=7 T=4 T=6 T=10 {a1,a2,a3} a1: Sample(T2) a3: Image_L(T1) a2: Image_S(T1) a4: Move(B,A) a5: Do-nothing A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action

CPOAO* search Example 11.5S0T=10 {Move(A,B)} {Move(A,C)} Do-nothing {Move(A,D)} S1 S2 S3 S T=7 T=4T=10 {a1,a2,a3} T=6 S17S18S19S20 T=2 T=1T= a1: Sample(T2) a3: Image_L(T1) a2: Image_S(T1) a4: Move(B,A) a5: Do-nothing a6: Image_S(T3) a8: Move(C,D) {a6,a7} {a8} Do-nothing % A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action a7: Image_L(T3)

CPOAO* search Example S111.5 {Move(B,D)} Do-nothing {Move(A,B)} {a1,a2,a3} a1: Sample(T1) a3: Image_L(T2) a2: Image_S(T2) S5 S6 S7 S T=3 T=2 T=7 50% Sample(T2)=2 Image_L(T1)=1 S10S9S11 S12S13S14 S15S T=1 T=2T=3 T=0 T=3 a4: Move(B,A) {a1} 2 1 {a1,a2} a5: Do-nothing {a4} {a5} {a4}{a5} S0 A B C D Values of terminal nodes Expected reward calculated using the heuristics Expected reward calculated from children Best action T=3

CPOAO* search improvements S111.5 {Move(B,D)} Do-nothing {Move(A,B)} {a1,a2,a3} S5 S6 S7 S T=3 T=2 T=7 50% Sample(T2)=2 Image_L(T1)=1 S10S9S11 S12S13S14 S15S T=1 T=2T=3 T=0 T=3 {a1} 2 1 {a1,a2} {a4} {a5} {a4}{a5} S0 Estimate total expected rewards Prune branches T=3 Plan Found:  Move(A,B)  Image_S(T1)  Move(B,A)

Heuristics used in CPOAO* A heuristic function to estimate the total expected reward for the newly generated states using a reverse plan graph. A group of rules to prune the branches of the concurrent action sets.

Estimating total rewards A three-step process using an rpgraph 1. Generate an rpgraph for each goal 2. Identify the enabled propositions 3. Compute the probability of achieving each goal Compute the expected rewards based on the probabilities Sum up the rewards to compute the value of this state

Heuristics to estimate the total rewards Have_Picture(I1) At_Location(B) Image_S(I1) Move(A,B) At_Location(A) At_Location(D)  Reverse plan graph  Start from goals.  Layers of actions and propositions  Cost marked on the actions  Accumulated cost marked on the propositions Move(A,B) Image_L(I1) At_Location(B) Move(D,B) At_Location(D) At_Location(A) Move(D,B) 4 9

Heuristics to estimate the total rewards  Given a specific state … At_Location(A)=T Time= 7  Enabled propositions are marked in blue. Have_Picture(I1) At_Location(B) Image_S(I1) Move(A,B) At_Location(A) At_Location(D) Move(A,B) Image_L(I1) At_Location(B) Move(D,B) At_Location(D) At_Location(A) Move(D,B) 4 9

Heuristics to estimate the total rewards  Enable more actions and propositions  Actions are probabilistic  Estimate the probabilities that propositions and the goals being true  Sum up rewards on all goals. Have_Picture(I1) At_Location(B) Image_S(I1) Move(A,B) At_Location(A) At_Location(D) Move(A,B) Image_L(I1) At_Location(B) Move(D,B) 4 9

Rules to prune branches (when time is the only resource) Include the action if it does not delete anything Ex. {action-1, action-2, action-3} is better than {action-2,action-3} if action-1 does not delete anything. Include the action if it can be aborted later Ex. {action-1,action-2} is better than {action-1} if the duration of action2 is longer than the duration of action-1. Don ’ t abort an action and restart it again immediately

Experimental work Mars rover problem with following actions Move Collect-Soil-Sample Camera0 Camera1 3 sets of experiments - Gradually increase complexity Base Algorithm With rules to prune branches only With both heuristics

Results of complexity experiments ProblemBase CPOAO*CPOAO* With Pruning OnlyCPOAO* With Pruning and rpgraph Time=20 Time=40Time=20Time=40 NGET (s)NGET (s)NGET (s)NGET (s)NGET (s) <0.1120< <0.1662< <0.1287< < <0.1204< < < <0.1380< < <0.1180< < <0.1417< < <0.1347< < < <0.1694< < Problem: locations-paths-rewards; NG: The number of nodes generated; ET: Execution time (sec.)

Ongoing and Future Work Ongoing Add typed objects and lifted actions Add linear resource consumptions Future Explore the possibility of using state caching Classify domains and develop domain-specific heuristic functions Approximation techniques

Related Work LAO*: A Heuristic Search Algorithm that finds solutions with loops (Hansen & Zilberstein 2001) CoMDP: Concurrent MDP (Mausam & Weld 2004) GSMDP: Generalized semi-Markov decision process (Younes & Simmons 2004) mGPT: A Probabilistic Planner based on Heuristic Search (Bonet & Geffner 2005) Over-subscription Planning (Smith 2004; Benton, Do, & Kambhampati 2005) HAO*: Planning with Continuous Resources in Stochastic Domains (Mausam, Benazera, Brafman, Meuleau & Hansen 2005)

Related Work  CPTP:Concurrent Probabilistic Temporal Planning (Mausam & Weld 2005)  Paragraph/Prottle: Concurrent Probabilistic Planning in the Graphplan Framework (Little & Thiebaux 2006)  FPG: Factored Policy Gradient Planner (Buffet & Aberdeen 2006)  Probabilistic Temporal Planning with Uncertain Durations (Mausam & Weld 2006)  HYBPLAN:A Hybridized Planner for Stochastic Domains (Mausam, Bertoli and Weld 2007)

Conclusion An AO* based modular framework Use redundant actions to increase robustness Abort running actions when needed Heuristic function using reverse plan graph Rules to prune branches