
1 AN ADAPTIVE PLANNER BASED ON LEARNING OF PLANNING PERFORMANCE Kreshna Gopal & Thomas R. Ioerger Department of Computer Science Texas A&M University College Station, TX

2 APPROACHES TO PLANNING
- Planning as Problem Solving
- Situation Calculus Planning
- STRIPS
- Partial Order Planning
- Hierarchical Planning
- Enhanced Language Expressiveness
- Planning with Constraints
- Special-Purpose Planning
- Reactive Planning
- Plan Execution and Monitoring
- Distributed, Continual Planning
- Planning Graphs
- Planning as Satisfiability
- Machine Learning Methods for Planning

3 MACHINE LEARNING METHODS FOR PLANNING
- Learning Macro-operators
- Learning Bugs and Repairs
- Explanation-Based Learning
- Reinforcement Learning
- Case-Based Planning
- Plan Reuse

4 PLAN REUSE: ISSUES
- Plan storage and indexing
- Plan retrieval: matching a new problem with previously solved ones
- Plan modification: adapting a retrieved plan to the requirements of the new problem
- Nebel & Koehler: plan matching is NP-hard, and plan modification can be worse than plan generation
- Motivations of the proposed method:
  - Avoid plan modification altogether
  - Very efficient matching using a neural network

5 COMPONENTS OF PROPOSED PLANNING SYSTEM
- Default planner
- Plan library of solved cases (I: initial state, G: goal state, P: solution plan):
  (I_1, G_1, P_1), (I_2, G_2, P_2), ..., (I_n, G_n, P_n)
- Training: predict the default planner's performance using a neural network

6 SCHEME OF REUSE
- New problem: (I_new, G_new); retrieved case: (I_k, G_k, P_k)
- Proposed approach: instead of generating P_new from scratch, use the default planner to generate P_I (a plan from I_new to I_k) and P_G (a plan from G_k to G_new), and return the concatenation of P_I, P_k, and P_G as the solution

7 DISTANCE AND GAIN METRICS
- Distance(I, G): time the default planner will take to solve the problem (I, G)
- Gain(I_new, G_new, I_k, G_k) = Distance(I_new, G_new) / [Distance(I_new, I_k) + Distance(G_k, G_new)]
- Choose the case with maximum Gain
- There should be a minimum Distance and a minimum Gain for reuse
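The Gain metric above can be sketched in a few lines of Python. This is an illustrative sketch only: `distance` stands in for the learned time predictor, states are arbitrary hashable values, and the library is a list of (I, G, P) triples; none of these names come from the paper.

```python
# Sketch of the Distance/Gain case-selection metric.
# `distance` is a hypothetical stand-in for the neural-network time predictor.

def gain(distance, i_new, g_new, i_k, g_k):
    """Gain = Distance(I_new,G_new) / [Distance(I_new,I_k) + Distance(G_k,G_new)]."""
    denom = distance(i_new, i_k) + distance(g_k, g_new)
    return distance(i_new, g_new) / denom

def best_case(distance, i_new, g_new, library):
    """Return the library case (I_k, G_k, P_k) with maximum Gain."""
    return max(library,
               key=lambda case: gain(distance, i_new, g_new, case[0], case[1]))
```

A Gain above 1 means bridging into and out of the retrieved case is predicted to be cheaper than solving the new problem directly, which is why a minimum Gain threshold makes reuse worthwhile.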

8 THE TRAINING PHASE
- Target function: time prediction, t
- Training experience: solved examples, D
- Target function representation: t = Σ_{i=0..n} w_i f_i  (n features, f_0 = 1)
- Learning algorithm: gradient descent, which minimizes the error E of the weight vector w:
  E(w) = ½ Σ_{d ∈ D} (t_d − o_d)²

9 GRADIENT DESCENT ALGORITHM
Inputs:
1. Training examples, where each example is a pair ⟨x, t⟩, with x the input vector and t the target output value
2. Learning rate, η
3. Number of iterations, m

Initialize each w_i to some small random value
repeat m times {
  Initialize each Δw_i to 0
  for each training example ⟨x, t⟩ do {
    Find the output o of the unit on input x
    for each linear unit weight w_i do
      Δw_i = Δw_i + η (t − o) x_i
  }
  for each linear unit weight w_i do
    w_i = w_i + Δw_i
}
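The pseudocode above can be turned into a short Python sketch. The feature encoding (x[0] = 1 as the bias feature f_0) follows the slide; the learning rate and iteration count are illustrative defaults, not values from the paper.

```python
# Minimal batch gradient-descent trainer for the linear unit t = sum_i w_i * f_i
# (with f_0 = 1), following the slide's pseudocode.
import random

def train(examples, eta=0.01, iterations=1000):
    """examples: list of (x, t) pairs; x includes the bias feature x[0] = 1."""
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]  # small random init
    for _ in range(iterations):
        delta = [0.0] * n                                # each Δw_i starts at 0
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))     # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]         # accumulate Δw_i
        for i in range(n):
            w[i] += delta[i]                             # batch weight update
    return w
```

Because the updates are accumulated over all of D before being applied, this is true batch gradient descent on E(w), as on the previous slide, rather than the incremental (stochastic) variant.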

10 FEATURE EXTRACTION
- Feature extraction by domain experts: a knowledge-acquisition bottleneck
- Automatic feature extraction methods can be used
- Domain dependence: domain knowledge is crucial for efficient planning systems

11 PLAN RETRIEVAL AND REUSE ALGORITHM
Inputs: LIBRARY, w, MinGain, MinTime, and a new problem ⟨I_new, G_new⟩

if Distance(I_new, G_new) < MinTime then
  Call default planner to solve ⟨I_new, G_new⟩
else {
  MaxGain = −∞  /* MaxGain records the maximum Gain so far */
  for k = 1 to n do  /* There are n cases in LIBRARY */
  {
    kth case = ⟨I_k, G_k, P_k⟩
    Gain = Distance(I_new, G_new) / [Distance(I_new, I_k) + Distance(G_k, G_new)]
    if Gain > MaxGain then {
      MaxGain = Gain
      b = k  /* b is the index of the best case found so far */
    }
  }
  if MaxGain > MinGain then {
    Call default planner to solve ⟨I_new, I_b⟩, which returns P_I,b
    Call default planner to solve ⟨G_b, G_new⟩, which returns P_G,b
    Return concatenation of P_I,b, P_b, and P_G,b
  }
  else
    Call default planner to solve ⟨I_new, G_new⟩
}
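The retrieval-and-reuse loop can be sketched as follows. Here `predict_time` stands in for the learned Distance predictor, `plan` for the default planner (returning a plan as a list of steps), and the thresholds correspond to MinGain and MinTime; all names are hypothetical placeholders, not the paper's implementation.

```python
# Sketch of the plan retrieval and reuse algorithm.
# `predict_time` and `plan` are hypothetical stand-ins for the learned
# Distance predictor and the default planner.

def retrieve_and_reuse(library, predict_time, plan, i_new, g_new,
                       min_gain=1.0, min_time=0.0):
    if predict_time(i_new, g_new) < min_time:
        return plan(i_new, g_new)              # cheap problem: solve directly
    best, max_gain = None, float('-inf')       # MaxGain starts at -infinity
    for i_k, g_k, p_k in library:
        g = predict_time(i_new, g_new) / (
            predict_time(i_new, i_k) + predict_time(g_k, g_new))
        if g > max_gain:
            max_gain, best = g, (i_k, g_k, p_k)
    if max_gain > min_gain:
        i_b, g_b, p_b = best
        # Bridge into and out of the retrieved plan: P_I,b + P_b + P_G,b
        return plan(i_new, i_b) + p_b + plan(g_b, g_new)
    return plan(i_new, g_new)                  # no case worth reusing
```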

12 EMPIRICAL EVALUATION
- Default planner: STRIPS (Shorts & Dickens)
- Learning: perceptron
- Blocks-world domain (3-7 blocks)
- Plan library (100-1000 cases)
- Common LISP, SPARCstation

13 EXAMPLE OF REUSE: SUSSMAN ANOMALY PROBLEM
New problem: ⟨I_new, G_new⟩; retrieved case: ⟨I_k, G_k, P_k⟩
- P_k: MOVE-BLOCK-TO-TABLE(Blue,Red), MOVE-BLOCK-FROM-TABLE(Red,Blue)
- P_I,k: MOVE-BLOCK-TO-BLOCK(Blue,Yellow,Red)
- P_G,k: MOVE-BLOCK-FROM-TABLE(Yellow,Red)

14 BLOCKS-WORLD DOMAIN
BLOCKS = {A, B, C, ...}
PREDICATES:
- ON(A,B): block A is on block B
- ON-TABLE(B): block B is on the table
- CLEAR(A): block A is clear
OPERATORS:
- MOVE-BLOCK-TO-BLOCK(A,B,C): move A from top of B to top of C
- MOVE-BLOCK-TO-TABLE(A,B): move A from top of B to the table
- MOVE-BLOCK-FROM-TABLE(A,B): move A from the table to top of B
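The three operators can be encoded as state transformers over sets of ground predicates. This is a toy Python encoding for illustration, not the paper's Common LISP implementation; the tuple representation of predicates is an assumption.

```python
# Toy encoding of the blocks-world operators; a state is a set of ground
# predicates such as ('ON', 'A', 'B') or ('CLEAR', 'A').

def move_block_to_block(state, a, b, c):
    """MOVE-BLOCK-TO-BLOCK(A,B,C): move A from top of B to top of C."""
    assert {('ON', a, b), ('CLEAR', a), ('CLEAR', c)} <= state  # preconditions
    return (state - {('ON', a, b), ('CLEAR', c)}) | {('ON', a, c), ('CLEAR', b)}

def move_block_to_table(state, a, b):
    """MOVE-BLOCK-TO-TABLE(A,B): move A from top of B to the table."""
    assert {('ON', a, b), ('CLEAR', a)} <= state
    return (state - {('ON', a, b)}) | {('ON-TABLE', a), ('CLEAR', b)}

def move_block_from_table(state, a, b):
    """MOVE-BLOCK-FROM-TABLE(A,B): move A from the table to top of B."""
    assert {('ON-TABLE', a), ('CLEAR', a), ('CLEAR', b)} <= state
    return (state - {('ON-TABLE', a), ('CLEAR', b)}) | {('ON', a, b)}
```

For example, the Sussman anomaly (C on A, with A and B on the table; goal A on B on C) is solved by MOVE-BLOCK-TO-TABLE(C,A), MOVE-BLOCK-FROM-TABLE(B,C), MOVE-BLOCK-FROM-TABLE(A,B).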

15 FEATURES
Domain-independent features:
- Size of the problem: SIZE
- Number of goal-state conditions already satisfied in the initial state: SAT-CLEAR, SAT-ON, SAT-ON-TABLE
- Number of goal-state conditions not satisfied in the initial state: UNSAT-CLEAR, UNSAT-ON, UNSAT-ON-TABLE
Domain-dependent features:
- Number of stacks in the initial and goal states: STACK-INIT, STACK-GOAL
- Number of blocks already in place, i.e., that need not be moved to reach the goal configuration: IN-PLACE
- Heuristic function that guesses the number of planning steps: STEPS
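The SAT-*/UNSAT-* counts above are straightforward to compute once states are sets of ground predicates. The following sketch assumes that representation (a tuple per predicate); it is an illustration, not the paper's feature extractor.

```python
# Sketch of the satisfied/unsatisfied goal-condition features, assuming
# states are sets of ground-predicate tuples such as ('ON', 'A', 'B').

def sat_unsat_features(initial, goal):
    """Count goal conditions per predicate that are (un)satisfied initially."""
    feats = {}
    for pred in ('CLEAR', 'ON', 'ON-TABLE'):
        goal_conds = {p for p in goal if p[0] == pred}
        feats['SAT-' + pred] = len(goal_conds & initial)    # already true
        feats['UNSAT-' + pred] = len(goal_conds - initial)  # still to achieve
    return feats
```

These counts, together with SIZE and the domain-dependent features, would form the feature vector f fed to the linear time predictor.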

16 CONCLUSIONS
- Plan modification is avoided
- Problem 'matching' is done very efficiently
- The planning system is domain-independent
- Other target functions (such as plan quality) can be learned and predicted
Open issues:
- The utility problem
- Indexing the library
- Selective storage
- Integration with other techniques

