Learning Teleoreactive Logic Programs by Observation

Brian Salomaki, Dongkyu Choi, Negin Nejati, and Pat Langley
Computational Learning Laboratory
Stanford University, USA
Outline
- Motivation
- Overview
- ICARUS: A reactive agent architecture
- Learning via Problem Solving
- Learning by Observation
- Preliminary Results
- Related Work
- Future Work
Motivation
An intelligent agent operating in the real world will encounter many scenarios and must pursue different goals. There are two main approaches to meeting this need:
- Build in the knowledge manually for all possible situations.
- Let the agent extend its knowledge to new scenarios itself, by:
  - Problem solving (lots of search), which might be:
    - Too expensive
    - Too slow
    - Even impossible
  - Learning by watching an expert
Learning by Observation (overview)
[Diagram: a problem, an initial state, and the skill hierarchy feed reactive execution; on a goal impasse, the expert's primitive skill sequence and the effects of primitive skills are passed to learning by observation, which adds new skills to the hierarchy.]
Our Agent Architecture: ICARUS
[Diagram: perception of the environment fills a perceptual buffer; inference over long-term conceptual memory updates short-term conceptual memory; skill retrieval from long-term skill memory updates short-term skill memory, which drives action on the environment through a motor buffer.]
Teleoreactive Logic Programs
ICARUS encodes long-term knowledge of three general types:
- Conceptual clauses: relational inference rules that refer to percepts (primitive) or to other concepts (nonprimitive)
- Primitive skill clauses: stated as durative STRIPS operators
- Nonprimitive skill clauses: relational rules that specify:
  - a head that indicates the goal the method achieves
  - a set of (possibly defined) preconditions
  - one or more ordered subskills for achieving the goal
Teleoreactive logic programs can be executed reactively but in a goal-directed manner (Nilsson, 1994).
Knowledge Representation I - Concept Hierarchy

(clear (?block)
  :percepts  ((block ?block))
  :negatives ((on ?other ?block)))

(unstackable (?block ?from)
  :percepts  ((block ?block) (block ?from))
  :positives ((on ?block ?from) (clear ?block) (hand-empty)))

(on (?blk1 ?blk2)
  :percepts ((block ?blk1 x ?x1 y ?y1)
             (block ?blk2 x ?x2 y ?y2 h ?h))
  :tests    ((equal ?x1 ?x2)
             (>= ?y1 ?y2)
             (<= ?y1 (+ ?y2 ?h))))
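To make the concept clauses above concrete, here is a minimal sketch (not the ICARUS implementation) of bottom-up inference for two of them: (on) is computed directly from block percepts via its :tests, and (clear) from the inferred (on) instances via its :negatives test. The dict-based percept encoding is an assumption for illustration.

```python
def infer_on(percepts):
    """Instances of (on ?blk1 ?blk2): blk1 rests directly on blk2."""
    beliefs = set()
    for b1 in percepts:
        for b2 in percepts:
            if b1 is b2:
                continue
            # Mirrors the :tests of (on ...): same x position, and
            # blk1's y lies within [y2, y2 + h2].
            if (b1["x"] == b2["x"]
                    and b1["y"] >= b2["y"]
                    and b1["y"] <= b2["y"] + b2["h"]):
                beliefs.add(("on", b1["name"], b2["name"]))
    return beliefs

def infer_clear(percepts, on_beliefs):
    """(clear ?block): no other block is on it (the :negatives test)."""
    covered = {under for (_, _, under) in on_beliefs}
    return {("clear", b["name"]) for b in percepts if b["name"] not in covered}

# C on B on A, as in the Blocks World example later in the talk.
percepts = [
    {"name": "A", "x": 0, "y": 0, "h": 2},
    {"name": "B", "x": 0, "y": 2, "h": 2},
    {"name": "C", "x": 0, "y": 4, "h": 2},
]
on = infer_on(percepts)
clear = infer_clear(percepts, on)
```

Defined concepts such as (unstackable) would be matched the same way, one level further up, from these inferred instances.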
Knowledge Representation II – Skill Hierarchy

An example of a higher-level skill in the hierarchy:

(hand-empty ()
  :percepts ((block ?c) (table ?t1))
  :start    ((putdownable ?c ?t1))
  :ordered  ((putdown ?c ?t1)))

An example of a first-level skill (an operator):

(putdown (?block ?t0)
  :percepts ((block ?block)
             (table ?t0 xpos ?xpos ypos ?ypos height ?height))
  :start    ((putdownable ?block ?t0))
  :effects  ((ontable ?block ?t0) (hand-empty))
  :actions  ((*horizontal-move ?block (+ ?xpos 1 (random100)))
             (*vertical-move ?block (+ ?ypos ?height))
             (*ungrasp ?block)))
Inference and Execution
- Concepts are matched bottom up, starting from percepts.
- Skill paths are matched top down, starting from intentions.
[Diagram: the concept instances' hierarchy is inferred upward from percepts, while the skill instances' hierarchy is traversed downward to select primitive skills.]
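The top-down half of this cycle can be sketched as follows. This is a hedged simplification, not the ICARUS executor: skills are indexed by the goal (head) they achieve, and selection descends from the agent's intention through the first unsatisfied subskill until it reaches a primitive skill whose start condition currently holds. The string-keyed skill table is an assumed encoding.

```python
PRIMITIVE = "primitive"  # marker for skills with no subskills

skills = {
    # head: (start condition, ordered subskills or PRIMITIVE marker)
    "clear-A": ("unstackable-B-A", ["unstack-B-A"]),
    "unstack-B-A": ("unstackable-B-A", PRIMITIVE),
}

def select_action(goal, beliefs, skills):
    """Return the first executable primitive skill on a path from `goal`."""
    if goal in beliefs:          # goal already satisfied: nothing to do
        return None
    start, body = skills[goal]
    if body is PRIMITIVE:        # executable only if its start condition holds
        return goal if start in beliefs else None
    for sub in body:             # descend into the first unsatisfied subskill
        action = select_action(sub, beliefs, skills)
        if action is not None:
            return action
    return None

beliefs = {"unstackable-B-A"}    # inferred bottom up on this cycle
action = select_action("clear-A", beliefs, skills)
```

Because selection restarts from the intention on every cycle, execution stays reactive: if the world changes, a different path (or no action) is chosen next time.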
Learning Using Problem Solving
[Diagram: on a goal impasse during reactive execution, problem solving over the effects of primitive skills produces an executed plan, from which new skills are extracted into the skill hierarchy.]
Problem solving involves means-ends analysis, except that chaining occurs over both skills and concepts, and skills are executed whenever applicable.
Learning by Observation (LBO)
[Diagram: starting from the initial state S0, the expert's primitive skill sequence is projected through the effects of primitive skills and the concept hierarchy to yield a state sequence; the state sequence and the goal are passed to learning by observation, which produces new skills.]
Observational Inputs to Learning Module
[Diagram: the learning-by-observation procedure receives the goal concept, the primitive skills' definitions (with their :effects), and the projected state sequence S0, S1, S2, ..., Sn; depending on whether the current goal matches a concept instance or a primitive skill/operator, it applies concept chaining or skill chaining.]
Skill Chaining
Let the goal g match an effect of a primitive skill s in the expert's trace, where s has precondition p.
- If earlier steps in the trace made p satisfied, then learn:
    (g () :start ... :subskills (p s))
  set p as the new goal, remove s from the expert's trace, and call LBO recursively.
- Else (p held from the start), learn:
    (g () :start p :subskills (s))
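The recursion above can be sketched in a few lines. This is a hedged simplification under assumed data structures, not the paper's actual procedure: each trace step records the primitive skill executed and its precondition, the first step carries the initial state, and chaining walks the trace backward from the goal.

```python
def skill_chain(goal, trace, learned):
    """Learn skills that achieve `goal` from the tail of the expert trace."""
    if not trace:
        return
    step = trace[-1]             # the step whose effects achieve `goal`
    if step["precondition"] in trace[0]["state_before"]:
        # Base case: the precondition held from the start, so the new
        # skill needs only this step, guarded by that precondition.
        learned[goal] = {"start": step["precondition"],
                         "subskills": [step["skill"]]}
    else:
        # The precondition had to be achieved first: chain through it,
        # then recurse on the remaining trace with it as the subgoal.
        learned[goal] = {"start": None,
                         "subskills": [step["precondition"], step["skill"]]}
        skill_chain(step["precondition"], trace[:-1], learned)

# Blocks World trace from the example slides (string-encoded literals).
initial = {"unstackable-C-B", "on-C-B", "on-B-A", "hand-empty"}
trace = [
    {"skill": "unstack-C-B", "precondition": "unstackable-C-B",
     "state_before": initial},
    {"skill": "putdown-C-T", "precondition": "putdownable-C-T"},
    {"skill": "unstack-B-A", "precondition": "unstackable-B-A"},
]
learned = {}
skill_chain("clear-A", trace, learned)
```

Running this yields one learned skill per level of the chain, mirroring how the hierarchy on the later slides is built up from the goal backward.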
Concept Chaining
[Diagram: when the goal is a defined concept, its definition is unfolded into subconcepts. The expert's trace and the state sequence are parsed against the concept definitions; subconcepts that are already satisfied are skipped, and the LBO module is called recursively on each remaining subgoal. The results are combined into a new skill of the form (<goal> () :start (...) :subskills (...)).]
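A minimal sketch of this decomposition step, under assumed encodings (string-encoded concept instances and a hand-written definition table), shows the core idea: unfold the goal concept, skip subconcepts the observed state already satisfies, and recurse on the rest.

```python
concept_defs = {
    # defined concept -> subconcepts that jointly entail it
    "unstackable-B-A": ["on-B-A", "clear-B", "hand-empty"],
}

def concept_chain(goal, state, learn_subgoal):
    """Decompose `goal` by its definition; recurse on unmet subconcepts."""
    subgoals = []
    for sub in concept_defs.get(goal, []):
        if sub in state:
            continue             # already satisfied: nothing to learn
        learn_subgoal(sub)       # stands in for the recursive LBO call
        subgoals.append(sub)
    return subgoals

recursed = []
subgoals = concept_chain("unstackable-B-A", {"on-B-A"}, recursed.append)
```

Here (on B A) already holds, so only (clear B) and (hand-empty) become subgoals, matching the decomposition in the Blocks World example.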
An Example from Blocks World
Goal: (clear A)
Expert's trace: (unstack C B) (putdown C T) (unstack B A)
Initial state: C on B on A, on table T
Known concepts: (on) (on-table) (clear) (holding) (hand-empty) (unstackable) (stackable) (pickupable) (putdownable)
Known skills: (stack) (unstack) (pickup) (putdown)
Blocks World Example Cont'd
[Diagram: the goal (clear A) is decomposed through (on B A), (unstackable B A), (clear B), and (hand-empty), matching the trace steps (unstack C B), (putdown C T), and (unstack B A).]

Learned skills:

(clear (?A)
  :start   ((on ?B ?A))
  :ordered ((unstackable ?B ?A) (unstack ?B ?A)))

(unstackable (?B ?A)
  :start   ((on ?B ?A))
  :ordered ((clear ?B) (hand-empty)))

(clear (?B)
  :start   ((unstackable ?C ?B))
  :ordered ((unstack ?C ?B)))

(hand-empty ()
  :start   ((putdownable ?C ?T1))
  :ordered ((putdown ?C ?T1)))
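As a sanity check on the example, we can apply STRIPS-style add and delete effects of each primitive skill in the expert's trace to the initial state and confirm the goal (clear A) holds at the end. The tuple-based state encoding and the exact add/delete lists are assumptions consistent with the slides' operator definitions.

```python
def apply(state, adds, dels):
    """STRIPS update: remove the delete list, then add the add list."""
    return (state - set(dels)) | set(adds)

# Initial state: C on B on A, A on the table, hand empty.
state = {("on", "C", "B"), ("on", "B", "A"), ("ontable", "A"), ("hand-empty",)}

# (unstack C B): the hand picks C up off B.
state = apply(state, [("holding", "C"), ("clear", "B")],
              [("on", "C", "B"), ("hand-empty",)])
# (putdown C T): C goes onto the table; the hand becomes empty.
state = apply(state, [("ontable", "C"), ("hand-empty",)],
              [("holding", "C")])
# (unstack B A): the hand picks B up off A, clearing A.
state = apply(state, [("holding", "B"), ("clear", "A")],
              [("on", "B", "A"), ("hand-empty",)])
```

After the third step, ("clear", "A") is in the state, which is exactly the goal the learned hierarchy achieves.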
Preliminary Results
- Promising results in domains such as:
  - Blocks World
  - Depots
- Results similar to learning by problem solving when search is feasible
- After learning by observation, the agent can solve problems that were not feasible to solve by problem solving
Related Work
- Explanation-Based Learning
- Behavioral Cloning
- Programming by Demonstration
- Mixed-Initiative Learning
Future Work
- More experiments and evaluation (in progress)
- Irrelevant actions
- Interleaved goal achievement
- Using higher-level skills
- Missing skills
- Multiple goals
- Unknown goals