1 Learning through Interactive Behavior Specifications. Tolga Konik (CSLI, Stanford University), Douglas Pearson (Three Penny Software), John Laird (University of Michigan)

2 Goal: Automatically generate cognitive agents. Reduce the cost of agent development. Reduce the expertise required to develop agents.

3 Domains: Autonomous cognitive agents. Dynamic virtual worlds. Real-time decisions based on knowledge and sensed data. Soar agent architecture.

4 Learning by Observation. Approach: Observe expert behavior and learn to replicate it. Why? We may want human-like agents. In complex domains, imitating humans may be easier than learning from scratch.

5 Bottleneck in pure Learning by Observation. PROBLEM: You cannot observe the internal reasoning of the expert. SOLUTION: Ask the expert for additional information (goal annotations). Use additional knowledge sources (task and domain knowledge).

6 Learning by Observation (diagram: Expert, Interface, Environment, Agent, Learner; labels: Actions, Percepts, Goal annotations, Additional Task Knowledge)

7 Learning by Observation (diagram: Agent, Interface, Environment). ILP 2004; Machine Learning Journal (forthcoming).

8 Learning by Observation, Critic Mode (diagram: Agent, Interface, Environment, Expert critic, Learner)

9 One Body, Two Minds? How and when to switch control? How do the expert and the agent program communicate? (diagram: Agent, Interface, Environment, Expert)

10 Redux: Diagrammatic Behavior Specification (diagram: Expert, Agent, Environment, Learner)

11 Redux: Visual rule editing. Diagrammatic behavior specification.

12 Goal Hierarchy: Task-performance knowledge is represented with a hierarchy of durative goals. Goals: Get-item(Item), Get-item-in-room(Item), Get-item-different-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door), Go-through(Door). (map: rooms r1, r2, r3, r4; doors d1, d2, d3, d4, d5, d6; items i3, i4)

13 Goal Hierarchy. Bindings: Item=i3. Bound goals: Get-item(i3), Get-item-in-room(i3). Unbound goals: Get-item-different-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door), Go-through(Door). (same map)

14 Goal Hierarchy. Bindings: Item=i3, Door=d1. Bound goals: Get-item(i3), Get-item-different-room(i3), Go-to(d1). Unbound goals: Get-item-in-room(Item), Go-through(Door). (same map)

15 Goal Hierarchy. Bindings: Door=d1. Bound goals: Get-item(i3), Get-item-different-room(i3), Go-through(d1). Unbound goals: Get-item-in-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door). (same map)

16 Goal Hierarchy. Bindings: Door=d3. Bound goals: Get-item(i3), Get-item-different-room(i3), Go-to(d3). Unbound goals: Get-item-in-room(Item), Goto-next-room, Go-to-door(D), Go-through(Door). (same map)
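
The goal-hierarchy walk-through in slides 12 to 16 can be sketched as a stack of active goals with parameter bindings. This is a minimal illustration, not the Redux or Soar implementation: the parent/child edges in `HIERARCHY`, the lowercase goal names, and the `GoalStack` class are assumptions made for the example, since the slides show the goals but not the exact edges.

```python
# Hypothetical parent -> children layout; the slides do not spell out the edges.
HIERARCHY = {
    "get-item": ["get-item-in-room", "get-item-different-room"],
    "get-item-different-room": ["goto-next-room"],
    "goto-next-room": ["go-to-door", "go-through"],
}


class GoalStack:
    """Tracks the currently active (durative) goals, root first."""

    def __init__(self, root, **bindings):
        self.stack = [(root, bindings)]

    def select(self, name, **bindings):
        """Select a subgoal of the deepest active goal."""
        parent, _ = self.stack[-1]
        if name not in HIERARCHY.get(parent, []):
            raise ValueError(f"{name} is not a subgoal of {parent}")
        self.stack.append((name, bindings))

    def terminate(self):
        """Terminate the deepest active goal (the root stays active)."""
        if len(self.stack) == 1:
            raise ValueError("cannot terminate the root goal")
        return self.stack.pop()

    def active(self):
        """Render the active goals as strings such as 'get-item(i3)'."""
        return [f"{n}({', '.join(b.values())})" for n, b in self.stack]


goals = GoalStack("get-item", Item="i3")
goals.select("get-item-different-room", Item="i3")
goals.select("goto-next-room")
goals.select("go-to-door", Door="d1")
trace_before = goals.active()

goals.terminate()                      # the agent has reached door d1
goals.select("go-through", Door="d1")
trace_after = goals.active()
```

Under this reading, learning an agent reduces to learning when `select` and `terminate` should fire for each goal, which is the reduction stated on slide 20.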

17 Behavior Specification: The expert draws an initial abstract situation, then creates a scenario by selecting actions. (diagram: Agent, Expert)

18 Goal Specification: Goals are explicitly selected. The agent contributes based on the current situation, the current goal, and its knowledge. (diagram: Agent, Expert)

19 Switching Roles: The expert generates behavior if the agent doesn’t know how to pursue the current goal. The agent may propose goals, subgoals, and actions. If the agent is correct, the expert observes and validates; otherwise the expert rejects, corrects, or takes over. Key to the interaction: shared goals and shared assumptions about the current situation.

20 Goal Hierarchy. Learning by Observation perspective: unobservable mental reasoning of the expert. Learning perspective: biases the hypothesis space; the “learn agent” problem is reduced to “learn goal selection and termination”. Mixed-initiative (MI) perspective: information exchange between the expert and the agent.

21 Relevant Knowledge Specification: The expert can mark important objects in a decision. (diagram labels: Agent, Expert, Prepare food)

22 Rich Behavior Trace: Expert-specified undesired actions and goals. Expert-rejected actions and goals of the approximately learned agent program. (diagram label: Watch TV)

23 Rich Behavior Trace: Hypothetical actions and goals. Situation history: a tree structure of possible behaviors.
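
One way to picture the tree-structured situation history of slides 22 and 23 is a node per situation, with each branch tagged as selected, rejected, or hypothetical. The sketch below is an assumption-laden illustration: the `Node` class, the status strings, and the situation names `s0`, `s1`, `s2` are invented for the example and are not taken from Redux.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A situation in the history, plus the choice that led to it."""
    situation: str
    choice: Optional[str] = None   # goal or action taken in the parent situation
    status: str = "selected"       # "selected", "rejected", or "hypothetical"
    children: List["Node"] = field(default_factory=list)

    def branch(self, situation, choice, status="selected"):
        child = Node(situation, choice, status)
        self.children.append(child)
        return child


def decision_examples(node):
    """Flatten the tree into (situation, choice, status) training examples."""
    examples = []
    for child in node.children:
        examples.append((node.situation, child.choice, child.status))
        examples.extend(decision_examples(child))
    return examples


root = Node("s0")
s1 = root.branch("s1", "go-to-door(d1)")
root.branch("s1'", "watch-tv", status="rejected")       # expert marks as undesired
s1.branch("s2", "go-through(d1)", status="hypothetical")
examples = decision_examples(root)
```

Rejected branches would then serve as negative examples and selected branches as positive ones; how hypothetical branches are weighted is left open here.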

24 Relational Learning by Observation. Input: relational situations; goal and action selections and rejections; additional annotations (e.g., important objects); background knowledge. Output: a rule-based agent program. Learn goal/action selection and termination by generalizing over multiple examples. Inductive Logic Programming is used to combine rich knowledge structures.

25 Relational Learning by Observation

26 Relational Learning by Observation: Find the common structures in the decision examples.

27 Relational Learning by Observation: Learn relations between what the agent wants, perceives, and knows. Example: “Select a door in the current room, which leads to a room that contains the item the agent wants to get.”
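
The door-selection rule quoted on slide 27 can be read as a conjunctive query over relational facts. The following is a minimal sketch of that reading, not the learned Soar rule: the predicate names `current-room`, `connects`, and `contains`, the function `candidate_doors`, and the room layout in `facts` are all hypothetical, since the slides do not give the map's connectivity.

```python
def candidate_doors(facts, wanted_item):
    """Doors in the current room that lead to a room containing wanted_item."""
    rooms_now = {f[1] for f in facts if f[0] == "current-room"}
    rooms_with_item = {
        f[1] for f in facts if f[0] == "contains" and f[2] == wanted_item
    }
    doors = set()
    for f in facts:
        if f[0] != "connects":
            continue
        _, door, room_a, room_b = f
        # A door is treated as bidirectional between the two rooms it connects.
        if (room_a in rooms_now and room_b in rooms_with_item) or (
            room_b in rooms_now and room_a in rooms_with_item
        ):
            doors.add(door)
    return sorted(doors)


# Hypothetical layout, invented for illustration only.
facts = {
    ("current-room", "r1"),
    ("connects", "d1", "r1", "r2"),
    ("connects", "d2", "r1", "r3"),
    ("connects", "d3", "r2", "r4"),
    ("contains", "r2", "i3"),
    ("contains", "r3", "i4"),
}
doors_for_i3 = candidate_doors(facts, "i3")
doors_for_i4 = candidate_doors(facts, "i4")
```

The rule joins what the agent wants (the item), what it perceives (the current room and its doors), and what it knows (which rooms the doors connect), which is the kind of relation the slide says should be learned.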

28 Comparing Redux to LBO. Advantages of Redux: No real-time constraints on behavior (e.g., no waiting for a 2-hour-long goal). Can be used to describe unlikely but critical situations (e.g., “Let’s assume that there is a nuclear meltdown.”). Richer annotation opportunities. Increased learning speed and quality: faster focus on where knowledge is most lacking, and immediate expert feedback on how rules behave.

29 Comparing Redux to LBO. Disadvantages of Redux: Can’t learn low-level behavior. Contains domain-specific components, although most of Redux is domain independent. Generating behavior may be slower. Additional annotations improve learning but require extra expert effort.

30 Relational Behavior Trace. A situation: a symbolic snapshot of the observed environment at a given time. A behavior trace: the set of situations in the execution history.

31 Annotated Behavior Traces: Behavior is annotated with actions and goals, e.g., goto-room(r1).
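
The definitions on slides 30 and 31 suggest a simple data layout: a time-stamped set of relational facts per situation, plus goal and action annotations that hold over intervals. The sketch below assumes that layout; the class names, the fact tuples, and the interval encoding are hypothetical, and the trace is kept as a time-ordered list for convenience even though the slide calls it a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Symbolic snapshot of the observed environment at one time step."""
    time: int
    facts: frozenset


@dataclass(frozen=True)
class Annotation:
    """A goal or action that is active from start to end, inclusive."""
    term: str
    start: int
    end: int


def active_at(annotations, time):
    """Terms of all annotations whose interval covers the given time."""
    return [a.term for a in annotations if a.start <= time <= a.end]


# Hypothetical three-step trace of an agent walking from r2 into r1.
trace = [
    Situation(0, frozenset({("in-room", "agent", "r2")})),
    Situation(1, frozenset({("in-room", "agent", "r2"),
                            ("at-door", "agent", "d1")})),
    Situation(2, frozenset({("in-room", "agent", "r1")})),
]
annotations = [
    Annotation("goto-room(r1)", 0, 2),
    Annotation("go-to-door(d1)", 0, 1),
]
labeled = [(s.time, active_at(annotations, s.time)) for s in trace]
```

Pairing each situation with its active annotations yields the decision examples that a relational learner would generalize over.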

32 Summary: A diagrammatic behavior specification approach to extract rich behavior knowledge. Interactive behavior specification as a communication medium between the agents (explicit goals and assumed situation). A relational learning by observation approach to combine multiple complex knowledge sources.

33 Future Work: Improve the mixed-initiative interaction of the interface. Explore domain-independent diagrammatic interface features. Allow the expert to enter context-sensitive knowledge.

34 Mixed-initiative perspective: Interactive behavior specification. Diagrammatic representation of behavior as a communication medium between the agents. Explicit goals and desired behavior. Facilitates interaction between the agents.
