
1 Learning through Interactive Behavior Specifications
Tolga Konik, CSLI, Stanford University
Douglas Pearson, Three Penny Software
John Laird, University of Michigan

2 Goal
- Automatically generate cognitive agents
- Reduce the cost of agent development
- Reduce the expertise required to develop agents

3 Domains
- Autonomous cognitive agents
- Dynamic virtual worlds
- Real-time decisions based on knowledge and sensed data
- Soar agent architecture

4 Learning by Observation
Approach:
- Observe expert behavior
- Learn to replicate it
Why?
- We may want human-like agents
- In complex domains, imitating humans may be easier than learning from scratch

5 Bottleneck in Pure Learning by Observation
PROBLEM: You cannot observe the internal reasoning of the expert
SOLUTION:
- Ask the expert for additional information
  - Goal annotations
- Use additional knowledge sources
  - Task and domain knowledge

6 Learning by Observation
[Diagram labels: Expert, Interface, Environment, Agent, Learner, Actions, Percepts, Goal annotations, Additional Task Knowledge]

7 Learning by Observation
[Diagram labels: Agent, Interface, Environment]
ILP 2004; Machine Learning Journal (forthcoming)

8 Learning by Observation: Critic Mode
[Diagram labels: Agent, Interface, Environment, Expert (critic), Learner]

9 One Body, Two Minds?
- How and when should control be switched?
- How do the expert and the agent program communicate?
[Diagram labels: Agent, Interface, Environment, Expert]

10 Diagrammatic Behavior Specification
[Diagram labels: Expert, Redux, Agent, Environment, Learner]

11 Redux
- Visual rule editing
- Diagrammatic behavior specification

12 Goal Hierarchy
Task-performance knowledge is represented with a hierarchy of durative goals.
[Diagram: goal hierarchy with Get-item(Item), Get-item-in-room(Item), Get-item-different-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door), Go-through(Door); map with rooms r1 to r4, doors d1 to d6, items i3 and i4]
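
The goal hierarchy on this slide can be pictured as a small tree of parameterized goals. The following is only a minimal sketch: the goal names come from the slide, but the `Goal` class, the `paths` helper, and above all the parent/child nesting are assumptions, since the transcript lists the goal labels without the diagram's edges.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A durative goal with optional parameters and subgoals (hypothetical type)."""
    name: str
    params: tuple = ()
    subgoals: list = field(default_factory=list)

    def add(self, child):
        """Attach a subgoal and return it so construction can be chained."""
        self.subgoals.append(child)
        return child


def paths(goal, prefix=()):
    """Yield every root-to-leaf chain of goal names."""
    here = prefix + (goal.name,)
    if not goal.subgoals:
        yield here
    for child in goal.subgoals:
        yield from paths(child, here)


# ASSUMPTION: this nesting is a guess; the slide text does not preserve the edges.
get_item = Goal("get-item", ("Item",))
get_item.add(Goal("get-item-in-room", ("Item",)))
different_room = get_item.add(Goal("get-item-different-room", ("Item",)))
next_room = different_room.add(Goal("goto-next-room"))
next_room.add(Goal("go-to-door", ("Door",)))
next_room.add(Goal("go-through", ("Door",)))

for chain in paths(get_item):
    print(" > ".join(chain))
```

Each printed chain corresponds to one stack of goals that could be active at the same time, which is the sense in which the goals are durative.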

13 Goal Hierarchy
[Diagram: same hierarchy and map; instantiated goals Get-item(i3) and Get-item-in-room(i3); binding Item=i3]

14 Goal Hierarchy
[Diagram: same hierarchy and map; instantiated goals Get-item(i3), Get-item-different-room(i3), and Go-to(d1); bindings Item=i3, Door=d1]

15 Goal Hierarchy
[Diagram: same hierarchy and map; instantiated goals Get-item(i3), Get-item-different-room(i3), and Go-through(d1); binding Door=d1]

16 Goal Hierarchy
[Diagram: same hierarchy and map; instantiated goals Get-item(i3), Get-item-different-room(i3), and Go-to(d3); binding Door=d3]

17 Behavior Specification
- The expert draws an initial abstract situation
- The expert creates a scenario by selecting actions
[Diagram labels: Agent, Expert]

18 Goal Specification
- Goals are explicitly selected
- The agent contributes based on the current situation, the current goal, and its knowledge
[Diagram labels: Agent, Expert]

19 Switching Roles
- The expert generates behavior if the agent doesn't know how to pursue the current goal
- The agent may propose goals, subgoals, and actions
  - If the agent is correct, the expert observes and validates
  - Otherwise the expert rejects, corrects, or takes over
- Key to the interaction:
  - Shared goals
  - Shared assumptions about the current situation
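
The role-switching protocol above can be sketched as a single decision step. This is an illustrative sketch only: `Verdict`, `interaction_step`, and the callback signatures are hypothetical names, not the Redux interface, and the trace format is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    """Possible expert reactions to an agent proposal (hypothetical names)."""
    VALIDATE = "validate"
    REJECT = "reject"
    CORRECT = "correct"
    TAKE_OVER = "take over"


def interaction_step(proposal, expert_review, expert_select):
    """Resolve one decision and return (chosen, trace_entries).

    proposal       -- goal or action proposed by the agent, or None if it has none
    expert_review  -- callable: proposal -> (Verdict, replacement or None)
    expert_select  -- callable: () -> decision chosen by the expert
    """
    if proposal is None:
        # The agent does not know how to pursue the goal: the expert generates behavior.
        choice = expert_select()
        return choice, [("selected", choice, "expert")]

    verdict, replacement = expert_review(proposal)
    if verdict is Verdict.VALIDATE:
        return proposal, [("selected", proposal, "agent")]

    # Every other verdict records the proposal as a rejected decision.
    trace = [("rejected", proposal, "agent")]
    if verdict is Verdict.CORRECT:
        trace.append(("selected", replacement, "expert"))
        return replacement, trace
    if verdict is Verdict.TAKE_OVER:
        choice = expert_select()
        trace.append(("selected", choice, "expert"))
        return choice, trace
    return None, trace  # plain rejection: nothing is chosen yet


chosen, trace = interaction_step(
    "go-to(d2)",
    expert_review=lambda p: (Verdict.CORRECT, "go-to(d1)"),
    expert_select=lambda: "go-to(d1)",
)
print(chosen, trace)
```

Recording rejected proposals alongside selections keeps both kinds of decision available to a learner later on.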

20 Goal Hierarchy
- Learning by Observation perspective
  - Unobservable mental reasoning of the expert
- Learning perspective
  - Biases the hypothesis space
  - The "learn agent" problem is reduced to "learn goal selection and termination"
- Mixed-initiative (MI) perspective
  - Information exchange between the expert and the agent

21 Relevant Knowledge Specification
- The expert can mark important objects in a decision
[Diagram labels: Agent, Expert, Prepare food]

22 Rich Behavior Trace
- Expert-specified undesired actions and goals
- Expert-rejected actions and goals of the approximately learned agent program
[Diagram label: Watch TV]

23 Rich Behavior Trace
- Hypothetical actions and goals
- Situation history: a tree structure of possible behaviors
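
A tree-shaped situation history with selected, rejected, and hypothetical branches might be represented as follows. This is a sketch under stated assumptions: `Node`, `branch`, and `collect_examples` are hypothetical names, the situation strings and the `cook-meal` and `order-takeout` decisions are invented for illustration, and treating hypothetical branches as positive examples is a modelling choice rather than something the slides state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One situation in the history, plus the decision that led to it."""
    situation: str
    decision: Optional[str] = None
    status: str = "selected"  # "selected", "rejected", or "hypothetical"
    children: list = field(default_factory=list)

    def branch(self, decision, situation, status="selected"):
        """Add a possible behavior from this situation and return the new node."""
        child = Node(situation, decision, status)
        self.children.append(child)
        return child


def collect_examples(node, positives=None, negatives=None):
    """Walk the tree and split (situation, decision) pairs by status."""
    positives = [] if positives is None else positives
    negatives = [] if negatives is None else negatives
    for child in node.children:
        example = (node.situation, child.decision)
        if child.status == "rejected":
            negatives.append(example)  # rejected branches are not explored further
        else:
            positives.append(example)
            collect_examples(child, positives, negatives)
    return positives, negatives


root = Node("at home, hungry")
kitchen = root.branch("prepare-food", "in kitchen")
root.branch("watch-tv", "on sofa", status="rejected")
kitchen.branch("cook-meal", "meal ready")
kitchen.branch("order-takeout", "waiting for delivery", status="hypothetical")

positives, negatives = collect_examples(root)
print(positives)
print(negatives)
```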

24 Relational Learning by Observation
Input:
- Relational situations
- Goal and action selections and rejections
- Additional annotations (e.g., important objects)
- Background knowledge
Output:
- Rule-based agent program
Approach:
- Learn goal/action selection/termination by generalizing over multiple examples
- Inductive Logic Programming to combine rich knowledge structures
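
One way to see how selections and rejections drive learning is to score candidate selection rules by how many positive and negative examples they cover. The sketch below is not the ILP system used in the work; the situation dictionaries, the two candidate rules, and the `coverage` function are all assumptions made for illustration.

```python
def any_door(situation, door):
    """Candidate rule 1: select any door of the current room."""
    return door in situation["doors"][situation["room"]]


def door_toward_item(situation, door):
    """Candidate rule 2: select a door whose next room holds the wanted item."""
    if not any_door(situation, door):
        return False
    next_room = situation["leads_to"][door]
    return situation["wanted"] in situation["contents"].get(next_room, set())


def coverage(rule, positives, negatives):
    """Return (covered positives, covered negatives) for one candidate rule."""
    covered_pos = sum(1 for s, d in positives if rule(s, d))
    covered_neg = sum(1 for s, d in negatives if rule(s, d))
    return covered_pos, covered_neg


# Hypothetical relational situations, flattened into dictionaries.
s1 = {"room": "r1", "wanted": "i3",
      "doors": {"r1": {"d1", "d2"}},
      "leads_to": {"d1": "r2", "d2": "r3"},
      "contents": {"r2": {"i3"}}}
s2 = {"room": "r3", "wanted": "i4",
      "doors": {"r3": {"d5", "d6"}},
      "leads_to": {"d5": "r4", "d6": "r1"},
      "contents": {"r4": {"i4"}, "r1": {"i3"}}}

positives = [(s1, "d1"), (s2, "d5")]  # doors the expert selected
negatives = [(s1, "d2"), (s2, "d6")]  # doors the expert rejected

candidates = {"any door": any_door, "door toward item": door_toward_item}
scores = {name: coverage(rule, positives, negatives)
          for name, rule in candidates.items()}
best = max(scores, key=lambda name: scores[name][0] - scores[name][1])
print(scores, best)
```

The more specific rule covers the same selections while excluding every rejection, which is why rejected decisions are useful input.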

25 Relational Learning by Observation

26 Relational Learning by Observation
- Find the common structures in the decision examples
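
As a rough illustration of finding common structure across decision examples, the sketch below abstracts each example into the predicate chains that link its key objects and then intersects those sets. This path-intersection heuristic is a deliberate simplification, not the Inductive Logic Programming procedure from the talk; `role_paths`, the role names, and all facts (including `entered-by`) are hypothetical.

```python
from itertools import permutations


def role_paths(facts, roles, max_len=2):
    """Return {(from_role, predicate_chain, to_role)} links of bounded length.

    facts -- iterable of (predicate, subject, object) triples
    roles -- dict mapping a role name to the constant playing that role
    """
    edges = {}
    for pred, a, b in facts:
        edges.setdefault(a, []).append((pred, b))

    found = set()
    for src_role, dst_role in permutations(roles, 2):
        frontier = [(roles[src_role], ())]
        for _ in range(max_len):
            next_frontier = []
            for node, chain in frontier:
                for pred, b in edges.get(node, []):
                    new_chain = chain + (pred,)
                    if b == roles[dst_role]:
                        found.add((src_role, new_chain, dst_role))
                    next_frontier.append((b, new_chain))
            frontier = next_frontier
    return found


example_1 = role_paths(
    [("has-door", "r1", "d1"), ("has-door", "r1", "d2"),
     ("leads-to", "d1", "r2"), ("leads-to", "d2", "r3"),
     ("contains", "r2", "i3"), ("entered-by", "r1", "d1")],
    {"Room": "r1", "Door": "d1", "Item": "i3"},
)
example_2 = role_paths(
    [("has-door", "r3", "d5"), ("leads-to", "d5", "r4"),
     ("contains", "r4", "i4"), ("contains", "r3", "i9")],
    {"Room": "r3", "Door": "d5", "Item": "i4"},
)

common = example_1 & example_2
print(sorted(common))
```

The coincidental `entered-by` link appears in only one example, so the intersection drops it and keeps the structure both decisions share.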

27 Relational Learning by Observation
- "Select a door in the current room, which leads to a room that contains the item the agent wants to get"
- Learn relations between what the agent wants, perceives, and knows
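
The quoted door-selection condition can be read as a conjunctive query over relational facts. The following is a minimal sketch of such a query evaluator, assuming a Prolog-like convention in which capitalized strings are variables; the predicate names, the room layout, and the `match` and `solve` helpers are all hypothetical and do not reproduce the rule syntax of the learned agent program.

```python
def is_var(term):
    """Treat capitalized strings as variables (Prolog-style convention)."""
    return isinstance(term, str) and term[:1].isupper()


def match(literal, fact, binding):
    """Extend binding so that literal equals fact, or return None on mismatch."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    new = dict(binding)
    for term, value in zip(literal[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(body, facts, binding=None):
    """Yield every variable binding that satisfies all literals in body."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from solve(rest, facts, extended)


# "Select a door in the current room, which leads to a room that
#  contains the item the agent wants to get."
select_door = [
    ("current-room", "Room"),
    ("has-door", "Room", "Door"),
    ("leads-to", "Door", "Next"),
    ("contains", "Next", "Item"),
    ("wants", "Item"),
]

# ASSUMPTION: an invented layout, not the map shown on the slides.
facts = [
    ("current-room", "r2"),
    ("has-door", "r2", "d3"), ("has-door", "r2", "d1"),
    ("leads-to", "d3", "r3"), ("leads-to", "d1", "r1"),
    ("contains", "r3", "i3"), ("contains", "r1", "i4"),
    ("wants", "i3"),
]

doors = sorted({b["Door"] for b in solve(select_door, facts)})
print(doors)
```

The query ties together what the agent wants (`wants`), perceives (`current-room`, `has-door`), and knows (`leads-to`, `contains`).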

28 Comparing Redux to Learning by Observation (LBO)
Advantages of Redux:
- No real-time constraints on behavior (e.g., no waiting for a 2-hour-long goal)
- Can be used to describe unlikely but critical situations (e.g., "Let's assume that there is a nuclear meltdown.")
- Richer annotation opportunities
  - Increase learning speed and quality
- Faster focus on where knowledge is most lacking
- Immediate expert feedback on how rules behave

29 Comparing Redux to LBO
Disadvantages of Redux:
- Cannot learn low-level behavior
- Contains domain-specific components, although most of Redux is domain independent
- Generating behavior may be slower
- Additional annotations improve learning but require extra expert effort

30 Relational Behavior Trace
- Situation: a symbolic snapshot of the observed environment at a given time
- Behavior trace: the set of situations in the execution history

31 Annotated Behavior Traces
- Behavior is annotated with actions and goals: goto-room(r1), etc.
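
An annotated behavior trace might be stored as timestamped situations plus goal and action annotations that span intervals of the trace. This is a sketch only: `Situation`, `Annotation`, `active_at`, and `decision_points` are hypothetical names, the facts and the `go-through(d1)` interval are invented, and treating the last active step as the termination point is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """A symbolic snapshot of the observed environment at one time step."""
    time: int
    facts: frozenset


@dataclass(frozen=True)
class Annotation:
    """A goal or action label active from start to end (inclusive)."""
    label: str
    start: int
    end: int


def active_at(annotations, time):
    """Return the labels of all annotations active at the given time."""
    return [a.label for a in annotations if a.start <= time <= a.end]


def decision_points(annotations):
    """List (time, kind, label) events where a goal is selected or terminated."""
    points = []
    for a in annotations:
        points.append((a.start, "select", a.label))
        points.append((a.end, "terminate", a.label))
    return sorted(points)


trace = [
    Situation(0, frozenset({("in-room", "agent", "r2")})),
    Situation(1, frozenset({("in-room", "agent", "r2")})),
    Situation(2, frozenset({("in-room", "agent", "r1")})),
]
annotations = [
    Annotation("goto-room(r1)", 0, 2),
    Annotation("go-through(d1)", 1, 2),
]

facts_at = {s.time: s.facts for s in trace}
for time, kind, label in decision_points(annotations):
    print(time, kind, label, sorted(facts_at[time]))
```

Pairing each decision point with the situation at that time yields the kind of selection and termination examples a learner would generalize over.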

32 Summary
- Diagrammatic behavior specification approach
  - To extract rich behavior knowledge
- Interactive behavior specification
  - Communication medium between the agents (explicit goals and assumed situation)
- Relational learning by observation approach
  - To combine multiple complex knowledge sources

33 Future Work
- Improve mixed-initiative interaction of the interface
- Explore domain-independent diagrammatic interface features
- Allow the expert to enter context-sensitive knowledge

34 Mixed-Initiative Perspective
- Interactive behavior specification
- Diagrammatic representation of behavior
  - Communication medium between the agents
  - Explicit goals and desired behavior
  - Facilitates interaction between the agents

