Learning from how dogs learn. Prof. Bruce Blumberg, The Media Lab, MIT.

Presentation transcript:

1 Learning from how dogs learn. Prof. Bruce Blumberg, The Media Lab, MIT. bruce@media.mit.edu, www.media.mit.edu/~bruce

2 About me…

3

4 Practical & compelling real-time learning
- Easy for interactive characters to learn what they ought to be able to learn
- Easy for a human trainer to guide learning process
- A compelling user experience
- Provide heuristics and practical design principles

5 My bias & focus
- Learning occurs within an innate structure that biases…
  - Attention
  - Motivation
  - Innate frequency, form and organization of behavior
  - When certain things are most easily learned
- What are the catalytic components of the scaffolding that make learning possible?

6 sheep|dog:trial by eire See sheep|dog video on my website

7 Object persistence See object persistence video on my website

8 Temporal representation See temporal representation (aka Goatzilla) video on my website

9 Alpha Wolf See alpha wolf video on my website

10 Rover@home See rover@home video on my website or go to Scientific American Frontiers website

11 Dobie T. Coyote Goes to School See Dobie video on my website

12 Why look at Dog Training?
- Interactive characters pose unique challenges:
  - State, action and state-action spaces are often continuous and far too big to search exhaustively
  - To be compelling, characters must
    - Learn “obvious” contingencies between state, actions and consequences quickly
    - Be easy to train without visibility into internal state of character
  - Learning is only one thing they have to do.
- Dogs and their trainers seem to solve these problems easily

13 Invaluable resources
- Doing it, and talking to people who do it.
- Wilkes, Pryor, Ramirez
- Lindsay, Burch & Bailey, Mackintosh
- Lorenz, Leyhausen, Coppinger & Coppinger

14 The problem facing dogs (real and synthetic)
[Diagram: set of all possible actions; set of all motivational goals; set of all possible stimuli]
What do I do, when, in order to best satisfy my motivational goals?

15 The space of possible stimuli is wicked big
[Figure: state space, the set of all possible stimuli. Axes: modality of stimuli (smells, motion, sounds, dog sounds, speech, whistles) and time of occurrence.]

16 The space of possible actions is also very big
[Figure: action space, the set of all possible actions. Axes: action (figure-8, shake, low shake, high-5, beg, down, left ear twitch) and time of performance.]

17 Who gets credit for good things happening?
[Figure: actions (figure-8, shake, low shake, high-5, beg, down, left ear twitch) against modality of stimuli (motion, sounds, dog sounds, speech, whistles), with the reward “Yumm..”]

18 Who gets credit for good things happening?
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase) ending in the reward “Yumm..”]

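19 Conventional idea: back propagation from goal
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase) ending in the reward “Yumm..”. Credit flows backward.]

20 Conventional idea: back propagation from goal
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase) ending in the reward “Yumm..”. Credit flows backward.]

21 Conventional idea: back propagation from goal
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase) ending in the reward “Yumm..”. Credit flows backward.]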
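The "credit flows backward" idea on slides 19 to 21 can be illustrated with a small sketch of discounted reward propagation along a chain of behaviors. This is an illustration only, not the system described in the talk: the function name `propagate_credit`, the discount factor `gamma`, and the temporal ordering of the six behaviors are all assumptions made for the example.

```python
def propagate_credit(sequence, reward, gamma=0.9):
    """Assign discounted credit to each step, starting at the goal and moving backward."""
    credit = {}
    value = reward
    for step in reversed(sequence):
        credit[step] = value
        value *= gamma  # each earlier step receives a smaller share
    return credit


# Assumed temporal order of the behaviors named on the slide (hypothetical).
sequence = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]
credit = propagate_credit(sequence, reward=1.0, gamma=0.5)

for step in sequence:
    print(f"{step:10s} {credit[step]}")
```

With a discount below 1, the earliest behaviors receive very little credit, and none at all until the whole chain has been completed at least once.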

22 The problem
- If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
- If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
- Don’t know if it is the right sequence until goal is reached
- What happens if “variant” needs to be learned?
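The arithmetic behind slide 22 can be checked directly. The first figure follows from six elements with three variants each. The slide does not say how the second figure was derived; numerically it equals 108 to the sixth power, that is, 108 options per element, whereas simply pairing each of 3 variants with one of 12 stimuli would give 36 options per element. The variable names below are chosen for the example, and the breakdown of 108 is not stated in the source.

```python
elements = 6   # behaviors in the sequence shown on the earlier slides
variants = 3   # variants per element (from the slide)
stimuli = 12   # possible stimuli (from the slide)

# Action sequences only, ignoring stimuli.
action_only = variants ** elements

# One simple reading: each element pairs one variant with one stimulus.
naive_pairs = (variants * stimuli) ** elements

# The figure quoted on the slide, and the per-element count it implies.
slide_figure = 1_586_874_322_944
implied_options_per_element = round(slide_figure ** (1 / elements))

print(action_only)                  # 729
print(naive_pairs)                  # 2176782336
print(implied_options_per_element)  # 108
```

Either way, the point of the slide stands: the space is far too large to search exhaustively.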

23 Leyhausen’s suggestion…
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase), each with its own motivation & reward.]
Each element is innately self-motivating and has innate reward metric

24 Leyhausen’s suggestion…
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase), each with its own motivation & reward.]
Each element is innately self-motivating and has innate reward metric

25 Coppinger’s suggestion…
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase).]
Varying innate tendency to follow behavior with “next” in sequence

26 Functional goal plays incidental role
[Figure: timeline of behaviors (labels: stalk, grab-bite, eye, orient, kill-bite, chase) ending in the reward “Yumm..”]
Propagated value from functional goal plays incidental role

27 Big idea: innate biases make learning possible
- Biases include…
  - Temporal proximity implies causality
  - Attend more readily to certain classes of stimuli than to others (motion vs. speech)
  - Lazy discovery (pay attention once you have a reason to pay attention)
  - Elements may be “innately” self-motivating and have local metric of “goodness”

28 Good trainers actively guide dog’s exploration
- Behavioral
  - Train behavior, then cue
  - Differential rewards encourage variability
- Motor
  - Shaping
    - Rewarding successive approximations
  - Luring
    - Pose, e.g. “down”
    - Trajectory, e.g. “figure-8”
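Shaping, described above as rewarding successive approximations, can be sketched as a reward criterion that tightens each time the learner meets it. This is a toy illustration under stated assumptions: the function `shape`, the integer "quality" score for each attempt, and the fixed step size are all hypothetical and are not taken from the talk.

```python
def shape(attempts, target, start_criterion, step):
    """Reward attempts that meet the current criterion, then raise the criterion."""
    criterion = start_criterion
    rewarded = []
    for attempt in attempts:
        if attempt >= criterion:
            rewarded.append(attempt)
            # Ask for a slightly better approximation next time, up to the target.
            criterion = min(target, criterion + step)
    return rewarded, criterion


# Hypothetical scores for how closely each attempt resembles the target behavior.
attempts = [2, 1, 4, 3, 6, 9, 10]
rewarded, final_criterion = shape(attempts, target=10, start_criterion=2, step=2)

print(rewarded)         # attempts that earned a reward
print(final_criterion)  # criterion after the session
```

Early, rough attempts are rewarded, but the same attempt later in the session no longer is, which is what moves the behavior toward the target.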

29 Dogs constrain search for causal agents
[Figure: timeline of Sit, Approach, Eat.]
- Consequences Window: Trainer “clicks” signaling reward is coming. When reward is actually received
- Attention Window: Cue given immediately before or as dog is moving into desired pose
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
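The two temporal windows on slide 29 can be sketched as simple time filters over perceived events. This is a minimal sketch, not the talk's implementation: the `Event` type, the function names, the window lengths, and the timestamps are assumptions for illustration. As a simplification, the attention window here ends at action onset, although the slide also allows a cue given while the dog is moving into the pose.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    time: float  # seconds since the start of the session


def events_between(events, start, end):
    """Return names of events whose time lies in [start, end]."""
    return [e.name for e in events if start <= e.time <= end]


def candidate_cues(stimuli, action_onset, attention_window=1.0):
    """Only stimuli perceived just before the action starts are considered as cues."""
    return events_between(stimuli, action_onset - attention_window, action_onset)


def was_rewarded(feedback, action_onset, consequences_window=3.0):
    """A click or treat counts only if it arrives shortly after the action starts."""
    return bool(events_between(feedback, action_onset, action_onset + consequences_window))


stimuli = [Event("door slam", 0.5), Event("sit-utterance", 4.6), Event("whistle", 9.0)]
feedback = [Event("click", 6.0), Event("treat", 7.5)]
sit_onset = 5.0

cues = candidate_cues(stimuli, sit_onset)
rewarded = was_rewarded(feedback, sit_onset)
print(cues, rewarded)
```

Everything outside the two windows is ignored, so only one stimulus remains a candidate cause instead of the whole history of the session.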

30 Dogs use implicit feedback to guide perceptual learning
[Figure: timeline of Sit, Approach, Eat. Annotations: “sit-utterance” perceived; dog decides to sit; “click” perceived; build & update perceptual model of “sit-utterance”.]
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

31 Dogs give credit where credit is due…
- Trainer repeatedly lures dog through a trajectory or into a pose
- Eventually, dog performs behavior spontaneously
- Implication
  - Dog associates reward with resulting body configuration or trajectory and not just with “follow your nose”

32 Observation: dogs give credit where credit is due
[Figure: timeline of Sit, Approach, Eat. Annotations: “sit-utterance” perceived; dog decides to sit; “click” perceived.]
1. Credit sitting in presence of “sit-utterance”
2. Build & update perceptual model of “sit-utterance”
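The two steps on slide 32 can be sketched together: a rewarded action updates the value of acting in the presence of the cue, and the cue heard during that rewarded action is stored as a good example for the perceptual model. This is a toy sketch under stated assumptions: the `Learner` class, the learning rate, the two-number feature vectors, and the use of a mean as the "perceptual model" are hypothetical stand-ins, not the representation used in the talk.

```python
from collections import defaultdict


class Learner:
    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.value = defaultdict(float)     # (cue, action) -> estimated value
        self.exemplars = defaultdict(list)  # cue -> features heard during rewarded actions

    def observe(self, cue, features, action, reward):
        # Step 1: credit the action in the presence of the cue.
        key = (cue, action)
        self.value[key] += self.learning_rate * (reward - self.value[key])
        # Step 2: a cue heard during a rewarded action is a good training example.
        if reward > 0:
            self.exemplars[cue].append(features)

    def prototype(self, cue):
        """Mean feature vector of the stored examples, or None if there are none."""
        examples = self.exemplars[cue]
        if not examples:
            return None
        return [sum(column) / len(examples) for column in zip(*examples)]


learner = Learner()
learner.observe("sit-utterance", [1.0, 3.0], "sit", reward=1.0)
learner.observe("sit-utterance", [3.0, 5.0], "sit", reward=1.0)
learner.observe("sit-utterance", [9.0, 9.0], "down", reward=0.0)

print(learner.value[("sit-utterance", "sit")])
print(learner.prototype("sit-utterance"))
```

The unrewarded trial changes neither the value of sitting nor the perceptual model, so only utterances that accompanied a rewarded Sit shape what counts as a “sit-utterance”.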

33 D.L.: Take Advantage of Predictable Regularities
- Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
  - Use consequences to bias choice of action
  - But vary performance and attend to differences
- Explore state and action spaces on “as-needed” basis
  - Build models on demand

34 D.L.: Make Use of All Feedback: Explicit & Implicit
- Use rewarded action as context for identifying
  - Promising state space and action space to explore
  - Good examples from which to construct perceptual models, e.g.,
    - A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.

35 D.L.: Make Them Easy to Train
- Respond quickly to “obvious” contingencies
- Support Luring and Shaping
  - Techniques to prompt infrequently expressed or novel motor actions
- “Trainer friendly” credit assignment
  - Assign credit to candidate that matches trainer’s expectation

36 The System

37 Dobie T. Coyote… See dobie video on my website

38 Limitations and Future Work
- Important extensions
  - Other kinds of learning (e.g., social or spatial)
  - Generalization
  - Sequences
  - Expectation-based emotion system
- How will the system scale?

39 Useful Insights
- Use
  - Temporal proximity to limit search.
  - Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
  - “Trainer friendly” credit assignment
- Luring and shaping are essential

40 Acknowledgements
- Members of the Synthetic Characters Group, past, present & future
- Gary Wilkes
- Funded by the Digital Life Consortium

