1 Learning in Robots

2

3 Gesture Recognition, Imitation, Motor Learning
Gesture recognition: How are actions perceived? How is information parsed?
Imitation: At what level of granularity is behavior copied? Should the robot copy the intention, the goal, or the dynamics of the movement?
Motor learning: How is information transferred across multiple modalities (visuo-motor, audio-motor)?

4 BIOLOGICAL INSPIRATION
Before building any capability into robots, we might want to understand how the equivalent capability works in humans and other animals.

5 Imitation Capabilities in Animals
Which species exhibit imitation is still an active area of discussion and debate. One must differentiate "true" imitation from copying (flocking, schooling, following), stimulus enhancement, contagion, and emulation.

6 Imitation Capabilities in Animals
"True" imitation is the ability to learn new actions that are not part of the usual repertoire. It has long been considered the preserve of humans only, and possibly the great apes.
Whiten & Ham, Advances in the Study of Behaviour, 1992; Savage-Rumbaugh, Child Development, 1993

7 Imitation Capabilities in Animals
Complex imitation capabilities have been demonstrated in dolphins and parrots: a large repertoire of imitated behaviors, showing flexibility and generalization across different contexts.
Moore, Behaviour, 1999; Herman, in Imitation in Animals & Artifacts, MIT Press, 2002

8 Developmental Stages of Imitation
Innate facial imitation (newborns to 3 months): tongue and lip protrusion, mouth opening, head movements, cheek and brow motion, eye blinking. Delayed imitation over up to 24 hours implies that imitation is mediated by a stored representation.
Meltzoff & Moore, Early Development and Parenting, 1997; Meltzoff & Moore, Developmental Psychology, 1989

9 Example: Predicting Joint Angles
Explorerobot with a modified walk: it can only walk forward and turn. Record joint commands and joint angles at each frame, then learn a mapping from (joint angles, joint commands) to the next joint angles. Evaluation metrics: RAE (relative absolute error) and RRSE (relative root squared error).
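
A minimal sketch of this pipeline with synthetic data (the robot's actual logs and model are not given in the slides), using a linear least-squares map and Weka-style RAE/RRSE:

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_joints = 500, 4
    angles = rng.uniform(-1.0, 1.0, (T, n_joints))    # recorded joint angles
    commands = rng.uniform(-0.2, 0.2, (T, n_joints))  # recorded joint commands

    X = np.hstack([angles[:-1], commands[:-1]])       # state at frame t
    Y = angles[1:]                                    # joint angles at frame t+1

    W, *_ = np.linalg.lstsq(X, Y, rcond=None)         # least-squares fit
    pred = X @ W

    def rae(y, p):   # Relative Absolute Error (vs. predicting the mean)
        return np.abs(p - y).sum() / np.abs(y.mean(axis=0) - y).sum()

    def rrse(y, p):  # Relative Root Squared Error (vs. predicting the mean)
        return np.sqrt(((p - y) ** 2).sum() / ((y.mean(axis=0) - y) ** 2).sum())

    print(f"RAE={rae(Y, pred):.3f}  RRSE={rrse(Y, pred):.3f}")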

10 Talking Robots
Many talking robots exist, but they are still very primitive. Applications: actors for robot theatre, agents for advertisement, education, and entertainment. Goal: designing inexpensive natural-size humanoid caricature and realistic robot heads (e.g., Dog.com from Japan). Machine learning techniques are used to teach robots behaviors, natural-language dialogs, and facial gestures.

11 Behavior, Dialog and Learning
Words communicate only about 35% of the information transmitted from sender to receiver in human-to-human communication; the remaining information is carried by para-language. Emotions, thoughts, decisions, and intentions of a speaker can often be recognized before they are verbalized. We model robot activity as a mapping from the sensed environment and internal states to behaviors and new internal states (emotions, energy levels, etc.).
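
This mapping can be written as a small function; the sketch below is illustrative only (the internal-state variables and behaviors are assumptions, not the original system's):

    def robot_step(percepts, internal):
        """Map (sensed environment, internal state) -> (behavior, new internal state)."""
        new_internal = dict(internal)
        # e.g. energy decays each cycle; emotion tracks the sensed face
        new_internal["energy"] = internal["energy"] - 0.01
        new_internal["emotion"] = percepts.get("face_emotion", internal["emotion"])
        behavior = "rest" if new_internal["energy"] < 0.2 else "engage_dialog"
        return behavior, new_internal

    behavior, state = robot_step({"face_emotion": "happy"},
                                 {"energy": 1.0, "emotion": "neutral"})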

12 Neck and upper body movement generation

13 The perception learning tasks
Robot vision: Where is a face? (face detection). Who is this person? (face recognition; supervised learning, where the person's name is given in the process). Age and gender of the person. Hand gestures. Emotions expressed as facial gestures (smile, eye movements, etc.). Lip reading to support speech recognition. Body language.
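
As an illustration of the first task, a minimal face-detection sketch using OpenCV's bundled Haar cascade (one standard approach; the slides do not name a library):

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("frame.jpg")            # hypothetical camera frame
    if img is not None:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:           # one bounding box per detected face
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)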

14 The perception learning tasks
Speech recognition: Who is this person? (voice-based speaker recognition; supervised learning, where the person's name is given in the process). Isolated-word recognition for word spotting. Sentence recognition. Sensors: temperature, touch, movement.

15 Learning the perception/behavior mappings
Tracking the human. Full gesticulation as a response to human behavior in dialogs and in dancing/singing. Modification of semi-autonomous behaviors such as breathing, eye blinking, mechanical hand withdrawal, and speech acts in response to a person's behavior. Playing games with humans. Body contact with humans, such as safe gesticulation close to a human and hand shaking.

16 Machine Learning Algorithms
Supervised learning: classification (discrete labels) and regression (real values). Unsupervised learning: finding associations (in features). Reinforcement learning: decision making (robots, chess machines).
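
A toy sketch of the first two paradigms using scikit-learn (illustrative data; reinforcement learning is treated later in these slides):

    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X = [[0, 0], [0, 1], [5, 5], [5, 6]]
    y = [0, 0, 1, 1]                              # discrete labels -> classification

    clf = LogisticRegression().fit(X, y)          # supervised learning
    print(clf.predict([[4, 5]]))                  # -> [1]

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised
    print(km.labels_)                             # cluster assignments, no labels given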

17 Supervised Machine Learning

18 Military/Government Robots
US soldiers in Afghanistan being trained to defuse a landmine using a PackBot.

19 Mars Rovers – Spirit and Opportunity
Space robots: autonomous navigation features with human remote control and oversight.

20 Unsupervised learning

21 Applications: social bookmarking (socialized bookmarks, tags)

22 Probabilistic State Machines to describe emotions
From the Happy state: the input "you are beautiful" produces the reply "Thanks for a compliment" (P = 1) and the robot stays happy. The input "you are blonde!" produces "I am not an idiot" (P = 0.3) or "Do you suggest I am an idiot?" (P = 0.7), moving the robot toward the Unhappy and Ironic states.
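
A minimal sketch of such a machine in Python; the transition targets follow one reading of the slide's diagram:

    import random

    # transitions: state -> input phrase -> list of (probability, reply, next state)
    psm = {
        "happy": {
            "you are beautiful": [(1.0, "Thanks for a compliment", "happy")],
            "you are blonde!": [
                (0.3, "I am not an idiot", "unhappy"),
                (0.7, "Do you suggest I am an idiot?", "ironic"),
            ],
        },
    }

    def step(state, phrase):
        outcomes = psm[state][phrase]
        r, acc = random.random(), 0.0
        for p, reply, nxt in outcomes:      # sample a transition by probability
            acc += p
            if r < acc:
                return reply, nxt
        return outcomes[-1][1], outcomes[-1][2]

    print(step("happy", "you are blonde!"))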

23 Probabilistic Grammars for performances
The performance starts by speaking "Professor Perky" (accompanied by blinking the eyes twice, P = 0.1) or "Doctor Lee", with P = 0.5 each. A follow-up is chosen among "Where?" (P = 0.3), "Who?" (P = 0.5), and "What?" (P = 0.1). "Where?" leads to speaking "In the classroom" while shaking the head or "in some location" while smiling broadly (P = 0.5 each); "What?" leads to "Was singing and dancing" (P = 0.1) or "Was drinking wine" (P = 0.1); and so on.
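
A sketch of the grammar as weighted productions; the exact structure and probabilities follow one reading of the slide, and terminal symbols are plain strings describing speech and gesture:

    import random

    grammar = {
        "PERF": [(0.5, ["NAME", "QUESTION"]),
                 (0.5, ["NAME"])],
        "NAME": [(0.5, ['speak "Professor Perky", blink eyes twice']),
                 (0.5, ['speak "Doctor Lee"'])],
        "QUESTION": [(0.3, ['speak "Where?"', 'speak "In the classroom", shake head']),
                     (0.5, ['speak "Who?"']),
                     (0.2, ['speak "What?"', 'speak "Was singing and dancing"'])],
    }

    def sample(symbol):
        if symbol not in grammar:                # terminal action
            return [symbol]
        r, acc = random.random(), 0.0
        for p, seq in grammar[symbol]:           # pick a weighted production
            acc += p
            if r < acc:
                return [t for s in seq for t in sample(s)]
        return [t for s in grammar[symbol][-1][1] for t in sample(s)]

    print(sample("PERF"))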

24 Example “Age Recognition”
Name (example)   Age (output d)   Smile (a)   Height (b)   Hair color (c)
Ahmed            Kid (0)          a = 3       b = 0        c = 0
Aya              Teenager (1)     a = 2       b = 1        c = 1
Rana             Mid-age (2)      a = 1       b = 2        c = 2
AbdElRahman      Old (3)          a = 0       b = 3        c = 3
Examples of data for learning: four people, given to the system.

25 Example “Age Recognition”
Smile (a):       very often = 3, often = 2, moderately = 1, rarely = 0
Height (b):      very tall = 3, tall = 2, middle = 1, short = 0
Hair color (c):  grey = 3, black = 2, brown = 1, blonde = 0
Encoding of features: values of the multiple-valued variables.
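
Running a standard learner over the four encoded examples illustrates the induction; the scikit-learn decision tree here is an illustration, not the method on the slides (which induce groups on a map by hand):

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[3, 0, 0],   # Ahmed: smiles very often, short, blonde
         [2, 1, 1],   # Aya: smiles often, middle height, brown
         [1, 2, 2],   # Rana: smiles moderately, tall, black
         [0, 3, 3]]   # AbdElRahman: smiles rarely, very tall, grey
    y = [0, 1, 2, 3]  # kid, teenager, mid-age, old

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["smile", "height", "hair"]))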

26 Multi-valued Map for Data
Groups on a multi-valued map over variables a, b (rows) and c (columns) show a simple induction from the data: d = F(a, b, c). (Map figure omitted.)

27 Old people smile rarely
Groups on the map (hair color c running from blonde to grey) show a simple induction from the data: old people smile rarely, middle-age people smile moderately, teenagers smile often, and children smile very often.
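
Since the induced groups depend only on the smile feature a, the map collapses to a single-variable rule, d = F(a) = 3 - a:

    def age_class(smile_a):
        """smile_a: 3 = very often, 2 = often, 1 = moderately, 0 = rarely."""
        return ["old", "mid-age", "teenager", "kid"][smile_a]

    print(age_class(0))   # rarely smiling -> "old"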

28 Another example: teaching movements
(Figure: input variables on the left mapped to output variables on the right.)

29 Example 1 of Rules for Thinning Algorithm
All four rules (Rules 1 through 4) can be illustrated as pixel-neighborhood patterns. (Figure; legend: "new and old one", "old one", "don't care".)

30 Human Affect as Reinforcement to Robot
Interactive robot learning:
Learning by example.
Learning by guidance: future-directed learning cues, anticipatory reward.
Learning by feedback: an additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Hubert).
In our experiment, the affective signal serves as the additional reinforcement.

31 Reinforcement Learning
Supervised (inductive) learning is the simplest and most studied type of learning. But how can an agent learn behaviors when it doesn't have a teacher to tell it how to perform? The agent has a task to perform; it takes some actions in the world; and at some later point it gets feedback telling it how well it did on the task. The agent performs the same task over and over again. This problem is called reinforcement learning: the agent gets positive reinforcement for tasks done well and negative reinforcement for tasks done poorly.

32 Human Affect as Reinforcement to Robot
Affective signal as additional reinforcement: a web cam feeds emotional-expression analysis, where a positive emotion (happy) counts as reward and a negative emotion (sad) as punishment. The emotional expression is thus used in learning as r_human, a social reward coming from the human observer. Note: we interpret happy as positively valenced and sad as negatively valenced; this is a simplified setup that enables us to test our hypothesis.
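
A hedged sketch of this reward channel; classify_emotion is a hypothetical stand-in for the expression-analysis module, which the slides do not specify:

    def classify_emotion(frame):
        """Hypothetical facial-expression classifier: 'happy', 'sad', or other."""
        return "happy"

    def r_human(frame):
        emotion = classify_emotion(frame)
        if emotion == "happy":
            return +1.0        # positive emotion = reward
        if emotion == "sad":
            return -1.0        # negative emotion = punishment
        return 0.0

    # total reinforcement for the learner: r = r_task + r_human(webcam_frame)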

33 Passive vs. Active learning
Passive learning: the agent has a fixed policy and tries to learn the utilities of states by observing the world go by. This is analogous to policy evaluation, and it often serves as a component of, or inspiration for, active learning algorithms.
Active learning: the agent attempts to find an optimal (or at least good) policy by acting in the world. This is analogous to solving the underlying MDP.

34 Formalization
Given: a state space S; a set of actions a1, …, ak; and a reward value at the end of each trial (which may be positive or negative).
Output: a mapping from states to actions.
Example: ALVINN (driving agent). The state is the configuration of the car; the task is to learn a steering action for each state.

35 Policy (Reactive/Closed-Loop Strategy)
(Grid-world figure with terminal rewards +1 and -1.) A policy P is a complete mapping from states to actions.

36 Value Function
The agent knows what state it is in, and it has a number of actions it can perform in each state. Initially, it doesn't know the value of any of the states. If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states: U(oldstate) = reward + U(newstate). The agent learns the utility values of states as it works its way through the state space.
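
A direct sketch of this deterministic update, exactly as written above (no discount factor):

    U = {}   # utility estimates, keyed by state

    def update_utility(old_state, reward, new_state):
        # U(oldstate) = reward + U(newstate); unknown states default to 0
        U[old_state] = reward + U.get(new_state, 0.0)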

37 Q-Learning
Q-learning augments value iteration by maintaining an estimated utility value Q(s,a) for every action at every state. The utility of a state, U(s), is then simply the maximum Q value over all the possible actions at that state. Because it learns the utilities of actions rather than states, Q-learning is model-free.

38 Q-Learning
foreach state s:
    foreach action a:
        Q(s,a) = 0
s = current state
do forever:
    a = select an action
    do action a
    r = reward from doing a
    t = resulting state from doing a
    Q(s,a) = (1 - α) Q(s,a) + α (r + γ max_a' Q(t,a'))
    s = t
The learning coefficient α determines how quickly our estimates are updated; normally α is set to a small positive constant less than 1. γ is the discount factor.
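
A runnable version of the loop above on a toy four-state chain (the chain task itself is an assumption for illustration; reward 1 for reaching the rightmost state):

    import random

    n_states, actions = 4, (-1, +1)
    alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    s = 0
    for _ in range(2000):
        a = (random.choice(actions) if random.random() < eps
             else max(actions, key=lambda b: Q[(s, b)]))   # select an action
        t = min(max(s + a, 0), n_states - 1)               # resulting state
        r = 1.0 if t == n_states - 1 else 0.0              # reward from doing a
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (
            r + gamma * max(Q[(t, b)] for b in actions))
        s = 0 if t == n_states - 1 else t                  # restart at the goal
    print(Q)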

39 Robot in a Room
Grid world with a START cell; reward +1 at [4,3], -1 at [4,2], plus a reward for each step.
Actions: UP, DOWN, LEFT, RIGHT. Choosing UP moves UP 80% of the time, LEFT 10%, and RIGHT 10%.
We want to learn a policy. Can we learn it using (un)supervised learning? Why not? So how do we learn it? Any ideas? Let the robot explore the environment.
What is the strategy for achieving maximum reward? What if the actions were deterministic? A sketch of this action model follows.
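
A sketch of the slide's stochastic action model (wall and grid-boundary handling omitted for brevity):

    import random

    MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
    PERP = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
            "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

    def step(pos, action):
        r = random.random()
        if r < 0.8:
            actual = action              # 80%: move as commanded
        elif r < 0.9:
            actual = PERP[action][0]     # 10%: slip to one side
        else:
            actual = PERP[action][1]     # 10%: slip to the other side
        dx, dy = MOVES[actual]
        return (pos[0] + dx, pos[1] + dy)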

40 Is this a solution?
(Grid figure with terminals +1 and -1.) Only if the actions are deterministic; not in this case, since the actions are stochastic. A solution/policy is a mapping from each state to an action.

41 Reward for each step: -2 (grid figure; terminal rewards +1 and -1)

42 Reward for each step: -0.1 (grid figure; terminal rewards +1 and -1)

43 Reward for each step: -0.04 (grid figure; terminal rewards +1 and -1)

44 Reward for each step: -0.01 (grid figure; terminal rewards +1 and -1)

45 Reward for each step: +0.01 (grid figure; terminal rewards +1 and -1)

46 Example: Recycling Robot

47 Gridworld Example

