1 Learning in Robots

2

3 Gesture Recognition, Imitation, Motor Learning
Gesture recognition: How are actions perceived? How is information parsed?
Imitation: At what level of granularity is behavior copied? Should the robot copy the intention, the goal, or the dynamics of the movement?
Motor learning: How is information transferred across multiple modalities (visuo-motor, audio-motor)?

4 BIOLOGICAL INSPIRATION
Before building any capability into robots, we might want to understand how the equivalent capability works in humans and other animals.

5 Imitation Capabilities in Animals
Which species exhibit imitation is still an active area of discussion and debate. One must differentiate "true" imitation from copying (flocking, schooling, following), stimulus enhancement, contagion, and emulation.

6 Imitation Capabilities in Animals
"True" imitation is the ability to learn new actions that are not part of the usual repertoire. It has long been considered the preserve of humans only, and possibly the great apes.
Whiten & Ham, Advances in the Study of Behaviour, 1992; Savage-Rumbaugh, Child Development, 1993

7 Imitation Capabilities in Animals
Complex imitation capabilities have been demonstrated in dolphins and parrots: a large repertoire of imitated behaviors, showing flexibility and generalization across different contexts.
Moore, Behaviour, 1999; Herman, in Imitation in Animals & Artifacts, MIT Press, 2002

8 Developmental Stages of Imitation
Innate facial imitation (newborns to 3 months): tongue and lip protrusion, mouth opening, head movements, cheek and brow motion, eye blinking. Delayed imitation over up to 24 hours implies that imitation is mediated by a stored representation.
Meltzoff & Moore, Early Development and Parenting, 1997; Meltzoff & Moore, Developmental Psychology, 1989

9 Example: Predicting Joint Angles
Explorerobot with a modified walk: it can only walk forward and turn. Record joint commands and joint angles at each frame, then learn a mapping from (joint angles, joint commands) to the next joint angles. Evaluation metrics: RAE (relative absolute error) and RRSE (relative root squared error).
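
A minimal sketch of this pipeline with synthetic data (the robot's actual logs and model are not given in the slides), using a linear least-squares map and Weka-style RAE/RRSE:

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_joints = 500, 4
    angles = rng.uniform(-1.0, 1.0, (T, n_joints))    # recorded joint angles
    commands = rng.uniform(-0.2, 0.2, (T, n_joints))  # recorded joint commands

    X = np.hstack([angles[:-1], commands[:-1]])       # state at frame t
    Y = angles[1:]                                    # joint angles at frame t+1

    W, *_ = np.linalg.lstsq(X, Y, rcond=None)         # least-squares fit
    pred = X @ W

    def rae(y, p):   # Relative Absolute Error (vs. predicting the mean)
        return np.abs(p - y).sum() / np.abs(y.mean(axis=0) - y).sum()

    def rrse(y, p):  # Relative Root Squared Error (vs. predicting the mean)
        return np.sqrt(((p - y) ** 2).sum() / ((y.mean(axis=0) - y) ** 2).sum())

    print(f"RAE={rae(Y, pred):.3f}  RRSE={rrse(Y, pred):.3f}")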

10 Talking Robots
Many talking robots exist, but they are still very primitive. Applications: actors for robot theatre, agents for advertisement, education, and entertainment. Goal: designing inexpensive natural-size humanoid caricature and realistic robot heads (e.g., Dog.com from Japan). Machine learning techniques are used to teach robots behaviors, natural-language dialogs, and facial gestures.

11 Behavior, Dialog and Learning
Words communicate only about 35% of the information transmitted from sender to receiver in human-to-human communication; the remaining information is carried by para-language. Emotions, thoughts, decisions, and intentions of a speaker can often be recognized before they are verbalized. We model robot activity as a mapping from the sensed environment and internal states to behaviors and new internal states (emotions, energy levels, etc.).
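
This mapping can be written as a small function; the sketch below is illustrative only (the internal-state variables and behaviors are assumptions, not the original system's):

    def robot_step(percepts, internal):
        """Map (sensed environment, internal state) -> (behavior, new internal state)."""
        new_internal = dict(internal)
        # e.g. energy decays each cycle; emotion tracks the sensed face
        new_internal["energy"] = internal["energy"] - 0.01
        new_internal["emotion"] = percepts.get("face_emotion", internal["emotion"])
        behavior = "rest" if new_internal["energy"] < 0.2 else "engage_dialog"
        return behavior, new_internal

    behavior, state = robot_step({"face_emotion": "happy"},
                                 {"energy": 1.0, "emotion": "neutral"})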

12 Neck and upper body movement generation

13 The perception learning tasks
Robot vision: Where is a face? (face detection). Who is this person? (face recognition; supervised learning, where the person's name is given in the process). Age and gender of the person. Hand gestures. Emotions expressed as facial gestures (smile, eye movements, etc.). Lip reading to support speech recognition. Body language.
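
As an illustration of the first task, a minimal face-detection sketch using OpenCV's bundled Haar cascade (one standard approach; the slides do not name a library):

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("frame.jpg")            # hypothetical camera frame
    if img is not None:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:           # one bounding box per detected face
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)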

14 The perception learning tasks
Speech recognition: Who is this person? (voice-based speaker recognition; supervised learning, where the person's name is given in the process). Isolated-word recognition for word spotting. Sentence recognition. Sensors: temperature, touch, movement.

15 Learning the perception/behavior mappings
Tracking the human. Full gesticulation as a response to human behavior in dialogs and in dancing/singing. Modification of semi-autonomous behaviors such as breathing, eye blinking, mechanical hand withdrawal, and speech acts in response to a person's behavior. Playing games with humans. Body contact with humans, such as safe gesticulation close to a human and hand shaking.

16 Machine Learning Algorithms
Supervised learning: classification (discrete labels) and regression (real values). Unsupervised learning: finding associations (in features). Reinforcement learning: decision making (robots, chess machines).
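
A toy sketch of the first two paradigms using scikit-learn (illustrative data; reinforcement learning is treated later in these slides):

    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X = [[0, 0], [0, 1], [5, 5], [5, 6]]
    y = [0, 0, 1, 1]                              # discrete labels -> classification

    clf = LogisticRegression().fit(X, y)          # supervised learning
    print(clf.predict([[4, 5]]))                  # -> [1]

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised
    print(km.labels_)                             # cluster assignments, no labels given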

17 Supervised Machine Learning

18 Military/Government Robots
US soldiers in Afghanistan being trained to defuse a landmine using a PackBot.

19 Mars Rovers – Spirit and Opportunity
Space robots: autonomous navigation features with human remote control and oversight.

20 Unsupervised learning

21 Applications: social bookmarking (socialized bookmarks, tags)

22 Probabilistic State Machines to describe emotions
From the Happy state: the input "you are beautiful" produces the reply "Thanks for a compliment" (P = 1) and the robot stays happy. The input "you are blonde!" produces "I am not an idiot" (P = 0.3) or "Do you suggest I am an idiot?" (P = 0.7), moving the robot toward the Unhappy and Ironic states.
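
A minimal sketch of such a machine in Python; the transition targets follow one reading of the slide's diagram:

    import random

    # transitions: state -> input phrase -> list of (probability, reply, next state)
    psm = {
        "happy": {
            "you are beautiful": [(1.0, "Thanks for a compliment", "happy")],
            "you are blonde!": [
                (0.3, "I am not an idiot", "unhappy"),
                (0.7, "Do you suggest I am an idiot?", "ironic"),
            ],
        },
    }

    def step(state, phrase):
        outcomes = psm[state][phrase]
        r, acc = random.random(), 0.0
        for p, reply, nxt in outcomes:      # sample a transition by probability
            acc += p
            if r < acc:
                return reply, nxt
        return outcomes[-1][1], outcomes[-1][2]

    print(step("happy", "you are blonde!"))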

23 Probabilistic Grammars for performances
The performance starts by speaking "Professor Perky" (accompanied by blinking the eyes twice, P = 0.1) or "Doctor Lee", with P = 0.5 each. A follow-up is chosen among "Where?" (P = 0.3), "Who?" (P = 0.5), and "What?" (P = 0.1). "Where?" leads to speaking "In the classroom" while shaking the head or "in some location" while smiling broadly (P = 0.5 each); "What?" leads to "Was singing and dancing" (P = 0.1) or "Was drinking wine" (P = 0.1); and so on.
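
A sketch of the grammar as weighted productions; the exact structure and probabilities follow one reading of the slide, and terminal symbols are plain strings describing speech and gesture:

    import random

    grammar = {
        "PERF": [(0.5, ["NAME", "QUESTION"]),
                 (0.5, ["NAME"])],
        "NAME": [(0.5, ['speak "Professor Perky", blink eyes twice']),
                 (0.5, ['speak "Doctor Lee"'])],
        "QUESTION": [(0.3, ['speak "Where?"', 'speak "In the classroom", shake head']),
                     (0.5, ['speak "Who?"']),
                     (0.2, ['speak "What?"', 'speak "Was singing and dancing"'])],
    }

    def sample(symbol):
        if symbol not in grammar:                # terminal action
            return [symbol]
        r, acc = random.random(), 0.0
        for p, seq in grammar[symbol]:           # pick a weighted production
            acc += p
            if r < acc:
                return [t for s in seq for t in sample(s)]
        return [t for s in grammar[symbol][-1][1] for t in sample(s)]

    print(sample("PERF"))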

24 Example “Age Recognition”
Name (example)   Age (output d)   Smile (a)   Height (b)   Hair color (c)
Ahmed            Kid (0)          a = 3       b = 0        c = 0
Aya              Teenager (1)     a = 2       b = 1        c = 1
Rana             Mid-age (2)      a = 1       b = 2        c = 2
AbdElRahman      Old (3)          a = 0       b = 3        c = 3
Examples of data for learning: four people, given to the system.

25 Example “Age Recognition”
Smile (a):       very often = 3, often = 2, moderately = 1, rarely = 0
Height (b):      very tall = 3, tall = 2, middle = 1, short = 0
Hair color (c):  grey = 3, black = 2, brown = 1, blonde = 0
Encoding of features: values of the multiple-valued variables.
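
Running a standard learner over the four encoded examples illustrates the induction; the scikit-learn decision tree here is an illustration, not the method on the slides (which induce groups on a map by hand):

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[3, 0, 0],   # Ahmed: smiles very often, short, blonde
         [2, 1, 1],   # Aya: smiles often, middle height, brown
         [1, 2, 2],   # Rana: smiles moderately, tall, black
         [0, 3, 3]]   # AbdElRahman: smiles rarely, very tall, grey
    y = [0, 1, 2, 3]  # kid, teenager, mid-age, old

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["smile", "height", "hair"]))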

26 Multi-valued Map for Data
Groups on a multi-valued map over variables a, b (rows) and c (columns) show a simple induction from the data: d = F(a, b, c). (Map figure omitted.)

27 Old people smile rarely
Groups on the map (hair color c running from blonde to grey) show a simple induction from the data: old people smile rarely, middle-age people smile moderately, teenagers smile often, and children smile very often.
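
Since the induced groups depend only on the smile feature a, the map collapses to a single-variable rule, d = F(a) = 3 - a:

    def age_class(smile_a):
        """smile_a: 3 = very often, 2 = often, 1 = moderately, 0 = rarely."""
        return ["old", "mid-age", "teenager", "kid"][smile_a]

    print(age_class(0))   # rarely smiling -> "old"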

28 Another example: teaching movements
(Figure: input variables on the left mapped to output variables on the right.)

29 Example 1 of Rules for Thinning Algorithm
All four rules (Rules 1 through 4) can be illustrated as pixel-neighborhood patterns. (Figure; legend: "new and old one", "old one", "don't care".)

30 Human Affect as Reinforcement to Robot
Interactive robot learning:
Learning by example.
Learning by guidance: future-directed learning cues, anticipatory reward.
Learning by feedback: an additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Hubert).
In our experiment, the affective signal serves as the additional reinforcement.

31 Reinforcement Learning
Supervised (inductive) learning is the simplest and most studied type of learning. But how can an agent learn behaviors when it doesn't have a teacher to tell it how to perform? The agent has a task to perform; it takes some actions in the world; and at some later point it gets feedback telling it how well it did on the task. The agent performs the same task over and over again. This problem is called reinforcement learning: the agent gets positive reinforcement for tasks done well and negative reinforcement for tasks done poorly.

32 Human Affect as Reinforcement to Robot
Affective signal as additional reinforcement: a web cam feeds emotional-expression analysis, where a positive emotion (happy) counts as reward and a negative emotion (sad) as punishment. The emotional expression is thus used in learning as r_human, a social reward coming from the human observer. Note: we interpret happy as positively valenced and sad as negatively valenced; this is a simplified setup that enables us to test our hypothesis.
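
A hedged sketch of this reward channel; classify_emotion is a hypothetical stand-in for the expression-analysis module, which the slides do not specify:

    def classify_emotion(frame):
        """Hypothetical facial-expression classifier: 'happy', 'sad', or other."""
        return "happy"

    def r_human(frame):
        emotion = classify_emotion(frame)
        if emotion == "happy":
            return +1.0        # positive emotion = reward
        if emotion == "sad":
            return -1.0        # negative emotion = punishment
        return 0.0

    # total reinforcement for the learner: r = r_task + r_human(webcam_frame)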

33 Passive vs. Active learning
Passive learning: the agent has a fixed policy and tries to learn the utilities of states by observing the world go by. This is analogous to policy evaluation, and it often serves as a component of, or inspiration for, active learning algorithms.
Active learning: the agent attempts to find an optimal (or at least good) policy by acting in the world. This is analogous to solving the underlying MDP.

34 Formalization
Given: a state space S; a set of actions a1, …, ak; and a reward value at the end of each trial (which may be positive or negative).
Output: a mapping from states to actions.
Example: ALVINN (driving agent). The state is the configuration of the car; the task is to learn a steering action for each state.

35 Policy (Reactive/Closed-Loop Strategy)
(Grid-world figure with terminal rewards +1 and -1.) A policy P is a complete mapping from states to actions.

36 Value Function
The agent knows what state it is in, and it has a number of actions it can perform in each state. Initially, it doesn't know the value of any of the states. If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states: U(oldstate) = reward + U(newstate). The agent learns the utility values of states as it works its way through the state space.
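
A direct sketch of this deterministic update, exactly as written above (no discount factor):

    U = {}   # utility estimates, keyed by state

    def update_utility(old_state, reward, new_state):
        # U(oldstate) = reward + U(newstate); unknown states default to 0
        U[old_state] = reward + U.get(new_state, 0.0)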

37 Q-Learning
Q-learning augments value iteration by maintaining an estimated utility value Q(s,a) for every action at every state. The utility of a state, U(s), is then simply the maximum Q value over all the possible actions at that state. Because it learns the utilities of actions rather than states, Q-learning is model-free.

38 Q-Learning
foreach state s:
    foreach action a:
        Q(s,a) = 0
s = current state
do forever:
    a = select an action
    do action a
    r = reward from doing a
    t = resulting state from doing a
    Q(s,a) = (1 - α) Q(s,a) + α (r + γ max_a' Q(t,a'))
    s = t
The learning coefficient α determines how quickly our estimates are updated; normally α is set to a small positive constant less than 1. γ is the discount factor.
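
A runnable version of the loop above on a toy four-state chain (the chain task itself is an assumption for illustration; reward 1 for reaching the rightmost state):

    import random

    n_states, actions = 4, (-1, +1)
    alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    s = 0
    for _ in range(2000):
        a = (random.choice(actions) if random.random() < eps
             else max(actions, key=lambda b: Q[(s, b)]))   # select an action
        t = min(max(s + a, 0), n_states - 1)               # resulting state
        r = 1.0 if t == n_states - 1 else 0.0              # reward from doing a
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (
            r + gamma * max(Q[(t, b)] for b in actions))
        s = 0 if t == n_states - 1 else t                  # restart at the goal
    print(Q)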

39 Robot in a Room
Grid world with a START cell; reward +1 at [4,3], -1 at [4,2], plus a reward for each step.
Actions: UP, DOWN, LEFT, RIGHT. Choosing UP moves UP 80% of the time, LEFT 10%, and RIGHT 10%.
We want to learn a policy. Can we learn it using (un)supervised learning? Why not? So how do we learn it? Any ideas? Let the robot explore the environment.
What is the strategy for achieving maximum reward? What if the actions were deterministic? A sketch of this action model follows.
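
A sketch of the slide's stochastic action model (wall and grid-boundary handling omitted for brevity):

    import random

    MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
    PERP = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
            "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

    def step(pos, action):
        r = random.random()
        if r < 0.8:
            actual = action              # 80%: move as commanded
        elif r < 0.9:
            actual = PERP[action][0]     # 10%: slip to one side
        else:
            actual = PERP[action][1]     # 10%: slip to the other side
        dx, dy = MOVES[actual]
        return (pos[0] + dx, pos[1] + dy)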

40 Is this a solution?
(Grid figure with terminals +1 and -1.) Only if the actions are deterministic; not in this case, since the actions are stochastic. A solution/policy is a mapping from each state to an action.

41 Reward for each step: -2 (grid figure; terminal rewards +1 and -1)

42 Reward for each step: -0.1 (grid figure; terminal rewards +1 and -1)

43 Reward for each step: -0.04 (grid figure; terminal rewards +1 and -1)

44 Reward for each step: -0.01 (grid figure; terminal rewards +1 and -1)

45 Reward for each step: +0.01 (grid figure; terminal rewards +1 and -1)

46 Example: Recycling Robot

47 Gridworld Example

