Learning in Robots

Gesture Recognition: How are actions perceived? How is information parsed?
Imitation: At what level of granularity is the demonstration copied? Should the robot copy the intention, the goal, or the dynamics of the movement?
Motor Learning: How is information transferred across modalities (visuo-motor, audio-motor)?

Biological Inspiration
Before building any capability into robots, we might first want to understand how the equivalent capability works in humans and other animals.

Imitation Capabilities in Animals
Which species exhibit imitation is still a major area of discussion and debate.
"True" imitation is distinguished from copying (flocking, schooling, following), stimulus enhancement, contagion, and emulation.

Imitation Capabilities in Animals
"True" imitation: the ability to learn new actions that are not part of the usual repertoire.
Long considered the preserve of humans only, and possibly the great apes.
Whiten & Ham, Advances in the Study of Behaviour, 1992; Savage & Rumbaugh, Child Development, 1993

Imitation Capabilities in Animals
Complex imitation capabilities are also found in dolphins and parrots: a large repertoire of imitated behaviors, demonstrating flexibility and generalization in different contexts.
Moore, Behaviour, 1999; Herman, Imitation in Animals & Artifacts, MIT Press, 2002

Developmental Stages of Imitation
Innate facial imitation (newborns to 3 months): tongue and lip protrusion, mouth opening, head movements, cheek and brow motion, eye blinking.
Delayed imitation of up to 24 hours implies that imitation is mediated by a stored representation.
Meltzoff & Moore, Early Development and Parenting, 1997; Meltzoff & Moore, Developmental Psychology, 1989

Example: Predicting Joint Angles
ExploreRobot: a modified Walk behavior; the robot only walks forward and turns.
Record joint commands and joint angles at each frame.
Learn a mapping from (joint angles, joint commands) to the next joint angles.
Evaluate with RAE (relative absolute error) and RRSE (relative root squared error).
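A minimal sketch of this mapping task as supervised regression, assuming the logged frames are available as NumPy arrays (the file names and the choice of a linear scikit-learn model are illustrative; the slides do not name a learner):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical logs, one row per frame.
    angles = np.load("angles.npy")      # joint angles,   shape (n_frames, n_joints)
    commands = np.load("commands.npy")  # joint commands, shape (n_frames, n_joints)

    # Inputs: (joint angles, joint commands) at frame t; targets: angles at frame t+1.
    X = np.hstack([angles[:-1], commands[:-1]])
    y = angles[1:]

    model = LinearRegression().fit(X, y)
    pred = model.predict(X)

    # RAE and RRSE normalize the model's error by the error of a mean predictor.
    rae = np.abs(y - pred).sum() / np.abs(y - y.mean(axis=0)).sum()
    rrse = np.sqrt(((y - pred) ** 2).sum() / ((y - y.mean(axis=0)) ** 2).sum())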

Talking Robots
Many talking robots exist, but they are still very primitive.
Uses: actors for robot theatre, agents for advertisement, education, and entertainment.
Designing inexpensive natural-size humanoid caricature and realistic robot heads (e.g., Dog.com from Japan).
Machine learning techniques are used to teach robots behaviors, natural-language dialogs, and facial gestures.

Behavior, Dialog and Learning
Words communicate only about 35% of the information transmitted from sender to receiver in human-to-human communication; the rest is carried by para-language.
Emotions, thoughts, decisions, and intentions of a speaker can often be recognized before they are verbalized.
We model robot activity as a mapping from the sensed environment and internal states to behaviors and new internal states (emotions, energy levels, etc.); a sketch follows.
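A minimal sketch of that mapping, where all the type names and the toy rules are illustrative assumptions:

    from typing import NamedTuple

    class InternalState(NamedTuple):
        emotion: str    # e.g. "happy", "unhappy", "ironic"
        energy: float   # energy level in [0, 1]

    def robot_step(percept: dict, state: InternalState) -> "tuple[str, InternalState]":
        """Map (sensed environment, internal state) -> (behavior, new internal state)."""
        if state.energy < 0.2:
            return "rest", state._replace(energy=min(1.0, state.energy + 0.1))
        if percept.get("face_detected"):
            return "greet", state._replace(emotion="happy", energy=state.energy - 0.05)
        return "idle", state._replace(energy=state.energy - 0.01)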

Neck and upper body movement generation

The Perception Learning Tasks
Robot vision:
- Where is a face? (face detection)
- Who is this person? (face recognition; supervised learning, the person's name is given in the process)
- Age and gender of the person
- Hand gestures
- Emotions expressed as facial gestures (smile, eye movements, etc.)
- Lip reading for speech recognition
- Body language

The Perception Learning Tasks
Speech recognition:
- Who is this person? (voice-based speaker recognition; supervised learning, the person's name is given in the process)
- Isolated-word recognition for word spotting
- Sentence recognition
Other sensors: temperature, touch, movement.

Learning the Perception/Behavior Mappings
- Tracking the human
- Full gesticulation as a response to human behavior in dialogs and dancing/singing
- Modification of semi-autonomous behaviors such as breathing, eye blinking, mechanical hand withdrawals, and speech acts in response to a person's behavior
- Playing games with humans
- Body contact with humans, such as safe gesticulation close to a human and hand shaking

Machine Learning Algorithms
- Supervised learning: classification (discrete labels), regression (real values)
- Unsupervised learning: finding associations (in features)
- Reinforcement learning: decision making (robot, chess machine)

Supervised Machine Learning

Military/Government Robots
U.S. soldiers in Afghanistan being trained to defuse a landmine using a PackBot.

Space Robots: Mars Rovers Spirit and Opportunity
Autonomous navigation features with human remote control and oversight.

Unsupervised learning

Applications
Social bookmarking: socialized bookmarks and tags.

Probabilistic State Machines to Describe Emotions
Example transitions (heard utterance / spoken response, with transition probability):
From the Happy state, "you are beautiful" / "Thanks for the compliment" (P = 1, stay Happy)
From the Happy state, "you are blonde!" / "I am not an idiot" (P = 0.3, go to the Unhappy state)
From the Happy state, "you are blonde!" / "Do you suggest I am an idiot?" (P = 0.7, go to the Ironic state)
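A minimal sketch of such a probabilistic emotion machine in Python (the states, responses, and probabilities come from the slide; the table-driven structure is an assumption):

    import random

    # (state, heard utterance) -> list of (probability, response, next state)
    transitions = {
        ("happy", "you are beautiful"): [
            (1.0, "Thanks for the compliment", "happy"),
        ],
        ("happy", "you are blonde!"): [
            (0.3, "I am not an idiot", "unhappy"),
            (0.7, "Do you suggest I am an idiot?", "ironic"),
        ],
    }

    def step(state, utterance):
        # Sample a (response, next state) pair according to the probabilities.
        options = transitions.get((state, utterance), [(1.0, "...", state)])
        roll, acc = random.random(), 0.0
        for p, response, next_state in options:
            acc += p
            if roll <= acc:
                return response, next_state
        return options[-1][1], options[-1][2]

    response, state = step("happy", "you are blonde!")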

Probabilistic Grammars for Performances
(Figure: a probabilistic grammar graph for generating performances. Nodes such as Who?, Where?, and What? expand with given probabilities into speech acts paired with gestures, e.g. speak "Professor Perky" while blinking eyes twice (P = 0.1), speak "Doctor Lee", speak "In the classroom" while shaking head (P = 0.5), speak "in some location" while smiling broadly (P = 0.5), speak "Was singing and dancing", speak "Was drinking wine", ...)
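A sketch of sampling one performance from such a grammar (since the slide's full production set is elided, the weights and the top-level Who/Where/What ordering here are illustrative assumptions):

    import random

    # nonterminal -> list of (probability, right-hand side of the production)
    grammar = {
        "S":     [(1.0, ["WHO", "WHERE", "WHAT"])],
        "WHO":   [(0.5, ['speak "Professor Perky", blink eyes twice']),
                  (0.5, ['speak "Doctor Lee"'])],
        "WHERE": [(0.5, ['speak "In the classroom", shake head']),
                  (0.5, ['speak "in some location", smile broadly'])],
        "WHAT":  [(0.5, ['speak "Was singing and dancing"']),
                  (0.5, ['speak "Was drinking wine"'])],
    }

    def expand(symbol):
        if symbol not in grammar:            # terminal: a speech/gesture action
            return [symbol]
        roll, acc = random.random(), 0.0
        for p, rhs in grammar[symbol]:
            acc += p
            if roll <= acc:
                return [act for s in rhs for act in expand(s)]
        return []

    performance = expand("S")   # a list of speech/gesture actions to perform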

Example: "Age Recognition"
Examples of data for learning: four people, given to the system.

Name          Age d (output)   Smile a   Height b   Hair Color c
Ahmed         Kid (0)          3         0          0
Aya           Teenager (1)     2         1          1
Rana          Mid-age (2)      1         2          2
AbdElRahman   Old (3)          0         3          3

Example: "Age Recognition"
Encoding of features (values of the multiple-valued variables):

Smile (a):      very often = 3, often = 2, moderately = 1, rarely = 0
Height (b):     very tall = 3, tall = 2, middle = 1, short = 0
Hair color (c): grey = 3, black = 2, brown = 1, blonde = 0

Multi-Valued Map for the Data
(Figure: a multi-valued map of d = F(a, b, c), with rows indexed by ab and columns by c; groups of cells show a simple induction from the data.)

Groups in the map show a simple induction from the data, running from blonde hair to grey hair:
Children smile very often.
Teenagers smile often.
Middle-age people smile moderately.
Old people smile rarely.
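This induction can also be reproduced mechanically. A sketch using the encoded table above (the choice of a decision-tree learner is an assumption; the slides derive the groups by hand on the map):

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Rows are (smile a, height b, hair color c); labels are age d.
    X = [[3, 0, 0],   # Ahmed: kid
         [2, 1, 1],   # Aya: teenager
         [1, 2, 2],   # Rana: mid-age
         [0, 3, 3]]   # AbdElRahman: old
    y = [0, 1, 2, 3]

    tree = DecisionTreeClassifier().fit(X, y)
    print(export_text(tree, feature_names=["smile", "height", "hair"]))
    # The learned splits recover the same pattern as the map groups:
    # the less a person smiles, the older the predicted age.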

Another Example: Teaching Movements
(Figure: a map from input variables to output variables.)

Example of Rules for a Thinning Algorithm
(Figure: four rules, Rule 1 to Rule 4, illustrated as pixel templates; the legend distinguishes "new and old" pixels, "old only" pixels, and "don't care" cells.)

Human Affect as Reinforcement to the Robot
Interactive robot learning:
- Learning by example
- Learning by guidance: future-directed learning cues, anticipatory reward
- Learning by feedback: an additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Huber)
In our experiment: the affective signal serves as additional reinforcement.

Reinforcement Learning
Supervised (inductive) learning is the simplest and most studied type of learning. But how can an agent learn behaviors when it doesn't have a teacher to tell it how to perform?
- The agent has a task to perform.
- It takes some actions in the world.
- At some later point, it gets feedback on how well it performed the task.
- The agent performs the same task over and over again.
This problem is called reinforcement learning:
- The agent gets positive reinforcement for tasks done well.
- The agent gets negative reinforcement for tasks done poorly.

Human Affect as Reinforcement to the Robot
Affective signal as additional reinforcement: a web cam feeds an emotional expression analysis; positive emotion (happy) = reward, negative emotion (sad) = punishment.
The emotional expression is thus used in learning as r_human, a social reward coming from the human observer.
Note: we interpret happy as positively valenced and sad as negatively valenced. This is a simplified setup that enables us to test our hypothesis.
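A one-function sketch of folding the social reward into the reinforcement signal (the additive combination and the weight are assumptions, not specified on the slide):

    def combined_reward(r_env, expression, w_human=1.0):
        # Social reward decoded from the observed facial expression.
        r_human = {"happy": 1.0, "sad": -1.0}.get(expression, 0.0)
        # Task reward from the environment plus weighted human reward.
        return r_env + w_human * r_human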

Passive vs. Active Learning
Passive learning: the agent has a fixed policy and tries to learn the utilities of states by observing the world go by. This is analogous to policy evaluation, and it often serves as a component of, or inspiration for, active learning algorithms.
Active learning: the agent attempts to find an optimal (or at least good) policy by acting in the world. This is analogous to solving the underlying MDP.

Formalization
Given:
- a state space S
- a set of actions a1, ..., ak
- a reward value at the end of each trial (may be positive or negative)
Output: a mapping from states to actions.
Example: ALVINN, a driving agent; the state is the configuration of the car, and the task is to learn a steering action for each state.

Policy (Reactive/Closed-Loop Strategy)
(Figure: a 4x3 grid world with terminal rewards +1 and -1 and an action arrow in each cell.)
A policy P is a complete mapping from states to actions.

Value Function
The agent knows what state it is in, and in each state it has a number of actions it can perform. Initially, it doesn't know the value of any of the states.
If the outcome of performing an action at a state is deterministic, the agent can update the utility value U() of states:
U(oldstate) = reward + U(newstate)
The agent learns the utility values of states as it works its way through the state space.
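A minimal sketch of this deterministic update (the dictionary representation and the zero initialization are assumptions):

    from collections import defaultdict

    U = defaultdict(float)   # utility estimate per state, initially unknown (0)

    def update_utility(old_state, reward, new_state):
        # Deterministic outcome: back up the reward plus the successor's utility.
        U[old_state] = reward + U[new_state]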

Q-Learning
Q-learning augments value iteration by maintaining an estimated utility value Q(s, a) for every action at every state.
The utility of a state, U(s), is then simply the maximum Q value over all the possible actions at that state: U(s) = max_a Q(s, a).
Q-learning learns the utilities of actions (not states), which makes it model-free.

Q-Learning
for each state s, for each action a: Q(s, a) = 0
s = current state
do forever:
    a = select an action
    do action a
    r = reward from doing a
    t = resulting state from doing a
    Q(s, a) = (1 - alpha) * Q(s, a) + alpha * (r + gamma * max over a' of Q(t, a'))
    s = t
The learning coefficient alpha determines how quickly our estimates are updated; normally alpha is set to a small positive constant less than 1. Gamma is the discount factor for future rewards.
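A runnable sketch of the same loop in Python, made episodic rather than "do forever" and using epsilon-greedy action selection; the env interface (reset/actions/step) is an assumption:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                 # Q[(state, action)] = 0 initially
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                if random.random() < epsilon:  # explore
                    a = random.choice(env.actions(s))
                else:                          # exploit current estimates
                    a = max(env.actions(s), key=lambda act: Q[(s, act)])
                t, r, done = env.step(a)       # resulting state and reward
                best_next = 0.0 if done else max(Q[(t, a2)] for a2 in env.actions(t))
                Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
                s = t
        return Q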

Robot in a Room
A grid world with reward +1 at [4,3] and -1 at [4,2]; the robot begins at START.
Actions: UP, DOWN, LEFT, RIGHT. Actions are stochastic: UP moves up 80% of the time, left 10% of the time, and right 10% of the time.
We want to learn a policy. Can we learn it using (un)supervised learning? Why not?
So how do we learn it? Let the robot explore the environment (a coded sketch of this environment follows).
With an additional reward of -0.04 for each step, what is the strategy that achieves maximum reward? What if the actions were deterministic?
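A sketch of this room as an environment usable with the q_learning function above. The slide gives the rewards, actions, and transition noise; the 4x3 layout with START at (1,1) and a wall at (2,2) follows the standard Russell & Norvig grid and is an assumption here:

    import random

    class GridWorld:
        MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
        # Perpendicular slips for each intended move (10% each side).
        SIDES = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
                 "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

        def reset(self):
            self.s = (1, 1)                    # START
            return self.s

        def actions(self, state):
            return ["UP", "DOWN", "LEFT", "RIGHT"]

        def step(self, action):
            roll = random.random()
            if roll < 0.8:                     # 80%: intended direction
                move = action
            else:                              # 10% + 10%: slip sideways
                move = self.SIDES[action][0 if roll < 0.9 else 1]
            dx, dy = self.MOVES[move]
            x = min(max(self.s[0] + dx, 1), 4) # stay inside the 4x3 grid
            y = min(max(self.s[1] + dy, 1), 3)
            if (x, y) != (2, 2):               # (2,2) is a wall: bounce back
                self.s = (x, y)
            if self.s == (4, 3):
                return self.s, +1.0, True      # goal
            if self.s == (4, 2):
                return self.s, -1.0, True      # penalty state
            return self.s, -0.04, False        # step cost

    Q = q_learning(GridWorld())                # learn a policy by exploration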

Is This a Solution?
(Figure: a fixed path through the grid from START to the +1 state.)
It is a solution only if the actions are deterministic; not in this case, since the actions are stochastic.
A solution/policy must be a mapping from each state to an action.

Reward for each step: -2
(Figure: the resulting optimal policy on the grid.)

Reward for each step: -0.1
(Figure: the resulting optimal policy on the grid.)

Reward for each step: -0.04
(Figure: the resulting optimal policy on the grid.)

Reward for each step: -0.01
(Figure: the resulting optimal policy on the grid.)

Reward for each step: +0.01
(Figure: the resulting optimal policy on the grid.)

Example: Recycling Robot

Gridworld Example