A Survey of Autonomous Human Affect Detection Methods for Social Robots Engaged in Natural HRI. Derek McColl, Alexander Hong, Naoaki Hatakeyama, Goldie Nejat, Beno Benhabib. Presentation by Sasha Beltinova
How everyone’s title slides look: a reference to The Office... hopefully at least some people in the class have watched it!
A Survey of Autonomous Human Affect Detection Methods for Social Robots Engaged in Natural HRI. This is essentially the agenda: unpacking the title shows what the presentation will be about.
Unpacking the title: affect = emotional state; social robots = robots that interact with humans using natural human communication methods like speech or body language; HRI = human-robot interaction; natural = non-acted.
In short: a survey of how social robots recognize humans’ emotional state in natural, non-acted interactions with humans.
HRI
What is HRI? Human-Robot Interaction (HRI): the development of robots that engage humans in various scenarios, and the study of human-robot interactions during these scenarios. Requires interaction with a physical robot.
Social HRI: Robots interact using human modalities like speech, body language, and facial expressions. Correctly determining humans’ social cues is necessary for responding appropriately. Affect detection should not interfere with or influence humans’ behavior during HRI, and should be able to interpret and respond to natural (non-acted) displays of emotion. Forms of affective HRI: collaborative, assistive, mimicry, and general or multi-purpose.
Human Affect
Affect: Categorical. A set of discrete states, e.g. happiness, surprise, fear, disgust, anger, and sadness. Most commonly used: Ekman’s Facial Action Coding System (FACS), in which positions of facial action units (AUs) correspond to distinct facial expressions of emotions. Strength: affective states are easily distinguishable. Weakness: affects not included in the model cannot be categorized.
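As a minimal sketch of the categorical, FACS-style approach described on this slide, the snippet below maps a set of detected AUs to one of the six basic emotions. The AU combinations are common textbook approximations chosen for illustration, not the survey’s or FACS’s definitive coding.

```python
# Illustrative sketch (not from the survey): a minimal FACS-style lookup that maps
# detected facial Action Units (AUs) to one of Ekman's six basic emotions.
# The AU combinations below are rough textbook approximations.

PROTOTYPICAL_AUS = {
    "happiness": {6, 12},           # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},        # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},     # brow raisers + upper lid raiser + jaw drop
    "fear":      {1, 2, 4, 5, 20},  # brow raisers/lowerer + upper lid raiser + lip stretcher
    "anger":     {4, 5, 7, 23},     # brow lowerer + lid raiser/tightener + lip tightener
    "disgust":   {9, 15, 16},       # nose wrinkler + lip corner depressor + lower lip depressor
}

def classify_emotion(detected_aus: set[int]) -> str:
    """Return the basic emotion whose prototypical AU set best overlaps the detected AUs."""
    def overlap(emotion: str) -> float:
        prototype = PROTOTYPICAL_AUS[emotion]
        return len(prototype & detected_aus) / len(prototype)
    best = max(PROTOTYPICAL_AUS, key=overlap)
    # Weakness of the categorical model: anything outside the table falls back to "unknown".
    return best if overlap(best) > 0.5 else "unknown"

print(classify_emotion({6, 12, 25}))  # -> "happiness"
```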
Affect: Dimensional. Affective state is a continuous spectrum across several dimensions. Most commonly used: the two-dimensional valence-arousal model. Strength: can account for all possible affective states and their variations. Weakness: sometimes hard to distinguish one affect from another.
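A small sketch of the dimensional view: affect as a point in the valence-arousal plane, optionally snapped to the nearest labeled landmark. The landmark coordinates are rough, commonly cited placements I chose for illustration, not values from the survey.

```python
# Illustrative sketch: affect as a point in the 2D valence-arousal plane.
# Landmark coordinates are approximate placements, not values from the survey.
import math

LANDMARKS = {
    "excited": ( 0.7,  0.7),
    "calm":    ( 0.6, -0.6),
    "sad":     (-0.7, -0.5),
    "angry":   (-0.6,  0.8),
}

def nearest_label(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate to the closest discrete label."""
    return min(LANDMARKS, key=lambda name: math.dist((valence, arousal), LANDMARKS[name]))

# Any point in the plane is representable (strength), but nearby points can map to
# different labels, which is why discrete affects can be hard to tell apart (weakness).
print(nearest_label(0.2, 0.9))   # closer to "excited" than "angry"
```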
Methods of Affect Recognition during HRI. Single input mode: facial affect recognition, body-language-based affect recognition, voice-based affect recognition, and recognition of affective physiological signals. Multimodal input: a combination of the above.
Affect Recognition: Single Mode of Input
Facial Affect Recognition. Facial expressions are a natural way to communicate opinions, intentions, and emotions in face-to-face human interactions. To emulate empathy, social robots need to be able to: interpret and recognize human affective state based on humans’ facial expressions; display (or simulate the display of) their own affect; and express their own intentions to humans. Most systems consist of: 2D onboard cameras to detect facial features, and classifiers to estimate affect.
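A hedged sketch of the generic pipeline this slide describes: 2D facial landmarks turned into geometric features and fed to an SVM classifier. The landmark source is left abstract, and the feature construction is my own illustration of the idea, not any specific system from the survey.

```python
# Minimal sketch of the common pipeline: 2D facial landmarks -> geometric features ->
# SVM affect classifier. A real system would supply landmarks from its own face tracker.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def geometric_features(landmarks: np.ndarray) -> np.ndarray:
    """Turn (N, 2) landmark coordinates into simple pose-normalized features."""
    center = landmarks.mean(axis=0)
    scale = np.linalg.norm(landmarks - center, axis=1).mean()
    return ((landmarks - center) / scale).ravel()

def train_classifier(X: np.ndarray, y: np.ndarray):
    """Fit an RBF-kernel SVM on labelled feature vectors (X) and affect labels (y)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf

if __name__ == "__main__":
    # Synthetic stand-in data: 40 frames of 68 landmarks (136 features) with random labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 136))
    y = rng.choice(["positive", "neutral", "negative"], size=40)
    clf = train_classifier(X, y)
    print(clf.predict(X[:3]))
```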
Facial Affect Recognition, Collaborative HRI Scenario: children play chess with iCat. A 2D webcam detected facial features; iCat determined affect and responded appropriately, and the children reported the perceived quality of the interaction. Robots displaying empathy is important in interactions with humans. Affect was categorized as positive, neutral, or negative valence. The affect model consisted of: the probability of a smile (calculated using SVMs and facial geometric features from 2D and 3D facial landmarks extracted using Seeing Machines’ faceAPI), eye gaze, game state, and game evolution. Children were split into several control groups; those in the adaptive empathic group believed iCat knew how they felt.
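The sketch below shows the kind of fusion this slide lists (smile probability, eye gaze, game state, game evolution) combined into a positive/neutral/negative valence label. The weights and thresholds are invented for illustration; the original study used its own model, not these values.

```python
# Hedged sketch of iCat-style valence estimation. Weights/thresholds are illustrative only.

def estimate_valence(p_smile: float, gaze_on_robot: bool,
                     game_state: float, game_evolution: float) -> str:
    """
    p_smile        : probability of a smile from the facial-feature classifier, in [0, 1]
    gaze_on_robot  : whether the child is looking at the robot
    game_state     : how favourable the chess position is for the child, in [-1, 1]
    game_evolution : how the position changed since the last move, in [-1, 1]
    """
    score = (0.5 * p_smile
             + 0.1 * (1.0 if gaze_on_robot else 0.0)
             + 0.2 * game_state
             + 0.2 * game_evolution)
    if score > 0.4:
        return "positive"
    if score < 0.1:
        return "negative"
    return "neutral"

print(estimate_valence(p_smile=0.8, gaze_on_robot=True, game_state=0.3, game_evolution=0.1))
```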
Facial Affect Recognition, Mimicry HRI Scenario: the AIBO robot dog mimics human facial expressions. An external camera captured the person’s facial expression, which was recognized from streaming video using a time-series representation of the tracked facial actions. AIBO mimicked the same expression with its LED display. The facial expression detection method had 80% accuracy, and AIBO was able to mimic the human expression with only a slight delay.
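A sketch of a mimicry loop in the spirit of this slide: recognize the expression from a video stream and replay it on the robot’s display, with a short majority vote to stabilize the output. The `read_frame`, `recognize_expression`, and `set_led_expression` callables are hypothetical placeholders for the camera, recognizer, and robot interfaces, not the study’s actual APIs.

```python
# Hedged sketch of an expression-mimicry loop; the three callables are hypothetical interfaces.
from collections import Counter, deque

def mimicry_loop(read_frame, recognize_expression, set_led_expression, window: int = 5):
    """Majority-vote over the last few frames so the display doesn't flicker between labels."""
    recent = deque(maxlen=window)
    while True:
        frame = read_frame()
        if frame is None:                       # stream ended
            break
        recent.append(recognize_expression(frame))
        label, _ = Counter(recent).most_common(1)[0]
        set_led_expression(label)               # mimic with a slight (window-length) delay
```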
Body Language Based Affect Recognition. Body language like arm positions or head posture can communicate affective state in social interactions. Identifying body language displays can help social robots identify and influence human behavior in a variety of HRI tasks. Many systems include: Kinect sensors to capture and model a person’s trunk and arm poses during interaction, and interpretation and classification systems for body poses and gestures.
Body Language Based Affect Recognition, Assistive HRI: Brian 2.1 provided social assistance to the elderly in a long-term care facility during meal eating to encourage engagement in the eating activity. A Kinect sensor mounted on the robot’s chest identified body language displays; features extracted included head, trunk, and arm positions to classify valence and arousal. The robot prompted humans to eat or drink items in one-on-one interaction, and the Kinect sensor detected body language during non-meal actions (manipulating utensils or the cup, etc.). Valence and arousal were calculated from 3D data; 2D video data was used for human baseline coding. 77.9% recognition rate for valence and 93.6% recognition rate for arousal.
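A hedged sketch of turning Kinect-style 3D joints into the kind of head/trunk/arm features mentioned on this slide. The joint names follow common skeleton conventions and the feature set is my own illustration, not the study’s exact feature definitions.

```python
# Illustrative body-language features from 3D joint positions (Kinect-style skeleton).
import numpy as np

def body_features(joints: dict[str, np.ndarray]) -> dict[str, float]:
    """joints maps names like 'head', 'neck', 'spine_base', 'hand_left' to 3D positions (metres)."""
    trunk = joints["neck"] - joints["spine_base"]
    trunk_lean = float(np.degrees(np.arctan2(trunk[2], trunk[1])))     # forward/backward lean
    head_height = float(joints["head"][1] - joints["neck"][1])         # drops when head is lowered
    arm_openness = float(np.linalg.norm(joints["hand_left"] - joints["hand_right"]))
    return {"trunk_lean_deg": trunk_lean,
            "head_height_m": head_height,
            "arm_openness_m": arm_openness}

# These features would then feed separate valence and arousal classifiers,
# e.g. the SVM pipeline sketched earlier for facial affect recognition.
```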
Body Language Based Affect Recognition https://www.youtube.com/watch?v=VIEMVp-dW9s
Voice Based Affect Recognition. People often communicate affect with their voices, and identifying vocal affect is important for effective bi-directional communication with humans. Systems usually include: onboard microphones to detect emotions from the human voice, and software to identify emotional state. Vocal intonation features such as voice intensity and pitch, speech rate, and word articulation have been linked to specific affective states.
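A sketch of extracting the vocal-intonation features named on this slide (intensity, pitch, and a rough speech-rate proxy) from an audio clip, here using librosa. The choice of library and the way the features are aggregated are my own assumptions for illustration, not a specific system’s design.

```python
# Hedged sketch: intensity, pitch, and an onset-based speech-rate proxy with librosa.
import numpy as np
import librosa

def voice_features(path: str) -> dict[str, float]:
    y, sr = librosa.load(path, sr=16000)
    rms = librosa.feature.rms(y=y)[0]                        # frame-level intensity
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    onsets = librosa.onset.onset_detect(y=y, sr=sr)          # crude speech-rate proxy
    duration = len(y) / sr
    return {
        "intensity_mean": float(rms.mean()),
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
        "onsets_per_sec": float(len(onsets) / duration),
    }

# A classifier (e.g. an SVM or HMM over these features) would then map the feature
# vector to an arousal/valence estimate or a discrete affect label.
```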
Voice Based Affect Recognition, Collaborative HRI: a Cyton robot arm picked up objects based on user affect. The voice signal was recorded from a microphone headset, and the robot arm, guided by the human’s voice intonation, picked the desired objects. Unfamiliar users found it easy to use. High arousal is usually associated with cognitive effort and attentional processes; high unpredictability and surprise are usually related to attentional shifts. High arousal -> the robot should be attentive. High predictability -> the robot is doing well. Low predictability -> the human is uncertain and the robot should hesitate or switch targets.
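The sketch below restates the decision rules from this slide as code: estimated vocal arousal and predictability are mapped to a robot-arm behaviour. The threshold values are invented for illustration only.

```python
# Hedged sketch of the arousal/predictability decision rules; thresholds are illustrative.

def arm_behaviour(arousal: float, predictability: float) -> str:
    """arousal and predictability are assumed to be normalized to [0, 1]."""
    if arousal > 0.7:
        return "attend"       # high arousal: user is exerting effort, pay attention
    if predictability > 0.7:
        return "continue"     # high predictability: current target seems right, keep going
    if predictability < 0.3:
        return "hesitate"     # low predictability: user is uncertain, pause or switch targets
    return "continue"

print(arm_behaviour(arousal=0.4, predictability=0.2))   # -> "hesitate"
```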
Recognition of Affective Physiological Signals. Physiological responses like heart rate, skin conductance, breathing rate, or pupil dilation can indicate emotional state. Systems usually include wearable technology such as EKG and skin-conductance sensors.
Recognition of Affective Physiological Signals. Humans watched an industrial robot arm perform tasks: the robot arm planned either a safe path or a standard path and executed it at various speeds, while the person’s arousal was measured. Estimated arousal levels strongly correlated with self-reported anxiety levels. The safe path planner is similar to the potential field method, with the addition of a danger criterion, comprising factors that affect the impact force during a collision between the robot and the human, which is minimized along the path. The pick location was specified to the right of and away from the subject, and the place location was directly in front of and close to the subject. Subjects reported lower levels of anxiety and surprise, and higher levels of calm, for the safely planned paths; the anxiety response for the fast pick-and-place trajectory was significantly higher (α = 0.05, Student’s t-test) for the potential field (PF) planned path.
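A minimal sketch of estimating arousal from a physiological signal, in the spirit of this study: compare skin-conductance samples against a resting baseline. The normalization scheme is my own illustration; the original work used its own estimator.

```python
# Hedged sketch: arousal index from skin conductance relative to a resting baseline.
import numpy as np

def arousal_from_skin_conductance(samples_uS: np.ndarray, baseline_uS: np.ndarray) -> float:
    """Return an arousal index in [0, 1]: how far the current window sits above rest."""
    rest_mean, rest_std = baseline_uS.mean(), baseline_uS.std() + 1e-6
    z = (samples_uS.mean() - rest_mean) / rest_std        # standardized deviation from rest
    return float(np.clip(z / 3.0, 0.0, 1.0))              # saturate at ~3 standard deviations

baseline = np.random.default_rng(0).normal(2.0, 0.05, 300)   # synthetic resting microsiemens
during_fast_motion = baseline + 0.4                           # synthetic elevated response
print(arousal_from_skin_conductance(during_fast_motion, baseline))
```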
Affect Recognition: Multimodal Input
Multimodal Affect Recognition. Benefits: alternative ways to determine affect when one modality fails; complementary and diverse information can increase system robustness and performance. Challenges: multimodal data is more difficult to acquire and process.
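A sketch of decision-level fusion as motivated on this slide: per-modality (valence, arousal) estimates are combined with a weighted average, and any modality that failed is simply skipped, so the system degrades gracefully. The weights are illustrative, not from the survey.

```python
# Hedged sketch of decision-level multimodal fusion with graceful modality dropout.
from typing import Optional

def fuse(estimates: dict[str, Optional[tuple[float, float]]],
         weights: dict[str, float]) -> Optional[tuple[float, float]]:
    """Weighted average of available per-modality (valence, arousal) estimates."""
    available = {m: e for m, e in estimates.items() if e is not None}
    if not available:
        return None                                    # every modality failed
    total = sum(weights[m] for m in available)
    valence = sum(weights[m] * available[m][0] for m in available) / total
    arousal = sum(weights[m] * available[m][1] for m in available) / total
    return valence, arousal

print(fuse({"face": (0.6, 0.2), "voice": None, "body": (0.3, 0.5)},
           {"face": 0.5, "voice": 0.3, "body": 0.2}))
```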
Discussion
Current State of the World. Autonomous detection of the emotional state of humans is important for good HRI. Forms of affective interaction include: collaborative HRI, assistive HRI, mimicry HRI, and multi-purpose HRI. The categorical model of human affect is most commonly used, with classifiers like SVMs, HMMs, and NNs. Facial expression is the most popular input mode for detecting affect, typically done with a 2D camera.
Challenges. Current systems can identify only a small number of affective states, while humans experience and display a large variety of emotions during social interactions. Facial affect recognition, the most commonly used modality, relies on a 2D camera in most cases and performs well only when the human’s face is directly in front of the camera. Many studies were carried out in short, game-like interaction scenarios; affect recognition techniques should be tested in long-term studies across a wide variety of tasks and scenarios.
Don’t shave it for later