Published by Caren Morton. Modified over 6 years ago.
ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication
Ali Mollahosseini, Gabriel Graitzer, Eric Borts, Stephen Conyers, Richard M. Voyles, Ronald Cole, and Mohammad H. Mahoor
Presented by Jake Kwon
Categories of Robotic Faces
1) Mechatronic faces, e.g., Kismet by MIT (2000)
2) Android faces, e.g., Albert Hubo (2010)
Categories of Robotic Faces
3) Onscreen faces, e.g., Grace (2004), Baxter (2012)
Pro: highly flexible and low cost, since the rendered appearance can be changed at will
Con: no physical embodiment, and the appearance can still look uncanny
4) Light-projected physical avatars
Pro: equally flexible and low cost; the facial animation is projected onto a physical robot mask
Con: aside from the uncanny appearance shared by all robotic faces so far, this approach holds up well
Light-Projected Physical Avatar
3D-printed facial mask, modeled in Autodesk Maya
Wig for aesthetics
Portable projector
Projection Calibration
Direct projection of the animation onto the mask appears distorted: the projection is 2D, but the projected surface is 3D. Calibration uses a checkerboard pattern on the screen to define a piecewise homography mapping between corresponding rectangles on the screen and on the mask.
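The per-rectangle mapping above can be sketched as a homography fit from point correspondences. A minimal Python/NumPy sketch of the standard direct linear transform (the function names are illustrative, not from the paper), assuming four screen-to-mask corner correspondences per checkerboard rectangle:

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H mapping src[i] -> dst[i]
    (4 or more point pairs) via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)      # null-space vector of A
    return H / H[2, 2]            # normalize so H[2,2] == 1

def apply_homography(H, pt):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Example: a screen rectangle stretched 2x horizontally onto the mask.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 1), (0, 1)]
H = fit_homography(src, dst)
```

In the calibration described above, one such H would be fitted per checkerboard rectangle, giving the piecewise mapping from the flat animation to the curved mask surface.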
Speech
Uses the Bavieca speech recognizer (an open-source speech recognition toolkit written in C++)
Visually similar phonemes are grouped together, e.g., buy, pie, my
To animate the lips, ExpressionBot groups visually similar phonemes into 20 unique viseme classes, a technique used rigorously by animation studios.
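The grouping idea can be sketched as a simple lookup table. The class names and members below are illustrative examples, not the paper's actual 20-class table:

```python
# Illustrative viseme grouping: phonemes that look alike on the lips share
# one mouth shape, so "buy", "pie", and "my" all animate identically.
VISEME_CLASSES = {
    "bilabial_closed": ["b", "p", "m"],  # lips pressed together
    "labiodental": ["f", "v"],           # lower lip against upper teeth
    "rounded": ["o", "w"],               # puckered lips
}

# Invert to a phoneme -> viseme lookup used at animation time.
PHONEME_TO_VISEME = {p: v for v, ps in VISEME_CLASSES.items() for p in ps}
```

With this table, the recognizer's phoneme stream maps directly to mouth shapes, so the animation only needs one model per viseme class rather than one per phoneme.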
Emotion Lip Blending Six basic expressions
Joy, fear, sadness, anger, disgust, and surprise
Examples:
Cheeks and lip corners are raised to express joy
Inner eyebrows are raised, and eyebrows and lip corners are lowered to express sadness
Conflicts:
Surprise opens the mouth fully, which conflicts with phonemes like /b/, /f/, and /v/ that are produced with the lips closed or nearly closed. Combining joy with puckered-mouth visemes such as /o/ likewise yields visual speech and expressions that look unnatural or creepy. To overcome this, the authors designed a table that provides a phoneme weight factor and a maximum emotion weight for every phoneme-emotion combination; these values are adjusted empirically for each combination.
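The phoneme-emotion table can be sketched as a lookup that caps emotion intensity per combination. The entries and values below are invented for illustration; the paper tunes its table empirically:

```python
# Hypothetical (viseme, emotion) -> limits table. The paper adjusts a
# phoneme weight and a maximum emotion weight empirically per combination;
# these particular numbers are made up for the sketch.
BLEND_LIMITS = {
    ("bilabial_closed", "surprise"): {"phoneme_weight": 1.0, "max_emotion": 0.2},
    ("rounded", "joy"):              {"phoneme_weight": 1.0, "max_emotion": 0.3},
}

def capped_intensity(viseme: str, emotion: str, requested: float) -> float:
    """Clamp the requested emotion intensity so it never fights the lips."""
    limits = BLEND_LIMITS.get((viseme, emotion), {"max_emotion": 1.0})
    return min(requested, limits["max_emotion"])
```

So while the robot articulates /b/ (lips closed), a full-intensity surprise request would be clamped down, and combinations with no known conflict pass through unchanged.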
Emotion Lip Blending
To blend the expressions with the lip movement, the animation generates facial expressions from the current phoneme and the emotion morph targets:

F = Fc + Σj λj (Fj^max − F0)

Fc = the current phoneme model
Fj^max = the j-th expression model at maximum intensity
F0 = the neutral model
λj = the intensity of the j-th expression model, λj ∈ [0, 1]

One of the biggest challenges researchers face is natural emotional communication during verbal conversation. For instance, when a robot is supposed to show happiness while saying the phoneme /o/ (as in "hello"), the result can look creepy: happiness pulls the mouth into a smile while /o/ rounds it. The formula weighs what the expression should look like against the current lip movement. Such care is needed because, for example, surprise opens the mouth fully, which conflicts with phonemes like /b/, /f/, and /v/ that are produced with the lips closed or nearly closed, and combining joy with puckered-mouth visemes such as /o/ produces visual speech and expressions that are perceived as abnormal or creepy. To overcome this, the authors designed a table that provides a phoneme weight factor and a maximum emotion weight for every phoneme-emotion combination; these values are adjusted empirically for each combination.
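Under a morph-target reading of the formula, each face is a vector of vertex positions, and each emotion contributes its offset from neutral scaled by its intensity. A minimal NumPy sketch (variable names are mine, not the paper's):

```python
import numpy as np

def blend_face(F_c, F_0, expressions):
    """F = F_c + sum_j lambda_j * (F_j_max - F_0).

    F_c: current phoneme (viseme) model, F_0: neutral model,
    expressions: list of (F_j_max, lambda_j) pairs, lambda_j in [0, 1].
    Each emotion adds its displacement from neutral, scaled by intensity,
    on top of the lip shape required by the current phoneme.
    """
    F = np.asarray(F_c, dtype=float).copy()
    F_0 = np.asarray(F_0, dtype=float)
    for F_j_max, lam in expressions:
        F += lam * (np.asarray(F_j_max, dtype=float) - F_0)
    return F

# Toy 3-coordinate "faces": phoneme moves coordinate 0, joy moves coordinate 1.
F_0 = np.zeros(3)
F_c = np.array([1.0, 0.0, 0.0])
joy_max = np.array([0.0, 2.0, 0.0])
blended = blend_face(F_c, F_0, [(joy_max, 0.5)])
```

Clamping each lambda_j with the phoneme-emotion weight table before blending is what keeps conflicting combinations (e.g., surprise during /b/) from producing an unnatural face.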
Experiments
Emotion Identification Experiment
Participants were asked to classify the expression of 3D and 2D animations
Each expression was shown for 5 seconds
Participants identified anger much better in 3D than in 2D
Speech Realism Experiment
Participants were presented with two speech renderings: basic, and proposed (grouped phonemes)
They ranked realism on a scale of 0 to 5
Goal: measure the effectiveness of grouping visually similar phonemes such as buy and pie
Eye Gaze Experiment Five subjects were seated around the head
Subjects were asked to say whether the head was gazing at them
Shifting eye gaze only: Screen agent 50%, Physical agent 88%
Shifting eye gaze + random head movement: Screen agent 42%, Physical agent 77%
Conclusion
Relatively low cost ($1,500), due to the 3D-printed mask and open-source speech recognition toolkit
Better emotion expression
More realistic speech
Better eye-gaze identification
Competition $249 on Kickstarter
What Did I Learn