
1 Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective
Xiaohan Ma, Binh H. Le, and Zhigang Deng
Department of Computer Science, University of Houston

2 Motivation
Avatars have been increasingly used in human-computer interfaces
– Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.
Human-like avatar gestures influence human perception significantly
– Facial expressions
– Hand gestures
– Lip movements
– Head movements: one of the crucial visual cues to facilitate engaging social interaction and communication

3 How do talking head movements affect perception?

4 Our Quantitative Perspective
Uncover how talking avatar head movements affect human perception
– User-rated naturalness of head animations
– Joint features extracted from head animations (with audio): acoustic speech features and head motion patterns
– Quantitatively analyze the association between the extracted joint features and the user ratings
[Pipeline diagram: talking avatar head animations → feature extraction → joint features; user evaluation → perception (ratings); analysis of the association]

5 Data Acquisition and Processing
Acquisition of the audio-head motion dataset
– Head motion and speech were recorded simultaneously
– Head motion: optical motion capture system (120 Hz)
– Speech: microphone (48 kHz)
Processing of the captured audio-head motion dataset
– Head motion: 3 Euler rotation angles (X-, Y-, and Z-axis rotations) per frame
– Speech: pitch and RMS energy
– Head and speech data were aligned to the same frame rate (24 FPS)
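
The slides do not name the extraction tools, so the following is a minimal Python sketch of this processing step, assuming librosa for pitch and RMS energy and plain interpolation to downsample the 120 Hz mocap angles to 24 FPS; function names and parameters are illustrative rather than the authors' implementation.

```python
import numpy as np
import librosa

def extract_speech_features(wav_path, target_fps=24):
    """Per-frame pitch (Hz) and RMS energy, framed so the output runs at target_fps."""
    audio, sr = librosa.load(wav_path, sr=48000)            # speech was recorded at 48 kHz
    hop = sr // target_fps                                   # hop length for 24 frames per second
    f0 = librosa.yin(audio, fmin=75, fmax=400, sr=sr,
                     frame_length=4 * hop, hop_length=hop)   # fundamental frequency per frame
    rms = librosa.feature.rms(y=audio, frame_length=4 * hop,
                              hop_length=hop)[0]             # RMS energy per frame
    n = min(len(f0), len(rms))
    return np.stack([f0[:n], rms[:n]], axis=1)               # shape: (frames, 2)

def resample_head_motion(euler_angles, src_fps=120, target_fps=24):
    """Downsample mocap Euler angles (N x 3) from 120 Hz to the 24 FPS feature rate."""
    t_src = np.arange(euler_angles.shape[0]) / src_fps
    t_dst = np.arange(0.0, t_src[-1], 1.0 / target_fps)
    return np.stack([np.interp(t_dst, t_src, euler_angles[:, k]) for k in range(3)], axis=1)
```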

6 Subjective Evaluation
Using the captured dataset, we generated 60 head animation clips
– Based on 15 recorded speech clips
– 4 different audio-head motion generation techniques
– A mosaic was applied over the mouth region
User study
– 18 participants
– Ages: 23~28
– Gender: female (16.67%), male (83.33%)
– Language: fluent English speakers
– User rating scale: 1~5
Generation techniques
– Original data: playback of the captured head motion
– HMMs [Busso et al. 05]
– Mood-Swings [Chuang et al. 05]
– Random: randomly generated head motion

7 Speech-Head Motion Features and Perception
Measure the correlation between head motion and speech features
– Canonical Correlation Analysis (CCA)
Pitch-head motion coupling and human perception
– Computed Pearson coefficient between the CCA measure and user ratings: 0.731
Energy-head motion coupling and human perception
– Appears random; clearly not linear
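
As a hedged sketch of how such an analysis can be set up, the snippet below uses scikit-learn's CCA to score the per-clip speech-head coupling and SciPy's pearsonr to correlate those scores with mean user ratings; the `clips` list of (pitch_frames, head_frames, mean_rating) tuples is an assumed input, not part of the original material.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy.stats import pearsonr

def cca_coupling(speech_feat, head_feat):
    """First canonical correlation between per-frame speech features (N x d)
    and head rotation angles (N x 3) for a single animation clip."""
    cca = CCA(n_components=1)
    u, v = cca.fit_transform(speech_feat, head_feat)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

def perception_correlation(clips):
    """Pearson correlation between per-clip CCA scores and mean user ratings."""
    scores = [cca_coupling(pitch.reshape(-1, 1), head) for pitch, head, _ in clips]
    ratings = [rating for _, _, rating in clips]
    return pearsonr(scores, ratings)   # the slide reports roughly 0.731 for pitch-head motion
```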

8 Speech-Head Motion Features and Perception
Implications for CHI
– Validates the tight coordination between speech and head motion
   Precise timing is required in head motion generation
   Delayed head movement generation may significantly degrade human perception
– An approximately linear correlation between user ratings and the pitch-head motion CCA
   Prosody-driven head motion synthesis could be fundamentally sound
– No simple linear correlation between user ratings and the RMS energy-head motion CCA
   RMS energy may vary among sentences

9 Frequency-Domain Analysis of Head Motion
Frequency-domain analysis of head motion
– Head motion: rotation angles
– Frequency spectrum: FFT applied to the head rotation angle vector
Association between the head motion spectrum and human perception
– Squared magnitudes of less than 5 degrees
[Plot: X-axis: average user rating (2.1 ~ 4.2); Y-axis: squared magnitude of the three Euler rotation angles (0 ~ 5 degrees); Z-axis: frequency spectrum (0 ~ 19 Hz)]
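
A minimal sketch of the frequency-domain step: NumPy's real FFT applied to each Euler-angle channel, yielding the squared-magnitude spectrum the plot above summarizes. The sampling rate is left as a parameter, since the 0~19 Hz range on the slide suggests the higher-rate capture data was analyzed rather than the 24 FPS aligned stream; that reading is an assumption.

```python
import numpy as np

def head_motion_spectrum(euler_angles, fps):
    """Squared-magnitude frequency spectrum of head rotation angles.

    euler_angles: (N, 3) array of per-frame X/Y/Z rotation angles sampled at fps.
    Returns (freqs, power), where power[k, c] is the squared magnitude of
    channel c at frequency freqs[k].
    """
    n = euler_angles.shape[0]
    centered = euler_angles - euler_angles.mean(axis=0)   # remove the static head pose (DC term)
    spectrum = np.fft.rfft(centered, axis=0)
    power = np.abs(spectrum) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    return freqs, power
```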

10 Frequency-Domain Analysis of Head Motion
Key observations
– Highly rated clips: low-frequency motion
   Natural head motion: less than 10 Hz
– Lowly rated clips: high-frequency motion
   Typically larger than 12 Hz, with a small range of head movements
Implications for HCI
– The comfortable head motion frequency zone: 0~12 Hz
– Smooth post-processing for the head motion generation of talking avatars
   Smoothing: post-process the synthesized head motions by simply cropping the high-frequency part
[Figure: examples of low-frequency vs. high-frequency head motion patterns]
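
One way to realize the suggested smoothing is an FFT-based low-pass that zeroes every component above a cutoff (12 Hz below, matching the comfort zone on the slide) and transforms back; this is a sketch of the idea, not necessarily the authors' post-processing.

```python
import numpy as np

def lowpass_head_motion(euler_angles, fps, cutoff_hz=12.0):
    """Crop high-frequency components (> cutoff_hz) from synthesized head motion.

    euler_angles: (N, 3) per-frame rotation angles sampled at fps.
    Returns the smoothed (N, 3) motion containing only the low-frequency part.
    """
    n = euler_angles.shape[0]
    spectrum = np.fft.rfft(euler_angles, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    spectrum[freqs > cutoff_hz, :] = 0.0      # zero out everything above the cutoff
    return np.fft.irfft(spectrum, n=n, axis=0)
```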

11 Conclusion and Future Work
Summary of our findings
– The coupling between pitch and head motion has a strong linear correlation with human perception
– Perceived-natural head motions mainly consist of low-frequency components; high-frequency components (>12 Hz) significantly damage perceived naturalness
Future work
– Multi-party conversation scenarios
– Analysis of other fundamental speech features: pauses, repetitions, etc.
Acknowledgments: This work is supported in part by NSF IIS-0914965, Texas Norman Hackerman Advanced Research Program 003652-0058-2007, and research gifts from Google and Nokia.

