The SIMILAR NoE Summer Workshop 2005 Combined Gesture-Speech Analysis and Synthesis M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp


1 The SIMILAR NoE Summer Workshop 2005
Combined Gesture-Speech Analysis and Synthesis
M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp
{msargin,eerzin,yyemez,mtekalp}@ku.edu.tr
Multimedia Vision and Graphics Laboratory, Koc University

2 Outline
- Project Objective
- Technical Description
  - Preparation of Gesture-Speech Database
  - Detection of Gesture Elements
  - Gesture-Speech Correlation Analysis
  - Synthesis of Gestures Accompanying Speech
- Resources
- Work Plan
- Team Members

3 Project Objective
- The production of speech and gesture is interactive throughout the entire communication process.
- Human-computer interaction systems should be interactive in the same way: in an edutainment application, for example, an animated person's speech should be aided and complemented by its gestures.
- Two main goals of this project:
  - Analysis and modeling of the correlation between speech and gestures.
  - Synthesis of correlated, natural gestures accompanying speech.

4 Technical Description
- Preparation of Gesture-Speech Database
- Detection of Gesture Elements
- Gesture-Speech Correlation Analysis
- Synthesis of Gestures Accompanying Speech

5 Preparation of Database
- The gestures of one specific person will be investigated.
- The video database for that person should include the gestures that he or she uses frequently.
- The locations of the head, arms, elbows, etc. should be easy to detect and track.

6 Detection of Gesture Elements
- In this project, we consider arm and head gestures.
- Main tasks in the detection of gesture elements:
  - Tracking of the head region.
  - Tracking of the hand, and possibly the shoulder and elbow.
  - Extraction of gesture features.
  - Recognition and labeling of gestures.

7 Head Region Tracking
- To extract motion information from the head, the head region must first be located.
- An exhaustive search for the head in every frame is a possible solution, but it is computationally inefficient.
- Tracking is more efficient in terms of computational complexity.
- The motion information calculated for tracking will also be used as head gesture features.

8 Tracking Methodology
- Exhaustive search for the head region in the initial frame:
  - Haar-based face detection
  - Skin color information
- Extraction of motion information from the head region:
  - Optical flow vectors
  - Fitting global motion parameters to the optical flow vectors
- Warp the search window according to the motion information.
- Search for the head region in the search window.
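The "fitting global motion parameters" and "warp search window" steps can be sketched as follows. The slides do not say which motion model is used, so this sketch assumes a six-parameter affine model, u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y, fitted by least squares. The function names `fit_affine_motion` and `warp_window` are hypothetical, and the flow vectors are assumed to come from a separate optical flow step.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("points are degenerate (e.g. all collinear)")
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def fit_affine_motion(points, flows):
    """Least-squares fit of an (assumed) affine model to flow vectors.

    points: list of (x, y) pixel positions inside the head region
    flows:  list of (u, v) optical flow vectors at those positions
    Returns ((a0, a1, a2), (b0, b1, b2)).
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(points, flows):
        row = (1.0, x, y)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Both components share the same normal matrix A^T A.
    return solve3(ata, atu), solve3(ata, atv)


def warp_window(window, params):
    """Move the two corners (x0, y0, x1, y1) of a search window by the fitted motion."""
    (a0, a1, a2), (b0, b1, b2) = params
    x0, y0, x1, y1 = window

    def move(x, y):
        return x + a0 + a1 * x + a2 * y, y + b0 + b1 * x + b2 * y

    (nx0, ny0), (nx1, ny1) = move(x0, y0), move(x1, y1)
    return nx0, ny0, nx1, ny1


if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]
    # Synthetic flow generated from known parameters, for illustration only.
    flw = [(2 + 0.1 * x, -1 + 0.1 * y) for x, y in pts]
    params = fit_affine_motion(pts, flw)
    print(params, warp_window((0, 0, 10, 10), params))
```

The fitted parameters serve two purposes in the slides' scheme: they move the search window to the next frame, and they can be reused directly as head gesture features.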

9 Head Tracking Results

10 Hand Tracking Methodology
- The hand region will be extracted using skin color information.
- Robust state-space tracking will be applied.
  - Observations are the position of the hand.
  - States are the position, speed, and acceleration of the hand.
  - Kalman filtering removes unwanted noise from the features.
  - In a regular Kalman filter, the parameters are fixed.
  - In a robust Kalman filter, the parameters are re-adjusted at each iteration to minimize the MSE and to cope with abrupt changes in the motion of the hand.
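The state-space model described above can be illustrated with a regular (fixed-parameter) Kalman filter for one coordinate of the hand position. This is a minimal sketch, not the robust variant from the slides: the noise variances `process_var` and `obs_var`, the initial covariance, and the function name `kalman_track` are all assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kalman_track(observations, dt=1.0, process_var=1e-2, obs_var=1.0):
    """Filter a 1-D sequence of observed hand positions.

    State: (position, speed, acceleration). Observation: position only.
    Returns one (position, speed, acceleration) estimate per observation.
    """
    # Constant-acceleration state transition matrix.
    f = [[1.0, dt, 0.5 * dt * dt],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    x = [float(observations[0]), 0.0, 0.0]
    # Assumed initial uncertainty: speed and acceleration are unknown.
    p = [[obs_var, 0.0, 0.0],
         [0.0, 10.0, 0.0],
         [0.0, 0.0, 10.0]]
    estimates = [tuple(x)]
    for z in observations[1:]:
        # Predict step: x = F x, P = F P F^T + Q (Q is diagonal here).
        x = [sum(f[i][k] * x[k] for k in range(3)) for i in range(3)]
        p = matmul(matmul(f, p), transpose(f))
        for i in range(3):
            p[i][i] += process_var
        # Update step with H = [1, 0, 0], so only position is observed.
        s = p[0][0] + obs_var
        gain = [p[i][0] / s for i in range(3)]
        innovation = z - x[0]
        x = [x[i] + gain[i] * innovation for i in range(3)]
        p = [[p[i][j] - gain[i] * p[0][j] for j in range(3)]
             for i in range(3)]
        estimates.append(tuple(x))
    return estimates


if __name__ == "__main__":
    # Hand moving at a constant 2 pixels per frame (synthetic data).
    track = kalman_track([2.0 * t for t in range(50)])
    print(track[-1])
```

A robust variant in the sense of the slides would re-estimate quantities such as `process_var` and `obs_var` at each iteration instead of keeping them fixed. For 2-D tracking, the same filter can be run independently on the x and y coordinates.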

11 Extraction of Gesture Features
- Head gesture features: the global motion parameters calculated within the head region will be used.
- Hand gesture features: the position of the hand's center of mass and its calculated velocity will form the hand gesture features.
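As an illustration of the hand gesture features, the sketch below computes the center of mass of a binary hand mask and a finite-difference velocity between consecutive frames. The mask representation (a list of rows of 0/1 values, as a skin color detector might produce) and the function names are assumptions.

```python
def center_of_mass(mask):
    """Return the (x, y) centroid of the nonzero pixels, or None if the mask is empty."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None
    return sum_x / total, sum_y / total


def hand_features(masks, dt=1.0):
    """Build (x, y, vx, vy) features for frames 1..N-1 from a list of masks.

    Velocity is the backward difference of the centroid between frames.
    A frame whose mask, or whose predecessor's mask, is empty yields None.
    """
    centers = [center_of_mass(m) for m in masks]
    features = []
    for prev, cur in zip(centers, centers[1:]):
        if prev is None or cur is None:
            features.append(None)
            continue
        vx = (cur[0] - prev[0]) / dt
        vy = (cur[1] - prev[1]) / dt
        features.append((cur[0], cur[1], vx, vy))
    return features


if __name__ == "__main__":
    frame_a = [[0, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]]
    frame_b = [[0, 0, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 0, 0]]
    print(hand_features([frame_a, frame_b]))
```

In practice the raw centroid is noisy, which is why the slides pass the position through a Kalman filter before using it.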

12 Gesture-Speech Correlation Analysis
- Recognized gestures are labeled w.r.t. time.
  - Head gestures: Down, Up, Left, Right, Left-Right, …
  - Arm gestures: Abduction, Adduction, Extension, …
- Recognized speech patterns are labeled w.r.t. time.
  - Semantic info: approval and refusal phrases, etc.
  - Prosodic info: intonational phrases, ToBI transcriptions, etc.
- Correlation analysis via examining:
  - Co-occurrence Matrix
  - Input/Output Hidden Markov Models

13 Co-occurrence Matrix
- Estimation of the joint probability distribution function, f(g, s).
- For each time sample, give a vote to the corresponding gesture-speech label pair.
- For a specific speech element s_i, the most correlated gesture feature will be: $g_i = \arg\max_{g_x} f(g_x, s_i)$
- Relatively easy to compute.
- Gives an intuition about what we are examining.
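The voting scheme and the argmax rule can be sketched as below. The inputs are assumed to be two equally long sequences holding one gesture label and one speech label per time sample. The label values in the example are made-up toy data, and the function names are hypothetical.

```python
from collections import Counter


def cooccurrence(gesture_labels, speech_labels):
    """Estimate the joint distribution f(g, s) by counting votes per time sample."""
    if len(gesture_labels) != len(speech_labels):
        raise ValueError("label sequences must have the same length")
    votes = Counter(zip(gesture_labels, speech_labels))
    total = sum(votes.values())
    return {pair: count / total for pair, count in votes.items()}


def most_correlated_gesture(joint, speech_label):
    """Return the gesture g maximizing f(g, s) for the given s, or None if s is unseen."""
    candidates = {g: p for (g, s), p in joint.items() if s == speech_label}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Toy labels, one per time sample (not taken from the project database).
    gestures = ["Down", "Down", "Left-Right", "Down", "Left-Right", "Up"]
    speech = ["approval", "approval", "refusal", "approval", "refusal", "approval"]
    joint = cooccurrence(gestures, speech)
    print(most_correlated_gesture(joint, "approval"))
    print(most_correlated_gesture(joint, "refusal"))
```

Because the matrix is built per time sample, it ignores temporal ordering. That limitation is one reason the slides also consider a sequence model.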

14 Input/Output Hidden Markov Models
- An IOHMM is a graphical model that maps input sequences to output sequences.
- It is used in three sequence processing tasks:
  - Prediction
  - Regression
  - Classification
- The model is trained to maximize the conditional likelihood of an output sequence {y_1, …, y_t} given an input sequence {x_1, …, x_t}.
- In our project:
  - The input sequence will be speech labels.
  - The output sequence will be gesture labels.
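The quantity that IOHMM training maximizes, the conditional probability of an output sequence given an input sequence, can be computed with a forward recursion. The sketch below assumes discrete inputs, outputs, and hidden states, and an initial state distribution that does not depend on the input. All probability tables are made-up toy values, and training itself is not shown.

```python
from itertools import product


def iohmm_likelihood(inputs, outputs, initial, trans, emit):
    """Return P(y_1..y_T | x_1..x_T) via the forward recursion.

    initial[j]     = P(q_1 = j)
    trans[x][i][j] = P(q_t = j | q_{t-1} = i, x_t = x)
    emit[x][j][y]  = P(y_t = y | q_t = j, x_t = x)
    """
    n = len(initial)
    alpha = [initial[j] * emit[inputs[0]][j][outputs[0]] for j in range(n)]
    for x, y in zip(inputs[1:], outputs[1:]):
        alpha = [emit[x][j][y] * sum(alpha[i] * trans[x][i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)


def best_output_sequence(inputs, n_outputs, initial, trans, emit):
    """Brute-force search for the most likely output sequence (toy sizes only)."""
    return max(product(range(n_outputs), repeat=len(inputs)),
               key=lambda ys: iohmm_likelihood(inputs, ys, initial, trans, emit))


# Toy model: inputs 0/1 stand for two speech labels, outputs 0/1 for two
# gesture labels, with two hidden states. Every row sums to 1.
INITIAL = [0.5, 0.5]
TRANS = [
    [[0.9, 0.1], [0.8, 0.2]],  # transitions when the input is 0
    [[0.2, 0.8], [0.1, 0.9]],  # transitions when the input is 1
]
EMIT = [
    [[0.9, 0.1], [0.5, 0.5]],  # emissions when the input is 0
    [[0.5, 0.5], [0.1, 0.9]],  # emissions when the input is 1
]

if __name__ == "__main__":
    speech = [0, 0, 1]
    print(best_output_sequence(speech, 2, INITIAL, TRANS, EMIT))
```

The difference from an ordinary HMM is that both the transition and the emission tables are indexed by the current input, so the speech label at each time step steers which gesture labels are likely.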

15 Synthesis of Gestures Accompanying Speech
- Based on the methodology used in the correlation analysis, given a speech signal:
  - Features will be extracted.
  - The most probable speech label will be assigned to each speech pattern.
  - The gesture pattern most correlated with the speech pattern will be used to animate a stick model of a person.

16 Resources
- Database Preparation and Labeling:
  - VirtualDub
  - Anvil
  - Praat
- Image Processing and Feature Extraction:
  - Matlab Image Processing Toolbox
  - OpenCV Image Processing Library
- Gesture-Speech Correlation Analysis:
  - HTK HMM Toolbox
  - Torch Machine Learning Library

17 Work Plan
- Timeline of the project:
- Schedule of the lectures:

18 Team Members
- Ferda Ofli, Koc University: Image, Video Processing and Feature Extraction
- Yelena Yasinnik, Massachusetts Institute of Technology: Audio-Visual Correlation Analysis
- Oya Aran, Bogazici University: Gesture Based Human-Computer Interaction Systems

19 Team Members
- Alexey Anatolievich Karpov, Saint-Petersburg Institute for Informatics and Automation: Speech Based Human-Computer Interaction Systems
- Stephen Wilson, University College Dublin: Audio-Visual Gesture Annotation
- Alexander Refsum Jensenius, Department of Music, Oslo University: Gesture Analysis


