The SIMILAR NoE Summer Workshop 2005 Combined Gesture-Speech Analysis and Synthesis M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp


Multimedia Vision and Graphics Laboratory, Koc University

Outline
- Project Objective
- Technical Description
  - Preparation of Gesture-Speech Database
  - Detection of Gesture Elements
  - Gesture-Speech Correlation Analysis
  - Synthesis of Gestures Accompanying Speech
- Resources
- Work Plan
- Team Members

Project Objective
- The production of speech and gesture is interactive throughout the entire communication process.
- Human-computer interaction systems should be interactive in the same way: in an edutainment application, for example, an animated person's speech should be aided and complemented by its gestures.
- Two main goals of this project:
  - Analysis and modeling of the correlation between speech and gestures.
  - Synthesis of correlated, natural gestures accompanying speech.

Technical Description
- Preparation of Gesture-Speech Database
- Detection of Gesture Elements
- Gesture-Speech Correlation Analysis
- Synthesis of Gestures Accompanying Speech

Preparation of Database
- The gestures of one specific person will be investigated.
- The video database for that person should include the gestures that he or she uses frequently.
- The locations of the head, arms, elbows, etc. should be easy to detect and track.

Detection of Gesture Elements
- In this project, we consider arm and head gestures.
- Main tasks included in the detection of gesture elements:
  - Tracking of the head region.
  - Tracking of the hand, and possibly the shoulder and elbow.
  - Extraction of gesture features.
  - Recognition and labeling of gestures.

Head Region Tracking
- To extract motion information from the head, one should first extract the head region.
- An exhaustive search for the head in each frame is a possible solution, but it is computationally inefficient.
- Tracking is more efficient in terms of computational complexity.
- The motion information calculated for tracking will also be used as head gesture features.

Tracking Methodology
- Exhaustive search for the head region in the initial frame:
  - Haar-based face detection
  - Skin color information
- Extraction of motion information from the head region:
  - Optical flow vectors
  - Fitting global motion parameters to the optical flow vectors
- Warp the search window according to the motion information.
- Search for the head region within the search window.
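The step of fitting global motion parameters to optical flow vectors can be sketched as a least-squares problem. The slides do not say which motion model is used, so the 6-parameter affine model below is an assumption, and the function names `fit_affine_motion` and `warp_window` are hypothetical. The flow vectors are taken as given (for example, from any optical flow routine).

```python
import numpy as np


def fit_affine_motion(points, flows):
    """Least-squares fit of an affine motion model to optical flow vectors.

    points: (N, 2) pixel coordinates (x, y) inside the head region.
    flows:  (N, 2) flow vectors (u, v) measured at those points.
    Returns a (2, 3) matrix A such that [u, v]^T ~= A @ [x, y, 1]^T.
    The affine model is an assumption; the slides only say "global motion".
    """
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    design = np.hstack([points, np.ones((len(points), 1))])  # shape (N, 3)
    params, *_ = np.linalg.lstsq(design, flows, rcond=None)  # shape (3, 2)
    return params.T


def warp_window(window, motion):
    """Shift a search window (x, y, w, h) by the model's flow at its centre."""
    x, y, w, h = window
    centre = np.array([x + w / 2.0, y + h / 2.0, 1.0])
    dx, dy = motion @ centre
    return (x + float(dx), y + float(dy), w, h)
```

The six fitted parameters can double as per-frame head gesture features, which matches the idea that motion information computed for tracking is reused for gesture analysis.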

Head Tracking Results (figure slide; images not included in this transcript)

Hand Tracking Methodology
- The hand region will be extracted using skin color information.
- Robust state-space tracking will be applied.
  - Observations are the position of the hand.
  - States are the position, speed and acceleration of the hand.
  - Kalman filtering removes unwanted noise from the features.
  - In the regular Kalman filter, the parameters are fixed.
  - In the robust Kalman filter, the parameters are re-adjusted at each iteration to minimize the MSE and to overcome the effects of abrupt changes in hand motion.
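A minimal sketch of the state-space model described above, for one image axis: the observation is the hand position and the state holds position, speed and acceleration. This is the regular Kalman filter with fixed parameters, not the robust variant; the noise values `process_var` and `obs_var` and the function name `kalman_track` are assumptions chosen for illustration.

```python
import numpy as np


def kalman_track(observations, dt=1.0, process_var=1e-2, obs_var=1.0):
    """Filter a 1-D sequence of hand positions with a constant-acceleration model.

    Returns an array of shape (T, 3) with the estimated
    [position, speed, acceleration] after each observation.
    Noise variances are fixed (regular Kalman filter); both values
    are illustrative assumptions, not taken from the slides.
    """
    F = np.array([[1.0, dt, 0.5 * dt * dt],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])       # state transition
    H = np.array([[1.0, 0.0, 0.0]])       # only position is observed
    Q = process_var * np.eye(3)           # process noise covariance
    R = np.array([[obs_var]])             # observation noise covariance

    x = np.array([observations[0], 0.0, 0.0], dtype=float)
    P = 10.0 * np.eye(3)                  # initial state uncertainty
    estimates = []
    for z in observations:
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the new position observation.
        innovation = z - (H @ x)[0]
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)    # Kalman gain, shape (3, 1)
        x = x + K[:, 0] * innovation
        P = (np.eye(3) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

Running one filter per axis gives a smoothed 2-D hand trajectory. A robust variant would additionally re-estimate `Q` and `R` at each iteration, which this sketch does not attempt.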

Extraction of Gesture Features
- Head gesture features: the global motion parameters calculated within the head region will be used.
- Hand gesture features: the position of the hand's center of mass and its calculated velocity will form the hand gesture features.
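The hand gesture features can be illustrated with a short sketch, assuming the hand region is available as a binary mask per frame (for example, from skin color segmentation). The function names and the use of finite differences for velocity are assumptions; the slides do not specify how velocity is computed.

```python
import numpy as np


def center_of_mass(mask):
    """Return the (x, y) centre of mass of a binary hand mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())


def hand_features(centers, dt=1.0):
    """Stack position and finite-difference velocity into one feature vector per frame.

    centers: sequence of (x, y) hand positions, one per frame (at least two frames).
    Returns an array of shape (T, 4) with columns [x, y, vx, vy].
    """
    centers = np.asarray(centers, dtype=float)
    velocity = np.gradient(centers, dt, axis=0)
    return np.hstack([centers, velocity])
```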

Gesture-Speech Correlation Analysis
- Recognized gestures are labeled with respect to time.
  - Head gestures: Down, Up, Left, Right, Left-Right, …
  - Arm gestures: Abduction, Adduction, Extension, …
- Recognized speech patterns are labeled with respect to time.
  - Semantic information: approval phrases, refusal phrases, etc.
  - Prosodic information: intonational phrases, ToBI transcriptions, etc.
- Correlation analysis is carried out by examining:
  - Co-occurrence Matrix
  - Input/Output Hidden Markov Models

Co-occurrence Matrix
- Estimation of the joint probability distribution function, $f(g, s)$.
- For each time sample, give a vote to the corresponding gesture-speech label pair.
- For a specific speech element $s_i$, the most correlated gesture feature will be: $g_i = \arg\max_{g_x} f(g_x, s_i)$
- Relatively easy to compute.
- Gives an intuition about what we are examining.
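The voting scheme can be sketched in a few lines, assuming gesture and speech labels are already aligned as one label per time sample. The function names and the example labels are illustrative only.

```python
from collections import Counter


def cooccurrence(gesture_labels, speech_labels):
    """Estimate the joint distribution f(g, s) by voting once per time sample."""
    if len(gesture_labels) != len(speech_labels):
        raise ValueError("label sequences must have the same length")
    counts = Counter(zip(gesture_labels, speech_labels))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def most_correlated_gesture(joint, speech_label):
    """Return the gesture g maximizing f(g, s_i), or None if s_i was never seen."""
    candidates = {g: p for (g, s), p in joint.items() if s == speech_label}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

For example, with per-sample labels such as `["down", "down", "left-right"]` and `["approval", "approval", "refusal"]`, the lookup for `"approval"` returns `"down"`.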

Input/Output Hidden Markov Models
- An IOHMM is a graphical model that maps input sequences to output sequences.
- It is used in three sequence processing tasks:
  - Prediction
  - Regression
  - Classification
- The model is trained to maximize the conditional likelihood of an output sequence $\{y_1, \dots, y_t\}$ given an input sequence $\{x_1, \dots, x_t\}$.
- In our project:
  - The input sequence will be speech labels.
  - The output sequence will be gesture labels.
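The quantity that training maximizes can be evaluated with a forward pass. The sketch below assumes discrete speech labels as inputs, discrete gesture labels as outputs, and table-based parameters indexed by the current input; this parameterization and the function name are assumptions for illustration, and no training procedure is shown.

```python
def iohmm_conditional_likelihood(inputs, outputs, initial, trans, emit):
    """Compute P(y_1..y_T | x_1..x_T) for a discrete IOHMM via the forward pass.

    inputs, outputs: equal-length sequences of integer labels.
    initial[x][j]:   P(q_1 = j | x_1 = x)
    trans[x][i][j]:  P(q_t = j | q_{t-1} = i, x_t = x)
    emit[x][j][y]:   P(y_t = y | q_t = j, x_t = x)
    """
    if len(inputs) != len(outputs) or len(inputs) == 0:
        raise ValueError("inputs and outputs must be non-empty and equal in length")
    n_states = len(initial[inputs[0]])
    x, y = inputs[0], outputs[0]
    # alpha[j] = P(y_1..y_t, q_t = j | x_1..x_t)
    alpha = [initial[x][j] * emit[x][j][y] for j in range(n_states)]
    for x, y in zip(inputs[1:], outputs[1:]):
        alpha = [
            emit[x][j][y] * sum(alpha[i] * trans[x][i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)
```

Unlike a standard HMM, both the transition and the emission tables depend on the current input, which is what lets the speech label at each time step steer the gesture output.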

Synthesis of Gestures Accompanying Speech
- Given a speech signal, and based on the methodology used in the correlation analysis:
  - Features will be extracted.
  - The most probable speech label will be assigned to each speech pattern.
  - The gesture pattern most correlated with the speech pattern will be used to animate a stick model of a person.

Resources
- Database preparation and labeling:
  - VirtualDub
  - Anvil
  - Praat
- Image processing and feature extraction:
  - Matlab Image Processing Toolbox
  - OpenCV Image Processing Library
- Gesture-speech correlation analysis:
  - HTK HMM Toolbox
  - Torch Machine Learning Library

Work Plan
- Timeline of the project:
- Schedule of the lectures:

Team Members
- Ferda Ofli, Koc University: Image, Video Processing and Feature Extraction
- Yelena Yasinnik, Massachusetts Institute of Technology: Audio-Visual Correlation Analysis
- Oya Aran, Bogazici University: Gesture Based Human-Computer Interaction Systems

Team Members (continued)
- Alexey Anatolievich Karpov, Saint-Petersburg Institute for Informatics and Automation: Speech Based Human-Computer Interaction Systems
- Stephen Wilson, University College Dublin: Audio-Visual Gesture Annotation
- Alexander Refsum Jensenius, Department of Music, Oslo University: Gesture Analysis
