Combined Gesture-Speech Analysis and Synthesis
M. Emre Sargın, Ferda Ofli, Yelena Yasinnik, Oya Aran, Alexey Karpov, Stephen Wilson, Engin Erzin, Yücel Yemez, A. Murat Tekalp

Outline
Project Objective
Technical Details
–Preparation of Gesture-Speech Database
–Determination of Gestural-Auditory Events
–Detection of Gestural-Auditory Events
–Gesture-Speech Correlation Analysis
–Synthesis of Gestures Accompanying Speech
Resources
Concluding Remarks and Future Work
Demonstration

Project Objective
The production of speech and gesture is interactive throughout the entire communication process.
Human-computer interaction systems should be interactive in the same way: in an edutainment application, for example, an animated character's speech should be aided and complemented by its gestures.
Two main goals of this project:
–Analysis and modeling of the correlation between speech and gestures.
–Synthesis of correlated, natural gestures accompanying speech.

Technical Details
Preparation of Gesture-Speech Database
Determination of Gestural-Auditory Events
Detection of Gestural-Auditory Events
Gesture-Speech Correlation Analysis
Synthesis of Gestures Accompanying Speech

Preparation of Database
Gestures and speech of a specific subject (Can-Ann) were investigated.
25 minutes of video of a native English speaker giving directions, recorded at 25 fps (roughly 37,500 frames).

Determination of Gestural-Auditory Events
The database was manually examined to find specific, repetitive gestural and auditory events.
Note that the events found for one specific subject are personal and can vary from culture to culture; for example, during refusal phrases:
–Turkish style → upward movement of the head
–European style → left-right movement of the head
–Can-Ann does not use these gestural events at all.
Auditory events:
–Semantic information (keywords): "Left", "Right" and "Straight".
–Prosodic information: "Accent".
Gestural events:
–Head movements: "Down", "Tilt".
–Hand movements: "Left", "Right", "Straight".

Correlation Results

Detection of Gesture Elements
In this project, we consider arm and head gestures.
Gesture features:
–Head gesture features: global motion parameters calculated within the head region.
–Hand gesture features: hand center-of-mass position and its calculated velocity.
Main tasks in detecting gesture elements:
–Tracking of the head region (optical-flow based).
–Tracking of the hand region (Kalman-filter or particle-filter based; see the sketch after this list).
–Extraction of gesture features.
–Recognition and labeling of gestures.
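The following is a minimal sketch of the Kalman-filter hand tracking listed above, assuming a constant-velocity state model and a per-frame hand centroid measurement (e.g. from skin-color segmentation). It uses OpenCV's Python bindings with placeholder noise covariances; the project itself used the OpenCV C library, so this is an illustration, not the project code.

```python
import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)  # state: (x, y, vx, vy); measurement: (x, y)
dt = 1.0 / 25.0              # the database video runs at 25 fps
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1,  0],
                                [0, 0, 0,  1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3      # placeholder
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # placeholder

def track(centroids):
    """Smooth a per-frame list of (x, y) hand centroids; returns
    (x, y, vx, vy) estimates, i.e. position plus velocity features."""
    states = []
    for (x, y) in centroids:
        kf.predict()
        state = kf.correct(np.array([[x], [y]], np.float32))
        states.append(state.ravel())
    return states
```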

Detection of Auditory Elements
In this project, we consider semantic and prosodic events.
Main tasks in detecting auditory elements:
–Extraction of speech features: MFCC, pitch, intensity (see the feature-extraction sketch after this list).
–Keyword spotting: HMM based or dynamic-time-warping based.
–Accent detection: HMM based or sliding-window based.
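A hedged sketch of the three speech features named above (MFCC, pitch, intensity), here computed with librosa; the project used HTK and Praat, so the parameter choices (13 coefficients, 25 ms windows, 10 ms hop, 16 kHz audio, the file name) are assumptions.

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file
hop = int(0.010 * sr)                         # 10 ms hop
win = int(0.025 * sr)                         # 25 ms analysis window

# 13 MFCCs per frame, the standard HTK-style front end dimensionality
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=win, hop_length=hop)

# pitch contour via the YIN estimator, bounded to a speech range
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                 frame_length=win, hop_length=hop)

# frame-level RMS energy as an intensity proxy
rms = librosa.feature.rms(y=y, frame_length=win, hop_length=hop)[0]
```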

Keyword Spotting (HMM Based): Training
Recognition grammar: left | right | straight | silence | garbage.
(Diagram on slide: in training, speech plus keyword labels produce the models; in testing, unknown speech produces keyword labels.)
–The Hidden Markov Model Toolkit (HTK) was used as the base technology for developing the keyword spotter.
–20 minutes of speech were labelled manually and used for training.
–Speaker-dependent speech recognition system: each keyword was pronounced in the training speech at least 30 times.
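The actual HTK configuration files are not in the slides; a plausible HParse-style grammar implementing the network above might look as follows, where SIL and GARBAGE stand for assumed silence and filler models:

```
$KEYWORD = LEFT | RIGHT | STRAIGHT;
( SIL < $KEYWORD | GARBAGE > SIL )
```

Such a grammar would be compiled into a word network with HTK's HParse tool before recognition.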

Keyword Spotting (HMM Based): Testing
–5.5 minutes of speech were used for testing.
–The speech fragment contains approximately 600 words, of which 35 are keywords.
–First experiments: the keyword spotter was able to find almost all keywords in the test speech, but it produced many false alarms.

Keyword Spotting (Dynamic Time Warping)
–MFCC parameters are used for parameterization.
–Dynamic time warping finds an optimal match between two given sequences (e.g. time series); a minimal implementation sketch follows.
Results:
–Recognized keywords: 33
–Missed words: 2
–False alarms: 22
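For illustration, a minimal dynamic-time-warping distance between two MFCC frame sequences; the per-frame Euclidean cost and the thresholding strategy are assumptions, not the project's code.

```python
import numpy as np

def dtw_distance(a, b):
    """a, b: arrays of shape (n_frames, n_mfcc). Returns the minimal
    accumulated frame-to-frame distance along a monotonic alignment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# A word is spotted when the DTW distance between a keyword template
# and a sliding segment of the test speech falls below a tuned threshold.
```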

Accent Detection (Sliding Window Based)
Parameters calculated over a sliding window:
–Pitch contour
–Number of local minima and maxima in the pitch contour
–Intensity
Procedure (sketched in code after this list):
–Windows with high intensity values are selected.
–Median filtering is used to remove short windows.
–The candidate accent windows are labeled using connected-component analysis.
–Candidate accent regions that contain too few or too many local minima and maxima are eliminated.
–The remaining candidate regions are selected as accents.
The proposed method detects 68% of accents with a 25% false-alarm rate.
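A rough sketch of the sliding-window procedure above; the thresholds and window length are invented placeholders, and merging adjacent windows stands in for the median-filtering and connected-component steps.

```python
import numpy as np

def detect_accents(intensity, pitch, win=20, intensity_thresh=0.7,
                   min_extrema=1, max_extrema=6):
    """intensity, pitch: frame-level contours of equal length.
    Returns merged (start, end) frame spans selected as accents."""
    candidates = []
    for start in range(0, len(intensity) - win):
        w_int = intensity[start:start + win]
        w_pitch = pitch[start:start + win]
        if np.mean(w_int) < intensity_thresh:
            continue  # keep only high-intensity windows
        d = np.sign(np.diff(w_pitch))
        extrema = int(np.sum(d[1:] != d[:-1]))  # local min/max count
        if min_extrema <= extrema <= max_extrema:
            candidates.append((start, start + win))
    merged = []  # merge overlapping candidate windows into regions
    for s, e in candidates:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```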

Synthesis of Gestures Accompanying Speech
Based on the methodology used in the correlation analysis, given a speech signal:
–Features will be extracted.
–The most probable speech label will be assigned to each speech pattern.
–The gesture pattern most correlated with that speech pattern will be used to animate a stick model of a person (see the sketch below).
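Schematically, the pipeline reduces to a label lookup; all callables and the correlation table in this sketch are hypothetical stand-ins for the components named on this slide.

```python
def synthesize_gestures(speech, extract_features, label_speech,
                        gesture_for_label, animate):
    """speech: raw signal; extract_features, label_speech, animate:
    stand-in callables; gesture_for_label: mapping from speech label
    to its most correlated gesture pattern."""
    features = extract_features(speech)   # e.g. MFCC + prosodic features
    label = label_speech(features)        # most probable speech label
    animate(gesture_for_label[label])     # drive the stick-model animation
```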

Hand Gesture Models
Figure on slide: original hand trajectories versus trajectories generated from the HMM-based models.

Resources
Database preparation and labeling:
–VirtualDub
–Anvil
–Praat
Image processing and feature extraction:
–Matlab Image Processing Toolbox
–OpenCV image processing library
Gesture-speech correlation analysis:
–HTK HMM Toolkit
–Torch machine learning library

Concluding Remarks and Future Work
–The database will be extended with new subjects.
–Algorithms and methods will be tested on the new databases.
–An HMM-based accent detector will be implemented.
–Keyword and event sets will be extended.
–Database scenarios will be extended.

Demonstration I

Demonstration II