UCD Electronic and Electrical Engineering Robust Multi-modal Person Identification with Tolerance of Facial Expression Niall Fox Dr Richard Reilly University College Dublin Ireland

UCD Electronic and Electrical Engineering Overview
- Motivation
- Analysis for the Speech and Mouth Feature Experts
- Results for the Two Individual Experts
- Automatic Integration of the Experts
- Results of the Integration
- Conclusions

UCD Electronic and Electrical Engineering Motivation
- Human communication is multimodal
- Benefits of using visual information:
  - Unaffected by acoustic noise
  - Complementary to the audio signal
  - Audio and visual noise are uncorrelated
  - Increased robustness and accuracy

UCD Electronic and Electrical Engineering Audio-Visual Platform [Block diagram: Feature Extraction → Modelling/Scoring → Score Integration]

UCD Electronic and Electrical Engineering Audio Expert
- 20 ms Hamming window, 10 ms overlap
- 16 static features:
  – 15 Mel Frequency Cepstrum Coefficients (MFCC)
  – 1 energy value per frame
- 16 delta features
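As a rough sketch of the audio front end described on this slide (function names, FFT size, and the number of mel filters are illustrative assumptions, not from the talk), the 20 ms / 10 ms framing, 15 MFCCs plus frame energy, and first-order deltas could look like:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_features(signal, fs=16000, n_mfcc=15, n_mels=26, nfft=512):
    """20 ms Hamming frames with 10 ms hop -> 15 MFCCs + log-energy
    (16 static features) plus 16 deltas = 32 features per frame."""
    frame_len, hop = int(0.020 * fs), int(0.010 * fs)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)

    # Triangular mel filterbank on the power spectrum
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(fs / 2.0), n_mels + 2)
    bin_pts = np.floor((nfft + 1) * mel2hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bin_pts[i], bin_pts[i + 1], bin_pts[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)

    static = np.empty((n_frames, n_mfcc + 1))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, nfft)) ** 2
        logmel = np.log(fbank @ power + 1e-10)
        static[t, :n_mfcc] = dct(logmel, norm='ortho')[:n_mfcc]  # 15 MFCCs
        static[t, n_mfcc] = np.log(np.sum(frame ** 2) + 1e-10)   # frame energy

    # Simple first-order deltas (frame-to-frame difference)
    delta = np.vstack([static[1:] - static[:-1], np.zeros(n_mfcc + 1)])
    return np.hstack([static, delta])  # shape: (n_frames, 32)
```

One second of 16 kHz audio yields 99 frames of 32 features each with these settings.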

UCD Electronic and Electrical Engineering Mouth Features Expert
- ROI extraction
- Grayscale image is employed
- Pre-processing: histogram equalisation, de-meaning
- DCT applied to the ROI (top 14 features selected)
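A minimal sketch of this visual pipeline (the zig-zag ordering used to pick the "top 14" coefficients is an assumption; the slide does not say how they were selected):

```python
import numpy as np
from scipy.fftpack import dct

def mouth_features(roi, n_feats=14):
    """Grayscale mouth ROI -> histogram equalisation -> de-meaning
    -> 2-D DCT -> keep the n_feats lowest-frequency coefficients
    in zig-zag order (assumed selection rule)."""
    roi = roi.astype(np.float64)
    # Histogram equalisation via the empirical CDF
    hist, _ = np.histogram(roi, bins=256, range=(0, 256))
    cdf = hist.cumsum() / roi.size
    roi = cdf[roi.astype(np.uint8)] * 255.0
    roi -= roi.mean()  # de-meaning
    # Separable 2-D DCT: transform rows, then columns
    coeffs = dct(dct(roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    # Zig-zag scan: order coefficients by diagonal (row + col)
    h, w = coeffs.shape
    order = sorted(((r, c) for r in range(h) for c in range(w)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    return np.array([coeffs[r, c] for r, c in order[:n_feats]])
```

Note that after de-meaning, the DC coefficient (first in zig-zag order) is essentially zero, so the retained features capture low-frequency shape rather than mean brightness.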

UCD Electronic and Electrical Engineering Database
- XM2VTS database, 295 subjects
- 4 sessions (monthly spaced) of the sentence “Joe took father's green shoe bench out”

UCD Electronic and Electrical Engineering Person Identification Tests
- Tested on 251 subjects from the database of 295
- Models trained on monthly sessions 1, 2 and 3; tested on session 4
- HMMs model the audio and mouth features
- AWGN was added to the audio
- JPEG compression applied to the video images
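The audio degradation step can be sketched as follows (a generic AWGN-at-target-SNR routine; the function name and seeded generator are illustrative, not from the talk):

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise to a signal at a target SNR in dB."""
    rng = np.random.default_rng(0) if rng is None else rng
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Degrading the same test utterance at a range of SNRs (here, down to 21 dB) lets the experiment measure how each expert's accuracy rolls off with noise.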

UCD Electronic and Electrical Engineering Audio Expert Scores [Plot: Identification Accuracy [%] vs Audio SNR [dB]] 97% at 48 dB, 37% at 21 dB; large roll-off at low SNR

UCD Electronic and Electrical Engineering Image Degradation Levels [Figure: image frames and mouth regions at 10 levels of JPEG compression, QF 50, 25, 18, 14, 10, 8, 6, 4, 3 and 2]

UCD Electronic and Electrical Engineering Mouth Features Expert Scores [Plot: Identification Accuracy [%] vs JPEG Quality Factor] 86% at QF = 50, 48% at QF = 2

UCD Electronic and Electrical Engineering Audio-Visual Platform [Block diagram: Feature Extraction → Modelling/Scoring → Score Integration]

UCD Electronic and Electrical Engineering Expert Weightings
Weighted likelihood summation:
  l_AV(O_A, O_V | S_i) = α_A · l_A(O_A | S_i) + α_V · l_V(O_V | S_i),  with α_A, α_V ∈ [0, 1] and α_A + α_V = 1
where l_m(O_m | S_i), m ∈ {A, V}, is the likelihood score from expert m.
Expert reliability measure, used to automatically choose the weight:
  α_V^opt = argmax_{α_V ∈ (0, 1)} {…}

UCD Electronic and Electrical Engineering Expert Weightings Automatically choose weight
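The weighted summation on the previous slide can be sketched directly (function names are illustrative; the automatic selection of the weight from a reliability measure is not reproduced here):

```python
import numpy as np

def fuse_scores(l_audio, l_video, alpha_a):
    """Weighted likelihood summation of the two experts:
    l_AV(O_A, O_V | S_i) = alpha_A * l_A(O_A | S_i) + alpha_V * l_V(O_V | S_i),
    with alpha_A + alpha_V = 1 and both weights in [0, 1]."""
    alpha_v = 1.0 - alpha_a
    return alpha_a * np.asarray(l_audio) + alpha_v * np.asarray(l_video)

def fuse_and_identify(l_audio, l_video, alpha_a):
    """Identify the subject S_i with the highest fused score."""
    return int(np.argmax(fuse_scores(l_audio, l_video, alpha_a)))
```

Setting alpha_a near 0 trusts the visual expert (useful at low audio SNR), while alpha_a near 1 trusts the audio expert (useful under heavy JPEG compression); the system chooses this weight automatically from the experts' reliability.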

UCD Electronic and Electrical Engineering Fusion of Audio and Mouth Feature Experts [Surface plot: Accuracy [%] vs Audio Level [SNR] and Visual Level [JPEG QF], with audio-alone and visual-alone curves] A = 37% at 21 dB, V = 48% at QF = 2, AV = 72% at (21 dB, QF = 2)

UCD Electronic and Electrical Engineering Conclusions
- The AV system is robust to both audio and visual degradations
- High performance of the mouth region alone (85%): robust to facial expressions and occlusion
Further work:
- Test other types of audio and visual degradation
- XM2VTS DB is high quality; record real-world data in an office-type scenario …

UCD Electronic and Electrical Engineering XM2VTS Database
- Controlled, uniform illumination
- Constant visual background
- Controlled acoustic background

UCD Electronic and Electrical Engineering UCD Recordings
- Non-controlled, non-uniform illumination
- Varying visual background
- Noisy acoustic background

UCD Electronic and Electrical Engineering Niall Fox Web: Dr Richard Reilly DSP Group, UCD, Dublin, Ireland This work is supported by Enterprise Ireland under the Informatics Research Initiative