Presentation transcript:

Time-Frequency Analysis of Vocal Source Signal for Speaker Recognition
Nengheng Zheng, P.C. Ching and Tan Lee
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR of China

Abstract
This article investigates the importance of vocal source information for speaker recognition. We propose a novel feature extraction scheme to exploit the time-frequency properties of the LP residual signal. The new feature, named Wavelet Octave Coefficients of Residues (WOCOR), provides additional speaker discriminative power and is demonstrated to improve the overall performance of a speaker recognition system built on the conventional vocal tract features, the MFCCs.

Speaker Specific Vocal Source Signal
- Vocal tract features, e.g., MFCC, LPCC, are widely used in speaker recognition systems.
- The vibrating mechanism of the vocal cords is also speaker dependent.
- We aim at capturing the time-frequency properties of the glottal source.

Speech production
- A glottal pulse train u(n) excites the vocal tract filter to produce the speech signal s(n).

Source-tract separation by LP inverse filtering
- Estimate the AR coefficients of the vocal tract filter V(z) by linear prediction analysis.
- Inverse filter s(n) to obtain the residual e(n).
- e(n) is highly related to u(n) and is speaker dependent.

Feature Extraction with Time-Frequency Analysis on the Residual Signal
Processing pipeline: s(n) → VAD and pitch tracking (F0) → LP inverse filtering (e(n)) → pitch-synchronous wavelet transform (We(a, b)) → time-frequency feature generation → WOCOR. (Illustrative code sketches of these processing steps are given after the transcript.)

Voice activity detection and pitch tracking
- Only voiced segments are of interest.
- Energy and zero-crossing detection for VAD.
- Cepstrum analysis for pitch tracking.

LP inverse filtering
- Inverse filter each voiced speech frame to obtain the residual signal.

Pitch-synchronous wavelet transform
- Exact pitch marks are located by detecting the residual bursts.
- The wavelet transform is applied to every two pitch cycles of the residual signal, with one pitch cycle overlap.

Time-frequency feature generation
- Firstly, divide the wavelet coefficients into octave groups.
- Secondly, generate the feature vector, named the first-order Wavelet Octave Coefficients of Residues (WOCOR_1).
- Furthermore, to obtain more temporal detail, divide each octave into sub-groups and generate the higher-order WOCOR_α.

Experiments
Corpus
- Read Cantonese HK ID numbers.
- 40 male speakers.
- 4 enrollment and 6 testing sessions.
- Microphone and telephone speech.

Baseline system
- MFCC_D_A features.
- 128-component GMM.

Recognition results
- Recognition error rates (IDER % and EER %) are reported for MFCC_D_A alone and for MFCC_D_A + WOCOR_4, on both microphone (MIC) and telephone (TEL) speech.
- Recognition error rates are also reported with fused source-tract information: information fusion is performed at the score level, with the weight wt experimentally determined.

Comments
- Temporal details of the vocal source signal are useful for speaker recognition.
- For telephone speech, the relative improvement from the source information is 24% for identification and 14% for verification; for microphone speech, the improvements are 22% and 11%, respectively.

Conclusion
The vocal source time-frequency information provides additional speaker discriminative power and improves the overall performance of the speaker recognition system.

Acknowledgement
This work was partially supported by a research grant awarded by the Hong Kong Research Grant Council. The authors wish to acknowledge Dr. Frank Soong for instructive discussions and suggestions during this work.
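
Illustrative sketches (not from the original poster)

To make the source-tract separation step concrete, here is a minimal Python sketch of LP analysis followed by inverse filtering to obtain the residual e(n). It uses the autocorrelation method with a Hamming analysis window; the prediction order of 12 and the windowing details are illustrative assumptions, not values taken from the poster.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_residual(frame, order=12):
    """Estimate AR coefficients by autocorrelation LP analysis and
    inverse-filter the frame to obtain the LP residual e(n)."""
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation sequence r[0..order].
    r = np.correlate(windowed, windowed, mode="full")[len(windowed) - 1:]
    # Solve the Toeplitz normal equations for the predictor coefficients a_k.
    a = solve_toeplitz(r[:order], r[1:order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}; e(n) is A(z) applied to s(n).
    inverse_filter = np.concatenate(([1.0], -a))
    return lfilter(inverse_filter, [1.0], frame)
```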
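The VAD and pitch tracking step (energy and zero-crossing detection plus cepstrum analysis) could be sketched as follows; the thresholds and the 60 to 400 Hz search range are placeholder assumptions.

```python
import numpy as np

def is_voiced(frame, energy_thresh=1e-3, zcr_thresh=0.15):
    """Simple energy / zero-crossing-rate voicing decision."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy > energy_thresh and zcr < zcr_thresh

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 by picking the strongest cepstral peak in the
    quefrency range corresponding to [fmin, fmax]."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    lag = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / lag
```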
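For the pitch-synchronous analysis, each segment covers two pitch cycles of the residual and the hop is one pitch cycle, with pitch marks taken at the residual bursts. A sketch, assuming the pitch-mark indices are already available; the per-segment amplitude normalization is my assumption.

```python
import numpy as np

def pitch_sync_segments(residual, pitch_marks):
    """Cut the LP residual into two-pitch-cycle segments with a
    one-pitch-cycle hop, using residual-burst locations as pitch marks."""
    segments = []
    for i in range(len(pitch_marks) - 2):
        seg = residual[pitch_marks[i]:pitch_marks[i + 2]]   # two pitch cycles
        seg = seg / (np.linalg.norm(seg) + 1e-12)           # amplitude normalization (assumed)
        segments.append(seg)
    return segments
```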
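The time-frequency feature generation step (octave grouping, then sub-grouping for the higher-order WOCOR) might look like the sketch below. The Morlet mother wavelet, four octaves, and sub-group norms are assumptions chosen to match the poster's description; the authors' exact wavelet and parameter choices are not stated in the transcript.

```python
import numpy as np
import pywt

def wocor(segment, num_octaves=4, num_subgroups=4):
    """Compute a WOCOR-style feature vector for one two-pitch-cycle
    residual segment: wavelet coefficients at dyadic (octave) scales,
    each octave split into temporal sub-groups, one norm per sub-group."""
    scales = 2.0 ** np.arange(1, num_octaves + 1)      # dyadic scales -> octave groups
    coefs, _ = pywt.cwt(segment, scales, "morl")       # shape: (num_octaves, len(segment))
    features = []
    for octave in coefs:
        for sub in np.array_split(octave, num_subgroups):
            features.append(np.linalg.norm(sub))       # magnitude of each sub-group
    return np.array(features)                          # length = num_octaves * num_subgroups
```

In this sketch, num_subgroups=1 corresponds to the first-order WOCOR_1 and num_subgroups=4 to the WOCOR_4 configuration used in the experiments.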
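The baseline system is described as MFCC_D_A with a 128-component GMM. A minimal identification sketch using scikit-learn's GaussianMixture as a stand-in for the authors' GMM training; the diagonal covariances and the library choice are my assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=128):
    """Fit one GMM per enrolled speaker on that speaker's feature frames."""
    models = {}
    for speaker, feats in features_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[speaker] = gmm.fit(feats)
    return models

def identify(models, test_features):
    """Return the speaker whose model gives the highest average log-likelihood."""
    scores = {spk: gmm.score(test_features) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```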
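For the score-level fusion of the vocal tract (MFCC) and vocal source (WOCOR) subsystems, the transcript only indicates that fusion happens at the score level with an experimentally determined weight wt. The weighted-sum form and the example weight value below are therefore assumptions.

```python
def fused_score(score_mfcc, score_wocor, w_t=0.7):
    """Weighted score-level fusion of the two subsystems; w_t is tuned
    experimentally (0.7 here is only a placeholder value)."""
    return w_t * score_mfcc + (1.0 - w_t) * score_wocor
```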