Ala’a Spaih, Abeer Abu-Hantash. Directed by Dr. Allam Mousa


Text-Independent Speaker Identification System
Ala’a Spaih, Abeer Abu-Hantash. Directed by Dr. Allam Mousa

Outline for Today
1. Speaker Recognition Field
2. System Overview
3. MFCC & VQ
4. Experimental Results
5. Live Demo

Speaker Recognition Field
The field divides into speaker verification and speaker identification; each can be text-dependent or text-independent.

System Overview
Training mode: speech input → feature extraction → speaker modeling → speaker model database.
Testing mode: speech input → feature extraction → feature matching against the speaker model database → decision logic → speaker ID.

Feature Extraction
Feature extraction is a special form of dimensionality reduction. The aim is to extract the formants of the speech signal.

Feature Extraction
The extracted features must have specific characteristics:
- Easily measurable, and occurring naturally and frequently in speech.
- Stable over time.
- Varying as much as possible among speakers, while remaining consistent for each speaker.
- Not affected by speaker health or background noise.
Many algorithms exist to extract them: LPC, LPCC, HFCC, and MFCC. We used the Mel-Frequency Cepstral Coefficients (MFCC) algorithm.

Feature Extraction Using MFCC
Input speech → framing and windowing → fast Fourier transform → absolute value → mel-scaled filter bank → log → discrete cosine transform → feature vectors.
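The pipeline can be sketched end to end in NumPy. This is a minimal illustration, not the project's implementation; the frame length (400 samples ≈ 25 ms at 16 kHz), hop size, and FFT size are assumed values:

```python
import numpy as np

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_filters=29, n_coeffs=12):
    """Minimal MFCC sketch following the block diagram:
    framing/windowing -> FFT -> |.| -> mel filter bank -> log -> DCT."""
    # 1) Framing and Hamming windowing
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2) Magnitude spectrum via FFT
    n_fft = 512
    spec = np.abs(np.fft.rfft(frames, n_fft))

    # 3) Mel-scaled triangular filter bank
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope

    # 4) Log filter-bank energies, then DCT-II to decorrelate -> cepstral coeffs
    logmel = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(1, n_coeffs + 1)[:, None] * (2 * n + 1)
                 / (2 * n_filters))
    return logmel @ dct.T   # shape: (n_frames, n_coeffs)
```

The 12 coefficients per frame (the zeroth coefficient is dropped here) are the feature vectors that feed the later VQ stage.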

Framing and Windowing
[Figure: FFT spectrum of a windowed frame, showing the glottal-pulse and vocal-tract contributions.]

Mel-Scaled Filter Bank
The filter bank maps the linear spectrum onto a mel spectrum:
mel(f) = 2595 * log10(1 + f/700)
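The mapping and its inverse are one-liners; a small sketch (the function names are ours):

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (formula from the slide)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, useful for placing filter-bank edge frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The inverse is what lets equally spaced mel points be converted back to the Hz positions of the triangular filters.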

Cepstrum
Taking the DCT of the logarithm of the mel magnitude spectrum yields the MFCC coefficients; in the cepstral domain the glottal pulse and the vocal-tract impulse response can be separated.

Classification
Classification means building a unique model for each speaker in the database. There are two major types of models:
- Stochastic models: GMM, HMM, ANN
- Template models: VQ, DTW
We used the VQ algorithm.

VQ Algorithm
The VQ technique consists of extracting a small number of representative feature vectors. The first step is to build a speaker database consisting of N codebooks, one for each speaker. Each speaker's feature vectors are clustered into codewords, which together form the speaker model (codebook). This is done with the K-means clustering algorithm.

K-means Clustering
1. Start with the number of clusters k and initial centroids.
2. Compute the distance of every object to each centroid.
3. Group objects by minimum distance.
4. Recompute the centroids.
5. If any centroid changed, repeat from step 2; otherwise end.
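The flow chart corresponds to standard K-means; a compact NumPy sketch (the random initialization and iteration cap are our assumptions):

```python
import numpy as np

def kmeans(vectors, k, iters=100):
    """K-means clustering as in the flow chart: assign each vector to its
    nearest centroid, recompute centroids, repeat until no change."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), k, replace=False)].astype(float)
    for _ in range(iters):
        # distance of every vector to every centroid
        d = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # grouping based on minimum distance
        new = np.array([vectors[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # "no change" -> end
            break
        centroids = new
    return centroids
```

Running it on a speaker's MFCC matrix with k equal to the codebook size yields that speaker's model (codebook).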

VQ Example
Given data points, split them into 4 codebook vectors with initial values at (2,2), (4,6), (6,5), and (8,8).

VQ Example
Once there is no more change, the feature space is partitioned into 4 regions. Any input feature vector can then be classified as belonging to one of the 4 regions, so the entire codebook is specified by the 4 centroid points.

K-means Clustering
If we set the codebook size to 8, the clustering maps the MFCCs of a speaker (1000×12) to a speaker codebook (8×12).

Feature Matching
A distortion measure, defined here as the Euclidean distance, is computed for each codebook. The speaker with the lowest distortion is chosen.
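Matching then reduces to computing this distortion against every stored codebook; a sketch (the function names are ours):

```python
import numpy as np

def vq_distortion(features, codebook):
    """Average Euclidean distance from each test feature vector to its
    nearest codeword in one speaker's codebook."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Index of the speaker whose codebook gives the lowest distortion."""
    scores = [vq_distortion(features, cb) for cb in codebooks]
    return int(np.argmin(scores))
```

A speaker whose codebook was trained on similar speech will place codewords near the test vectors, so its average distortion is smallest.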

The System Operates in Two Modes
Offline and online. In online mode: monitoring microphone inputs → MFCC feature extraction → calculate VQ distortion → make decision and display.

Applications
- Speaker recognition for authentication, e.g. banking applications.
- Forensic speaker recognition: proving the identity of a recorded voice can help convict a criminal or discharge an innocent in court.
- Speaker recognition for surveillance: electronic eavesdropping of telephone and radio conversations.

Results
Parameters: 12 MFCCs, 29 filter banks, codebook size 64, ELSDSR database. The table shows how the system identifies the speaker from the Euclidean distance calculation (rows: test utterances, columns: speaker codebooks; the lowest distance in each row identifies the speaker):

         Sp 1     Sp 2     Sp 3     Sp 4     Sp 5
Test 1   10.7492  13.2712  17.8646  14.7885  13.2859
Test 2   13.2364  10.2740  13.2884  11.7941  14.0461
Test 3   17.5438  16.1177  11.9029  16.2916  17.7199
Test 4   16.1360  13.7095  15.5633  11.7528  16.7327
Test 5   14.9324  15.7028  17.2842  17.8917  12.3504

Results
Number of MFCCs vs. ID rate:

No. of MFCCs   ID Rate
5              76 %
12             91 %
20

Frame size vs. ID rate: frame sizes of 10–30 ms give good results; above 30 ms, results are bad.

Results The effect of the codebook size on the ID rate & VQ distortion.

Results Number of filter-banks Vs. ID rate & VQ distortion.

Results
The performance of the system on different test shot lengths:

Test speech length   ID Rate
0.2 sec              60 %
2 sec                85 %
6 sec                90 %
10 sec               95 %

Summary
- We studied the effect of changing parameters of the MFCC algorithm and the VQ algorithm.
- Our system identifies the speaker regardless of language and text.
- Results are satisfactory when the training and testing environments are the same and the test data is on the order of ten seconds long.

Thank You