Presentation for EEL6586 Automatic Speech Processing: Speaker Verification. Seth McNeill, 18 April 2003. Hello, I am Seth McNeill, and my project is speaker verification.
Outline of things to come: 1) Who am I? 2) Verification vs. ID 3) Features 4) Modeling 5) My Project. Here is an outline of the things to come in this presentation. First, who am I? Second, speaker verification vs. speaker ID. Third, what features should I use? Fourth, how do I model the data? Lastly, my project. 2/12
Who am I? First Year Graduate Student First Year at UF From SE Washington State 3/12
Note three things: there are no trees; there are hills (this hill is taller from top to bottom than the highest point in Florida); and there is no sign of rain, because it is dry.
Speaker Verification vs. Speaker ID. Verification: are you an imposter? ID: which of N speakers are you? Speaker verification asks the question, are you who you say you are? Speaker ID asks, which of all the people I know are you? 4/12
Features: Mel-Cepstrum, Delta Cepstrum, Cepstral Mean Subtraction (96%); Glottal Flow Derivative (95%); Liljencrants-Fant (LF) model (74%); Sub-Cepstrum. Note that the first feature set used in speaker verification is the same as in speech recognition; in fact, speaker ID is very similar to speaker-dependent speech recognition. Excitation features can also be used for speaker ID: the glottal flow derivative gives 95% accuracy, and the Liljencrants-Fant model gives 74% accuracy. One feature extraction method mentioned in the book that I had not seen before is the sub-cepstrum. It is a time-domain method of getting mel-cepstrum features: you convolve the time-domain impulse response of each of the mel filters with your signal to get the coefficients. 5/12
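The standard (frequency-domain) mel-cepstrum pipeline mentioned above can be sketched as follows. This is a minimal, illustrative Python/NumPy version, not the project's C++ code; the filter count, number of coefficients, and FFT framing here are assumptions for illustration only.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_cepstrum(frame, fs, n_filters=20, n_ceps=12):
    """Mel-cepstral coefficients of one windowed frame:
    power spectrum -> mel filterbank energies -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    energies = mel_filterbank(n_filters, n_fft, fs) @ spec
    log_e = np.log(energies + 1e-10)                # avoid log(0)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                 / (2 * n_filters))
    return dct @ log_e
```

The sub-cepstrum variant described in the notes would instead convolve each mel filter's time-domain impulse response with the signal, avoiding the FFT step shown here.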
Gaussian Mixture Model (GMM): Loses Temporal Data; Person vs. "Background"; Person vs. Threshold. The Gaussian mixture model is used for speaker verification. This loses the temporal data, but that makes sense, because what you want to see is which parts of the feature space each speaker occupies. There are two methods of testing the model. You can compare the test data against both the person's model and a "background" model; the background model is made from many people who are not the person you are testing against, so this requires lots of data. The other method is simply to use a threshold: if the likelihood that the test data came from the model is greater than the threshold, the person is not an imposter. 6/12
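The two decision rules above (person vs. background, person vs. threshold) can be sketched as a likelihood test. This is an illustrative Python/NumPy sketch assuming diagonal-covariance GMMs already trained by EM; the function names and the zero default threshold are my assumptions, not part of the project.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of feature frames X under a
    diagonal-covariance GMM with component weights, means, variances."""
    ll = np.full(len(X), -np.inf)
    for w, mu, s2 in zip(weights, means, variances):
        comp = (np.log(w)
                - 0.5 * np.sum(np.log(2 * np.pi * s2))
                - 0.5 * np.sum((X - mu) ** 2 / s2, axis=1))
        ll = np.logaddexp(ll, comp)   # log-sum over mixture components
    return ll.mean()

def verify(X, claimant, background=None, threshold=0.0):
    """Accept the claimed identity if the claimant model explains the
    data better than the background model (likelihood ratio), or, with
    no background model, better than a fixed threshold."""
    score = gmm_loglik(X, *claimant)
    if background is not None:
        score -= gmm_loglik(X, *background)
    return score > threshold
```

With a background model the score becomes a log-likelihood ratio, which is less sensitive to recording conditions than a raw threshold; that is why the background method needs the extra training data mentioned above.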
GMM (continued) Here is a video to show what training the GMM is like using an expectation maximization algorithm. 7/12
My Project: C++ Implementation; Energy-Based Endpoint Detection; Mel-Cepstrum and Delta Cepstrum Coefficients; Single Window with Nearest Neighbor; Multiple Windows with GMM. I did a C++ implementation because I wanted my project to work on any Windows computer without having to buy Matlab. My project uses energy-based endpoint detection; as we know, this is not the best method, but it is quick and easy. I have chosen to use the mel-cepstrum and delta cepstrum coefficients. Depending on time, I will either use a single window (remember, temporal data doesn't matter) with nearest neighbor, or I will use multiple windows with a GMM. A GMM takes more data to train, so using multiple windows helps. 8/12
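The single-window nearest-neighbor option can be sketched in a few lines. This is an illustrative Python/NumPy sketch, not the project's C++ code; the Euclidean distance metric and the distance threshold are assumptions.

```python
import numpy as np

def nearest_neighbor_verify(test_vec, enrolled_vecs, threshold):
    """Single-window nearest-neighbor check: accept the claimed speaker
    if the test feature vector (e.g. mel-cepstral coefficients) lies
    within `threshold` Euclidean distance of any enrollment vector."""
    dists = np.linalg.norm(np.asarray(enrolled_vecs) - test_vec, axis=1)
    return dists.min() <= threshold
```

Because only one window is compared, no temporal ordering is used, which matches the note above that temporal data does not matter for this task.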
Current Progress: Data Capture from Sound Card; Endpoint Detection. Currently I have the data capture from the sound card working and endpoint detection running. 9/12
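The energy-based endpoint detection mentioned above can be sketched as a short-time-energy threshold on the captured samples. This is an illustrative Python/NumPy sketch, not the project's C++ implementation; the frame length and the energy-ratio threshold are assumptions.

```python
import numpy as np

def detect_endpoints(signal, frame_len=256, energy_ratio=0.1):
    """Energy-based endpoint detection: split the signal into frames,
    compute each frame's short-time energy, and mark the first and last
    frames whose energy exceeds a fraction of the peak frame energy.
    Returns (start_sample, end_sample), or None if no speech is found."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    energy = np.sum(frames.astype(float) ** 2, axis=1)
    active = np.where(energy > energy_ratio * energy.max())[0]
    if active.size == 0:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

As the notes say, pure energy thresholding is not the best detector (it can clip weak fricatives and is fooled by noise bursts), but it is quick and easy.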
Demo 10/12
Future Progress: Feature Extraction; Modeling; Motion Detection; Text-to-Speech; Visual Verification. Future things to do are: feature extraction (I think I finally found a good way to do that) and modeling, using either nearest neighbor or a GMM. I think I have software that will help make it easier. 11/12
Speaker Verification: Questions or Comments? Any questions or comments? 12/12
Another GMM Video