Acoustic Modeling
Jacob Zurasky
ECE5526 – Spring 2011
Project Goals
- Generate an acoustic model for digit recognition
- Create a MATLAB tool to transcribe the training data used in model generation
- Create a MATLAB tool to generate an acoustic model from the training data and transcriptions
What is an Acoustic Model?
- A statistical representation of the sounds that make up a word
- Built from feature vectors extracted from the training data
Feature Vectors
- Raw speech samples cannot be compared directly to identify their characteristics
- Instead, each window of speech (~10 ms) is compressed into a 39-element MFCC feature vector
MFCCs
- Mel Frequency Cepstral Coefficients
- Model the sound samples in a way similar to how the human ear perceives sound
- The utterance is broken into windows of ~10 ms
- A Hamming window is applied to each section (see the sketch below)
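To make the framing and windowing step concrete, here is a minimal MATLAB sketch, not the project's actual code: it splits a signal into ~10 ms frames and applies a Hamming window computed directly so that no toolbox is needed. The sample rate and the dummy signal are assumptions.

% Minimal sketch (assumed values, not the project's code): split a speech
% signal into ~10 ms frames and apply a Hamming window to each frame.
fs     = 8000;                      % assumed sample rate (Hz)
speech = randn(fs, 1);              % placeholder: 1 s of dummy audio

frame_len = round(0.010 * fs);      % ~10 ms frame
n = (0:frame_len-1)';
w = 0.54 - 0.46 * cos(2*pi*n/(frame_len-1));   % Hamming window coefficients

num_frames = floor(length(speech) / frame_len);
frames = zeros(frame_len, num_frames);
for k = 1:num_frames
    seg = speech((k-1)*frame_len + (1:frame_len));
    frames(:, k) = seg(:) .* w;     % windowed frame, ready for the FFT stage
end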
MFCCs
- FFT to obtain the log frequency spectrum
- Mel filters applied to the log FFT output
- DCT applied to the Mel filter output
- Result is 12 MFCCs
- The energy of the signal is also computed (the full chain is sketched below)
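The remaining stages can be sketched in MATLAB as follows. This is a hedged illustration of a standard MFCC pipeline for a single frame, not the project's exact implementation; the FFT size, the number of Mel filters, and the placeholder frame are assumptions, and the DCT is written out so no toolbox is required.

% Hedged sketch of the MFCC stages for one windowed frame:
% FFT -> Mel filterbank -> log -> DCT -> 12 coefficients, plus frame energy.
fs      = 8000;                          % assumed sample rate (Hz)
frame   = randn(80, 1);                  % placeholder windowed ~10 ms frame
nfft    = 256;                           % assumed FFT size
num_mel = 26;                            % assumed number of Mel filters
num_cep = 12;                            % 12 cepstral coefficients, as on the slide

% Triangular Mel filterbank
mel  = @(f) 2595 * log10(1 + f/700);
imel = @(m) 700 * (10.^(m/2595) - 1);
fpts = imel(linspace(mel(0), mel(fs/2), num_mel + 2));
bins = floor((nfft + 1) * fpts / fs) + 1;
H = zeros(num_mel, floor(nfft/2) + 1);
for m = 1:num_mel
    for k = bins(m):bins(m+1)-1          % rising edge of filter m
        H(m, k) = (k - bins(m)) / (bins(m+1) - bins(m));
    end
    for k = bins(m+1):bins(m+2)-1        % falling edge of filter m
        H(m, k) = (bins(m+2) - k) / (bins(m+2) - bins(m+1));
    end
end

% Per-frame MFCCs
spec  = abs(fft(frame, nfft)).^2;        % power spectrum of the frame
spec  = spec(1:floor(nfft/2) + 1);       % keep the non-negative frequencies
fbank = log(H * spec + eps);             % log Mel filter energies
c = zeros(num_cep, 1);
for i = 1:num_cep                        % DCT-II written out (no toolbox needed)
    c(i) = sum(fbank .* cos(pi * i / num_mel * ((1:num_mel)' - 0.5)));
end
energy   = log(sum(frame.^2) + eps);     % log energy of the frame
mfcc_vec = [c; energy];                  % 12 MFCCs plus energy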
Training Data – 'mark_words.m'
- Obtain recordings containing the words you wish to build an acoustic model for
- Create a transcription file for each utterance (an assumed example layout is shown below)
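The slides do not show the transcription file format, so the sketch below only illustrates one plausible per-utterance layout (label, start sample, end sample); the file name, labels, and sample indices are all hypothetical.

% Hypothetical example of the kind of per-utterance transcription a tool
% like mark_words.m could produce; the actual format is an assumption.
labels = {'one', 'sil', 'three'};        % example segment labels (hypothetical)
starts = [ 800, 5200,  9600];            % segment start samples (hypothetical)
stops  = [5100, 9500, 14400];            % segment end samples (hypothetical)

fid = fopen('utt01.trans', 'w');         % hypothetical file name
for i = 1:numel(labels)
    fprintf(fid, '%s %d %d\n', labels{i}, starts(i), stops(i));
end
fclose(fid);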
Training Data – 'model.m'
- For each part of a word to be modeled, collect all of the associated feature vectors
- Repeat for each training utterance
- Calculate the mean and variance of each dimension of the feature vector for that part of the word (see the sketch below)
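A minimal sketch of the per-dimension statistics described above, assuming the collected feature vectors are stacked as rows of a matrix; the placeholder data and the variance floor are assumptions, not part of the project.

% Gather all 39-dimensional feature vectors assigned to one part (state)
% of a word, then take per-dimension means and variances.
vectors = randn(200, 39);                % placeholder training features (rows = frames)

mu     = mean(vectors, 1);               % 1x39 mean per dimension
sigma2 = var(vectors, 0, 1);             % 1x39 variance per dimension
sigma2 = max(sigma2, 1e-6);              % floor variances to avoid zeros (assumed safeguard)

% The acoustic model for this word part is the pair (mu, sigma2); the full
% model is one such pair for every part of every digit.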
Classification – 'evaluate.m'
- Implements classification of sounds
- Compares the current feature vector to the acoustic model
- Determines whether the probability is high enough to transition between states (illustrated below)
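One way to realize this comparison is a diagonal-Gaussian log-likelihood score for the current and next word parts, transitioning when the next part scores higher. The scoring rule, the transition criterion, and the placeholder model values below are assumptions, not necessarily what evaluate.m does.

% Hedged sketch: score the current feature vector against the model of the
% current word part and of the next part, then decide whether to transition.
x       = randn(1, 39);                              % current feature vector (placeholder)
mu_cur  = zeros(1, 39);    s2_cur  = ones(1, 39);    % current-state model (placeholder)
mu_next = 0.5*ones(1, 39); s2_next = ones(1, 39);    % next-state model (placeholder)

% Log-likelihood under a diagonal Gaussian
loglik = @(v, mu, s2) -0.5 * sum(log(2*pi*s2) + ((v - mu).^2) ./ s2);

score_cur  = loglik(x, mu_cur,  s2_cur);
score_next = loglik(x, mu_next, s2_next);

if score_next > score_cur                            % assumed transition criterion
    fprintf('transition to the next part of the word\n');
else
    fprintf('stay in the current part\n');
end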
Issues Encountered
- Small training data set
- Some digits need more than two sections
- Time constraints limited how many transcriptions could be made
- Models for 1, 3, 5, and 7 are fairly reliable
- Models for 2, 4, and 6 need refinement
Future Goals
- Implement a true HMM system for classification
- Add automated training and re-evaluation of the HMM
- Apply these techniques to the analysis of breathing sounds for sleep disorder recognition