A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING CS 525 : Project Presentation PALDEN LAMA and MOUNIKA NAMBURU.

Slides:



Advertisements
Similar presentations
Improved ASR in noise using harmonic decomposition Introduction Pitch-Scaled Harmonic Filter Recognition Experiments Results Conclusion aperiodic contribution.
Advertisements

Building an ASR using HTK CS4706
Speech Processing AEGIS RET All-Hands Meeting University of Central Florida July 20, 2012 Applications of Images and Signals in High Schools.
Voiceprint System Development Design, implement, test unique voiceprint biometric system Research Day Presentation, May 3 rd 2013 Rahul Raj (Team Lead),
Masters Presentation at Griffith University Master of Computer and Information Engineering Magnus Nilsson
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
Speech Sound Production: Recognition Using Recurrent Neural Networks Abstract: In this paper I present a study of speech sound production and methods for.
Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors
CMSC Assignment 1 Audio signal processing
F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)
1 Speech Parametrisation Compact encoding of information in speech Accentuates important info –Attempts to eliminate irrelevant information Accentuates.
ASR Intro: Outline ASR Research History Difficulties and Dimensions Core Technology Components 21st century ASR Research (Next two lectures)
Neural Net Algorithms for SC Vowel Recognition Presentation for EE645 Neural Networks and Learning Algorithms Spring 2003 Diana Stojanovic.
4/25/2001ECE566 Philip Felber1 Speech Recognition A report of an Isolated Word experiment. By Philip Felber Illinois Institute of Technology April 25,
Signal Modeling for Robust Speech Recognition With Frequency Warping and Convex Optimization Yoon Kim March 8, 2000.
1 Lab Preparation Initial focus on Speaker Verification –Tools –Expertise –Good example “Biometric technologies are automated methods of verifying or recognising.
A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING CS 525 : Project Presentation PALDEN LAMA and MOUNIKA NAMBURU.
Real-Time Speech Recognition Thang Pham Advisor: Shane Cotter.
A PRESENTATION BY SHAMALEE DESHPANDE
Representing Acoustic Information
Audio Processing for Ubiquitous Computing Uichin Lee KAIST KSE.
Eng. Shady Yehia El-Mashad
Basics of Signal Processing. SIGNALSOURCE RECEIVER describe waves in terms of their significant features understand the way the waves originate effect.
1 CS 551/651: Structure of Spoken Language Lecture 8: Mathematical Descriptions of the Speech Signal John-Paul Hosom Fall 2008.
A brief overview of Speech Recognition and Spoken Language Processing Advanced NLP Guest Lecture August 31 Andrew Rosenberg.
Reporter: Shih-Hsiang( 士翔 ). Introduction Speech signal carries information from many sources –Not all information is relevant or important for speech.
Implementing a Speech Recognition System on a GPU using CUDA
Jacob Zurasky ECE5526 – Spring 2011
Robust Speech Feature Decorrelated and Liftered Filter-Bank Energies (DLFBE) Proposed by K.K. Paliwal, in EuroSpeech 99.
1 PATTERN COMPARISON TECHNIQUES Test Pattern:Reference Pattern:
Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )
Speech Recognition Feature Extraction. Speech recognition simplified block diagram Speech Capture Speech Capture Feature Extraction Feature Extraction.
Look who’s talking? Project 3.1 Yannick Thimister Han van Venrooij Bob Verlinden Project DKE Maastricht University.
Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technologyh, Japan) Seiichi Nakagawa (Toyohashi University.
A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING CS 525 : Project Presentation PALDEN LAMA and MOUNIKA NAMBURU.
Performance Comparison of Speaker and Emotion Recognition
EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 27,
Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and.
BY KALP SHAH Sentence Recognizer. Sphinx4 Sphinx4 is the best and versatile recognition system. Sphinx4 is a speech recognition system which is written.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
Chapter 7 Speech Recognition Framework  7.1 The main form and application of speech recognition  7.2 The main factors of speech recognition  7.3 The.
Noise Reduction in Speech Recognition Professor:Jian-Jiun Ding Student: Yung Chang 2011/05/06.
1 Electrical and Computer Engineering Binghamton University, State University of New York Electrical and Computer Engineering Binghamton University, State.
Speaker Verification System Middle Term Presentation Performed by: Barak Benita & Daniel Adler Instructor: Erez Sabag.
EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 20,
Speech Processing Dr. Veton Këpuska, FIT Jacob Zurasky, FIT.
Acoustic to Articoulatory Speech Inversion by Dynamic Time Warping
PATTERN COMPARISON TECHNIQUES
Speech Processing AEGIS RET All-Hands Meeting
Ch. 5: Speech Recognition
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
ARTIFICIAL NEURAL NETWORKS
Speech Processing AEGIS RET All-Hands Meeting
Spoken Digit Recognition
Cepstrum and MFCC Cepstrum MFCC Speech processing.
Sharat.S.Chikkerur S.Anand Mantravadi Rajeev.K.Srinivasan
Speech Recognition Christian Schulze
PROJECT PROPOSAL Shamalee Deshpande.
Mel-spectrum to Mel-cepstrum Computation A Speech Recognition presentation October Ji Gu
Isolated word, speaker independent speech recognition
Neuro-Fuzzy and Soft Computing for Speaker Recognition (語者辨識)
8-Speech Recognition Speech Recognition Concepts
Ala’a Spaih Abeer Abu-Hantash Directed by Dr.Allam Mousa
Digital Systems: Hardware Organization and Design
AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION
CS 188: Artificial Intelligence Spring 2006
Mark Hasegawa-Johnson 10/2/2018
Speech Signal Representations
Measuring the Similarity of Rhythmic Patterns
Keyword Spotting Dynamic Time Warping
Presentation transcript:

A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING CS 525 : Project Presentation PALDEN LAMA and MOUNIKA NAMBURU

G OALS Learn how it works ! Focus: Pre-Processing Dynamic Time Warping/Dynamic Programming Verify using MATLAB Build a simple Voice to Text Converter application.

H OW DOES IT WORK ? Record Extract a voice Feature Vectors Digitized Speech Signal (.wave file) Acoustic Preprocessing (DFT + MFCC) Speech Recognizer (Dynamic Time Warping)

S PEECH SIGNAL Voiced Excitation  fundamental frequency (Speaker dependent) Loudness  signal amplitude Vocal tract shape  spectral shaping (most important to recognize words) A time signal of vowel /a:/ (fs=11 kHz, length=100ms) time

ACOUSTIC PRE-PROCESSING DFT (Discrete Fourier Transform)  Spectral Coeff. Inverse DFT on log power spectrum  Cepstral Coeff. Makes it easier to extract spectral shaping of the speech signal. frequency Log power spectrum of vowel /a:/ (fs=11 kHz, N=512) Power spectrum of the vowel /a:/ after cepstral smoothing

MFCC (M EL FREQUENCY CEPSTRAL COEFFICIENTS ) Mel frequency scale reflects frequency resolution of human ear. Coeff. Of power spectrum  Mel Spectral Coeff. (FEATURE VECTOR)

RECOGNIZER One word spoken contains dozens of feature vectors. (preprocessing every 10 ms of signal) Compute a ”distance” between this unknown sequence of vectors (unknown word) and known sequence of vectors (prototypes of words to recognize) PROBLEM !! Unequal length of vector sequence

D YNAMIC TIME WARPING : F IND OPTIMAL ASSIGNMENT PATH

DTW : R ECOGNIZING CONNECTED WORDS

MATLAB FUNCTIONS PRE-PROCESSING recordMelMatrix(3) S = wavread(“speech.wav”) C = Melfiltermatrix(S, N, K) computeMelSpectrum( C,S); DISPLAY FEATURES Featuredisp.m WORD RECOGNITION dp_asym(vector1, vector2)

R ESULTS hellohello1

libraryhello

computerhello

3.0304e e e+00 3

Welcome home (male) Welcome home (female)

Welcome homeWelcome back

Welcome homeComputer Science

Welcome backComputer Science

2.6418e e e e+003

THANKS ! ANY QUESTIONS?