Speech Processing Using HTK
Trevor Bowden, 12/08/2008

Outline
- Concept of Project
- HTK Feature Extraction Capabilities
- Details of Feature Extraction Script
- Future Development

Concept of Project
- Explore HTK Feature Extraction Capabilities
  - Feature Output Types
  - Additional Feature Parameters
- Ideal Solution
  - Derive Any Feature Type from Any Corpus

HTK Feature Extraction Models
[Block diagram: the slide showed the analysis blocks Hamming Window, FFT(), Log(), Linear Prediction Analysis and Cepstral Analysis arranged as the HTK feature extraction chains.]

HTK Feature Extraction Capabilities
- Feature Extraction Methods
  - Linear Prediction Analysis
  - Cepstral Analysis
  - Mel-Scaling
  - Perceptual Linear Prediction Analysis
- Additional Feature Information
  - Signal Energy
  - Derivative Information
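In HTK, each of these analysis methods is selected through the TARGETKIND parameter of the configuration file passed to HCopy, with qualifiers such as _E, _D and _A appending energy and derivative information. A minimal sketch of such a configuration follows; the frame and filterbank settings shown are illustrative defaults, not values taken from this project:

    # Sketch of an HCopy configuration: RIFF WAV input to MFCCs with
    # energy, delta and acceleration coefficients appended.
    SOURCEFORMAT = WAV         # RIFF-format WAV audio input
    SOURCEKIND   = WAVEFORM
    TARGETKIND   = MFCC_E_D_A  # e.g. LPC, LPCEPSTRA, PLP or MFCC plus qualifiers
    TARGETRATE   = 100000.0    # 10 ms frame shift (HTK time units of 100 ns)
    WINDOWSIZE   = 250000.0    # 25 ms analysis window
    USEHAMMING   = T           # apply a Hamming window to each frame
    NUMCHANS     = 26          # mel filterbank channels
    NUMCEPS      = 12          # cepstral coefficients per frame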

Linear Prediction Analysis
- Vocal Tract Transfer Function
- Transfer Function Coefficients Solution
- Autocorrelation Matrices
- Autocorrelation of Speech
- Amplitude of Model
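The slide equations follow the standard autocorrelation formulation of linear prediction; as a reconstruction (not a copy of the original slide images), the all-pole vocal tract transfer function, the predictor, and the normal equations are:

\[
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}},
\qquad
\hat{s}[n] = \sum_{k=1}^{p} a_k\, s[n-k]
\]
\[
\sum_{k=1}^{p} a_k\, R(|i-k|) = R(i), \quad i = 1,\dots,p,
\qquad
R(i) = \sum_{n} s[n]\, s[n-i]
\]

The coefficients a_k are found by solving this Toeplitz autocorrelation system (typically with the Levinson-Durbin recursion), and the magnitude response |H(e^{j\omega})| gives the amplitude of the model, i.e. the spectral envelope of the speech frame.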

Cepstral Analysis
- Logarithmic Spectral Domain (Cepstral Domain)
- Allows for Separation of Convolved Signals
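For reference (a standard definition rather than material from the original slides), the real cepstrum is the inverse Fourier transform of the log magnitude spectrum, and convolution in the time domain becomes addition in the cepstral domain:

\[
c[n] = \mathcal{F}^{-1}\{\log|\mathcal{F}\{s[n]\}|\}
\]
\[
s[n] = e[n] * h[n] \;\Rightarrow\; \log|S(\omega)| = \log|E(\omega)| + \log|H(\omega)| \;\Rightarrow\; c_s[n] = c_e[n] + c_h[n]
\]

This is what allows the excitation e[n] and the vocal tract response h[n] to be separated; in practice HTK obtains cepstra either from the log filterbank outputs via a DCT or from the linear prediction coefficients.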

Mel-Scaling
Human pitch perception is non-linear in frequency: pitches that listeners judge to be equally spaced correspond to equal steps on the mel scale rather than to equal steps in Hz. Mel-scaling warps the linear frequency axis onto this perceptual scale.
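HTK's filterbank analysis uses the standard mel warping of the frequency axis, which is roughly linear below 1 kHz and logarithmic above it:

\[
\mathrm{Mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)
\]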

Perceptual Linear Prediction Analysis
Perceptual linear prediction combines filterbank, linear prediction and cepstral analysis. The speech spectrum is first warped onto the mel scale with a filterbank, weighted for equal loudness, and compressed according to the cube-root intensity-loudness power law. Linear prediction coefficients are then fitted to this auditory spectrum, and cepstral coefficients are derived from them.
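In HTK, PLP features are requested in the same way as the other feature kinds; a sketch of the relevant HCopy configuration entries follows (the values are illustrative and not taken from the project's actual configuration file):

    # Sketch of HCopy settings for PLP-derived cepstra
    TARGETKIND   = PLP_0       # PLP cepstra plus the C0 energy term
    USEPOWER     = T           # use the power spectrum in the filterbank
    NUMCHANS     = 24          # mel filterbank channels
    LPCORDER     = 12          # order of the linear prediction fit
    COMPRESSFACT = 0.33        # cube-root intensity-loudness compression
    NUMCEPS      = 12          # cepstral coefficients derived from the LP fit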

Signal Energy and Derivatives
- Signal Energy
- Delta Coefficients
- Acceleration Coefficients
- Third Differential Coefficients
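These are appended through the TARGETKIND qualifiers _E (energy), _D (deltas), _A (accelerations) and _T (third differentials). HTK computes the delta coefficients by linear regression over a window of +/- Theta frames; the acceleration and third-differential coefficients are produced by applying the same regression to the deltas:

\[
d_t = \frac{\sum_{\theta=1}^{\Theta} \theta \,\bigl(c_{t+\theta} - c_{t-\theta}\bigr)}{2 \sum_{\theta=1}^{\Theta} \theta^2}
\]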

Speech Processing of the AMI Corpus
- Ideal Solution Yields Generic Feature Types from a Generic Corpus
- Corpora Have Varying Audio File Types and Varying Organizational Structures
- Corpora Have Varying Methods for Annotation

Speech Processing of the AMI Corpus
- Project Solution Yields Generic Feature Types from Corpora with RIFF-Format WAV Audio Files
- Two Main Functions of Script
  - Traverse Corpus Directory Tree and Generate List of Audio Files
  - Produce Feature Data Using a User-Defined Configuration File
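A minimal sketch of how such a script could be structured, assuming the standard HCopy options -C (configuration file) and -S (script file of source/target pairs); the paths and file names below are placeholders rather than the project's actual layout:

    # Hypothetical corpus-processing sketch: walk a corpus directory tree,
    # list every RIFF WAV file, and hand the list to HCopy for feature extraction.
    import os
    import subprocess

    CORPUS_DIR = "/path/to/corpus"   # placeholder corpus root
    CONFIG = "hcopy.conf"            # user-defined HTK configuration file
    SCP = "wavlist.scp"              # HTK script file: one "source target" pair per line

    # Function 1: traverse the corpus directory tree and generate the audio file list.
    with open(SCP, "w") as scp:
        for root, _dirs, files in os.walk(CORPUS_DIR):
            for name in files:
                if name.lower().endswith(".wav"):
                    wav = os.path.join(root, name)
                    scp.write(f"{wav} {os.path.splitext(wav)[0]}.mfc\n")

    # Function 2: produce feature data using the user-defined configuration file.
    subprocess.run(["HCopy", "-C", CONFIG, "-S", SCP], check=True)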

Future Development
- Expand Script to Handle Audio Inputs of Any File Type
- Include Processing for Specific Corpus Annotations