Published by Imogen Sharp. Modified over 9 years ago.
1
Speech Processing Using HTK
Trevor Bowden
12/08/2008
2
Outline
- Concept of Project
- HTK Feature Extraction Capabilities
- Details of Feature Extraction Script
- Future Development
3
Concept of Project
- Explore HTK Feature Extraction Capabilities
  - Feature Output Types
  - Additional Feature Parameters
- Ideal Solution
  - Derive Any Feature Type from Any Corpus
4
HTK Feature Extraction Models
[Diagram: processing blocks labeled Hamming Window, FFT(), Log(), Linear Prediction Analysis, and Cepstral Analysis]
5
HTK Feature Extraction Capabilities
- Feature Extraction Methods
  - Linear Prediction Analysis
  - Cepstral Analysis
  - Mel-Scaling
  - Perceptual Linear Prediction Analysis
- Additional Feature Information
  - Signal Energy
  - Derivative Information
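In HTK these choices are normally expressed through the TARGETKIND configuration parameter: a base kind such as LPC, LPCEPSTRA, MFCC, or PLP, followed by qualifiers such as _E (energy), _D (delta), _A (acceleration), and _T (third differential). The sketch below shows one way to assemble such a string and a small configuration file. The helper names `target_kind` and `config_text` are hypothetical, the dependency rules between qualifiers are an assumption of this sketch, and the parameter values are illustrative rather than taken from the presentation.

```python
# Base parameter kinds used in this sketch (a subset of what HTK supports).
BASE_KINDS = {"LPC", "LPCEPSTRA", "MFCC", "PLP", "FBANK", "MELSPEC"}


def target_kind(base, energy=False, delta=False, accel=False, third=False):
    """Build a TARGETKIND string such as 'MFCC_E_D_A' (hypothetical helper)."""
    if base not in BASE_KINDS:
        raise ValueError(f"unknown base kind: {base}")
    # Assumption: each higher-order derivative needs the one below it.
    if accel and not delta:
        raise ValueError("_A requires _D")
    if third and not accel:
        raise ValueError("_T requires _A")
    kind = base
    if energy:
        kind += "_E"
    if delta:
        kind += "_D"
    if accel:
        kind += "_A"
    if third:
        kind += "_T"
    return kind


def config_text(kind):
    """Return the text of a minimal, illustrative HTK configuration file."""
    lines = [
        "SOURCEFORMAT = WAV",         # RIFF WAV input
        f"TARGETKIND = {kind}",       # feature type plus qualifiers
        "TARGETRATE = 100000.0",      # frame period in 100 ns units (10 ms)
        "WINDOWSIZE = 250000.0",      # window length in 100 ns units (25 ms)
        "USEHAMMING = T",             # apply a Hamming window to each frame
    ]
    return "\n".join(lines) + "\n"
```

For example, `config_text(target_kind("MFCC", energy=True, delta=True, accel=True))` produces a configuration requesting MFCCs with energy, delta, and acceleration coefficients.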
6
Linear Prediction Analysis
- Vocal Tract Transfer Function
- Transfer Function Coefficients Solution
- Autocorrelation Matrices
- Autocorrelation of Speech
- Amplitude of Model
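The slide's equations did not survive extraction, so the following is only a minimal sketch of the standard autocorrelation method it appears to outline: compute the autocorrelation of a speech frame, then solve the resulting Toeplitz system for the all-pole filter coefficients with the Levinson-Durbin recursion. The function names are hypothetical, and the sign convention (the frame is predicted as s[n] ≈ -Σ a[k] s[n-k]) is an assumption of this sketch, not necessarily the one used in the presentation or in HTK.

```python
import numpy as np


def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of one (windowed) frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r):
    """Solve the autocorrelation normal equations.

    Returns (a, err): a[0] == 1 and a[1:] are the predictor coefficients of
    the all-pole model 1 / A(z); err is the residual (prediction error) energy.
    """
    order = len(r) - 1
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current predictor's error and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # error energy shrinks each step
    return a, err


def model_gain(err):
    """Amplitude (gain) of the all-pole model from the residual energy."""
    return float(np.sqrt(err))
```

A typical use would be `a, err = levinson_durbin(autocorrelation(frame, 12))` on a Hamming-windowed frame, with the order chosen by the user.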
7
Cepstral Analysis
- Logarithmic Spectral Domain (Cepstral Domain)
- Allows for Separation of Convolved Signals
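To illustrate why the logarithmic spectral domain separates convolved signals: convolution in time becomes multiplication of spectra, and the logarithm turns that product into a sum. The sketch below computes a real cepstrum with NumPy and uses circular convolution so the additivity holds exactly up to rounding. The function names and the small floor added before the logarithm are choices made for this sketch, not part of the presentation.

```python
import numpy as np


def real_cepstrum(signal):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(signal)
    # Small floor avoids log(0) when a spectral bin is exactly zero.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_magnitude).real


def circular_convolve(a, b):
    """Circular convolution of two equal-length signals via the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
```

If `y = circular_convolve(e, h)`, then `real_cepstrum(y)` is approximately `real_cepstrum(e) + real_cepstrum(h)`, so an excitation and a filter that were convolved in time appear as a simple sum in the cepstral domain.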
8
Mel-Scaling
Human perception of pitch is non-linear: pitches that listeners hear as equally spaced are not equally spaced in frequency. Mel-scaling warps the frequency axis to match this perceptual scale.
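As a concrete illustration, a commonly used mel mapping, and the one given in the HTK documentation, is Mel(f) = 2595 log10(1 + f/700). The sketch below implements it and its inverse; the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this mapping a 1000 Hz step covers fewer mels at high frequencies than at low ones, which is the compression the slide describes.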
9
Perceptual Linear Prediction Analysis
Perceptual linear prediction combines linear prediction and cepstral analysis. The spectrum of the speech data is first warped onto the mel scale. The result is then compressed by taking its cube root, and linear prediction coefficients are computed from the compressed spectrum. Finally, these coefficients are converted to cepstral coefficients.
10
Signal Energy and Derivatives
- Signal Energy
- Delta Coefficients
- Acceleration Coefficients
- Third Differential Coefficients
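The derivative features listed above can be sketched with the regression formula HTK documents for delta coefficients, d_t = Σ θ (c_{t+θ} - c_{t-θ}) / (2 Σ θ²) for θ = 1..Θ, with the first and last frames repeated at the edges. The function below is a minimal sketch under that assumption; the name `delta` and the default window of 2 are choices made here. Acceleration and third differential coefficients follow by applying the same function again to its own output.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based delta coefficients.

    features: array of shape (num_frames, num_coefficients).
    window:   regression half-width (Theta in the formula).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.concatenate(
        [
            np.repeat(features[:1], window, axis=0),
            features,
            np.repeat(features[-1:], window, axis=0),
        ],
        axis=0,
    )
    denom = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    out = np.zeros_like(features)
    for theta in range(1, window + 1):
        ahead = padded[window + theta: window + theta + num_frames]
        behind = padded[window - theta: window - theta + num_frames]
        out += theta * (ahead - behind)
    return out / denom
```

For example, `accel = delta(delta(static))` gives acceleration coefficients, and one more application gives the third differential.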
11
Speech Processing of the AMI Corpus
- Ideal Solution
  - Yields Generic Feature Types from Generic Corpus
- Corpora Have Varying Audio File Types and Varying Organizational Structures
- Corpora Have Varying Methods for Annotation
12
Speech Processing of the AMI Corpus
- Project Solution
  - Yields Generic Feature Types from Corpora with RIFF Format WAV Audio Files
- Two Main Functions of Script
  - Traverse Corpus Directory Tree
    - Generate List of Audio Files
  - Produce Feature Data Using User-Defined Configuration File
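The presentation's script itself is not shown, so the following is only a sketch of how its two functions might look: walk the corpus directory tree collecting WAV files, write an HTK script file with one "source target" pair per line, and pass it to HCopy together with the user's configuration file. All function names, the `.feat` output extension, and the flat output directory are assumptions of this sketch. It also assumes file stems are unique across the corpus and that paths contain no spaces.

```python
import os
import subprocess


def find_wav_files(corpus_root):
    """Walk the corpus tree and return a sorted list of WAV file paths."""
    wav_paths = []
    for dirpath, _dirnames, filenames in os.walk(corpus_root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                wav_paths.append(os.path.join(dirpath, name))
    return sorted(wav_paths)


def write_hcopy_script(wav_paths, output_dir, scp_path, extension=".feat"):
    """Write an HCopy script file: one 'source target' pair per line."""
    os.makedirs(output_dir, exist_ok=True)
    with open(scp_path, "w") as scp:
        for wav in wav_paths:
            stem = os.path.splitext(os.path.basename(wav))[0]
            target = os.path.join(output_dir, stem + extension)
            scp.write(f"{wav} {target}\n")


def run_hcopy(config_path, scp_path):
    """Run HCopy with a user-defined config (-C) and a script file (-S)."""
    subprocess.run(["HCopy", "-C", config_path, "-S", scp_path], check=True)
```

A run would then be `write_hcopy_script(find_wav_files(root), out_dir, "files.scp")` followed by `run_hcopy("config", "files.scp")`, which requires the HTK tools to be installed and on the PATH.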
13
Future Development
- Expand Script to Handle Audio Inputs of Any File Type
- Include Processing for Specific Corpus Annotations