Speech Signal Processing I
By Edmilson Morais and Prof. Greg. Dogil
Stuttgart, October 18, 2001
Goals of the Course
Our part
- Basic theoretical concepts of Speech Signal Processing (SDSP)
- Waveform generation for TTS systems (TTS)
- Automatic Speech Recognition, statistical approach (ASR)
- Fundamentals of programming in Matlab, the tool used for our simulations
Your part
- Describe and justify the important aspects and drawbacks of the algorithms.
Next term: Speech Signal Processing II
- Going deeper into the theoretical and practical aspects of SSP, TTS and ASR.
Matlab Tutorial
Principles of linear algebra
- Vectors, matrices, linear systems
Programming in Matlab
- Variables, operators, ...
- if statements, switch statements, for loops, while loops, continue statements, break statements, ...
- I/O operations
- Graphical visualization
- Executable files
- Subroutines
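The tutorial topics above can be illustrated with a short script. This is only a sketch: the 2x2 system, the loop bounds and all variable names are chosen for illustration and are not taken from the course material.

```matlab
% Linear system A*x = b (values chosen only for illustration).
A = [2 1; 1 3];
b = [3; 5];
x = A \ b;                 % backslash solves the system without forming inv(A)

% for loop with if, continue and break: sum the odd numbers below 10.
total = 0;
for k = 1:20
    if k >= 10
        break;             % leave the loop once k reaches 10
    end
    if mod(k, 2) == 0
        continue;          % skip even numbers
    end
    total = total + k;     % accumulates 1 + 3 + 5 + 7 + 9
end

% while loop: halve a value until it drops below a threshold.
v = 1;
steps = 0;
while v >= 0.1
    v = v / 2;
    steps = steps + 1;
end

% switch statement on a string.
kind = 'fir';
switch kind
    case 'fir'
        label = 'finite impulse response';
    case 'iir'
        label = 'infinite impulse response';
    otherwise
        label = 'unknown';
end

% Simple formatted output.
fprintf('x = [%g %g], total = %d, steps = %d, %s\n', x(1), x(2), total, steps, label);
```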
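Matlab : Graphical visualization
    % Evaluate z = sin(r)/r on a grid; eps avoids 0/0 at the origin.
    [X,Y] = meshgrid(-8:.5:8);
    R = sqrt(X.^2 + Y.^2) + eps;
    Z = sin(R)./R;
    % Wireframe plot.
    mesh(X,Y,Z,'EdgeColor','black')
    % Shaded surface in a separate figure, lit from the left.
    figure
    surf(X,Y,Z,'FaceColor','red','EdgeColor','none');
    camlight left; lighting phong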
Matlab : Graphical visualization – Optimization on a hyperbolic (quadratic) surface
[Figure: mean squared error E plotted against the weight]
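As a rough illustration of optimization on a quadratic error surface, the sketch below runs plain gradient descent on the mean squared error of a one-weight linear model. The data, the true weight `wTrue`, the step size `mu` and the iteration count are assumptions made for this example, and gradient descent is one possible optimizer, not necessarily the one shown on the slide.

```matlab
% Synthetic data: d = wTrue * x, without noise (wTrue is an assumed value).
x = (-2:0.5:2)';
wTrue = 1.5;
d = wTrue * x;

% Mean squared error as a function of the scalar weight w (quadratic in w).
mse = @(w) mean((d - w * x).^2);

% Gradient descent: dE/dw = -2 * mean(x .* (d - w*x)).
w = 0;                  % initial weight
mu = 0.1;               % step size (assumed; must be small enough to converge)
nIter = 100;
E = zeros(nIter, 1);    % error after each update
for n = 1:nIter
    err = d - w * x;
    grad = -2 * mean(x .* err);
    w = w - mu * grad;
    E(n) = mse(w);
end

% Plot the error curve and mark the weight that was found.
wAxis = linspace(-1, 4, 101);
plot(wAxis, arrayfun(mse, wAxis), 'k-', w, mse(w), 'ro');
xlabel('Weight');
ylabel('Mean squared error E');
```

Because the surface is quadratic, it has a single minimum, so the iteration converges to `wTrue` for any sufficiently small step size.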
SDSP : Looking through time
Speech signal : analog and digital
[Figure: speech waveform, amplitude versus time, illustrating quantization and the sampling rate]
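The two steps that turn an analog signal into a digital one, sampling in time and quantization in amplitude, can be sketched as follows. The sampling rate, tone frequency and word length are assumed values, and a sine tone stands in for the speech signal.

```matlab
fs = 8000;                     % sampling rate in Hz (assumed)
f0 = 200;                      % tone frequency in Hz (assumed)
nBits = 8;                     % bits per sample (assumed)

% Sampling: evaluate the signal at the instants n / fs (100 ms in total).
t = (0:fs/10 - 1)' / fs;
x = 0.9 * sin(2*pi*f0*t);

% Uniform quantizer over [-1, 1) with 2^nBits levels.
step = 2 / 2^nBits;
xq = step * round(x / step);
xq = min(max(xq, -1), 1 - step);   % clip to the representable range

% Quantization noise and signal-to-noise ratio in dB.
q = x - xq;
snrDb = 10 * log10(sum(x.^2) / sum(q.^2));
fprintf('step = %g, SNR = %.1f dB\n', step, snrDb);
```

Each additional bit halves the step size, which raises the signal-to-noise ratio by about 6 dB.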
SDSP : Transformations and Digital filters
- Transformations: Z-transforms, Fourier transforms
- Digital filters: FIR, IIR
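The difference between the two filter classes shows up in their impulse responses. The sketch below uses two small example filters that are assumptions for illustration, a 3-point moving average (FIR) and a one-pole low-pass (IIR), and obtains a frequency response from the discrete Fourier transform.

```matlab
N = 16;
impulse = [1; zeros(N-1, 1)];

% FIR: 3-point moving average, H(z) = (1 + z^-1 + z^-2) / 3.
bFir = [1 1 1] / 3;
hFir = filter(bFir, 1, impulse);     % nonzero for 3 samples only

% IIR: one-pole low-pass, y[n] = 0.5*x[n] + 0.5*y[n-1],
% i.e. H(z) = 0.5 / (1 - 0.5 z^-1).
bIir = 0.5;
aIir = [1 -0.5];
hIir = filter(bIir, aIir, impulse);  % decays as 0.5^(n+1) but never reaches zero

% Frequency response of the FIR filter from the DFT of its impulse response.
HFir = fft(hFir);
dcGain = abs(HFir(1));               % gain at frequency zero
```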
SDSP – Frame based analysis
[Figure panels: Hanning window w; waveform multiplied by the Hanning window, xw; magnitude of the spectrum of xw; frequency response of the LP filter]
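The quantities named on this slide (w, xw, the magnitude spectrum of xw and the frequency response of the LP filter) can be computed for one frame roughly as follows. The frame is synthetic, an impulse train passed through a two-pole resonator, and the frame length, LP order and resonance are assumptions; the LP coefficients come from the autocorrelation method, which is one common choice and may differ from what the course uses.

```matlab
fs = 8000;                 % sampling rate in Hz (assumed)
N = 240;                   % 30 ms frame (assumed)
p = 10;                    % LP order (assumed)
n = (0:N-1)';

% Synthetic voiced-like frame: 100 Hz impulse train through a 500 Hz resonator.
exc = zeros(N, 1);
exc(1:80:N) = 1;
rho = 0.95;
theta = 2*pi*500/fs;
x = filter(1, [1 -2*rho*cos(theta) rho^2], exc);

% Hanning window w and windowed frame xw.
w = 0.5 - 0.5*cos(2*pi*n/(N-1));
xw = x .* w;

% Magnitude of the spectrum of xw.
nfft = 512;
Xmag = abs(fft(xw, nfft));
f = (0:nfft/2)' * fs / nfft;

% LP analysis by the autocorrelation method: solve R*a = r.
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(xw(1:N-k) .* xw(1+k:N));
end
a = toeplitz(r(1:p)) \ r(2:p+1);

% Frequency response of the LP filter 1/A(z), with A(z) = 1 - sum_k a(k) z^-k.
Az = fft([1; -a], nfft);
Hlp = 1 ./ abs(Az(1:nfft/2+1));

[~, iPeak] = max(Xmag(1:nfft/2+1));
[~, iLp] = max(Hlp);
fprintf('Spectral peak: %.0f Hz, LP envelope peak: %.0f Hz\n', f(iPeak), f(iLp));
```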
SDSP - Looking at frequency components through time
[Figure: current and previous frames, before and after smoothing]
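One common way to look at frequency components through time is a short-time Fourier transform: the signal is cut into overlapping windowed frames and each frame gets its own spectrum. The sketch below assumes a test signal that switches from 500 Hz to 2000 Hz halfway through; the frame length and shift are also assumed. It does not reproduce the smoothing shown on the slide.

```matlab
fs = 8000;                           % sampling rate in Hz (assumed)
n = (0:fs/2 - 1)';                   % 0.5 s of sample indices

% Test signal: 500 Hz in the first half, 2000 Hz in the second half.
half = length(n) / 2;
x = [sin(2*pi*500*n(1:half)/fs); sin(2*pi*2000*n(half+1:end)/fs)];

N = 256;                             % frame length (assumed)
hop = 128;                           % frame shift (assumed)
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/(N-1));

% One magnitude spectrum per frame, stored column by column.
nFrames = floor((length(x) - N) / hop) + 1;
S = zeros(N/2 + 1, nFrames);
for m = 1:nFrames
    seg = x((m-1)*hop + (1:N)') .* w;
    X = fft(seg);
    S(:, m) = abs(X(1:N/2 + 1));
end

% Frequency of the strongest component in each frame.
f = (0:N/2)' * fs / N;
[~, iMax] = max(S, [], 1);
fPeak = f(iMax);
```

Plotting `S` as an image, with time on one axis and frequency on the other, gives a spectrogram.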
SDSP : Vector quantization
Voronoi space : centroid and distortion measure
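A codebook can be trained by alternating the two ideas named above: assign each vector to its nearest centroid (which defines the Voronoi cells), then move each centroid to the mean of its cell. The sketch below is a minimal Lloyd (k-means) iteration with squared Euclidean distance as the distortion measure; the training vectors, the codebook size and the initialization are assumptions for illustration.

```matlab
% Training vectors (rows): two well-separated groups, chosen for illustration.
X = [0 0; 0 1; 1 0; 1 1; 9 9; 9 10; 10 9; 10 10];
K = 2;                               % codebook size (assumed)
C = X([1 5], :);                     % initial centroids: one vector from each group

for it = 1:20
    % Nearest-neighbour rule: squared Euclidean distance to each centroid.
    D = zeros(size(X, 1), K);
    for k = 1:K
        dk = X - repmat(C(k, :), size(X, 1), 1);
        D(:, k) = sum(dk.^2, 2);
    end
    [dmin, idx] = min(D, [], 2);     % idx(i) is the Voronoi cell of vector i

    % Centroid rule: each centroid becomes the mean of its cell.
    Cold = C;
    for k = 1:K
        if any(idx == k)
            C(k, :) = mean(X(idx == k, :), 1);
        end
    end
    if max(abs(C(:) - Cold(:))) < 1e-12
        break;                       % codebook stopped changing
    end
end

% Average distortion: mean squared distance to the assigned centroid.
distortion = mean(dmin);
```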
TTS - Waveform generation for TTS
Analysis and Resynthesis – Coding and Decoding
[Figure: block diagram of LP-based analysis and resynthesis]
- Parametrization: mapping the waveform into a set of parameters
- Reconstruction: synthesis of the waveform from the set of parameters
- Prosody: F0, duration, amplitude
Parameters
- A: LP coefficients
- e: LP residue
- En: prototypes
- F0: fundamental frequency
- U/UV: voiced / unvoiced transitions
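The analysis and resynthesis idea can be sketched with the two parameters A and e alone: inverse filtering with A(z) yields the LP residue, and filtering the residue with 1/A(z) rebuilds the waveform. This is a simplified illustration on a synthetic signal, with an assumed LP order and no prosody modification; it is not the full system in the diagram.

```matlab
fs = 8000;                 % sampling rate in Hz (assumed)
N = 400;                   % number of samples (assumed)
p = 10;                    % LP order (assumed)

% Synthetic voiced-like signal: 100 Hz impulse train through a 500 Hz resonator.
exc = zeros(N, 1);
exc(1:80:N) = 1;
rho = 0.95;
theta = 2*pi*500/fs;
x = filter(1, [1 -2*rho*cos(theta) rho^2], exc);

% Parametrization, part 1: LP coefficients by the autocorrelation method.
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));
end
a = toeplitz(r(1:p)) \ r(2:p+1);
Az = [1; -a];              % A(z) = 1 - sum_k a(k) z^-k

% Parametrization, part 2: LP residue e from the inverse filter A(z).
e = filter(Az, 1, x);

% Reconstruction: synthesis filter 1/A(z) driven by the residue.
y = filter(1, Az, e);

reconErr = max(abs(x - y));
fprintf('Residue energy / signal energy = %.4f, max error = %.2e\n', ...
        sum(e.^2) / sum(x.^2), reconErr);
```

With an unmodified residue the reconstruction is exact up to rounding; a TTS system would alter the parameters (for example F0 or duration) before resynthesis.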
TTS - Waveform generation for TTS
Speech coding
- Parametric coders, waveform coders, hybrid coders
TTS – Concatenative approach
- Time scale and frequency scale modifications
- Spectral smoothing
- Unit selection
[Examples: Original, TTS; Original, Resynthesized, Modified : sin(x+)]
ASR - Automatic Speech Recognition
Front-end signal processing
- Feature extraction: perceptual domain, articulatory domain
Acoustic modeling
- HMM: Hidden Markov Model
- ANN/HMM: hybrid models combining Artificial Neural Networks and HMMs
Statistical language modeling
- N-grams, smoothing techniques
Search (decoding)
- Viterbi, stack decoding, ...
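For the language modeling item, a bigram model with add-one (Laplace) smoothing is about the simplest possible example. The toy corpus below is an assumption, with words represented by integer indices, and add-one is only one of many smoothing techniques.

```matlab
% Toy corpus as word indices (hypothetical vocabulary of V = 3 words).
V = 3;
corpus = [1 2 3 1 2 1 3 2 1 2];

% Bigram counts: counts(i, j) = number of times word j follows word i.
prevWord = corpus(1:end-1)';
nextWord = corpus(2:end)';
counts = accumarray([prevWord nextWord], 1, [V V]);

% Maximum-likelihood estimate: unseen bigrams get probability zero.
rowTotals = sum(counts, 2);
pMl = counts ./ repmat(rowTotals, 1, V);

% Add-one smoothing: every bigram receives a pseudo-count of 1.
pSmooth = (counts + 1) ./ repmat(rowTotals + V, 1, V);

disp(pSmooth);
```

After smoothing, every row is still a probability distribution, but no bigram has probability zero, so an unseen word pair no longer rules out a whole sentence during decoding.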
ASR – HMM - Topology
Ergodic model
Left-right model
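The two topologies differ only in which entries of the transition matrix may be nonzero. The sketch below builds one example of each; the number of states and the probabilities are assumed, and the left-right example is further restricted to self-loops and next-state transitions (no skips).

```matlab
N = 4;                                  % number of states (assumed)

% Ergodic model: every state can be reached from every state in one step.
Aergodic = ones(N) / N;

% Left-right model: no transitions back to earlier states.
pStay = 0.6;                            % self-loop probability (assumed)
Alr = zeros(N);
for i = 1:N-1
    Alr(i, i) = pStay;                  % stay in state i
    Alr(i, i+1) = 1 - pStay;            % advance to state i+1
end
Alr(N, N) = 1;                          % the last state can only repeat

% Left-right models conventionally start in the first state.
priorLr = [1 zeros(1, N-1)];
```

The left-right structure suits speech because the sounds of a word occur in a fixed order, so the transition matrix is upper triangular.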
ASR – HMM – Basic principle
ASR – HMM - Viterbi alignment
[Figure: Viterbi alignment example]
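The Viterbi algorithm finds the single most likely state sequence for an observation sequence. The sketch below uses a hypothetical discrete HMM with two states and three symbols; all probabilities are illustrative. It works with plain probabilities for readability, whereas practical recognizers usually work with log probabilities to avoid underflow.

```matlab
% Hypothetical 2-state, 3-symbol HMM (all values are illustrative).
prior = [0.6 0.4];                  % initial state probabilities
A = [0.7 0.3; 0.4 0.6];             % A(i,j) = P(state j at t+1 | state i at t)
B = [0.5 0.4 0.1; 0.1 0.3 0.6];     % B(i,k) = P(symbol k | state i)
obs = [1 2 3];                      % observed symbol indices

N = size(A, 1);
T = length(obs);
delta = zeros(T, N);                % best path probability ending in each state
psi = zeros(T, N);                  % back-pointers to the best predecessor

% Initialization.
delta(1, :) = prior .* B(:, obs(1))';

% Recursion: keep only the best predecessor for each state.
for t = 2:T
    for j = 1:N
        [best, arg] = max(delta(t-1, :) .* A(:, j)');
        delta(t, j) = best * B(j, obs(t));
        psi(t, j) = arg;
    end
end

% Termination and backtracking.
statePath = zeros(1, T);
[pBest, statePath(T)] = max(delta(T, :));
for t = T-1:-1:1
    statePath(t) = psi(t+1, statePath(t+1));
end

fprintf('Best path: %s, probability %.5f\n', mat2str(statePath), pBest);
```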
ASR – HMM – Forward-Backward
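The forward and backward recursions sum over all state sequences instead of keeping only the best one. The sketch below runs both on a hypothetical discrete HMM with two states and three symbols (all values illustrative) and combines them into state posteriors, the quantities used when re-estimating the model. Scaling, which longer sequences need to avoid underflow, is omitted.

```matlab
% Hypothetical 2-state, 3-symbol HMM (all values are illustrative).
prior = [0.6 0.4];                  % initial state probabilities
A = [0.7 0.3; 0.4 0.6];             % A(i,j) = P(state j at t+1 | state i at t)
B = [0.5 0.4 0.1; 0.1 0.3 0.6];     % B(i,k) = P(symbol k | state i)
obs = [1 2 3];                      % observed symbol indices

N = size(A, 1);
T = length(obs);

% Forward pass: fwd(t,i) = P(o_1..o_t, state i at time t).
fwd = zeros(T, N);
fwd(1, :) = prior .* B(:, obs(1))';
for t = 2:T
    fwd(t, :) = (fwd(t-1, :) * A) .* B(:, obs(t))';
end

% Backward pass: bwd(t,i) = P(o_t+1..o_T | state i at time t).
bwd = zeros(T, N);
bwd(T, :) = 1;
for t = T-1:-1:1
    bwd(t, :) = (A * (B(:, obs(t+1)) .* bwd(t+1, :)'))';
end

% Likelihood of the observations, computed from either pass.
pObs = sum(fwd(T, :));
pObsCheck = sum(prior .* B(:, obs(1))' .* bwd(1, :));

% State posteriors: post(t,i) = P(state i at time t | all observations).
post = fwd .* bwd / pObs;

fprintf('P(O) = %.5f (forward), %.5f (backward)\n', pObs, pObsCheck);
```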
ASR – ANN/HMM
Evaluation : Exercises and Simulations
List of exercises
- SDSP, TTS, ASR
Simulations
- SDSP: vector quantization
- TTS: waveform interpolation
- ASR: acoustic modeling using HMM and ANN+HMM; language modeling; decoding
Evaluation : Report
Write the analysis and results of the simulation in the format of a paper: 4 pages, two columns.
Sections
- Abstract
- Introduction
- Brief theoretical description of the method
- Methodology used to perform the experiment
- Results
- Conclusions and suggestions for further work
- Bibliography
Days of classes
Normal semester
- 2001: October 18, 25; November 8, 15, 22, 29 (November 1 is a holiday); December 6, 13, 20
- 2002: January 10, 17, 24, 31; February 7, 14
- Total: 15 days
Option one
- October 16, 18, 23, 25, 30; November 6, 8, 13, 15, 20, 22, 27, 29; February 5, 7, 12, 14
- Total: 17 days
Option two
- October 18, 25
- March: a one-week block seminar, 1.5 hours a day
- Total: 13 days
Option three
- March: a one-week block seminar, 3 hours a day
- Equivalent to 15 days