Published by Derick Martin. Modified over 9 years ago.
1
EEL 6586: AUTOMATIC SPEECH PROCESSING
Windows Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab, University of Florida
February 10, 2003
2
No, not MS Windows ® …
3
…not those either!
4
Speech windows
Speech is NONSTATIONARY.
5
Speech windows
Assume speech is stationary over a ‘short’ window of time.
(Figure: the utterance ‘SEVEN’.)
6
What is a ‘short’ window of time?
- 10 μs: smallest time difference detectable by the auditory system (sound localization)
- 3 ms: shortest phoneme (plosive burst)
- 10 ms: glottal pulse period
- 100 ms: average phoneme duration
- 4 s: exhale period during speech
‘Short’ depends on the application.
7
Applications using windows
- Automatic speech recognition
- Speech coding/decoding
- Speaker identification
- Text-to-speech synthesis
- Noise reduction
Typical window (frame) length: 20-30 ms
Typical frame rate: 100 frames/sec
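As a rough check of these numbers: 100 frames/sec means a 10 ms hop between frame starts, so 20-30 ms frames overlap by 50-67%. The sketch below assumes a 16 kHz sampling rate (not given on the slide), and `frame_params` and `num_frames` are hypothetical helper names.

```python
def frame_params(fs, frame_ms, frames_per_sec):
    """Return (frame length N, hop R, fractional overlap) for a given sampling rate."""
    N = int(round(fs * frame_ms / 1000))   # frame (window) length in samples
    R = int(round(fs / frames_per_sec))    # hop between consecutive frame starts
    overlap = 1 - R / N                    # fraction of samples shared by neighbors
    return N, R, overlap


def num_frames(num_samples, N, R):
    """Number of complete frames of length N with hop R in num_samples samples."""
    if num_samples < N:
        return 0
    return 1 + (num_samples - N) // R


# Hypothetical example: 16 kHz audio, 25 ms frames, 100 frames/sec
N, R, overlap = frame_params(16000, 25, 100)
print(N, R, round(overlap, 2))     # 400-sample frames, 160-sample hop, 0.6 overlap
print(num_frames(16000, N, R))     # complete frames in 1 s of audio
```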
8
Short-time analysis
- s(n): entire speech utterance
- w(n): window function
- x(n): frame of speech
The window function is non-zero for N samples, n = 0, …, N-1.
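One common way to cut frames, assumed here since the slide gives no formula, is x_m(n) = w(n) s(n + mR) for frame m with hop R. A minimal NumPy sketch (`windowed_frames` is a hypothetical name):

```python
import numpy as np


def windowed_frames(s, w, hop):
    """Return a (num_frames, N) array whose row m is w(n) * s(n + m*hop), n = 0..N-1."""
    s = np.asarray(s, dtype=float)
    N = len(w)
    if len(s) < N:
        return np.empty((0, N))
    count = 1 + (len(s) - N) // hop
    # idx[m, n] = m*hop + n: index into s of sample n of frame m
    idx = hop * np.arange(count)[:, None] + np.arange(N)[None, :]
    return s[idx] * w                  # broadcasting multiplies every frame by w(n)


s = np.arange(10.0)                    # stand-in "utterance"
frames = windowed_frames(s, np.ones(4), hop=2)
print(frames)                          # rows [0..3], [2..5], [4..7], [6..9]
```

With a tapered window such as `np.hamming(N)` in place of `np.ones(N)`, each row is the same segment scaled down toward its edges.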
9
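Short-time Fourier Transform
- s(m): entire speech utterance
- w(m): window function
- X(n,ω): STFT of speech at time n
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}$$
The STFT is a smoothed version of the original spectrum: multiplying s(m) by the shifted window in time corresponds to convolving its spectrum with the window's spectrum W(ω) in frequency.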
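Assuming the textbook form X(n,ω) = Σ_m s(m) w(n−m) e^{−jωm}, the sum can be evaluated directly at the DFT frequencies ω_k = 2πk/nfft and checked against an FFT of the windowed segment. This is a sketch for illustration only (`stft_direct` is a hypothetical name; a real implementation would just use the FFT).

```python
import numpy as np


def stft_direct(s, w, n, nfft):
    """X(n, w_k) = sum_m s(m) w(n - m) exp(-j w_k m), with w_k = 2*pi*k/nfft.

    Only the N terms where w(n - m) is non-zero (m = n-N+1 .. n) contribute.
    Assumes N - 1 <= n < len(s), so every term lies inside the utterance.
    """
    N = len(w)
    m = np.arange(n - N + 1, n + 1)            # time indices under the window
    terms = s[m] * w[n - m]                    # s(m) w(n - m)
    omega = 2 * np.pi * np.arange(nfft) / nfft
    return np.exp(-1j * np.outer(omega, m)) @ terms


rng = np.random.default_rng(0)
s = rng.standard_normal(1000)                  # stand-in utterance
w = np.hamming(256)
X = stft_direct(s, w, n=600, nfft=512)

# An FFT of the windowed segment (window reversed because of w(n - m)) gives the
# same magnitudes; the two differ only by a linear-phase factor.
seg = s[600 - 256 + 1: 600 + 1] * w[::-1]
print(np.allclose(np.abs(X), np.abs(np.fft.fft(seg, 512))))
```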
10
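STFT example
- s(n): pure sinewave of infinite length, s(n) = cos(ω0 n)
- w(n): rectangular window:
$$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$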
11
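STFT example
|W(ω)| * |S(ω)| = |X(ω)|
(Figure: the window spectrum |W(ω)| convolved with the sinewave's line spectrum |S(ω)|, impulses at ±ω0, gives |X(ω)|, approximately copies of |W(ω)| centered at ±ω0.)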
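The example can be reproduced numerically. Since cos(ω0 n) splits into two half-weight exponentials, the windowed spectrum is (1/2)[W(ω−ω0) + W(ω+ω0)], so the peak sits at ω0 with height close to N/2. The tone frequency, window length and FFT size below are arbitrary assumptions.

```python
import numpy as np

N = 64                      # rectangular window length (samples)
f0 = 0.1                    # hypothetical tone frequency, cycles/sample (omega0 = 2*pi*f0)
nfft = 4096                 # heavy zero padding to trace the DTFT densely

n = np.arange(N)
w = np.ones(N)                          # rectangular window
x = np.cos(2 * np.pi * f0 * n) * w      # windowed sinewave

X = np.abs(np.fft.rfft(x, nfft))        # |X(omega)| at omega = 2*pi*k/nfft
freqs = np.arange(len(X)) / nfft        # frequency axis in cycles/sample
peak_freq = freqs[np.argmax(X)]

print(f"peak at {peak_freq:.4f} cycles/sample, height {X.max():.1f} (N/2 = {N / 2})")
```

Changing `w` to `np.hamming(N)` widens the peak and lowers the ripple around it, which is the tradeoff discussed next.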
12
Window types
- Rectangular
- Hann (cosine)
- Hamming (raised cosine)
- Blackman
- Kaiser-Bessel
Tradeoff between leakage and blurring
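For reference, the symmetric forms of these windows can be written out and compared with NumPy's built-ins (`np.hanning`, `np.hamming`, `np.blackman`, `np.kaiser`). The Kaiser-Bessel shape parameter beta = 8.6 is an arbitrary choice here; the slides do not specify one.

```python
import numpy as np

N = 64
n = np.arange(N)
c1 = np.cos(2 * np.pi * n / (N - 1))
c2 = np.cos(4 * np.pi * n / (N - 1))
beta = 8.6                                   # assumed Kaiser-Bessel shape parameter

windows = {
    "rectangular": np.ones(N),
    "hann": 0.5 - 0.5 * c1,
    "hamming": 0.54 - 0.46 * c1,
    "blackman": 0.42 - 0.5 * c1 + 0.08 * c2,
    "kaiser": np.i0(beta * np.sqrt(1 - (2 * n / (N - 1) - 1) ** 2)) / np.i0(beta),
}

# NumPy's symmetric windows use the same formulas
builtin = {
    "hann": np.hanning(N),
    "hamming": np.hamming(N),
    "blackman": np.blackman(N),
    "kaiser": np.kaiser(N, beta),
}
for name, w in builtin.items():
    print(name, np.allclose(windows[name], w))
```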
13
Window tradeoff
- Blurring: main-lobe width (A in the figure)
- Leakage: side-lobe suppression (B in the figure)
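Main-lobe width (A) can be measured as the distance from DC to the first spectral null, in DFT bins of 2π/N. This sketch walks down the zero-padded magnitude spectrum until it stops decreasing; `first_null_bins` is a hypothetical name, and N = 256 is an arbitrary choice.

```python
import numpy as np


def first_null_bins(w, pad=32):
    """Main-lobe half-width: DC to first spectral null, in DFT bins (2*pi/N)."""
    mag = np.abs(np.fft.rfft(w, pad * len(w)))   # pad samples per DFT bin
    k = 1
    while k < len(mag) and mag[k] <= mag[k - 1]:  # still descending the main lobe
        k += 1
    return (k - 1) / pad                          # first local minimum, in bins


N = 256
widths = {name: first_null_bins(w) for name, w in [
    ("rectangular", np.ones(N)), ("hann", np.hanning(N)),
    ("hamming", np.hamming(N)), ("blackman", np.blackman(N))]}
for name, half_width in widths.items():
    print(f"{name:12s} first null at {half_width:.2f} bins")
```

The results (about 1, 2, 2 and 3 bins) appear to match the "Unit BW" column of the table that follows.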
14
Popular windows

| Window | Unit BW | Sidelobe |
|---|---|---|
| Rectangle | 1 | -13 dB |
| Hann | 2 | -31 dB |
| Hamming | 2 | -43 dB |
| Blackman | 3 | -68 dB |
| Kaiser-Bessel | 4 | -91 dB |
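The sidelobe column can be checked numerically by skipping the main lobe and taking the highest remaining peak of the zero-padded spectrum (`peak_sidelobe_db` is a hypothetical name). One caveat, stated as an assumption about coefficient choice: `np.blackman` uses the common 0.42/0.5/0.08 coefficients, whose highest sidelobe is usually quoted near -58 dB; the -68 dB figure matches what is usually quoted for the "exact Blackman" coefficients. Kaiser-Bessel is left out because the slide does not give its beta.

```python
import numpy as np


def peak_sidelobe_db(w, pad=32):
    """Highest sidelobe of w's spectrum, in dB relative to the main-lobe peak."""
    mag = np.abs(np.fft.rfft(w, pad * len(w)))   # zero-padded DTFT samples, DC first
    k = 1
    while k < len(mag) and mag[k] <= mag[k - 1]:  # skip the main lobe
        k += 1
    return 20 * np.log10(mag[k:].max() / mag[0])


N = 256
levels = {
    "rectangular": peak_sidelobe_db(np.ones(N)),
    "hann": peak_sidelobe_db(np.hanning(N)),
    "hamming": peak_sidelobe_db(np.hamming(N)),
    "blackman": peak_sidelobe_db(np.blackman(N)),
}
for name, level in levels.items():
    print(f"{name:12s} {level:6.1f} dB")
```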
15
Practical issues
Rule of thumb:
- Time domain: use the rectangular window
- Frequency domain: use the Hamming window
Why?
16
Time domain issues
Tapered windows interfere with time-domain correlation.
Example: 20 ms of /eh/ from a male utterance, pitch measured with the normalized autocorrelation. The first side peak is lower with the Hamming window.
17
Frequency domain issues
fs = 12.5 kHz, /eh/, 800 samples (64 ms), male speaker.
(Figure: evidence of the blurring/leakage tradeoff.)
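A synthetic stand-in for the /eh/ frame shows both sides of the tradeoff with the slide's fs = 12.5 kHz and 800 samples. The 1010 Hz test tone is an assumption, placed between DFT bins so that leakage is visible. Blurring is measured as the -6 dB main-lobe width and leakage as the worst level in a band about 1.5-2.5 kHz from the tone.

```python
import numpy as np

fs, N = 12500, 800            # sampling rate and frame length from the slide (64 ms)
f0 = 1010.0                   # hypothetical test tone, deliberately between DFT bins
nfft = 8 * N                  # zero padding to trace each spectrum densely
n = np.arange(N)
s = np.cos(2 * np.pi * f0 * n / fs)
freqs = np.fft.rfftfreq(nfft, d=1 / fs)
band = (freqs > 2500) & (freqs < 3500)   # far from the tone

results = {}
for name, w in [("rectangular", np.ones(N)), ("hamming", np.hamming(N))]:
    mag = np.abs(np.fft.rfft(s * w, nfft))
    db = 20 * np.log10(mag / mag.max() + 1e-12)
    width_hz = np.sum(db > -6) * (fs / nfft)   # blurring: -6 dB main-lobe width
    leak_db = db[band].max()                   # leakage: worst level far from f0
    results[name] = (width_hz, leak_db)
    print(f"{name:12s} -6 dB width {width_hz:5.1f} Hz, leakage {leak_db:6.1f} dB")
```

The rectangular window gives the narrower peak (less blurring) but far higher leakage, which is why the Hamming window is the usual choice for spectral analysis.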