Published by Annice Henderson. Modified over 9 years ago.
1
LE 460 L Acoustics and Experimental Phonetics L-13
Anu Khosla, DRDO, Delhi
2
Introduction
Most analysis methods are not designed to analyse sounds whose characteristics change over time. The practical solution is to model the speech signal as a slowly varying function of time. During intervals of 5 to 25 ms the speech characteristics do not change much and are treated as constant. The signal is therefore analysed in small segments, called analysis intervals. The optimal analysis interval length depends on the kind of information you want to extract from the speech signal. The analysis results always represent some kind of average over the analysis interval.
3
Parameters for Analysis
Three parameters have to be decided for analysis: window length, time step and window shape.
Window length
There is no single optimal window length that fits all circumstances. It depends on the type of analysis and the type of signal, e.g.:
• To make spectrograms one often chooses either 5 ms for a wideband spectrogram or 40 ms for a narrowband spectrogram.
• For pitch analysis a window length of 40 ms is more appropriate.
4
Time step
This parameter determines the amount of overlap between successive segments. If the time step is much smaller than the window length, successive segments overlap heavily. If the time step is equal to or larger than the window length, there is no overlap at all. In general we like to have at least 50% overlap between two succeeding frames, so we choose a time step no larger than half the window length.
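The interplay of window length and time step can be sketched in a few lines of Python. This is only an illustration and is not taken from the slides: the function names `frame_starts` and `overlap_fraction`, the 16 kHz sampling rate and the 40 ms / 10 ms settings are assumptions chosen for the example.

```python
def frame_starts(n_samples, window_length, time_step):
    """Start index of every complete analysis frame (all lengths in samples)."""
    if window_length <= 0 or time_step <= 0:
        raise ValueError("window_length and time_step must be positive")
    return list(range(0, n_samples - window_length + 1, time_step))


def overlap_fraction(window_length, time_step):
    """Fraction of a frame shared with the next frame (0.0 means no overlap)."""
    return max(0.0, 1.0 - time_step / window_length)


fs = 16000                          # assumed sampling rate in Hz
window_length = round(0.040 * fs)   # 40 ms window -> 640 samples
time_step = round(0.010 * fs)       # 10 ms step   -> 160 samples

starts = frame_starts(fs, window_length, time_step)  # frames in one second of signal
print(len(starts), overlap_fraction(window_length, time_step))
```

With these assumed settings, one second of signal yields 97 complete frames, and each frame shares 75% of its samples with the next one.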
5
Window shape
In general we want the sound segment's amplitudes to start and end smoothly. Many different window shapes are popular in speech analysis:
• Square (rectangular) window
• Hamming window
• Hann (Hanning) window
• Bartlett window
In Praat the default windowing function is the Gaussian window.
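As a rough sketch, the named window shapes can be written out from their textbook definitions. The function names below are illustrative, each window needs at least two samples, and Praat's Gaussian window is left out because the slides do not give its parameters.

```python
import math


def rectangular(n):
    """Rectangular (square) window: every sample keeps its full weight."""
    return [1.0] * n


def hamming(n):
    """Hamming window: raised cosine whose end points are 0.08, not 0."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def hann(n):
    """Hann (Hanning) window: raised cosine whose end points are 0."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def bartlett(n):
    """Bartlett (triangular) window: linear rise and fall, end points are 0."""
    return [1.0 - abs(2 * i / (n - 1) - 1.0) for i in range(n)]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


frame = [1.0] * 5                       # toy frame; real frames are much longer
print(apply_window(frame, bartlett(5)))
```

All shapes except the rectangular one taper towards the edges of the frame, which is what makes the segment start and end smoothly.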
7
Speech Analysis: Short-Time Analysis
In the time domain:
• Short-time energy: used to segment speech into smaller units
• Short-time zero-crossing rate (ZCR): used to help in making voicing decisions (a high ZCR indicates unvoiced speech)
• Short-time autocorrelation: used for pitch determination
In the frequency domain:
• Fourier analysis: spectrogram, formants
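The three time-domain measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the course: the function names, the 8 kHz sampling rate, the 75-500 Hz pitch search range and the synthetic 200 Hz test tone are all chosen for the example.

```python
import math


def short_time_energy(frame):
    """Sum of squared sample values in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def autocorrelation(frame, lag):
    """Short-time autocorrelation of the frame at one lag (in samples)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch(frame, fs, f_min=75.0, f_max=500.0):
    """Pitch in Hz from the lag with the largest autocorrelation.

    The 75-500 Hz search range is an assumption, not a value from the slides.
    """
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return fs / best_lag


fs = 8000                                   # assumed sampling rate in Hz
n = round(0.040 * fs)                       # 40 ms frame, as suggested for pitch
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]  # 200 Hz test tone
print(estimate_pitch(tone, fs))
```

On real speech these functions would be applied frame by frame, after windowing, and a practical pitch tracker would add a voicing decision before trusting the autocorrelation peak.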
8
Computerized Speech: Precautions
• Try to avoid making recordings in reverberant rooms (a church is very reverberant).
• Try to avoid making recordings in places where the environment is noisy and uncontrollable.
• To avoid large intensity variations in the recording, keep the distance from the speaker's mouth to the microphone as constant as possible.
• Avoid simultaneous speaking.
9
Computerized Speech
Speech (sound) is analog. Computers are digital. We need to convert.
The speech signal level varies with time.
11
Sampling is the reduction of a continuous signal to a discrete signal
The sampling frequency or sampling rate fs is defined as the number of samples obtained in one second (samples per second): fs = 1/T, where T is the time between successive samples. Nyquist (1928) and Shannon (1949) showed that for the digital signal to be a faithful representation of the analog signal, a relation between the sampling frequency and the bandwidth of the signal has to be maintained. The Nyquist-Shannon sampling theorem: a sound s(t) that contains no frequencies higher than F hertz is completely determined by giving its sample values at a series of points spaced 1/(2F) seconds apart. The number of sample values per second is the sampling frequency, so sample values at intervals of 1/(2F) seconds translate to a sampling frequency of 2F hertz.
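The relation between F, the sample spacing 1/(2F) and the sampling frequency 2F can be checked with a small sketch. The function names and the numeric values (F = 4000 Hz, and a 100 Hz sine sampled at 400 Hz) are assumptions for illustration only.

```python
import math


def min_sampling_rate(max_frequency_hz):
    """Smallest sampling rate (Hz) allowed by the theorem: fs = 2F."""
    return 2 * max_frequency_hz


def sample_sine(frequency_hz, fs, n_samples):
    """Sample sin(2*pi*f*t) at t = 0, T, 2T, ... with T = 1/fs."""
    T = 1.0 / fs
    return [math.sin(2 * math.pi * frequency_hz * k * T) for k in range(n_samples)]


F = 4000                           # assumed highest frequency in the signal, Hz
fs = min_sampling_rate(F)          # 2F samples per second
spacing = 1 / fs                   # 1/(2F) seconds between samples

# Four samples per period: a 100 Hz sine sampled at 400 Hz.
print(fs, spacing, sample_sine(100, 400, 4))
```

Sampling the 100 Hz sine four times per period gives values close to 0, 1, 0 and -1, up to floating-point rounding.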
12
Poor Sampling: Sampling Frequency = 1/2 X Wave Frequency
Sampling period = 2 X wave period
13
Even Worse Sampling Frequency = 1/3 X Wave Frequency
14
Higher Sampling Frequency
Sampling Frequency = 2/3 Wave Frequency
15
Getting Better Sampling Frequency = Wave Frequency
16
Good Sampling Sampling Frequency = 2 X Wave Frequency
17
Shannon-Nyquist's Sampling Theorem
A sampled time signal must not contain components at frequencies above half the sampling rate (the so-called Nyquist frequency). The highest frequency that can be accurately represented is one-half of the sampling rate.
18
Range of Human Hearing 20 – 20,000 Hz
We lose high-frequency response with age. Women generally have better response than men. Reproducing 20 kHz requires a sampling rate of at least 40 kHz. Sampling at less than twice the highest frequency in the signal introduces aliasing.
19
Effect of Aliasing
The Fourier theorem states that any waveform can be represented as a sum of sine waves. An improperly sampled signal acquires spurious sine-wave components that were not present in the original.
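Where the spurious component ends up can be computed with the standard frequency-folding rule. The sketch below is an illustration, not part of the slides: `alias_frequency` is a hypothetical helper, and the 1000 Hz wave is an assumed example sampled at the same ratios the sampling slides use.

```python
def alias_frequency(signal_hz, fs):
    """Apparent frequency (Hz) of a sine of signal_hz after sampling at fs.

    The sampled sine is indistinguishable from one folded into 0..fs/2.
    """
    return abs(signal_hz - fs * round(signal_hz / fs))


wave = 1000.0                                     # assumed wave frequency in Hz
for ratio in (1 / 3, 1 / 2, 2 / 3, 1.0, 2.0):
    fs = ratio * wave
    apparent = alias_frequency(wave, fs)
    print(f"fs = {ratio:.2f} x wave frequency -> appears at {apparent:.1f} Hz")
```

For sampling frequencies below twice the wave frequency the wave shows up at a lower, wrong frequency (or as a constant, 0 Hz). Only from a sampling frequency of twice the wave frequency upwards does the apparent frequency equal the true one.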
20
Half the Nyquist Frequency
21
Nyquist Frequency
22
Recovery of a sampled sine wave for different sampling rates
23
Sampling
24
Quantization and encoding of a sampled signal
25
Quantization Error
When a signal is quantized, we introduce an error: the coded signal is an approximation of the actual amplitude value. The difference between the actual value and the coded value (the midpoint of the zone) is referred to as the quantization error. The more zones there are, the smaller each zone is, which results in smaller errors. BUT the more zones, the more bits are required to encode the samples, and hence the higher the bit rate.
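The trade-off between the number of zones and the size of the error can be made concrete with a small uniform quantizer. This is a sketch under assumptions: the function name `quantize`, the amplitude range of -4 to 4 and the sample value 1.3 are chosen for illustration, and out-of-range input is simply clipped to the outermost zone.

```python
import math


def quantize(x, step, bits):
    """Code x as the midpoint of its zone in a uniform quantizer.

    Zones are `step` wide and there are 2**bits of them, centred on zero,
    so the coded values are +-step/2, +-3*step/2, ...
    """
    n_zones = 2 ** bits
    index = math.floor(x / step)                              # zone containing x
    index = max(-n_zones // 2, min(n_zones // 2 - 1, index))  # clip out-of-range input
    return (index + 0.5) * step


for bits in (2, 3, 4):
    step = 8.0 / 2 ** bits           # same amplitude range (-4..4), more zones
    x = 1.3                          # assumed sample value
    coded = quantize(x, step, bits)
    print(bits, step, coded, abs(x - coded))
```

For input inside the covered range the error never exceeds half a zone width, so every extra bit per sample halves the worst-case error.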
26
Digitization of Analog Signal
Sample the analog signal in time and amplitude, then find the closest approximation.
[Figure: original signal, its sample values and their approximation on eight quantization levels at ±D/2, ±3D/2, ±5D/2 and ±7D/2, i.e. 3 bits/sample.]
Rs = bit rate = (# bits/sample) x (# samples/second)
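The bit-rate formula is a single multiplication. In the sketch below the 3 bits/sample comes from the figure, while the 8000 samples/second is an assumed value for illustration.

```python
def bit_rate(bits_per_sample, samples_per_second):
    """Rs in bits per second for a single channel."""
    return bits_per_sample * samples_per_second


print(bit_rate(3, 8000))   # 3 bits/sample at an assumed 8000 samples/second
```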
27
All ADCs (analog-to-digital converters) have a fixed highest sampling frequency. To guarantee that the input contains no frequencies higher than half this frequency, such frequencies have to be filtered out before sampling. If they are not filtered out, they get aliased and also contribute to the digitized representation.
29
For most phonemes, almost all of the energy is contained in the 5 Hz to 4 kHz range, allowing a sampling rate of 8 kHz. This is the sampling rate used by nearly all telephony systems. CD-quality audio is recorded at 16 bits per sample.