1
Acoustic Phonetics: Investigating physical properties of speech sounds
From chapter 7, Rogers (2000)
2
Speech Sound Representation Reconsidered
Articulatory phonetic approach: describes sounds according to how they are produced.
Problems with this approach: the representation consists only of symbols, but real sounds are not like that; and it does not reflect that some sounds are easily confused with each other in perception while others are not. E.g., i/e vs. a/k, s/f vs. s/m.
So we need another way of describing speech sounds.
Reviving Sonus
3
Acoustic representation of speech sounds
Sounds are represented as they are: visually rather than symbolically. The representation depends more on perception than on production or articulation. Physical properties are analyzed, and similarities and differences between sounds are revealed.
4
Acoustic definition of sound
Sound is variation in air pressure, that is, movement of air particles: an audible disturbance of a medium produced by a source.
The source: any object that vibrates. E.g., musical instruments, human vocal cords, microphone.
The medium: any elastic substance that carries vibration. E.g., air, water.
5
Advantages of acoustic representation
The real, physical mechanism of speech communication is represented: no convention, no confusion, no controversy. Gradual changes in sounds are shown (for example, how loud a sound is), and so are small variations. It is helpful for understanding how computers synthesize speech and how speech recognition works.
6
What to represent?
Three aspects in which sounds can differ: pitch, loudness, and quality (plus length).
7
How to represent acoustically?
Sound is the movement of air particles. The best and most widely agreed way of expressing air particle movement is the waveform. Another necessary way of representing sound is the spectrum.
8
Waveforms
9
Waveform properties
Simple harmonic motion + elapsed time → waveform. Individual particles move only backward and forward.
10
Air particle movement
[Figure: displacement of an air particle over time; labels: no force, initial force, elasticity, inertia]
11
Simple Waveform
12
Speech sound properties shown in waveforms
Differentiation of sounds: sounds differ from one another, which is crucial for human speech as a means of communication.
Ways in which sounds can differ. Perceptually: pitch, loudness, quality. Acoustically: frequency, amplitude, phase.
The waveform shows differences in amplitude (the acoustic correlate of loudness) and in frequency (the acoustic correlate of pitch).
13
Amplitude representing Loudness
(b) The red waveform has twice the amplitude of the blue one, so the red sound is louder.
14
Amplitude (cntd.)
Air pressure fluctuation: the extent of the maximum variation in air pressure from normal during a sound.
Units: bel and decibel (dB; 1/10 of a bel). A dB value is ten times the common logarithm of a power ratio.
Doubling the amplitude is not heard as twice as loud.
Loud sound: particles move farther and more rapidly.
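The decibel definition can be worked through numerically. The sketch below is only an illustration: the function names are made up for this example, and it assumes the usual convention that power is proportional to amplitude squared, so an amplitude ratio uses a factor of 20 rather than 10.

```python
import math


def power_ratio_to_db(power, reference):
    """Level difference in dB for a ratio of two powers."""
    return 10 * math.log10(power / reference)


def amplitude_ratio_to_db(amplitude, reference):
    """Level difference in dB for a ratio of two amplitudes.

    Assumes power is proportional to amplitude squared, which turns
    10 * log10(power ratio) into 20 * log10(amplitude ratio).
    """
    return 20 * math.log10(amplitude / reference)


# Doubling the amplitude adds about 6 dB, not "twice as many" dB.
doubling = amplitude_ratio_to_db(2.0, 1.0)
print(round(doubling, 2))  # 6.02
```

Doubling the amplitude quadruples the power, so both functions give the same result for that case.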
15
Frequency representing Pitch
[Figure: panels (a) and (b)]
16
Frequency (cntd.)
The rate at which the sound source vibrates. Sound sources: tuning forks, vocal cords, etc. Units: Hz, cps (cycles per second).
Frequency depends on, for example, the length of a pendulum or the length of a tuning fork's prongs.
F (frequency) = 1/T (period)
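The relation F = 1/T can be checked with a couple of values. This is a minimal sketch; the function names and the example period are chosen here for illustration only.

```python
def frequency_from_period(period_s):
    """Frequency in Hz of a vibration whose cycle lasts period_s seconds."""
    return 1.0 / period_s


def period_from_frequency(frequency_hz):
    """Duration in seconds of one cycle at frequency_hz."""
    return 1.0 / frequency_hz


# A cycle lasting 10 ms repeats 100 times per second.
print(frequency_from_period(0.010))
# One cycle of a 440 Hz tone lasts a little over 2 ms.
print(period_from_frequency(440.0))
```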
17
Frequency (cntd.)
Standard A frequency: 440 Hz.
Octave: a note with exactly twice the frequency of another note. E.g., A (440 Hz), A’ (880 Hz), A’’ (1760 Hz).
Audible frequency. Humans: 20 Hz (or 16 Hz) to 20 kHz. Bats: 20 kHz to 100 kHz. Fastest telephone vibration: 35 kHz. Most human speech sound frequencies: below 8 kHz.
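Because each octave doubles the frequency, octave steps are powers of two. The sketch below reproduces the A, A’, A’’ series and, as an additional illustration not taken from the slides, counts how many octaves fit into the 20 Hz to 20 kHz range; the function names are hypothetical.

```python
import math


def octave_shift(frequency_hz, octaves):
    """Frequency reached by moving up (or down, if negative) whole octaves."""
    return frequency_hz * 2 ** octaves


def octaves_between(low_hz, high_hz):
    """Number of octaves separating two frequencies."""
    return math.log2(high_hz / low_hz)


print([octave_shift(440, n) for n in range(3)])  # [440, 880, 1760]
# The 20 Hz to 20 kHz range spans just under ten octaves.
print(octaves_between(20, 20000))
```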
18
Frequency (cntd.)
Pitch and frequency are not in a linear relationship. The relationship is fairly linear only at low frequencies: the same difference in Hz sounds greater at low frequencies than at high frequencies.
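One way to make the non-linear pitch scale concrete is the mel scale. The formula below is a commonly used approximation and is an assumption of this sketch; the slides do not name a particular pitch scale, and the 100 Hz steps compared here are chosen only for illustration.

```python
import math


def hz_to_mel(frequency_hz):
    """Approximate perceived pitch in mels (a widely used formula,
    assumed here; it is not taken from the slides)."""
    return 2595 * math.log10(1 + frequency_hz / 700)


# The same 100 Hz step covers more perceptual distance low in the range.
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(1100) - hz_to_mel(1000)
print(low_step > high_step)  # True
```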
19
Phase difference
20
Phase (cntd.)
Phase differences produce different waveforms, but human ears do not perceive phase differences.
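A small numerical experiment illustrates the point that phase changes the waveform without changing how much of each frequency is present. In this sketch the sample rate, duration, and the 100 Hz + 200 Hz mixture are all assumptions chosen so that the signal contains whole cycles; overall energy (RMS) stands in here for "the same ingredients".

```python
import math

SAMPLE_RATE = 1000  # samples per second (assumed for this sketch)
N_SAMPLES = 100     # 0.1 s: ten cycles of 100 Hz, twenty of 200 Hz


def two_tone(phase_of_second):
    """100 Hz + 200 Hz sine waves; only the phase of the 200 Hz tone varies."""
    return [
        math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
        + math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + phase_of_second)
        for n in range(N_SAMPLES)
    ]


def rms(samples):
    """Root-mean-square amplitude, a measure of overall energy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


in_phase = two_tone(0.0)
shifted = two_tone(math.pi / 2)

# The sample values (the waveform shapes) clearly differ...
largest_gap = max(abs(a - b) for a, b in zip(in_phase, shifted))
# ...while the overall energy of the two signals is the same.
print(largest_gap, rms(in_phase), rms(shifted))
```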
21
The waveform is not sufficient
Two sounds with the same pitch and loudness can still differ. Examples: violin A vs. piano A; [i] vs. [a]. Another way of representation is needed: the spectrum.
22
More about waveforms first
To understand the spectrum and how it represents quality, we need to know more about waveforms.
23
Types of Waveforms: Pure tones vs. Complex waves
Most sound sources, including human speech, produce complex vibrations.
Pure tone: simple harmonic motion (SHM), with only one frequency.
Complex wave: more than one harmonic motion, with multiple frequencies.
Pure tone + pure tone of the same frequency and phase → another pure tone.
Pure tone + pure tones of different frequencies → a complex tone.
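The two addition rules above can be checked sample by sample. The following is a minimal sketch with made-up helper names; the sample rate, amplitudes, and frequencies are assumptions, and "is a pure tone of the same frequency" is tested by asking whether the sum is just a scaled copy of a reference tone.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
N_SAMPLES = 80      # 10 ms: one cycle of 100 Hz, two cycles of 200 Hz


def pure_tone(frequency_hz, amplitude):
    """Samples of a sine wave with the given frequency and amplitude."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(N_SAMPLES)
    ]


def add_waves(first, second):
    """Sample-by-sample sum of two waves of equal length."""
    return [a + b for a, b in zip(first, second)]


def is_scaled_copy(wave, reference, tolerance=1e-9):
    """True if wave equals reference multiplied by one constant factor."""
    scale = sum(w * r for w, r in zip(wave, reference)) / sum(r * r for r in reference)
    return all(abs(w - scale * r) < tolerance for w, r in zip(wave, reference))


reference = pure_tone(100, 1.0)
same_frequency = add_waves(pure_tone(100, 1.0), pure_tone(100, 2.0))
mixed = add_waves(pure_tone(100, 1.0), pure_tone(200, 2.0))

print(is_scaled_copy(same_frequency, reference))  # True: still a pure tone
print(is_scaled_copy(mixed, reference))           # False: a complex wave
```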
24
Pure tone (Simple Wave, simple harmonic motion, Sinusoid, Sine wave)
25
Complex wave
[Figure: a complex wave and its component pure tones, including 100 Hz]
26
Complex wave: [a] produced by a female speaker
27
Types of Waveform: Repetitive vs. non-repetitive wave
Strictly repetitive (periodic): sine waves, ideal sounds.
Virtually repetitive (periodic): vowels, sonorants.
Non-repetitive (aperiodic): obstruents, white noise (most complex), clicks.
28
Periodic vs non-periodic wave
[Figure: aperiodic wave [s] vs. periodic wave [a]]
29
Limitation of Waveform Representation
Sounds can be heard to differ in three ways: loudness, pitch, and quality. Quality cannot be represented directly in waveforms, so a new way of representation is needed: the spectrum.
30
Spectrum
31
Background Knowledge on Spectrum
Sound waves can be either simple or complex. Simple: a sinusoid. Complex: simple waves of different frequencies combined. Sound quality is determined by the way such simple waves combine into a complex wave. If a complex wave can be split into its simple waves, we can see how its quality arises.
32
Waveform and Spectrum (100 Hz + 200 Hz + 300 Hz)
[Figure: waveform of the complex wave and its spectrum; spectrum x-axis: frequency, with components at 100, 200, and 300 Hz]
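Splitting a complex wave back into its simple waves is what a Fourier analysis does. The sketch below builds a 100 Hz + 200 Hz + 300 Hz wave and recovers its spectrum with a naive discrete Fourier transform. The component amplitudes, the sample rate, and the function names are assumptions made for this example, not values from the slide.

```python
import cmath
import math

SAMPLE_RATE = 1000  # samples per second (assumed)
N_SAMPLES = 100     # 0.1 s, which gives spectrum points every 10 Hz

# Hypothetical amplitudes for the three components.
COMPONENTS = {100: 4.0, 200: 2.0, 300: 1.0}


def synthesize(components):
    """Add sine waves given as {frequency_hz: amplitude}."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE)
            for f, a in components.items())
        for n in range(N_SAMPLES)
    ]


def amplitude_spectrum(samples):
    """Naive DFT: {frequency_hz: amplitude} between 0 Hz and half the sample rate."""
    total = len(samples)
    spectrum = {}
    for k in range(1, total // 2):
        coefficient = sum(
            s * cmath.exp(-2j * math.pi * k * n / total)
            for n, s in enumerate(samples)
        )
        spectrum[k * SAMPLE_RATE / total] = 2 * abs(coefficient) / total
    return spectrum


spectrum = amplitude_spectrum(synthesize(COMPONENTS))
peaks = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print(peaks)  # {100.0: 4.0, 200.0: 2.0, 300.0: 1.0}
```

The waveform mixes all three components together at every instant, whereas the spectrum lists each component's frequency and amplitude separately.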
33
An Example of Spectrum
34
Formants shown in spectrum
A formant is a frequency component (or components) with boosted energy; the formant frequency is the frequency of that component. Formants are shaped by the filtering function of the vocal tract and are a decisive aspect of sound quality. For vowels, three formants (F1, F2, F3) are especially important for distinguishing them.
35
An Example of Formant: Vowel [ə]
36
An Example of Formant: Vowel [e]
[Figure: spectrum of the vowel [e]; x-axis: frequency (50 to 4050 Hz); y-axis: amplitude; peaks labeled F1, F2, F3]
37
Disadvantages of Spectrum Representation
Less intuitive: the x-axis denotes frequency. No time-varying representation. Hard to see the relationship with waveforms. Thus a new way of representation is needed: the spectrogram.
38
Spectrogram & its reading
39
What is a spectrogram?
In use since the 1940s. Another representation of frequency-domain analysis, and the most popular way of representing spectral information. A three-dimensional representation: x-axis, time; y-axis, frequency; darkness (or color), energy.
40
Waveform & Spectrogram aligned
41
Spectrogram example (color resolution of word “compute”)
42
Spectrogram example (grayscale of word “compute”)
43
Types of spectrogram
Wideband spectrogram: better time resolution.
Narrowband spectrogram: better frequency resolution.
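The trade-off comes from the length of the analysis window: a short window locates events precisely in time but blurs frequency, and a long window does the reverse. The sketch below uses the rule of thumb that frequency resolution is roughly the reciprocal of the window length. The window lengths and the 100 Hz fundamental are assumed, typical-looking values, not figures from the slides.

```python
WIDEBAND_WINDOW_S = 0.005    # assumed short window (about 5 ms)
NARROWBAND_WINDOW_S = 0.030  # assumed long window (about 30 ms)


def frequency_resolution_hz(window_s):
    """Rough frequency resolution: reciprocal of the window length."""
    return 1.0 / window_s


def can_resolve_harmonics(window_s, fundamental_hz):
    """Harmonics lie fundamental_hz apart; they show up as separate
    lines only if the frequency resolution is finer than that spacing."""
    return frequency_resolution_hz(window_s) < fundamental_hz


for name, window in [("wideband", WIDEBAND_WINDOW_S),
                     ("narrowband", NARROWBAND_WINDOW_S)]:
    print(name, frequency_resolution_hz(window),
          can_resolve_harmonics(window, fundamental_hz=100))
```

Under these assumptions only the long window separates harmonics 100 Hz apart, while the short window is six times more precise in time.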
44
Wideband vs. narrowband spectrograms of the question "Is Pat sad, or mad?" The 5th, 10th and 15th harmonics have been marked by white squares in two of the vowels.
45
Advantages & Disadvantages
Advantage: time alignment. Disadvantage: less reliable than the waveform.
46
Vowel Spectrogram
Formant frequencies are critical cues for vowel distinction. F1 reflects height: high vowels have a low F1. F2 reflects backness: back vowels have a low F2.
47
Examples of formant frequencies of English monophthongs
F3 (Hz): 2900 2550 2490 2640 2380 2300 2500 2390
F2 (Hz): 2250 1900 1770 1660 1100 1030 870 1500 1190
F1 (Hz): 280 400 550 690 710 450 310 900 640
48
"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)
From http://hctv.humnet.ucla.edu/departments/linguistics
49
Consonant Spectrogram
General: acoustic structure more complicated than that of vowels. Adjacent sounds (especially vowels) convey important information: the locus. High-frequency characteristics matter especially for fricatives and affricates.
50
What is the locus?
Information carried by formant transitions from vowels into obstruents or from obstruents into vowels.
The target frequency that each formant transition heads toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released.
It is characteristic of the consonant's place and manner, and roughly the same in different vowel contexts.
51
Stops
General: a fairly distinct locus for each place; a burst; silence during the closure (only at syllable onset position); virtually no difference during the closure.
52
Stops (cntd.)
Voicing distinction. Voiced: vertical striations, a less abrupt burst, and frequent weakening into something like a fricative or approximant. Voiceless: a generally abrupt burst in the higher frequency area.
53
Stops (cntd.)
Place distinction.
Bilabial: relatively low F2 and F3 locus, rising into and falling out of the vowel; weak and spread vertical lines.
Alveolar: F2 locus at about 1800 Hz; strong vertical lines.
Velar: velar pinch (F2 and F3 of the vowel merging); often a double burst; long formant transitions.
54
Stops (cntd.)
Manner distinction. Cues: silence duration, VOT, and F0 of the following vowel.
Aspirated [pʰ]: short silence, long VOT, high F0.
Tense [p’], lax [p]: mid, low.
55
Examples -- “a bab, a dad, a gag”
56
Place dependent loci
57
Fricatives
General: random noise pattern, especially in high-frequency regions.
Place distinction:
Labiodental [f, v]: rising locus into the following vowel.
Dental [θ, ð]: major energy above 6000 Hz.
Alveolar [s, z]: major energy above 4000 Hz.
Alveopalatal [š, ž]: major energy above 2000 Hz.
Glottal [h]: the trace of the formant frequencies of neighbouring vowels.
58
Fricatives (cntd.)
Weak vs. strong. Strong [s, z, š, ž]: darker bands. Weak [f, v, θ, ð]: spread and fainter.
Voiced [v, ð]: often so weak that they are confused with nasals or approximants.
Cue to tell [θ] from [f]: the higher formants of [θ] fall into adjacent vowels.
59
Example – “fie, thigh, sigh, shy”
60
Example – “ever, weather, fizzer, pleasure”
61
Nasals
General: formants similar to those of vowels but fainter; very low F1 (about 250 Hz), with F2 at about 2500 Hz and F3 at about 3250 Hz.
Place distinction. Bilabial [m]: downward F2, F3 locus. Alveolar [n]: smaller F2 transition. Velar [ŋ]: velar pinch.
62
Examples -- “a Pam, a tan, a kang”
63
Liquids & Approximants
General: formants similar to those of vowels but fainter (especially in high-frequency regions). Approximately F1 250 Hz, F2 1200 Hz, F3 2400 Hz. Slow formant movements.
64
Liquids & Approximants (cntd.)
Phone-specific properties.
Labial glide [w]: very low F1 and F2 ( Hz), which come very close to each other; relatively low F3; rapid falloff of spectral amplitude (formant movements).
Palatal glide [y]: extremely low F1; extremely high F2 and F3.
65
Liquids & Approximants (cntd.)
Phone-specific properties (cntd.)
Flap [ɾ]: soft burst, short duration.
Retroflex [r]: F3 dipping down close to F2; general lowering of F3 and F4.
Lateral [l]: low F1 and F2 (approx. F1 250 Hz, F2 1200 Hz); usually substantial energy in the high-frequency region.
66
Example – “led, red, wed, yell”
67
Final remarks on spectrogram
The spectrogram is not the only cue for the acoustic distinction of speech sounds. When the waveform and the spectrogram do not match, the waveform is generally more reliable.
68
References & Links