Investigating physical properties of speech sounds


1 Investigating physical properties of speech sounds
Acoustic Phonetics: Investigating physical properties of speech sounds. From chapter 7, Rogers (2000).

2 Speech Sound Representation Reconsidered
Articulatory phonetic approach: describing sounds according to how they are produced. Problems with this approach: Representation is only in terms of symbols → sounds are not like that in reality. It does not reflect that some pairs of sounds are more easily confused with each other in perception than others. Eg) i/e vs. a/k, s/f vs. s/m. So we need another way of describing speech sounds. Reviving Sonus

3 Acoustic representation of speech sounds
Representing sounds as they are. Visual rather than symbolic representation. Depends more on perception than on production or articulation. Physical properties are analyzed. Similarities and differences between sounds are revealed.

4 Acoustic definition of sound
Variation in air pressure. Movements of air particles. An audible disturbance of a medium produced by a source. The source: any object that vibrates. Eg) musical instruments, human vocal cords, microphone. The medium: any elastic object that carries vibration. Eg) air, water.

5 Advantages of acoustic representation
The real, physical mechanism of speech communication is represented. No convention, no confusion, no controversy. Gradual changes in sounds are shown. Example) How loud a sound is. Small variations are shown. Helpful for understanding how computers synthesize speech and how speech recognition works.

6 What to represent?
Three aspects in which sounds can differ: Pitch, Loudness, Quality, (Length).

7 How to represent acoustically?
Sound is the movement of air particles. The best and most widely agreed way of expressing air particle movements: Waveform. Another necessary way of representing sound: Spectrum.

8 Waveforms

9 Waveform properties
Simple harmonic motion + elapsed time → Waveform. Individual particles move only backward and forward.

10 Air particle movement
[Figure: displacement of an air particle over time; labels: No force, Initial force, Elasticity, Inertia; axes: Time, Displacement]

11 Simple Waveform

12 Speech sound properties shown in waveforms
Differentiation of sounds: Sounds differ, which is crucial for human speech as a means of communication. Ways in which sounds can differ: Perceptually: Pitch, Loudness, Quality. Acoustically: Frequency, Amplitude, Phase. A waveform shows differences in: Acoustic correlate of Loudness → Amplitude. Acoustic correlate of Pitch → Frequency.

13 Amplitude representing Loudness
(b) The red waveform has twice the amplitude of the blue one → the red sound is louder.

14 Amplitude (cntd.)
Air pressure fluctuation: the extent of the maximum variation in air pressure from normal during a sound. Unit: Bel, Decibel (dB; 1/10 of a Bel), Bark. dB: ten times the common logarithm of a power ratio. Twice the amplitude is not heard as twice as loud. Loud sound: particles move farther and more rapidly.
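As a worked illustration of the decibel definition on this slide, the following minimal sketch converts ratios to dB. It assumes that power is proportional to the square of amplitude, so an amplitude ratio uses a factor of 20 rather than 10. The function names are illustrative and do not come from the slides.

```python
import math


def power_ratio_to_db(power_ratio):
    """Decibels for a ratio of two powers: 10 * log10(P1 / P0)."""
    return 10.0 * math.log10(power_ratio)


def amplitude_ratio_to_db(amplitude_ratio):
    """Decibels for a ratio of two amplitudes.

    Assumes power is proportional to amplitude squared, which turns
    10 * log10(ratio ** 2) into 20 * log10(ratio).
    """
    return 20.0 * math.log10(amplitude_ratio)


# Doubling the amplitude adds roughly 6 dB to the level.
doubling_db = amplitude_ratio_to_db(2.0)
print(round(doubling_db, 2))
```

Under these assumptions, doubling the amplitude always adds the same small number of decibels, whatever the starting level, which is consistent with the point that twice the amplitude is not heard as twice as loud.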

15 Frequency representing Pitch
(a) (b)

16 Frequency (cntd.)
The rate at which a sound source vibrates. Sound sources: tuning forks, vocal cords, etc. Units: Hz, cps (cycles per second). Depends on: length of the pendulum, length of the tuning fork prongs. F(requency) = 1/T(period).
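The relation F = 1/T can be tried out with a couple of numbers. This is a minimal sketch; the function names and the example values (a 10 ms period, a 440 Hz tone) are chosen only for illustration.

```python
def frequency_from_period(period_s):
    """F = 1 / T, with T in seconds and F in Hz."""
    return 1.0 / period_s


def period_from_frequency(frequency_hz):
    """T = 1 / F, with F in Hz and T in seconds."""
    return 1.0 / frequency_hz


# A vibration that repeats every 10 ms has a frequency of about 100 Hz.
f_hz = frequency_from_period(0.010)

# One cycle of a 440 Hz tone lasts a little under 2.3 ms.
t_ms = period_from_frequency(440.0) * 1000.0
```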

17 Frequency (cntd.)
Standard A frequency: 440 Hz. Octave: a note whose frequency is exactly twice that of another note. Eg) A (440 Hz), A’ (880 Hz), A’’ (1760 Hz). Audible frequency: Human: 20 Hz (or 16 Hz) – 20 kHz. Bats: 20 kHz – 100 kHz. Fastest telephone vibration: 35 kHz. Most human speech sound frequencies: below 8 kHz.
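The octave relation on this slide is a repeated doubling, which a short sketch can make concrete. The helper names are hypothetical; the 440 Hz reference and the 20 Hz to 20 kHz hearing range are the figures given on the slide.

```python
import math


def octave_above(frequency_hz, octaves=1):
    """Each octave up doubles the frequency."""
    return frequency_hz * 2 ** octaves


def octaves_between(low_hz, high_hz):
    """Number of doublings needed to get from low_hz to high_hz."""
    return math.log2(high_hz / low_hz)


# A, A', A'' from the slide: 440, 880, 1760 Hz.
a_notes = [octave_above(440, n) for n in range(3)]

# The 20 Hz to 20 kHz range of human hearing spans just under ten octaves.
hearing_span = octaves_between(20, 20000)
```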

18 Frequency (cntd.)
Pitch and frequency are not in a linear relationship. The relationship is fairly linear only at low frequencies. The same difference in Hz sounds greater at low frequencies than at high frequencies.
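The slides do not name a pitch scale. As one hedged illustration of the nonlinearity, the sketch below uses a widely used mel-scale approximation, m = 2595 * log10(1 + f / 700). That formula is an assumption brought in for the example, not something taken from the source.

```python
import math


def hz_to_mel(frequency_hz):
    """Approximate perceived pitch in mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


# The same 100 Hz step, once low in the range and once high in the range.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)

print(low_step > high_step)
```

On this scale the 100 Hz step near the bottom of the range covers several times as many mels as the same step near 4 kHz.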

19 Phase difference

20 Phase (cntd.)
Phase differences cause different waveforms, but human ears generally do not perceive phase differences.
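A small numerical sketch can show what "different waveforms" means here: shifting the phase of one component changes the sample values but not the amplitude of either component. The sample rate, the 100 Hz and 200 Hz components and their amplitudes are all assumptions made for the example.

```python
import math

RATE = 8000  # samples per second (assumed)
N = 800      # 0.1 s: a whole number of cycles of both 100 Hz and 200 Hz


def mix(phase_of_second_tone):
    """100 Hz tone (amplitude 1.0) plus 200 Hz tone (amplitude 0.5)."""
    return [
        math.sin(2 * math.pi * 100 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 200 * i / RATE + phase_of_second_tone)
        for i in range(N)
    ]


def component_amplitude(samples, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz, whatever its phase."""
    n = len(samples)
    s = sum(x * math.sin(2 * math.pi * freq_hz * i / RATE) for i, x in enumerate(samples))
    c = sum(x * math.cos(2 * math.pi * freq_hz * i / RATE) for i, x in enumerate(samples))
    return 2 * math.hypot(s, c) / n


wave_a = mix(0.0)
wave_b = mix(math.pi / 2)  # same components, second one shifted by 90 degrees

# The waveforms differ sample by sample ...
largest_difference = max(abs(x - y) for x, y in zip(wave_a, wave_b))
# ... but each component has the same amplitude in both.
amplitudes_a = [component_amplitude(wave_a, f) for f in (100, 200)]
amplitudes_b = [component_amplitude(wave_b, f) for f in (100, 200)]
```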

21 Waveform is not sufficient..
Two sounds with the same pitch and loudness can still differ. Example) Violin A vs. Piano A. Example) [i] vs. [a]. Another way of representation is needed → Spectrum.

22 More about waveform first..
To understand the spectrum and how it represents quality, we first need to know more about waveforms.

23 Types of Waveforms: Pure tones vs. Complex waves
Most sound sources, including human speech, produce complex vibrations. Pure tone: simple harmonic motion (SHM), with only one frequency. Complex wave: more than one harmonic motion, multiple frequencies. Pure tone + pure tone of the same frequency and phase → another pure tone. Pure tone + pure tones of different frequencies → a complex tone.
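The two addition rules on this slide can be checked numerically. The sketch below is only an illustration: the sample rate, frequencies and amplitudes are assumed, and `is_scaled_copy` is a hypothetical helper that tests whether a wave is just a louder or softer copy of a reference pure tone.

```python
import math

RATE = 8000  # samples per second (assumed)
N = 80       # one full period of a 100 Hz tone at this rate


def pure_tone(freq_hz, amplitude):
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(N)]


def add_waves(a, b):
    return [x + y for x, y in zip(a, b)]


def is_scaled_copy(wave, reference, tolerance=1e-9):
    """True if wave equals reference multiplied by one constant factor."""
    peak = max(range(len(reference)), key=lambda i: abs(reference[i]))
    factor = wave[peak] / reference[peak]
    return all(abs(w - factor * r) <= tolerance for w, r in zip(wave, reference))


unit_tone = pure_tone(100, 1.0)

# Same frequency and phase: the sum is still a 100 Hz pure tone, only larger.
same_freq_sum = add_waves(pure_tone(100, 1.0), pure_tone(100, 0.5))

# Different frequencies: the sum is no longer a scaled 100 Hz sine.
mixed_freq_sum = add_waves(pure_tone(100, 1.0), pure_tone(300, 0.5))

print(is_scaled_copy(same_freq_sum, unit_tone))
print(is_scaled_copy(mixed_freq_sum, unit_tone))
```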

24 Pure tone (Simple Wave, simple harmonic motion, Sinusoid, Sine wave)

25 Complex wave
[Figure: complex wave built from a 100 Hz tone and two other tones]

26 [a] production by a female speaker
Complex wave: [a] produced by a female speaker.

27 Types of Waveform: Repetitive vs. non-repetitive wave
Strictly repetitive (periodic): sine wave, ideal sounds. Virtually repetitive (periodic): vowels, sonorants. Non-repetitive (aperiodic): obstruents, white noise (most complex), click.
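One common way to quantify "repetitive" is to compare a signal with a copy of itself shifted by one candidate period. The sketch below is an assumption-laden illustration, not a method from the slides: it uses a synthetic 100 Hz sine and seeded pseudo-random noise in place of real vowel and fricative recordings.

```python
import math
import random

RATE = 8000  # samples per second (assumed)


def normalized_autocorrelation(samples, lag):
    """Correlation between a signal and itself shifted by lag samples."""
    head = samples[:-lag]
    tail = samples[lag:]
    dot = sum(a * b for a, b in zip(head, tail))
    norm = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return dot / norm


LAG = RATE // 100  # one period of a 100 Hz tone, in samples

# Strictly periodic signal: every sample repeats after one period.
sine = [math.sin(2 * math.pi * 100 * i / RATE) for i in range(800)]

# Aperiodic signal: seeded noise, so the run is reproducible.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]

sine_score = normalized_autocorrelation(sine, LAG)
noise_score = normalized_autocorrelation(noise, LAG)
```

A score close to 1 indicates that the signal repeats at that lag; for noise the score stays near 0.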

28 Periodic vs non-periodic wave
[Figure: aperiodic wave [s]; periodic wave [a]]

29 Limitation of Waveform Representation
Sounds can be heard to differ in 3 ways: Loudness, Pitch, Quality. Quality can’t be represented directly in waveforms. A new way of representation is needed → Spectrum.

30 Spectrum

31 Background Knowledge on Spectrum
Sound waves can be either simple or complex. Simple: sinusoid. Complex: a combination of simple waves with different frequencies. Sound quality is determined by the way such simple waves combine into a complex wave. If a complex wave can be split into its component simple waves, we can see what determines its quality.

32 Waveform and Spectrum (100Hz + 200Hz + 300Hz )
[Figure: waveform and its spectrum; x-axis: frequency (100, 200, 300 Hz); y-axis: amplitude]
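Splitting a complex wave into its components can be sketched with a plain discrete Fourier transform. The example builds a 100 Hz + 200 Hz + 300 Hz wave and recovers the three components. The sample rate and the amplitudes (4, 2 and 1) are assumptions for illustration, and the slow textbook DFT below stands in for whatever analysis tool was used for the slide.

```python
import cmath
import math

RATE = 8000  # samples per second (assumed)
N = 80       # analysis length; bin spacing is RATE / N = 100 Hz

# Hypothetical component amplitudes for the three frequencies on the slide.
AMPLITUDES = {100: 4.0, 200: 2.0, 300: 1.0}

wave = [
    sum(a * math.sin(2 * math.pi * f * i / RATE) for f, a in AMPLITUDES.items())
    for i in range(N)
]


def amplitude_spectrum(samples, rate):
    """Map each analysis frequency (Hz) to the amplitude found there."""
    n = len(samples)
    spectrum = {}
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum[k * rate / n] = 2 * abs(coeff) / n
    return spectrum


spectrum = amplitude_spectrum(wave, RATE)

# Keep only the bins that hold real energy; the rest are rounding noise.
peaks = {f: round(a, 6) for f, a in spectrum.items() if a > 1e-6}
print(peaks)
```

The waveform mixes everything into one curve, while the spectrum lists each component frequency with its own amplitude.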

33 An Example of Spectrum

34 Formants shown in spectrum
Formant: frequency component(s) with boosted energy. Formant frequency: the frequency of that component. Reason for formant shaping: the filtering function of the vocal tract. Decisive aspect of sound quality. For vowels, three formants (F1, F2, F3) are especially important for distinguishing them.
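If formants are the frequency regions with boosted energy, a very rough way to locate them is to look for local maxima in a smoothed spectrum. The sketch below does only that. The frequency grid and the envelope values are invented for illustration, and practical formant trackers use more robust methods than simple peak picking.

```python
def local_peaks(freqs_hz, amplitudes):
    """Frequencies whose amplitude is higher than both neighbours."""
    peaks = []
    for i in range(1, len(amplitudes) - 1):
        if amplitudes[i - 1] < amplitudes[i] >= amplitudes[i + 1]:
            peaks.append(freqs_hz[i])
    return peaks


# Hypothetical spectrum envelope sampled every 250 Hz from 50 Hz to 4050 Hz.
freqs = list(range(50, 4051, 250))
envelope = [1, 3, 5, 3, 2, 1, 2, 4, 6, 4, 2, 3, 4, 2, 1, 1, 0]

# The first three peaks play the role of F1, F2 and F3 in this toy example.
formant_candidates = local_peaks(freqs, envelope)
print(formant_candidates)
```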

35 An Example of Formant: Vowel [ə]

36 An Example of Formant: Vowel [e]
[Figure: spectrum of vowel [e]; x-axis: frequency, 50 to 4050 Hz; y-axis: amplitude; formants F1, F2, F3 marked]

37 Disadvantages of Spectrum Representation
Less intuitive: the X-axis denotes frequency. No representation of variation over time. Hard to relate to waveforms. Thus, a new way of representation is needed → Spectrogram.

38 Spectrogram & its reading

39 What is a spectrogram?
In use since the 1940s. Another representation of frequency-domain analysis. The most popular way of representing spectral information. 3-dimensional representation: X-axis: Time. Y-axis: Frequency. Darkness (or color): Energy.
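The three dimensions listed here (time, frequency, energy) can be sketched as a small grid: cut the signal into short frames, compute a spectrum for each frame, and store the energy per frequency bin. Everything concrete in the sketch is an assumption for illustration: the test signal, the sample rate, the frame length, and the use of non-overlapping frames without a window function.

```python
import cmath
import math

RATE = 8000  # samples per second (assumed)
FRAME = 80   # 10 ms frames; frequency bins are RATE / FRAME = 100 Hz apart


def frame_energies(frame):
    """Energy in each frequency bin of one frame (plain DFT)."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        energies.append(abs(coeff) ** 2)
    return energies


def spectrogram(samples, frame_len):
    """Rows are time frames, columns are frequency bins, values are energy."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [frame_energies(samples[s:s + frame_len]) for s in starts]


# Test signal: 500 Hz for the first 20 ms, then 1500 Hz for the next 20 ms.
signal = [
    math.sin(2 * math.pi * (500 if i < 160 else 1500) * i / RATE)
    for i in range(320)
]

grid = spectrogram(signal, FRAME)

# Frequency of the darkest cell in each time frame.
dominant_hz = [row.index(max(row)) * RATE // FRAME for row in grid]
print(dominant_hz)
```

Reading `dominant_hz` from left to right shows the change over time that a single spectrum cannot show.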

40 Waveform & Spectrogram aligned

41 Spectrogram example (color resolution of word “compute”)

42 Spectrogram example (grayscale of word “compute”)

43 Types of spectrogram
Wideband spectrogram: better time resolution. Narrowband spectrogram: better frequency resolution.
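The trade-off follows from the length of the analysis window: as a rule of thumb, a window lasting T seconds can separate frequencies about 1/T Hz apart. The sketch below applies that rule. The window lengths (5 ms for wideband, 25 ms for narrowband) and the 100 Hz fundamental are assumed example values, not figures from the slides.

```python
WIDEBAND_WINDOW_S = 0.005    # assumed short window
NARROWBAND_WINDOW_S = 0.025  # assumed long window


def frequency_resolution_hz(window_s):
    """Rule of thumb: a window of T seconds resolves about 1 / T Hz."""
    return 1.0 / window_s


def resolves_harmonics(window_s, f0_hz):
    """Harmonics lie f0 apart, so they separate only if resolution < f0."""
    return frequency_resolution_hz(window_s) < f0_hz


# With a 100 Hz voice, only the long window separates individual harmonics.
print(resolves_harmonics(NARROWBAND_WINDOW_S, 100.0))
print(resolves_harmonics(WIDEBAND_WINDOW_S, 100.0))
```

The short window is in turn shorter than one 10 ms pitch period, which is why it follows rapid changes in time more closely.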

44 Wideband vs. Narrowband spectrograms of the question "Is Pat sad, or mad?" The 5th, 10th and 15th harmonics have been marked by white squares in two of the vowels.

45 Advantages & Disadvantages
Advantages: Time alignment. Disadvantages: Less reliable than waveform.

46 Vowel Spectrogram
Formant frequencies are critical cues for vowel distinction. F1: Height (high vowels: low F1). F2: Backness (back vowels: low F2).
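The two cues can be turned into a toy description rule: low F1 means a high vowel, low F2 means a back vowel. The cut-off values in the sketch below are hypothetical and chosen only so the example runs; real vowel categories overlap and vary by speaker.

```python
F1_HIGH_VOWEL_MAX_HZ = 450   # hypothetical cut-off, not from the slides
F2_BACK_VOWEL_MAX_HZ = 1500  # hypothetical cut-off, not from the slides


def describe_vowel(f1_hz, f2_hz):
    """Rough height and backness labels from the first two formants."""
    height = "high" if f1_hz <= F1_HIGH_VOWEL_MAX_HZ else "non-high"
    backness = "back" if f2_hz <= F2_BACK_VOWEL_MAX_HZ else "non-back"
    return height, backness


# Illustrative formant pairs in the range typical of a high front vowel,
# a high back vowel, and a low back vowel.
print(describe_vowel(280, 2250))
print(describe_vowel(310, 870))
print(describe_vowel(710, 1100))
```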

47 Examples of formant frequencies of English monophthongs
F3 (Hz): 2900 2550 2490 2640 2380 2300 2500 2390
F2 (Hz): 2250 1900 1770 1660 1100 1030 870 1500 1190
F1 (Hz): 280 400 550 690 710 450 310 900 640

48 From http://hctv.humnet.ucla.edu/departments/linguistics
"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)

49 Consonant Spectrogram
General: Acoustic structure is more complicated than that of vowels. Adjacent sounds (especially vowels) convey important information → locus. High-frequency characteristics → especially for fricatives and affricates.

50 What is LOCUS?
Information carried by formant transitions from vowels into obstruents or from obstruents into vowels. The target frequency that each formant transition heads toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released. Characteristic of the consonantal place and manner → roughly the same in different vowel contexts.

51 Stops
General: Fairly distinct locus for each place. Burst. Silence during the closure (only at syllable onset position). Virtually no difference during the closure.

52 Stops (cntd.)
Voicing distinction. Voiced: vertical striations, less abrupt burst, frequently weakened to resemble fricatives or approximants. Voiceless: generally abrupt burst in the higher frequency region.

53 Stops (cntd.)
Place distinction. Bilabial: relatively low F2, F3 locus → rising into and falling out of the vowel; weak and spread vertical lines. Alveolar: F2 locus about 1800 Hz; strong vertical lines. Velar: velar pinch (the vowel's F2 and F3 merging); often double burst; long formant transitions.

54 Stops (cntd.)
Manner distinction: Silence duration, VOT, Following V F0
Aspirated [pʰ] short long high Tense [p’] Lax [p] mid low

55 Examples -- “a bab, a dad, a gag”

56 Place dependent loci

57 Fricatives
General: Random noise pattern, especially in high-frequency regions. Place distinction: Labiodental [f, v]: rising locus into the following vowel. Dental [θ, ð]: major energy above 6000 Hz. Alveolar [s, z]: major energy above 4000 Hz. Alveopalatal [š, ž]: major energy above 2000 Hz. Glottal [h]: traces of the formant frequencies of neighbouring vowels.
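The frequency cut-offs given for dental, alveolar and alveopalatal fricatives can be written as a small lookup. This is a sketch under a strong simplifying assumption: that the lower edge of the major noise energy has already been measured and that it alone decides the place. The function name and the "undetermined" fallback are hypothetical.

```python
# Lower edges of the major noise energy, as listed on the slide (Hz).
# Checked from highest to lowest so the first match wins.
PLACE_BY_LOWER_EDGE_HZ = [
    (6000, "dental"),
    (4000, "alveolar"),
    (2000, "alveopalatal"),
]


def guess_fricative_place(lower_edge_hz):
    """Guess the place of a fricative from where its noise energy starts."""
    for threshold_hz, place in PLACE_BY_LOWER_EDGE_HZ:
        if lower_edge_hz >= threshold_hz:
            return place
    # Labiodental and glottal fricatives are identified by other cues.
    return "undetermined"


print(guess_fricative_place(4500))
```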

58 Fricatives (cntd.)
Weak vs. strong. Strong [s, z, š, ž]: darker bands. Weak [f, v, θ, ð]: spread and fainter. Voiced [v, ð]: often so weak that they are confused with nasals or approximants. Cue to tell [θ] from [f]: the higher formants of [θ] fall into adjacent vowels.

59 Example – “fie, thigh, sigh, shy”

60 Example – “ever, weather, fizzer, pleasure”

61 Nasals
General: Formants similar to those of vowels but fainter. Very low F1 (about 250 Hz), F2 (about 2500 Hz), and F3 (about 3250 Hz). Place distinction: Bilabial [m]: downward F2, F3 locus. Alveolar [n]: smaller F2 transition. Velar [ŋ]: velar pinch.

62 Examples -- “a Pam, a tan, a kang”

63 Liquids & Approximants
General: Formants similar to those of vowels but fainter (especially in high-frequency regions). Approximately F1 (250 Hz), F2 (1200 Hz), F3 (2400 Hz). Slow formant movements.

64 Liquids & Approximants (cntd.)
Phone-specific properties. Labial glide [w]: very low F1, F2 ( Hz), which come very close to each other; relatively low F3; rapid falloff of spectral amplitude (formant movements). Palatal glide [y]: extremely low F1; extremely high F2, F3.

65 Liquids & Approximants (cntd.)
Phone-specific properties (cntd.). Flap [ɾ]: soft burst, short duration. Retroflex [r]: F3 dipping down close to F2; general lowering of F3, F4. Lateral [l]: low F1, F2 (approx. F1 250 Hz, F2 1200 Hz); usually substantial energy in the high-frequency region.

66 Example – “led, red, wed, yell”

67 Final remarks on spectrogram
The spectrogram is not the only cue for the acoustic distinction of speech sounds. When there is a mismatch between waveform and spectrogram, the waveform is generally more reliable.

68 References & Links

