Investigating physical properties of speech sounds

Presentation transcript:

Acoustic Phonetics: Investigating physical properties of speech sounds
From chapter 7, Rogers (2000)

Speech Sound Representation Reconsidered
Articulatory phonetic approach: describing sounds according to how they are produced.
Problems with this approach:
The representation consists only of symbols → sounds are not like that in reality.
It does not reflect that some pairs of sounds are easily confused in perception while others are not, e.g. i/e vs. a/k, s/f vs. s/m.
So we need another way of describing speech sounds.
Reviving Sonus

Acoustic representation of speech sounds
Represents sounds as they are.
Visual rather than symbolic representation.
Depends more on perception than on production or articulation.
Physical properties are analyzed.
Similarities and differences between sounds are revealed.

Acoustic definition of sound
Variation in air pressure; movements of air particles.
An audible disturbance of a medium produced by a source.
The source: any object that vibrates, e.g. musical instruments, human vocal cords, loudspeakers.
The medium: any elastic substance that carries the vibration, e.g. air, water.

Advantages of acoustic representation
The real, physical mechanism of speech communication is represented.
No convention, no confusion, no controversy.
Gradual changes in sounds are shown, e.g. how loud a sound is.
Small variations are shown.
Helpful for understanding how computers synthesize speech and how speech recognition works.

What to represent?
Three aspects in which sounds can differ:
Pitch
Loudness
Quality
(Length)

How to represent acoustically?
Sound is the movement of air particles.
The best and generally agreed way of expressing air particle movement: waveform.
Another necessary way of representing sound: spectrum.

Waveforms

Waveform properties
Simple harmonic motion + elapsed time → waveform.
Individual particles move only backward and forward.

Air particle movement
[Figure: displacement of an air particle over time, with no force and with an initial force; labels: elasticity, inertia, displacement, time.]

Simple Waveform

Speech sound properties shown in waveforms
Differentiation of sounds: sounds differ from one another, which is crucial for human speech as a means of communication.
Ways in which sounds can differ:
Perceptually: pitch, loudness, quality.
Acoustically: frequency, amplitude, phase.
A waveform shows differences in:
Amplitude, the acoustic correlate of loudness.
Frequency, the acoustic correlate of pitch.

Amplitude representing Loudness
(b) The amplitude of the red waveform is twice that of the blue one → the red sound is louder.

Amplitude (cntd.)
Air pressure fluctuation: the extent of the maximum variation in air pressure from normal during a sound.
Units: bel (B) and decibel (dB; 1/10 of a bel). The Bark, by contrast, is a unit of the perceptual frequency scale, not of amplitude.
dB: ten times the common logarithm of a power ratio.
Twice the amplitude is not heard as twice as loud.
Loud sound: particles move farther and more rapidly.
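The logarithmic relation between amplitude and decibels can be sketched in a few lines. This is a minimal illustration, assuming the standard definitions (ten times the common logarithm of a power ratio, with power proportional to amplitude squared); the function names are made up for this example and do not come from the slides.

```python
import math

def power_ratio_to_db(power, reference_power):
    """Level difference in dB between two powers: 10 * log10(P / P_ref)."""
    return 10.0 * math.log10(power / reference_power)

def amplitude_ratio_to_db(amplitude, reference_amplitude):
    """Power goes with amplitude squared, so the factor becomes 20."""
    return 20.0 * math.log10(amplitude / reference_amplitude)

# Doubling the amplitude raises the level by roughly 6 dB.
doubling_db = amplitude_ratio_to_db(2.0, 1.0)

# A tenfold increase in power is 10 dB, i.e. one bel.
tenfold_power_db = power_ratio_to_db(10.0, 1.0)
```

A gain of about 6 dB is generally not heard as "twice as loud"; a commonly cited rule of thumb places a doubling of perceived loudness closer to a 10 dB increase.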

Frequency representing Pitch (a) (b)

Frequency (cntd.)
The rate at which a sound source vibrates.
Sound sources: tuning forks, vocal cords, etc.
Units: Hz, cps (cycles per second).
Depends on, e.g., the length of a pendulum or the length of a tuning fork's prongs.
Frequency = 1 / period (f = 1/T).

Frequency (cntd.)
Standard A frequency: 440 Hz.
Octave: a note with exactly twice the frequency of another note, e.g. A (440 Hz), A' (880 Hz), A'' (1760 Hz).
Audible frequency range:
Human: 20 Hz (or 16 Hz) to 20 kHz.
Bats: 20 kHz to 100 kHz.
Upper limit of telephone transmission: about 3.5 kHz.
Most human speech sound energy: below 8 kHz.
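The two relations used so far, period as the reciprocal of frequency and the octave as a doubling of frequency, can be checked with a short sketch. The helper names are hypothetical and chosen only for this illustration.

```python
def period_seconds(frequency_hz):
    """Duration of one cycle: T = 1 / f."""
    return 1.0 / frequency_hz

def octave_up(frequency_hz, octaves=1):
    """Each octave up doubles the frequency."""
    return frequency_hz * 2 ** octaves

standard_a = 440.0
# A, A' and A'' from the slide: 440, 880 and 1760 Hz.
a_octaves = [octave_up(standard_a, n) for n in range(3)]

# One cycle of the 440 Hz tone lasts a little under 2.3 milliseconds.
a_period_ms = period_seconds(standard_a) * 1000.0
```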

Frequency (cntd.)
Pitch and frequency are not linearly related.
The relation is fairly linear only at low frequencies.
The step from 600 to 700 Hz sounds larger than the step from 3600 to 3700 Hz.
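One way to make the nonlinear relation concrete is a perceptual pitch scale. The sketch below uses a widely used mel-scale approximation, mel = 2595 * log10(1 + f / 700); this formula is not part of the slides and is only one of several approximations, so the numbers should be read as illustrative.

```python
import math

def hz_to_mel(frequency_hz):
    """Approximate perceived pitch in mels (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

# The same 100 Hz step, taken low and high in the frequency range.
low_step_mel = hz_to_mel(700.0) - hz_to_mel(600.0)
high_step_mel = hz_to_mel(3700.0) - hz_to_mel(3600.0)

# On this scale the low step is several times larger than the high one.
step_ratio = low_step_mel / high_step_mel
```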

Phase difference

Phase (cntd.)
Phase differences produce different waveforms.
But human ears generally do not perceive phase differences.
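A small numerical sketch can show what a phase difference does and does not change. It builds two waves from the same 100 Hz and 200 Hz components, shifts the phase of the second component in one of them, and compares both the waveforms and the component magnitudes. The sample rate, amplitudes and function names are assumptions made for the illustration, and the plain discrete Fourier transform used here is just one way to obtain the magnitudes.

```python
import cmath
import math

SAMPLE_RATE = 2000   # assumed, in Hz
NUM_SAMPLES = 200    # 0.1 s, a whole number of cycles of both components

def make_wave(second_phase):
    """100 Hz plus a weaker 200 Hz component with an adjustable phase."""
    return [
        math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE + second_phase)
        for i in range(NUM_SAMPLES)
    ]

def magnitudes(samples):
    """Magnitude of each DFT bin up to half the sample rate."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))) / n
        for k in range(n // 2)
    ]

wave_a = make_wave(0.0)
wave_b = make_wave(math.pi / 2)

# The waveforms differ visibly ...
max_waveform_difference = max(abs(a - b) for a, b in zip(wave_a, wave_b))
# ... but the strength of each frequency component is the same.
max_magnitude_difference = max(
    abs(a - b) for a, b in zip(magnitudes(wave_a), magnitudes(wave_b))
)
```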

Waveform is not sufficient
Two sounds with the same pitch and loudness can still differ, e.g. violin A vs. piano A, or [i] vs. [a].
Another way of representation is needed: spectrum.

More about waveforms first
To understand the spectrum and how it represents quality, we need to know more about waveforms.

Types of Waveforms: Pure tones vs. Complex waves
Most sound sources, including the human voice, produce complex vibrations.
Pure tone: simple harmonic motion (SHM), with only one frequency.
Complex wave: more than one harmonic motion, with multiple frequencies.
Pure tone + pure tone of the same frequency and phase → another pure tone.
Pure tone + pure tones of different frequencies → a complex tone.

Pure tone (Simple Wave, simple harmonic motion, Sinusoid, Sine wave)

Complex wave: 100 Hz + 200 Hz + 300 Hz
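The 100 Hz + 200 Hz + 300 Hz example can be reproduced as a sum of sinusoids. The sketch below assumes equal amplitudes for the three components (the slide does not state them) and uses a hypothetical helper name; it also checks that the complex wave repeats with the period of its lowest component, 10 ms.

```python
import math

def complex_wave(time_s, components):
    """Sum of sinusoids; components is a list of (frequency_hz, amplitude)."""
    return sum(
        amplitude * math.sin(2 * math.pi * frequency * time_s)
        for frequency, amplitude in components
    )

# Amplitudes are assumed equal for illustration.
components = [(100, 1.0), (200, 1.0), (300, 1.0)]

fundamental_period = 1 / 100   # 10 ms, the period of the 100 Hz component
t = 0.0013                     # an arbitrary moment in time

value_now = complex_wave(t, components)
value_one_period_later = complex_wave(t + fundamental_period, components)
value_half_period_later = complex_wave(t + fundamental_period / 2, components)
```

The wave returns to the same value after one full period of the 100 Hz component but not after half of it, which is why the combined wave is heard with the pitch of its lowest component.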

Complex wave: [a] produced by a female speaker

Types of Waveform: Repetitive vs. non-repetitive waves
Strictly repetitive (periodic): sine wave, ideal sounds.
Virtually repetitive (quasi-periodic): vowels, sonorants.
Non-repetitive (aperiodic): obstruents, white noise (most complex), clicks.
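Whether a wave repeats can be tested numerically by comparing it with a copy of itself shifted by one candidate period. The sketch below uses a normalized autocorrelation for this; the technique, the sample rate and the seeded random noise standing in for an aperiodic sound are all assumptions chosen for the illustration.

```python
import math
import random

def autocorrelation_at_lag(samples, lag):
    """Near 1 if the signal repeats after `lag` samples, near 0 if it does not."""
    n = len(samples) - lag
    head = samples[:n]
    tail = samples[lag:lag + n]
    numerator = sum(a * b for a, b in zip(head, tail))
    denominator = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return numerator / denominator

SAMPLE_RATE = 8000            # assumed, in Hz
LAG = SAMPLE_RATE // 100      # one period of a 100 Hz wave, in samples

# Strictly periodic: a 100 Hz sine wave.
periodic = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(1600)]

# Aperiodic: uniform random noise with a fixed seed for repeatability.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

periodic_score = autocorrelation_at_lag(periodic, LAG)
noise_score = autocorrelation_at_lag(noise, LAG)
```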

Periodic vs. non-periodic wave
Aperiodic wave: [s]. Periodic wave: [a].

Limitation of Waveform Representation
A sound can be heard to differ in three ways: loudness, pitch, quality.
Quality cannot be represented directly in waveforms.
A new way of representation is needed: spectrum.

Spectrum

Background Knowledge on Spectrum
Sound waves can be either simple or complex.
Simple: sinusoid.
Complex: a combination of simple waves with different frequencies.
Sound quality is determined by the way such simple waves combine into a complex wave.
If a complex wave can be split into its component simple waves, we can see what determines its quality.

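Waveform and Spectrum (100 Hz + 200 Hz + 300 Hz)
[Figure: a waveform and its spectrum; spectrum x-axis: frequency, with components at 100, 200 and 300 Hz; y-axis ticks at 2 and 4.]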
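Splitting a complex wave back into its simple components is what a Fourier analysis does. The sketch below builds a 100 Hz + 200 Hz + 300 Hz wave and recovers the three components with a plain discrete Fourier transform. The component amplitudes (4, 2 and 1), the sample rate and the function name are assumptions made for the illustration, not values read from the slide.

```python
import cmath
import math

def amplitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, amplitude) pairs up to half the sample rate."""
    n = len(samples)
    pairs = []
    for k in range(n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        # For k > 0, 2 * |X[k]| / n recovers a sinusoid's amplitude.
        pairs.append((k * sample_rate / n, 2 * abs(coefficient) / n))
    return pairs

SAMPLE_RATE = 1000                               # assumed, in Hz
assumed_amplitudes = {100: 4.0, 200: 2.0, 300: 1.0}

# 0.1 s of the complex wave: a whole number of cycles of every component.
samples = [
    sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE)
        for f, a in assumed_amplitudes.items())
    for i in range(100)
]

# Keep only the bins that carry noticeable energy.
peaks = [
    (frequency, round(amplitude, 3))
    for frequency, amplitude in amplitude_spectrum(samples, SAMPLE_RATE)
    if amplitude > 0.5
]
```

Plotting `peaks` as vertical lines at their frequencies gives the kind of line spectrum the slide describes: one line per simple wave, with height showing its strength.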

An Example of Spectrum

Formants shown in spectrum
Formant: frequency component(s) with boosted energy.
Formant frequency: the frequency of a formant.
Reason for formant shaping: the filtering function of the vocal tract.
Formants are a decisive aspect of sound quality.
For vowels, three formants (F1, F2, F3) are especially important for distinguishing them.

An Example of Formant: Vowel [ə]

An Example of Formant: Vowel [e]
[Figure: spectrum of [e]; x-axis: frequency, 50 to 4050 Hz; y-axis: amplitude; peaks labelled F1, F2, F3.]

Disadvantages of Spectrum Representation
Less intuitive: the x-axis denotes frequency.
No representation of variation over time.
Hard to see the relation to the waveform.
Thus, a new way of representation is needed → spectrogram.

Spectrogram & its reading

What is a spectrogram?
In use since the 1940s.
Another representation of frequency-domain analysis.
The most popular way of representing spectral information.
Three-dimensional representation:
X-axis: time.
Y-axis: frequency.
Darkness (or color): energy.
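The three dimensions of a spectrogram can be illustrated by computing a spectrum for each short stretch of a signal in turn. The sketch below is a minimal short-time Fourier analysis in plain Python, applied to a test tone that jumps from 100 Hz to 300 Hz; the frame length, sample rate, Hann window and function names are all assumptions for the illustration, and real tools are far more efficient.

```python
import cmath
import math

def frame_magnitudes(frame):
    """Magnitude of each DFT bin of one frame, up to half the sample rate."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]

def spectrogram(samples, frame_length, hop):
    """Rows are time frames, columns are frequency bins, values are magnitudes."""
    # Hann window: tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_length - 1))
              for i in range(frame_length)]
    rows = []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = [x * w for x, w in zip(samples[start:start + frame_length], window)]
        rows.append(frame_magnitudes(frame))
    return rows

def peak_frequency(row, sample_rate, frame_length):
    """Frequency (Hz) of the strongest bin in one time frame."""
    return row.index(max(row)) * sample_rate / frame_length

SAMPLE_RATE = 2000    # assumed, in Hz
FRAME_LENGTH = 100    # 50 ms frames, so bins are 20 Hz apart

# 0.2 s at 100 Hz followed by 0.2 s at 300 Hz.
signal = (
    [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(400)]
    + [math.sin(2 * math.pi * 300 * i / SAMPLE_RATE) for i in range(400)]
)

rows = spectrogram(signal, FRAME_LENGTH, hop=FRAME_LENGTH)
first_peak = peak_frequency(rows[0], SAMPLE_RATE, FRAME_LENGTH)
last_peak = peak_frequency(rows[-1], SAMPLE_RATE, FRAME_LENGTH)
```

Drawing each row as a column of pixels, darker where the magnitude is larger, would give time on the x-axis and frequency on the y-axis, with the dark band stepping up from 100 Hz to 300 Hz halfway through.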

Waveform & Spectrogram aligned

Spectrogram example (color resolution of word “compute”)

Spectrogram example (grayscale of word “compute”)

Types of spectrogram
Wideband spectrogram: better time resolution.
Narrowband spectrogram: better frequency resolution.
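The trade-off between time and frequency resolution follows from the length of the analysis window: a short window localizes events in time but smears frequency, and a long window does the opposite. The sketch below uses the rule of thumb that frequency resolution is roughly the reciprocal of the window duration; the window lengths (3 ms and 30 ms) and the 100 Hz voice pitch are assumed values for illustration only.

```python
def analysis_bandwidth_hz(window_seconds):
    """Rule of thumb: frequency resolution is about 1 / window duration."""
    return 1.0 / window_seconds

# Assumed window lengths, chosen only to illustrate the contrast.
wideband_bandwidth = analysis_bandwidth_hz(0.003)    # short window, about 333 Hz
narrowband_bandwidth = analysis_bandwidth_hz(0.030)  # long window, about 33 Hz

# With harmonics spaced at an assumed 100 Hz voice pitch, only the
# narrowband analysis is fine enough to separate neighbouring harmonics.
assumed_pitch_hz = 100.0
narrowband_resolves_harmonics = narrowband_bandwidth < assumed_pitch_hz
wideband_resolves_harmonics = wideband_bandwidth < assumed_pitch_hz
```

This is consistent with harmonics being visible as horizontal lines in narrowband spectrograms, while wideband spectrograms show broad formant bands and fine detail in time instead.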

Wideband vs. narrowband spectrograms of the question "Is Pat sad, or mad?"
The 5th, 10th and 15th harmonics have been marked by white squares in two of the vowels.

Advantages & Disadvantages
Advantages: time alignment.
Disadvantages: less reliable than the waveform.

Vowel Spectrogram
Formant frequencies are critical cues for vowel distinction.
F1: height. High vowels have a low F1.
F2: backness. Back vowels have a low F2.
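The two rules (low F1 for high vowels, low F2 for back vowels) can be turned into a rough description of a vowel from its first two formants. The sketch below is only an illustration: the boundary values of 450 Hz and 1500 Hz and the function name are hypothetical, and real vowel classification depends on the speaker and needs more than two fixed thresholds.

```python
def describe_vowel(f1_hz, f2_hz, f1_boundary=450.0, f2_boundary=1500.0):
    """Rough height and backness from F1 and F2, using assumed boundaries."""
    height = "high" if f1_hz < f1_boundary else "non-high"
    backness = "back" if f2_hz < f2_boundary else "front"
    return f"{height} {backness}"

# Illustrative formant pairs, roughly in the range of an adult male speaker.
heed_like = describe_vowel(280, 2250)   # low F1, high F2
whod_like = describe_vowel(310, 870)    # low F1, low F2
hod_like = describe_vowel(710, 1100)    # high F1, low F2
```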

Examples of formant frequencies of English monophthongs (Hz)
[Table: the vowel symbols heading the columns are not recoverable, apart from [ə] and [ʌ].]
F3: 2900 2550 2490 2640 2380 2300 2500 2390
F2: 2250 1900 1770 1660 1100 1030 870 1500 1190
F1: 280 400 550 690 710 450 310 900 640

"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)
From http://hctv.humnet.ucla.edu/departments/linguistics

Consonant Spectrogram
General:
Acoustic structure is more complicated than that of vowels.
Adjacent sounds (especially vowels) convey important information → locus.
High-frequency characteristics matter, especially for fricatives and affricates.

What is LOCUS
Information carried by the formant transitions from vowels into obstruents or from obstruents into vowels.
The target frequency that each formant transition heads toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released.
It is characteristic of the consonant's place and manner → roughly the same in different vowel contexts.

Stops
General:
Fairly distinct locus for each place.
Burst.
Silence during the closure (only at syllable onset position).
Virtually no difference between stops during the closure.

Stops (cntd.)
Voicing distinction:
Voiced: vertical striations, less abrupt burst; frequently weakened to resemble fricatives or approximants.
Voiceless: generally an abrupt burst in the higher frequency area.

Stops (cntd.)
Place distinction (bilabial, alveolar, velar):
Bilabial: relatively low F2 and F3 locus → formants rise into and fall out of the vowel; weak and spread vertical lines.
Alveolar: F2 locus at about 1800 Hz; strong vertical lines.
Velar: velar pinch (the vowel's F2 and F3 merging); often a double burst; long formant transitions.

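Stops (cntd.)
Manner distinction. Cues: silence duration, VOT, F0 of the following vowel.
Aspirated [pʰ]: short, long, high
Tense [p’]
Lax [p]: mid, low
[Table partly flattened: the values are given in source order, and the remaining cells are not recoverable.]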

Examples: “a bab, a dad, a gag”

Place dependent loci

Fricatives
General: random noise pattern, especially in high-frequency regions.
Place distinction:
Labiodental [f, v]: rising locus into the following vowel.
Dental [θ, ð]: major energy above 6000 Hz.
Alveolar [s, z]: major energy above 4000 Hz.
Alveopalatal [ʃ, ʒ]: major energy above 2000 Hz.
Glottal [h]: traces of the formant frequencies of neighbouring vowels.

Fricatives (cntd.)
Weak vs. strong:
Strong [s, z, ʃ, ʒ]: darker bands.
Weak [f, v, θ, ð]: spread and fainter.
Voiced [v, ð]: often so weak that they are confused with nasals or approximants.
Cue for telling [θ] from [f]: the higher formants of [θ] fall into adjacent vowels.

Example: “fie, thigh, sigh, shy”

Example: “ever, weather, fizzer, pleasure”

Nasals
General:
Formants similar to those of vowels but fainter.
Very low F1 (about 250 Hz), with F2 at about 2500 Hz and F3 at about 3250 Hz.
Place distinction:
Bilabial [m]: downward F2, F3 locus.
Alveolar [n]: smaller F2 transition.
Velar [ŋ]: velar pinch.

Examples: “a Pam, a tan, a kang”

Liquids & Approximants
General:
Formants similar to those of vowels but fainter (especially in high-frequency regions).
Approximately F1 250 Hz, F2 1200 Hz, F3 2400 Hz.
Slow formant movements.

Liquids & Approximants (cntd.)
Phone-specific properties:
Labial glide [w]: very low F1, F2 (600-1000 Hz), which come very close to each other; relatively low F3; rapid falloff of spectral amplitude (formant movements).
Palatal glide [y]: extremely low F1; extremely high F2 and F3.

Liquids & Approximants (cntd.)
Phone-specific properties (cntd.):
Flap [ɾ]: soft burst, short duration.
Retroflex [r]: F3 dips down close to F2; general lowering of F3 and F4.
Lateral [l]: low F1 and F2 (approx. F1 250 Hz, F2 1200 Hz); usually substantial energy in the high-frequency region.

Example: “led, red, wed, yell”

Final remarks on spectrogram
The spectrogram is not the only cue for the acoustic distinction of speech sounds.
When there is a mismatch between waveform and spectrogram, the waveform is generally more reliable.

References & Links
http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html
http://hctv.humnet.ucla.edu/departments/linguistics/VowelsandConsonants/course
http://www.cs.indiana.edu/~port/teach/306/speech.acoustics.html
http://www.phon.ucl.ac.uk/courses/spsci/b203/week2-5.pdf