Laurel or Yanny? Brad Story Speech, Language, and Hearing Sciences


Laurel or Yanny? Brad Story, Speech, Language, and Hearing Sciences, University of Arizona, May 22, 2018

During May 2018, a Twitter meme containing an audio file from an online dictionary asked readers to decide whether they heard the name “Laurel” or “Yanny” (the original audio was linked to the word “laurel”). The meme went viral, and responses showed a nearly even split between listeners for each name. I was initially contacted by one media source to comment on the meme and provided a brief acoustic analysis; my explanation and graphic are shown to the right. This was subsequently (and very rapidly) found by other media sources, resulting in more requests for comment.

Other explanations were circulating too. Signal quality, compression, spectral tilt, perceptual weighting, high- versus low-frequency bands, and masking effects were some of the areas of focus, all possible contributors to the effect. Another idea was that the audio file contained two “tracks,” one that was “Laurel” and another “Yanny,” which had somehow been merged to produce the dichotomy (this explanation suggests the entire event was premeditated).

It was also shown that shifting all frequencies in the audio signal downward by 20–30% revealed “Yanny” to most listeners. That was an interesting finding because it suggests that the third formant in “Laurel,” when shifted downward in frequency, could be taken as the second formant in “Yanny”; in addition, the downward shift likely produces conditions where F1 and F2 of “Laurel” perceptually merge and become a single formant.

These slides are provided to be used as a demonstration of the Laurel/Yanny effect.
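The downward frequency shift described above can be approximated with simple resampling, which scales every frequency component by the same factor (like changing tape speed, so the duration changes too). This is a minimal sketch in Python/NumPy, not the method used for the original meme manipulations; the function name and test frequencies are illustrative:

```python
import numpy as np

def frequency_shift(x, scale):
    """Scale every frequency in x by `scale` (e.g. 0.8 shifts all
    components down by 20%). Implemented by linear-interpolation
    resampling, so the output is also 1/scale times longer,
    as with a tape-speed change."""
    n_out = int(round(len(x) / scale))
    t_out = np.linspace(0.0, len(x) - 1.0, n_out)
    return np.interp(t_out, np.arange(len(x)), x)

# A 1000 Hz tone shifted by 0.8 comes out at roughly 800 Hz.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
shifted = frequency_shift(tone, 0.8)
spectrum = np.abs(np.fft.rfft(shifted))
peak_hz = np.argmax(spectrum) * fs / len(shifted)
```

The same idea applied to the “laurel” recording lowers F3 toward the region where a second formant of “yanny” would sit.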

Laurel or Yanny? Brad Story, University of Arizona, 05.22.2018

In order to rule out any bizarre or particularly special quality of the original Twitter audio signal, and to allow flexible control of voice source and vocal tract parameters, a version of “laurel” was generated with TubeTalker*, a computational model of speech production. The audio signal was generated by specifying the vocal tract modulations (analogous to articulatory movements) required for the word “laurel”; that is, nothing “Yanny”-like was specified at all.

The unaltered TubeTalker version is shown below in the first panel (far left) as a waveform and narrowband spectrogram. The black dots mark the vocal tract resonances (formants) calculated during production of the word. The gray audio icon plays the original signal, whereas the orange icon plays a pre-emphasized version. Pre-emphasis tilts the spectrum by +6 dB per octave, which may enhance the effect.

Frequency-shifted versions are shown from left to right. The number above each sample indicates the frequency scale factor (e.g., .9 means all frequencies were shifted downward by 10%). Listen to each and decide: Laurel or Yanny? If you play the samples consecutively from left to right and then from right to left, you may experience a hysteresis effect; that is, you may shift from “Yanny” back to “Laurel” at a different sample than where you shifted from “Laurel” to “Yanny.”

1 .9 .85 .8 .78 .75 .7

TubeTalker TubeTalker (pre-emphasized)
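The pre-emphasis mentioned above is commonly implemented as a first-order difference filter, y[n] = x[n] − α·x[n−1], which tilts the spectrum up by roughly +6 dB per octave. A sketch under that assumption (the coefficient 0.97 is a conventional choice, not necessarily the value used for these slides):

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].
    High frequencies are boosted by roughly +6 dB per octave."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# The filter's magnitude response rises monotonically with frequency:
impulse = np.zeros(64)
impulse[0] = 1.0
H = np.abs(np.fft.rfft(pre_emphasize(impulse), 1024))
```

Because the boost emphasizes the higher-frequency energy around F2/F3, it can nudge ambiguous listeners toward the “Yanny” percept.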

Laurel or Yanny? Brad Story, University of Arizona, 05.22.2018

The audio samples below are the entire series of frequency-shifted versions of TubeTalker’s “Laurel”. The progression follows this set of frequency shift scale factors: 1, .9, .85, .8, .78, .75, .70, .75, .78, .8, .85, .9, 1.

TubeTalker TubeTalker (pre-emphasized)

*TubeTalker references:
Story, B. H. (2005). A parametric model of the vocal tract area function for vowel and consonant simulation, J. Acoust. Soc. Am., 117(5), 3231–3254.
Story, B. H., and Bunton, K. (2010). Relation of vocal tract shape, formant transitions, and stop consonant identification, J. Spch. Lang. Hear. Res., 53, 1514–1528.
Story, B. H. (2013). Phrase-level speech simulation with an airway modulation model of speech production, Computer Speech and Language, 27(4), 989–1010.
Story, B. H., and Bunton, K. (2017). An acoustically-driven vocal tract model for stop consonant production, Speech Comm., 87, 1–17. Final version published online: 20-Dec-2016. DOI: 10.1016/j.specom.2016.12.001.

Laurel or ???? Brad Story, University of Arizona, 05.22.2018

Just for fun, the audio samples below are the entire series of frequency-shifted versions of TubeTalker’s “Laurel” played in reverse. The progression again follows this set of frequency shift scale factors: 1, .9, .85, .8, .78, .75, .70, .75, .78, .8, .85, .9, 1.

TubeTalker - Reversed TubeTalker - Reversed (pre-emphasized)
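Playing a sample in reverse is simply time reversal of the sample array: the spectral content at each instant is preserved, but the formant transitions run backwards. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

def reverse_audio(x):
    """Time-reverse a signal. Short-time spectral magnitudes are
    unchanged; only the temporal order of events is flipped."""
    return np.asarray(x)[::-1].copy()

clip = np.array([0.1, 0.5, -0.2, 0.7])
backwards = reverse_audio(clip)
restored = reverse_audio(backwards)
```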