Hallucinations in Auditory Perception. Malcolm Slaney, Yahoo! Research and Stanford CCRMA.


Hadoop

Representations: one-dimensional (waveform), two-dimensional (not a spectrogram), three-dimensional (neural movie). Cochlear processing maps pressure vs. time into cochlear place vs. time; correlogram processing adds an autocorrelation-lag axis.

Correlogram (with help from Richard O. Duda). Axes: center frequency (distance down the cochlea) vs. autocorrelation lag (time interval, s).
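
The correlogram slide amounts to a short-time autocorrelation computed in every cochlear channel. A minimal sketch of that computation, with a generic filterbank output standing in for a real cochlear model:

```python
import numpy as np

def correlogram_frame(channels, max_lag):
    """Short-time autocorrelation per cochlear channel.

    channels: array (n_channels, n_samples) of filterbank outputs
    (a hypothetical stand-in for a cochlear model).
    Returns an (n_channels, max_lag) correlogram frame, each row
    normalized by its zero-lag energy.
    """
    n_ch, n = channels.shape
    frame = np.zeros((n_ch, max_lag))
    for c in range(n_ch):
        x = channels[c]
        for lag in range(max_lag):
            frame[c, lag] = np.dot(x[:n - lag], x[lag:])
        if frame[c, 0] > 0:
            frame[c] /= frame[c, 0]
    return frame

# A 100 Hz periodicity shows up as a peak at lag = fs/100 samples.
fs = 8000
t = np.arange(fs // 4) / fs
channels = np.vstack([np.sin(2 * np.pi * 100 * t),
                      np.sin(2 * np.pi * 200 * t)])
frame = correlogram_frame(channels, 200)
```

Each row peaks at the channel's dominant period (lag 80 for 100 Hz at 8 kHz), which is what makes the correlogram useful for pitch and separation.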

Success: reconstructing sound from the correlogram (NIPS keynote).

Problems –Continuation (tone and noise; Parliament cough) –Hear two voices? What do you hear? –Waveforms? –Ideas?

Cochlear processing (pressure vs. time into cochlear place vs. time) followed by correlogram processing (adds an autocorrelation-lag axis).

Examples: wedding, sine-wave speech, natural speech.

What vowel is this? Word 1, Word 2, Word 3 (Peter Ladefoged).

McGurk

Cross-modal examples: sine-wave speech object; wedding object; vision + audio locate (ventriloquism; dots); vision + speech (McGurk); speech + environment (vowel?).

ASR: word model showing the phonemes /w/ /ʌ/ /n/ and states S1, S2, S3 for the word “one”; acoustic (phoneme) model for the phoneme /ʌ/; language model for the words “one”, “two”, “three”.
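
The word model on this slide is a left-to-right chain of phoneme states; decoding picks the best state path given the acoustics. A toy Viterbi sketch for the word “one” (all transition and emission probabilities here are illustrative, not from the talk):

```python
import numpy as np

# Toy left-to-right word model for "one": states for /w/, /ah/, /n/.
trans = np.array([[0.5, 0.5, 0.0],   # /w/ -> /w/ or /ah/
                  [0.0, 0.5, 0.5],   # /ah/ -> /ah/ or /n/
                  [0.0, 0.0, 1.0]])  # /n/ self-loop
# emis[state, obs]: likelihood of each quantized acoustic frame.
emis = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])

def viterbi(obs, trans, emis, init):
    """Most likely state sequence for an observation sequence."""
    delta = init * emis[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] * trans
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * emis[:, o]
    # Trace back the best state sequence.
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

path = viterbi([0, 0, 1, 1, 2, 2], trans, emis, np.array([1.0, 0.0, 0.0]))
# path -> [0, 0, 1, 1, 2, 2]: the decoder walks /w/ -> /ah/ -> /n/.
```

A language model would sit on top of this, scoring competing word chains the same way the word model scores phoneme chains.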

Conventional Scene Analysis Slide by Dan Ellis (Columbia)

Barker—ASR

Goto—CASA with a MIDI sequence.

Old plus New Principle Slide by Dan Ellis (Columbia)

Ellis—Prediction Driven

Saliency

Saliency example: a time-frequency display; the saliency map shows high-interest locations.

Saliency maps: longer tones are more salient; missing parts are salient; modulation is more salient; forward masking works.
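
One common way to turn a time-frequency display into a saliency map (a generic center-surround sketch, not necessarily the model used in the talk) is to score each cell by its deviation from a local average, so onsets, gaps, and modulated regions stand out:

```python
import numpy as np

def saliency_map(tf, size=5):
    """Center-surround saliency on a time-frequency display.

    Each cell's saliency is its deviation from a local box-filter
    average over a size x size neighborhood.
    """
    pad = size // 2
    padded = np.pad(tf, pad, mode='edge')
    surround = np.zeros_like(tf, dtype=float)
    h, w = tf.shape
    for i in range(h):
        for j in range(w):
            surround[i, j] = padded[i:i + size, j:j + size].mean()
    return np.abs(tf - surround)

# A lone tone burst in a flat background is highly salient.
tf = np.zeros((20, 20))
tf[10, 10] = 1.0
sal = saliency_map(tf)
```

The isolated burst dominates the map, consistent with the slide's point that locally unexpected structure draws attention.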

Sound examples: bird calls, cows, horse, waterfall.

Saliency comparison: details of the comparison and model predictions.

Relational network (simple): quantities X, Y, Z with measurement patches M_X, M_Y, M_Z. Each patch of neurons measures one quantity; bidirectional relations carry feedback and feedforward. Thanks to Rodney Douglas.

Relational network (example): input, relational specification, relational feedback.

ASR relational network: cochlea, delay, phone recognizer, word recognizer. Each patch of neurons gives one of N outputs; bidirectional links enforce phoneme/word constraints. Note: we don’t know how to represent delays.

Desired results: /A/ and /I/ phoneme patches feed AI and IA word patches; with relational feedback (versus without), the phoneme input “AI” settles on the correct word patch.

Simulation

Simulation 2

Simulation 3

Grossberg—ART

Statistical means: ICA (different distributions); one microphone (GMM models of the distribution).

Conventional

Better?

Thanks

Pitch

Silicon frequency response: tone ramps into two cochleas.

Cochlear Best Frequency

Cochlear rate profiles: left cochlea and right cochlea, spikes per utterance.

Hardware overview: cochlea, learning, phoneme, word, with PCI-AER for remapping. Cochlea by Shih-Chii Liu and Giacomo Indiveri; implemented in MATLAB.

LSH Movie

Auditory map by Lloyd Watts.

Please do more neurophysiology! David, Jerry, Prabhakar.

Timbre definition: sound color (instruments, vowels), both static and dynamic. Timbre, pitch, and loudness together describe all sound.

Multi-dimensional scaling of timbre: measure distances, estimate positions, label the axes (the art). Axes found: spectral flux, decay, spectral centroid. McAdams et al. (1995).
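
The “measure distances, estimate positions” step is classical multidimensional scaling. A compact sketch (Torgerson's method: double-center the squared dissimilarities, then keep the top eigenvectors):

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical MDS: coordinates from a dissimilarity matrix.

    d: symmetric matrix of pairwise dissimilarities between stimuli.
    Returns n x k coordinates whose Euclidean distances best
    reproduce d.
    """
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Four points on a unit square: MDS recovers the square's geometry
# (up to rotation/reflection) from the distances alone.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
x = classical_mds(d, k=2)
d_hat = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
```

With perceptual dissimilarity ratings in place of exact distances, the recovered axes are what gets labeled spectral centroid, flux, and decay.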

Desired perception model: compact (parsimonious), with three properties –predictive (explains distance perception) –simple (orthogonal axes) –linear (interpolates between sounds A and B). Tested assumption: Euclidean distance.

Experimental contrast. Old way: sound to perception, then guess a model that fits the data. New way: sound through an explicit parameter space to perception, so the model is tested directly.

Spectral shape using MFCC for the utterance “A huge tapestry hung in her hallway”; horizontal axis: time (frames).

MFCC and LFC. MFCC: sound → spectrum → filterbank → log10 → DCT. LFC: sound → spectrum → DCT.
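
The two pipelines differ only in the middle stages. A sketch of both, where a random nonnegative matrix stands in for a real mel filterbank (building one is omitted for brevity):

```python
import numpy as np

def dct_matrix(n_out, n_in):
    # DCT-II basis functions (scaling convention is a choice).
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_in)

def lfc(power_spectrum, n_coeffs=13):
    """LFC: DCT taken directly on the spectrum."""
    return dct_matrix(n_coeffs, len(power_spectrum)) @ power_spectrum

def mfcc(power_spectrum, fbank, n_coeffs=13):
    """MFCC: mel filterbank energies -> log10 -> DCT."""
    energies = fbank @ power_spectrum
    return dct_matrix(n_coeffs, fbank.shape[0]) @ np.log10(energies + 1e-10)

# Toy example; the "filterbank" here is a random stand-in.
rng = np.random.default_rng(0)
spec = rng.random(64) + 0.1
fbank = np.abs(rng.random((20, 64)))
c_mfcc = mfcc(spec, fbank)
c_lfc = lfc(spec)
```

Both outputs are short coefficient vectors describing spectral shape; MFCC's filterbank and log stages add the auditory warping that LFC deliberately omits.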

Kernel functions of the DCT: the spectrum is a superposition of DCT kernels; the cepstrum coefficients are the weights of that superposition.

Parameter space: MFCC, varying coefficients C3 and C6.

Parameter space: LFC, varying coefficients C3 and C6.

Synthesize stimuli: harmonics carry pitch and vibrato, with amplitudes weighted by the desired spectral shape (flat vs. weighted). Vertical axis: frequency; horizontal axis: amplitude.

Experiment procedure: paired stimuli (AB, AG, AD, …); rate dissimilarities on a 0–9 scale; 10 subjects, in a quiet office, in individual sessions over headphones.

2D linear regression (Euclidean fitting): known values x, y (steps along C3 and C6) and perceptual judgement d; estimate weights a and b so the model prediction matches d; examine the residual from the Euclidean model.
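
One way to carry out this fit (a sketch of the idea, not necessarily the exact procedure from the talk): squaring the weighted Euclidean model d ≈ sqrt((a·x)² + (b·y)²) makes it linear in (a², b²), so ordinary least squares applies.

```python
import numpy as np

def fit_euclidean(x, y, d):
    """Fit d ~ sqrt((a*x)^2 + (b*y)^2) by linearized least squares.

    x, y: coefficient differences between paired stimuli along the
    two parameter axes (e.g. C3 and C6); d: rated dissimilarity.
    """
    A = np.column_stack([x ** 2, y ** 2])
    coeffs, *_ = np.linalg.lstsq(A, d ** 2, rcond=None)
    a2, b2 = np.maximum(coeffs, 0)
    return np.sqrt(a2), np.sqrt(b2)

# Synthetic check: data generated with a=2, b=0.5 is recovered.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.normal(size=200)
d = np.sqrt((2 * x) ** 2 + (0.5 * y) ** 2)
a, b = fit_euclidean(x, y, d)
```

With real ratings, the residual of this fit is exactly the “residual from the Euclidean model” the slide refers to.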

Results summary: tristimulus model, MFCC, LFC.

Experiment results: MFCC fits better; LFC is still good (a redundant dimension?). MFCC remains the most successful timbre model, with less linearity for the high coefficients.

Remix examples: ABBA “Gimme! Gimme! Gimme!”; Madonna “Hung Up”; Tracy Young remix of “Hung Up”; Tracy Young remix 2 of “Hung Up”.

Specificity spectrum: from fingerprinting (look for specific exact matches) through remixes and cover songs to genre (bag-of-features models); our work uses nearest neighbor.

Cross-correlation: 2M songs of 3 minutes each, at 10 frames/second (72 billion).
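
At this scale, exhaustive cross-correlation is out of reach, which is what motivates locality-sensitive hashing (the LSH mentioned earlier). A generic random-hyperplane LSH sketch (dimensions and bit counts here are made up, not from the talk):

```python
import numpy as np

class RandomHyperplaneLSH:
    """Locality-sensitive hashing with random hyperplanes.

    Frames whose feature vectors point in nearly the same direction
    get mostly the same hash bits, so candidate matches can be
    short-listed without comparing every song against every other.
    """
    def __init__(self, dim, n_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))

    def hash(self, v):
        # One bit per hyperplane: which side does v fall on?
        return tuple(int(b) for b in (self.planes @ v) > 0)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = np.random.default_rng(2)
lsh = RandomHyperplaneLSH(dim=32, n_bits=16)
v = rng.normal(size=32)
near = v + 0.01 * rng.normal(size=32)   # a near-duplicate frame
far = rng.normal(size=32)               # an unrelated frame
```

Near-duplicate frames collide in few bits while unrelated frames disagree on about half of them, so a hash-bucket lookup replaces billions of direct comparisons.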

Curse of dimensionality: the histogram of distances between Gaussian data points, normalized to the mean, concentrates; is nearest neighbor ill-posed?
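
The concentration effect behind this slide is easy to reproduce: as dimension grows, pairwise distances between Gaussian points cluster tightly around their mean, so the nearest and farthest neighbor become nearly indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(3)

def distance_spread(dim, n=200):
    """Relative spread (std/mean) of pairwise distances."""
    pts = rng.normal(size=(n, dim))
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]    # unique pairs only
    return d.std() / d.mean()

spread_low = distance_spread(2)       # wide histogram in 2-D
spread_high = distance_spread(1000)   # sharp peak in 1000-D
```

In 1000 dimensions the normalized histogram is a narrow spike, which is why exact nearest-neighbor search becomes ill-posed and approximate methods are acceptable.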

Distractors

Correlogram. Axes: center frequency (distance down the cochlea) vs. autocorrelation lag (time interval, s).