Published by Noreen Nicholson. Modified over 9 years ago.
2
Hallucinations in Auditory Perception!!! Malcolm Slaney Yahoo! Research Stanford CCRMA
3
Hadoop
6
[Figure: cochlear processing maps a one-dimensional waveform (pressure vs. time) to a two-dimensional representation (cochlear place vs. time; not a spectrogram). Correlogram processing maps that to a three-dimensional neural movie (cochlear place vs. autocorrelation lag, over time).]
7
Correlogram
[Figure axes: center frequency (distance down cochlea) vs. time interval in seconds (autocorrelation lag)]
With help from Richard O. Duda
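The correlogram axes on this slide (cochlear channel against autocorrelation lag) can be illustrated with a minimal sketch. It assumes the cochlear filterbank outputs are already available; here two half-wave rectified sinusoids stand in for them, and the helper names (`autocorrelation`, `correlogram_frame`, `best_lag`) are hypothetical, not taken from the talk.

```python
import math

def autocorrelation(signal, max_lag):
    """Unnormalized short-time autocorrelation for lags 0..max_lag-1."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(max_lag)]

def correlogram_frame(channels, max_lag):
    """One correlogram frame: rows are cochlear channels, columns are lags."""
    return [autocorrelation(ch, max_lag) for ch in channels]

def rectified_sine(period, n=200):
    """Assumed stand-in for one cochlear channel output (half-wave rectified)."""
    return [max(0.0, math.sin(2 * math.pi * t / period)) for t in range(n)]

def best_lag(row, min_lag=5):
    """Lag (at least min_lag) with the largest autocorrelation value."""
    return max(range(min_lag, len(row)), key=lambda lag: row[lag])

# Two channels with periods of 10 and 25 samples.
channels = [rectified_sine(10), rectified_sine(25)]
frame = correlogram_frame(channels, max_lag=40)
periods = [best_lag(row) for row in frame]
```

Each row peaks at the lag equal to that channel's period, which is the periodicity structure a correlogram makes visible. A full correlogram repeats this per time window to form the "neural movie".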
9
Success Reconstructing from correlogram –NIPS Keynote
10
Problems
Continuation
–Tone and Noise
–Parliament Cough
Hear two voices?
What do you hear?
–Waveforms?
–Ideas?
11
[Figure: waveform (pressure vs. time), then cochlear processing (cochlear place vs. time), then correlogram processing (cochlear place vs. autocorrelation lag, over time)]
12
Speech Examples: Wedding, Sine, Natural
13
What Vowel is This? Word 1 Word 2 Word 3 Peter Ladefoged
17
McGurk
18
Speech Object Sinewave Speech Object Wedding Vision Audio Locate Ventriloquism Vision Audio Locate Dots Vision Speech McGurk Speech Environment Vowel?
21
ASR
Word model showing phonemes for the word “one”: /w/ /ʌ/ /n/
Acoustic (phoneme) model for the phoneme /ʌ/: states S1, S2, S3
Language model for the words: “one”, “two”, “three”
22
Conventional Scene Analysis Slide by Dan Ellis (Columbia)
23
Barker—ASR
24
Goto—CASA with MIDI MIDI Sequence
25
Old plus New Principle Slide by Dan Ellis (Columbia)
26
Ellis—Prediction Driven
27
Saliency
28
Saliency Example Time-frequency display Saliency map shows high-interest locations
29
Saliency Maps Longer tones better Missing parts salient Modulation more salient Forward masking works
30
Sound Examples Birds Calls Cows Horse Waterfall
31
Saliency Comparison Details of saliency comparison Model predictions
32
Relational Network (Simple) X Y Z M M X M Y M Z m Patches of neurons, each measuring one quantity. Bidirectional relations for feedback/feedforward. Thanks to Rodney Douglas
33
Relational Network (example) Input here Relational Feedback Relational specification Relational feedback
34
ASR Relational Network
Cochlea, Delay, Phone Recognizer, Word Recognizer
A patch of neurons (one of N output)
Bidirectional links enforce phoneme/word constraints
Note: We don’t know how to represent delays
35
Desired Results /A/ Phoneme Patch /I/ Phoneme Patch AI Word Patch IA Word Patch Phoneme Input AI A Relational Feedback A Without / With
36
Simulation
37
Simulation 2
38
Simulation 3
39
Grossberg— ART
40
Statistical Means
ICA
–Different distributions
One Microphone
–GMM models of distribution
41
Conventional
42
Better?
43
Thanks malcolm@ieee.org
45
Pitch
46
Silicon Frequency Response Tone ramps into two cochleas
47
Cochlear Best Frequency
48
Cochlear Rate Profiles Left Cochlea Right Cochlea Spikes per utterance
49
Hardware Overview Cochlea Learning Phoneme Word PCI-AER (for remapping) Cochlea Shih-Chii Liu Giacomo Indiveri Implemented in MATLAB
51
LSH Movie
52
By Lloyd Watts Auditory Map
53
Please do more Neurophysiology! David, Jerry, Prabhakar
62
Timbre definition Sound color –Instruments –Vowels Static Dynamic Timbre Pitch Loudness All sound
63
Multi-Dimensional Scaling of Timbre
Measure
–Distances
Estimate
–Positions
Art
–Label axis
Axes shown: spectral flux, decay, spectral centroid
McAdams et al. (1995)
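The "measure distances, estimate positions" step can be sketched with classical multidimensional scaling. This is a deliberately reduced, one-dimensional illustration, not the procedure used by McAdams et al.: it assumes a symmetric distance matrix, recovers a single axis by power iteration, and uses a hypothetical function name (`classical_mds_1d`) and made-up distances.

```python
import math

def classical_mds_1d(dist, iters=200):
    """Estimate 1-D positions from a symmetric matrix of pairwise distances."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: b = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector (arbitrary start vector).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue; scale the eigenvector by its root.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(lam) * x for x in v]

# Made-up distances between three sounds that lie on one perceptual axis.
distances = [[0.0, 1.0, 3.0],
             [1.0, 0.0, 2.0],
             [3.0, 2.0, 0.0]]
positions = classical_mds_1d(distances)
```

The recovered positions reproduce the input distances up to a shift and a sign flip. Naming what the axis means (the "art" step on the slide) is left to the experimenter.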
64
Desired perception model
Compact (parsimonious)
Three Properties
–Predictive: explain distance perception
–Simple model: orthogonal axis
–Linear model: interpolate sounds (A, B, ?)
Test: Euclidean distance assumption
65
Experimental Contrast Old Way New Way Sound Parameter space Perception Sound Perception Guess a model that fits the data Model
66
Spectral shape using MFCC A huge tapestry hung in her hallway. Time (frames)
67
MFCC and LFC MFCC Sound Spectrum Filterbank log10 DCT MFCC LFC Sound Spectrum DCT LFC
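The two signal paths on this slide can be sketched as follows, assuming a magnitude spectrum is already computed. The MFCC path applies a filterbank, log10, and a DCT; the LFC path, as drawn on the slide, applies the DCT directly to the spectrum. The triangular mel-spaced filterbank, the bin count, filter count, and sample rate are illustrative assumptions, and the function names are hypothetical.

```python
import math

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n_points = len(values)
    return [sum(v * math.cos(math.pi * k * (n + 0.5) / n_points)
                for n, v in enumerate(values))
            for k in range(n_coeffs)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bins, n_filters, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    mel_edges = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_edges = [mel_to_hz(m) / nyquist * (n_bins - 1) for m in mel_edges]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bin_edges[j - 1], bin_edges[j], bin_edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(spectrum, bank, n_coeffs):
    """Spectrum -> filterbank -> log10 -> DCT."""
    energies = [sum(w * s for w, s in zip(row, spectrum)) for row in bank]
    log_energies = [math.log10(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct2(log_energies, n_coeffs)

def lfc(spectrum, n_coeffs):
    """Spectrum -> DCT, with no filterbank and no log."""
    return dct2(spectrum, n_coeffs)

# Assumed test input: a flat magnitude spectrum with 129 bins at 16 kHz.
spectrum = [1.0] * 129
bank = mel_filterbank(n_bins=129, n_filters=13, sample_rate=16000)
mfcc_flat = mfcc(spectrum, bank, n_coeffs=7)
lfc_flat = lfc(spectrum, n_coeffs=7)
```

For the flat spectrum, every LFC coefficient above the zeroth is zero, because a constant has no projection onto the higher DCT kernels.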
68
Kernel function of DCT Spectrum –superposition of DCT kernels Cepstrum coefficients –Coefficients for superposition
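The superposition idea can be checked numerically. The sketch below builds a spectral shape from a constant plus the third DCT kernel, analyzes it into coefficients, and rebuilds it by summing weighted kernels. The 32-point length, the 0.5 weight, and the function names are assumptions chosen for illustration.

```python
import math

def dct_kernel(k, n_points):
    """k-th DCT-II kernel sampled at n_points positions."""
    return [math.cos(math.pi * k * (n + 0.5) / n_points) for n in range(n_points)]

def analyze(shape, n_coeffs):
    """Coefficients: projection of the shape onto each kernel."""
    n_points = len(shape)
    return [sum(s * c for s, c in zip(shape, dct_kernel(k, n_points)))
            for k in range(n_coeffs)]

def superpose(coeffs, n_points):
    """Rebuild the shape as a weighted sum of kernels (inverse of analyze)."""
    out = [coeffs[0] / n_points] * n_points
    for k in range(1, len(coeffs)):
        kernel = dct_kernel(k, n_points)
        out = [o + 2.0 * coeffs[k] * c / n_points for o, c in zip(out, kernel)]
    return out

N = 32
# Assumed shape: a constant plus 0.5 times the third kernel.
shape = [1.0 + 0.5 * c for c in dct_kernel(3, N)]
coeffs = analyze(shape, N)
rebuilt = superpose(coeffs, N)
max_error = max(abs(a - b) for a, b in zip(shape, rebuilt))
```

Only coefficients 0 and 3 are non-zero here, so each coefficient controls how much of one kernel appears in the shape.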
69
Parameter space: MFCC
[Figure: grid of stimuli over axes C3 and C6, with tick values 0, 0.25, 0.5, 0.75]
70
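Parameter space: LFC
[Figure: grid of stimuli over axes C3 and C6; recoverable tick values are 0, 0.25, 0.5, 0.75 on one axis and 0, 0.25, 0.5 on the other]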
71
Synthesize stimuli
Harmonics: pitch and vibrato
–Amplitude weighted by the spectral shape
[Figure: flat vs. weighted harmonics, and the desired spectral shape. Vertical: frequency. Horizontal: amplitude.]
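A minimal additive-synthesis sketch of this step follows: harmonics of a pitch with vibrato, each scaled by a spectral-shape function. Every numeric value (220 Hz pitch, 5 Hz vibrato, 1% depth, 8 kHz sample rate, 10 harmonics) and the example `weighted_shape` envelope are assumptions for illustration, not the parameters used in the experiment.

```python
import math

def synthesize(envelope, f0=220.0, n_harmonics=10, n_samples=800,
               sample_rate=8000, vib_rate=5.0, vib_depth=0.01):
    """Sum of harmonics of f0; harmonic h has amplitude envelope(h * f0)."""
    weights = [envelope(h * f0) for h in range(1, n_harmonics + 1)]
    out = []
    phase = 0.0
    for t in range(n_samples):
        # Vibrato: slow sinusoidal modulation of the fundamental frequency.
        vibrato = math.sin(2 * math.pi * vib_rate * t / sample_rate)
        inst_f0 = f0 * (1.0 + vib_depth * vibrato)
        phase += 2 * math.pi * inst_f0 / sample_rate
        out.append(sum(w * math.sin(h * phase)
                       for h, w in enumerate(weights, start=1)))
    return out

def flat_shape(freq_hz):
    return 1.0

def weighted_shape(freq_hz):
    """Hypothetical spectral shape that rolls off with frequency."""
    return 1.0 / (1.0 + freq_hz / 1000.0)

flat_tone = synthesize(flat_shape)
weighted_tone = synthesize(weighted_shape)
```

Accumulating phase sample by sample keeps all harmonics locked to the same vibrato, so the pitch stays coherent while only the spectral shape differs between the two tones.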
72
Experiment procedures
Paired stimuli (AB, AG, AD, …)
Rate dissimilarities using 0-9 scale
10 subjects
–Quiet office
–Individual sessions (headphone)
73
Euclidean Fitting
2D linear regression
Known values: x, y, d; estimate a and b
Residual from Euclidean model
[Figure: perceptual judgement d and model prediction over C3 and C6]
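One way to read this fit is as a weighted Euclidean model, d² = a·x² + b·y², where x and y are the differences in C3 and C6 between the two stimuli of a pair and d is the rated dissimilarity. That model form is an assumption; the slide only names the known and estimated quantities. Under it, a and b follow from ordinary least squares on the squared values, as in this sketch with synthetic, noise-free data.

```python
import math

def fit_weighted_euclidean(xs, ys, ds):
    """Least-squares a, b for the assumed model d^2 = a*x^2 + b*y^2."""
    u = [x * x for x in xs]
    v = [y * y for y in ys]
    t = [d * d for d in ds]
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    sut = sum(p * r for p, r in zip(u, t))
    svt = sum(q * r for q, r in zip(v, t))
    det = suu * svv - suv * suv          # 2x2 normal-equation determinant
    a = (sut * svv - suv * svt) / det
    b = (suu * svt - suv * sut) / det
    return a, b

def residuals(xs, ys, ds, a, b):
    """Judged distance minus model prediction for each pair."""
    return [d - math.sqrt(a * x * x + b * y * y) for x, y, d in zip(xs, ys, ds)]

# Synthetic pairs on the 0..0.75 grid, generated with a = 4 and b = 9.
steps = [0.0, 0.25, 0.5, 0.75]
xs = [x for x in steps for y in steps]
ys = [y for x in steps for y in steps]
ds = [math.sqrt(4.0 * x * x + 9.0 * y * y) for x, y in zip(xs, ys)]

a_hat, b_hat = fit_weighted_euclidean(xs, ys, ds)
errors = residuals(xs, ys, ds, a_hat, b_hat)
```

With real ratings the residuals would not vanish, and their size indicates how well the Euclidean assumption holds for that parameter space.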
74
Results summary Tristimulus model MFCC LFC
75
Experiment results MFCC better Still good Redundant dimension? MFCC: most successful timbre model Less linearity for high coeffs
77
Remix Examples
Abba: Gimme Gimme
Madonna: Hung Up
Tracy Young Remix of Hung Up
Tracy Young Remix 2 of Hung Up
78
Specificity Spectrum Cover songs Remixes Look for specific exact matches Bag of Features model Our work (nearest neighbor) Fingerprinting Genre
79
Cross-Correlation
2M songs
–3 minutes
–10 frames/second
72 Billion
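As a rough check of the scale, the numbers listed on the slide can be multiplied out, assuming every song lasts exactly 3 minutes. The slide's 72 billion figure is larger than the resulting frame count by a factor that the slide does not explain, so the sketch only reports that factor without interpreting it.

```python
songs = 2_000_000
seconds_per_song = 3 * 60          # assumed: every song is exactly 3 minutes
frames_per_second = 10

total_frames = songs * seconds_per_song * frames_per_second

slide_total = 72_000_000_000       # "72 Billion" as printed on the slide
unexplained_factor = slide_total / total_frames

print(f"{total_frames:,} frames; slide total is {unexplained_factor:g}x that")
```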
80
Curse of Dimensionality Histogram of distances between Gaussian data –Normalized to the mean Nearest Neighbor Ill-posed?
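The effect behind this slide can be reproduced with a small simulation: draw pairs of Gaussian points, measure their distances, normalize by the mean distance, and compare the spread across dimensions. The dimensions (2, 10, 100), the 2000 pairs, the seed, and the histogram range are arbitrary choices for this sketch.

```python
import math
import random

def normalized_distances(dim, n_pairs, rng):
    """Distances between random Gaussian pairs, divided by their mean."""
    dists = []
    for _ in range(n_pairs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists]

def spread(values):
    """Standard deviation of the values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def histogram(values, n_bins=10, lo=0.0, hi=2.0):
    """Counts per equal-width bin over [lo, hi); values outside are dropped."""
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / (hi - lo) * n_bins)] += 1
    return counts

rng = random.Random(0)             # fixed seed so the run is repeatable
samples = {dim: normalized_distances(dim, 2000, rng) for dim in (2, 10, 100)}
spread_by_dim = {dim: spread(vals) for dim, vals in samples.items()}
hist_by_dim = {dim: histogram(vals) for dim, vals in samples.items()}
```

As the dimension grows, the histogram narrows around 1: nearly all points sit at about the same distance from each other, which is why the slide asks whether nearest neighbor is ill-posed.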
81
Distractors
82
Correlogram
[Figure axes: center frequency (distance down cochlea) vs. time interval in seconds (autocorrelation lag)]