Hallucinations in Auditory Perception!!! Malcolm Slaney Yahoo! Research Stanford CCRMA.

Presentation on theme: "Hallucinations in Auditory Perception!!! Malcolm Slaney Yahoo! Research Stanford CCRMA."— Presentation transcript:

1

2 Hallucinations in Auditory Perception!!! Malcolm Slaney Yahoo! Research Stanford CCRMA

3 Hadoop

4

5

6 One Dimensional (waveform) Two Dimensional (not a spectrogram) Three Dimensional (neural movie) Time Autocorrelation Lag Cochlear Place Time Cochlear Place Pressure Cochlear Processing Correlogram Processing

7 Center Frequency Distance down cochlea Time Interval (s) Autocorrelation Lag With help from Richard O. Duda Correlogram
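The correlogram on this slide plots autocorrelation lag against cochlear place (center frequency). A minimal sketch of one correlogram frame follows, assuming the cochlear channel outputs are already available as lists of samples. The cochlear filterbank itself is not shown, and the function names and toy signals are hypothetical, not taken from the talk.

```python
import math

def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation of one frame for lags 0..max_lag-1."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag)]

def correlogram_frame(channels, max_lag):
    """One correlogram frame: one row of autocorrelation lags per channel."""
    return [autocorrelation(ch, max_lag) for ch in channels]

def best_lag(row, min_lag=10):
    """Lag (in samples) of the largest autocorrelation peak at or above min_lag."""
    return max(range(min_lag, len(row)), key=lambda lag: row[lag])

# Toy stand-ins for two cochlear channels (assumption: half-wave rectified
# sinusoids sharing a 40-sample period; a real model would filter the sound).
sample_rate = 8000
period = 40
n_samples = 400
channels = [
    [max(0.0, math.sin(2 * math.pi * t / period)) for t in range(n_samples)],
    [max(0.0, math.sin(2 * math.pi * t / period + 1.0)) for t in range(n_samples)],
]

frame = correlogram_frame(channels, max_lag=100)
lag = best_lag(frame[0])
print(lag, sample_rate / lag)  # common periodicity in samples and in Hz
```

Channels that share a periodicity show a peak at the same lag, which is why a pitch appears as a vertical line in the correlogram picture.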

8

9 Success Reconstructing from correlogram –NIPS Keynote

10 Continuation –Tone and Noise –Parliament Cough Hear two voices? What do you hear? –Waveforms? –Ideas? Problems

11 Time Autocorrelation Lag Cochlear Place Time Cochlear Place Pressure Cochlear Processing Correlogram Processing

12 Wedding Sine Natural Speech Examples

13 What Vowel is This? Word 1 Word 2 Word 3 Peter Ladefoged

14

15

16

17 McGurk

18 Speech Object Sinewave Speech Object Wedding Vision Audio Locate Ventriloquism Vision Audio Locate Dots Vision Speech McGurk Speech Environment Vowel?

19

20

21 ASR Word model showing phonemes for the word “one”: /w/ /ʌ/ /n/ Acoustic (phoneme) model for the phoneme /ʌ/: S1, S2, S3 Language model for the words: “one”, “two”, “three”

22 Conventional Scene Analysis Slide by Dan Ellis (Columbia)

23 Barker—ASR

24 Goto—CASA with MIDI MIDI Sequence

25 Old plus New Principle Slide by Dan Ellis (Columbia)

26 Ellis—Prediction Driven

27 Saliency

28 Saliency Example Time-frequency display Saliency map shows high-interest locations

29 Saliency Maps Longer tones better Missing parts salient Modulation more salient Forward masking works

30 Sound Examples Birds Calls Cows Horse Waterfall

31 Saliency Comparison Details of saliency comparison Model predictions

32 Relational Network (Simple) X Y Z M M X M Y M Z m Patches of neurons Each measure one quantity Bidirectional relations for feedback/feedforward Thanks to Rodney Douglas

33 Relational Network (example) Input here Relational Feedback Relational specification Relational feedback

34 ASR Relational Network Cochlea Delay Phone Recognizer Word Recognizer A patch of neurons (one of N output) Note: We don’t know how to represent delays Phone Recognizer Bidirectional links enforce phoneme/word constraints

35 Desired Results /A/ Phoneme Patch /I/ Phoneme Patch AI Word Patch IA Word Patch Phoneme Input AI A Relational Feedback A Without With

36 Simulation

37 Simulation 2

38 Simulation 3

39 Grossberg— ART

40 Statistical Means ICA –Different distributions One Microphone –GMM models of distribution

41 Conventional

42 Better?

43 Thanks malcolm@ieee.org

44

45 Pitch

46 Silicon Frequency Response Tone ramps into two cochleas

47 Cochlear Best Frequency

48 Cochlear Rate Profiles Left Cochlea Right Cochlea Spikes per utterance

49 Hardware Overview Cochlea Learning Phoneme Word PCI-AER (for remapping) Cochlea Shih-Chii Liu Giacomo Indiveri Implemented in MATLAB

50

51 LSH Movie

52 By Lloyd Watts Auditory Map

53 Please do more Neurophysiology! David Jerry Prabhakar

54

55

56

57

58

59

60

61

62 Timbre definition Sound color –Instruments –Vowels Static Dynamic Timbre Pitch Loudness All sound

63 Multi-Dimensional Scaling of Timbre Measure –Distances Estimate –Positions Art –Label axis Spectral flux Decay Spectral centroid McAdams et al. (1995)

64 Desired perception model Compact (parsimonious) Three Properties –Predictive Explain distance perception –Simple model Orthogonal axis –Linear model Interpolate sounds A B ? Test Euclidean distance Assumption

65 Experimental Contrast Old Way New Way Sound Parameter space Perception Sound Perception Guess a model that fits the data Model

66 Spectral shape using MFCC A huge tapestry hung in her hallway. Time (frames)

67 MFCC and LFC MFCC Sound Spectrum Filterbank log10 DCT MFCC LFC Sound Spectrum DCT LFC

68 Kernel function of DCT Spectrum –superposition of DCT kernels Cepstrum coefficients –Coefficients for superposition
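The superposition idea on this slide can be made concrete with a small sketch: each cepstrum-style coefficient is the weight of one cosine (DCT) kernel, and summing the weighted kernels rebuilds the spectral shape. The code below uses a plain DCT-II with its matching inverse. The 8-bin spectrum and the function names are illustrative assumptions, not values from the talk.

```python
import math

def dct_kernel(k, n_bins):
    """k-th DCT-II kernel: a cosine with k half-cycles across the spectrum."""
    return [math.cos(math.pi * k * (n + 0.5) / n_bins) for n in range(n_bins)]

def dct(spectrum):
    """Coefficients: projection of the spectrum onto each DCT kernel."""
    n_bins = len(spectrum)
    coeffs = []
    for k in range(n_bins):
        kernel = dct_kernel(k, n_bins)
        # Scaling chosen so that superpose() below is the exact inverse.
        scale = 1.0 / n_bins if k == 0 else 2.0 / n_bins
        coeffs.append(scale * sum(s * c for s, c in zip(spectrum, kernel)))
    return coeffs

def superpose(coeffs, n_bins):
    """Rebuild the spectral shape as a weighted sum of DCT kernels."""
    shape = [0.0] * n_bins
    for k, c in enumerate(coeffs):
        kernel = dct_kernel(k, n_bins)
        shape = [s + c * v for s, v in zip(shape, kernel)]
    return shape

# Made-up 8-bin spectral envelope (assumption, for illustration only).
spectrum = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05, 0.0]
coeffs = dct(spectrum)
rebuilt = superpose(coeffs, len(spectrum))
```

Setting all coefficients to zero except one or two (for example the third and sixth) gives a spectral shape controlled by just those kernels, which is the kind of two-parameter space the following slides describe.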

69 Parameter space: MFCC (axes C3 and C6, each with values 0, 0.25, 0.5, 0.75)

70 Parameter space: LFC (axes C3 and C6, each with values 0, 0.25, 0.5, 0.75)

71 Synthesize stimuli Harmonics: pitch and vibrato –Amplitude weighted by the spectral shape flat weighted Desired spectral shape Vertical - frequency, Horizontal - amplitude
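A rough sketch of the synthesis step described on this slide: a sum of harmonics with a pitch and vibrato, where each harmonic's amplitude is read off a desired spectral shape. The pitch, vibrato rate and depth, duration, and sample rate below are illustrative assumptions, not the settings used in the experiment.

```python
import math

def synthesize(spectral_shape, f0=220.0, vibrato_hz=5.0, vibrato_depth=0.01,
               duration=0.5, sample_rate=16000):
    """Additive synthesis: harmonics of f0 weighted by spectral_shape(freq_hz).

    spectral_shape maps a frequency in Hz to a linear amplitude.
    """
    n_samples = round(duration * sample_rate)
    # Keep every harmonic below the Nyquist frequency, even at peak vibrato.
    n_harmonics = int((sample_rate / 2) // (f0 * (1 + vibrato_depth)))
    amplitudes = [spectral_shape(h * f0) for h in range(1, n_harmonics + 1)]
    samples = []
    phase = 0.0  # running phase of the fundamental, in radians
    for i in range(n_samples):
        t = i / sample_rate
        inst_f0 = f0 * (1 + vibrato_depth * math.sin(2 * math.pi * vibrato_hz * t))
        phase += 2 * math.pi * inst_f0 / sample_rate
        samples.append(sum(a * math.sin(h * phase)
                           for h, a in enumerate(amplitudes, start=1)))
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalized to a peak of 1

# "Flat" versus "weighted" harmonics (the weighting curve is a made-up example).
flat = synthesize(lambda f: 1.0, duration=0.1)
weighted = synthesize(lambda f: math.exp(-f / 1000.0), duration=0.1)
```

Accumulating the phase sample by sample keeps all harmonics locked to the same vibrato, so the spectral shape stays fixed while the pitch wobbles.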

72 Experiment procedures Paired stimuli (AB, AG, AD, …) Rate dissimilarities using 0-9 scale 10 subjects –Quiet office –Individual sessions (headphone)

73 2D linear regression Known values: x, y, d - estimate a and b Residual from Euclidean model Euclidean Fitting C3 C6 Perceptual Judgement d Model prediction
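One way to read the Euclidean fit on this slide is as a least-squares problem. The sketch below assumes the model d² ≈ a·x² + b·y², where x and y are the differences between two stimuli along the two parameter axes and d is the judged dissimilarity. That model form is an assumption made for illustration (the slide does not spell out the equation), and the data are synthetic.

```python
import math

def fit_euclidean(dx, dy, d):
    """Least-squares estimate of a, b in d^2 ~ a*dx^2 + b*dy^2 (assumed form)."""
    u = [v * v for v in dx]
    w = [v * v for v in dy]
    z = [v * v for v in d]
    suu = sum(p * p for p in u)
    sww = sum(p * p for p in w)
    suw = sum(p * q for p, q in zip(u, w))
    suz = sum(p * q for p, q in zip(u, z))
    swz = sum(p * q for p, q in zip(w, z))
    det = suu * sww - suw * suw
    if det == 0:
        raise ValueError("dx and dy do not vary independently; fit is undefined")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (suz * sww - swz * suw) / det
    b = (swz * suu - suz * suw) / det
    return a, b

def residuals(dx, dy, d, a, b):
    """Judged dissimilarity minus the weighted-Euclidean prediction."""
    return [di - math.sqrt(max(0.0, a * x * x + b * y * y))
            for x, y, di in zip(dx, dy, d)]

# Synthetic check: generate noiseless judgements from known weights.
steps = [0.0, 0.25, 0.5, 0.75]
pairs = [(x, y) for x in steps for y in steps if (x, y) != (0.0, 0.0)]
dx = [p[0] for p in pairs]
dy = [p[1] for p in pairs]
true_a, true_b = 4.0, 9.0
d = [math.sqrt(true_a * x * x + true_b * y * y) for x, y in pairs]

a, b = fit_euclidean(dx, dy, d)
res = residuals(dx, dy, d, a, b)
```

With real ratings the residuals would not vanish; their size indicates how far perception departs from the Euclidean assumption.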

74 Results summary Tristimulus model MFCC LFC

75 Experiment results MFCC better Still good Redundant dimension? MFCC: most successful timbre model Less linearity for high coeffs

76

77 Remix Examples Abba Gimme Gimme Madonna Hung Up Tracy Young Remix of Hung Up Tracy Young Remix 2 of Hung Up

78 Specificity Spectrum Cover songs Remixes Look for specific exact matches Bag of Features model Our work (nearest neighbor) Fingerprinting Genre

79 Cross-Correlation 2M songs –3 minutes –10 frames/ second 72 Billion

80 Curse of Dimensionality Histogram of distances between Gaussian data –Normalized to the mean Nearest Neighbor Ill-posed?
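The histogram described on this slide can be reproduced in a few lines: draw Gaussian points, compute all pairwise distances, and normalize by the mean distance. The sketch below reports the relative spread and the nearest-pair distance instead of plotting. The point count, dimensions, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def relative_spread(dim, n_points=100, seed=0):
    """Spread and minimum of pairwise distances, both relative to the mean."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q)
             for i, p in enumerate(points)
             for q in points[i + 1:]]
    mean = statistics.fmean(dists)
    return statistics.pstdev(dists) / mean, min(dists) / mean

results = {dim: relative_spread(dim) for dim in (2, 16, 128)}
for dim, (spread, nearest) in results.items():
    print(f"dim={dim:4d}  std/mean={spread:.3f}  nearest/mean={nearest:.3f}")
```

As the dimension grows, the distances bunch up around the mean, so the nearest neighbor is barely closer than an average point. That concentration is why the slide asks whether nearest-neighbor search becomes ill-posed.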

81 Distractors

82 Center Frequency Distance down cochlea Time Interval (s) Autocorrelation Lag Correlogram

