2 Human Auditory Cognition 2014 Alain de Cheveigné, Andre van Schaik, Chetan Singh Thakur, David Karpul, Dorothee Arzounian, Edmund Lalor, Ernst Niebur, Giovanni Di Liberto, Guillaume Garreau, James O’Sullivan, Jessica Thompson, John Foxe, Lakshmi Krishnan, Malcolm Slaney, Manu Rastogi, Marcela Mendoza, Psyche Loui, Shih-Chii Liu, Simon Kelly, Siohoi Ieng, Thomas Murray, Tobi Delbruck, Victor Benichoux, Victor Minces, Vikram Ramanarayanan, Yves Boubenec

3 Summary Telluride Experiments Wow!

4 Noise Expected (Priming) Imagined (Ghosts) Changes (Texture) Reconstructions Hardware

5 The Problem What word will I say at the end of this _____? Priming Decisions … Joan said … Your canapés are wonderful

6 Ghosts

7 Ghosts – Motivation Visual: Auditory: ???

8 Ghost Simulation. Noise input: white, pink, babble. Model: human vs. cortical. Choice domain: spectrogram vs. cochleagram. Comparison approach: Euclidean, cosine, cross-correlation. Accumulation approach: spectrogram. Evaluation approach: spectrogram vs. cochleagram (Noise 1 better, Noise 2 worse). Lure/template subtract.
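The comparison step of the simulation (scoring each noise sample against the target template, with an optional lure/template subtraction) can be sketched as a toy model. This is a minimal NumPy sketch using cosine similarity, one of the comparison approaches on the slide; `ghost_choice` and its arguments are illustrative names, not the workshop code.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two spectrograms, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ghost_choice(noise_specs, template, lure=None):
    """Pick the noise sample most similar to the target template.
    If a lure template is given, subtract similarity to the lure
    (the lure/template-subtract step on the slide)."""
    scores = np.array([cosine_similarity(n, template) for n in noise_specs])
    if lure is not None:
        scores -= np.array([cosine_similarity(n, lure) for n in noise_specs])
    return int(np.argmax(scores)), scores
```

Swapping `cosine_similarity` for a Euclidean distance or a cross-correlation gives the other comparison approaches listed on the slide.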

9 Ghosts – Auditory Simulations Noise Samples Target

10 Ghosts – Simulation Results

11 Ghosts – Human Results. Output: inverse target/lure ("superstition"); average spectrogram of positive choices.

12 Ghosts – Humans vs. Simulations. (Yes similarity) – (No similarity), p < 1e-4.

13 Ghosts – EEG Time Course. Word + noise: noise-yes vs. noise-no responses; difference (Yes – No) over time [ms].

14 Ghosts – EEG Hypothesis. Stimulus → Prime → Estimate filter → Choice prediction. SUCCESS!!! (Not a guarantee)

15 Ghosts – EEG Model. mTRF "superstition" filter: EEG → filter estimation → user choice prediction (r1, r2); noise → EEG prediction. Subjects 1–3. Overall choice prediction accuracy: 59% ± 1.5% (p < 0.05).
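The mTRF filter estimation above is, at its core, a regularized linear regression from time-lagged stimulus samples to the EEG. A minimal ridge-regression sketch, assuming a single-channel stimulus and EEG; `lagged`, `estimate_trf`, and the regularization value are illustrative, not the toolbox's actual interface.

```python
import numpy as np

def lagged(x, lags):
    """Design matrix of time-lagged copies of a stimulus x (shape (T,))."""
    T = len(x)
    X = np.zeros((T, len(lags)))
    for j, L in enumerate(lags):
        X[L:, j] = x[:T - L]        # assumes non-negative lags
    return X

def estimate_trf(stim, eeg, lags, lam=1.0):
    """Ridge-regression temporal response function:
    eeg(t) ≈ sum over lags L of w[L] * stim(t - L)."""
    X = lagged(stim, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
```

Applying the estimated filter to a new noise stimulus gives the EEG prediction whose correlation with the measured EEG drives the choice prediction.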

16 Ghosts – Summary Auditory convergence! EEG shows distinct filter!!

17 Textures

18 Textures – Question. Neural signature of the accumulation of sensory evidence (O'Connell et al., 2012). How do stimulus statistics affect the neural accumulation of evidence? Task difficulty, unpredictability.
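The accumulation-of-evidence idea can be illustrated with a toy drift-diffusion accumulator: evidence for a change builds noisily toward a decision bound, and lower drift (a harder, less predictable change) yields later, more variable detections. This is only a sketch under assumed parameters, not the model used in the experiment.

```python
import numpy as np

def accumulate_to_bound(drift, noise_sd, bound, dt=0.001, max_t=5.0, seed=0):
    """Noisy accumulation of sensory evidence toward a decision bound
    (a drift-diffusion sketch). Returns the detection time in seconds,
    or max_t if the bound is never reached."""
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    while t < max_t:
        x += drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= bound:
            return t
    return max_t
```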

19 Textures – Stimulus (varying difficulty).

20 Psychophysics: timing of change (s) vs. difficulty; early vs. late changes.

21 Button press

22 Voltage, averaged across subjects (n=3…).

23 Voltage

24 Priming

25

26 Priming stimuli. Targets: cheesy, sunny. Valid prime vs. invalid prime (primes: cheesy, pretty, sunny, ready).

27 Performance on the same stimuli differs as a function of sensory context (valid vs. invalid prime; targets: cheesy, sunny).

28 Priming – EEG Analysis. The same stimuli elicit different responses from auditory cortex as a function of sensory context.

29 Priming Summary. Context changes the selectivity of auditory cortex to modulate responses to upcoming stimuli. This is true for all 100 words! Our ability to recognize the expected word is enhanced by this filter!

30 Reconstructions Envelopes vs. Onsets CCA DBN/NMF

31 History But it misses all this: EEG

32 Our brain likes onsets

33 EEG Predictions

34 Reconstructions – Relating EEG to Speech. Goal: find a transform of speech and a transform of EEG such that they are maximally correlated. This allows us to match EEG to speech (as in attention monitoring). Better than correlating speech with reconstructed speech: speech contains details that do not show up in the EEG, so reconstructed speech is poorly correlated with real speech. Better than correlating EEG with predicted EEG: EEG contains speech-irrelevant activity, so predicted EEG is poorly correlated with real EEG. Pipeline: speech → cochlear filterbank; EEG → denoise, reduce dimensionality; CCA; measure correlation.

35 Data: collected by Giovanni Di Liberto (Ed Lalor's lab). Stimulus is speech ("The Old Man and the Sea"), approx. 1.6 hours in 47 files. EEG recorded from 8 subjects, 130 channels, SR = 512 Hz, ~2–3 minute sessions.
Audio preprocessing: FFT-based cochlear filterbank, 40 channels, range 100–8000 Hz, bandwidth = 1 ERB. Filter output instantaneous power is smoothed over ~30 ms, 4th root, SR = 512 Hz → 40-channel spectrogram (time series of "instantaneous partial loudness").
EEG preprocessing: detrend (10th-order polynomial), high-pass 0.1 Hz, low-pass (400 ms square window), denoise with DSS to remove activity widely different between files.
Audio–EEG comparison: concatenate 10 files (~20 min); EEG: time shift 200 ms × [0, 1, 2, 3], PCA, keep ~40 PCs; spectrogram: PCA, keep ~5 PCs; CCA between EEG and spectrogram PCs; correlate 1st CCA component of EEG with 1st CCA component of spectrogram; test against surrogate data (spectrograms rotated by a random amount, 100 trials).
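The CCA step (time-shifted EEG PCs against spectrogram PCs) can be sketched with NumPy alone: whiten each dataset via a thin SVD, then take the SVD of their cross-product; the singular values are the canonical correlations. This is a minimal sketch of the technique, not the actual analysis code; `cca_first_pair` is an illustrative name.

```python
import numpy as np

def cca_first_pair(X, Y):
    """First canonical correlation between data matrices X (T, dx), Y (T, dy).
    Center and whiten each with a thin SVD, then SVD the cross-product:
    the singular values are the canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)
    a = Ux @ U[:, 0]   # first canonical variate of X
    b = Uy @ Vt[0]     # first canonical variate of Y
    return s[0], a, b
```

In the pipeline above, X would be the time-shifted EEG PCs and Y the spectrogram PCs; the surrogate test compares s[0] against values from randomly rotated spectrograms.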

36 Reconstructions – Results. Correlation between EEG and audio transforms: r ≈ 0.35 (0.12 on surrogate data). The first CCA component from EEG is our best estimate of the activity in the brain that cares about continuous speech.

37 TOWARD FINDING A LOW-DIMENSIONAL REPRESENTATION OF PHONETIC INFORMATION FROM SPEECH AND EEG SIGNALS. SPEECH, EEG, PHONEMES → LATENT REPRESENTATION (NMF, DBN).

38 WHAT IS THE CURRENT STATE OF THE ART? AESPA: uses linear regression to map the speech envelope to the ERP. Does not use all the information in either signal. We would like to use all the information in both speech and EEG to extract a lower-dimensional representation of phonetic information.

39 DATA PREPROCESSING. Re-reference to mastoids; bad-channel rejection & interpolation. 128-channel EEG → bands 1–4 Hz, 4–7 Hz, 7–15 Hz, 15–30 Hz → ICA → stability analysis → equivalent current dipole estimation.

40 NON-NEGATIVE MATRIX FACTORIZATION (NMF). Factorize a nonnegative M × N data matrix V into the product of two nonnegative matrices: a basis matrix W (M × K) and an activation/encoding matrix H (K × N). We then find W, H ≥ 0 that minimize the cost ||V − WH||²_F, which leads to the multiplicative update equations H ← H ⊗ (WᵀV) ⁄ (WᵀWH) and W ← W ⊗ (VHᵀ) ⁄ (WHHᵀ) (elementwise multiply and divide). D. Lee and H. Seung, "Algorithms for non-negative matrix factorization," NIPS, 2001.
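The Lee & Seung multiplicative updates for the Frobenius cost can be sketched in a few lines of NumPy; the rank K, iteration count, and names here are illustrative, not the project's actual settings.

```python
import numpy as np

def nmf(V, K, n_iter=1000, eps=1e-9, seed=0):
    """Lee & Seung (2001) multiplicative updates minimizing ||V - WH||_F^2.
    V: nonnegative (M, N). Returns nonnegative W (M, K) and H (K, N).
    eps guards against division by zero."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates are multiplicative and the data nonnegative, W and H stay nonnegative throughout, which is what makes the bases parts-based.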

41 WHY NMF vs. OTHER TECHNIQUES? NMF bases are more interpretable and parts-based (since they combine additively), while at the same time not an over-approximation. Figure: basis, activations, reconstructed images.

42 THE VISION. MFCC features → speech visible layer → speech 1st … nth hidden layers. EEG features → EEG visible layer → EEG 1st … 3rd hidden layers. Phone labels → phone label layer → phone 1st … nth hidden layers. All three transformations feed an associative layer: a low-dimensional shared representation.

43 IN PRACTICE, SO FAR… NMF system: EEG features → NMF → activations (latent repn.) → SVM → phone labels. DBN system: EEG features → EEG visible layer → EEG hidden layers → phone labels. Trained on continuous speech (audiobook); 80%–10%–10% train–dev–test split.

44 Real-time multi-talker speech recognition using automated attention from the ITD information of a binaural silicon cochlea

45 Attending to Conversations. Task: recognize the highest-valued two-digit numbers. Stimuli: 68, 23, 34, 56, 81, 28, 11, 39, 83, 23. Where do I attend?

46 System overview: binaural receiver (ITD histogram), novelty/salience, ASR (recognized digits, male/female), cognition (salience, attention).

47 Scene Analysis Engineering (2014). Modules: Cognition (Python), ASR (Python/Sphinx), Binaural (jAER), Novelty (Python), communicating over UDP (sound samples; ITD, salience, digits, Obama/Cameron). State: direction to attend, digits recognized. Task: switch attention based on recognition and saliency.

48 Analog Scene Analysis – Things to solve: Saliency (binaural onset). Online speech recognition (difficulty of getting sounds into the computer; difficulty of interfacing to a real-time speech recognition toolbox). Cognition (held up by the difficulty of real-time speech recognition). Two-digit sentences, easy semantics. Sound separation using delay-and-add to separate speakers.
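The delay-and-add idea in the last item: if the attended talker reaches one microphone later than the other by a known ITD, advancing that channel by the ITD makes the talker add coherently while sources from other directions partially cancel. A minimal two-microphone sketch, assuming an integer-sample delay and using a circular shift for simplicity; `delay_and_sum` is an illustrative name.

```python
import numpy as np

def delay_and_sum(left, right, delay_samples):
    """Two-microphone delay-and-add: the attended source reaches the
    right mic `delay_samples` later than the left, so advance the right
    channel to realign it, then average. The attended source adds
    coherently; sources with other ITDs are attenuated."""
    aligned = np.roll(right, -delay_samples)
    return 0.5 * (left + aligned)
```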

49 FPGA Cochlea

50 FPGA Results! Implemented a real-time FPGA version of Dick Lyon's cochlea model. Implemented Shamma's coherent sound-segregation task.

51 FPGA cochlea response to a chirp signal.

52 FPGA – Sound segregation problem. Temporal coherence → sound stream: look for common modulation across channels.

53 Correlation matrix at time t. Cochlea (BM, IHC stages) → modulation filters (2, 4, 8, 16 Hz) per stimulus channel → correlation matrix; attention signal selects a mask array → reconstructed tone.
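The temporal-coherence grouping above can be sketched by correlating channel envelopes: channels whose modulation tracks the attended channel join the mask. `coherence_mask` and the threshold are illustrative, not the FPGA implementation.

```python
import numpy as np

def coherence_mask(envelopes, attend_ch, thresh=0.7):
    """Pairwise correlation of cochlear-channel envelopes (n_ch, T);
    channels whose envelope correlates with the attended channel above
    thresh join the mask (the attended channel trivially includes
    itself, since its self-correlation is 1)."""
    C = np.corrcoef(envelopes)
    return (C[attend_ch] > thresh).astype(float)
```

Applying the mask to the cochlear channels and resynthesizing gives the reconstructed tone, as on the following slides.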

54 FPGA – Cochleagram and Modulation Output

55 FPGA – Coherence Output

56 FPGA – Applied mask. After applying the mask to the cochlear channels, we can reconstruct each tone. The mask would be chosen based on the attention signal.

57 FPGA – Future work Extend this work to speech signals, to segregate sources in a cocktail party.

58 Thank you!!! For loaning us a 64-channel actiCHamp for recording our EEG data. For providing trial licenses for students and lab equipment.

59 Noise Expected (Priming) Imagined (Ghosts) Changes (Texture) Reconstructions Hardware

