Human Auditory Cognition 2014
Alain de Cheveigné, Andre van Schaik, Chetan Singh Thakur, David Karpul, Dorothee Arzounian, Edmund Lalor, Ernst Niebur, Giovanni Di Liberto, Guillaume Garreau, James O'Sullivan, Jessica Thompson, John Foxe, Lakshmi Krishnan, Malcolm Slaney, Manu Rastogi, Marcela Mendoza, Psyche Loui, Shih-Chii Liu, Simon Kelly, Siohoi Ieng, Thomas Murray, Tobi Delbruck, Victor Benichoux, Victor Minces, Vikram Ramanarayanan, Yves Boubenec
Summary: Telluride Experiments. Wow!
Noise:
- Expected (Priming)
- Imagined (Ghosts)
- Changes (Texture)
Reconstructions
Hardware
The Problem
What word will I say at the end of this _____?
Priming, decisions:
… Joan said …
Your canapés are wonderful
Ghosts
Ghosts – Motivation
Visual: [example figure]
Auditory: ???
Ghost Simulation
Noise input: white, pink, babble
Model choice: cortical model vs. human
Domain: spectrogram vs. cochleagram
Comparison approach: Euclidean, cosine, cross-correlation
Accumulation approach: spectrogram
Evaluation approach: spectrogram vs. cochleagram; rank Noise 1 vs. Noise 2 from better to worse; subtract lure/template (a sketch of the comparison step follows)
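A minimal sketch of the comparison step, assuming spectrograms arrive as NumPy arrays; the decision rule (pick the noise closer to the target-minus-lure template) is our reading of the diagram, not code from the project:

```python
# Hypothetical sketch: score how well a noise spectrogram matches a template
# under the three similarity metrics named on the slide.
import numpy as np

def similarity(noise_spec, template_spec, metric="cosine"):
    """Compare two spectrograms (freq x time), flattened to vectors."""
    x = noise_spec.ravel()
    y = template_spec.ravel()
    if metric == "euclidean":
        return -np.linalg.norm(x - y)          # higher = more similar
    if metric == "cosine":
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    if metric == "xcorr":
        x = x - x.mean()
        y = y - y.mean()
        return (x @ y) / (np.std(x) * np.std(y) * x.size)
    raise ValueError(metric)

def choose(noise1, noise2, target, lure, metric="cosine"):
    """The model 'hears' the word in whichever noise is closer to the
    (target - lure) difference template."""
    template = target - lure                   # lure/template subtraction
    s1 = similarity(noise1, template, metric)
    s2 = similarity(noise2, template, metric)
    return 1 if s1 > s2 else 2
```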
Ghosts – Auditory Simulations
[Figure: noise samples vs. target spectrograms]
Ghosts – Simulation Results
Ghosts – Human Results
Output: inverse target/lure ("superstition")
[Figure: average spectrogram of positive choices]
Ghosts – Humans vs. Simulations
Measure: (Yes similarity) – (No similarity); p < 1e-4
Ghosts – EEG Time Course
[Figure: responses to noise-yes, noise-no, and word + noise over time (ms), with the difference trace (Yes – No)]
Ghosts – EEG Hypothesis
Stimulus → prime → estimate filter → choice prediction
SUCCESS!!! (Not a guarantee)
Ghosts – EEG Model
mTRF "superstition" filter: estimate a filter from the EEG, predict the EEG response to each noise, and predict the user's choice from the prediction correlations r1 and r2 (Subjects 1, 2, 3); a sketch follows.
Overall choice prediction accuracy: 59% ± 1.5% (p < 0.05)
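A minimal sketch of how such a choice predictor could work, assuming a single EEG channel, a lagged-stimulus design matrix, and ridge regression for the filter estimate; the lag count and regularization value are illustrative, not taken from the slide:

```python
# Sketch of an mTRF-style "superstition" filter: a linear mapping from
# lagged stimulus envelope to (single-channel) EEG, fit by ridge regression.
import numpy as np

def lagged(stim, n_lags):
    """Build a (time x lags) design matrix of delayed stimulus copies."""
    T = len(stim)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:T - k]
    return X

def fit_trf(stim, eeg, n_lags=64, lam=1e2):
    """Ridge regression: filter w such that lagged(stim) @ w ~ eeg."""
    X = lagged(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

def predict_choice(noise1, noise2, eeg, w, n_lags=64):
    """Choose whichever noise's predicted EEG correlates best (r1 vs r2)."""
    r = []
    for noise in (noise1, noise2):
        pred = lagged(noise, n_lags) @ w
        r.append(np.corrcoef(pred, eeg)[0, 1])
    return 1 if r[0] > r[1] else 2
```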
Ghosts – Summary
Auditory convergence!
EEG shows a distinct filter!!
Textures
Textures – Question
How do stimulus statistics affect the neural accumulation of evidence? (task difficulty, unpredictability)
Neural signature of accumulation of sensory evidence: O'Connell et al., 2012
Textures – Stimulus
[Figure: stimulus examples by difficulty]
Textures – Psychophysics
[Figure: performance vs. timing of change (s), early vs. late changes, by difficulty]
Textures – EEG
[Figure: response-locked activity around the button press]
[Figure: voltage over time, average across subjects (n=3...)]
Priming
Priming paradigm:
Target | Valid prime | Invalid prime
cheesy | cheesy      | pretty
sunny  | sunny       | ready
[Same paradigm as above] Performance on the same stimuli differs as a function of sensory context.
Priming – EEG Analysis
The same stimuli elicit different responses from auditory cortex as a function of sensory context.
Priming – Summary
Context changes the selectivity of auditory cortex, modulating the responses to upcoming stimuli. This is true for all 100 words! Our ability to recognize the expected word is enhanced by this filter!
Reconstructions
- Envelopes vs. onsets
- CCA
- DBN/NMF
History
[Figure: envelope reconstruction from EEG. "But it misses all this:" with an EEG trace]
Our brain likes onsets
EEG Predictions
Reconstructions – Relating EEG to Speech
Goal: find a transform of speech and a transform of EEG such that they are maximally correlated. This allows us to match EEG to speech (as in attention monitoring).
- Better than correlating speech with reconstructed speech: speech contains details that do not show up in the EEG, so reconstructed speech is poorly correlated with real speech.
- Better than correlating EEG with predicted EEG: EEG contains speech-irrelevant activity, so predicted EEG is poorly correlated with real EEG.
[Diagram: speech → cochlear filterbank; EEG → denoise, reduce dimensionality; CCA; measure correlation]
Data: collected by Giovanni Di Liberto (Ed Lalor's lab). Stimulus is speech ("The Old Man and the Sea"), approx. 1.6 hours in 47 files. EEG recorded from 8 subjects, 130 channels, SR = 512 Hz, ~2–3 minute sessions.
Audio preprocessing: FFT-based cochlear filterbank, 40 channels, range 100–8000 Hz, bandwidth = 1 ERB. Filter output instantaneous power is smoothed over ~30 ms, 4th root, SR = 512 Hz → a 40-channel spectrogram (time series of "instantaneous partial loudness").
EEG preprocessing: detrend (10th-order polynomial), high-pass 0.1 Hz, low-pass (400 ms square window), denoise with DSS to remove activity that differs widely between files.
Audio–EEG comparison (a sketch follows):
- concatenate 10 files (~20 min)
- EEG: time shifts of 200 ms × [0, 1, 2, 3], PCA, keep ~40 PCs
- spectrogram: PCA, keep ~5 PCs
- CCA between EEG and spectrogram PCs
- correlate the 1st CCA component of EEG with the 1st CCA component of the spectrogram
- test against surrogate data (spectrograms rotated by a random amount, 100 trials)
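A sketch of the comparison pipeline above using scikit-learn's PCA and CCA. Data loading, the cochlear filterbank, and the DSS denoising are omitted, and the helper names are ours, not the project's:

```python
# Sketch of the audio-EEG comparison (dimensions from the slide).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

def time_shift_stack(eeg, shifts_ms=(0, 200, 400, 600), sr=512):
    """Stack time-shifted copies of the EEG channels side by side."""
    cols = []
    for ms in shifts_ms:
        n = int(ms * sr / 1000)
        cols.append(np.roll(eeg, n, axis=0))
    return np.hstack(cols)

def eeg_speech_correlation(eeg, spectrogram, n_eeg_pcs=40, n_spec_pcs=5):
    """Correlate the first CCA components of EEG PCs and spectrogram PCs."""
    X = PCA(n_eeg_pcs).fit_transform(time_shift_stack(eeg))
    Y = PCA(n_spec_pcs).fit_transform(spectrogram)
    cca = CCA(n_components=1).fit(X, Y)
    u, v = cca.transform(X, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

def surrogate_test(eeg, spectrogram, n_trials=100, seed=0):
    """Null distribution: rotate the spectrogram by a random amount."""
    rng = np.random.default_rng(seed)
    return [eeg_speech_correlation(
                eeg, np.roll(spectrogram, rng.integers(len(spectrogram)), axis=0))
            for _ in range(n_trials)]
```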
Reconstructions – Results
Correlation between EEG and audio transforms: r ≈ 0.35 (0.12 on surrogate data).
The first CCA component from EEG is our best estimate of the brain activity that tracks continuous speech.
TOWARD FINDING A LOW-DIMENSIONAL REPRESENTATION OF PHONETIC INFORMATION FROM SPEECH AND EEG SIGNALS
[Diagram: speech, EEG, and phonemes each mapped (via NMF, DBN) into a shared latent representation]
WHAT IS THE CURRENT STATE OF THE ART?
AESPA: uses linear regression to map the speech envelope to the ERP. It does not use all the information in either signal.
We would like to use all the information in both speech and EEG to extract a lower-dimensional representation of phonetic information.
DATA PREPROCESSING
- 128-channel EEG
- Re-reference to mastoids
- Bad-channel rejection & interpolation
- Band-pass into 1–4 Hz, 4–7 Hz, 7–15 Hz, and 15–30 Hz bands
- ICA with stability analysis
- Equivalent current dipole estimation
(A sketch of the band-splitting step follows.)
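A minimal sketch of the band-splitting step, assuming zero-phase Butterworth band-pass filters; the slide specifies only the band edges, so the filter family, order, and sampling rate are our assumptions:

```python
# Split EEG into the four frequency bands listed on the slide.
from scipy.signal import butter, filtfilt

BANDS = {"1-4Hz": (1, 4), "4-7Hz": (4, 7), "7-15Hz": (7, 15), "15-30Hz": (15, 30)}

def split_bands(eeg, sr=512):
    """eeg: (time x channels) array. Returns one filtered copy per band."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=0)   # zero-phase filtering
    return out
```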
NON-NEGATIVE MATRIX FACTORIZATION (NMF)
Factorize a non-negative M × N data matrix V into the product of two non-negative matrices:
- basis matrix W (dim: M × K)
- activation/encoding matrix H (dim: K × N)
We then find W, H that minimize the cost function $\|V - WH\|_F^2$ subject to the constraints W, H ≥ 0, which leads to the multiplicative update equations
$$H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad W \leftarrow W \odot \frac{V H^\top}{W H H^\top}.$$
D. Lee and H. Seung, "Algorithms for non-negative matrix factorization," NIPS, 2001.
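A minimal NumPy sketch of the multiplicative updates above; random initialization, the iteration count, and the small epsilon (to avoid division by zero) are illustrative:

```python
# Lee & Seung multiplicative updates for the Euclidean cost ||V - WH||^2.
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative (M x N) matrix V into W (M x K) and H (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W update
    return W, H
```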
WHY NMF vs. OTHER TECHNIQUES?
NMF bases are more interpretable and parts-based (since they combine additively), while at the same time not an over-approximation.
[Figure: basis, activations, and reconstructed images]
THE VISION
[Diagram: three DBN stacks joined at the top]
- Speech transformation: MFCC features → speech visible layer → speech hidden layers 1 … n
- EEG transformation: EEG features → EEG visible layer → EEG hidden layers 1 … 3
- Phone transformation: phone labels → phone label layer → phone hidden layers 1 … n
- An associative layer ties the stacks together: the low-dimensional shared representation
IN PRACTICE, SO FAR…
NMF system: EEG features → NMF → activations (latent repn.) → SVM → phone labels
DBN system: EEG features → EEG visible layer → EEG hidden layers → phone labels
Trained on continuous speech (audiobook); 80% – 10% – 10% train-dev-test split
(A sketch of the NMF system follows.)
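A sketch of the NMF-system path under the stated split, assuming non-negative EEG features and frame-level phone labels are available upstream; the component count and SVM settings are illustrative:

```python
# NMF system: EEG features -> NMF activations -> SVM -> phone labels.
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def nmf_phone_classifier(eeg_features, phone_labels, k=40):
    """eeg_features: non-negative (frames x dims); phone_labels: (frames,)."""
    nmf = NMF(n_components=k, max_iter=500)
    acts = nmf.fit_transform(eeg_features)       # activations (latent repn.)
    # 80% train, then split the remaining 20% into 10% dev / 10% test.
    X_train, X_rest, y_train, y_rest = train_test_split(
        acts, phone_labels, test_size=0.2, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)
    clf = SVC().fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)
```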
Real-time multi-talker speech recognition using automated attention from the ITD information of a binaural silicon cochlea
Attending to Conversations
[Figure: talkers at several locations speaking two-digit numbers: 68 23 34 56 81 28 11 39 83 23]
Task: recognize the highest-valued (two-digit) numbers. Where do I attend?
[System diagram]
Binaural receiver → ITD histogram → novelty/salience → attention
ASR → recognized digits, male/female
Cognition combines salience and recognition to steer attention.
Scene Analysis Engineering (2014)
Modules: Cognition (Python), ASR (Python/Sphinx), Binaural front end (jAER), Novelty (Python)
Connections over UDP: ITD, salience, digits, speaker identity (Obama/Cameron), sound samples
State: direction to attend, digits recognized
Task: switch attention based on recognition and saliency
(A sketch of the UDP plumbing follows.)
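A hypothetical sketch of the UDP plumbing between two of the modules; the slide gives neither ports nor a message format, so the address, JSON encoding, and threshold below are all invented for illustration:

```python
# Toy version of the module wiring: a sensing module pushes ITD/salience
# messages over UDP; the cognition module tracks state and switches attention.
import json
import socket

COGNITION_ADDR = ("127.0.0.1", 9000)   # assumed port

def send_itd(itd_histogram, salience):
    """Binaural/novelty side: push ITD histogram and salience to cognition."""
    msg = json.dumps({"itd": list(itd_histogram), "salience": salience})
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(msg.encode(), COGNITION_ADDR)

def cognition_loop():
    """Cognition side: switch attention toward new salient directions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(COGNITION_ADDR)
    state = {"direction": None, "digits": []}
    while True:
        data, _ = sock.recvfrom(65536)
        msg = json.loads(data)
        if msg["salience"] > 0.5:              # arbitrary threshold
            state["direction"] = msg["itd"].index(max(msg["itd"]))
```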
Analog Scene Analysis – Things to Solve
- Saliency: binaural onset
- Online speech recognition: difficulty of getting sounds into the computer; difficulty of interfacing to a real-time speech recognition toolbox
- Cognition (held up by the difficulty of real-time speech recognition): two-digit sentences, easy semantics
- Sound separation: using delay-and-add to separate speakers
FPGA Cochlea
FPGA Results!
Implemented a real-time FPGA version of Dick Lyon's cochlear model.
Implemented Shamma's coherent sound-segregation task.
(A rough software sketch of the filter cascade follows.)
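The FPGA design itself is not shown on the slides. As a rough software sketch: Lyon-style cascade models pass the signal through a chain of filter stages tuned from high to low frequency and tap the output at each stage; the channel count, frequency range, and filter order below are illustrative assumptions, not the project's parameters:

```python
# Cascade filterbank in the spirit of Lyon's cochlear model: each stage
# low-passes the running signal; each tap is one cochlear channel.
import numpy as np
from scipy.signal import butter, lfilter

def cochlea_cascade(x, sr=16000, n_channels=64, fmin=100.0, fmax=8000.0):
    """Return a (channels x time) array of stage outputs, base to apex."""
    freqs = np.geomspace(fmax, fmin, n_channels)   # high (base) to low (apex)
    taps = []
    y = x
    for fc in freqs:
        b, a = butter(2, fc / (sr / 2))            # one second-order stage
        y = lfilter(b, a, y)
        taps.append(y)                             # this stage's channel output
    return np.array(taps)
```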
FPGA Cochlea
[Figure: response to a chirp signal]
FPGA – Sound Segregation Problem
Temporal coherence → sound stream: look for common modulation across cochlear channels.
FPGA – Correlation Matrix at Time t
[Diagram: stimulus → cochlea (BM, IHC stages) → modulation filters at 2, 4, 8, and 16 Hz → channel × channel correlation matrix → attention signal → mask array → reconstructed tone]
(A sketch of this coherence masking follows.)
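A sketch of the temporal-coherence computation: band-pass each channel's envelope at the modulation rates on the slide, correlate the channels, and use the attended channel's row of the correlation matrix as a mask. The envelope extraction, modulation band edges, and clipping rule are our assumptions:

```python
# Temporal coherence: channels whose envelopes are modulated together
# belong to one stream; mask the cochleagram by coherence with the
# attended channel.
import numpy as np
from scipy.signal import butter, filtfilt

def modulation_envelopes(cochleagram, sr, rates=(2, 4, 8, 16)):
    """Band-pass each channel's envelope around each modulation rate."""
    env = np.abs(cochleagram)                      # crude envelope
    out = []
    for r in rates:
        b, a = butter(2, [0.5 * r / (sr / 2), 2 * r / (sr / 2)], btype="band")
        out.append(filtfilt(b, a, env, axis=1))
    return np.stack(out)                           # (rates x channels x time)

def coherence_mask(cochleagram, sr, attended_channel):
    """Correlate channel modulations; mask = coherence with attended channel."""
    mods = modulation_envelopes(cochleagram, sr)
    flat = mods.transpose(1, 0, 2).reshape(cochleagram.shape[0], -1)
    C = np.corrcoef(flat)                          # channel x channel matrix
    mask = np.clip(C[attended_channel], 0, 1)      # keep coherent channels
    return mask[:, None] * cochleagram             # masked cochleagram
```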
FPGA – Cochleagram and Modulation Output
FPGA – Coherence Output
FPGA – Applied Mask
After applying the mask to the cochlear channels, we can reconstruct each tone. The mask would be chosen based on the attention signal.
FPGA – Future Work
Extend this work to speech signals, to segregate sources in a cocktail party.
Thank you!!!
[Sponsor] for loaning us a 64-channel actiCHamp for recording our EEG data.
[Sponsor] for providing trial licenses for students and lab equipment.
Noise:
- Expected (Priming)
- Imagined (Ghosts)
- Changes (Texture)
Reconstructions
Hardware