Human Auditory Cognition 2014
Alain de Cheveigné, Andre van Schaik, Chetan Singh Thakur, David Karpul, Dorothee Arzounian, Edmund Lalor, Ernst Niebur, Giovanni Di Liberto, Guillaume Garreau, James O'Sullivan, Jessica Thompson, John Foxe, Lakshmi Krishnan, Malcolm Slaney, Manu Rastogi, Marcela Mendoza, Psyche Loui, Shih-Chii Liu, Simon Kelly, Siohoi Ieng, Thomas Murray, Tobi Delbruck, Victor Benichoux, Victor Minces, Vikram Ramanarayanan, Yves Boubenec
Summary: Telluride Experiments. Wow!
Noise:
- Expected (Priming)
- Imagined (Ghosts)
- Changes (Texture)
Reconstructions
Hardware
The Problem
What word will I say at the end of this _____?
Priming, decisions:
… Joan said …
Your canapés are wonderful
Ghosts
Ghosts – Motivation
Visual: [example figure]
Auditory: ???
Ghost Simulation
Noise input: white, pink, babble
Model choice: cortical model vs. human
Domain: spectrogram vs. cochleagram
Comparison approach: Euclidean, cosine, cross-correlation
Accumulation approach: spectrogram
Evaluation approach: spectrogram vs. cochleagram; rank Noise 1 vs. Noise 2 from better to worse; subtract lure/template (a sketch of the comparison step follows)
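A minimal sketch of the comparison step, assuming spectrograms arrive as NumPy arrays; the decision rule (pick the noise closer to the target-minus-lure template) is our reading of the diagram, not code from the project:

```python
# Hypothetical sketch: score how well a noise spectrogram matches a template
# under the three similarity metrics named on the slide.
import numpy as np

def similarity(noise_spec, template_spec, metric="cosine"):
    """Compare two spectrograms (freq x time), flattened to vectors."""
    x = noise_spec.ravel()
    y = template_spec.ravel()
    if metric == "euclidean":
        return -np.linalg.norm(x - y)          # higher = more similar
    if metric == "cosine":
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    if metric == "xcorr":
        x = x - x.mean()
        y = y - y.mean()
        return (x @ y) / (np.std(x) * np.std(y) * x.size)
    raise ValueError(metric)

def choose(noise1, noise2, target, lure, metric="cosine"):
    """The model 'hears' the word in whichever noise is closer to the
    (target - lure) difference template."""
    template = target - lure                   # lure/template subtraction
    s1 = similarity(noise1, template, metric)
    s2 = similarity(noise2, template, metric)
    return 1 if s1 > s2 else 2
```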
Ghosts – Auditory Simulations
[Figure: noise samples vs. target spectrograms]
Ghosts – Simulation Results
Ghosts – Human Results
Output: inverse target/lure ("superstition")
[Figure: average spectrogram of positive choices]
Ghosts – Humans vs. Simulations
Measure: (Yes similarity) – (No similarity); p < 1e-4
Ghosts – EEG Time Course
[Figure: responses to noise-yes, noise-no, and word + noise over time (ms), with the difference trace (Yes – No)]
Ghosts – EEG Hypothesis
Stimulus → prime → estimate filter → choice prediction
SUCCESS!!! (Not a guarantee)
Ghosts – EEG Model
mTRF "superstition" filter: estimate a filter from the EEG, predict the EEG response to each noise, and predict the user's choice from the prediction correlations r1 and r2 (Subjects 1, 2, 3); a sketch follows.
Overall choice prediction accuracy: 59% ± 1.5% (p < 0.05)
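A minimal sketch of how such a choice predictor could work, assuming a single EEG channel, a lagged-stimulus design matrix, and ridge regression for the filter estimate; the lag count and regularization value are illustrative, not taken from the slide:

```python
# Sketch of an mTRF-style "superstition" filter: a linear mapping from
# lagged stimulus envelope to (single-channel) EEG, fit by ridge regression.
import numpy as np

def lagged(stim, n_lags):
    """Build a (time x lags) design matrix of delayed stimulus copies."""
    T = len(stim)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:T - k]
    return X

def fit_trf(stim, eeg, n_lags=64, lam=1e2):
    """Ridge regression: filter w such that lagged(stim) @ w ~ eeg."""
    X = lagged(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

def predict_choice(noise1, noise2, eeg, w, n_lags=64):
    """Choose whichever noise's predicted EEG correlates best (r1 vs r2)."""
    r = []
    for noise in (noise1, noise2):
        pred = lagged(noise, n_lags) @ w
        r.append(np.corrcoef(pred, eeg)[0, 1])
    return 1 if r[0] > r[1] else 2
```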
Ghosts – Summary
Auditory convergence!
EEG shows a distinct filter!!
Textures
Textures – Question
How do stimulus statistics affect the neural accumulation of evidence? (task difficulty, unpredictability)
Neural signature of accumulation of sensory evidence: O'Connell et al., 2012
Textures – Stimulus
[Figure: stimulus examples by difficulty]
Textures – Psychophysics
[Figure: performance vs. timing of change (s), early vs. late changes, by difficulty]
Textures – EEG
[Figure: response-locked activity around the button press]
[Figure: voltage over time, average across subjects (n=3...)]
Priming
Priming paradigm:
Target | Valid prime | Invalid prime
cheesy | cheesy      | pretty
sunny  | sunny       | ready
[Same paradigm as above] Performance on the same stimuli differs as a function of sensory context.
Priming – EEG Analysis
The same stimuli elicit different responses from auditory cortex as a function of sensory context.
Priming – Summary
Context changes the selectivity of auditory cortex, modulating the responses to upcoming stimuli. This is true for all 100 words! Our ability to recognize the expected word is enhanced by this filter!
Reconstructions
- Envelopes vs. onsets
- CCA
- DBN/NMF
History
[Figure: envelope reconstruction from EEG. "But it misses all this:" with an EEG trace]
Our brain likes onsets
EEG Predictions
Reconstructions – Relating EEG to Speech
Goal: find a transform of speech and a transform of EEG such that they are maximally correlated. This allows us to match EEG to speech (as in attention monitoring).
- Better than correlating speech with reconstructed speech: speech contains details that do not show up in the EEG, so reconstructed speech is poorly correlated with real speech.
- Better than correlating EEG with predicted EEG: EEG contains speech-irrelevant activity, so predicted EEG is poorly correlated with real EEG.
[Diagram: speech → cochlear filterbank; EEG → denoise, reduce dimensionality; CCA; measure correlation]
Data: collected by Giovanni Di Liberto (Ed Lalor's lab). Stimulus is speech ("The Old Man and the Sea"), approx. 1.6 hours in 47 files. EEG recorded from 8 subjects, 130 channels, SR = 512 Hz, ~2–3 minute sessions.
Audio preprocessing: FFT-based cochlear filterbank, 40 channels, range 100–8000 Hz, bandwidth = 1 ERB. Filter output instantaneous power is smoothed over ~30 ms, 4th root, SR = 512 Hz → a 40-channel spectrogram (time series of "instantaneous partial loudness").
EEG preprocessing: detrend (10th-order polynomial), high-pass 0.1 Hz, low-pass (400 ms square window), denoise with DSS to remove activity that differs widely between files.
Audio–EEG comparison (a sketch follows):
- concatenate 10 files (~20 min)
- EEG: time shifts of 200 ms × [0, 1, 2, 3], PCA, keep ~40 PCs
- spectrogram: PCA, keep ~5 PCs
- CCA between EEG and spectrogram PCs
- correlate the 1st CCA component of EEG with the 1st CCA component of the spectrogram
- test against surrogate data (spectrograms rotated by a random amount, 100 trials)
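A sketch of the comparison pipeline above using scikit-learn's PCA and CCA. Data loading, the cochlear filterbank, and the DSS denoising are omitted, and the helper names are ours, not the project's:

```python
# Sketch of the audio-EEG comparison (dimensions from the slide).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

def time_shift_stack(eeg, shifts_ms=(0, 200, 400, 600), sr=512):
    """Stack time-shifted copies of the EEG channels side by side."""
    cols = []
    for ms in shifts_ms:
        n = int(ms * sr / 1000)
        cols.append(np.roll(eeg, n, axis=0))
    return np.hstack(cols)

def eeg_speech_correlation(eeg, spectrogram, n_eeg_pcs=40, n_spec_pcs=5):
    """Correlate the first CCA components of EEG PCs and spectrogram PCs."""
    X = PCA(n_eeg_pcs).fit_transform(time_shift_stack(eeg))
    Y = PCA(n_spec_pcs).fit_transform(spectrogram)
    cca = CCA(n_components=1).fit(X, Y)
    u, v = cca.transform(X, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

def surrogate_test(eeg, spectrogram, n_trials=100, seed=0):
    """Null distribution: rotate the spectrogram by a random amount."""
    rng = np.random.default_rng(seed)
    return [eeg_speech_correlation(
                eeg, np.roll(spectrogram, rng.integers(len(spectrogram)), axis=0))
            for _ in range(n_trials)]
```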
Reconstructions – Results
Correlation between EEG and audio transforms: r ≈ 0.35 (0.12 on surrogate data).
The first CCA component from EEG is our best estimate of the brain activity that tracks continuous speech.
TOWARD FINDING A LOW-DIMENSIONAL REPRESENTATION OF PHONETIC INFORMATION FROM SPEECH AND EEG SIGNALS
[Diagram: speech, EEG, and phonemes each mapped (via NMF, DBN) into a shared latent representation]
WHAT IS THE CURRENT STATE OF THE ART?
AESPA: uses linear regression to map the speech envelope to the ERP. It does not use all the information in either signal.
We would like to use all the information in both speech and EEG to extract a lower-dimensional representation of phonetic information.
DATA PREPROCESSING
- 128-channel EEG
- Re-reference to mastoids
- Bad-channel rejection & interpolation
- Band-pass into 1–4 Hz, 4–7 Hz, 7–15 Hz, and 15–30 Hz bands
- ICA with stability analysis
- Equivalent current dipole estimation
(A sketch of the band-splitting step follows.)
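A minimal sketch of the band-splitting step, assuming zero-phase Butterworth band-pass filters; the slide specifies only the band edges, so the filter family, order, and sampling rate are our assumptions:

```python
# Split EEG into the four frequency bands listed on the slide.
from scipy.signal import butter, filtfilt

BANDS = {"1-4Hz": (1, 4), "4-7Hz": (4, 7), "7-15Hz": (7, 15), "15-30Hz": (15, 30)}

def split_bands(eeg, sr=512):
    """eeg: (time x channels) array. Returns one filtered copy per band."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=0)   # zero-phase filtering
    return out
```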
NON-NEGATIVE MATRIX FACTORIZATION (NMF)
Factorize a non-negative M × N data matrix V into the product of two non-negative matrices:
- basis matrix W (dim: M × K)
- activation/encoding matrix H (dim: K × N)
We then find W, H that minimize the cost function $\|V - WH\|_F^2$ subject to the constraints W, H ≥ 0, which leads to the multiplicative update equations
$$H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad W \leftarrow W \odot \frac{V H^\top}{W H H^\top}.$$
D. Lee and H. Seung, "Algorithms for non-negative matrix factorization," NIPS, 2001.
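A minimal NumPy sketch of the multiplicative updates above; random initialization, the iteration count, and the small epsilon (to avoid division by zero) are illustrative:

```python
# Lee & Seung multiplicative updates for the Euclidean cost ||V - WH||^2.
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative (M x N) matrix V into W (M x K) and H (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W update
    return W, H
```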
WHY NMF vs. OTHER TECHNIQUES?
NMF bases are more interpretable and parts-based (since they combine additively), while at the same time not an over-approximation.
[Figure: basis, activations, and reconstructed images]
THE VISION
[Diagram: three DBN stacks joined at the top]
- Speech transformation: MFCC features → speech visible layer → speech hidden layers 1 … n
- EEG transformation: EEG features → EEG visible layer → EEG hidden layers 1 … 3
- Phone transformation: phone labels → phone label layer → phone hidden layers 1 … n
- An associative layer ties the stacks together: the low-dimensional shared representation
IN PRACTICE, SO FAR…
NMF system: EEG features → NMF → activations (latent repn.) → SVM → phone labels
DBN system: EEG features → EEG visible layer → EEG hidden layers → phone labels
Trained on continuous speech (audiobook); 80% – 10% – 10% train-dev-test split
(A sketch of the NMF system follows.)
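A sketch of the NMF-system path under the stated split, assuming non-negative EEG features and frame-level phone labels are available upstream; the component count and SVM settings are illustrative:

```python
# NMF system: EEG features -> NMF activations -> SVM -> phone labels.
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def nmf_phone_classifier(eeg_features, phone_labels, k=40):
    """eeg_features: non-negative (frames x dims); phone_labels: (frames,)."""
    nmf = NMF(n_components=k, max_iter=500)
    acts = nmf.fit_transform(eeg_features)       # activations (latent repn.)
    # 80% train, then split the remaining 20% into 10% dev / 10% test.
    X_train, X_rest, y_train, y_rest = train_test_split(
        acts, phone_labels, test_size=0.2, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)
    clf = SVC().fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)
```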
Real-time multi-talker speech recognition using automated attention from the ITD information of a binaural silicon cochlea
Attending to Conversations
[Figure: talkers at several locations speaking two-digit numbers: 68 23 34 56 81 28 11 39 83 23]
Task: recognize the highest-valued (two-digit) numbers. Where do I attend?
[System diagram]
Binaural receiver → ITD histogram → novelty/salience → attention
ASR → recognized digits, male/female
Cognition combines salience and recognition to steer attention.
Scene Analysis Engineering (2014)
Modules: Cognition (Python), ASR (Python/Sphinx), Binaural front end (jAER), Novelty (Python)
Connections over UDP: ITD, salience, digits, speaker identity (Obama/Cameron), sound samples
State: direction to attend, digits recognized
Task: switch attention based on recognition and saliency
(A sketch of the UDP plumbing follows.)
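A hypothetical sketch of the UDP plumbing between two of the modules; the slide gives neither ports nor a message format, so the address, JSON encoding, and threshold below are all invented for illustration:

```python
# Toy version of the module wiring: a sensing module pushes ITD/salience
# messages over UDP; the cognition module tracks state and switches attention.
import json
import socket

COGNITION_ADDR = ("127.0.0.1", 9000)   # assumed port

def send_itd(itd_histogram, salience):
    """Binaural/novelty side: push ITD histogram and salience to cognition."""
    msg = json.dumps({"itd": list(itd_histogram), "salience": salience})
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(msg.encode(), COGNITION_ADDR)

def cognition_loop():
    """Cognition side: switch attention toward new salient directions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(COGNITION_ADDR)
    state = {"direction": None, "digits": []}
    while True:
        data, _ = sock.recvfrom(65536)
        msg = json.loads(data)
        if msg["salience"] > 0.5:              # arbitrary threshold
            state["direction"] = msg["itd"].index(max(msg["itd"]))
```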
Analog Scene Analysis – Things to Solve
- Saliency: binaural onset
- Online speech recognition: difficulty of getting sounds into the computer; difficulty of interfacing to a real-time speech recognition toolbox
- Cognition (held up by the difficulty of real-time speech recognition): two-digit sentences, easy semantics
- Sound separation: using delay-and-add to separate speakers
FPGA Cochlea
FPGA Results!
Implemented a real-time FPGA version of Dick Lyon's cochlear model.
Implemented Shamma's coherent sound-segregation task.
(A rough software sketch of the filter cascade follows.)
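The FPGA design itself is not shown on the slides. As a rough software sketch: Lyon-style cascade models pass the signal through a chain of filter stages tuned from high to low frequency and tap the output at each stage; the channel count, frequency range, and filter order below are illustrative assumptions, not the project's parameters:

```python
# Cascade filterbank in the spirit of Lyon's cochlear model: each stage
# low-passes the running signal; each tap is one cochlear channel.
import numpy as np
from scipy.signal import butter, lfilter

def cochlea_cascade(x, sr=16000, n_channels=64, fmin=100.0, fmax=8000.0):
    """Return a (channels x time) array of stage outputs, base to apex."""
    freqs = np.geomspace(fmax, fmin, n_channels)   # high (base) to low (apex)
    taps = []
    y = x
    for fc in freqs:
        b, a = butter(2, fc / (sr / 2))            # one second-order stage
        y = lfilter(b, a, y)
        taps.append(y)                             # this stage's channel output
    return np.array(taps)
```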
FPGA Cochlea
[Figure: response to a chirp signal]
FPGA – Sound Segregation Problem
Temporal coherence → sound stream: look for common modulation across cochlear channels.
FPGA – Correlation Matrix at Time t
[Diagram: stimulus → cochlea (BM, IHC stages) → modulation filters at 2, 4, 8, and 16 Hz → channel × channel correlation matrix → attention signal → mask array → reconstructed tone]
(A sketch of this coherence masking follows.)
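A sketch of the temporal-coherence computation: band-pass each channel's envelope at the modulation rates on the slide, correlate the channels, and use the attended channel's row of the correlation matrix as a mask. The envelope extraction, modulation band edges, and clipping rule are our assumptions:

```python
# Temporal coherence: channels whose envelopes are modulated together
# belong to one stream; mask the cochleagram by coherence with the
# attended channel.
import numpy as np
from scipy.signal import butter, filtfilt

def modulation_envelopes(cochleagram, sr, rates=(2, 4, 8, 16)):
    """Band-pass each channel's envelope around each modulation rate."""
    env = np.abs(cochleagram)                      # crude envelope
    out = []
    for r in rates:
        b, a = butter(2, [0.5 * r / (sr / 2), 2 * r / (sr / 2)], btype="band")
        out.append(filtfilt(b, a, env, axis=1))
    return np.stack(out)                           # (rates x channels x time)

def coherence_mask(cochleagram, sr, attended_channel):
    """Correlate channel modulations; mask = coherence with attended channel."""
    mods = modulation_envelopes(cochleagram, sr)
    flat = mods.transpose(1, 0, 2).reshape(cochleagram.shape[0], -1)
    C = np.corrcoef(flat)                          # channel x channel matrix
    mask = np.clip(C[attended_channel], 0, 1)      # keep coherent channels
    return mask[:, None] * cochleagram             # masked cochleagram
```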
FPGA – Cochleagram and Modulation Output
FPGA – Coherence Output
FPGA – Applied Mask
After applying the mask to the cochlear channels, we can reconstruct each tone. The mask would be chosen based on the attention signal.
FPGA – Future Work
Extend this work to speech signals, to segregate sources in a cocktail party.
Thank you!!!
[Sponsor] for loaning us a 64-channel actiCHamp for recording our EEG data.
[Sponsor] for providing trial licenses for students and lab equipment.
Noise:
- Expected (Priming)
- Imagined (Ghosts)
- Changes (Texture)
Reconstructions
Hardware