1
Neuro-inspired Speech Recognition
Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney
2
Audio Projects: Speech Recognition, More ASR, Localization
3
Shihab is Running
See http://www.hardrock100.com/index.asp
Shihab arriving in Telluride in 2004 (should happen around 4PM today)
4
Localization Effort
ITD estimation from pure tones
Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels.
Interaural Intensity Difference (IID): difference of spike counts between the two cochleae.
Azimuth: combination of ITD and IID.
Azimuth estimation from music
[Figure: microphones and speaker setup]
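The ITD and IID definitions above can be sketched in a few lines. This is only an illustration, not the group's implementation: the function names, the nearest-spike pairing, the median, the 0.2 m microphone spacing, and the far-field relation sin(theta) = c * ITD / d are all assumptions introduced here. The slides say azimuth combines ITD and IID but do not say how, so the angle below uses ITD alone.

```python
import math
from statistics import median

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature


def estimate_itd_us(left_spikes_us, right_spikes_us):
    """Median lag (right minus left, in microseconds) between each left-channel
    spike and the nearest spike in the matching right channel."""
    lags = []
    for t_left in left_spikes_us:
        nearest = min(right_spikes_us, key=lambda t: abs(t - t_left))
        lags.append(nearest - t_left)
    return median(lags)


def estimate_iid(left_counts, right_counts):
    """Spike-count difference (left minus right), summed over channels."""
    return sum(left_counts) - sum(right_counts)


def azimuth_from_itd_deg(itd_us, mic_spacing_m):
    """Far-field approximation sin(theta) = c * ITD / d (an assumption here)."""
    ratio = SPEED_OF_SOUND_M_S * (itd_us * 1e-6) / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against noisy estimates
    return math.degrees(math.asin(ratio))


# Hypothetical data: spike times (microseconds) from one matched channel pair.
left = [1000, 3000, 5000]
right = [1300, 3300, 5300]
itd = estimate_itd_us(left, right)
iid = estimate_iid([120, 90], [100, 80])  # per-channel spike counts
angle = azimuth_from_itd_deg(itd, mic_spacing_m=0.2)
```

Taking the median rather than the mean keeps a single mismatched spike pair from dominating the estimate, which matters more for music than for pure tones.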
5
Localization Effort
6
FPAA/Mote – Word Recognition
7
FPAA/Mote – Word Recognition
Robosapien: listens to the spoken commands.
Field Programmable Analog Array (FPAA): analog cochlea (non-spiking) with envelope detection.
MOTE: pattern matching using matched filtering with “receptive fields”.
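As a rough illustration of matched filtering with "receptive fields", the sketch below scores a channel-by-time patch of envelope values against one stored template per word and picks the best match. The command words, patch sizes, and the cosine-style normalization are hypothetical; the slides do not describe the actual MOTE code.

```python
import math


def _flatten(patch):
    """Turn a channels-by-frames patch into one flat list of values."""
    return [value for row in patch for value in row]


def match_score(envelopes, field):
    """Normalized correlation between an envelope patch and a receptive field."""
    x = _flatten(envelopes)
    w = _flatten(field)
    dot = sum(a * b for a, b in zip(x, w))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0


def recognize(envelopes, fields):
    """Return the word whose receptive field best matches the input patch."""
    return max(fields, key=lambda word: match_score(envelopes, fields[word]))


# Hypothetical receptive fields: 2 cochlear channels x 2 time frames.
FIELDS = {
    "forward": [[1.0, 0.0], [0.0, 1.0]],
    "stop": [[0.0, 1.0], [1.0, 0.0]],
}

heard = [[0.9, 0.1], [0.2, 0.8]]  # envelope-detected filter-bank output
command = recognize(heard, FIELDS)
```

Normalizing by the patch energy makes the score insensitive to overall loudness, so a quiet and a loud utterance of the same word compete on shape alone.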
8
FPAA/Mote – Word Recognition
Status:
FPAA (we are using a new FPAA): 2nd-order sections synthesized, but a full auditory filter bank is not yet up.
MOTE: real-time communication with Matlab and sampling operational.
9
Relational Network (Simple)
Patches of neurons
Each measures one quantity
Bidirectional relations for feedback/feedforward
Thanks to Rodney Douglas
[Diagram labels: X, Y, Z, M, m]
10
Relational Network (example)
[Figure labels: relational specification; input here; relational feedback]
11
ASR Relational Network
Bidirectional links enforce phoneme/word constraints
[Diagram labels: Cochlea, Phone Recognizer (two instances), Delay, Word Recognizer]
A patch of neurons (one of N output)
Note: We don’t know how to represent delays
12
Relational Advantages
Not an HMM
HMMs are great, but…
Incorporate other knowledge
Bottom-up perception
Top-down word hypothesis
Hallucinate
Based on experience
Hear “ba..” and know that bad, bat, bar, bass, band follow
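The "ba.." example can be pictured as a top-down lookup: the partial bottom-up evidence narrows the set of word hypotheses, which can then feed expectations back down. The sketch below is only a toy, a plain prefix match over a hypothetical lexicon (the word "dog" is added here as a non-matching entry); it does not model the neural relational network itself.

```python
# Toy lexicon: the five "ba.." words from the slide plus one hypothetical extra.
LEXICON = ["bad", "bat", "bar", "bass", "band", "dog"]


def word_hypotheses(heard_so_far, lexicon):
    """Words still consistent with the phonemes (here: letters) heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def expected_next(heard_so_far, lexicon):
    """Symbols the word level would 'expect' next, given its hypotheses."""
    position = len(heard_so_far)
    return sorted({word[position]
                   for word in word_hypotheses(heard_so_far, lexicon)
                   if len(word) > position})


candidates = word_hypotheses("ba", LEXICON)
predictions = expected_next("ba", LEXICON)
```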
13
Silicon Cochlea (van Schaik, Liu, 2004)
[Figure: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]
14
Silicon Frequency Response
Tone ramps into two cochleas
15
Cochlear Rate Profiles
[Figure: spikes per utterance, left cochlea and right cochlea]
16
Learning Algorithms
Statistical
SAS (pick best channels for decision)
Least squares (for software demo)
Liquid State Machine
Take input to high dimensions with spiking net
Spike Timing Dependent Plasticity (STDP)
Giacomo/Srinjoy chip
Brader/Fusi
[Figure: LSM spiking output for Vowel 1 and Vowel 2]
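One plausible way to "pick best channels for decision" is to rank cochlear channels by how well their spike counts separate two classes. The slides do not give the criterion SAS actually uses, so the score below (difference of class means divided by the summed spreads) is an assumption, and the function names and vowel data are made up for illustration.

```python
from statistics import mean, pstdev


def channel_scores(class_a, class_b):
    """Separation score per channel.

    class_a, class_b: lists of trials, each trial a list of per-channel
    spike counts. Higher score means the channel separates the classes better.
    """
    scores = []
    for ch in range(len(class_a[0])):
        a = [trial[ch] for trial in class_a]
        b = [trial[ch] for trial in class_b]
        spread = pstdev(a) + pstdev(b) + 1e-9  # avoid division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def best_channels(class_a, class_b, k):
    """Indices of the k highest-scoring channels, best first."""
    scores = channel_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda ch: scores[ch], reverse=True)
    return ranked[:k]


# Hypothetical spike counts: 2 trials per vowel, 3 channels per trial.
vowel_1 = [[50, 10, 30], [52, 12, 10]]
vowel_2 = [[10, 14, 20], [12, 16, 40]]
chosen = best_channels(vowel_1, vowel_2, k=2)
```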
17
Learning Chip Architecture
[Diagram labels: Cochlea Chip; Immediate Cochlea; Delayed Cochlea; Plastic synapses; Nonplastic synapses; Excit.; Inhib.; Learning Chip Neurons; Phoneme 1; Phoneme 2; Relational Network]
Binary synaptic weights
18
Tone Results
Tone recognition
Spike input from silicon cochlea
Training: two tones, duplicated input, positive and negative examples
Testing
[Figure panels: Training, Testing]
19
Phoneme Results
Phoneme recognition
Spike input from silicon cochlea
Training: two phonemes, duplicated inputs, positive and negative examples
Testing
[Figure panels: Training, Testing]
20
Behind the Curtain
21
Hardware Overview
[Diagram labels: Cochlea; Learning; Phoneme; Word; PCI-AER (for remapping), implemented in MATLAB; Giacomo Indiveri; Shih-Chii Liu]
22
Infrastructure Difficulties
Remapper: Owing to problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).
Power: Unpredictable problems caused by supply-voltage variation of as much as 1 V.
Sharing chips: The learning chip had to be shared with two other workgroups.
PC replacement
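Conceptually, remapping AER data is a table lookup: each event carries a source address, and the remapper rewrites it to a destination address on the receiving chip. The sketch below assumes events are (timestamp, address) pairs and that unmapped addresses are simply dropped; the table contents and function name are hypothetical, and the group's Matlab code is not shown in the slides.

```python
def remap_events(events, table):
    """Rewrite AER source addresses to destination addresses.

    events: list of (timestamp_us, source_address) tuples, in time order.
    table:  dict mapping cochlea addresses to learning-chip addresses.
    Events whose source address has no entry in the table are dropped.
    """
    return [(t, table[addr]) for t, addr in events if addr in table]


# Hypothetical mapping: cochlea channels 0 and 1 drive synapses 5 and 6.
TABLE = {0: 5, 1: 6}
cochlea_events = [(10, 0), (12, 3), (15, 1)]
chip_events = remap_events(cochlea_events, TABLE)
```

A dictionary lookup per event is cheap, so the slowness noted above more likely comes from moving events through software at all than from the mapping itself; that reading is a guess, not something the slides state.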
23
Impedance Difficulties
Cochlear firing rates
Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)
Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each)
Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each)
Dynamic range
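The totals above are just channels times per-channel rate, and the mismatch between the silicon cochlea and the learning chip works out to a factor of 10. The short check below reproduces that arithmetic; the `thin` helper, which keeps every tenth spike, is only one hypothetical way to bridge the gap and is not taken from the slides.

```python
# (channels, spikes per second per channel), as listed on the slide.
STAGES = {
    "cochlea": (30_000, 200),
    "silicon_cochlea": (30, 1_000),
    "learning_chip": (30, 100),
}


def total_rate(channels, per_channel_rate):
    """Aggregate spike rate of a stage, in spikes per second."""
    return channels * per_channel_rate


def thin(spike_times, factor):
    """Keep every `factor`-th spike (a crude, assumed rate-reduction scheme)."""
    return spike_times[::factor]


totals = {name: total_rate(*spec) for name, spec in STAGES.items()}
mismatch = totals["silicon_cochlea"] // totals["learning_chip"]
```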
24
Desired Results
[Figure: relational feedback, without vs. with; traces: /A/ Phoneme Patch, /I/ Phoneme Patch, AI Word Patch, IA Word Patch; phoneme input: A A A I]
25
Simulation
26
Simulation 2
27
Simulation 3
28
Great Job!
Student Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor
30
Silicon Cochlea
Raster plot for two different tone inputs (axes: channel number, time in microseconds)
Mean firing rates for two different vowel inputs (axis: channel number)
31
Word Recognizer
Four example raster plots (silence, A_, A_ with relational, AI)
32
Software Simulation
33
Software Simulation
34
Behind the Curtain