Neuro-inspired Speech Recognition
  Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney
Audio Projects: Speech Recognition, More ASR, Localization
Shihab is Running: see http://www.hardrock100.com/index.asp
  Shihab arriving in Telluride in 2004 (should happen around 4PM today)
Localization Effort
  Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels.
  Interaural Intensity Difference (IID): difference of spike counts between the two cochleae.
  Azimuth: combination of ITD and IID.
  [Figures: ITD estimation from pure tones; azimuth estimation from music. Labels: Microphones, Speaker]
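The two cues defined on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the group's implementation: the function names, the nearest-spike pairing rule, the sign convention (right minus left for ITD, left minus right for IID), and the normalization of the IID are all hypothetical choices.

```python
# Sketch: ITD from matched-channel spike times, IID from spike counts.
# Names, pairing rule, and sign conventions are assumptions for illustration.

def estimate_itd(left_spikes, right_spikes):
    """Mean time difference (right minus left) between paired spikes.

    Each left-channel spike is paired with the nearest right-channel spike.
    """
    if not left_spikes or not right_spikes:
        raise ValueError("need at least one spike per channel")
    diffs = []
    for t_left in left_spikes:
        nearest = min(right_spikes, key=lambda t_right: abs(t_right - t_left))
        diffs.append(nearest - t_left)
    return sum(diffs) / len(diffs)


def estimate_iid(left_counts, right_counts):
    """Normalized difference of total spike counts, in the range [-1, 1]."""
    left_total = sum(left_counts)
    right_total = sum(right_counts)
    total = left_total + right_total
    if total == 0:
        return 0.0
    return (left_total - right_total) / total


# 500 Hz pure tone: one spike per cycle, right channel lagging by 300 microseconds.
left = [i * 2000 for i in range(10)]   # spike times in microseconds
right = [t + 300 for t in left]
print(estimate_itd(left, right))             # 300.0
print(estimate_iid([12, 9, 7], [8, 6, 4]))   # positive: left cochlea fired more
```

Nearest-spike pairing is only unambiguous while the true delay stays below half the tone period, which is one reason pure tones are the easier test case.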
Localization Effort
FPAA/Mote – Word Recognition
FPAA/Mote – Word Recognition
  Robosapien: listens to the spoken commands…
  Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.
  MOTE-based pattern matching using matched filtering with “receptive fields”.
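Matched filtering against stored "receptive fields" can be pictured as correlating the cochlea's envelope outputs with one template per word and picking the best match. The sketch below assumes that reading; the command words, the template values, and the use of normalized correlation as the score are hypothetical and not taken from the slides.

```python
import math

# Sketch: word recognition by matched filtering against stored templates.
# Templates ("receptive fields") are channels-by-frames envelope patterns.


def normalize(pattern):
    """Flatten a channels-by-frames pattern and scale it to unit length."""
    flat = [value for row in pattern for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


def match_score(envelopes, receptive_field):
    """Normalized correlation between an input pattern and one template."""
    a = normalize(envelopes)
    b = normalize(receptive_field)
    return sum(x * y for x, y in zip(a, b))


def recognize(envelopes, receptive_fields):
    """Return the word whose template correlates best with the input."""
    return max(
        receptive_fields,
        key=lambda word: match_score(envelopes, receptive_fields[word]),
    )


# Hypothetical two-channel, three-frame templates for two commands.
receptive_fields = {
    "forward": [[1.0, 0.8, 0.1], [0.1, 0.3, 0.9]],
    "stop": [[0.1, 0.2, 0.9], [0.9, 0.7, 0.1]],
}
observed = [[0.9, 0.7, 0.2], [0.2, 0.2, 0.8]]
print(recognize(observed, receptive_fields))  # forward
```

Normalizing both patterns makes the score insensitive to overall loudness, which matters when the envelope level varies from one utterance to the next.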
FPAA/Mote – Word Recognition: Status
  FPAA (we are using a new FPAA): 2nd-order sections synthesized, but a full auditory filter bank is not yet up.
  MOTE: real-time communication with Matlab and sampling are operational.
Relational Network (Simple)
  Patches of neurons; each measures one quantity.
  Bidirectional relations for feedback/feedforward.
  Thanks to Rodney Douglas.
  [Figure labels: X, Y, Z, M, m]
Relational Network (example)
  [Figure labels: Relational specification; Input here; Relational feedback]
ASR Relational Network
  Bidirectional links enforce phoneme/word constraints.
  A patch of neurons (one of N output).
  Note: We don’t know how to represent delays.
  [Figure labels: Cochlea, Delay, Phone Recognizer (two instances), Word Recognizer]
Relational Advantages
  Not an HMM. HMMs are great, but…
  Incorporate other knowledge: bottom-up perception, top-down word hypothesis.
  Hallucinate based on experience: hear “ba..” and know that bad, bat, bar, bass, band follow.
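The "ba.." example amounts to narrowing a lexicon to the words consistent with what has been heard so far. The sketch below shows only that top-down hypothesis step, using spelling as a stand-in for phonemes; the function name and the extra lexicon entries are hypothetical, and a relational network would apply the constraint with neural feedback rather than a lookup.

```python
# Sketch: top-down word hypotheses from a partial input.
# Spelling stands in for a phoneme sequence; names are illustrative only.


def word_hypotheses(heard_prefix, lexicon):
    """Words in the lexicon consistent with what has been heard so far."""
    return sorted(word for word in lexicon if word.startswith(heard_prefix))


lexicon = {"bad", "bat", "bar", "bass", "band", "dog", "cat"}
print(word_hypotheses("ba", lexicon))
# ['bad', 'band', 'bar', 'bass', 'bat']
```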
Silicon Cochlea (van Schaik, Liu, 2004)
  [Figure: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]
Silicon Frequency Response: tone ramps into two cochleae
Cochlear Rate Profiles: spikes per utterance
  [Figure panels: Left Cochlea, Right Cochlea]
Learning Algorithms
  Statistical: SAS (pick best channels for decision); least squares (for software demo).
  Liquid State Machine (LSM): take input to high dimensions with a spiking net.
  Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi.
  [Figure labels: LSM, Spiking Output, Vowel 1, Vowel 2]
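A least-squares readout of the kind listed here could look like the sketch below: fit linear weights that map per-channel firing rates to a +1/-1 label for two vowels. This is not the group's software demo. The two-channel firing rates are made-up illustrative numbers, and the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free.

```python
# Sketch: least-squares linear readout on per-channel firing rates.
# Assumes the feature matrix has full column rank (no singular pivot).


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_readout(rates, labels):
    """Weights (last entry is the bias) minimizing squared error to the labels."""
    rows = [r + [1.0] for r in rates]
    n = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(rows, labels)) for i in range(n)]
    return solve(xtx, xty)


def classify(weights, rate):
    """Return +1 (vowel 1) or -1 (vowel 2) from the sign of the linear score."""
    score = sum(w * v for w, v in zip(weights, rate + [1.0]))
    return 1 if score >= 0 else -1


# Made-up mean firing rates (spikes/second) on two cochlear channels.
train_rates = [[90.0, 20.0], [80.0, 35.0], [95.0, 30.0],
               [25.0, 85.0], [30.0, 70.0], [20.0, 90.0]]
train_labels = [1, 1, 1, -1, -1, -1]
weights = fit_readout(train_rates, train_labels)
print(classify(weights, [88.0, 22.0]), classify(weights, [22.0, 80.0]))  # 1 -1
```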
Learning Chip Architecture
  Binary synaptic weights: , ,
  [Figure labels: Cochlea Chip (Immediate Cochlea, Delayed Cochlea); Learning Chip Neurons (Excit., Inhib.); Plastic synapses; Nonplastic synapses; Phoneme 1; Phoneme 2; Relational Network]
Tone Results
  Tone recognition (training and testing) with spike input from the silicon cochlea.
  Training: two tones; duplicated input; positive and negative examples.
  [Figure panels: Training, Testing]
Phoneme Results
  Phoneme recognition (training and testing) with spike input from the silicon cochlea.
  Training: two phonemes; duplicated inputs; positive and negative examples.
  [Figure panels: Training, Testing]
Behind the Curtain
Hardware Overview
  [Figure labels: Cochlea; PCI-AER (for remapping), implemented in MATLAB; Learning; Phoneme; Word. Names shown: Giacomo Indiveri, Shih-Chii Liu]
Infrastructure Difficulties
  Remapper: Owing to problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).
  Power: Unpredictable problems caused by supply-voltage variation of as much as 1 V.
  Sharing chips: The learning chip had to be shared with two other workgroups.
  PC replacement
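Remapping AER data in software comes down to a table lookup: each event carries a source address, and the table says which target addresses on the learning chip should receive it. The sketch below is written in Python rather than the Matlab the group used, and the event format (timestamp, address), the address values, and the fan-out table are hypothetical.

```python
# Sketch: software remapping of address-events via a lookup table.
# Event format and all addresses are assumptions for illustration.


def remap_events(events, address_map):
    """Translate (timestamp, source_address) events to target addresses.

    One source address may fan out to several targets, which duplicates the
    input; events from unmapped source addresses are dropped.
    """
    remapped = []
    for timestamp, source in events:
        for target in address_map.get(source, ()):
            remapped.append((timestamp, target))
    return remapped


# Hypothetical map: cochlea channel -> learning-chip synapse addresses.
address_map = {0: [10, 42], 1: [11], 2: [12, 44]}
events = [(100, 0), (130, 2), (131, 7), (170, 1)]
print(remap_events(events, address_map))
# [(100, 10), (100, 42), (130, 12), (130, 44), (170, 11)]
```

A per-event loop like this is easy to get right but pays interpreter overhead on every spike, which is consistent with the slide's remark that the software route was very slow.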
Impedance Difficulties
  Cochlear firing rates:
    Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)
    Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each)
    Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each)
  Dynamic range
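The aggregate rates on this slide are simply channels times per-channel rate, and the ratios between stages show the size of the mismatch. The short check below uses only the figures quoted on the slide; the variable and function names are illustrative.

```python
# Worked arithmetic for the spike-rate mismatch between stages.
# (channels, spikes per second per channel) as quoted on the slide.
STAGES = {
    "cochlea": (30_000, 200),
    "silicon cochlea": (30, 1_000),
    "learning chip": (30, 100),
}


def aggregate_rate(channels, spikes_per_channel):
    """Total spikes per second across all channels of one stage."""
    return channels * spikes_per_channel


totals = {name: aggregate_rate(*spec) for name, spec in STAGES.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} spikes/second")

# Mismatch factors between adjacent stages.
print(totals["cochlea"] / totals["silicon cochlea"])        # 200.0
print(totals["silicon cochlea"] / totals["learning chip"])  # 10.0
```

The silicon cochlea delivers ten times more spikes per second than the learning chip is listed as handling, with the same 30 channels on both sides.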
Desired Results
  Relational feedback: without vs. with.
  [Figure labels: /A/ Phoneme Patch; /I/ Phoneme Patch; AI Word Patch; IA Word Patch; Phoneme Input: A A A I]
Simulation
Simulation 2
Simulation 3
Great Job!
  Student Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor
Silicon Cochlea
  Raster plot for two different tone inputs (axes: channel number, time in microseconds).
  Mean firing rates for two different vowel inputs (axis: channel number).
Word Recognizer: four example raster plots (silence, A_, A_ with relational feedback, AI)
Software Simulation
Behind the Curtain