Neuro-inspired Speech Recognition

Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney

Audio Projects: Speech Recognition, More ASR, Localization

Shihab is running; see http://www.hardrock100.com/index.asp
Shihab arriving in Telluride in 2004 (should happen around 4PM today)

Localization Effort
[Figure: microphones and speaker]
ITD estimation from pure tones:
Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels.
Interaural Intensity Difference (IID): difference of spike counts between the two cochleae.
Azimuth: combination of ITD and IID.
[Figure: azimuth estimation from music, with microphones and speaker]
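A minimal Python sketch of these three estimates, assuming spike times in seconds for one matching left/right channel pair and per-channel spike counts for each cochlea. The 0.15 m microphone spacing, the free-field model ITD = d·sin(θ)/c, the linear IID-to-angle map, and the equal weighting of the two cues are all hypothetical choices, not the workshop's method.

```python
import math
import statistics

SPEED_OF_SOUND = 343.0  # m/s in air
MIC_SPACING = 0.15      # m; hypothetical microphone separation

def estimate_itd(left_spikes, right_spikes, max_lag=0.001):
    """ITD in seconds from spike times (s) of one matching channel pair.

    Each left spike is paired with its nearest right spike; pairs further
    apart than max_lag are dropped. Positive ITD means the right channel
    fires later, i.e. the source is on the left.
    """
    if not left_spikes or not right_spikes:
        return 0.0
    diffs = []
    for t in left_spikes:
        nearest = min(right_spikes, key=lambda r: abs(r - t))
        if abs(nearest - t) <= max_lag:
            diffs.append(nearest - t)
    return statistics.median(diffs) if diffs else 0.0

def normalized_iid(left_counts, right_counts):
    """Spike-count difference between cochleae, scaled to [-1, 1] (positive = left louder)."""
    left, right = sum(left_counts), sum(right_counts)
    total = left + right
    return (left - right) / total if total else 0.0

def azimuth_deg(itd, iid, itd_weight=0.5):
    """Combine ITD and IID into one azimuth in degrees (positive = left).

    ITD is inverted with the free-field model itd = d*sin(theta)/c.
    The linear IID-to-angle map and the weighting are hypothetical.
    """
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * itd / MIC_SPACING))
    theta_itd = math.degrees(math.asin(s))
    theta_iid = 90.0 * iid
    return itd_weight * theta_itd + (1.0 - itd_weight) * theta_iid
```

Using the median of the pairwise differences keeps a few mismatched spike pairs from dominating the ITD estimate.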

Localization Effort

FPAA/Mote – Word Recognition

Robosapien listens to spoken commands.
Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.
Mote-based pattern matching using matched filtering with “receptive fields”.
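The two processing stages on this slide can be sketched in Python, assuming the cochlea output has already been split into channels and framed into a channel-by-frame envelope map. The one-pole smoothing constant, the use of cosine similarity as the matched-filter score, and the command names `walk` and `stop` are hypothetical; this is not the FPAA or Mote code.

```python
import math

def envelope(signal, alpha=0.05):
    """Envelope detector: full-wave rectification followed by a one-pole low-pass filter."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (abs(x) - y)
        out.append(y)
    return out

def match_score(features, receptive_field):
    """Matched-filter score: cosine similarity of a (channel x frame) map and a template."""
    a = [v for row in features for v in row]
    b = [v for row in receptive_field for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(features, templates):
    """Return the name of the command whose receptive field matches best."""
    return max(templates, key=lambda name: match_score(features, templates[name]))
```

Each "receptive field" here is simply a stored envelope map for one command; recognition picks the template with the highest score.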

Status:
FPAA (we are using a new FPAA): second-order sections synthesized, but a full auditory filter bank is not yet up.
Mote: real-time communication with MATLAB and sampling operational.
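For reference, a second-order bandpass section and a small bank of them can be sketched in Python. The coefficients follow the widely used Audio EQ Cookbook (RBJ) bandpass with 0 dB peak gain; the channel count, frequency range, and Q are hypothetical, and the sections run in parallel rather than in the cascade that cochlear models often use. This is a software analogue, not the FPAA configuration.

```python
import math

def bandpass_sos(f0, fs, q):
    """Coefficients (b, a) of one second-order bandpass section (RBJ cookbook, 0 dB peak)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def run_sos(b, a, x):
    """Filter the sequence x through one biquad (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def filter_bank(x, fs, f_low=200.0, f_high=4000.0, n_channels=8, q=4.0):
    """Parallel bandpass sections with log-spaced centres, ordered high to low like the cochlea."""
    ratio = (f_high / f_low) ** (1 / (n_channels - 1))
    centres = [f_high / ratio ** k for k in range(n_channels)]
    return centres, [run_sos(*bandpass_sos(fc, fs, q), x) for fc in centres]
```

A pure tone drives most strongly the channel whose centre frequency is closest to it, which is the place-coding behavior a full auditory filter bank needs.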

Relational Network (Simple)
Patches of neurons, each measuring one quantity
Bidirectional relations for feedback/feedforward
Thanks to Rodney Douglas
[Figure: patches X, Y, Z, M, m]
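A toy version of the idea, assuming a single relation X + Y = Z (a hypothetical choice) and reducing each patch of neurons to one number. Each free patch is repeatedly pulled toward the value the other two imply, so the same network computes Z from X and Y, or Y from X and Z, depending on which patches are clamped as inputs.

```python
def relax(values, clamped, steps=200, rate=0.2):
    """Settle three patches X, Y, Z under the hypothetical relation X + Y = Z.

    values: dict with keys 'x', 'y', 'z'; clamped: set of keys held fixed (inputs).
    All free patches are updated synchronously toward the value implied
    by the other two.
    """
    v = dict(values)
    for _ in range(steps):
        implied = {"x": v["z"] - v["y"], "y": v["z"] - v["x"], "z": v["x"] + v["y"]}
        for k in v:
            if k not in clamped:
                v[k] += rate * (implied[k] - v[k])
    return v
```

Clamping X and Y makes Z settle to their sum (feedforward); clamping X and Z instead makes Y settle to the difference (feedback), with no change to the network itself.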

Relational Network (example)
[Figure: relational specification; input here; relational feedback]

ASR Relational Network
Bidirectional links enforce phoneme/word constraints.
[Figure: cochlea, phone recognizer, delayed phone recognizer, and word recognizer; each is a patch of neurons (one-of-N output)]
Note: We don’t know how to represent delays.
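A minimal sketch of this architecture in Python, assuming two phonemes (/A/, /I/) and the two words AI and IA from the Desired Results slide. The one-frame delay line, the product of delayed and current evidence, the 0.5 winner-take-all threshold, and the additive top-down feedback are all hypothetical, since the slide notes that representing delays is still an open question.

```python
WORDS = {"AI": ("A", "I"), "IA": ("I", "A")}  # two-phone words
PHONES = ("A", "I")

def winner_take_all(scores, threshold=0.5):
    """One-of-N patch: only the strongest unit fires, and only if above threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def recognize_words(phone_frames, feedback_gain=0.0):
    """Run phone frames (dicts of phone -> evidence in [0, 1]) through the network.

    A one-frame delay line pairs the previous phone patch with the current one.
    Word units multiply delayed and current evidence; with feedback_gain > 0,
    each word also adds top-down support for the phone it predicts next.
    """
    delayed = {p: 0.0 for p in PHONES}
    words_out = []
    for frame in phone_frames:
        # Top-down: a word whose first phone was just heard predicts its second phone.
        predicted = {p: 0.0 for p in PHONES}
        for first, second in WORDS.values():
            predicted[second] += feedback_gain * delayed[first]
        current = {p: min(1.0, frame[p] + predicted[p]) for p in PHONES}
        word_scores = {w: delayed[a] * current[b] for w, (a, b) in WORDS.items()}
        words_out.append(winner_take_all(word_scores))
        delayed = current  # the delay line: this frame becomes next frame's context
    return words_out
```

For example, with the input A A A followed by a weak /I/ (evidence 0.4), `recognize_words` reports no word at any frame, while `feedback_gain=0.3` lets the predicted /I/ push AI over threshold on the last frame.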

Relational Advantages
Not an HMM (HMMs are great, but…)
Incorporate other knowledge
Bottom-up perception
Top-down word hypothesis
Hallucinate based on experience: hear “ba..” and know that bad, bat, bar, bass, band follow
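The “ba..” example can be illustrated with a small prefix lookup, assuming letters stand in for phones and hypothetical word counts stand in for experience. Candidates are ranked by how often each was heard, giving the kind of top-down hypothesis a word patch could feed back to the phone patch.

```python
from collections import Counter

# Hypothetical word counts standing in for a listener's experience.
EXPERIENCE = Counter({"bad": 40, "bat": 25, "bar": 20, "bass": 10, "band": 30, "cat": 50})

def hypotheses(prefix, experience=EXPERIENCE, top_n=None):
    """Top-down word hypotheses for a partial input, ranked by how often each word was heard.

    Returns (word, probability) pairs; probabilities are normalized over all
    words that match the prefix.
    """
    matches = [(w, c) for w, c in experience.items() if w.startswith(prefix)]
    total = sum(c for _, c in matches)
    ranked = sorted(matches, key=lambda wc: (-wc[1], wc[0]))
    return [(w, c / total) for w, c in ranked[:top_n]]
```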

Silicon Cochlea (van Schaik, Liu, 2004)
[Figure: basilar membrane (high to low frequency), inner hair cells, ganglion cells]
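The hair-cell and ganglion-cell stages can be sketched for a single channel in Python, assuming the basilar-membrane output is already available as a band-limited signal. Half-wave rectification and a leaky integrate-and-fire neuron are standard simplifications; the time constant, threshold, and gain are hypothetical values, not parameters of the van Schaik and Liu chip.

```python
def hair_cell(band_signal):
    """Inner hair cell stage: half-wave rectification of one basilar-membrane channel."""
    return [max(0.0, x) for x in band_signal]

def ganglion_spikes(drive, fs, tau=0.005, threshold=1.0, gain=1000.0):
    """Leaky integrate-and-fire ganglion cell (Euler steps); returns spike times in seconds.

    tau, threshold and gain are hypothetical values.
    """
    dt = 1.0 / fs
    v, spikes = 0.0, []
    for n, i in enumerate(drive):
        v += dt * (-v / tau + gain * i)
        if v >= threshold:
            spikes.append(n * dt)
            v = 0.0  # reset after each spike
    return spikes
```

A louder input drives the membrane harder, so the spike rate grows with input level; per-channel rates of this kind are what the rate profiles on later slides summarize.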

Silicon Frequency Response
Tone ramps into two cochleas

Cochlear Rate Profiles
[Figure: spikes per utterance, left cochlea and right cochlea]
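Spike counts per utterance can be computed directly from address-event (AER) data. This sketch assumes each event is a `(timestamp_us, address)` pair and a hypothetical address layout in which left-cochlea channels come first and right-cochlea channels follow; the real chip's encoding may differ.

```python
N_CHANNELS = 32  # hypothetical; the slides quote roughly 30 channels per cochlea

def rate_profiles(events, n_channels=N_CHANNELS):
    """Spikes per channel for the left and right cochleae over one utterance.

    events: iterable of (timestamp_us, address) AER pairs. Addresses
    0..n_channels-1 are left, n_channels..2*n_channels-1 are right;
    anything else is ignored.
    """
    left = [0] * n_channels
    right = [0] * n_channels
    for _, address in events:
        side, channel = divmod(address, n_channels)
        if side == 0:
            left[channel] += 1
        elif side == 1:
            right[channel] += 1
    return left, right
```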

Learning Algorithms: Statistical, Liquid State Machine
Statistical: SAS (pick best channels for decision); least squares (for software demo)
Liquid State Machine: take input to high dimensions with a spiking net
Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip, Brader/Fusi
[Figure: LSM spiking output for Vowel 1 and Vowel 2]
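The least-squares readout used for the software demo might look like the sketch below, written in plain Python to stay dependency-free. It assumes each example is a vector of per-channel spike counts labeled +1 (vowel 1) or -1 (vowel 2); the bias input and the tiny ridge term added for numerical stability are assumptions, and SAS channel selection is not shown.

```python
def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def train_readout(features, labels, ridge=1e-6):
    """Least-squares linear readout via the normal equations (X^T X + ridge*I) w = X^T y."""
    xs = [list(f) + [1.0] for f in features]  # constant input for the bias
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, labels)) for i in range(d)]
    return solve(xtx, xty)

def classify(w, feature):
    """+1 for vowel 1, -1 for vowel 2 (sign of the linear readout)."""
    score = sum(wi * xi for wi, xi in zip(w, list(feature) + [1.0]))
    return 1 if score >= 0 else -1
```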

Learning Chip Architecture
[Figure: cochlea chip (immediate and delayed cochlea outputs), plastic and nonplastic synapses (excitatory and inhibitory), learning-chip neurons for Phoneme 1 and Phoneme 2, binary synaptic weights, relational network]
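Learning with binary synapses from positive and negative examples can be illustrated with a simplified stochastic rule. This sketch assumes binary input patterns (which channels are active), a teacher that says whether the neuron should fire, and hypothetical transition probabilities; it is loosely inspired by the Brader/Fusi approach named on the Learning Algorithms slide but is not the learning chip's actual circuit or rule.

```python
import random

def output(weights, pattern, threshold):
    """The neuron fires if the binary weights of its active inputs sum to at least threshold."""
    return sum(wi for wi, x in zip(weights, pattern) if x) >= threshold

def train_binary(patterns, targets, n_inputs, threshold,
                 q_plus=0.2, q_minus=0.2, epochs=50, seed=0):
    """Stochastic learning with binary (0/1) synapses and a teacher signal.

    On positive examples, synapses from active inputs may switch 0 -> 1 with
    probability q_plus; on negative examples they may switch 1 -> 0 with
    probability q_minus. No update happens when the output is already correct.
    A simplified, hypothetical rule.
    """
    rng = random.Random(seed)
    w = [0] * n_inputs
    for _ in range(epochs):
        for pattern, target in zip(patterns, targets):
            if output(w, pattern, threshold) == target:
                continue  # stop-learning: nothing to correct
            for i, x in enumerate(pattern):
                if not x:
                    continue
                if target and w[i] == 0 and rng.random() < q_plus:
                    w[i] = 1
                elif not target and w[i] == 1 and rng.random() < q_minus:
                    w[i] = 0
    return w
```

Small transition probabilities make each example change only a few synapses, so one noisy example cannot overwrite everything learned so far.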

Tone Results
Tone recognition: spike input from silicon cochlea
Training: two tones, duplicated input, positive and negative examples
[Figure: training and testing results]

Phoneme Results
Phoneme recognition: spike input from silicon cochlea
Training: two phonemes, duplicated inputs, positive and negative examples
[Figure: training and testing results]

Behind the Curtain

Hardware Overview
[Figure: silicon cochlea (Shih-Chii Liu) and phoneme- and word-level learning chips (Giacomo Indiveri), connected through PCI-AER boards for remapping; remapping implemented in MATLAB]
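The remapping step amounts to a lookup table from source addresses to destination addresses. A Python sketch, assuming `(timestamp_us, address)` events and a hypothetical table in which each cochlea channel drives both an immediate and a delayed synapse row (echoing the immediate/delayed cochlea inputs of the learning chip architecture); the address scheme and delay are illustrative, not the PCI-AER configuration.

```python
from collections import defaultdict

def build_map(n_channels, delay_us):
    """Hypothetical table: channel ch -> immediate copy at ch, delayed copy at n_channels + ch."""
    table = defaultdict(list)
    for ch in range(n_channels):
        table[ch].append((ch, 0))                      # immediate copy
        table[ch].append((n_channels + ch, delay_us))  # delayed copy
    return table

def remap(events, table):
    """Translate (timestamp_us, source) events into (timestamp_us, destination) events, time-sorted.

    Sources missing from the table are dropped.
    """
    out = [(t + delay, dest)
           for t, src in events
           for dest, delay in table.get(src, [])]
    return sorted(out)
```

Done event by event in an interpreted loop, this is exactly the kind of work that becomes slow at tens of thousands of spikes per second.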

Infrastructure Difficulties
Remapper: because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in MATLAB (very slow).
Power: unpredictable behavior caused by supply-voltage variations of as much as 1 V.
Sharing chips: the learning chip had to be shared with two other workgroups.
PC replacement

Impedance Difficulties
Cochlear firing rates:
Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)
Silicon cochlea: 30k spikes/second (30 channels, 1k spikes/second each)
Learning chip: 3k spikes/second (30 channels, 100 spikes/second each)
Dynamic range
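Each total on this slide is the channel count times the per-channel rate (30k × 200 = 6M, 30 × 1k = 30k, 30 × 100 = 3k spikes/second), so going from the silicon cochlea to the learning chip means cutting each channel's rate by a factor of 10. One crude way to do that is per-channel decimation, sketched below under the assumption of `(timestamp_us, address)` events; it discards 90% of the spikes, so it does not address the dynamic-range problem.

```python
def decimate(events, factor):
    """Keep every `factor`-th spike per address, cutting each channel's rate by `factor`."""
    counts = {}
    kept = []
    for t, addr in events:
        n = counts.get(addr, 0)
        if n % factor == 0:
            kept.append((t, addr))
        counts[addr] = n + 1
    return kept
```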

Desired Results: Relational Feedback
[Figure: /A/ and /I/ phoneme patches and AI and IA word patches, without and with relational feedback, for phoneme input A A A I]

Simulation

Simulation 2

Simulation 3

Great Job! Student Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor

Silicon Cochlea
[Figure: raster plot for two different tone inputs (channel number vs. time in microseconds)]
[Figure: mean firing rates for two different vowel inputs (by channel number)]

Word Recognizer: four example raster plots (silence, A_, A_ with relational feedback, AI)

Software Simulation

Behind the Curtain