Neuro-inspired Speech Recognition

Audio Workgroup: Neuro-inspired Speech Recognition. Group members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson.

1 Neuro-inspired Speech Recognition
Group members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney

2 Audio Projects
Speech Recognition, More ASR, Localization

3 Shihab is Running
See http://www.hardrock100.com/index.asp
Shihab arriving in Telluride in 2004 (should happen around 4 PM today)

4 Localization Effort
[Figure: microphones and speaker setup]
ITD estimation from pure tones:
Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels.
Interaural Intensity Difference (IID): difference of spike counts between the two cochleae.
Azimuth: combination of ITD and IID.
[Figure: azimuth estimation from music, with microphones and speaker]
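
A minimal sketch of the three estimates above, in Python with NumPy. It assumes spike times in seconds for one matching left/right channel pair, a far-field model with a hypothetical 0.15 m microphone spacing, and a hypothetical weighted blend for combining ITD and IID; the slide does not say how the combination was done. Positive values mean the source is on the left.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C
MIC_SPACING = 0.15      # hypothetical microphone separation in meters


def estimate_itd(left_spikes, right_spikes, max_lag=0.001):
    """ITD (s) from spikes of two matching channels: median offset between each
    left spike and the nearest right spike, ignoring offsets beyond max_lag."""
    right = np.sort(np.asarray(right_spikes, dtype=float))
    diffs = []
    for t in np.asarray(left_spikes, dtype=float):
        i = np.searchsorted(right, t)
        neighbors = right[max(i - 1, 0):i + 1]  # closest right spikes on each side of t
        if neighbors.size == 0:
            continue
        d = neighbors[np.argmin(np.abs(neighbors - t))] - t  # > 0: right ear fires later
        if abs(d) <= max_lag:
            diffs.append(d)
    return float(np.median(diffs)) if diffs else 0.0


def estimate_iid(left_count, right_count):
    """IID as a normalized spike-count difference in [-1, 1]; > 0: left is louder."""
    total = left_count + right_count
    return 0.0 if total == 0 else (left_count - right_count) / total


def azimuth_from_itd(itd):
    """Far-field model itd = spacing * sin(azimuth) / c, solved for azimuth in degrees."""
    s = np.clip(itd * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def combined_azimuth(itd, iid, w_itd=0.8, w_iid=0.2, iid_full_scale_deg=90.0):
    """Hypothetical blend: IID mapped linearly to +/-90 degrees, averaged with ITD."""
    return w_itd * azimuth_from_itd(itd) + w_iid * iid_full_scale_deg * iid
```

For example, right-ear spikes that lag the left ones by 0.2 ms give an ITD-only azimuth of roughly 27 degrees toward the left under these assumptions.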

5 Localization Effort

6 FPAA/Mote – Word Recognition

7 FPAA/Mote – Word Recognition
Robosapien listens to spoken commands…
Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.
MOTE-based pattern matching using matched filtering with “receptive fields”.
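
The envelope-detection and matched-filtering steps can be sketched digitally as below. This is a NumPy stand-in, not the FPAA circuit or the MOTE firmware: the full-wave rectifier plus one-pole low-pass filter, the 30 Hz cutoff, and the normalized-correlation score are assumptions, and each "receptive field" is taken to be a stored channels-by-frames envelope template for one command word.

```python
import numpy as np


def envelope(x, fs, cutoff_hz=30.0):
    """Envelope of one filter-bank channel: full-wave rectify, then one-pole low-pass."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)
    y = np.empty(len(x))
    acc = 0.0
    for n, v in enumerate(np.abs(x)):
        acc = a * acc + (1.0 - a) * v
        y[n] = acc
    return y


def matched_filter_score(features, receptive_field):
    """Slide a (channels x frames) template along the envelope features and return
    the best normalized correlation (1.0 = perfect match)."""
    width = receptive_field.shape[1]
    r = receptive_field - receptive_field.mean()
    r = r / (np.linalg.norm(r) or 1.0)
    best = -np.inf
    for start in range(features.shape[1] - width + 1):
        w = features[:, start:start + width]
        w = w - w.mean()
        norm = np.linalg.norm(w)
        if norm > 0:  # skip silent windows
            best = max(best, float(np.sum(w * r) / norm))
    return best


def recognize(features, templates):
    """Pick the command whose receptive field matches best."""
    scores = {word: matched_filter_score(features, rf) for word, rf in templates.items()}
    return max(scores, key=scores.get), scores
```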

8 FPAA/Mote – Word Recognition
Status:
FPAA (we are using a new FPAA): second-order sections are synthesized, but a full auditory filter bank is not yet up.
MOTE: real-time communication with Matlab and sampling are operational.
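
For reference, a digital analogue of the second-order sections: each channel below is one band-pass biquad using the coefficients from Robert Bristow-Johnson's Audio EQ Cookbook (0 dB peak gain), with center frequencies spaced logarithmically. The 100 Hz to 4 kHz range, 16 channels, and Q of 4 are assumptions for illustration, not the FPAA configuration.

```python
import numpy as np


def bandpass_biquad(f0, fs, q):
    """Audio EQ Cookbook band-pass coefficients (constant 0 dB peak gain)."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]


def biquad_filter(b, a, x):
    """One second-order section as a direct form I difference equation."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def filter_bank(x, fs, f_low=100.0, f_high=4000.0, n_channels=16, q=4.0):
    """Parallel bank of band-pass sections; returns centers and (channels x samples)."""
    centers = np.geomspace(f_low, f_high, n_channels)
    return centers, np.array([biquad_filter(*bandpass_biquad(fc, fs, q), x) for fc in centers])
```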

9 Relational Network (Simple)
Patches of neurons, each measuring one quantity
Bidirectional relations for feedback/feedforward
Thanks to Rodney Douglas
[Figure: relation linking patches X, Y and Z]
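
A toy version of the idea, in Python: each patch is collapsed to a single number rather than a population of neurons, and the relation X + Y = Z is a hypothetical choice (the slide does not give one). Every unclamped patch is repeatedly nudged toward the value the other two imply, so the same network computes Z from X and Y, or Y from X and Z, depending on which patches receive input.

```python
def relax(values, clamped, steps=200, rate=0.2):
    """Bidirectional relaxation for the hypothetical relation X + Y = Z.

    values:  initial estimates, e.g. {"X": 3.0, "Y": 4.0, "Z": 0.0}
    clamped: names of patches driven by external input (held fixed)
    """
    v = dict(values)
    for _ in range(steps):
        # value each patch should take, given the current state of the other two
        implied = {"X": v["Z"] - v["Y"], "Y": v["Z"] - v["X"], "Z": v["X"] + v["Y"]}
        for name in v:
            if name not in clamped:
                v[name] += rate * (implied[name] - v[name])
    return v
```

Clamping X and Y drives Z toward 7 (the feedforward direction); clamping X and Z instead drives Y toward 7 (the feedback direction), with no change to the network.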

10 Relational Network (example)
[Figure: relational specification, input point, and relational feedback paths]

11 ASR Relational Network
Bidirectional links enforce phoneme/word constraints.
[Figure: Cochlea, Phone Recognizer, Delay, Phone Recognizer, and Word Recognizer; each box is a patch of neurons with one-of-N output]
Note: We don’t know how to represent delays.
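
A minimal sketch of the phone/word constraint, assuming the two-phoneme, two-word vocabulary (/A/, /I/, AI, IA) used on the Desired Results slide. Each patch is a soft one-of-N (softmax) vector. The Delay box is simply the previous time step's phone patch, which is an assumption since delays were still an open question; the feedback gain, softmax gain, and iteration count are hypothetical. Word patches take bottom-up evidence from the delayed and current phones, and each word feeds back a prediction of the phone it expects next.

```python
import numpy as np

PHONES = ["A", "I"]
# hypothetical lexicon from the desired-results example: word -> (first phone, second phone)
WORDS = {"AI": ("A", "I"), "IA": ("I", "A")}
IDX = {p: i for i, p in enumerate(PHONES)}


def one_of_n(x, gain=5.0):
    """Soft winner-take-all over one patch of neurons (a sharpened softmax)."""
    x = np.asarray(x, dtype=float)
    e = np.exp(gain * (x - x.max()))
    return e / e.sum()


def relational_step(bottom_up, delayed_phone, feedback_gain=0.5, iters=10):
    """Settle the phone and word patches for one time step."""
    bottom_up = np.asarray(bottom_up, dtype=float)
    phone = one_of_n(bottom_up)
    word = np.full(len(WORDS), 1.0 / len(WORDS))
    for _ in range(iters):
        # bottom-up: delayed phone fills the first slot, current phone the second
        word = one_of_n([delayed_phone[IDX[a]] * phone[IDX[b]] for a, b in WORDS.values()])
        # top-down: each word predicts its second phone, weighted by first-slot support
        topdown = np.zeros(len(PHONES))
        for w, (a, b) in zip(word, WORDS.values()):
            topdown[IDX[b]] += w * delayed_phone[IDX[a]]
        phone = one_of_n(bottom_up + feedback_gain * topdown)
    return phone, word


def run(inputs, feedback_gain=0.5):
    """Process a sequence of phone-evidence vectors; the delay is a one-step memory."""
    delayed = np.zeros(len(PHONES))
    history = []
    for evidence in inputs:
        phone, word = relational_step(evidence, delayed, feedback_gain)
        history.append((phone, word))
        delayed = phone
    return history
```

With a confident /A/ in the delay and ambiguous current evidence, the AI word patch wins and its feedback pushes the phone patch toward /I/; setting `feedback_gain=0` leaves the phone patch undecided.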

12 Relational Advantages
Not an HMM (HMMs are great, but…)
Incorporate other knowledge:
Bottom-up perception
Top-down word hypotheses
Hallucinate based on experience: hear “ba..” and know that bad, bat, bar, bass, or band follows
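
The “ba..” example amounts to keeping every lexicon entry consistent with what has been heard and turning the survivors into a prediction of what comes next, which is the kind of signal a top-down word patch would feed back. This sketch uses letters as stand-ins for phonemes and a hypothetical toy lexicon.

```python
from collections import Counter

LEXICON = ["bad", "bat", "bar", "bass", "band", "cat", "dog"]  # hypothetical toy lexicon


def word_hypotheses(heard, lexicon=LEXICON):
    """Top-down hypotheses: every word consistent with the prefix heard so far."""
    return sorted(w for w in lexicon if w.startswith(heard))


def predict_next(heard, lexicon=LEXICON):
    """Probability of each next symbol, counting surviving hypotheses equally."""
    counts = Counter(w[len(heard)] for w in word_hypotheses(heard, lexicon) if len(w) > len(heard))
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}
```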

13 Silicon Cochlea (van Schaik, Liu, 2004)
[Figure: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]
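
The three stages in the figure can be mimicked in software as below. This is a simplified behavioral sketch, not the van Schaik and Liu circuit: the basilar membrane is a cascade of second-order low-pass sections (Audio EQ Cookbook coefficients) whose cutoffs fall from 5 kHz at the base to 100 Hz at the apex, each inner hair cell is a half-wave rectifier, and each ganglion cell is a leaky integrate-and-fire neuron. Channel count, Q, threshold, and time constant are assumed values, and the sampling rate should be well above twice the highest cutoff.

```python
import numpy as np


def lowpass_biquad(fc, fs, q=0.8):
    """Audio EQ Cookbook second-order low-pass coefficients."""
    w0 = 2.0 * np.pi * fc / fs
    alpha, c = np.sin(w0) / (2.0 * q), np.cos(w0)
    b = np.array([(1 - c) / 2, 1 - c, (1 - c) / 2])
    a = np.array([1 + alpha, -2 * c, 1 - alpha])
    return b / a[0], a / a[0]


def second_order_section(b, a, x):
    """Direct form I difference equation."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def silicon_cochlea(x, fs, n_channels=32, f_high=5000.0, f_low=100.0,
                    threshold=0.1, tau=0.005):
    """Return per-channel cutoffs (base to apex) and spike times in seconds."""
    cutoffs = np.geomspace(f_high, f_low, n_channels)
    decay = np.exp(-1.0 / (fs * tau))
    signal = np.asarray(x, dtype=float)
    spikes = []
    for fc in cutoffs:
        signal = second_order_section(*lowpass_biquad(fc, fs), signal)  # basilar membrane tap
        ihc = np.maximum(signal, 0.0)                                   # inner hair cell
        v, times = 0.0, []
        for n, drive in enumerate(ihc):                                 # ganglion cell (leaky I&F)
            v = decay * v + (1.0 - decay) * drive
            if v >= threshold:
                times.append(n / fs)
                v = 0.0
        spikes.append(times)
    return cutoffs, spikes
```

Because each tap sees the output of all the sections before it, a 1 kHz tone drives the high-cutoff (basal) channels and is filtered out before it reaches the apex.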

14 Silicon Frequency Response
Tone ramps into two cochleas

15 Cochlear Rate Profiles
[Figure: spikes per utterance, left cochlea and right cochlea]
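
A rate profile like the one plotted is a per-channel spike count over the utterance. The sketch below assumes AER events arrive as (timestamp in microseconds, address) pairs, with a hypothetical address layout of ear × 30 + channel (30 channels per cochlea, the silicon-cochlea figure quoted on the Impedance Difficulties slide; ear 0 = left, 1 = right).

```python
import numpy as np

N_CHANNELS = 30  # channels per silicon cochlea


def rate_profile(events, n_channels=N_CHANNELS, t_start_us=None, t_end_us=None):
    """Spikes per channel for the left (row 0) and right (row 1) cochlea.

    events: iterable of (timestamp_us, address) pairs with the hypothetical
            encoding address = ear * n_channels + channel.
    t_start_us, t_end_us: optional utterance boundaries (end exclusive).
    """
    profile = np.zeros((2, n_channels), dtype=int)
    for t, address in events:
        if t_start_us is not None and t < t_start_us:
            continue
        if t_end_us is not None and t >= t_end_us:
            continue
        ear, channel = divmod(address, n_channels)
        profile[ear, channel] += 1
    return profile
```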

16 Learning Algorithms
Statistical:
SAS (pick best channels for decision)
Least squares (for software demo)
Liquid State Machine (LSM): take the input to high dimensions with a spiking net
Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip, Brader/Fusi
[Figure: LSM with spiking output for Vowel 1 and Vowel 2]
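
The statistical route can be sketched with NumPy as below: pick the most informative channels, then fit a linear readout by least squares on per-channel spike counts, with labels +1 and -1 for the two vowels. The slide does not define the SAS channel-selection criterion, so a Fisher-ratio ranking is swapped in here as a stand-in; the bias column and the sign threshold are also assumptions.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-channel class separability (stand-in for SAS channel selection)."""
    a, b = X[y == 1], X[y == -1]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0) + 1e-9)


def design_matrix(X, channels):
    """Selected channels plus a constant bias column."""
    return np.hstack([X[:, channels], np.ones((len(X), 1))])


def train_least_squares(X, y, k=None):
    """X: (utterances, channels) spike counts; y: +1 / -1 vowel labels."""
    if k is None:
        channels = np.arange(X.shape[1])
    else:
        channels = np.argsort(fisher_scores(X, y))[::-1][:k]  # k best channels
    w, *_ = np.linalg.lstsq(design_matrix(X, channels), y.astype(float), rcond=None)
    return channels, w


def predict(X, channels, w):
    return np.where(design_matrix(X, channels) @ w >= 0, 1, -1)
```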

17 Learning Chip Architecture
[Figure: cochlea chip (immediate cochlea and delayed cochlea), plastic and nonplastic synapses (excitatory and inhibitory), learning chip neurons for Phoneme 1 and Phoneme 2, relational network]
Binary synaptic weights
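
To make the binary-weight idea concrete, here is a simplified learning rule in the spirit of the Brader/Fusi stop-learning scheme, not the chip's spike-driven synapse circuit: each synapse is 0 or 1, synapses from active inputs flip stochastically (probability q) only while the neuron's drive is on the wrong side of threshold or within a margin of it, and learning stops otherwise. Each input pattern is assumed to be the immediate cochlea activity concatenated with the delayed copy; threshold, margin, and q are hypothetical.

```python
import numpy as np


def drive(w, x):
    """Fraction of active inputs whose binary synapse is potentiated."""
    active = x > 0
    return float(w[active].mean()) if active.any() else 0.0


def train_binary_synapses(patterns, targets, theta=0.5, margin=0.1, q=0.1,
                          epochs=50, seed=0):
    """Stop-learning rule with binary weights for one output neuron.

    patterns: (n_patterns, n_inputs) array, immediate + delayed cochlea activity
    targets:  1 if the neuron should fire for that pattern (teacher), else 0
    """
    rng = np.random.default_rng(seed)
    n = patterns.shape[1]
    w = rng.integers(0, 2, size=n)  # random initial weights in {0, 1}
    for _ in range(epochs):
        for x, target in zip(patterns, targets):
            h, active = drive(w, x), x > 0
            lucky = rng.random(n) < q  # stochastic selection of synapses allowed to flip
            if target == 1 and h < theta + margin:    # should fire, drive too weak
                w[active & (w == 0) & lucky] = 1
            elif target == 0 and h > theta - margin:  # should stay silent, drive too strong
                w[active & (w == 1) & lucky] = 0
            # otherwise the output is already correct with margin: stop learning
    return w


def fires(w, x, theta=0.5):
    return drive(w, x) >= theta
```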

18 Tone Results
Tone recognition with spike input from the silicon cochlea
Training: two tones, duplicated input, positive and negative examples
[Figure: training and testing results]

19 Phoneme Results
Phoneme recognition with spike input from the silicon cochlea
Training: two phonemes, duplicated inputs, positive and negative examples
[Figure: training and testing results]

20 Behind the Curtain

21 Hardware Overview
[Figure: cochlea chip, PCI-AER boards (for remapping), and learning chips for the phoneme and word stages; chips credited to Giacomo Indiveri and Shih-Chii Liu; remapping implemented in MATLAB]

22 Infrastructure Difficulties
Remapper: because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).
Power: unpredictable behavior caused by supply-voltage variations of as much as 1 V.
Sharing chips: the learning chip had to be shared with two other workgroups.
PC replacement
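
What the remapper has to do is a table lookup: every source address from the silicon cochlea is replaced by zero or more destination addresses on the learning chip, optionally with a delay. The Python sketch below stands in for the Matlab workaround; the address layout (channel c driving synapse c immediately and synapse c + 30 after a delay, echoing the immediate and delayed cochlea inputs on the Learning Chip Architecture slide) and the 50 ms delay are hypothetical.

```python
def build_mapping(n_channels=30, delay_us=50_000):
    """Hypothetical table: source address -> list of (destination address, delay_us)."""
    return {ch: [(ch, 0), (ch + n_channels, delay_us)] for ch in range(n_channels)}


def remap(events, table):
    """Translate (timestamp_us, source_address) events into time-ordered
    (timestamp_us, destination_address) events; unmapped sources are dropped."""
    out = [(t + delay, dst) for t, src in events for dst, delay in table.get(src, [])]
    out.sort()
    return out
```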

23 Impedance Difficulties
Cochlear firing rates (total = channels × per-channel rate):
Cochlea (biological): 30k channels × 200 spikes/s = 6M spikes/s
Silicon cochlea: 30 channels × 1k spikes/s = 30k spikes/s
Learning chip: 30 channels × 100 spikes/s = 3k spikes/s
Dynamic range
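
The numbers above make the mismatch explicit: 30k spikes/s out of the silicon cochlea versus 3k spikes/s into the learning chip, a factor of 10. One simple way to bridge it, shown here as an assumption rather than what the group did, is to forward only every k-th spike on each channel, which keeps relative rates across channels but coarsens each channel's rate resolution.

```python
import math
from collections import defaultdict


def decimation_factor(source_rate, target_rate):
    """Smallest integer k such that source_rate / k <= target_rate."""
    return max(1, math.ceil(source_rate / target_rate))


def thin_events(events, k):
    """Keep every k-th (timestamp, address) event per address, dropping the rest."""
    seen = defaultdict(int)
    kept = []
    for t, address in events:
        if seen[address] % k == 0:
            kept.append((t, address))
        seen[address] += 1
    return kept
```

The same arithmetic applied to the biological cochlea gives a factor of 6M / 3k = 2000.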

24 Desired Results
[Figure: activity without and with relational feedback in the /A/ and /I/ phoneme patches and the AI and IA word patches, for phoneme input A A A I]

25 Simulation

26 Simulation 2

27 Simulation 3

28 Great Job!
Student members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor

29

30 Silicon Cochlea
[Figure: raster plots for two different tone inputs (channel number vs. time in microseconds); mean firing rates per channel for two different vowel inputs]

31 Word Recognizer
Four example raster plots (silence, A_, A_ with relational feedback, AI)

32 Software Simulation

33 Software Simulation

34 Behind the Curtain

