Neuro-inspired Speech Recognition

Presentation transcript:

Neuro-inspired Speech Recognition. Group members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney.

Audio Projects: Speech Recognition, More ASR, Localization.

Shihab is Running. See http://www.hardrock100.com/index.asp . Shihab arriving in Telluride in 2004 (should happen around 4 PM today).

Localization Effort. Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels. Interaural Intensity Difference (IID): difference of spike counts between the two cochleae. Azimuth: combination of ITD and IID. Figures (microphones and speaker setup): ITD estimation from pure tones; azimuth estimation from music.
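The two cues on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the workgroup's code: the function names, the nearest-spike pairing, the median, and the normalised form of the IID are all assumptions made for the example, and the spike trains are synthetic.

```python
from statistics import median


def estimate_itd(left_spikes, right_spikes):
    """Estimate ITD in seconds from two matching channels.

    Assumption: each left spike is paired with the nearest right spike,
    and the median of (right - left) is taken as the ITD.
    A positive value means the right channel lags the left.
    """
    if not left_spikes or not right_spikes:
        raise ValueError("both channels need at least one spike")
    diffs = []
    for t_left in left_spikes:
        nearest = min(right_spikes, key=lambda t_right: abs(t_right - t_left))
        diffs.append(nearest - t_left)
    return median(diffs)


def estimate_iid(left_spikes, right_spikes):
    """Spike-count difference between the two cochleae.

    Assumption: the difference is normalised by the total count so the
    result lies in [-1, 1]; positive means the left cochlea is more active.
    """
    n_left, n_right = len(left_spikes), len(right_spikes)
    total = n_left + n_right
    return 0.0 if total == 0 else (n_left - n_right) / total


# Synthetic example: one spike per cycle of a 1 kHz tone, right ear 300 us late.
left = [0.001 * k for k in range(10)]
right = [t + 0.0003 for t in left]

itd = estimate_itd(left, right)      # close to 0.0003 s
iid = estimate_iid(left, right[:8])  # left fires more often than right
```

How ITD and IID are combined into an azimuth estimate is not stated on the slide, so that step is left out here.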

FPAA/Mote – Word Recognition. Robosapien listens to the spoken commands. Field Programmable Analog Array (FPAA): FPAA-based analog cochlea (non-spiking) with envelope detection. MOTE: MOTE-based pattern matching using matched filtering with “receptive fields”.
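Matched filtering against "receptive fields" can be illustrated as template correlation over a channel-by-time envelope pattern. The sketch below is an assumption about what such matching might look like, not the MOTE firmware: the command words, the 2x4 patterns, and the cosine-style normalisation are made up for the example.

```python
import math


def flatten(pattern):
    """Turn a channel-by-time grid (list of rows) into one flat list."""
    return [value for row in pattern for value in row]


def normalized_match(envelope, field):
    """Correlate an envelope pattern with one receptive field.

    Both arguments are channel-by-time grids of the same shape.
    The dot product is divided by both norms, so the score is 1.0
    for a pattern that is a positive multiple of the field.
    """
    a, b = flatten(envelope), flatten(field)
    if len(a) != len(b):
        raise ValueError("envelope and receptive field must have the same shape")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else dot / norm


def recognize(envelope, fields):
    """Return the best-matching word and the score of every candidate."""
    scores = {word: normalized_match(envelope, field) for word, field in fields.items()}
    return max(scores, key=scores.get), scores


# Hypothetical receptive fields: 2 cochlear channels x 4 time frames each.
fields = {
    "forward": [[1, 1, 0, 0], [0, 0, 1, 1]],
    "stop": [[0, 0, 1, 1], [1, 1, 0, 0]],
}
observed = [[0.9, 0.8, 0.1, 0.0], [0.1, 0.0, 0.7, 0.9]]

word, scores = recognize(observed, fields)  # word == "forward"
```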

FPAA/Mote – Word Recognition. Status: FPAA (we are using a new FPAA): 2nd-order sections synthesized, but a full auditory filter bank is not yet up. MOTE: real-time communication with Matlab and sampling operational.

Relational Network (Simple). Patches of neurons, each measuring one quantity. Bidirectional relations for feedback/feedforward. Thanks to Rodney Douglas. Diagram labels: X, Y, Z, M, m.

Relational Network (example). Diagram labels: relational specification; input here; relational feedback.

ASR Relational Network. Bidirectional links enforce phoneme/word constraints. Diagram blocks: Cochlea, Phone Recognizer, Delay, Phone Recognizer, Word Recognizer. Legend: a patch of neurons (one-of-N output). Note: We don’t know how to represent delays.

Relational Advantages. Not an HMM: HMMs are great, but… Incorporate other knowledge: bottom-up perception, top-down word hypothesis. Hallucinate based on experience: hear “ba..” and know that bad, bat, bar, bass, or band may follow.
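The "ba.." example amounts to a top-down word hypothesis: a heard prefix narrows the lexicon and sets expectations for what comes next. A minimal sketch follows, assuming letters stand in for phonemes and using a toy lexicon (the five slide words plus two hypothetical distractors). It shows only the idea, not the relational network itself.

```python
# Toy lexicon: the five words from the slide plus two hypothetical distractors.
LEXICON = ["bad", "bat", "bar", "bass", "band", "dog", "cat"]


def word_hypotheses(heard_prefix, lexicon=LEXICON):
    """Return every word in the lexicon consistent with what was heard so far."""
    return [word for word in lexicon if word.startswith(heard_prefix)]


def next_symbol_expectations(heard_prefix, lexicon=LEXICON):
    """Relative expectation of each possible next symbol given the prefix.

    Assumption: all words are equally likely, so the expectation is just
    the fraction of surviving hypotheses that continue with each symbol.
    """
    counts = {}
    for word in word_hypotheses(heard_prefix, lexicon):
        if len(word) > len(heard_prefix):
            symbol = word[len(heard_prefix)]
            counts[symbol] = counts.get(symbol, 0) + 1
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()} if total else {}


candidates = word_hypotheses("ba")            # the five "ba.." words
expected = next_symbol_expectations("ba")     # d, t, r, s, n equally expected
```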

Silicon Cochlea (van Schaik, Liu, 2004). Diagram: basilar membrane, inner hair cells, and ganglion cells, with channels running from high frequency to low frequency.

Silicon Frequency Response: tone ramps into two cochleae.

Cochlear Rate Profiles: spikes per utterance, left cochlea and right cochlea.

Learning Algorithms. Statistical: SAS (pick best channels for decision); least squares (for software demo). Liquid State Machine (LSM): take input to high dimensions with a spiking net. Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi. Diagram: LSM spiking output for Vowel 1 and Vowel 2.
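For readers unfamiliar with STDP, the sketch below shows the generic textbook pair-based rule: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. It is not the rule implemented on the learning chip or the Brader/Fusi rule named on the slide; the amplitudes, time constants, and weight bounds are illustrative assumptions.

```python
import math


def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, with dt = t_post - t_pre in seconds.

    Pre before post (dt > 0) potentiates; post before pre (dt < 0) depresses.
    All parameter values are illustrative assumptions.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the pair-based update over all spike pairs, then clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_delta(t_post - t_pre)
    return min(w_max, max(w_min, weight))


# Post fires 5 ms after pre: the synapse is strengthened.
w_causal = apply_stdp(0.5, [0.010, 0.050], [0.015, 0.055])
# Post fires 5 ms before pre: the synapse is weakened.
w_acausal = apply_stdp(0.5, [0.015, 0.055], [0.010, 0.050])
```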

Learning Chip Architecture. Diagram labels: Cochlea Chip (Immediate Cochlea, Delayed Cochlea); Learning Chip (plastic synapses, nonplastic synapses, excitatory and inhibitory neurons, Phoneme 1, Phoneme 2); binary synaptic weights; Relational Network.

Tone Results: tone recognition using spike input from the silicon cochlea. Training: two tones, duplicated input, positive and negative examples. Panels: Training, Testing.

Phoneme Results: phoneme recognition using spike input from the silicon cochlea. Training: two phonemes, duplicated inputs, positive and negative examples. Panels: Training, Testing.

Behind the Curtain

Hardware Overview. Diagram labels: Phoneme, Word; Cochlea, PCI-AER (for remapping; implemented in MATLAB), Learning; Giacomo Indiveri, Shih-Chii Liu.

Infrastructure Difficulties. Remapper: because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow). Power: unpredictable problems caused by supply-voltage variations of as much as 1 V. Sharing chips: the learning chip had to be shared with two other workgroups. PC replacement.

Impedance Difficulties. Cochlear firing rates: Cochlea: 6M spikes/second (30k channels, 200 spikes/second each). Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each). Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each). Dynamic range.
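The aggregate rates on this slide are channels times per-channel rate, which makes the mismatch between stages easy to quantify. The sketch below redoes that arithmetic and shows one hypothetical way to bridge the factor-of-ten gap between the silicon cochlea and the learning chip, by keeping every tenth spike. The slide does not say how the workgroup handled the mismatch, so `thin_spikes` is an assumption, not their method.

```python
# (channels, spikes per second per channel), figures taken from the slide.
STAGES = {
    "biological cochlea": (30_000, 200),
    "silicon cochlea": (30, 1_000),
    "learning chip": (30, 100),
}


def aggregate_rate(channels, spikes_per_second):
    """Total spikes per second across all channels of one stage."""
    return channels * spikes_per_second


totals = {name: aggregate_rate(*spec) for name, spec in STAGES.items()}
# totals: 6,000,000 / 30,000 / 3,000 spikes per second, matching the slide.


def thin_spikes(spike_times, keep_every):
    """Hypothetical rate reduction: keep every keep_every-th spike."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return spike_times[::keep_every]


# The silicon cochlea produces ten times what the learning chip accepts.
reduction = totals["silicon cochlea"] // totals["learning chip"]

# One silicon-cochlea channel firing at 1k spikes/second for one second.
one_channel = [k / 1000 for k in range(1000)]
thinned = thin_spikes(one_channel, reduction)  # 100 spikes remain
```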

Desired Results: relational feedback, without vs. with. Diagram labels: /A/ phoneme patch, /I/ phoneme patch, AI word patch, IA word patch; phoneme input A A A I.

Simulation

Simulation 2

Simulation 3

Great Job! Student members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor.

Silicon Cochlea. Raster plot for two different tone inputs (channel number vs. time in microseconds). Mean firing rates for two different vowel inputs (by channel number).

Word Recognizer: four example raster plots (silence, A_, A_ with relational feedback, AI).

Software Simulation
