Dealing with Acoustic Noise Part 2: Beamforming Mark Hasegawa-Johnson University of Illinois Lectures at CLSP WS06 July 25, 2006.


AVICAR Recording Hardware
4 cameras with glare shields and adjustable mounting (best placement: dashboard); 8 microphones with pre-amps in a wooden baffle (best placement: sun visor). The system is not permanently installed; mounting takes about 10 minutes.

AVICAR Database
● 100 talkers
● 4 cameras, 7 microphones
● 5 noise conditions: engine idling, 35 mph, 35 mph with windows open, 55 mph, 55 mph with windows open
● Three types of utterances:
– Digits and phone numbers, for training and testing phone-number recognizers
– Phonetically balanced sentences, for training and testing large-vocabulary speech recognition
– Isolated letters, to see how video can help with an acoustically hard problem
● Open-IP public release to 15 institutions in 5 countries

Noise and Lombard Effect

Beamforming Configuration
(Diagram: microphone array; closed window as acoustic reflector; talker, an automobile passenger; open window as noise source.)

Frequency-Domain Expression
X_mk is the measured signal at microphone m in frequency band k. Assume that X_mk was created by filtering the speech signal S_k through the room response filter H_mk and adding noise V_mk: X_mk = H_mk S_k + V_mk. Beamforming estimates an "inverse filter" W_mk.
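As a concrete illustration, here is a minimal NumPy sketch of this per-band model; the array sizes and the matched-filter choice of W are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 7, 256  # microphones, frequency bands (illustrative sizes)

# speech spectrum S_k, room responses H_mk, additive noise V_mk
S = rng.standard_normal(K) + 1j * rng.standard_normal(K)
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
V = 0.01 * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))

# measurement model: X_mk = H_mk * S_k + V_mk
X = H * S + V

# one simple "inverse filter": a per-band matched filter, normalized so
# that sum_m W_mk H_mk = 1 in every band (distortionless response)
W = np.conj(H) / np.sum(np.abs(H) ** 2, axis=0)
S_hat = np.sum(W * X, axis=0)
```

With the noise term small, S_hat tracks S closely; the normalization guarantees that the speech component itself passes through undistorted.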

Time-Domain Approximation

Optimality Criteria for Beamforming
ŝ[n] = W^T X is the estimated speech signal. W is chosen for:
– Distortionless response: W^T H S = S
Time domain: W^T H = [1, 0, …, 0]
Frequency domain: W^H H = 1
– Minimum variance: W = argmin(W^T R W), where R = E[V V^T]
– Multichannel spectral estimation: E[f(S)|X] = E[f(S)|Ŝ]

Delay-and-Sum Beamformer
Suppose that the speech signal at microphone m arrives earlier than at some reference microphone, by n_m samples. Then we can estimate S by delaying each microphone's signal by n_m samples and averaging: ŝ[n] = (1/M) Σ_m x_m[n − n_m], or, equivalently in the frequency domain, Ŝ_k = (1/M) Σ_m e^{−jω_k n_m} X_mk.

What are the Delays?
(Figure: two microphones spaced d apart; a wavefront arriving at angle θ travels an extra path length d sin θ to reach the second microphone.)
Far-field approximation: inter-microphone delay τ = d sin θ / c. Near-field (talker closer than about 10d): closed-form formulas exist.
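The two ideas above can be combined in a short sketch: compute the far-field delays τ_m = m · d · sin θ / c for a uniform linear array, round them to whole samples, and delay-and-sum. The function name and geometry here are illustrative assumptions:

```python
import numpy as np

def delay_and_sum(x, d, theta, fs, c=343.0):
    """Far-field delay-and-sum over a uniform linear array.
    x: (M, N) microphone signals; d: mic spacing in meters;
    theta: steering angle in radians; fs: sample rate in Hz."""
    M, N = x.shape
    # inter-microphone delay in samples: tau_m = m * d * sin(theta) * fs / c
    delays = np.round(np.arange(M) * d * np.sin(theta) * fs / c).astype(int)
    delays -= delays.min()  # reference everything to the earliest channel
    out = np.zeros(N)
    for m in range(M):
        # advance channel m by its delay so all copies line up, then average
        out += np.roll(x[m], -delays[m])
    return out / M
```

With a simulated plane wave whose per-microphone delays are whole samples, the aligned average reconstructs the source exactly (edge effects aside, since np.roll wraps around); non-integer delays are the subject of the next slides.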

Delay-and-Sum Beamformer has Distortionless Response
In the noise-free, echo-free case, each delayed channel is an aligned copy of the speech, so the average recovers it exactly: ŝ[n] = s[n].

Delay-and-Sum Beamformer with Non-Integer Delays

Distortionless Response for Channels with Non-Integer Delays or Echoes
So, we need to find any W such that W^T H = [1, 0, …, 0] = δ^T. Here is one way to write the solution: W = H (H^T H)^{-1} δ. Here is another way to write it: W = H (H^T H)^{-1} δ + B W_B, where B is the "null space" of matrix H, i.e., H^T B = 0.
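The constraint W^T H = [1, 0, …, 0] and the null-space family of solutions can be checked numerically. The matrix sizes below are illustrative, and the null space is computed via the SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L = 7, 3  # illustrative sizes: M filter taps, L constraint columns
H = rng.standard_normal((M, L))
delta = np.zeros(L)
delta[0] = 1.0  # distortionless target [1, 0, ..., 0]

# particular solution: W0 = H (H^T H)^{-1} delta, so that W0^T H = delta^T
W0 = H @ np.linalg.solve(H.T @ H, delta)

# columns of B span the null space of H^T (H^T B = 0), found from the
# right singular vectors of H^T with zero singular value
_, sv, Vt = np.linalg.svd(H.T)
B = Vt[len(sv):].T

# any W = W0 + B w_b still satisfies the distortionless constraint
w_b = rng.standard_normal(B.shape[1])
W = W0 + B @ w_b
```

This freedom in choosing w_b is exactly what the later adaptive-filtering slide exploits: the constraint fixes the speech response, and the null-space coefficients are left free to suppress noise.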

Beam Patterns
Seven microphones, spaced 4 cm apart. Delay-and-sum beamformer, steered to θ = 0.

Minimum Variance Distortionless Response (Frost, 1972) Define an error signal, e[n]: Goal: minimize the error power, subject to the constraint of distortionless response:

Minimum Variance Distortionless Response: The Solution
The closed-form solution (for stationary noise): W = R^{-1} H (H^T R^{-1} H)^{-1} δ. The adaptive solution: adapt W_B so that the beamformer output power is minimized, with the null-space term B W_B leaving the distortionless constraint untouched.
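A standard closed form for this constrained minimization is W = R^{-1} H (H^T R^{-1} H)^{-1} δ; the sketch below (illustrative sizes, NumPy only) builds it and verifies that the constraint holds:

```python
import numpy as np

def mvdr_weights(R, H):
    """Minimize W^T R W subject to W^T H = [1, 0, ..., 0]:
    W = R^{-1} H (H^T R^{-1} H)^{-1} delta."""
    delta = np.zeros(H.shape[1])
    delta[0] = 1.0
    RinvH = np.linalg.solve(R, H)  # R^{-1} H without forming the inverse
    return RinvH @ np.linalg.solve(H.T @ RinvH, delta)

rng = np.random.default_rng(0)
M = 7
A = rng.standard_normal((M, M))
R = A @ A.T + M * np.eye(M)  # a well-conditioned stand-in noise covariance
H = rng.standard_normal((M, 2))
W = mvdr_weights(R, H)
```

Solving linear systems rather than inverting R directly is the usual numerically stable choice; in the adaptive setting, R would instead be tracked from noise-only frames.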

Beam Patterns, MVDR MVDR beamformer, tuned to cancel a noise source distributed over 60 < θ < 90 deg. Beam steered to θ = 0

Multi-Channel Spectral Estimation
We want to estimate some function f(S), given a multi-channel, noisy, reverberant measurement X. Assume that S and X are jointly Gaussian; thus:

p(X|S) Has Two Factors
…where Ŝ is (somehow, by magic) the MVDR beamformed signal, and the covariance matrices of Ŝ and its orthogonal complement are:

Sufficient Statistics for Multichannel Estimation (Balan and Rosca, SAMSP 2002)

Multi-Channel Estimation: Estimating the Noise
● Independent noise assumption
● Measured noise covariance
– Problem: R_v may be singular, especially if estimated from a small number of frames
● Isotropic noise (Kim and Hasegawa-Johnson, AES 2005)
– Isotropic = coming from every direction with equal power
– The form of R_v has been solved analytically, and is guaranteed never to be exactly singular
– Problem: noise is not really isotropic; e.g., it comes primarily from the window and the radio
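The singularity problem is easy to reproduce: a sample covariance built from T &lt; M frames has rank at most T. The sketch below shows this, along with diagonal loading (a standard remedy, not necessarily the slides' method) and a diffuse-field covariance built from the sinc spatial-coherence model commonly used for isotropic noise:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 7, 4  # 7 microphones but only 4 noise-only frames
V = rng.standard_normal((M, T))
R_v = (V @ V.T) / T  # sample covariance: rank <= T = 4 < M, hence singular

# one common fix: diagonal loading
R_loaded = R_v + 1e-3 * (np.trace(R_v) / M) * np.eye(M)

# diffuse (spherically isotropic) alternative: the coherence between mics
# at distance r, frequency f, is sinc(2 f r / c) (np.sinc includes the pi)
f, c, d = 1000.0, 343.0, 0.04  # illustrative frequency, sound speed, spacing
r = np.abs(np.subtract.outer(np.arange(M), np.arange(M))) * d
R_iso = np.sinc(2.0 * f * r / c)  # analytic model: symmetric, unit diagonal
```

The analytic model sidesteps the small-sample rank problem entirely, at the cost of the model mismatch the slide notes: real car noise arrives mostly from the window and the radio, not uniformly from all directions.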

Isotropic Noise: Not Independent

Adaptive Filtering for Multi-Channel Estimation (Kim, Hasegawa-Johnson, and Sung, ICASSP 2006)
(Block diagram: a fixed beamforming branch δ^T (H^T H)^{-1} H^T X and an adaptive branch through B^T and W_B^T are combined (+/−), and the result Ŝ is passed to spectral estimation.)

MVDR with Correct Assumed Channel
MVDR eliminates high-frequency noise; MMSE-logSA eliminates low-frequency noise. MMSE-logSA adds reverberation at low frequencies, but the reverberation seems not to affect speech recognition accuracy.

MVDR with Incorrect Assumed Channel

Channel Estimation
Measured signal x[n] is the sum of noise v[n], a "direct sound" a_0 s[n − n_0], and an infinite number of echoes: x[n] = v[n] + Σ_i a_i s[n − n_i]. The process of adding echoes can be written as a convolution: x[n] = v[n] + (h ∗ s)[n].

Channel Estimation
If you know enough about R_s[m], then the echo times can be estimated from R_x[m]:

Channel Estimation
For example, if the "speech signal" is actually white noise, then R_x[m] has peaks at every inter-echo arrival time:
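This can be demonstrated directly: excite a two-path channel (direct sound plus one echo, at a hypothetical lag of 40 samples) with white noise and look at the autocorrelation R_x[m]:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1 << 18  # a long white "speech" signal
s = rng.standard_normal(N)

a0, a1, n1 = 1.0, 0.6, 40  # direct path and one echo (hypothetical values)
x = a0 * s + a1 * np.roll(s, n1)  # circular shift keeps the FFT algebra exact

# circular autocorrelation R_x[m] via the FFT
r = np.fft.irfft(np.abs(np.fft.rfft(x)) ** 2) / N

# white excitation => R_s[m] ~ delta[m], so R_x[m] has its main peak
# (a0^2 + a1^2) at m = 0 and a secondary peak of about a0*a1 at m = 40
```

The secondary peak at the inter-echo lag is what a channel estimator would read off; the next slide explains why this breaks down for real speech.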

Channel Estimation
Unfortunately, R_s[m] is usually not sufficiently well known to allow us to infer n_i and n_j.

Channel Estimation
● Seeded methods, e.g., maximum-length-sequence pseudo-noise signals or chirp signals
– Problem: the channel response from loudspeaker to microphone is not the same as from lips to microphone
● Independent component analysis
– Problem: doesn't work if speech and noise are both nearly Gaussian

Channel Response as a Random Variable: EM Beamforming (Kim, Hasegawa-Johnson, and Sung, in preparation)

WER Results, AVICAR
Ten-digit phone numbers; trained and tested with a 50/50 mix of quiet (engine idling) and very noisy (55 mph, windows open) data.