WP 2: Acoustic Scene Analysis


WP 2: Acoustic Scene Analysis
Patrick A. Naylor
Project Meeting, Erlangen, 30 November 2016

Introduction – Task List
- T2.1 Acoustic source localization and environment mapping
- T2.2 Focusing by adaptive robomorphic arrays
- T2.3 Acoustic source and environment tracking
- T2.4 Spatial filtering
- T2.5 Acoustic echo cancellation
- T2.6 Multichannel noise reduction and interference suppression
- T2.7 Robust dereverberation for robot audition and interaction

T2.1 – Mapping
Aim: Mapping using bearing-only sensors (DOAs) for a moving microphone array on the robot platform [1]
Challenges:
- Reported and actual robot positions diverge
- Moving sources; missing source-sensor range
- Reverberation, missed detections, localization error
Achievements:
- Novel, generalized approach to SLAM [5]
- Novel approach specific to acoustic SLAM, robust to dominant early reflections and to moving sound sources (e.g., human talkers) [2-4,6]
- A separate demo video will be prepared for the Review Meeting
[1] C. Evers, keynote speech on Bayesian inference and acoustic scene mapping for robot audition, invited for HSCMA 2017.
[2] C. Evers, A. H. Moore, and P. A. Naylor, “a-SLAM of a Moving Microphone Array and its Surrounding Speakers”, ICASSP 2016.
[3] C. Evers, A. H. Moore, and P. A. Naylor, “Towards Informative Path Planning for Acoustic Simultaneous Localization of Microphone Arrays and Mapping of Surrounding Sound Sources”, DAGA 2016.
[4] C. Evers, A. H. Moore, and P. A. Naylor, “Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition”, EUSIPCO 2016.
[5] C. Evers and P. A. Naylor, “Generalized dynamic scene mapping”, to be submitted to IEEE Trans. Sig. Proc.
[6] C. Evers and P. A. Naylor, “Acoustic SLAM”, to be submitted to IEEE Trans. ASLP.

T2.1 – Sound Source Localization with Spherical Microphone Arrays (Head)
- Developed a novel extension of the Pseudo-Intensity Vector (PIV) method that uses the signal subspace (SSPIV) to improve robustness to noise and reverberation
  - EUSIPCO 2015 conference paper with BGU [Moore2015a]
  - IEEE TASLP paper accepted
- Parallel work on extending PIVs with higher-order spherical harmonics to refine the initial PIV estimate (Augmented Intensity Vector, AIV)
  - ICASSP 2016 conference paper [Hafezi2016]
  - Journal article in preparation
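The core PIV idea can be illustrated with a minimal sketch: the DOA estimate is the time-averaged active intensity formed from the zeroth-order (omnidirectional) and first-order (dipole) eigenbeams. The simulated signals, noise level, and idealized eigenbeam model below are assumptions for illustration only, not the SSPIV/AIV algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plane wave from unit direction u, observed in idealized eigenbeams
u = np.array([0.6, 0.8, 0.0])
s = rng.standard_normal(4096)                 # source signal surrogate
noise = 0.05 * rng.standard_normal((4, 4096))
w = s + noise[0]                              # zeroth-order (omnidirectional) eigenbeam
xyz = u[:, None] * s + noise[1:]              # first-order (dipole) eigenbeams

# Pseudo-intensity vector: frequency-averaged Re{ conj(W) * [X, Y, Z] }
W = np.fft.rfft(w)
I = np.array([np.sum(np.real(np.conj(W) * np.fft.rfft(d))) for d in xyz])
doa = I / np.linalg.norm(I)                   # unit-norm DOA estimate, approximately u
```

Averaging the intensity over frequency is what gives the PIV its robustness; the SSPIV extension additionally projects onto the signal subspace before forming the intensity.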

T2.2: Focusing by Adaptive Robomorphic Arrays (I)
Aim: Increased attenuation of competing speakers and background noise
Minimum Mutual Information (MMI)-based signal extraction [Reindl et al., 2014]:
- Realization: Generalized Sidelobe Canceller (GSC) structure that uses geometrically constrained BSS to realize the blocking matrix
- Advantage: The robomorphic array can be used to increase the target-signal suppression of the blocking matrix, and thereby the signal extraction performance of the GSC
Experiments with measured impulse responses:
- The GSC with the robomorphic array yields up to 3 dB more signal enhancement than the GSC with the head array
- Gains depend on the scenario and array configuration
[Reindl et al., 2014]: K. Reindl et al., “Minimum mutual information-based linearly constrained broadband signal extraction”, IEEE TASLP, June 2014.

T2.2: Focusing by Adaptive Robomorphic Arrays (II)
Ongoing work: Implementation of the MMI-based GSC for real-time processing
- A separate demo video will be prepared for the Review Meeting
- Demonstrator highly parallelized on a Graphics Processing Unit (GPU), with the following features:
  - Geometrically constrained BSS unit as the Blocking Matrix (BM)
  - Interference Canceller (IC) realized as a frequency-domain NLMS
  - Fixed delay-and-sum beamformer
  - Joint adaptation control between BM and IC
- Algorithmic extensions to support arbitrary microphone constellations of the robomorphic array
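The GSC signal flow (fixed beamformer, blocking matrix, adaptive interference canceller) can be sketched with a toy two-microphone example. The instantaneous mixing, the difference-based blocking matrix, the time-domain NLMS (the demonstrator uses a frequency-domain NLMS and a BSS-based blocking matrix), and all gains are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20000
target = rng.standard_normal(N)          # desired-source surrogate
interf = rng.standard_normal(N)          # competing-speaker surrogate

# Hypothetical instantaneous 2-mic mixtures: target in phase at both mics,
# interferer with different gains
x1 = target + 0.8 * interf
x2 = target + 0.3 * interf

d = 0.5 * (x1 + x2)                      # fixed beamformer output (target aligned)
u = x1 - x2                              # blocking matrix: target cancelled, interferer kept

# Adaptive interference canceller: NLMS subtracts the interferer from d
L, mu, eps = 8, 0.05, 1e-6
w = np.zeros(L)
y = np.zeros(N)
for n in range(L - 1, N):
    ub = u[n - L + 1:n + 1][::-1]        # most recent blocking-matrix samples
    y[n] = d[n] - w @ ub                 # GSC output
    w += mu * y[n] * ub / (ub @ ub + eps)

# Residual interferer power at the GSC output vs. the fixed beamformer alone
err_fbf = np.mean((d - target) ** 2)
err_gsc = np.mean((y[N // 2:] - target[N // 2:]) ** 2)
```

The blocking matrix is the critical element: any target leakage through it causes target cancellation in the IC, which is why the robomorphic array's improved target suppression in the BM translates directly into better GSC performance.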

T2.3 – Tracking
Aim: Create a dynamic map of the surrounding environment with moving sources
Challenges:
- Missing source-sensor range
- Bottlenecked by localization performance
- Broadband nature of speech: frequency-dependent DOAs
Achievements:
- Exploit the spatial diversity of the robot to infer 3D positions from 2D DOAs [1-3]
- Directly use raw audio data for track-before-detect [4]
- Multi-detection tracker using DOAs in multiple frequency bins [5]
- Audio-visual fusion for improved performance [6]
[1] C. Evers, J. Sheaffer, A. H. Moore, B. Rafaely, and P. A. Naylor, “Bearing-only Acoustic Tracking of Moving Speakers for Robot Audition”, DSP 2015.
[2] Y. Dorfan, C. Evers, S. Gannot, and P. A. Naylor, “Speaker Localization with Moving Microphone Arrays”, EUSIPCO 2016.
[3] C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor, “Source Tracking using Moving Microphone Arrays for Robot Audition”, submitted to ICASSP 2017.
[4] C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor, “Bayesian Acoustic Track-before-Detect”, in preparation for IEEE Trans. ASLP.
[5] C. Evers, B. Rafaely, and P. A. Naylor, “Multi-detection Acoustic Tracking”, in preparation for HSCMA 2017.
[6] I. D. Gebru, C. Evers, R. Horaud, and P. A. Naylor, TBD, in preparation for HSCMA 2017.
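How robot motion compensates for the missing source-sensor range can be illustrated with a minimal 2D bearing-only particle filter: a moving sensor observing only DOAs still triangulates the source position over time. The scene geometry, noise level, robot path, and particle count below are all hypothetical, and this generic filter is only a stand-in for the cited trackers.

```python
import numpy as np

rng = np.random.default_rng(2)
src = np.array([2.0, 1.0])                       # static source (hypothetical scene)
P = 2000
parts = rng.uniform([-1, -1], [5, 4], (P, 2))    # particle cloud over the room
sigma = 0.05                                     # bearing noise std (rad)

for t in range(40):
    robot = np.array([0.1 * t, 0.0])             # robot moves, making range observable
    z = np.arctan2(src[1] - robot[1], src[0] - robot[0]) + sigma * rng.standard_normal()
    pred = np.arctan2(parts[:, 1] - robot[1], parts[:, 0] - robot[0])
    err = np.angle(np.exp(1j * (z - pred)))      # wrapped bearing innovation
    wts = np.exp(-0.5 * (err / sigma) ** 2)      # bearing likelihood per particle
    wts /= wts.sum()
    # Systematic resampling plus small jitter (roughening) to keep diversity
    idx = np.searchsorted(np.cumsum(wts), (rng.random() + np.arange(P)) / P)
    idx = np.minimum(idx, P - 1)
    parts = parts[idx] + 0.01 * rng.standard_normal((P, 2))

est = parts.mean(axis=0)                         # posterior-mean position estimate
```

With a single stationary sensor the particles would remain spread along the bearing ray; it is the baseline swept out by the moving robot that collapses the range ambiguity.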

T2.4: Spatial Filtering
Aim: Attenuation of competing speakers and background noise
Robust HRTF-based polynomial beamformer:
- Extension of the robust HRTF-based beamformer to the concept of polynomial beamforming
- Advantage: Flexible steering of the beamformer’s main beam
- Experiments: The HRTF-based polynomial beamformer provides a good approximation of the non-polynomial beamformer [Barfuss et al., 2016]
Two-dimensional HRTF-based beamformer design:
- Extension of the robust HRTF-based beamformer to two dimensions
- Advantage: Control of the beamformer’s behavior over the entire sound field
- Experiments: Consistent improvement in signal enhancement performance compared to the previous (one-dimensional) design (submitted to HSCMA 2017)
- Implemented in the prototype system
[Barfuss et al., 2016]: H. Barfuss et al., “HRTF-based robust least-squares frequency-invariant polynomial beamforming”, IWAENC, Sept. 2016.

T2.5: Acoustic Echo Control
Aim: Suppression of acoustic feedback to allow for barge-in
Combination of adaptive beamforming and echo cancellation:
- GSC structure with echo canceller and interference canceller in parallel
- Evaluation for NAO with 4 head microphones showed improved noise suppression and WER compared to fixed beamforming [El-Rayyes et al., 2016]
Current work:
- AEC implementation for the prototype system
- Demo video with AEC (at least for the Review Meeting)
[El-Rayyes et al., 2016]: A. El-Rayyes, H. Löllmann, and W. Kellermann, “Acoustic Echo Control for Humanoid Robots”, DAGA, March 2016, Aachen.
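At the core of any AEC is an adaptive filter that identifies the loudspeaker-to-microphone echo path and subtracts the echo estimate from the microphone signal. A minimal NLMS sketch follows; the toy echo path, filter length, and step size are illustrative assumptions (the system above additionally combines this with adaptive beamforming).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 30000
x = rng.standard_normal(N)                   # far-end (loudspeaker) signal
h = np.array([0.5, -0.3, 0.2, 0.1])          # hypothetical echo path
d = np.convolve(x, h)[:N]                    # microphone signal (far-end-only talk)

# NLMS adaptive echo canceller
L, mu, eps = 8, 0.2, 1e-6
w = np.zeros(L)
e = np.zeros(N)
for n in range(L - 1, N):
    xb = x[n - L + 1:n + 1][::-1]            # most recent far-end samples
    e[n] = d[n] - w @ xb                     # echo-cancelled (error) signal
    w += mu * e[n] * xb / (xb @ xb + eps)

# ERLE after convergence: echo power before vs. after cancellation
erle = 10 * np.log10(np.mean(d[-5000:] ** 2) / (np.mean(e[-5000:] ** 2) + 1e-12))
```

During double-talk (near-end speech present), adaptation must be slowed or frozen, which is one reason a joint adaptation control with the beamformer is needed in practice.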

T2.6: Multichannel Noise Reduction and Interference Suppression
Aim: Suppression of NAO’s actuator ego-noise
- Continued work on the Phase-optimized Multichannel Dictionary approach (PO-KSVD) [Deleforge et al., 2015]
- Fusion of motor data into PO-KSVD [Schmidt et al., 2016]
- Replaced the iterative search in a pre-trained dictionary with one-shot classification using Support Vector Machines (SVMs); the SVMs are driven entirely by motor data
Results:
- Faster computation
- Improved suppression performance for microphone geometries (i.e., varying head positions) that were not seen in training
[Deleforge et al., 2015]: A. Deleforge and W. Kellermann, “Phase-optimized K-SVD for signal extraction from underdetermined multichannel sparse mixtures”, IEEE ICASSP, 2015.
[Schmidt et al., 2016]: A. Schmidt, A. Deleforge, and W. Kellermann, “Ego-Noise Reduction Using a Motor Data-Guided Multichannel Dictionary”, IEEE IROS, Oct. 2016.

T2.7: Robust Dereverberation for Robot Audition and Interaction
Aim: Attenuation of reverberation (and background noise)
Multichannel equalization of beamformed channels [Moore2015]:
- Spatial pre-processing of the channel estimates improves channel diversity, leading to improved channel shortening
- Can be applied to eigenbeams or to beams steered towards individual reflections
Acoustic rake receiver in the spherical harmonic (SH) domain [Javed2016, Javed2016a]:
- Individual beams steered towards the direct path and early reflections
- Beam outputs delayed and coherently summed
- Pre-echo null design reduces smearing of the initial onset
- More robust than multichannel equalization; requires only the DOAs and TDOAs of the early reflections
[Moore2015] Moore, Evers, and Naylor, “Multichannel equalisation for high-order spherical microphone arrays using beamformed channels”, IEEE Conf. DSP, 2015.
[Javed2016] Javed, Moore, and Naylor, “Spherical microphone array acoustic rake receivers”, ICASSP, 2016.
[Javed2016a] Javed, Moore, and Naylor, “Spherical harmonic rake receivers for dereverberation”, IWAENC, 2016.
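The rake principle (delay each per-path beam back into alignment, then sum coherently) can be sketched with synthetic per-path "beam" signals. The path delays, gains, and noise level below are hypothetical, and real beam outputs would of course contain residual reverberation rather than clean per-path copies.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8000
s = rng.standard_normal(N)                 # anechoic source surrogate
delays = [0, 12, 25]                       # direct path and two early reflections (samples)
gains = [1.0, 0.6, 0.4]                    # per-path attenuation (hypothetical)

# Each "beam" isolates one propagation path plus independent sensor noise
beams = [g * np.roll(s, d) + 0.3 * rng.standard_normal(N)
         for g, d in zip(gains, delays)]

# Rake: delay each beam back into alignment (using the known TDOAs) and sum
aligned = [np.roll(b, -d) for b, d in zip(beams, delays)]
rake = np.sum(aligned, axis=0) / sum(gains)

# Coherent summation reduces the noise relative to the best single beam
err_direct = np.mean((beams[0] - s) ** 2)
err_rake = np.mean((rake - s) ** 2)
```

The signal components add coherently while the noise adds incoherently, which is why only the DOAs and TDOAs of the early reflections are needed, rather than full channel estimates as in multichannel equalization.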

T2.7 (continued)
Linear prediction-based approach:
- Baseline results in realistic reverberation/noise conditions [Moore2017*]
- Allows many microphones to be used for improved robustness to noise, without increasing computational cost [Moore2016]
- Included in the demo (as given in Berlin)
Coherent-to-Diffuse power Ratio (CDR)-based single-channel signal enhancement:
- Estimation of the CDR from the microphone signals [Schwarz et al., 2015]
- Wiener filter based on the estimated CDR instead of the SNR
- Evaluation of CDR-based signal enhancement in the CHiME-3 challenge: 5-10% absolute reduction in the word error rate of a DNN-based ASR system [Barfuss et al., 2015]
[Moore2017*] Moore, Peso Parada, and Naylor, “Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures”, Computer Speech & Language (accepted).
[Moore2016] Moore and Naylor, “Linear prediction based dereverberation for spherical microphone arrays”, IWAENC, 2016.
[Schwarz et al., 2015]: A. Schwarz and W. Kellermann, “Coherent-to-Diffuse Power Ratio Estimation for Dereverberation”, IEEE TASLP, Apr. 2015.
[Barfuss et al., 2015]: H. Barfuss et al., “Robust coherence-based spectral enhancement for distant speech recognition”, CHiME-3 challenge, Dec. 2015.
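The CDR-driven Wiener gain can be sketched directly: replacing the SNR with the CDR in the Wiener rule attenuates time-frequency bins dominated by diffuse (reverberant) energy. The CDR values and spectral floor below are hypothetical stand-ins for the coherence-based estimates of [Schwarz et al., 2015].

```python
import numpy as np

# Hypothetical CDR estimates (coherent/direct vs. diffuse/reverberant power)
# for three time-frequency bins; in practice they are derived from the spatial
# coherence between a pair of microphones.
cdr = np.array([10.0, 1.0, 0.1])

# Wiener gain driven by the CDR instead of the SNR, with a spectral floor to
# limit musical noise
g_min = 0.1
gain = np.maximum(cdr / (cdr + 1.0), g_min)

# Applying the gain to the noisy magnitude spectrum suppresses diffuse bins
noisy_mag = np.ones(3)
enhanced_mag = gain * noisy_mag
```

A bin with CDR = 10 (strongly direct) is passed almost unchanged, a bin with CDR = 1 is halved, and a strongly diffuse bin is clamped at the floor.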

Discussion and Conclusion
- WP2 has produced significant, novel, and relevant algorithms
- Strong dissemination of the research in progress throughout
Key contributions:
- Implementation and exploitation of the robomorphic array
- A wide range of direction-of-arrival studies, including ongoing work on audio-visual fusion
- Simultaneous localization and mapping (and tracking)
- Four dereverberation approaches, including one in the demo
Focus of the final period:
- Integration for the demos
- Dissemination

Reference – Objectives
- To develop techniques of acoustic awareness for the robot, giving source localisation and tracking in real-world environments, exploiting the anthropomorphic microphone array of WP1.
- To exploit mapping techniques adapted to the application scenario and informed by audio and video signals to enable the robot to make sense of what is ‘heard’ around it. The research will include at least acoustic SLAM and Bayesian estimation techniques.
- To investigate the use of adaptive robomorphic array configurations to achieve focussing, such as using microphones on the robot's ‘hands’.
- To incorporate techniques for awareness of the dynamics of real-world acoustic scenarios by tracking the movement of sound sources, given that the robot also moves and the environment of the scenario also changes continuously with time.
- To develop spatial filtering technology to give microphone beams for microphones located on a robot, exploiting spherical harmonic representations to give full 3D steering, including compensation for robot head movements, and exploiting the spatial filtering for sound source separation and enhancement.
- To exploit advanced multichannel acoustic echo cancellation, adapted in novel ways to the robot platform, operating in combination with the microphone array sensor and spatial filtering algorithms.
- To optimize target signal extraction for speech audition in the robot, exploiting advanced signal processing including constrained BSS algorithms in combination with Wiener-type multichannel and spatial filtering using the adaptive robomorphic array.
- To research algorithms to remove reverberation from the captured signals, in order to enable the robot's speech recognition to operate in real-world reverberant environments such as defined in the application scenario.