1
WP 2: Acoustic Scene Analysis
Patrick A. Naylor
Project Meeting, Erlangen, 30 November 2016
2
Introduction – Task List
T2.1 Acoustic source localization and environment mapping
T2.2 Focusing by adaptive robomorphic arrays
T2.3 Acoustic source and environment tracking
T2.4 Spatial filtering
T2.5 Acoustic echo cancellation
T2.6 Multichannel noise reduction and interference suppression
T2.7 Robust dereverberation for robot audition and interaction
3
T2.1 – Mapping
Aim: Mapping using bearing-only sensors (DOA) for a moving microphone array on the robot platform [1]
Challenges:
- Reported and actual robot positions diverge
- Moving sources; missing source-sensor range
- Reverberation, missing detections, localization error
Achievements:
- Novel, generalized approach to SLAM [5]
- Novel approach specific to acoustic SLAM, robust to dominant early reflections and to moving sound sources (e.g., human talkers) [2-4,6]
- Separate demo video will be prepared for the Review Meeting
- C. Evers invited to give a keynote speech on Bayesian inference and acoustic scene mapping for robot audition at HSCMA 2017
References:
- C. Evers, A. H. Moore, and P. A. Naylor, "a-SLAM of a Moving Microphone Array and its Surrounding Speakers", ICASSP 2016.
- C. Evers, A. H. Moore, and P. A. Naylor, "Towards Informative Path Planning for Acoustic Simultaneous Localization of Microphone Arrays and Mapping of Surrounding Sound Sources", DAGA 2016.
- C. Evers, A. H. Moore, and P. A. Naylor, "Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition", EUSIPCO 2016.
- C. Evers and P. A. Naylor, "Generalized Dynamic Scene Mapping", to be submitted to IEEE Trans. Signal Process.
- C. Evers and P. A. Naylor, "Acoustic SLAM", to be submitted to IEEE Trans. Audio, Speech, Lang. Process.
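The range observability idea above can be made concrete with a small sketch. The following is a plain least-squares triangulation of a static source from DOAs collected along the array's trajectory; it is not the Bayesian a-SLAM filter of the papers above (no robot-pose uncertainty, no moving sources), and all positions and bearings are simulated.

```python
# Least-squares intersection of bearing lines: x minimizes
# sum_i || (I - u_i u_i^T) (x - p_i) ||^2, i.e. the squared perpendicular
# distances from x to each measured bearing line.
import numpy as np

def triangulate_bearings(positions, bearings):
    """positions: (N, 2) array positions; bearings: (N, 2) unit DOA vectors."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, u in zip(positions, bearings):
        P = np.eye(2) - np.outer(u, u)   # projector onto the line's normal space
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

# Simulated example: static source at (3, 4); array moves along the x-axis,
# so bearings observed from different positions make the range observable.
source = np.array([3.0, 4.0])
positions = np.stack([[x, 0.0] for x in np.linspace(0.0, 2.0, 5)])
bearings = np.stack([(source - p) / np.linalg.norm(source - p) for p in positions])
print(triangulate_bearings(positions, bearings))   # ~ [3. 4.]
```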
4
T2.1 – Sound Source Localization
Spherical microphone arrays (head):
- Developed a novel extension of the pseudo-intensity vector (PIV) method that uses the signal subspace (SSPIV) to improve robustness to noise and reverberation
  - EUSIPCO 2015 conference paper with BGU [Moore2015a]
  - IEEE TASLP paper accepted
- Parallel work on extending PIVs to higher-order spherical harmonics to improve the initial estimate obtained from the PIV (augmented intensity vector, AIV)
  - ICASSP 2016 conference paper [Hafezi2016]
  - Journal article in preparation
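As a point of reference for SSPIV and AIV, the sketch below shows the baseline PIV principle they build on: the DOA follows from time-averaging the product of the zeroth-order eigenbeam (sound pressure) with the three first-order eigenbeams (proportional to the particle-velocity components). The eigenbeam signals here are simulated; in practice they come from a spherical harmonic transform of the head-array channels.

```python
# DOA from the time-averaged acoustic intensity: p00 plays the role of the
# pressure signal, p1x/p1y/p1z of the particle-velocity components.
import numpy as np

def piv_doa(p00, p1x, p1y, p1z):
    """Unit DOA vector from zeroth- and first-order eigenbeam time series."""
    v = np.array([np.mean(np.real(np.conj(p00) * p)) for p in (p1x, p1y, p1z)])
    return v / np.linalg.norm(v)

# Simulated plane wave from direction d, with additive sensor noise on each
# eigenbeam channel (the regime where SSPIV improves on plain PIV).
rng = np.random.default_rng(0)
d = np.array([0.6, 0.64, 0.48])                  # true DOA (unit norm)
s = rng.standard_normal(4096)                    # source waveform
noise = lambda: 0.05 * rng.standard_normal(4096)
p00 = s + noise()
p1x, p1y, p1z = (d[i] * s + noise() for i in range(3))
print(piv_doa(p00, p1x, p1y, p1z))               # ~ [0.6, 0.64, 0.48]
```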
5
T2.2: Focusing by Adaptive Robomorphic Arrays (I)
Aim: Increased attenuation of competing speakers and background noise
Minimum Mutual Information (MMI)-based signal extraction [Reindl et al., 2014]:
- Realization: Generalized Sidelobe Canceller (GSC) structure; uses geometrically constrained BSS to realize the blocking matrix
- Advantage: The robomorphic array can be used to increase the target-signal suppression of the blocking matrix, and thereby the signal extraction performance of the GSC
Experiments with measured impulse responses:
- GSC with the robomorphic array yields up to 3 dB more signal enhancement than GSC with the head array
- Dependent on scenario and array configuration
[Reindl et al., 2014]: K. Reindl et al., "Minimum mutual information-based linearly constrained broadband signal extraction", IEEE TASLP, June 2014.
6
T2.2: Focusing by Adaptive Robomorphic Arrays (II)
Ongoing work: Implementation of the MMI-based GSC for real-time processing
- Separate demo video will be prepared for the Review Meeting
- Demonstrator highly parallelized on a Graphics Processing Unit (GPU), containing the following features:
  - Geometrically constrained BSS unit as blocking matrix (BM)
  - Interference canceller (IC) realized as frequency-domain NLMS
  - Fixed delay-and-sum beamformer
  - Joint adaptation control between BM and IC
- Algorithmic extensions to support arbitrary microphone constellations of the robomorphic array
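A minimal time-domain sketch of the GSC signal flow listed above, under simplifying assumptions: the target is already time-aligned across channels (so the fixed beamformer is a plain average), the blocking matrix is a pairwise-difference matrix rather than the geometrically constrained BSS unit, and the interference canceller is a time-domain NLMS instead of the frequency-domain implementation.

```python
# GSC: fixed beamformer output d minus an adaptive estimate of the residual
# interference, reconstructed from the blocking-matrix outputs b via NLMS.
import numpy as np

def gsc(x, mu=0.1, L=16, eps=1e-6):
    """x: (M, T) time-aligned microphone signals. Returns enhanced (T,)."""
    M, T = x.shape
    d = x.mean(axis=0)                   # fixed delay-and-sum beamformer
    b = x[:-1] - x[1:]                   # blocking matrix: pairwise differences
    w = np.zeros((M - 1, L))             # interference-canceller filters
    y = np.zeros(T)
    for t in range(L, T):
        u = b[:, t - L:t][:, ::-1]       # (M-1, L) recent reference samples
        y[t] = d[t] - np.sum(w * u)      # beamformer minus cancelled leakage
        w += mu * y[t] * u / (np.sum(u * u) + eps)   # NLMS update
    return y

# Toy scene: target identical on all mics (blocked by b); interferer arrives
# with different delays per mic (leaks into b, gets cancelled from d).
rng = np.random.default_rng(1)
T = 8000
target = np.sin(2 * np.pi * 0.01 * np.arange(T))
interf = rng.standard_normal(T)
x = np.stack([target + np.roll(interf, k) for k in range(4)])
print(np.mean((gsc(x)[T // 2:] - target[T // 2:]) ** 2))  # residual after convergence
```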
7
T2.3 – Tracking
Aim: Create a dynamic map of the surrounding environment with moving sources
Challenges:
- Missing source-sensor range
- Bottlenecked by localization performance
- Broadband nature of speech: frequency-dependent DOAs
Achievements:
- Exploit the spatial diversity of the robot to infer 3D position from 2D DOAs [1-3]
- Directly use raw audio data for track-before-detect [4]
- Multi-detection tracker using DOAs in multiple frequency bins [5]
- Audio-visual fusion for improved performance [6]
References:
[1] C. Evers, J. Sheaffer, A. H. Moore, B. Rafaely, and P. A. Naylor, "Bearing-only Acoustic Tracking of Moving Speakers for Robot Audition", DSP 2015.
[2] Y. Dorfan, C. Evers, S. Gannot, and P. A. Naylor, "Speaker Localization with Moving Microphone Arrays", EUSIPCO 2016.
[3] C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor, "Source Tracking using Moving Microphone Arrays for Robot Audition", submitted to ICASSP 2017.
[4] C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor, "Bayesian Acoustic Track-before-Detect", in preparation for IEEE Trans. ASLP.
[5] C. Evers, B. Rafaely, and P. A. Naylor, "Multi-detection Acoustic Tracking", in preparation for HSCMA 2017.
[6] I. D. Gebru, C. Evers, R. Horaud, and P. A. Naylor, title TBD, in preparation for HSCMA 2017.
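For illustration, a minimal bootstrap particle filter for bearing-only tracking in 2-D with a single static sensor. The papers above go well beyond this sketch (3-D, moving arrays, missed detections, multi-detection across frequency bins, track-before-detect); this only shows the predict/update/resample cycle on azimuth measurements.

```python
# Bootstrap particle filter: constant-velocity prediction, wrapped-Gaussian
# azimuth likelihood, multinomial resampling on low effective sample size.
import numpy as np

rng = np.random.default_rng(2)
N, dt, q, sigma = 2000, 0.1, 0.05, np.deg2rad(5)
parts = rng.uniform([-5, -5, -1, -1], [5, 5, 1, 1], size=(N, 4))  # [x, y, vx, vy]
weights = np.full(N, 1.0 / N)

def step(measured_az):
    global parts, weights
    parts[:, :2] += dt * parts[:, 2:]                     # predict positions
    parts[:, 2:] += q * rng.standard_normal((N, 2))       # perturb velocities
    pred_az = np.arctan2(parts[:, 1], parts[:, 0])
    err = np.angle(np.exp(1j * (measured_az - pred_az)))  # wrapped angle error
    weights *= np.exp(-0.5 * (err / sigma) ** 2)          # likelihood update
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < N / 2:                # resample if degenerate
        idx = rng.choice(N, size=N, p=weights)
        parts = parts[idx].copy()
        weights = np.full(N, 1.0 / N)
    return weights @ parts                                # weighted mean state

# Source drifting in azimuth; feed noisy azimuth "detections" to the filter.
for k in range(100):
    est = step(0.5 + 0.01 * k + sigma * rng.standard_normal())
print(np.arctan2(est[1], est[0]), 0.5 + 0.01 * 99)  # estimated vs true azimuth
```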
8
T2.4: Spatial Filtering
Aim: Attenuation of competing speakers and background noise
Robust HRTF-based polynomial beamformer:
- Extension of the robust HRTF-based beamformer to the concept of polynomial beamforming
- Advantage: Flexible steering of the beamformer's main beam
- Experiments: The HRTF-based polynomial beamformer provides a good approximation of the non-polynomial beamformer [Barfuss et al., 2016]
Two-dimensional HRTF-based beamformer design:
- Extension of the robust HRTF-based beamformer to two dimensions
- Advantage: Control of the beamformer's behavior over the entire sound field
- Experiments: Consistent improvement in signal enhancement performance compared to the previous (one-dimensional) design (submitted to HSCMA 2017)
- Implemented in the prototype system
[Barfuss et al., 2016]: H. Barfuss et al., "HRTF-based robust least-squares frequency-invariant polynomial beamforming", IWAENC, Sep. 2016.
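The flexible-steering idea behind polynomial beamforming can be sketched generically: fit a polynomial in the steering parameter to a small set of precomputed weight vectors, then obtain weights for any steering angle by evaluating the polynomial. The prototype designs below are plain delay-and-sum placeholders, not the HRTF-based robust least-squares frequency-invariant designs of [Barfuss et al., 2016].

```python
# Fit matrix polynomial coefficients C so that w(a) ~ sum_p C[p] * sin(a)**p,
# using a set of prototype designs; steering then only evaluates the polynomial.
import numpy as np

M, P = 4, 6                                        # microphones, polynomial order
proto_angles = np.deg2rad(np.arange(-30, 31, 10))  # 7 prototype steering angles
m = np.arange(M)

# Placeholder prototype designs: delay-and-sum weights at half-wavelength
# spacing (stand-ins for HRTF-based robust LS frequency-invariant designs).
protos = np.stack([np.exp(-1j * np.pi * m * np.sin(a)) / M for a in proto_angles])

V = np.vander(np.sin(proto_angles), P + 1, increasing=True)  # (7, 7)
C = np.linalg.solve(V, protos)                               # (P+1, M) coefficients

def steer(angle):
    """Beamformer weights for an arbitrary angle via polynomial evaluation."""
    return np.vander([np.sin(angle)], P + 1, increasing=True)[0] @ C

print(np.max(np.abs(steer(proto_angles[3]) - protos[3])))  # ~0: exact at prototypes
a = np.deg2rad(12)                                         # intermediate angle
direct = np.exp(-1j * np.pi * m * np.sin(a)) / M
print(np.max(np.abs(steer(a) - direct)))                   # small approximation error
```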
9
T2.5: Acoustic Echo Control
Aim: Suppression of acoustic feedback to allow for barge-in
Combination of adaptive beamforming and echo cancellation:
- GSC structure with echo canceller and interference canceller in parallel
- Evaluation on NAO with 4 head microphones showed improved noise suppression and WER compared to fixed beamforming [El-Rayyes et al., 2016]
Current work:
- AEC implementation for the prototype system
- Demo video with AEC (at least for the Review Meeting)
[El-Rayyes et al., 2016]: A. El-Rayyes, H. Löllmann, and W. Kellermann, "Acoustic Echo Control for Humanoid Robots", DAGA, March 2016, Aachen.
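A minimal sketch of the echo cancellation core on its own: a mono NLMS filter identifying the loudspeaker-to-microphone echo path. The combination with the GSC and the parallel interference canceller described above is not modelled; the echo path and signals are synthetic.

```python
# Mono NLMS echo canceller: adaptively identify the loudspeaker-to-microphone
# echo path from the far-end signal and subtract the estimated echo.
import numpy as np

def nlms_aec(far_end, mic, L=128, mu=0.5, eps=1e-8):
    """Return the echo-cancelled error signal (near-end estimate)."""
    h = np.zeros(L)                       # adaptive echo-path estimate
    e = np.zeros(len(mic))
    for t in range(L, len(mic)):
        u = far_end[t - L:t][::-1]        # most recent far-end samples
        e[t] = mic[t] - h @ u             # microphone minus estimated echo
        h += mu * e[t] * u / (u @ u + eps)
    return e

# Toy example: the microphone picks up a filtered far-end signal (the echo)
# plus a quiet near-end talker; after convergence mostly the talker remains.
rng = np.random.default_rng(3)
far = rng.standard_normal(20000)
path = np.exp(-0.1 * np.arange(64)) * rng.standard_normal(64)  # synthetic echo path
mic = np.convolve(far, path)[:20000] + 0.1 * rng.standard_normal(20000)
res = nlms_aec(far, mic)
print(np.var(mic[-4000:]), np.var(res[-4000:]))   # echo power before/after AEC
```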
10
T2.6: Multichannel Noise Reduction and Interference Suppression
Aim: Suppression of NAO's actuator ego-noise
- Continued work on the phase-optimized multichannel dictionary approach (PO-KSVD) [Deleforge et al., 2015]
- Fusion of motor data into PO-KSVD [Schmidt et al., 2016]
- Replacing the iterative search in a pre-trained dictionary with one-shot classification using Support Vector Machines (SVMs)
  - SVMs are driven entirely by motor data
- Results:
  - Reduced computation time
  - Improved suppression performance for microphone geometries (i.e., varying head positions) not seen during training
[Deleforge et al., 2015]: A. Deleforge and W. Kellermann, "Phase-optimized K-SVD for signal extraction from underdetermined multichannel sparse mixtures", IEEE ICASSP, Apr. 2015.
[Schmidt et al., 2016]: A. Schmidt, A. Deleforge, and W. Kellermann, "Ego-Noise Reduction Using a Motor Data-Guided Multichannel Dictionary", IEEE/RSJ IROS, Oct. 2016.
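A minimal sketch of the one-shot classification step, assuming scikit-learn: an SVM maps motor features directly to the index of a pre-trained ego-noise dictionary entry, replacing the iterative atom search. The motor features and labels below are synthetic, and the PO-KSVD dictionary training itself is not shown.

```python
# SVM-based atom selection: current motor state -> dictionary index in one shot.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n_atoms, n_train = 8, 400
# Synthetic motor features (e.g. joint positions/velocities), clustered by
# the movement that produced each ego-noise dictionary atom during training.
centers = rng.standard_normal((n_atoms, 6))
labels = rng.integers(0, n_atoms, n_train)
motor_feats = centers[labels] + 0.2 * rng.standard_normal((n_train, 6))

clf = SVC(kernel="rbf").fit(motor_feats, labels)   # motor-data-driven classifier

# At run time, classification replaces the iterative dictionary search.
query = centers[3] + 0.2 * rng.standard_normal(6)
print(clf.predict(query[None, :]))                 # -> [3], the expected atom
```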
11
T2.7: Robust Dereverberation for Robot Audition and Interaction
Aim: Attenuation of reverberation (and background noise)
Multichannel equalization of beamformed channels [Moore2015]:
- Spatial pre-processing of the channel estimates improves channel diversity, leading to improved channel shortening
- Can be applied to eigenbeams or to beams steered towards individual reflections
Acoustic rake receiver in the SH domain [Javed2016, Javed2016a]:
- Individual beams steered towards the direct path and early reflections
- Beam outputs delayed and coherently summed
- Pre-echo null design reduces smearing of the initial onset
- More robust than multichannel equalization; requires only the DOAs and TDOAs of the early reflections
[Moore2015]: Moore, Evers, and Naylor, "Multichannel equalisation for high-order spherical microphone arrays using beamformed channels", IEEE Conf. DSP, 2015.
[Javed2016]: Javed, Moore, and Naylor, "Spherical microphone array acoustic rake receivers", ICASSP, 2016.
[Javed2016a]: Javed, Moore, and Naylor, "Spherical harmonic rake receivers for dereverberation", IWAENC, 2016.
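A minimal sketch of the rake combining step: beam outputs aimed at the direct path and at early reflections are advanced by their TDOAs, weighted, and coherently summed. The SH-domain beamforming and the pre-echo null design are not modelled; each beam output is simulated as a delayed, attenuated copy of the source plus noise.

```python
# Rake combining: advance each beam output by its path delay (TDOA), weight,
# and sum coherently so the reflected energy reinforces the direct path.
import numpy as np

def rake_combine(beams, tdoas, gains):
    T = len(beams[0])
    out = np.zeros(T)
    for y, tau, g in zip(beams, tdoas, gains):
        out[:T - tau] += g * y[tau:]       # time-align, weight, accumulate
    return out / np.sum(np.square(gains))  # normalise the coherent gain

rng = np.random.default_rng(5)
s = rng.standard_normal(4000)              # anechoic source signal
tdoas = [0, 23, 41]                        # samples: direct path + 2 reflections
gains = [1.0, 0.6, 0.4]
# Each beam output: delayed, attenuated source copy plus sensor noise.
beams = [g * np.concatenate([np.zeros(t), s[:4000 - t]])
         + 0.1 * rng.standard_normal(4000)
         for t, g in zip(tdoas, gains)]

direct_only = beams[0]
raked = rake_combine(beams, tdoas, gains)
print(np.mean((direct_only - s) ** 2), np.mean((raked - s) ** 2))  # raking lowers error
```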
12
Linear prediction-based approach
- Baseline results in realistic reverberation/noise conditions [Moore2017*]
- Allows many microphones to be used for improved robustness to noise, without increasing computational cost [Moore2016]
- Included in the demo (as given in Berlin)
Coherent-to-Diffuse Power Ratio (CDR)-based single-channel signal enhancement:
- Estimation of the CDR from the microphone signals [Schwarz et al., 2015]
- Wiener filter based on the estimated CDR instead of the SNR
- Evaluation of CDR-based signal enhancement in the CHiME-3 challenge: 5-10% absolute reduction in word error rate for a DNN-based ASR system [Barfuss et al., 2015]
[Moore2017*]: Moore, Peso Parada, and Naylor, "Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures", Computer Speech & Language (accepted).
[Moore2016]: Moore and Naylor, "Linear prediction based dereverberation for spherical microphone arrays", IWAENC, 2016.
[Schwarz et al., 2015]: A. Schwarz and W. Kellermann, "Coherent-to-Diffuse Power Ratio Estimation for Dereverberation", IEEE TASLP, Apr. 2015.
[Barfuss et al., 2015]: H. Barfuss et al., "Robust coherence-based spectral enhancement for distant speech recognition", CHiME-3 challenge, Dec. 2015.
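A minimal sketch of the CDR-based Wiener step: given a per-bin CDR estimate, the gain is G = CDR / (CDR + 1), i.e. the usual Wiener rule with diffuse (reverberant) power in the role of noise. The actual estimators in [Schwarz et al., 2015] derive the CDR from the short-time coherence between two microphones; here an oracle CDR is used.

```python
# Wiener gain from the coherent-to-diffuse power ratio: G = CDR / (CDR + 1).
import numpy as np

def cdr_wiener_gain(cdr, g_min=0.1):
    """Per-bin Wiener gain from CDR, floored to limit musical noise."""
    return np.maximum(cdr / (cdr + 1.0), g_min)

# Oracle example for a few STFT bins: coherent (direct) power S and diffuse
# (reverberant) power D; the observed power is their sum.
S = np.array([4.0, 1.0, 0.25, 0.0])   # per-bin coherent power
D = np.ones(4)                        # per-bin diffuse power
cdr = S / D
X_mag = np.sqrt(S + D)                # observed magnitude per bin
print(cdr_wiener_gain(cdr) * X_mag)   # enhanced magnitudes
```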
13
Discussion and Conclusion
- WP2 has produced significant, novel, and relevant algorithms
- Strong dissemination of the research in progress throughout
- Key contributions:
  - Implementation and exploitation of the robomorphic array
  - Wide range of direction-of-arrival studies, including ongoing work on audio-visual fusion
  - Simultaneous localization and mapping (and tracking)
  - Four dereverberation approaches, including one in the demo
- Focus of the final period has been:
  - Integration for the demos
  - Dissemination
14
Reference - Objectives
- To develop techniques of acoustic awareness for the robot giving source localisation and tracking in real-world environments, exploiting the anthropomorphic microphone array of WP1.
- To exploit mapping techniques adapted to the application scenario and informed by audio and video signals to enable the robot to make sense of what is 'heard' around it. The research will include at least acoustic SLAM and Bayesian estimation techniques.
- To investigate the use of adaptive robomorphic array configurations to achieve focussing, such as using microphones on the robot's 'hands'.
- To incorporate techniques for awareness of the dynamics of real-world acoustic scenarios by tracking the movement of sound sources, given that the robot also moves and the environment of the scenario also changes continuously over time.
- To develop spatial filtering technology to give microphone beams for the case of microphones located on a robot, exploiting spherical harmonic representations to give full 3D steering including compensation for robot head movements, and exploiting the spatial filtering for sound source separation and enhancement.
- To exploit advanced multichannel acoustic echo cancellation, newly adapted to the robot platform, operating in combination with the microphone array sensor and spatial filtering algorithms.
- To optimize target signal extraction for speech audition in the robot, exploiting advanced signal processing including constrained BSS algorithms in combination with Wiener-type multichannel and spatial filtering using the adaptive robomorphic array.
- To research algorithms to remove reverberation from the captured signals, in order to enable the robot's speech recognition to operate in real-world reverberant environments such as defined in the application scenario.