The Fully Networked Car, Geneva, 4-5 March
Automotive Speech Enhancement of Today: Applications, Challenges and Solutions
Tim Haulick, Harman/Becker Automotive Systems

Automotive Speech Enhancement Applications

o Communication channel from vehicle (phone, speech dialog system): send side processing with beamforming, echo and noise suppression
o Communication channel to vehicle: receive side processing with gain control
o Communication channel in vehicle (in-car communication): feedback suppression, automatic gain adjustment

Extended Speech Enhancement

o Send side processing (mobile phone, speech dialog system): echo and noise suppression, wind-noise suppression, speech reconstruction, beamforming/postfilter
o Receive side processing: gain control, bandwidth extension, adaptive equalization
o Knowledge sources shown in the diagram: speaker-independent speech knowledge, speaker-(in)dependent speech knowledge

Bandwidth Extension - Examples

o Narrowband connection: bandwidth extension for narrowband speech signals (bandwidth 3.4 … 3.8 kHz), extending the low-frequency components and extending the high-frequency components up to 5.5 or 8 kHz.
o Wideband connection: bandwidth extension for wideband speech signals (bandwidth 7 kHz, e.g. AMR wideband codec, G.722.2), extending the high-frequency components up to 11 kHz.

[Audio examples: narrowband input, narrowband output; wideband input, wideband output.]

Bandwidth Extension - Enhancements

o Adaptation to different transmission channels (variable bandwidth, different signal-to-noise ratios)
o Adaptation to different sound amplifiers
o Adaptation depending on the background noise in the vehicle

Speech Reconstruction

Motivation: At medium and high driving speeds, low-frequency speech components are often masked by the background noise. Today's standard noise suppression methods often fail in these situations, and as a result the processed speech signal sounds thin and distorted. A reconstruction approach is an alternative for further improving the speech quality: speech reconstruction starts where conventional noise reduction fails …
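
To illustrate why conventional suppression leaves masked bands sounding thin, here is a minimal sketch of a textbook Wiener-type gain rule per frequency band. It is not the method used in the talk; the function names and the -12 dB spectral floor are assumptions chosen for illustration only.

```python
import math

def wiener_gain(snr_db, floor_db=-12.0):
    """Textbook Wiener-type suppression gain for one frequency band.

    snr_db   : estimated SNR of the band in dB
    floor_db : maximum attenuation (spectral floor); hypothetical value
    """
    snr = 10.0 ** (snr_db / 10.0)        # dB -> linear power ratio
    gain = snr / (1.0 + snr)             # Wiener rule: close to 1 at high SNR
    floor = 10.0 ** (floor_db / 20.0)    # limit the attenuation
    return max(gain, floor)

def gain_db(snr_db):
    """Same gain expressed in dB."""
    return 20.0 * math.log10(wiener_gain(snr_db))

# Bands with good SNR pass almost unchanged; bands where speech is masked
# by noise are pushed down to the floor, speech included.
for band_snr in (15.0, 0.0, -10.0):
    print(f"band SNR {band_snr:6.1f} dB -> gain {gain_db(band_snr):6.1f} dB")
```

In a band where the speech is buried in driving noise, such a rule attenuates speech and noise alike, which is the gap a reconstruction approach aims to fill.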

Speech Reconstruction - Algorithmic Overview

[Block diagram. Receive path: phone (downlink), receive side processing, loudspeaker. Send path: microphone, phone (uplink). Processing blocks shown: analysis filter bank, echo cancellation, residual echo and noise suppression, speech reconstruction, mixer, synthesis filter bank, speaker-(in)dependent speech knowledge.]

Speech Reconstruction - Audio Examples

[Figure: spectrograms (time in seconds, frequency in Hz) at 120 km/h and 160 km/h, comparing the microphone signal, conventional processing, and the reconstructed signal, before and after CDMA coding.]

Speech Reconstruction - Evaluation (CDMA)

o A subjective evaluation was performed by 10 trained listeners. The signals were coded and decoded with the CDMA enhanced variable rate codec (EVRC) prior to listening.
o The test set comprised 20 different test scenarios with SNRs ranging from 3 dB to 12 dB.

Beamformer/Postfilter

o Beamformers perform directional filtering: signals from the desired speaker direction are passed, while signals arriving from other directions are attenuated.
o The achievable noise reduction depends on the spatio-temporal properties of the sound field and on the number of microphones.
o By extending the beamformer with a spatial postfilter, high directionality can be achieved even with a small number of microphones.

[Block diagram: the microphone spectra feed a GSC beamformer (fixed beamformer, blocking matrix, interference canceller), followed by a spatial postfilter (|…|², noise power estimation, ratio computation, MAP approximation) that produces the output spectrum.]
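
The roles of the fixed beamformer and the blocking matrix in a GSC can be sketched for the simplest case of two time-aligned microphones. This is an illustrative time-domain toy, not the frequency-domain system from the talk; the signal frequencies, the 2-sample interferer delay, and all function names are assumptions.

```python
import math

def fixed_beamformer(x1, x2):
    """Average the two time-aligned microphone signals (delay-and-sum)."""
    return [(a + b) / 2.0 for a, b in zip(x1, x2)]

def blocking_matrix(x1, x2):
    """Subtract the microphones: the desired signal cancels, noise remains."""
    return [a - b for a, b in zip(x1, x2)]

def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)

fs = 8000.0   # sample rate in Hz (assumed)
n = 800       # number of samples
delay = 2     # interferer reaches microphone 2 two samples later (assumed)

# Desired talker: arrives at both microphones at the same time.
speech = [math.sin(2 * math.pi * 300 * k / fs) for k in range(n)]
# Interferer from another direction: different arrival time per microphone.
interf = [0.5 * math.sin(2 * math.pi * 1000 * k / fs) for k in range(n + delay)]
interf1 = interf[delay:]    # interferer as seen by microphone 1
interf2 = interf[:n]        # interferer as seen by microphone 2

x1 = [s + i for s, i in zip(speech, interf1)]
x2 = [s + i for s, i in zip(speech, interf2)]

beam = fixed_beamformer(x1, x2)      # speech kept, interferer attenuated
noise_ref = blocking_matrix(x1, x2)  # speech-free noise reference

print("interferer power at mic 1:   ", power(interf1))
print("interferer power after beam: ", power(fixed_beamformer(interf1, interf2)))
print("speech power in noise ref:   ", power(blocking_matrix(speech, speech)))
```

In a full GSC, the interference canceller would adaptively filter `noise_ref` and subtract it from `beam`; that adaptive stage and the postfilter are left out here.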

Beamformer/Postfilter - Audio Example

Audio example: 2-channel microphone array, 120 km/h, driver and passenger are talking.
o Signal of the first microphone
o 2-channel processing (beamformer/postfilter)

© 2008 Harman International Industries, Incorporated. All rights reserved.

Beamformer/Postfilter - Evaluation Results

By extending the adaptive beamformer with a spatial postfilter, the voice recognition performance can be improved significantly without impairing the speech quality.

Reference system: 2-channel adaptive beamformer (GSC) without postfilter.

[Figure: two bar charts over the conditions 120 km/h, 100 km/h with window 1/4 open, and 120 km/h double-talk. Voice recognition performance: relative enhancement of WER [%]. Speech distortion: log-spectral distance for the adaptive beamformer with postfilter and for the adaptive beamformer alone.]
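
For readers unfamiliar with the speech distortion measure, the following is a minimal sketch of one common definition of the log-spectral distance: the RMS difference between two log power spectra in dB. Several variants exist (magnitude versus power spectra, different averaging over frames), and the slides do not state which one was used, so treat this form and the function name as assumptions.

```python
import math

def log_spectral_distance(ref_power, test_power):
    """RMS difference of two log power spectra, in dB.

    ref_power, test_power: power spectra of one frame, same length,
    all values strictly positive.
    """
    if len(ref_power) != len(test_power):
        raise ValueError("spectra must have the same number of bins")
    diffs_db = [10.0 * math.log10(r / t) for r, t in zip(ref_power, test_power)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))

reference = [1.0, 0.5, 0.25, 0.125]
identical = list(reference)
attenuated = [0.1 * p for p in reference]   # every bin 10 dB lower

print(log_spectral_distance(reference, identical))
print(log_spectral_distance(reference, attenuated))
```

A value of zero means the processed spectrum matches the reference exactly; larger values indicate more spectral distortion.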

Wind Noise Suppression

o Problem: For design reasons and because of limited space, the standard wind shield of hands-free microphones is often insufficient. As a result, the microphone signal is often impaired by wind noise caused by the fan or by the open top of a convertible.
o Solution: Suppression of wind noise by algorithmic means that take advantage of the statistical properties of the noise.

[Figure: microphone signal, 2-channel processed output signal.]

In-Car Communication (ICC)

Current situation:
o Communication between passengers is difficult because of the acoustic loss (especially front to back).
o Front passengers have to speak louder than normal, so longer conversations become tiring.
o The driver turns around, which reduces road safety.

Solution:
o Improve the speech quality and intelligibility by means of an intercom system.

Application:
o Mid- and high-class automobiles that are already equipped with the necessary audio and signal processing components.
o Minibuses, vans, etc. (vehicles with more than 2 rows of seats).

[Figure: passenger compartment. Acoustic loss (referred to the ear of the driver): -5 … -15 dB.]

Configurations

One-way system:
o 2-4 microphones
o 2-4 loudspeakers

Two-way system:
o 4-8 microphones
o 6-8 loudspeakers

Subjective Evaluation

Driving scenarios:
o 0 km/h beside motorway
o 130 km/h on motorway

Setup:
o Prerecorded speech sentences with different Lombard levels were played back via an artificial mouth.
o Binaural recordings were made by means of a HEAD acoustics NoiseBook on the seat behind the driver.

Results of the Comparison Mean Opinion Score Test
(25 signal pairs for each driving situation / 15 listeners per scenario)

0 km/h, vehicle parked close to a motorway:
o 19.7 % prefer the system to be switched off
o 29.7 % have no preference
o 50.6 % prefer an activated system

130 km/h, motorway:
o 4.3 % prefer the system to be switched off
o 7.1 % have no preference
o 88.6 % prefer an activated system

Results of Modified Rhyme Tests (MRT)
(48 utterances were presented to each listener per driving situation)

0 km/h, vehicle parked close to a motorway:
o No significant difference (95.2 % system off versus 95.0 % system on)
o Due to the automatic gain adjustment, the intercom system operates with only very small gain at these noise levels

130 km/h, motorway:
o Significant improvement of the MRT error rate
o Nearly 50 % error reduction (85.4 % correct answers increased to 92.2 % correct answers)
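
The "nearly 50 %" figure follows from the error rates rather than from the correct-answer rates. A short worked calculation using the two percentages reported for 130 km/h (the function name is only illustrative):

```python
def relative_error_reduction(correct_off_pct, correct_on_pct):
    """Relative reduction of the error rate, in percent."""
    error_off = 100.0 - correct_off_pct   # error rate with the system off
    error_on = 100.0 - correct_on_pct     # error rate with the system on
    return 100.0 * (error_off - error_on) / error_off

# 130 km/h: 85.4 % correct (system off) versus 92.2 % correct (system on)
reduction = relative_error_reduction(85.4, 92.2)
print(f"relative error reduction: {reduction:.1f} %")
```

The error rate drops from 14.6 % to 7.8 %, a relative reduction of about 46.6 %.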