1 Introduction to MPEG Surround 韓志岡 2/9/2005
2 Outline
Background
– Motivation
– Perception of sound in space
Principle of MPEG Surround
– Downmixing to one channel
– Estimation of spatial cues
– Synthesis of spatial cues
Conclusions & Reference
3 Motivation
– The vast majority of audio playback equipment uses traditional two-channel (stereo) presentation.
– Equipment with more reproduction channels ("multi-channel audio" or "surround sound") is increasingly visible in the marketplace.
– A non-disruptive transition from stereo to multi-channel audio requires media formats that serve both listeners with conventional stereo equipment and those with next-generation multi-channel equipment.
4 Perception of sound in space
HRTF (head-related transfer function): models the path of sound from a source to the entrances of the left and right ears.
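An HRTF has a time-domain counterpart, the head-related impulse response (HRIR), and the signal at each ear entrance can be modeled as the source signal convolved with that ear's HRIR. The sketch below uses hypothetical toy HRIRs that contain only a delay and a gain (measured HRIRs are longer and frequency dependent), so the resulting interaural differences are easy to read off the output. The function names are illustrative.

```python
def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_binaural(source, hrir_left, hrir_right):
    """Return (left, right) ear-entrance signals for one HRIR pair."""
    return convolve(source, hrir_left), convolve(source, hrir_right)

# Hypothetical toy HRIRs for a source on the listener's left:
# the right ear receives the sound 3 samples later and about 6 dB weaker.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

source = [1.0, -0.5, 0.25]
left, right = render_binaural(source, hrir_left, hrir_right)
print(left)   # [1.0, -0.5, 0.25, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.0, 0.5, -0.25, 0.125]
```

In this toy case the right-ear signal lags by 3 samples (an ITD) and has half the amplitude (an ILD of about 6 dB).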
5 Perception of sound in space (cont.)
Three parameters (cues) describe how humans localize sound in the horizontal plane:
– Interaural level difference (ILD)
– Interaural time difference (ITD)
– Interaural coherence (IC)
6 ITD (Interaural time difference) & ILD (Interaural level difference)
7 ITD (Interaural time difference) & ILD (Interaural level difference) (cont.)
The ITD and ILD between a pair of headphone signals determine the lateral position of the auditory event, which appears in the frontal section of the upper head.
8 IC (Interaural coherence)
The spatial impression of the auditory event (e.g., its perceived width) is related to the IC.
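9 Two sound sources: summing localization
When two loudspeakers emit related signals, the listener perceives a single auditory event whose position and extent depend on the inter-channel cues:
– Inter-channel time difference (ICTD)
– Inter-channel level difference (ICLD)
– Inter-channel coherence (ICC)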
10 Two sound source: Summing localization (cont.)
11 MPEG Surround
MPEG Surround captures the spatial image of a multi-channel audio signal with inter-channel differences in level, phase, and coherence, the counterparts of the ILD, ITD, and IC cues.
– The multi-channel signal is downmixed, and the cues are encoded in a very compact form.
– The decoder uses the transmitted downmix and the cues to synthesize a high-quality multi-channel representation.
– The downmix provides backward compatibility with stereo/mono audio systems.
12 Coding Scheme
13 Downmixing to one channel (1/2)
– The sum signal is generated by adding the input channels in a subband domain.
– The sum is multiplied by a factor that preserves signal power.
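A minimal sketch of the downmix for one subband and one block of samples. The choice of factor is an assumption consistent with the goal of preserving power: e = sqrt(sum of input channel powers / power of the plain sum), which makes the downmix power equal to the total input power. Function names are illustrative.

```python
import math

def mean_power(x):
    """Short-time power estimate: mean of squared samples over the block."""
    return sum(v * v for v in x) / len(x)

def downmix_subband(channels):
    """Power-preserving mono downmix of one subband block.

    channels: C equal-length lists of subband samples for the same block.
    Returns e * (x_1 + ... + x_C), with e = sqrt(sum of channel powers /
    power of the plain sum), so the result carries the total input power.
    """
    summed = [sum(samples) for samples in zip(*channels)]
    p_sum = mean_power(summed)
    if p_sum == 0.0:
        return summed  # silence or total cancellation: no factor can help
    p_inputs = sum(mean_power(ch) for ch in channels)
    e = math.sqrt(p_inputs / p_sum)
    return [e * v for v in summed]

# Two channels that never overlap: the plain sum already has the right power.
print(downmix_subband([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]))  # [1.0, 1.0, 1.0, 1.0]
```

Without the factor, two equal in-phase channels would sum to twice the total input power, while partially out-of-phase channels would lose power; the factor compensates both. A practical implementation may also cap e when the channels nearly cancel, to avoid large gains.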
14 Downmixing to one channel (2/2)
15 Estimation of spatial cues (1/4)
The spatial cues ICTD, ICLD, and ICC are estimated in a subband domain, independently for each subband.
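16 Estimation of spatial cues (2/4)
ICTD (samples):
$\tau_{12}(k) = \arg\max_{d} \Phi_{12}(d,k)$
with a short-time estimate of the normalized cross-correlation function
$\Phi_{12}(d,k) = \dfrac{p_{\tilde{x}_1\tilde{x}_2}(d,k)}{\sqrt{p_{\tilde{x}_1}(k-d_1)\,p_{\tilde{x}_2}(k-d_2)}}$
where $d_1 = \max\{-d, 0\}$, $d_2 = \max\{d, 0\}$, and $p_{\tilde{x}_1\tilde{x}_2}(d,k)$ is a short-time estimate of the mean of $\tilde{x}_1(k-d_1)\,\tilde{x}_2(k-d_2)$. Here $\tilde{x}_c(k)$ is a subband signal of channel $c$ and $p_{\tilde{x}_c}(k)$ is its short-time power.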
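The ICTD estimator can be sketched directly from the cross-correlation formula for one subband and one block of samples. Assumptions: subband signals are plain lists of real samples, all short-time means are rectangular averages over the same block (other windows are possible), and the search range `max_lag` is a free parameter. Function names are illustrative, not taken from the standard.

```python
import math
import random

def normalized_xcorr(x1, x2, d, max_lag):
    """Short-time normalized cross-correlation Phi_12(d) over one block.

    d1 = max(-d, 0) and d2 = max(d, 0); the mean of x1[k-d1] * x2[k-d2]
    is divided by the square root of the powers of the two shifted signals.
    All three means use the same k range (rectangular window, assumed here).
    """
    d1, d2 = max(-d, 0), max(d, 0)
    ks = range(max_lag, len(x1))  # k - d1 and k - d2 never go below 0
    n = len(ks)
    p12 = sum(x1[k - d1] * x2[k - d2] for k in ks) / n
    p1 = sum(x1[k - d1] * x1[k - d1] for k in ks) / n
    p2 = sum(x2[k - d2] * x2[k - d2] for k in ks) / n
    if p1 == 0.0 or p2 == 0.0:
        return 0.0
    return p12 / math.sqrt(p1 * p2)

def estimate_ictd(x1, x2, max_lag):
    """ICTD in samples: the lag d in [-max_lag, max_lag] maximizing Phi_12(d)."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda d: normalized_xcorr(x1, x2, d, max_lag))

# Usage: x2 is x1 delayed by 3 samples.
rng = random.Random(1)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
lagged = [0.0] * 3 + ref[:-3]
print(estimate_ictd(ref, lagged, max_lag=8))  # -3
```

With this definition of d1 and d2, a delayed x2 yields a negative ICTD: at d = -3 the product x1[k-3] * x2[k] pairs identical samples, so Phi_12 reaches 1.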
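17 Estimation of spatial cues (3/4)
ICLD (dB):
$\Delta L_{12}(k) = 10 \log_{10} \dfrac{p_{\tilde{x}_2}(k)}{p_{\tilde{x}_1}(k)}$
ICC:
$c_{12}(k) = \max_{d} \left| \Phi_{12}(d,k) \right|$
where $p_{\tilde{x}_c}(k)$ is the short-time power of the subband signal $\tilde{x}_c(k)$ and $\Phi_{12}(d,k)$ is the normalized cross-correlation function used for the ICTD.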
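The ICLD and ICC follow from the same kind of short-time estimates. A minimal sketch for one subband block, under the same assumptions as a rectangular-window estimator (plain lists of real samples, equal lengths, a free `max_lag`); function names are illustrative.

```python
import math
import random

def mean_power(x):
    """Short-time power estimate: mean of squared samples over the block."""
    return sum(v * v for v in x) / len(x)

def estimate_icld(x1, x2):
    """ICLD in dB: 10 * log10(p_x2 / p_x1)."""
    return 10.0 * math.log10(mean_power(x2) / mean_power(x1))

def estimate_icc(x1, x2, max_lag):
    """ICC: largest |Phi_12(d)| over lags d in [-max_lag, max_lag]."""
    ks = range(max_lag, len(x1))
    best = 0.0
    for d in range(-max_lag, max_lag + 1):
        d1, d2 = max(-d, 0), max(d, 0)
        a = [x1[k - d1] for k in ks]
        b = [x2[k - d2] for k in ks]
        denom = math.sqrt(mean_power(a) * mean_power(b))
        if denom > 0.0:
            p12 = sum(u * v for u, v in zip(a, b)) / len(a)
            best = max(best, abs(p12 / denom))
    return best

rng = random.Random(7)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
half = [0.5 * v for v in ref]          # same waveform, about 6 dB quieter
shifted = [0.0, 0.0] + half[:-2]        # ... and also 2 samples later
other = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # independent noise

print(round(estimate_icld(ref, half), 2))               # -6.02
print(round(estimate_icc(ref, shifted, max_lag=8), 2))  # 1.0
print(estimate_icc(ref, other, max_lag=8))              # small for independent noise
```

Because the ICC takes the maximum over lags, a pure delay between channels does not lower it; only genuinely different waveforms do.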
18 Estimation of spatial cues (4/4)
For multi-channel audio signals, ICTD and ICLD are defined between a reference channel (e.g., channel 1) and each of the other C-1 channels.
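For C channels, each subband then carries C-1 ICLDs, one per non-reference channel. A minimal sketch, assuming the short-time subband powers are already available as a table `subband_powers[c][b]` and that the reference is channel index 0 (both assumptions of this sketch; the power values below are hypothetical).

```python
import math

def icld_per_subband(subband_powers, ref=0):
    """ICLDs (dB) of every non-reference channel, estimated per subband.

    subband_powers[c][b] is the short-time power of channel c in subband b.
    Returns icld[b], a list of C-1 values 10*log10(p_c / p_ref), one per
    channel c != ref, computed independently for each subband b.
    """
    num_channels = len(subband_powers)
    num_bands = len(subband_powers[ref])
    return [
        [10.0 * math.log10(subband_powers[c][b] / subband_powers[ref][b])
         for c in range(num_channels) if c != ref]
        for b in range(num_bands)
    ]

# Hypothetical powers for 3 channels in 2 subbands (channel 0 is the reference).
powers = [[1.0, 2.0],
          [0.5, 2.0],
          [4.0, 0.5]]
icld = icld_per_subband(powers)
print([[round(v, 2) for v in row] for row in icld])  # [[-3.01, 6.02], [0.0, -6.02]]
```

The pairwise ICTD estimate is applied to the same reference/channel pairs in the same per-subband fashion.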
19 Synthesis of spatial cues (1/3)
ICTDs are synthesized by imposing delays, ICLDs by scaling, and ICCs by applying de-correlation filters.
20 Synthesis of spatial cues (2/3)
The delays are determined by the ICTDs.
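One way to turn ICTDs into per-channel delays is sketched below. Assumptions, all specific to this sketch: ICTDs are whole samples, each is given as the lag of channel c relative to the reference (positive = later), and all delays are shifted by a common offset so that the smallest is zero, which keeps them causal. Other offsets (e.g., centering the delays around zero) preserve the same inter-channel differences. Function names are illustrative.

```python
def delays_from_ictds(lags):
    """Per-channel delays in samples, reference channel first.

    lags[i] is the desired lag of channel i+1 relative to the reference
    (positive = later; a sign convention assumed for this sketch). A common
    offset makes the smallest delay zero, so every delay is causal while
    all inter-channel differences stay equal to the requested lags.
    """
    relative = [0] + list(lags)
    offset = min(relative)
    return [r - offset for r in relative]

def apply_delay(x, d):
    """Delay block x by d >= 0 whole samples, zero-filling, same length."""
    d = min(d, len(x))
    return [0.0] * d + list(x[:len(x) - d])

s = [1.0, 0.5, 0.25, 0.0, 0.0]        # one block of the downmix subband
delays = delays_from_ictds([2, -1, 0])
channels = [apply_delay(s, d) for d in delays]
print(delays)       # [1, 3, 0, 1]
print(channels[1])  # [0.0, 0.0, 0.0, 1.0, 0.5]
```

Each output channel starts as a delayed copy of the same downmix subband; scaling and de-correlation are applied afterwards.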
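21 Synthesis of spatial cues (3/3)
The scale factors $a_c$ are determined by the ICLDs, satisfying
$a_c / a_1 = 10^{\Delta L_{1c}(k)/20}$ for $2 \le c \le C$, and $\sum_{c=1}^{C} a_c^2 = 1$,
where $\Delta L_{1c}(k)$ is the ICLD between reference channel 1 and channel $c$.
After delays and scaling, the correlation between corresponding subbands of the output channels must be reduced to match the ICC. This is achieved with de-correlation filters $h_c$ designed as a function of the ICC.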
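A minimal sketch of the last two synthesis steps, under stated assumptions. `scale_factors` enforces $a_c / a_1 = 10^{\Delta L_{1c}/20}$ (an amplitude ratio derived from a power-ratio ICLD in dB) and assumes the squared scale factors sum to one. For the ICC, the sketch swaps in a simpler technique than designing filters $h_c$: it mixes the subband signal s with a signal d assumed to be uncorrelated with s and of equal power (in practice the output of some de-correlation filter; here independent noise stands in for it). Function names are illustrative.

```python
import math
import random

def scale_factors(iclds_db):
    """Scale factors a_1..a_C from the ICLDs of channels 2..C vs channel 1.

    Enforces a_c / a_1 = 10**(ICLD_c / 20) and sum(a_c**2) == 1 (the
    normalization assumed here), so the outputs share the downmix power.
    """
    ratios = [1.0] + [10.0 ** (level / 20.0) for level in iclds_db]
    a1 = 1.0 / math.sqrt(sum(r * r for r in ratios))
    return [a1 * r for r in ratios]

def mix_for_icc(s, d, icc):
    """Return two signals whose zero-lag normalized correlation is about icc.

    Assumes s and d are uncorrelated with equal power. Then
    y1 = cos(alpha)*s + sin(alpha)*d and y2 = cos(alpha)*s - sin(alpha)*d
    have correlation cos(2*alpha), so alpha = acos(icc) / 2.
    """
    alpha = math.acos(max(-1.0, min(1.0, icc))) / 2.0
    c, sn = math.cos(alpha), math.sin(alpha)
    y1 = [c * u + sn * v for u, v in zip(s, d)]
    y2 = [c * u - sn * v for u, v in zip(s, d)]
    return y1, y2

def zero_lag_corr(x, y):
    """Normalized cross-correlation at lag 0."""
    num = sum(u * v for u, v in zip(x, y))
    return num / math.sqrt(sum(u * u for u in x) * sum(v * v for v in y))

print([round(a, 4) for a in scale_factors([0.0])])  # [0.7071, 0.7071]

rng = random.Random(3)
s = [rng.gauss(0.0, 1.0) for _ in range(20000)]
d = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for a de-correlated copy of s
y1, y2 = mix_for_icc(s, d, icc=0.6)
print(round(zero_lag_corr(y1, y2), 2))  # close to 0.6 for this random data
```

Equal ICLDs give equal scale factors of 1/sqrt(2) for two channels; an ICC of 1 gives alpha = 0, so both outputs are plain copies of s.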
22 Conclusions (1/2)
– Well-known perceptual audio coders such as MP3 primarily exploit each channel's ability to mask its own quantization noise.
– In contrast, MPEG Surround exploits the fact that spatial perception is determined mainly by three cues: ILD, ITD, and IC.
23 Conclusions (2/2)
– MPEG Surround codes multi-channel sound very efficiently by transmitting a compressed stereo (or even mono) audio program plus a low-rate side-information channel carrying the spatial cues.
– MPEG Surround is the latest technology for bitrate-efficient, backward-compatible presentation of multi-channel audio.
24 Reference
– ISO/IEC JTC1/SC29/WG11 (MPEG), Document N7390, "Tutorial on MPEG Surround Audio Coding," Poznan, Poland, July 2005.
– C. Faller, "Parametric coding of spatial audio," in Proc. DAFx (Digital Audio Effects), October 2004.