Slide 1: Perceptual Audio Rendering
Nicolas Tsingos, Dolby Laboratories
nicolas.tsingos@dolby.com
Slide 2: Motivation
- Many applications require processing hundreds of audio streams in real time: games and simulators, multi-track mixing, etc.
(Images © Eden Games, © Steinberg)
Slide 3: Massive audio processing
- Often exceeds available resources: limited CPU or hardware processing, bus traffic
- Typically involves individual processing, mix-down of all signals to the outputs, and 3D audio rendering
Slide 4: Perceptual audio rendering
- Perceptually based processing: many sources and efficient DSP effects
- Level-of-detail rendering
- Independent of the reproduction system
- Extended sound sources, sound reflections
Slide 5: Leveraging limitations of human hearing
- A large part of a complex sound mixture is likely to be perceptually irrelevant (e.g., auditory masking)
- Limitations of spatial hearing (e.g., localization accuracy, ventriloquism)
Slide 6: Perceptual audio rendering components
[Diagram: sources → masking → clustering → progressive processing → listener]
Slide 7: Masking
Slide 8: Real-time masking evaluation
- Remove inaudible sources: fetch and process only perceptually relevant input (different from invisible or occluded sound sources)
- Estimate inter-source masking, building on perceptual audio coding work
- Computing the audibility threshold requires knowledge of the signal characteristics
Slide 9: Signal characteristics
- Pre-computed for short time frames (20 ms): power spectrum; tonality index in [0, 1] (1 = tone, 0 = noise)
[Figure: descriptors computed along the timeline of a pre-recorded signal]
Slide 10: Greedy culling algorithm
- Sort sources by decreasing loudness (loudness relates to the sensation of sound intensity)
- Efficient run-time loudness evaluation: retrieve the pre-computed power spectrum for each source, modulate it by propagation effects, convert to loudness using look-up tables [Moore92]
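The greedy loop can be sketched as below. Representing each source by a single scalar loudness in dB and using a flat masker offset are simplifications of this sketch; the slides describe per-band loudness from look-up tables and a proper masking-threshold computation.

```python
import math

def cull_masked_sources(sources, masker_offset_db=10.0):
    """Greedy masking cull: sort by loudness, accumulate the audible mix,
    and stop once the next candidate falls under the mix's masking
    threshold (all remaining, quieter sources are then masked too).

    `sources` is a list of (name, loudness_db) pairs; the flat
    `masker_offset_db` threshold is an assumption of this sketch.
    """
    ordered = sorted(sources, key=lambda s: s[1], reverse=True)
    audible, mix_power = [], 0.0
    for name, loudness_db in ordered:
        mix_db = 10.0 * math.log10(mix_power) if mix_power > 0 else -math.inf
        threshold_db = mix_db - masker_offset_db
        if loudness_db < threshold_db:
            break  # stop: every remaining source is quieter and also masked
        audible.append(name)
        mix_power += 10.0 ** (loudness_db / 10.0)
    return audible
```

For example, with sources at 60 dB, 58 dB, and 20 dB and a 10 dB offset, the 20 dB source falls below the threshold set by the first two and is culled.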
Slide 11: Masking evaluation
[Diagram: candidate sources are added to the current mix in loudness order (1, 2, 3, 4); the masking threshold (power in dB, at the listener) is updated after each addition, and the process stops once the next candidate falls below it.]
Slide 12: Clustering
Slide 13: Dynamic spatial clustering
- Amortize costly 3D-audio processing over groups of sources
- Leverage the limited resolution of spatial hearing
- Group neighboring sources together and compute an "impostor" for the group: perceptually equivalent but cheaper to render; a single point source with a complex response (the mixture of all source signals in the cluster)
Slide 14: Dynamic spatial clustering
- Limited spatial perception of human hearing [Blauert, Middlebrooks]
- Static sound source clustering [Herder99]: non-uniform subdivision of direction space; uses the Cartesian centroid as representative
Slide 15: Dynamic spatial clustering
- Group neighboring sources together
- Uniform direction constraint; log(1/distance) constraint
- Weight by loudness
- Hochbaum-Shmoys heuristic [Hochbaum85]
- Fast hierarchical implementation
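The grouping step above can be sketched as a greedy k-center (farthest-point) heuristic in the spirit of Hochbaum-Shmoys. Operating on plain 3D positions is this sketch's simplification; the slides additionally weight sources by loudness and constrain clusters in direction and log(1/distance).

```python
import math

def cluster_sources(positions, k):
    """Greedy k-center clustering: repeatedly pick the source farthest
    from the already-chosen cluster representatives, then assign every
    source to its nearest representative."""
    centers = [0]  # seed with the first source (arbitrary choice)
    while len(centers) < k:
        far = max(range(len(positions)),
                  key=lambda i: min(math.dist(positions[i], positions[c])
                                    for c in centers))
        centers.append(far)
    assignment = [min(centers, key=lambda c: math.dist(positions[i], positions[c]))
                  for i in range(len(positions))]
    return centers, assignment
```

Two tight groups of sources far apart then collapse onto one representative each, which is exactly the situation where a per-cluster impostor saves 3D-audio processing.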
Slide 16: Rendering clusters
- Mix the signals of all sources in the cluster to create a single source with a complex response
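A minimal sketch of rendering via impostors: pre-mix each cluster's member signals, then apply the expensive spatial processing once per cluster rather than once per source. Using a stereo gain pair as the "spatialization" stage is an assumption of this sketch; the real pipeline would apply full 3D-audio processing (e.g., HRTF filtering) per impostor.

```python
def render_clusters(signals, assignment, gains):
    """Build one impostor per cluster by summing member signals, then
    spatialize each impostor with a per-cluster (left, right) gain pair."""
    clusters = {}
    for src, cid in enumerate(assignment):
        mix = clusters.setdefault(cid, [0.0] * len(signals[src]))
        for t, x in enumerate(signals[src]):
            mix[t] += x  # the impostor's signal is the cluster mixture
    n = len(next(iter(clusters.values())))
    left, right = [0.0] * n, [0.0] * n
    for cid, mix in clusters.items():
        gl, gr = gains[cid]  # spatial processing applied once per cluster
        for t, x in enumerate(mix):
            left[t] += gl * x
            right[t] += gr * x
    return left, right
```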
Slide 17: Dynamic spatial clustering
Slide 18: Pilot validation study
- Culling and masking are transparent: rated 4.4/5 on average (5 = indistinguishable from the reference)
- Clustering preserves localization cues: 74% success on average (90% within 1 meter of the true location); no significant correlation with the number of clusters
Slide 19: Progressive processing
Slide 20: Progressive signal processing
- A scalable pipeline for filtering and mixing many audio streams
- Fetch and process only perceptually relevant input
- Continuously adapt quality vs. speed while remaining perceptually transparent
- Use a "standard" representation of the inputs
Slide 21: Progressive signal processing
- Uses Fourier-domain coefficients for processing
- Gracefully degrades both signal quality and spatial cues
- Combines processing and audio coding
- Uses additional signal descriptors for decision making
Slide 22: Progressive processing pipeline
[Diagram: N input frames → masking → importance sampling (driven by per-frame importance) → process + reconstruct → 1 output frame]
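The importance-sampling stage above can be sketched as dividing a fixed budget of Fourier coefficients among the input frames. Proportional allocation with largest-remainder rounding is this sketch's assumption; the actual pipeline drives the split with perceptual descriptors.

```python
def allocate_coefficients(importances, budget):
    """Split a Fourier-coefficient budget across input frames in
    proportion to their importance, so quality degrades gracefully
    as the budget shrinks."""
    total = sum(importances)
    if total == 0:
        return [0] * len(importances)
    raw = [budget * w / total for w in importances]
    alloc = [int(r) for r in raw]
    # Hand leftover coefficients to the largest fractional remainders.
    leftovers = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftovers]:
        alloc[i] += 1
    return alloc
```

Shrinking `budget` models the quality-vs.-speed adaptation: every frame keeps a share of coefficients proportional to how much it matters perceptually, and the total never exceeds what the pipeline can afford.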
Slide 23: Progressive signal processing
Slide 24: Progressive processing and sound synthesis
- Sound synthesis from physics-driven animation using modal models
- Resonant modes can be synthesized in the Fourier domain; the number of Fourier coefficients can be allocated on the fly
- Balances processing costs for recorded and synthesized sounds at the same time
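A modal model represents a struck object as a bank of resonant modes, each a damped sinusoid. The time-domain sum below is a sketch for clarity; per the slides, the same modes can be evaluated directly as Fourier coefficients so their count can be budgeted on the fly alongside recorded sounds. All parameter values here are illustrative.

```python
import math

def modal_frame(modes, frame_len, sample_rate=48000):
    """Synthesize one frame from a modal model.

    Each mode is a (frequency_hz, damping, amplitude) triple rendered
    as a damped sinusoid: amp * exp(-damping * t) * sin(2*pi*f*t).
    """
    out = [0.0] * frame_len
    for freq, damping, amp in modes:
        for t in range(frame_len):
            tau = t / sample_rate
            out[t] += amp * math.exp(-damping * tau) * math.sin(2 * math.pi * freq * tau)
    return out
```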
Slide 25: Conclusions
- Perceptually motivated techniques for rendering and authoring virtual auditory environments: a human listener only processes a small amount of information in complex situations
- Extensions: more complex auditory processing; modeling cross-modal perception
  - "Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception", David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis, Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2009
- Other problems: dynamic range management, e.g., the HDR audio approach of the EA/DICE studio for Battlefield
Slide 26: Additional references
www-sop.inria.fr/reves
This work was supported by:
- RNTL project OPERA: http://www.inria.fr/reves/OPERA
- EU IST project CREATE: http://www.cs.ucl.ac.uk/create
- EU FET OPEN project CROSSMOD: http://www-sop.inria.fr/reves/CrossmodPublic/