
Perceptual Audio Rendering Nicolas Tsingos Dolby Laboratories


1 Perceptual Audio Rendering Nicolas Tsingos Dolby Laboratories nicolas.tsingos@dolby.com

2 Motivation
 Many applications require processing hundreds of audio streams in real time
 games/simulators, multi-track mixing, etc.
(Images: ©Eden Games, ©Steinberg)

3 Massive audio processing
 Often exceeds available resources
 Limited CPU or hardware processing
 Bus traffic
 Typically involves
 individual processing
 mix-down of all signals to outputs
 3D audio rendering

4 Perceptual audio rendering
 Perceptually-based processing
 Many sources and efficient DSP effects
 Level-of-detail rendering
 Independent of reproduction system
(Figures: extended sound sources, sound reflections)

5 Leveraging limitations of human hearing
 A large part of complex sound mixtures is likely to be perceptually irrelevant
 e.g., auditory masking
 Limitations of spatial hearing
 e.g., localization accuracy, ventriloquism

6 Perceptual audio rendering components
(Diagram: masking, clustering, and progressive processing, between the sources and the listener)

7 Masking

8 Real-time masking evaluation
Remove inaudible sources
 Fetch and process only perceptually relevant input
 Different from invisible or occluded sound sources
Estimate inter-source masking
 Build upon perceptual audio coding work
 Computing the audibility threshold requires knowledge of signal characteristics

9 Signal characteristics
Pre-computed for short time frames (20 ms)
 power spectrum
 tonality index in [0,1] (1 = tone, 0 = noise)
(Figure: descriptors computed frame-by-frame over time from the pre-recorded signal)
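A minimal sketch of these per-frame descriptors. The frame size and the tonality estimator (spectral flatness) are assumptions; the slides only state that a power spectrum and a tonality index in [0,1] are pre-computed per 20 ms frame.

```python
import numpy as np

FRAME = 1024  # roughly 20 ms at 48 kHz (assumed frame size)

def frame_descriptors(signal, frame=FRAME):
    """Pre-compute per-frame power spectrum and a tonality index in [0, 1].
    Tonality is derived here from spectral flatness (geometric mean over
    arithmetic mean of the power spectrum); the exact estimator used in
    the talk is not specified."""
    descriptors = []
    for start in range(0, len(signal) - frame + 1, frame):
        x = signal[start:start + frame] * np.hanning(frame)
        power = np.abs(np.fft.rfft(x)) ** 2
        # Flatness is near 1 for noise, near 0 for a pure tone.
        flatness = np.exp(np.mean(np.log(power + 1e-12))) / (np.mean(power) + 1e-12)
        descriptors.append((power, 1.0 - flatness))  # tonality = 1 - flatness
    return descriptors
```

A pure tone then yields tonality near 1, white noise a much lower value, matching the convention on the slide.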

10 Greedy culling algorithm
Sort sources by decreasing loudness
 Loudness relates to the sensation of sound intensity
Efficient run-time loudness evaluation
 Retrieve pre-computed power spectrum for each source
 Modulate by propagation effects
 Convert to loudness using look-up tables [Moore92]

11 Masking evaluation
(Figure: candidate sources are added to the current mix in loudness order; the masking threshold [dB] is updated after each addition, and the loop stops as soon as the next candidate falls below the current threshold)
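The greedy loop of slides 10–11 can be sketched as follows. This is a simplified stand-in, not the exact algorithm: sources are treated as per-band power vectors, loudness is approximated by total power, and the masking threshold is modeled crudely as the mix power minus a fixed dB offset (`masker_offset_db` is a hypothetical parameter).

```python
import numpy as np

def greedy_cull(source_powers, masker_offset_db=10.0):
    """Sort sources by decreasing (approximate) loudness, accumulate a mix,
    and stop as soon as the next candidate falls entirely below the current
    masking threshold; all remaining, quieter sources are culled."""
    order = sorted(range(len(source_powers)),
                   key=lambda i: np.sum(source_powers[i]), reverse=True)
    mix = np.zeros_like(np.asarray(source_powers[0], dtype=float))
    audible = []
    for i in order:
        threshold = mix / (10 ** (masker_offset_db / 10))  # crude threshold model
        if np.all(source_powers[i] < threshold):
            break  # this source and every quieter one are masked: stop
        audible.append(i)
        mix += source_powers[i]
    return audible
```

Because the candidates are visited in loudness order, one failed audibility test safely terminates the loop, which is what makes the culling cheap.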

12 Clustering

13 Dynamic spatial clustering
 Amortize (costly) 3D-audio processing over groups of sources
 Leverage limited resolution of spatial hearing
 Group neighboring sources together
 Compute an “impostor” for the group
 Perceptually equivalent but cheaper to render
 Unique point source with a complex response (mixture of all source signals in the cluster)

14 Dynamic spatial clustering
 Limited spatial perception of human hearing [Blauert, Middlebrooks]
 Static sound source clustering [Herder99]
 non-uniform subdivision of direction space
 use Cartesian centroid as representative

15 Dynamic spatial clustering
Group neighboring sources together
 Uniform direction constraint
 Log(1/distance) constraint
 Weight by loudness
Hochbaum-Shmoys heuristic [Hochbaum85]
 Fast hierarchical implementation
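A bare-bones sketch in the spirit of the Hochbaum-Shmoys k-center heuristic: pick a first center, then repeatedly pick the point farthest from all chosen centers, and assign each source to its nearest center. Plain 2-D positions stand in for the direction/log-distance metric, and the loudness weighting from the slide is omitted; both would replace the Euclidean distance below.

```python
import numpy as np

def kcenter_clusters(points, k):
    """Greedy farthest-point clustering (Hochbaum-Shmoys style).
    Returns the chosen center indices and each point's cluster label."""
    points = np.asarray(points, dtype=float)
    centers = [0]                                   # arbitrary first center
    d = np.linalg.norm(points - points[0], axis=1)  # distance to nearest center
    for _ in range(1, k):
        nxt = int(np.argmax(d))                     # farthest remaining point
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    # Assign each source to its nearest center (the cluster impostor).
    assign = np.argmin(
        np.linalg.norm(points[:, None] - points[centers][None], axis=2), axis=1)
    return centers, assign
```

The farthest-point rule gives a 2-approximation to the optimal k-center radius, which is why it is a natural fit for bounding the angular error of each impostor.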

16 Rendering clusters
Mix signals of all sources in the cluster
 create a single source with a complex response
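The cluster rendering step reduces to a pre-mix: sum all member signals into one impostor buffer, then spatialize once per cluster instead of once per source. In this sketch the `gains` argument is a hypothetical stand-in for per-source propagation attenuation.

```python
import numpy as np

def render_clusters(signals, gains, assign, n_clusters):
    """Mix every source signal into its cluster's impostor buffer.
    Each impostor row is then sent through a single 3D-audio path,
    amortizing the costly spatialization over the whole group."""
    length = len(signals[0])
    impostors = np.zeros((n_clusters, length))
    for sig, g, c in zip(signals, gains, assign):
        impostors[c] += g * np.asarray(sig)  # pre-mix inside the cluster
    return impostors
```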

17 Dynamic spatial clustering

18 Pilot validation study
Culling and masking are transparent
 rated 4.4/5 avg. (5 = indistinguishable from reference)
Clustering preserves localization cues
 74% success avg. (90% within 1 meter of true location)
 no significant correlation with number of clusters

19 Progressive processing

20 Progressive signal processing
 A scalable pipeline for filtering and mixing many audio streams
 fetch & process only perceptually relevant input
 continuously adapt quality vs. speed
 remain perceptually transparent
 use a “standard” representation of the inputs

21 Progressive signal processing
 Uses Fourier-domain coefficients for processing
 Degrades both signal quality and spatial cues
 Combines processing and audio coding
 Uses additional signal descriptors for decision making

22 Progressive processing pipeline
(Diagram: N input frames → importance sampling, guided by per-frame importance and masking → process + reconstruct → 1 output frame)
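A toy version of the pipeline, under loudly stated assumptions: each input frame is represented by its FFT, a global coefficient budget is split across sources in proportion to a given importance value, and only each source's largest coefficients are mixed before one inverse transform reconstructs the output frame. The proportional budget split and the magnitude-based coefficient selection are illustrative choices, not the published importance-sampling rule.

```python
import numpy as np

def progressive_mix(frames, importance, budget):
    """Mix N input frames in the Fourier domain under a total
    coefficient budget, keeping only each source's largest bins."""
    importance = np.asarray(importance, dtype=float)
    shares = np.maximum(
        1, np.round(budget * importance / importance.sum()).astype(int))
    mix = np.zeros(len(frames[0]) // 2 + 1, dtype=complex)
    for frame, n in zip(frames, shares):
        spec = np.fft.rfft(frame)
        keep = np.argsort(np.abs(spec))[-n:]  # this source's n largest bins
        mix[keep] += spec[keep]               # mix directly in the Fourier domain
    return np.fft.irfft(mix, n=len(frames[0]))  # reconstruct one output frame
```

Shrinking `budget` degrades quality gracefully; with a budget covering every bin the result equals a plain time-domain mix, which is the "perceptually transparent" end of the quality/speed trade-off.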

23 Progressive signal processing

24 Progressive processing and sound synthesis
 Sound synthesis from physics-driven animation
 Modal models
 Resonant modes can be synthesized in the Fourier domain
 number of Fourier coefficients can be allocated on the fly
 Balance processing costs for recorded and synthesized sounds at the same time
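For reference, a modal model is a sum of exponentially damped sinusoids, y(t) = Σᵢ aᵢ·exp(-dᵢt)·sin(2πfᵢt). The time-domain sketch below makes the quality/cost knob explicit: truncating the mode list (or, per the slide, the Fourier coefficients representing the modes) trades quality for speed. Parameter names are illustrative.

```python
import numpy as np

def modal_frame(freqs, amps, decays, t0, n, sr=48000):
    """Synthesize one frame (n samples starting at time t0) of a modal
    model as a sum of exponentially damped sinusoids. Passing fewer
    modes lowers both quality and cost, mirroring on-the-fly allocation."""
    t = t0 + np.arange(n) / sr
    y = np.zeros(n)
    for f, a, d in zip(freqs, amps, decays):
        y += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    return y
```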

25 Conclusions
 Perceptually motivated techniques for rendering and authoring virtual auditory environments
 a human listener only processes a small amount of information in complex situations
 Extend to
 more complex auditory processing models
 cross-modal perception: Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception. David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis. Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2009
 other problems: dynamic range management, e.g., the HDR audio approach of the EA/DICE studio for Battlefield

26 Additional references
 www-sop.inria.fr/reves
 This work was supported by
 RNTL project OPERA http://www.inria.fr/reves/OPERA
 EU IST Project CREATE http://www.cs.ucl.ac.uk/create
 EU FET OPEN Project CROSSMOD http://www-sop.inria.fr/reves/CrossmodPublic/




