Slide 1: Perceptual Audio Rendering
Nicolas Tsingos, Dolby Laboratories
nicolas.tsingos@dolby.com
Slide 2: Motivation
- Many applications require processing hundreds of audio streams in real time: games and simulators, multi-track mixing, etc.
(Images © Eden Games, © Steinberg)
Slide 3: Massive audio processing
- Often exceeds available resources: limited CPU or hardware processing, bus traffic
- Typically involves individual processing, mix-down of all signals to the outputs, and 3D audio rendering
Slide 4: Perceptual audio rendering
- Perceptually based processing: many sources and efficient DSP effects
- Level-of-detail rendering
- Independent of the reproduction system
- Extended sound sources, sound reflections
Slide 5: Leveraging limitations of human hearing
- A large part of a complex sound mixture is likely to be perceptually irrelevant (e.g., auditory masking)
- Limitations of spatial hearing (e.g., localization accuracy, ventriloquism)
Slide 6: Perceptual audio rendering components
[Diagram: sources → masking → clustering → progressive processing → listener]
Slide 7: Masking
Slide 8: Real-time masking evaluation
- Remove inaudible sources: fetch and process only perceptually relevant input (different from invisible or occluded sound sources)
- Estimate inter-source masking, building on perceptual audio coding work
- Computing the audibility threshold requires knowledge of the signal characteristics
Slide 9: Signal characteristics
- Pre-computed for short time frames (20 ms): power spectrum; tonality index in [0, 1] (1 = tone, 0 = noise)
[Figure: descriptors computed along the timeline of a pre-recorded signal]
Slide 10: Greedy culling algorithm
- Sort sources by decreasing loudness (loudness relates to the sensation of sound intensity)
- Efficient run-time loudness evaluation: retrieve the pre-computed power spectrum for each source, modulate it by propagation effects, convert to loudness using look-up tables [Moore92]
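The greedy loop can be sketched as below. Representing each source by a single scalar loudness in dB and using a flat masker offset are simplifications of this sketch; the slides describe per-band loudness from look-up tables and a proper masking-threshold computation.

```python
import math

def cull_masked_sources(sources, masker_offset_db=10.0):
    """Greedy masking cull: sort by loudness, accumulate the audible mix,
    and stop once the next candidate falls under the mix's masking
    threshold (all remaining, quieter sources are then masked too).

    `sources` is a list of (name, loudness_db) pairs; the flat
    `masker_offset_db` threshold is an assumption of this sketch.
    """
    ordered = sorted(sources, key=lambda s: s[1], reverse=True)
    audible, mix_power = [], 0.0
    for name, loudness_db in ordered:
        mix_db = 10.0 * math.log10(mix_power) if mix_power > 0 else -math.inf
        threshold_db = mix_db - masker_offset_db
        if loudness_db < threshold_db:
            break  # stop: every remaining source is quieter and also masked
        audible.append(name)
        mix_power += 10.0 ** (loudness_db / 10.0)
    return audible
```

For example, with sources at 60 dB, 58 dB, and 20 dB and a 10 dB offset, the 20 dB source falls below the threshold set by the first two and is culled.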
Slide 11: Masking evaluation
[Diagram: candidate sources are added to the current mix in loudness order (1, 2, 3, 4); the masking threshold (power in dB, at the listener) is updated after each addition, and the process stops once the next candidate falls below it.]
Slide 12: Clustering
Slide 13: Dynamic spatial clustering
- Amortize costly 3D-audio processing over groups of sources
- Leverage the limited resolution of spatial hearing
- Group neighboring sources together and compute an "impostor" for the group: perceptually equivalent but cheaper to render; a single point source with a complex response (the mixture of all source signals in the cluster)
Slide 14: Dynamic spatial clustering
- Limited spatial perception of human hearing [Blauert, Middlebrooks]
- Static sound source clustering [Herder99]: non-uniform subdivision of direction space; uses the Cartesian centroid as representative
Slide 15: Dynamic spatial clustering
- Group neighboring sources together
- Uniform direction constraint; log(1/distance) constraint
- Weight by loudness
- Hochbaum-Shmoys heuristic [Hochbaum85]
- Fast hierarchical implementation
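The grouping step above can be sketched as a greedy k-center (farthest-point) heuristic in the spirit of Hochbaum-Shmoys. Operating on plain 3D positions is this sketch's simplification; the slides additionally weight sources by loudness and constrain clusters in direction and log(1/distance).

```python
import math

def cluster_sources(positions, k):
    """Greedy k-center clustering: repeatedly pick the source farthest
    from the already-chosen cluster representatives, then assign every
    source to its nearest representative."""
    centers = [0]  # seed with the first source (arbitrary choice)
    while len(centers) < k:
        far = max(range(len(positions)),
                  key=lambda i: min(math.dist(positions[i], positions[c])
                                    for c in centers))
        centers.append(far)
    assignment = [min(centers, key=lambda c: math.dist(positions[i], positions[c]))
                  for i in range(len(positions))]
    return centers, assignment
```

Two tight groups of sources far apart then collapse onto one representative each, which is exactly the situation where a per-cluster impostor saves 3D-audio processing.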
Slide 16: Rendering clusters
- Mix the signals of all sources in the cluster to create a single source with a complex response
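A minimal sketch of rendering via impostors: pre-mix each cluster's member signals, then apply the expensive spatial processing once per cluster rather than once per source. Using a stereo gain pair as the "spatialization" stage is an assumption of this sketch; the real pipeline would apply full 3D-audio processing (e.g., HRTF filtering) per impostor.

```python
def render_clusters(signals, assignment, gains):
    """Build one impostor per cluster by summing member signals, then
    spatialize each impostor with a per-cluster (left, right) gain pair."""
    clusters = {}
    for src, cid in enumerate(assignment):
        mix = clusters.setdefault(cid, [0.0] * len(signals[src]))
        for t, x in enumerate(signals[src]):
            mix[t] += x  # the impostor's signal is the cluster mixture
    n = len(next(iter(clusters.values())))
    left, right = [0.0] * n, [0.0] * n
    for cid, mix in clusters.items():
        gl, gr = gains[cid]  # spatial processing applied once per cluster
        for t, x in enumerate(mix):
            left[t] += gl * x
            right[t] += gr * x
    return left, right
```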
Slide 17: Dynamic spatial clustering
Slide 18: Pilot validation study
- Culling and masking are transparent: rated 4.4/5 on average (5 = indistinguishable from the reference)
- Clustering preserves localization cues: 74% success on average (90% within 1 meter of the true location); no significant correlation with the number of clusters
Slide 19: Progressive processing
Slide 20: Progressive signal processing
- A scalable pipeline for filtering and mixing many audio streams
- Fetch and process only perceptually relevant input
- Continuously adapt quality vs. speed while remaining perceptually transparent
- Use a "standard" representation of the inputs
Slide 21: Progressive signal processing
- Uses Fourier-domain coefficients for processing
- Gracefully degrades both signal quality and spatial cues
- Combines processing and audio coding
- Uses additional signal descriptors for decision making
Slide 22: Progressive processing pipeline
[Diagram: N input frames → masking → importance sampling (driven by per-frame importance) → process + reconstruct → 1 output frame]
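The importance-sampling stage above can be sketched as dividing a fixed budget of Fourier coefficients among the input frames. Proportional allocation with largest-remainder rounding is this sketch's assumption; the actual pipeline drives the split with perceptual descriptors.

```python
def allocate_coefficients(importances, budget):
    """Split a Fourier-coefficient budget across input frames in
    proportion to their importance, so quality degrades gracefully
    as the budget shrinks."""
    total = sum(importances)
    if total == 0:
        return [0] * len(importances)
    raw = [budget * w / total for w in importances]
    alloc = [int(r) for r in raw]
    # Hand leftover coefficients to the largest fractional remainders.
    leftovers = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftovers]:
        alloc[i] += 1
    return alloc
```

Shrinking `budget` models the quality-vs.-speed adaptation: every frame keeps a share of coefficients proportional to how much it matters perceptually, and the total never exceeds what the pipeline can afford.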
Slide 23: Progressive signal processing
Slide 24: Progressive processing and sound synthesis
- Sound synthesis from physics-driven animation using modal models
- Resonant modes can be synthesized in the Fourier domain; the number of Fourier coefficients can be allocated on the fly
- Balances processing costs for recorded and synthesized sounds at the same time
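A modal model represents a struck object as a bank of resonant modes, each a damped sinusoid. The time-domain sum below is a sketch for clarity; per the slides, the same modes can be evaluated directly as Fourier coefficients so their count can be budgeted on the fly alongside recorded sounds. All parameter values here are illustrative.

```python
import math

def modal_frame(modes, frame_len, sample_rate=48000):
    """Synthesize one frame from a modal model.

    Each mode is a (frequency_hz, damping, amplitude) triple rendered
    as a damped sinusoid: amp * exp(-damping * t) * sin(2*pi*f*t).
    """
    out = [0.0] * frame_len
    for freq, damping, amp in modes:
        for t in range(frame_len):
            tau = t / sample_rate
            out[t] += amp * math.exp(-damping * tau) * math.sin(2 * math.pi * freq * tau)
    return out
```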
Slide 25: Conclusions
- Perceptually motivated techniques for rendering and authoring virtual auditory environments: a human listener only processes a small amount of information in complex situations
- Extensions: more complex auditory processing; modeling cross-modal perception
  - "Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception", David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis, Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2009
- Other problems: dynamic range management, e.g., the HDR audio approach of the EA/DICE studio for Battlefield
Slide 26: Additional references
www-sop.inria.fr/reves
This work was supported by:
- RNTL project OPERA: http://www.inria.fr/reves/OPERA
- EU IST project CREATE: http://www.cs.ucl.ac.uk/create
- EU FET OPEN project CROSSMOD: http://www-sop.inria.fr/reves/CrossmodPublic/