Perceptual Audio Rendering Nicolas Tsingos Dolby Laboratories


Motivation
- Many applications require processing hundreds of audio streams in real time
  - games/simulators, multi-track mixing, etc.
[Images: © Eden Games, © Steinberg]

Massive audio processing
- Often exceeds available resources
  - Limited CPU or hardware processing
  - Bus traffic
- Typically involves
  - individual processing
  - mix-down of all signals to outputs
  - 3D audio rendering

Perceptual audio rendering
- Perceptually-based processing
- Many sources and efficient DSP effects
- Level-of-detail rendering
- Independent of reproduction system
[Figure labels: extended sound sources, sound reflections, sound sources]

Leveraging limitations of human hearing
- A large part of complex sound mixtures is likely to be perceptually irrelevant
  - e.g., auditory masking
- Limitations of spatial hearing
  - e.g., localization accuracy, ventriloquism

Perceptual audio rendering components
[Diagram labels: sources, masking, clustering, progressive processing, listener]

Masking

Real-time masking evaluation
- Remove inaudible sources
  - Fetch and process only perceptually relevant input
  - Different from invisible or occluded sound sources
- Estimate inter-source masking
  - Build upon perceptual audio coding work
  - Computing audibility threshold requires knowledge of signal characteristics

Signal characteristics
- Pre-computed for short time-frames (20 ms)
  - power spectrum
  - tonality index in [0,1] (1 = tone, 0 = noise)
[Figure labels: time, pre-recorded signal]
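The per-frame descriptors listed above can be illustrated with a minimal sketch. It assumes a naive DFT for the power spectrum and a spectral-flatness estimate of tonality in the style commonly used in perceptual audio coders; the slides do not say which tonality estimator was used, so the function names, the -60 dB flatness floor, and the small `eps` regularizer are all assumptions made for illustration.

```python
import cmath
import math

def power_spectrum(frame):
    """Power per frequency bin (0..N/2) of one frame, via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def tonality_index(power, floor_db=-60.0, eps=1e-12):
    """Map spectral flatness to [0, 1]: 1 = tone-like, 0 = noise-like.

    Assumption: flatness in dB is divided by floor_db and clamped;
    this is one common estimator, not necessarily the one in the talk.
    """
    bins = [p + eps for p in power[1:]]  # skip the DC bin
    geometric = math.exp(sum(math.log(p) for p in bins) / len(bins))
    arithmetic = sum(bins) / len(bins)
    flatness_db = 10.0 * math.log10(geometric / arithmetic)
    return min(max(flatness_db / floor_db, 0.0), 1.0)

# A pure sinusoid is tone-like; a unit impulse has a flat, noise-like spectrum.
sine = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
impulse = [1.0] + [0.0] * 63
tone_score = tonality_index(power_spectrum(sine))
flat_score = tonality_index(power_spectrum(impulse))
```

In practice a 20 ms frame would be analyzed with an FFT rather than this quadratic-time DFT; the short frames here only keep the example self-contained.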

Greedy culling algorithm
- Sort sources by decreasing loudness
  - Loudness relates to the sensation of sound intensity
- Efficient run-time loudness evaluation
  - Retrieve pre-computed power spectrum for each source
  - Modulate by propagation effects
  - Convert to loudness using look-up tables [Moore92]

Masking evaluation
[Figure: candidate sources are added one by one to the current mix; the current masking threshold (power in dB) is updated after each addition until the process stops]
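The greedy culling loop described on these slides can be sketched as follows. This is a simplified broadband version under stated assumptions: each source is reduced to a single level in dB (used here as a stand-in for loudness) and a tonality index, and the masking threshold is the mix level minus an offset interpolated between a tonal and a noise value. The offsets `tonal_offset_db` and `noise_offset_db` are illustrative placeholders, not values taken from the talk.

```python
import math

def db_to_power(level_db):
    return 10.0 ** (level_db / 10.0)

def power_to_db(power):
    return 10.0 * math.log10(power)

def cull_sources(sources, tonal_offset_db=25.0, noise_offset_db=6.0):
    """Return the names of sources kept as audible.

    sources: list of (name, level_db, tonality) tuples, tonality in [0, 1].
    Sources are added to the mix from loudest to quietest until the power
    of everything not yet added falls below the mix's masking threshold.
    """
    ordered = sorted(sources, key=lambda s: s[1], reverse=True)
    remaining_power = sum(db_to_power(s[1]) for s in ordered)
    mix_power = 0.0
    tonality_weighted = 0.0  # power-weighted sum of tonality in the mix
    kept = []
    for name, level_db, tonality in ordered:
        if mix_power > 0.0:
            if remaining_power <= 0.0:
                break
            mix_tonality = tonality_weighted / mix_power
            offset = (mix_tonality * tonal_offset_db
                      + (1.0 - mix_tonality) * noise_offset_db)
            threshold_db = power_to_db(mix_power) - offset
            if power_to_db(remaining_power) < threshold_db:
                break  # everything left is masked by the current mix
        power = db_to_power(level_db)
        kept.append(name)
        mix_power += power
        tonality_weighted += power * tonality
        remaining_power -= power
    return kept

audible = cull_sources([
    ("engine", 80.0, 0.2),
    ("voice", 75.0, 0.5),
    ("hum", 40.0, 0.9),
    ("tick", 35.0, 0.1),
])
```

A per-band version would keep one threshold per frequency band instead of a single broadband value, at the cost of more bookkeeping per source.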

Clustering

Dynamic spatial clustering
- Amortize (costly) 3D-audio processing over groups of sources
- Leverage limited resolution of spatial hearing
- Group neighboring sources together
- Compute an “impostor” for the group
  - Perceptually equivalent but cheaper to render
  - Unique point source with a complex response (mixture of all source signals in cluster)

Dynamic spatial clustering
- Limited spatial perception of human hearing [Blauert, Middlebrooks]
- Static sound source clustering [Herder99]
  - non-uniform subdivision of direction space
  - use Cartesian centroid as representative

Dynamic spatial clustering
- Group neighboring sources together
  - Uniform direction constraint
  - Log(1/distance) constraint
  - Weight by loudness
- Hochbaum-Shmoys heuristic [Hochbaum85]
  - Fast hierarchical implementation
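As an illustration of the grouping step, here is a minimal sketch that uses farthest-first traversal, a simple greedy k-center heuristic related to (but not the same as) the Hochbaum-Shmoys heuristic cited on the slide. The distance metric combines the angle between source directions with the difference in log-distance and is weighted by loudness when choosing representatives; the exact metric, the weights `w_dir` and `w_dist`, and all function names are assumptions. Sources are assumed not to coincide with the listener, who sits at the origin.

```python
import math

def source_distance(a, b, w_dir=1.0, w_dist=1.0):
    """Perceptual-style distance between two listener-relative positions."""
    ra = math.sqrt(sum(c * c for c in a))
    rb = math.sqrt(sum(c * c for c in b))
    cos_angle = sum(x * y for x, y in zip(a, b)) / (ra * rb)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding error
    return w_dir * angle + w_dist * abs(math.log(ra) - math.log(rb))

def cluster_sources(positions, loudness, k):
    """Pick up to k representatives, then assign each source to the nearest.

    Returns (representative_indices, assignment), where assignment[i] is the
    index of the representative chosen for source i.
    """
    n = len(positions)
    reps = [max(range(n), key=lambda i: loudness[i])]  # start with loudest
    cost = [loudness[i] * source_distance(positions[i], positions[reps[0]])
            for i in range(n)]
    while len(reps) < min(k, n):
        farthest = max(range(n), key=lambda i: cost[i])
        reps.append(farthest)
        for i in range(n):
            d = loudness[i] * source_distance(positions[i], positions[farthest])
            cost[i] = min(cost[i], d)
    assignment = [min(reps, key=lambda r: source_distance(positions[i], positions[r]))
                  for i in range(n)]
    return reps, assignment

positions = [(1.0, 0.0, 0.0), (1.1, 0.05, 0.0), (-1.0, 0.0, 0.0), (-1.2, 0.1, 0.0)]
loudness = [1.0, 0.5, 0.8, 0.3]
reps, assignment = cluster_sources(positions, loudness, k=2)
```

Weighting the selection cost by loudness makes loud outliers more likely to receive their own cluster, which is one way to read the "weight by loudness" bullet.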

Rendering clusters
- Mix signals of all sources in the cluster
  - create a single source with a complex response
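A minimal sketch of the pre-mixing step: every source in a cluster is scaled by a per-source gain and summed into one signal, which is then spatialized once as the cluster's single point source. The 1/distance gain and the `min_distance` clamp are assumptions chosen for illustration; a full implementation would apply each source's own propagation effects before mixing.

```python
def premix_cluster(signals, distances, min_distance=0.1):
    """Sum equal-length source signals into one cluster signal.

    signals: list of sample lists, one per source in the cluster.
    distances: listener-to-source distance for each source (same order).
    """
    length = len(signals[0])
    mix = [0.0] * length
    for samples, distance in zip(signals, distances):
        gain = 1.0 / max(distance, min_distance)  # assumed 1/r attenuation
        for i in range(length):
            mix[i] += gain * samples[i]
    return mix

cluster_signal = premix_cluster([[1.0, 2.0], [4.0, 0.0]], [1.0, 2.0])
```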

Dynamic spatial clustering

Pilot validation study
- Culling and masking are transparent
  - rated 4.4/5 avg. (5 = indistinguishable from reference)
- Clustering preserves localization cues
  - 74% success avg. (90% within 1 meter of true location)
  - no significant correlation with number of clusters

Progressive processing

Progressive signal processing
- A scalable pipeline for filtering and mixing many audio streams
  - fetch & process only perceptually relevant input
  - continuously adapt quality vs. speed
  - remain perceptually transparent
  - use a “standard” representation of the inputs

Progressive signal processing
- Uses Fourier-domain coefficients for processing
- Degrades both signal quality and spatial cues
- Combines processing and audio coding
- Uses additional signal descriptors for decision making

Progressive processing pipeline
[Diagram labels: N input frames, importance, masking, importance sampling, process + reconstruct, 1 output frame]
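One way to picture the importance-sampling stage is as a budget of Fourier coefficients shared among the input frames. The sketch below splits a fixed budget in proportion to each source's importance, then mixes only each source's largest-magnitude coefficients into the output frame. The proportional split, the largest-remainder rounding, and the function names are assumptions; the slides do not specify the allocation rule.

```python
def allocate_budget(importance, total_budget):
    """Split an integer coefficient budget proportionally to importance."""
    total = sum(importance)
    if total <= 0:
        return [0] * len(importance)
    raw = [total_budget * w / total for w in importance]
    counts = [int(r) for r in raw]
    leftover = total_budget - sum(counts)
    # Hand out the rounding leftovers to the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def mix_frames(frames, importance, total_budget):
    """Mix N input frames of Fourier coefficients into one output frame.

    frames: list of equal-length lists of complex coefficients.
    Only the highest-magnitude coefficients of each frame are processed.
    """
    counts = allocate_budget(importance, total_budget)
    n_bins = len(frames[0])
    output = [0j] * n_bins
    for coeffs, count in zip(frames, counts):
        ranked = sorted(range(n_bins), key=lambda k: abs(coeffs[k]), reverse=True)
        for k in ranked[:count]:
            output[k] += coeffs[k]
    return output

mixed = mix_frames([[4, 1, 3, 2], [1, 5, 1, 1]], importance=[3, 1], total_budget=4)
```

Lowering `total_budget` trades quality for speed without changing the code path, which matches the "continuously adapt quality vs. speed" goal stated earlier.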

Progressive signal processing

Progressive processing and sound synthesis
- Sound synthesis from physics-driven animation
  - Modal models
- Resonant modes can be synthesized in the Fourier domain
  - the number of Fourier coefficients can be allocated on the fly
- Balance processing costs for recorded and synthesized sounds at the same time
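To see why a resonant mode lends itself to Fourier-domain synthesis, consider a single mode modeled as an exponentially damped sinusoid, a·exp(-d·t)·sin(2πf₀t) for t ≥ 0. Its continuous-time Fourier transform is a·ω₀ / ((d + jω)² + ω₀²), with ω₀ = 2πf₀, which is sharply peaked around the resonance, so a handful of coefficients near f₀ carries most of the energy. The sketch below evaluates that closed form at chosen frequencies; it illustrates the idea under this single-mode assumption and is not the synthesis scheme from the talk.

```python
import math

def mode_spectrum(freq_hz, damping, amplitude, bin_freqs_hz):
    """Closed-form spectrum of a * exp(-d t) * sin(2 pi f0 t), t >= 0.

    Returns one complex value per requested frequency (in Hz).
    """
    w0 = 2.0 * math.pi * freq_hz
    values = []
    for f in bin_freqs_hz:
        w = 2.0 * math.pi * f
        # (d + jw)^2 + w0^2 expanded into real and imaginary parts
        denominator = complex(damping ** 2 - w ** 2 + w0 ** 2, 2.0 * damping * w)
        values.append(amplitude * w0 / denominator)
    return values

# Magnitudes far below, at, and far above a 440 Hz resonance.
magnitudes = [abs(v) for v in mode_spectrum(440.0, 10.0, 1.0, [100.0, 440.0, 2000.0])]
```

Because the peak narrows as damping decreases, lightly damped modes need fewer coefficients, which is what makes on-the-fly allocation between recorded and synthesized sounds plausible.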

Conclusions
- Perceptually motivated techniques for rendering and authoring virtual auditory environments
  - human listeners only process a small amount of information in complex situations
- Extend to
  - more complex auditory processing models
  - cross-modal perception
    - Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception. David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis. Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games
  - other problems: dynamic range management, e.g., the HDR audio approach of EA/Dice studio for Battlefield

Additional references
- www-sop.inria.fr/reves
- This work was supported by
  - RNTL project OPERA
  - EU IST Project CREATE
  - EU FET OPEN Project CROSSMOD