Progressive Perceptual Audio Rendering of Complex Scenes Thomas Moeck - Nicolas Bonneel - Nicolas Tsingos - George Drettakis - Isabelle Viaud-Delmon -

Progressive Perceptual Audio Rendering of Complex Scenes
Thomas Moeck, Nicolas Bonneel, Nicolas Tsingos, George Drettakis, Isabelle Viaud-Delmon, David Alloza
1- REVES/INRIA Sophia-Antipolis; 2- Computer Graphics Group, University of Erlangen-Nuremberg; 3- CNRS-UPMC UMR; EdenGames

Objectives
Efficient audio rendering of very complex scenes with moving sources, without audible impairment of quality.
Verify the results with user tests.

Previous Work
Rendering complex auditory scenes: clustering [Tsingos et al. 2004] replaces many sources with a representative, but can still only treat ~200 sound sources because of the cost of the clustering itself.
Scalable audio processing: importance-guided processing of a few frequency/time bins [Fouad et al. 1997, Wand & Straßer 2004, Gallo et al. 2005, Tsingos 2005]; audio processing (e.g., HRTF, spatialization) is expensive.
Crossmodal effects: the neuroscience literature shows that ventriloquism affects 3D audio perception, with a spatial window that can vary from a few degrees up to 15 degrees; few papers report ecological experiments.

Methodology
Recursive approach to clustering: reduce the cost of clustering.
Scalable perceptual premixing: faster premixing without audible loss of quality.
Taking perceptual and cross-modal information into account: improve the audio clustering algorithm; run user experiments to find possible improvements; improve quality using the test results; validate the resulting algorithms.

Overview of the algorithms
Pipeline, in order: masking of inaudible sources (based on energy); recursive clustering of the remaining sources; progressive premixing within each cluster; spatial audio processing (HRTF).
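The masking step can be illustrated with a minimal Python sketch. The slide only says that inaudible sources are masked using energy; the concrete rule below (stop keeping sources once the combined energy of all quieter sources falls below a fixed fraction `ratio` of the energy already kept) and the names `mask_inaudible` and `ratio` are hypothetical, not the authors' exact test.

```python
from typing import List


def mask_inaudible(energies: List[float], ratio: float = 0.01) -> List[int]:
    """Return the indices of the sources treated as audible.

    Hypothetical energy rule (an assumption, not the paper's exact test):
    sources are visited from loudest to quietest; as soon as the total
    energy of all sources not yet visited drops below `ratio` times the
    energy already kept, those remaining sources are considered masked.
    """
    # Loudest first; sorted() is stable, so equal energies keep input order.
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    remaining = sum(energies)  # energy of the sources not yet visited
    kept_energy = 0.0
    kept: List[int] = []
    for i in order:
        if kept and remaining < ratio * kept_energy:
            break  # everything left is too quiet to matter under this rule
        kept.append(i)
        kept_energy += energies[i]
        remaining -= energies[i]
    return kept


if __name__ == "__main__":
    print(mask_inaudible([100.0, 50.0, 0.1, 0.2, 0.05]))  # prints [0, 1]
```

Only the kept sources would then be passed on to clustering, which is what keeps the later stages cheap.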

Our Work
Optimized recursive approach to clustering, with a clustering performance evaluation.
Improved scalable perceptual premixing, with a quality evaluation study.
Study of cross-modal effects through user experiments, and use of the results to develop an audio-visual clustering algorithm.

Optimized Recursive Clustering
Recursive splitting of clusters.
Fixed-budget approach: a fixed number of clusters is used.
Variable-budget approach: clusters are split until a break condition (average angle error) is reached, giving the optimal number of clusters.
Variant used by EdenGames: an 8-cluster budget, with local clustering when necessary.
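A minimal Python sketch of recursive splitting with both stopping rules: a fixed budget (`max_clusters`) and a variable budget that stops once the average angle error drops to `max_avg_error`. As assumptions for illustration, sources are represented only by their direction from the listener, a cluster's representative is its normalized mean direction, and a cluster is split by seeding two halves with its two most widely separated sources; all names are hypothetical and this is not the implementation described in the talk.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def _unit(v: Vec) -> Vec:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    # Degenerate (zero) vectors fall back to an arbitrary fixed direction.
    return (v[0] / n, v[1] / n, v[2] / n) if n > 0 else (0.0, 0.0, 1.0)


def _angle(a: Vec, b: Vec) -> float:
    ua, ub = _unit(a), _unit(b)
    dot = ua[0] * ub[0] + ua[1] * ub[1] + ua[2] * ub[2]
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def _representative(dirs: List[Vec]) -> Vec:
    # Normalized mean direction of the cluster's sources.
    return _unit((sum(d[0] for d in dirs), sum(d[1] for d in dirs), sum(d[2] for d in dirs)))


def _avg_error(dirs: List[Vec]) -> float:
    rep = _representative(dirs)
    return sum(_angle(d, rep) for d in dirs) / len(dirs)


def _split(cluster: List[int], dirs: List[Vec]) -> Tuple[List[int], List[int]]:
    # Seed the two halves with the pair of sources that are farthest apart.
    a, b, widest = cluster[0], cluster[1], -1.0
    for i in range(len(cluster)):
        for j in range(i + 1, len(cluster)):
            ang = _angle(dirs[cluster[i]], dirs[cluster[j]])
            if ang > widest:
                a, b, widest = cluster[i], cluster[j], ang
    left: List[int] = []
    right: List[int] = []
    for k in cluster:
        closer_to_a = _angle(dirs[k], dirs[a]) <= _angle(dirs[k], dirs[b])
        (left if closer_to_a else right).append(k)
    return left, right


def recursive_cluster(dirs: List[Vec], max_clusters: int, max_avg_error: float) -> List[List[int]]:
    """Split clusters until the budget is used up or the average angle error is small enough."""
    if not dirs:
        return []
    clusters = [list(range(len(dirs)))]
    while len(clusters) < max_clusters:
        errors = [_avg_error([dirs[k] for k in c]) for c in clusters]
        # Average angle error over all sources: the variable-budget break condition.
        overall = sum(e * len(c) for e, c in zip(errors, clusters)) / len(dirs)
        if overall <= max_avg_error:
            break
        # Split the worst cluster that still has at least two sources.
        candidates = [i for i, c in enumerate(clusters) if len(c) > 1]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: errors[i])
        left, right = _split(clusters[worst], dirs)
        if not left or not right:
            break  # all directions coincide; splitting cannot help
        clusters[worst:worst + 1] = [left, right]
    return clusters


if __name__ == "__main__":
    sources = [(1.0, 0.0, 0.0), (0.99, 0.1, 0.0), (-1.0, 0.0, 0.0), (-0.99, -0.1, 0.0)]
    print(recursive_cluster(sources, max_clusters=8, max_avg_error=0.2))  # prints [[0, 1], [2, 3]]
```

Setting `max_avg_error` to 0 turns this into a pure fixed-budget scheme, while a large `max_clusters` lets the angle-error condition alone decide how many clusters are used.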

Eden Games implementation: Test Drive Unlimited

Clustering Performance Evaluation
The recursive algorithms perform clearly better.

Improved progressive scalable perceptual premixing (1)
After clustering, premixing is performed within each cluster.
Why? Effects can be applied afterwards at lower cost, because there are fewer signals.
Only the necessary data is premixed: frequency bins are assigned to sound sources (iterative importance sampling) using a pinnacle value.
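To make "only the necessary data is premixed" concrete, here is a minimal Python sketch that sums the sources of one cluster in the frequency domain, keeping for each source only its k highest-magnitude bins, where k would come from the bin allocation. Representing each source as an equal-length list of complex frequency coefficients, and the name `premix_cluster`, are assumptions for illustration.

```python
from typing import List


def premix_cluster(spectra: List[List[complex]], bins_per_source: List[int]) -> List[complex]:
    """Premix one cluster into a single spectrum (hypothetical sketch).

    spectra[i] holds the frequency coefficients of source i (all the same
    length); only the bins_per_source[i] coefficients of largest magnitude
    are added to the mix, the others are skipped.
    """
    if not spectra:
        return []
    n_bins = len(spectra[0])
    mix = [0j] * n_bins
    for spectrum, k in zip(spectra, bins_per_source):
        # Indices of this source's k most energetic bins.
        strongest = sorted(range(n_bins), key=lambda b: abs(spectrum[b]), reverse=True)[:k]
        for b in strongest:
            mix[b] += spectrum[b]
    return mix


if __name__ == "__main__":
    sources = [[1 + 0j, 0j, 3 + 0j, 0j], [0j, 2 + 0j, 0j, 0.5 + 0j]]
    print(premix_cluster(sources, [1, 2]))
```

Any effects applied after this step then operate on one premixed signal per cluster rather than on every source.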

Improved progressive scalable perceptual premixing (2)
[Figure: premixing and clustering]

Improved progressive scalable perceptual premixing (3)
Iterative importance sampling:
calculation of an importance value from energy, loudness or an audio saliency map;
assignment of frequency bins in proportion to importance until the pinnacle value is reached;
reassignment of the remaining frequency bins to sounds relative to their importance values.
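The allocation loop can be sketched in Python. The slide does not spell out how the pinnacle value is defined; here it is modelled, as an assumption, as a per-source cap `caps[i]` on the number of bins. Bins are handed out in proportion to importance, capped sources stop receiving bins, and the leftover bins are redistributed among the others relative to their importance. The function name and parameters are hypothetical.

```python
from typing import List


def allocate_bins(importance: List[float], caps: List[int], budget: int) -> List[int]:
    """Hypothetical importance-proportional bin allocation with per-source caps.

    caps[i] stands in for the pinnacle value (an assumption): source i never
    receives more than caps[i] frequency bins.
    """
    n = len(importance)
    alloc = [0] * n
    remaining = min(budget, sum(caps))
    while remaining > 0:
        # Sources below their cap with non-zero importance still compete for bins.
        active = [i for i in range(n) if alloc[i] < caps[i] and importance[i] > 0]
        if not active:
            break
        total = sum(importance[i] for i in active)
        given = 0
        for i in active:
            share = int(remaining * importance[i] / total)  # proportional, rounded down
            # Never exceed the cap, nor the bins still available this round.
            take = min(share, caps[i] - alloc[i], remaining - given)
            alloc[i] += take
            given += take
        if given == 0:
            # Rounding gave nobody a bin: give one bin to the source with the
            # highest importance per bin already assigned, to guarantee progress.
            best = max(active, key=lambda i: importance[i] / (alloc[i] + 1))
            alloc[best] += 1
            given = 1
        remaining -= given
    return alloc


if __name__ == "__main__":
    print(allocate_bins([3.0, 1.0], caps=[4, 10], budget=8))  # prints [4, 4]
```

In the usage example, the first source's proportional share (6 bins) is clipped at its cap of 4, and the 2 bins it cannot take are reassigned to the second source on the next pass.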

Varying budget

Quality Evaluation Study (1)
MUSHRA (Multiple Stimuli with Hidden Reference and Anchors) test of perceptual premixing.
7 subjects, aged 23–40.
Ambient, music and speech stimuli.
Various budgets (2%–25%), with and without the pinnacle value, using loudness or saliency as the importance value.

Quality Evaluation Study (2)
Results: the approach can generate high quality using 25% of the original data, and acceptable results with 10% (2% in the case of speech).
Significant effects: budget, importance value, pinnacle value.

Study of Cross-Modal Influences – Questions
Do we need more or fewer clusters in the viewing frustum?
We move the spatial position of each sound source to the representative of its cluster: how tolerant are we to this error?
Do visuals influence the perceived quality?

Study of Cross-Modal Influences – Setup (1)

Study of Cross-Modal Influences – Setup (2)

Study of Cross-Modal Effects – Setup (3)

Uniform distribution [1/4]

[2/3] condition

[3/2] condition

[4/1] condition

Study of Cross-Modal Influences – Results
Statistical analysis of the results shows that we need more clusters in the viewing frustum.
There is no significant difference between the visuals and no-visuals conditions, but a cross-modal effect is possible.

Modifying the algorithm
A weighting term is introduced in the clustering to increase the number of clusters in the viewing frustum.
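One way to picture such a weighting term is a Python sketch in which the angle error of a source inside the view frustum is multiplied by a constant factor. Fed into a split criterion such as the average angle error, this makes clusters containing visible sources split sooner, so more clusters end up in the frustum. The talk does not give the actual form of the weighting term; the cone approximation of the frustum, the factor `frustum_weight` and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _angle(a: Vec, b: Vec) -> float:
    # Angle between two non-zero vectors, in radians.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def weighted_angle_error(
    source_dir: Vec,
    representative_dir: Vec,
    view_dir: Vec,
    half_fov: float = math.radians(45.0),
    frustum_weight: float = 2.0,
) -> float:
    """Angle error of moving a source to its cluster representative,
    scaled up when the source is visible (hypothetical weighting).

    The view frustum is approximated by a cone of half-angle `half_fov`
    around `view_dir`; both the cone and the constant weight are assumptions.
    All vectors are assumed to be non-zero.
    """
    error = _angle(source_dir, representative_dir)
    if _angle(source_dir, view_dir) <= half_fov:
        error *= frustum_weight  # visible sources count more
    return error


if __name__ == "__main__":
    view = (0.0, 0.0, -1.0)
    print(weighted_angle_error((0.0, 0.0, -1.0), (1.0, 0.0, -1.0), view))  # source in view
    print(weighted_angle_error((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), view))  # source behind
```

Both example sources sit 45 degrees from their representative, but only the visible one has its error doubled, so a cluster containing it is more likely to be split.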

Cross-Modal illustration

Video: Putting it all together

Conclusions
Up to nearly 3000 sound sources can be rendered in good quality; the main limitation is the graphics (!).
Better quality thanks to more clusters in the viewing frustum.
Future work: experiment with auditory saliency measurements; handle procedurally synthesized sounds?

Questions?