2
Does Spatialized Audio Change Everything?
Now I’ve been at Unity almost 2 years. I’m focused mostly on VR and AR audio and video. By video, I mean video decoders. So what does that mean?
For audio, it basically means focusing on adding more realistic capabilities to Unity audio. We have 3 audio engineers in Copenhagen focused on core audio. And I’m here in Bellevue focused on things like HRTF and the frameworks to support HRTF (spatializers), ambisonics, and environmental audio. That sounds exciting, but so far my role has really been to get audio working on some new VR and AR platforms and to create frameworks to support this stuff. Or to make sure the Unity framework does not prohibit doing these things.
On the video side, I’m making sure our Unity video player works efficiently on new XR platforms. Making sure it works with spatialized audio, making sure it works in AR with the re-constructed meshes captured from the room you’re in.
Today, I want to focus on the XR audio stuff I’ve been thinking about, mostly the framework or audio engine around HRTF nodes (or spatializers). “Does Spatialized Audio Change Everything?” I want to explore that with you all a little bit today.
3
What is a traditional audio engine?
Tree of nodes
Node performs an operation
Buffers of data flow through these nodes
Metadata describes buffers
Audio source nodes work on mono data
Audio mixer nodes work on stereo data (in XR)
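A minimal sketch of the tree-of-nodes idea, in Python for illustration only. The class names (`Node`, `Source`, `Gain`, `Mixer`) are hypothetical and are not Unity's API. Buffers are plain lists of floats and everything is mono here, whereas the slide notes that mixers work on stereo data in XR.

```python
from typing import List


class Node:
    """A node pulls buffers from its inputs and performs one operation."""

    def __init__(self, *inputs: "Node") -> None:
        self.inputs = list(inputs)

    def process(self, frames: int) -> List[float]:
        raise NotImplementedError


class Source(Node):
    """Leaf node: produces a mono buffer (a constant value, for simplicity)."""

    def __init__(self, value: float) -> None:
        super().__init__()
        self.value = value

    def process(self, frames: int) -> List[float]:
        return [self.value] * frames


class Gain(Node):
    """Scales its single input by a fixed factor."""

    def __init__(self, source: Node, gain: float) -> None:
        super().__init__(source)
        self.gain = gain

    def process(self, frames: int) -> List[float]:
        return [s * self.gain for s in self.inputs[0].process(frames)]


class Mixer(Node):
    """Adds all input buffers sample by sample."""

    def process(self, frames: int) -> List[float]:
        out = [0.0] * frames
        for node in self.inputs:
            for i, s in enumerate(node.process(frames)):
                out[i] += s
        return out


# Two sources feed one mixer; the root pulls data through the tree.
root = Mixer(Gain(Source(0.5), 0.5), Source(0.25))
mixed = root.process(4)
```

Pulling from the root means each node only needs to know its inputs, which is why a spatializer can be dropped in as just another node.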
4
What are some audio nodes?
File nodes stream in data
De-compression nodes produce raw PCM data
Low-pass filter nodes reduce high frequencies
High-pass filter nodes reduce low frequencies
Mixer nodes add all input buffers
3D panner node outputs stereo based on 3D position
5
What is a spatializer node?
Realistic 3D panner node
Left channel is sent data for left ear
Right channel is sent data for right ear
More subtle than traditional 3D panner node
6
What is unique about a spatializer node?
Fundamental for 3D sounds (for XR and headphones)
Expensive
7
Other fundamental, expensive nodes?
De-compression nodes
Project-specific nodes
Nothing else is so fundamental, necessary, and expensive
8
What’s the problem?
Spatializer nodes fit well into the traditional audio engine
Problem is performance
9
What’s the node performance?
Performance numbers taken on Samsung S8, Android phone
Most nodes are relatively cheap
De-compression nodes cost 0.75% CPU per sound (Vorbis)
Spatializer nodes cost 1.0% CPU per sound (Oculus)
10
What is game audio’s budget?
1ms on main thread (30fps game)
0.5ms on main thread (60fps XR game)
5-10% of device’s memory and CPU resources
50% of core on audio thread
11
What’s the cost of de-compressed, spatialized sounds?
16 sounds cost 1.75% x 16 = 28%
28 sounds cost 1.75% x 28 = 49%
32 sounds cost 1.75% x 32 = 56%
64 sounds cost 1.75% x 64 = 112%
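The arithmetic above can be reproduced with a few lines of Python. The two per-sound costs come from the Samsung S8 measurements on the earlier slide; the function name and the 50% budget constant are just labels chosen for this sketch.

```python
DECOMPRESS_PCT = 0.75  # Vorbis de-compression, % CPU per sound (from the slides)
SPATIALIZE_PCT = 1.0   # Oculus spatializer, % CPU per sound (from the slides)
AUDIO_THREAD_BUDGET_PCT = 50.0  # "50% of core on audio thread"


def total_cost_pct(sounds: int) -> float:
    """CPU cost when every sound is both de-compressed and spatialized."""
    return sounds * (DECOMPRESS_PCT + SPATIALIZE_PCT)


costs = {n: total_cost_pct(n) for n in (16, 28, 32, 64)}

# Largest sound count that still fits in the audio-thread budget.
max_sounds = int(AUDIO_THREAD_BUDGET_PCT // (DECOMPRESS_PCT + SPATIALIZE_PCT))
```

Under these numbers the budget runs out at 28 sounds, before any filtering, reverb, or event handling is paid for.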
12
What can we do with mobile XR?
We can support 28 sounds, if we do nothing else
But we also want:
  Occlusion and low-pass filtering
  Play requests
  Reverb
  Event systems
  64 sounds for AAA audio
13
Optimization: Prioritize
Prioritize sounds
Separate virtual and physical sounds
Spatialize most important physical sounds
32 physical sounds, 16 spatialized sounds cost 40%
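A rough sketch of the prioritization step, assuming each sound already carries a single numeric priority. The `Sound` class, the `partition` function, and the scoring itself are hypothetical; how priority is computed (distance, loudness, gameplay importance) is not specified in the slides. The cost function treats plain panning of the non-spatialized physical sounds as negligible, which is what the 40% figure implies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sound:
    name: str
    priority: float  # hypothetical score; higher means more important


def partition(
    sounds: List[Sound], max_physical: int = 32, max_spatialized: int = 16
) -> Tuple[List[Sound], List[Sound], List[Sound]]:
    """Split sounds into spatialized, physical-but-panned, and virtual sets."""
    ranked = sorted(sounds, key=lambda s: s.priority, reverse=True)
    spatialized = ranked[:max_spatialized]
    panned = ranked[max_spatialized:max_physical]
    virtual = ranked[max_physical:]  # tracked but not rendered
    return spatialized, panned, virtual


def cost_pct(spatialized: List[Sound], panned: List[Sound]) -> float:
    """0.75% per de-compressed physical sound plus 1.0% per spatialized sound."""
    physical = len(spatialized) + len(panned)
    return physical * 0.75 + len(spatialized) * 1.0


sounds = [Sound(f"s{i}", priority=float(i)) for i in range(40)]
spatialized, panned, virtual = partition(sounds)
cost = cost_pct(spatialized, panned)
```

With 40 requested sounds this yields 16 spatialized, 16 panned, and 8 virtual sounds.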
14
Optimization: Cheap spatialization
Use realistic distance attenuation curves
Perform ITD (interaural time difference)
Perform ILD (interaural level difference)
Perform simple, low-pass filtering based on location
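One possible shape for such a low-cost spatializer, sketched under stated assumptions: the time difference uses the Woodworth spherical-head approximation, the level difference is stood in for by a constant-power pan, and distance attenuation is a plain inverse-distance law. The head radius, function names, and the choice of these particular formulas are assumptions for this sketch and are not taken from the slides. The location-based low-pass filter is left out for brevity.

```python
import math
from typing import Tuple

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C


def itd_seconds(azimuth_rad: float) -> float:
    """Woodworth approximation of the interaural time difference.

    azimuth_rad: 0 is straight ahead, +pi/2 is fully to the right.
    Valid for |azimuth_rad| <= pi/2. Positive means the right ear leads.
    """
    a = abs(azimuth_rad)
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (a + math.sin(a))
    return math.copysign(itd, azimuth_rad)


def ild_gains(azimuth_rad: float) -> Tuple[float, float]:
    """Constant-power pan as a crude level difference; returns (left, right)."""
    pan = math.sin(azimuth_rad)            # -1 (left) .. +1 (right)
    angle = (pan + 1.0) * math.pi / 4.0    # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def distance_gain(distance_m: float, min_distance_m: float = 1.0) -> float:
    """Inverse-distance attenuation, clamped inside the minimum distance."""
    return min_distance_m / max(distance_m, min_distance_m)
```

For a source directly to the right, `itd_seconds(math.pi / 2)` comes out at roughly 0.66 ms, and the per-sample work is a delay plus two multiplies instead of an HRTF convolution.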
15
Optimization: Group nearby sounds
Group sounds at the same location together
Spatialize the mixed group of sounds
Group far away sounds more aggressively
Nearby, character sounds play from the mouth and feet
Far away, character sounds play from the character’s center
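A sketch of distance-dependent grouping using a simple grid, where far sounds fall into much larger cells than near ones. The two-level scheme and the constants (10 m near radius, 0.5 m and 5 m cells) are hypothetical values picked for illustration. The grid is world-aligned, so two sounds on either side of a cell boundary will not merge; a real implementation would likely need something smarter.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

NEAR_RADIUS_M = 10.0  # hypothetical: beyond this, group aggressively
NEAR_CELL_M = 0.5     # hypothetical cell size for nearby sounds
FAR_CELL_M = 5.0      # hypothetical cell size for distant sounds


def group_key(position: Vec3, listener: Vec3 = (0.0, 0.0, 0.0)) -> tuple:
    """Grid cell a sound falls into; far sounds use a coarser grid."""
    far = math.dist(position, listener) >= NEAR_RADIUS_M
    cell = FAR_CELL_M if far else NEAR_CELL_M
    return (far,) + tuple(math.floor(c / cell) for c in position)


def group_sounds(
    positions: List[Vec3], listener: Vec3 = (0.0, 0.0, 0.0)
) -> List[List[int]]:
    """Return lists of sound indices that can share one spatializer."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for index, position in enumerate(positions):
        groups[group_key(position, listener)].append(index)
    return list(groups.values())


positions = [
    (1.0, 1.7, 0.0),   # nearby character, mouth
    (1.0, 0.1, 0.0),   # nearby character, foot
    (30.0, 1.7, 0.0),  # distant character, mouth
    (30.0, 0.1, 0.0),  # distant character, foot
]
groups = group_sounds(positions)
```

The nearby mouth and foot stay separate, while the distant pair collapses into one group and needs only one spatializer.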
16
Optimization: De-compress less
Don’t repeatedly de-compress small, frequently played sounds
De-compress once
Use more memory and less CPU
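The memory-for-CPU trade can be sketched as a small cache that keeps decoded PCM around after the first play. `PcmCache` and `fake_decode` are hypothetical names; the stand-in decoder only returns silence so the example stays self-contained, where a real engine would call its Vorbis decoder.

```python
from typing import Callable, Dict, List


class PcmCache:
    """Keeps decoded PCM in memory so each clip is de-compressed only once."""

    def __init__(self, decode: Callable[[str], List[float]]) -> None:
        self._decode = decode
        self._pcm: Dict[str, List[float]] = {}
        self.decode_count = 0

    def get(self, clip_name: str) -> List[float]:
        if clip_name not in self._pcm:
            self._pcm[clip_name] = self._decode(clip_name)
            self.decode_count += 1
        return self._pcm[clip_name]


def fake_decode(clip_name: str) -> List[float]:
    """Stand-in for a real decoder: returns four samples of silence."""
    return [0.0] * 4


cache = PcmCache(fake_decode)
for _ in range(3):
    pcm = cache.get("footstep")  # decoded on the first call only
```

An unbounded dictionary is fine for a handful of short clips; a production cache would need a size limit and an eviction policy.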
17
Optimizations: All of the above
Play 32 physical sounds
De-compress 24 sounds
Spatialize 8 locations (16 sounds)
Use low-LOD spatialization on other sounds
26% CPU usage
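The 26% figure can be checked with the per-node costs from the earlier slide. This sketch assumes the low-LOD spatialization is cheap enough to count as zero, which is the only way the slide's numbers add up; the parameter `low_lod_pct` is hypothetical and is there so a measured value can be plugged in.

```python
def combined_cost_pct(
    decompressed: int = 24,
    spatialized_locations: int = 8,
    low_lod_sounds: int = 16,
    decompress_pct: float = 0.75,
    spatialize_pct: float = 1.0,
    low_lod_pct: float = 0.0,  # assumption: low-LOD cost treated as negligible
) -> float:
    """CPU cost with prioritizing, grouping, caching, and low-LOD combined."""
    return (
        decompressed * decompress_pct
        + spatialized_locations * spatialize_pct
        + low_lod_sounds * low_lod_pct
    )


optimized = combined_cost_pct()
naive = 32 * (0.75 + 1.0)  # all 32 sounds de-compressed and fully spatialized
```

That is 26% against 56% for the unoptimized 32-sound case.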
18
Is there a different approach?
Google Resonance
Windows Sonic
Dolby Atmos
19
Google Resonance
Convert each sound into ambisonic format
Mix ambisonic sounds
Decode one, mixed ambisonic sound and spatialize in one step
Scales very well
Each sound costs 0.75% to de-compress (same)
Each sound costs 0.6% to spatialize (instead of 1.0%)
32 de-compressed, spatialized sounds cost 43%.
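To show why this scales, here is a sketch of the encode-and-mix half of an ambisonic pipeline. It uses first-order AmbiX conventions (channel order W, Y, Z, X with SN3D normalization, azimuth measured counterclockwise from the front) for brevity; it is not Resonance's API, and a real pipeline would use a higher order and finish with a single binaural decode, which is omitted here.

```python
import math
from typing import Iterable, Tuple

Frame = Tuple[float, float, float, float]  # (W, Y, Z, X)


def encode_foa(
    sample: float, azimuth_rad: float, elevation_rad: float = 0.0
) -> Frame:
    """Encode one mono sample into first-order ambisonics (AmbiX, SN3D)."""
    cos_el = math.cos(elevation_rad)
    return (
        sample,                                    # W: omnidirectional
        sample * math.sin(azimuth_rad) * cos_el,   # Y: left-right
        sample * math.sin(elevation_rad),          # Z: up-down
        sample * math.cos(azimuth_rad) * cos_el,   # X: front-back
    )


def mix(frames: Iterable[Frame]) -> Frame:
    """Mixing is a channel-wise sum, so N sources still need one decode."""
    w, y, z, x = (sum(channel) for channel in zip(*frames))
    return (w, y, z, x)


front = encode_foa(1.0, 0.0)             # source straight ahead
left = encode_foa(0.5, math.pi / 2)      # quieter source to the left
mixed = mix([front, left])

# Per-sound costs from the slide: 0.75% de-compress + 0.6% ambisonic encode.
resonance_cost_32 = 32 * (0.75 + 0.6)
```

Each extra source adds only a handful of multiplies and adds; the expensive HRTF step runs once on the mixed signal.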
20
Google Resonance limitations
Mixer part of audio engine does not exist
Or mixers are in 16-channel ambisonic format
21
Negatives to eliminating Audio Mixers?
Sound designer workflow (“Lower the player’s foley.”)
Apply expensive effects once to many sounds
HDR and side-chaining (activity on mixer affects other sound/mixer properties)
22
Positives to eliminating Audio Mixers?
Very simple pipeline
Easy, flexible jobification
Low-latency because of few dependencies
23
My personal thoughts (ambisonic)
Initially, I was very intrigued with the ambisonic / Google approach
But, the performance improvement is limited for 32 sounds / mobile
Not good enough to throw away the traditional audio engine design
24
My personal thoughts (traditional)
Need excellent prioritization / culling algorithm
Need lower-quality spatializer to pair with high-quality spatializer
Optimizations feel more like traditional game engine / audio opts
We can do this!
25
What do you think?