1
Memory-Savvy Distributed Interactive Ray Tracing David E. DeMarle Christiaan Gribble Steven Parker
2
Impetus for the Paper
- data sets are growing
- memory access time is a bottleneck
- use parallel memory resources efficiently
- three techniques for faster access to scene data
3
System Overview
- base system presented at IEEE PVG'03
- cluster port of an interactive ray tracer for shared memory supercomputers (IEEE VIS'98)
- image parallel work division
- fetch scene data from peers over the network and cache it locally
4
Three Techniques for Memory Efficiency
1. ODSM / PDSM
2. central work queue / distributed work sharing
3. polygonal mesh reorganization
5
Distributed Shared Memory
- data is kept in memory blocks
- each node has 1/nth of the blocks
- fetches the rest over the network from peers
- caches recently fetched blocks
[Figure: abstract view of memory as numbered blocks 1-9; each node's memory holds its resident set (e.g. node 1 holds blocks 1, 4, 7) plus a cache of recently fetched blocks]
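The ownership rule behind this layout can be made concrete. Below is a minimal sketch, assuming round-robin assignment of blocks to nodes; DSMNode, BlockHandle, and fetch_from_peer are illustrative names, not the paper's API:

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    using BlockHandle = std::uint64_t;

    struct DSMNode {
        int rank;    // this node's id, 0..nnodes-1
        int nnodes;  // number of nodes in the cluster
        std::unordered_map<BlockHandle, std::vector<char>> resident; // our 1/n of the blocks
        std::unordered_map<BlockHandle, std::vector<char>> cache;    // recently fetched blocks

        // Round-robin ownership: block h lives on node h mod nnodes.
        int owner(BlockHandle h) const { return static_cast<int>(h % nnodes); }

        // Return a pointer to the block's bytes, fetching from its owner on a miss.
        const char* lookup(BlockHandle h) {
            if (owner(h) == rank) return resident.at(h).data();     // local hit
            auto it = cache.find(h);
            if (it != cache.end()) return it->second.data();        // cache hit
            std::vector<char> bytes = fetch_from_peer(owner(h), h); // miss: go to the network
            return cache.emplace(h, std::move(bytes)).first->second.data();
        }

        // Stub standing in for the real network request to the owning peer.
        std::vector<char> fetch_from_peer(int /*peer*/, BlockHandle /*h*/) {
            return std::vector<char>(4096, 0);
        }
    };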
6
Object Based DSM
- each block has a unique handle
- application finds the handle for each datum
- acquire and release for every block access

  // locate the data: find its block handle and offset
  ODSM_location(datum, &handle, &offset);
  // acquire maps the block into local memory (fetching it on a miss)
  block_start_addr = acquire(handle);
  // use the data
  datum = *(block_start_addr + offset);
  // relinquish the space so the block may be evicted
  release(handle);
7
ODSM Observations
- handles add a level of indirection, allowing data sets larger than 4 GB
- mapping scene data to blocks is tricky
- acquire and release add overhead
- address computations add overhead
- 7.5 GB Richtmyer-Meshkov time step on 64 CPUs: ~3 fps, with view and isovalue changes
8
Page Based DSM
- like ODSM: each node keeps 1/nth of the scene, fetches from peers, uses caching
- difference is how memory is accessed
- normal virtual memory addressing
- uses addresses between the heap and the stack
- PDSM installs a segmentation fault signal handler: on a miss it obtains the page from a peer, then returns (sketch below)
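A minimal sketch of that fault-handling path, assuming a POSIX SIGSEGV handler over a PROT_NONE mmap'd region; fetch_page_from_owner is a hypothetical transport call, and real code must also worry about async-signal safety:

    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char  *shared_base;             // start of the PDSM address range
    static size_t shared_size = 1u << 30;  // e.g. 1 GB of shared address space

    // Stub standing in for the real network request to the page's owner.
    static void fetch_page_from_owner(void *page, size_t len) {
        memset(page, 0, len);
    }

    static void pdsm_fault_handler(int, siginfo_t *info, void *) {
        size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
        // Round the faulting address down to its page boundary.
        char *page = (char *)((uintptr_t)info->si_addr & ~(uintptr_t)(pagesz - 1));
        // Unprotect the page so we can fill it with the peer's bytes...
        mprotect(page, pagesz, PROT_READ | PROT_WRITE);
        fetch_page_from_owner(page, pagesz);
        // ...and return: the faulting access re-executes and now succeeds.
    }

    void pdsm_init() {
        // Reserve the shared range with no permissions; any touch faults.
        shared_base = (char *)mmap(NULL, shared_size, PROT_NONE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_sigaction = pdsm_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }

The handler runs only when a page is absent, which matches the observation on the next slide that the DSM acts only in the exceptional case of a miss.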
9
PDSM Observations
- no handles, normal memory access
- no acquire/release or address computations
- easy to place any type of scene data in shared space
- limited to 2^32 bytes
- hard to make thread safe
- DSM acts only in the exceptional case of a miss
- with the ray tracing acceleration structure, hit rates exceed 90%

            ODSM     PDSM
  Hit time  10.2 µs  4.97 µs
  Miss time 629 µs   632 µs
10
Head-to-Head Comparison
- compare replication, PDSM, and ODSM
- use a small 512^3 volumetric data set
- PDSM and ODSM keep only 1/16th locally
- change viewpoint and isovalue throughout
- first half: large working set; second half: small working set
11
Head-to-Head Comparison (note: accelerated ~2x for presentation)
13
replicated: 3.74 frames/sec average
14
Head-to-Head Comparison: ODSM runs at 32% of the speed of replication
15
Head-to-Head Comparison: PDSM runs at 82% of the speed of replication
16
Three Techniques for Memory Efficiency
1. ODSM / PDSM
2. central work queue / distributed work sharing
3. polygonal mesh reorganization
17
Load Balancing Options
central work queue (legacy from the original shared memory implementation)
- display node keeps the task queue
- render nodes get tiles from the queue
now: distributed work sharing
- start with the tiles traced last frame, so hit rates increase
- workers get tiles from each other
- communicate in parallel for better scalability
- steal from random peers; the slowest worker gives up work (sketch below)
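A minimal sketch of one worker's frame loop under distributed work sharing, assuming each worker seeds its queue with last frame's tiles and steals from a random peer when idle; trace_tile and steal_request are illustrative stubs, not the paper's API:

    #include <deque>
    #include <random>

    struct Tile { int x, y; };                         // screen-space tile to render

    static void trace_tile(const Tile &) {}            // stub: render the tile's pixels
    static bool steal_request(int /*victim*/, Tile *)  // stub: real code messages the peer,
    { return false; }                                  // which gives up a tile if it is loaded

    // One frame of work for one worker. my_tiles starts as last frame's
    // assignment, so the scene data those tiles need is already cached.
    void render_frame(std::deque<Tile> &my_tiles, int my_rank, int nworkers) {
        std::mt19937 rng(my_rank);
        std::uniform_int_distribution<int> pick(0, nworkers - 1);
        while (true) {
            if (!my_tiles.empty()) {
                Tile t = my_tiles.front();
                my_tiles.pop_front();
                trace_tile(t);
            } else {
                int victim = pick(rng);               // steal from a random peer
                if (victim == my_rank) continue;
                Tile stolen;
                if (!steal_request(victim, &stolen))
                    break;  // simplification: real termination detection is more careful
                my_tiles.push_back(stolen);
            }
        }
    }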
18
[Diagram: with the central work queue, a supervisor node hands tiles 0, 1, 2, 3, ... to worker nodes on demand; with distributed work sharing, the tiles start out spread across the workers, which then exchange them peer-to-peer]
19
Central Work Queue vs. Distributed Work Sharing
23
Comparison
- bunny, dragon, and acceleration structures in PDSM
- measure misses and frame rates
- vary local memory to simulate data much larger than physical memory
24
[Charts: page misses (0 to 1E6) and frames/sec (0 to 15) versus MB stored locally (0 to 20), comparing the central queue against distributed sharing]
28
Three Techniques for Memory Efficiency
1. ODSM / PDSM
2. central work queue / distributed work sharing
3. polygonal mesh reorganization
29
Mesh “Bricking”
- similar to volumetric bricking
- increase hit rates by reorganizing scene data for better data locality
- place neighboring triangles on the same page
[Figure: volume bricking lays out addresses &0, &1, ... so that spatial neighbors share a block; mesh “bricking” likewise stores spatially neighboring triangles at nearby addresses on the same page]
36
Input Mesh
37
Sorted Mesh
38
Reorganizing the Mesh
- based on a grid acceleration structure
- each grid cell contains pointers to the triangles within it
- our grid structure is bricked in memory
1. create the grid acceleration structure
2. traverse the cells as stored in memory
3. append copies of the triangles to a new mesh
- the new mesh has triangles sorted in space and in memory (sketch below)
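A minimal sketch of that pass, assuming the grid's cells are already stored in bricked (spatially coherent) order; Triangle, GridCell, and reorganize are illustrative names, not the paper's code:

    #include <vector>

    struct Triangle { float v[9]; };                    // three vertices (illustrative layout)
    struct GridCell { std::vector<int> tri_indices; };  // pointers into the input mesh

    // Cells are assumed stored in bricked memory order, so walking them in
    // that order visits space coherently.
    std::vector<Triangle> reorganize(const std::vector<Triangle> &input,
                                     const std::vector<GridCell> &cells) {
        std::vector<Triangle> sorted;
        sorted.reserve(input.size());
        for (const GridCell &cell : cells)
            for (int idx : cell.tri_indices)
                sorted.push_back(input[idx]);  // neighbors land on nearby pages
        return sorted;
    }

A triangle that overlaps several cells is appended once per cell, which is the duplication of split triangles noted with the results below.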
39
Comparison
- same test as before
- compare input and sorted mesh
40
[Charts: misses and frames/sec versus MB stored locally, comparing the input mesh against the sorted mesh]
44
[Chart: frames/sec versus MB stored locally, input mesh vs. sorted mesh]
note: the grid based approach duplicates split triangles
45
Summary
three techniques for more efficient memory use:
1. PDSM adds overhead only in the exceptional case of a data miss
2. reuse tile assignments with parallel load balancing heuristics
3. mesh reorganization puts related triangles onto nearby pages
46
Future Work
- need 64-bit architecture for very large data
- thread safe PDSM for hybrid parallelism
- distributed pixel result gathering
- surface based mesh reorganization
47
Acknowledgments
- Funding agencies: NSF 9977218, 9978099; DOE VIEWS; NIH
- Reviewers, for tips and for seeing through the rough initial data presentation
- EGPGV Organizers
Thank you!