A Coherent Grid Traversal Algorithm for Volume Rendering Ioannis Makris Supervisors: Philipp Slusallek, Céline Loscos Computer Graphics Lab, Universität.

Presentation transcript:

A Coherent Grid Traversal Algorithm for Volume Rendering
Ioannis Makris
Supervisors: Philipp Slusallek*, Céline Loscos
*Computer Graphics Lab, Universität des Saarlandes
UCL Department of Computer Science

2 Overview
Introduction
Previous work in software Direct Volume Rendering
Introduction to the Cell Broadband Engine
The Coherent Grid Traversal Algorithm
Parallelisation Schemes

3 Introduction to Direct Volume Rendering
A technique for displaying a 2D projection of a 3D sampled dataset (volume) by accumulating samples along lines of sight, using some transfer function.
There are several types of sampled data. We will only deal with rectilinear grids.

4 Direct Volume Rendering
Ray Casting (Levoy 1988, 1990)
– Image order algorithm
Splatting (Westover 1990)
– Object order
Shear-Warp (Lacroute 1994, 1996)
– Hybrid order

5 Ray Casting
Cast a ray from the viewpoint into the volume for every pixel.
Obtain samples from the volume at equal intervals by trilinearly interpolating neighbouring voxels.
Accumulate the samples with some operator to get the final colour.
Several acceleration techniques have been suggested: early ray termination (Levoy 1990), adaptive sampling, octrees (Ogata et al. 1998), kd-trees (Wald et al. 2005).
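The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the names `trilinear` and `cast_ray`, the nested-list layout `volume[z][y][x]`, the scalar "colour", and the front-to-back compositing operator are all assumptions chosen for brevity. The grid is assumed to have at least 2 voxels along each axis.

```python
def trilinear(volume, x, y, z):
    """Trilinearly interpolate volume[z][y][x] at a point inside the grid."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Lower corner of the 2x2x2 neighbourhood, kept inside the grid.
    x0 = min(int(x), nx - 2)
    y0 = min(int(y), ny - 2)
    z0 = min(int(z), nz - 2)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x0 + 1], fx)
    c10 = lerp(volume[z0][y0 + 1][x0], volume[z0][y0 + 1][x0 + 1], fx)
    c01 = lerp(volume[z0 + 1][y0][x0], volume[z0 + 1][y0][x0 + 1], fx)
    c11 = lerp(volume[z0 + 1][y0 + 1][x0], volume[z0 + 1][y0 + 1][x0 + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)


def cast_ray(volume, origin, direction, step, n_steps, transfer, cutoff=0.99):
    """Sample at equal intervals and composite front to back.

    transfer maps a sample value to (colour, opacity).
    Returns (colour, accumulated opacity, number of samples taken).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    colour, alpha, taken = 0.0, 0.0, 0
    for i in range(n_steps):
        t = i * step
        x, y, z = (o + d * t for o, d in zip(origin, direction))
        if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
            continue  # sample lies outside the volume
        c, a = transfer(trilinear(volume, x, y, z))
        colour += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        taken += 1
        if alpha >= cutoff:
            break  # early ray termination
    return colour, alpha, taken
```

With a constant volume and an opacity of 0.5 per sample, a cutoff of 0.9 stops the ray after 4 samples, which shows how early ray termination saves work on opaque data.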

6 Shear-Warp
Considered the fastest known Direct Volume Rendering algorithm.
Steps:
– Transform the volume to sheared object space
– Project the sheared slices onto an intermediate image
– Transform the intermediate image to image space
Requires 3 copies of the data, one for each principal axis, but RLE compression can help.

7 Characteristics of modern x86 processors
Deep instruction pipeline
Very sophisticated hardware branch prediction
2 levels of cache, supports software prefetching
Rich SIMD instruction set

8 The CELL processor
Developed jointly by IBM, Sony and Toshiba.
Combines a PowerPC general-purpose processor with 8 separate SIMD execution units (SPUs).
Exceptional FLOPS/cost ratio, and more powerful than the Itanium!
Needs fast memory, which is relatively expensive.

9 Notable Characteristics of the SPUs
Software-managed local store (i.e. no caches)
No branch prediction, expensive branch misses
SIMD loads/stores ONLY
Favours streaming code

10 Motivation for a new algorithm
Ray Casting algorithms are typically not cache friendly. Performance depends on the viewing axis.
Acceleration structures may produce non-streaming code and several overheads.
Shear-Warp may require too much memory for certain data.

11 A Coherent Grid Traversal Algorithm for Volume Rendering (1)
The original idea comes from “Ray Tracing Animated Scenes using Coherent Grid Traversal” (Wald et al., SIGGRAPH 2006).
Bundles (frustums) of coherent rays are traced in grid space by incrementally computing their overlap with grid slices.
The overlap of the frustum with a slice is computed with only a SIMD addition and a SIMD truncation.
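The incremental overlap computation can be illustrated with a scalar stand-in for the SIMD operations. In this sketch, which is an assumption-laden simplification rather than the code from the talk, the frustum's extent on a slice is held as four values `(u_min, u_max, v_min, v_max)` in grid coordinates, and stepping to the next slice adds four constant per-slice deltas. The function name `frustum_slice_overlaps` is hypothetical, and coordinates are assumed non-negative so that truncation equals rounding down.

```python
def frustum_slice_overlaps(bounds0, deltas, n_slices):
    """Yield (k, (u0, u1, v0, v1)): the integer cell range overlapped on slice k.

    bounds0: (u_min, u_max, v_min, v_max) of the frustum on slice 0.
    deltas:  per-slice change of each bound (constant for straight rays).
    """
    bounds = list(bounds0)
    for k in range(n_slices):
        # The "SIMD truncation": four float-to-int conversions at once.
        yield k, tuple(int(b) for b in bounds)
        # The "SIMD addition": advance all four bounds to the next slice.
        bounds = [b + d for b, d in zip(bounds, deltas)]
```

No per-ray work is done here: the whole bundle is represented by its four bounds, which is what makes the per-slice cost so low.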

12 A Coherent Grid Traversal Algorithm for Volume Rendering (2)
The volume rendering version of the algorithm uses a “bricked” volume (Sakas et al. 1994); bricks replace the grid elements.
Bricks are referenced by 3 maps, one for each principal axis.
Compression is achieved by not storing empty bricks.
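One possible layout for such a bricked volume is sketched below. The slide does not describe the actual data structures, so everything here is an assumption: a brick counts as empty when all its voxels are zero, brick coordinates are `(bi, bj, bk)` along x, y and z, and each per-axis map is a list of 2D tables of brick ids with `EMPTY = -1` marking bricks that were not stored.

```python
EMPTY = -1  # hypothetical marker for a brick that is not stored


def build_bricked_volume(volume, brick_size):
    """Split volume[z][y][x] into cubic bricks and keep only non-empty ones."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    b = brick_size
    dims = ((nx + b - 1) // b, (ny + b - 1) // b, (nz + b - 1) // b)
    bricks, index = [], {}
    for bk in range(dims[2]):
        for bj in range(dims[1]):
            for bi in range(dims[0]):
                brick = [[[volume[z][y][x]
                           for x in range(bi * b, min((bi + 1) * b, nx))]
                          for y in range(bj * b, min((bj + 1) * b, ny))]
                         for z in range(bk * b, min((bk + 1) * b, nz))]
                if any(v != 0 for plane in brick for row in plane for v in row):
                    index[(bi, bj, bk)] = len(bricks)
                    bricks.append(brick)
    return bricks, index, dims


def build_axis_maps(index, dims):
    """maps[axis][slice][v][u] is a brick id, or EMPTY if nothing is stored."""
    maps = []
    for axis in range(3):
        u_axis, v_axis = [a for a in range(3) if a != axis]
        slices = []
        for s in range(dims[axis]):
            table = []
            for v in range(dims[v_axis]):
                row = []
                for u in range(dims[u_axis]):
                    coord = [0, 0, 0]
                    coord[axis], coord[u_axis], coord[v_axis] = s, u, v
                    row.append(index.get(tuple(coord), EMPTY))
                table.append(row)
            slices.append(table)
        maps.append(slices)
    return maps
```

Only the maps are replicated per axis in this sketch; the brick data itself is stored once, which is where the memory saving over three full copies would come from.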

13 A Coherent Grid Traversal Algorithm for Volume Rendering (3)

14 A Coherent Grid Traversal Algorithm for Volume Rendering (4)
Traversal is performed along the principal axis, using the corresponding map. Indices are computed incrementally.
If all the overlapping bricks of a slice are empty, the slice is skipped.
If only some bricks are empty, they are associated with a locally stored empty brick and processed redundantly (but not fetched).
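The two rules for empty bricks can be expressed as a small selection step per slice. This is a hedged sketch: the function name `bricks_for_slice`, the `EMPTY = -1` marker and the 2D `slice_map[v][u]` table of brick ids are illustrative assumptions, not details given in the talk.

```python
EMPTY = -1  # hypothetical marker for a brick that is not stored


def bricks_for_slice(slice_map, overlap, bricks, empty_brick):
    """Pick the bricks a bundle must process on one slice.

    slice_map: 2D table slice_map[v][u] of brick ids (EMPTY if not stored).
    overlap:   inclusive cell range (u0, u1, v0, v1) covered by the bundle.
    Returns None when the whole overlap is empty (the slice is skipped),
    otherwise one brick per overlapped cell, with the shared local
    empty_brick standing in for every missing brick.
    """
    u0, u1, v0, v1 = overlap
    ids = [slice_map[v][u]
           for v in range(v0, v1 + 1)
           for u in range(u0, u1 + 1)]
    if all(i == EMPTY for i in ids):
        return None  # nothing to process or fetch on this slice
    return [empty_brick if i == EMPTY else bricks[i] for i in ids]
```

Substituting a single local empty brick keeps the per-slice loop free of branches on brick contents, which suits hardware without branch prediction, at the cost of some redundant arithmetic.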

15 A Coherent Grid Traversal Algorithm for Volume Rendering (examples)

16 Bundle Parallelisation
Bundle parallelisation is trivial. In an x86 C++ OpenMP implementation, it only required 1 line of code.
Some blocks may be fetched multiple times by neighbouring bundles.
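The slide does not show the OpenMP line itself. The structure it relies on, namely that each bundle can be rendered independently, can be illustrated with a Python analogue. The names `render_bundle` and `render_all` are hypothetical, and the per-bundle work is a placeholder. Python threads do not speed up CPU-bound code in the standard interpreter, so this only demonstrates the shape of the parallel loop.

```python
from concurrent.futures import ThreadPoolExecutor


def render_bundle(bundle_id):
    """Placeholder for tracing one bundle; returns (id, result)."""
    return bundle_id, bundle_id * bundle_id


def render_all(bundle_ids, workers=4):
    """Render every bundle independently; no bundle needs another's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(render_bundle, bundle_ids))
```

Because bundles share no state, the only coordination needed is handing out bundle ids, which is why a single parallel-loop annotation is enough.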

17 Slice Parallelisation
A slice parallelisation is less likely to exhibit this problem, but traversal of brick slices is not incremental!
So how would a processing element know which bundles to process for a given slice?

18 Slice Parallelisation
Most bundles will start on k=0, or end on k=kmax (or both).
During tracing, we create 2 vectors of references to bundles, which we shall call A and D, along with 2 index tables for the corresponding slices, which we shall call P and Q.
The bundles that run through a given slice s can be expressed as [expression not preserved in the transcript].
Only 2 memory reads are required for that, or no memory reads if the bundles are large enough for A and D to fit in the cache/local store.
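The expression from this slide is not available in the transcript, so the following is only one arrangement that is consistent with the "2 memory reads" remark, not a reconstruction of the author's formula. It assumes that A holds the bundles starting on slice 0, sorted by last slice in descending order, that D holds the other bundles ending on kmax, sorted by first slice, and that P[s] and Q[s] are prefix lengths into A and D. The function names are hypothetical.

```python
def build_tables(bundles, kmax):
    """bundles: dict of bundle id -> (first_slice, last_slice)."""
    # A: bundles starting on slice 0, longest first.
    A = sorted((b for b, (f, l) in bundles.items() if f == 0),
               key=lambda b: -bundles[b][1])
    # D: bundles ending on kmax that are not already in A, earliest start first.
    D = sorted((b for b, (f, l) in bundles.items() if f != 0 and l == kmax),
               key=lambda b: bundles[b][0])
    # P[s]: how many bundles in A still reach slice s.
    P = [sum(1 for b in A if bundles[b][1] >= s) for s in range(kmax + 1)]
    # Q[s]: how many bundles in D have started by slice s.
    Q = [sum(1 for b in D if bundles[b][0] <= s) for s in range(kmax + 1)]
    return A, D, P, Q


def bundles_through(s, A, D, P, Q):
    """Two table reads (P[s] and Q[s]) select a prefix of each vector."""
    return A[:P[s]] + D[:Q[s]]
```

Under these assumptions, a processing element assigned slice s never scans the bundle list: it reads two integers and takes two contiguous ranges.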

19 Slice Parallelisation
Remaining bundles can take up to 33% (about 14% on average).
We use two more lists, which we shall call S and E, with index tables M and N.
S holds references to the remaining bundles sorted by the first slice they intersect, and E holds them sorted by the last.
Remaining bundles that run through s are: [expression not preserved in the transcript]
We need to run through both these lists to find that out, but this does not hurt performance.
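Again the slide's expression is missing from the transcript, so this sketch shows just one plausible reading of S, E, M and N. It assumes M[s] counts the bundles in S whose first slice is at most s and N[s] counts the bundles in E whose last slice is before s. A bundle then runs through s if it has started but not yet finished. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_remaining_tables(remaining, kmax):
    """remaining: dict of bundle id -> (first_slice, last_slice)."""
    S = sorted(remaining, key=lambda b: remaining[b][0])  # by first slice
    E = sorted(remaining, key=lambda b: remaining[b][1])  # by last slice
    firsts = [remaining[b][0] for b in S]
    lasts = [remaining[b][1] for b in E]
    # M[s]: bundles that have started by slice s.
    M = [bisect_right(firsts, s) for s in range(kmax + 1)]
    # N[s]: bundles that finished before slice s.
    N = [bisect_left(lasts, s) for s in range(kmax + 1)]
    return S, E, M, N


def remaining_through(s, S, E, M, N):
    """Walk both lists: started bundles minus the ones already finished."""
    finished = set(E[:N[s]])
    return [b for b in S[:M[s]] if b not in finished]
```

Unlike the A and D case, both prefixes have to be walked here, which matches the slide's remark that both lists must be run through.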

20 A notable problem of the CGT algorithm as described in [Wald 2006]
When the “roll” angle of the bundles relative to the volume is close to π/4, the number of block fetches can be double the number required.
There is a good solution to that (not yet published).
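The slide does not explain where the factor of two comes from. One plausible geometric reading, offered here as an assumption, is that the overlap is tracked as an axis-aligned rectangle: a square bundle cross-section rolled by an angle θ has an axis-aligned bounding box whose area is (|cos θ| + |sin θ|)² times its own, which peaks at 2 for θ = π/4. The function name below is hypothetical.

```python
import math


def bounding_box_area_ratio(roll):
    """Area of the axis-aligned bounding box of a unit square rotated by
    `roll` radians, divided by the area of the square itself."""
    side = abs(math.cos(roll)) + abs(math.sin(roll))
    return side * side
```

At a roll of 0 the ratio is 1, so no extra cells are covered; at π/4 it is approximately 2, in line with the doubling the slide mentions.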

21 Results
First results demonstrated a speed increase of up to 2 orders of magnitude over ray casting.
This may increase with further optimisations.

22 Conclusion
We have developed a scalable algorithm for coherent volume traversal, with performance on par with Shear-Warp and reduced memory requirements.
We demonstrated parallel implementations.

23 Future Work
Investigate mixed parallelisation schemes.
Optimise the computation performed per brick.

24 The End
Thank you for your attention
Questions?