1 1 Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur Sven Woop, Carsten Benthin, Ingo Wald, Gregory S. Johnson Intel.

Slides:



Advertisements
Similar presentations
Sven Woop Computer Graphics Lab Saarland University
Advertisements

Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.
Intel ® Xeon ® Processor E v2 Product Family Ivy Bridge Improvements *Other names and brands may be claimed as the property of others. FeatureXeon.
Intersection Testing Chapter 13 Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology.
Collision Detection CSCE /60 What is Collision Detection?  Given two geometric objects, determine if they overlap.  Typically, at least one of.
IIIT Hyderabad Hybrid Ray Tracing and Path Tracing of Bezier Surfaces using a mixed hierarchy Rohit Nigam, P. J. Narayanan CVIT, IIIT Hyderabad, Hyderabad,
Collision Detection and Resolution Zhi Yuan Course: Introduction to Game Development 11/28/
Computer graphics & visualization Collisions. computer graphics & visualization Simulation and Animation – SS07 Jens Krüger – Computer Graphics and Visualization.
Software and Services Group Optimization Notice Advancing HPC == advancing the business of software Rich Altmaier Director of Engineering Sept 1, 2011.
Ray Tracing CMSC 635. Basic idea How many intersections?  Pixels  ~10 3 to ~10 7  Rays per Pixel  1 to ~10  Primitives  ~10 to ~10 7  Every ray.
Perceptual Computing SDK Q2, 2013 Update Building Momentum with the SDK 1 Barry Solomon, Senior Product Manager, Intel Xintian Wu, Architect, Intel.
Software & Services Group Developer Products Division Copyright© 2013, Intel Corporation. All rights reserved. *Other brands and names are the property.
Latency considerations of depth-first GPU ray tracing
Computational Geometry & Collision detection
RT06 conferenceVlastimil Havran On the Fast Construction of Spatial Hierarchies for Ray Tracing Vlastimil Havran 1,2 Robert Herzog 1 Hans-Peter Seidel.
Paper Presentation - Micropolygon Ray Tracing With Defocus and Motion Blur - Qiming Hou, Hao Qin, Wenyao Li, Baining Guo, Kun Zhou Presenter : Jong Hyeob.
Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek.
Afrigraph 2004 Interactive Ray-Tracing of Free-Form Surfaces Carsten Benthin Ingo Wald Philipp Slusallek Computer Graphics Lab Saarland University, Germany.
Collision Detection CSE 191A: Seminar on Video Game Programming Lecture 3: Collision Detection UCSD, Spring, 2003 Instructor: Steve Rotenberg.
OBBTree: A Hierarchical Structure for Rapid Interference Detection Gottschalk, M. C. Lin and D. ManochaM. C. LinD. Manocha Department of Computer Science,
Bounding Volume Hierarchies and Spatial Partitioning Kenneth E. Hoff III COMP-236 lecture Spring 2000.
Fast Agglomerative Clustering for Rendering Bruce Walter, Kavita Bala, Cornell University Milind Kulkarni, Keshav Pingali University of Texas, Austin.
1 Advanced Scene Management System. 2 A tree-based or graph-based representation is good for 3D data management A tree-based or graph-based representation.
Collision Detection David Johnson Cs6360 – Virtual Reality.
HEVC Commentary and a call for local temporal distortion metrics Mark Buxton - Intel Corporation.
Intel ® Server Platform Transitions Nov / Dec ‘07.
Getting Reproducible Results with Intel® MKL 11.0
12/4/2001CS 638, Fall 2001 Today Using separating planes/axes for collision testing Collision detection packages.
Tuning Python Applications Can Dramatically Increase Performance Vasilij Litvinov Software Engineer, Intel.
Computer Graphics 2 Lecture x: Acceleration Techniques for Ray-Tracing Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Spatial Data Structures Jason Goffeney, 4/26/2006 from Real Time Rendering.
Evaluation of a DAG with Intel® CnC Mark Hampton Software and Services Group CnC MIT July 27, 2010.
-Global Illumination Techniques
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Stefan PopovHigh Performance GPU Ray Tracing Real-time Ray Tracing on GPU with BVH-based Packet Traversal Stefan Popov, Johannes Günther, Hans- Peter Seidel,
On a Few Ray Tracing like Algorithms and Structures. -Ravi Prakash Kammaje -Swansea University.
Recognizing Potential Parallelism Introduction to Parallel Programming Part 1.
PRESENTED BY – GAURANGI TILAK SHASHANK AGARWAL Collision Detection.
Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek.
CIS 350 – I Game Programming Instructor: Rolf Lakaemper.
Click to edit Master title style HCCMeshes: Hierarchical-Culling oriented Compact Meshes Tae-Joon Kim 1, Yongyoung Byun 1, Yongjin Kim 2, Bochang Moon.
Interactive Rendering With Coherent Ray Tracing Eurogaphics 2001 Wald, Slusallek, Benthin, Wagner Comp 238, UNC-CH, September 10, 2001 Joshua Stough.
Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)
Stefan Popov Space Subdivision for BVHs Stefan Popov, Iliyan Georgiev, Rossen Dimov, and Philipp Slusallek Object Partitioning Considered Harmful: Space.
Boxed Processor Stocking Plans Server & Mobile Q1’08 Product Available through February’08.
Dynamic Scenes Paul Arthur Navrátil ParallelismJustIsn’tEnough.
Chapter 11 Collision Detection 가상현실 입문 그래픽스 연구실 민성환.
Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research.
Bounding Volume Hierarchy. The space within the scene is divided into a grid. When a ray travels through a scene, it only passes a few boxes within the.
David Luebke 3/5/2016 Advanced Computer Graphics Lecture 4: Faster Ray Tracing David Luebke
Path/Ray Tracing Examples. Path/Ray Tracing Rendering algorithms that trace photon rays Trace from eye – Where does this photon come from? Trace from.
1 The Method of Precomputing Triangle Clusters for Quick BVH Builder and Accelerated Ray Tracing Kirill Garanzha Department of Software for Computers Bauman.
Hierarchical Data Structure in Game Programming Yanci Zhang Game Programming Practice.
Layout by orngjce223, CC-BY Compact BVH StorageFabianowski ∙ Dingliana Compact BVH Storage for Ray Tracing and Photon Mapping Bartosz Fabianowski ∙ John.
Bounding Volume Hierarchies and Spatial Partitioning
Hybrid Ray Tracing and Path Tracing of Bezier Surfaces using a mixed hierarchy Rohit Nigam, P. J. Narayanan CVIT, IIIT Hyderabad, Hyderabad, India.
STBVH: A Spatial-Temporal BVH for Efficient Multi-Segment Motion Blur
BLIS optimized for EPYCTM Processors
Bounding Volume Hierarchies and Spatial Partitioning
Deformable Collision Detection
Real-Time Ray Tracing Stefan Popov.
Many-core Software Development Platforms
Accelerated Single Ray Tracing for Wide Vector Units
12/26/2018 5:07 AM Leap forward with fast, agile & trusted solutions from Intel & Microsoft* Eman Yarlagadda (for Christine McMonigal) Hybrid Cloud – Product.
Enabling TSO in OvS-DPDK
A Scalable Approach to Virtual Switching
Collision Detection.
Deformable Collision Detection
Compressed-LEAF bounding volume hierarchies
David Johnson Cs6360 – Virtual Reality
Presentation transcript:

1 1 Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur Sven Woop, Carsten Benthin, Ingo Wald, Gregory S. Johnson Intel Corporation Eric Tabellion DreamWorks Animation

2 2 Legal Disclaimer and Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright ©, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #

3 3 Challenges of Hair Geometry Path Tracing hair requires high sampling rates to reduce noise and aliasing  Our approach helps by improving traversal performance Long and thin structures are challenging to bound using AABBs  Our approach uses oriented bounding boxes to produce much tighter bounds Many million hairs are common (in particular for furry animals)  We use direct ray/hair intersection to keep memory consumption low (tesellation impractical because of high memory consumption)

4 4 Previous Work Path Tracing Hair [Moon and Marschner 2006]: Simulating Multiple Scattering in Hair Using a Photon Mapping Approach [Ou et. al. 2012]: ISHair: Importance Sampling for Hair Scattering Oriented Bounding Box (OBB) Hierarchies [Gottschalk et. al. 1996]: OBB-Tree: A Hierarchical Structure for Rapid Interference Detection [Lext and Akenine-Möller 2001]: Towards Rapid Reconstruction for Animated Ray Tracing OBBs used in commercial renderers Ray/Curve Intersection [Sederberg and Nishita 1990]: Curve Intersection using Bezier Clipping [Nakamaru and Ohno 2002]: Ray Tracing for Curve Primitive

5 5 Hair Representation Hair subdivided into individual hair segments (done in application) Hair segments represented as cubic bezier curves (4 control points) with interpolated radius (4 radii) p0/r0 p1/r1 p2/r2 p3/r3

6 6 Bounding Representations Axis Aligned Bounding Box (AABB): lower and upper bounds in x,y,z in world space Oriented Bounding Box (OBB): lower and upper bounds in x,y,z in rotated space

7 7 Bounding Diagonal Hair Segment loose  many false positives tight  few false positives Axis aligned bounds Oriented bounds

8 8 Bounding Diagonal Hair Segments significant overlap  many traversal steps minimal overlap  few traversal steps Axis aligned bounds Oriented bounds

9 9 Local Orientation Similarity Neighboring hairs exhibit natural similarity in orientation For real hair, collisions cause similar orientation Synthetic hair mostly mimics real hair

10 Bounding Groups of Similarly Oriented Hairs Groups of equally oriented hair segments are effectively bounded by OBBs  OBB hierarchy efficient for similarly oriented hair segments

11 Our Approach Use mixed AABB/OBB hierarchy with fast direct ray/curve intersection Exploits local orientation similarity to be efficient. No advantage for random hair distributions. good no advantage

12 Mixed AABB/OBB Hierarchy 4 wide Bounding Volume Hierarchy to make effective use of 4-wide SSE Node types AABB nodes store 4 AABBs plus 4 child references OBB nodes store 4 OBBs plus 4 child references Leaf nodes store short lists of individual cubic bezier curves Triangles handled in separate BVH simplifies the implementation....

13 AABBs versus OBBs OBBs bound better, but more expensive  tradeoff Towards the root AABBs are best as hair segments are small relative to bounding box Towards the leaves OBBs are best as oriented bounds can tightly enclose hair strands  Few nodes store AABBs and many OBBs  Many AABB nodes and few OBB nodes get traversed Performan ce AABB only OBB only AABB+OB B 100%146%186%

14 Uncompressed OBB Nodes Stores 4 OBBs in Struct of Array Layout for effective use of SSE OBB stored as affine transformation (3x4 matrices) that transforms OBB to unit AABB Fast ray/OBB intersection by first transforming ray and then intersecting with unit AABB Requires 224 bytes per node  about 2x the size of an AABB node struct UncompressedOBBNode { float[4] matrix[3][4]; Node* children[4]; }

15 Compressed OBB Nodes Stores one shared quantized (signed chars) rotation that transforms the OBBs to AABBs Stores merged AABBs (after rotation) of all 4 children using floating point Stores quantisized (unsigned chars) AABBs of each child relative to merged AABB Requires only 96 bytes per node (less than half of uncompressed) struct CompressedOBBNode { char matrix[3][4]; float min_x,min_y,min_z; float max_x,max_y,max_y; uchar cmin_x[4],cmin_y[4],cmin_z[4]; uchar cmax_x[4],cmax_y[4],cmax_z[4]; Node* children[4]; }

16 AABB/OBB Hierarchy Construction Traditional top down build using SAH heuristic [Wald 2007] Handling lists of bezier curves (not lists of bounding boxes)  control points needed for spatial splits  control points allow to compute precise bounds in different spaces Use lowest SAH split from multiple splitting heuristics Some splitting heuristics operate in a special hair space Spatial splits [Stich et. al.; Popov et. al.] can make the approach more robust by handling challenging cases

17 Split Heuristics AABB Split Heuristics Object Binning (16 bins) in world space Spatial Splits (16 bins) in world space OBB Split Heuristics Object Binning (16 bins) in hair space (most important) Spatial Splits (16 bins) in hair space Similar Orientation Clustering

18 Hair Space Hair space used for binning and calculating OBBs of nodes Hair space is a coordinate space with one axis well aligned with a set of hair curves Only rotations used to be area preserving Calculation calculate candidate spaces (4 in the paper) aligned with main direction (start to end point) of random hairs pick space where sum of surface areas of bounding boxes of hair is smallest good bad x y

19 Similar Orientation Clustering Can separate two crossing hair strands  No single hair space will work well Calculation pick random hair A pick hair B that is maximally misaligned with hair A cluster according to main direction of hairs A and B bound clusters according to space aligned with main direction of A and B Gives about 5% higher rendering performance A B

20 4-wide AABB/OBB Hierarchy Construction Split multiple times to fill up all 4 children (pick largest node or node with highest SAH gain) If only „AABB heuristic“ splits create AABB node If one split was an „OBB heuristic“ split create OBB node and store OBB aligned with hair space computed for each child  SAH decides where to use which node type

21 AABB/OBB Hierarchy Traversal Modified highly optimized BVH4 single ray traversal kernel of Embree Kept fast path for AABB node handling Added slow path for OBB node handling Added fast ray/hair segment intersection

22 Ray-Hair Segment Intersection Use 8-wide AVX to generate 8 points on curve in parallel using precalculated Bezier coefficients a,b,c,d: avxf p = a*p0 + b*p1 + c*p2 + d*p3 Intersect ray using 8-wide AVX in parallel with 8 line segments using test by [Nakamaru and Ohno 2002] 8 segments work well for our models  rarely very curved hair segments need pre-subdivision p0 p1 p2 p3

23 Benchmark Settings Dual Socket Intel® Xeon® E (AVX2, 2x GHz, 64GB memory) 1M pixel resolution, path traced including shading (50% shading, 50% tracing) Representative movie content from Dreamworks Tighten 420k triangles 2.2M curves Tiger 83k triangles 6.5M curves Sophie 75k triangles 13.3M curves Yeti 82k triangles 153M curves

24 Results AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB Measured on Dual Socket Intel® Xeon® E5-2697, GHz

25 Results: Using Ray/Curve Intersector AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB  Using our ray/curve intersector reduces performance by 15% at 1/4 th the memory consumption Measured on Dual Socket Intel® Xeon® E5-2697, GHz

26 Results: Triangles Consume too much Memory AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB  Out of memory, even with 64GB of memory and tessellation into only 8 triangles. Measured on Dual Socket Intel® Xeon® E5-2697, GHz

27 Results: Adding OBBs AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB  adding OBBs gives 80% speedup for 30% higher memory consumption Measured on Dual Socket Intel® Xeon® E5-2697, GHz

28 Results: Adding Spatial Splits AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB  spatial splits give 15% speedup for 60% higher memory consumption Measured on Dual Socket Intel® Xeon® E5-2697, GHz

29 Results: Adding Compression AABBs triangles AABBs curves AABB/OBBs curves + spatial splits + compression Perf.3.5fps3.7fps6.6fps7.5fps7.3fps Mem.1.1GB257MB387MB633MB404MB Perf.1.44fps1.0fps2.1fps2.7fps2.5fps Mem.3.5GB0.8GB1.1GB1.8GB1.1GB Perf.4.2fps3.5fps7.1fps7.3fps7.1fps Mem.6.8GB1.6GB2.1GB3.3GB2.7GB Perf.-1.8fps2.6fps3.1fps3.2fps Mem.-18.6GB21.7GB34.4GB24.9GB  spatial splits and compression give 13% speedup for similar memory consumption Measured on Dual Socket Intel® Xeon® E5-2697, GHz

30 Video Path tracing with up to 10 about 1M pixels 2x Intel(R) Xeon(R) CPU 3.10GHz (16 cores total)

31 Conclusion and Future Work AABB/OBB hierarchy gives almost 2x speedup for hair geometry Need to improve build performance currently 20x slower than building standard BVH over curve segments Handling triangles in same BVH could give additional benefit. Support for Motion Blur is important for movie rendering.

32 Questions? Source code for Xeon and Xeon Phi available as part of Embree 2.3.1,