On a Few Ray-Tracing-like Algorithms and Structures
Ravi Prakash Kammaje
Swansea University
Ray Tracing
Naïve method
– Intersect every ray against every triangle
– O(rays × triangles)
Better methods are needed
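The naïve method above can be sketched as a loop over all triangles for each ray. This is only an illustration: the type names, `nearest_hit`, and the choice of the Möller-Trumbore intersection test are assumptions, not taken from the slides.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { double x, y, z; } Vec3;
typedef struct { Vec3 v0, v1, v2; } Triangle;
typedef struct { Vec3 origin, dir; } Ray;

static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    Vec3 r = {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    return r;
}

/* Moller-Trumbore ray/triangle test (an assumed choice of test).
   Returns the hit distance t along the ray, or -1.0 on a miss. */
static double intersect(Ray ray, Triangle tri) {
    const double eps = 1e-12;
    Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(ray.dir, e2);
    double det = dot(e1, p);
    if (det > -eps && det < eps) return -1.0;      /* ray parallel to triangle */
    double inv = 1.0 / det;
    Vec3 s = sub(ray.origin, tri.v0);
    double u = dot(s, p) * inv;
    if (u < 0.0 || u > 1.0) return -1.0;
    Vec3 q = cross(s, e1);
    double v = dot(ray.dir, q) * inv;
    if (v < 0.0 || u + v > 1.0) return -1.0;
    double t = dot(e2, q) * inv;
    return t > eps ? t : -1.0;
}

/* Brute force: test one ray against all n triangles.
   Returns the index of the nearest hit, or -1 if nothing is hit.
   Calling this once per ray gives the O(rays x triangles) cost. */
long nearest_hit(Ray ray, const Triangle *tris, size_t n, double *t_out) {
    long best = -1;
    double best_t = DBL_MAX;
    for (size_t i = 0; i < n; ++i) {
        double t = intersect(ray, tris[i]);
        if (t > 0.0 && t < best_t) { best_t = t; best = (long)i; }
    }
    if (best >= 0 && t_out) *t_out = best_t;
    return best;
}
```

Every acceleration structure in the following slides exists to avoid visiting most of the triangles in that inner loop.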
Data Structures
BSP Trees
Uniform Grid
Octree
Bounding Volume (Box) Hierarchy
Kd-trees
A specialised BSP tree
Splitting planes restricted to the X, Y and Z axes
Among the most widely used structures for ray tracing
– Surface Area Heuristic (SAH) builds trees suited to ray tracing
– Cheap traversal
RBSP Trees
Form of BSP tree
– Space partitioning
– Binary: 2 children at each node
Predetermined axes
– Number of axes, m
– Axes
Construction and traversal
– Similar to kd-trees
– Heuristics borrowed from kd-trees
RBSP Trees - Example
[Figure: kd-tree; RBSP tree, 24 axes]
RBSP Trees - Construction
Predetermine the axes
Methods to predetermine the m axes
– Evenly spaced points on a sphere
  – Find evenly spaced points on the unit sphere
  – Use the vectors from the centre to the points as axes
  – Advantage: an even distribution of axes
  – Disadvantage: axes are not customised to the scene
Construction
Recursive process
– Find the bounding volume
– At each node
  – Find a split plane using a heuristic
  – Classify triangles
– Continue until
  – Very few triangles are in the node, or
  – A maximum depth is reached
Split plane selection
– Use SAH over all axes
– Select the plane with minimum cost
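As a rough illustration of the cost being minimised, the sketch below evaluates the usual SAH estimate for one candidate plane: traversal cost plus intersection cost weighted by the relative surface area of each child. It assumes an axis-aligned box node (the kd-tree case) for simplicity; an RBSP node is bounded by m slabs, so its surface area would need a different computation. The names and the cost constants `c_trav` and `c_isect` are assumptions.

```c
typedef struct { double min[3], max[3]; } Box;   /* hypothetical axis-aligned node bounds */

static double surface_area(const Box *b) {
    double dx = b->max[0] - b->min[0];
    double dy = b->max[1] - b->min[1];
    double dz = b->max[2] - b->min[2];
    return 2.0 * (dx * dy + dy * dz + dz * dx);
}

/* SAH cost of splitting `node` at position `pos` along `axis` (0, 1 or 2),
   with n_left and n_right triangles falling in the two children.
   cost = c_trav + c_isect * (SA(L)/SA(P) * n_left + SA(R)/SA(P) * n_right) */
double sah_cost(const Box *node, int axis, double pos,
                int n_left, int n_right, double c_trav, double c_isect) {
    Box left = *node, right = *node;
    left.max[axis] = pos;
    right.min[axis] = pos;
    double inv_area = 1.0 / surface_area(node);
    return c_trav + c_isect * (surface_area(&left) * inv_area * n_left +
                               surface_area(&right) * inv_area * n_right);
}
```

A builder would evaluate this for every candidate plane on every axis and keep the minimum, which is why the construction cost grows with the number of axes m.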
RBSP Trees - Traversal
Standard slabs method
– Over m planes
– Find intersection of ray and plane
– Precomputes divides
– Number of divide operations = m
If m is large, the divide operations cause slowdowns
– Use SSE to perform 4 divides at once
– Accelerates ray tracing
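A minimal scalar sketch of the slabs test over m axes follows, assuming each node stores an interval [dmin, dmax] per axis. The m divides are done once per ray in `precompute_ray` and reused at every node. All names are hypothetical, and the SSE variant mentioned above (4 divides at once) is not shown.

```c
#include <float.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;   /* hypothetical type */

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Per-ray precomputation: project origin and direction onto each axis.
   One divide per axis, so m divides per ray. inv_d[i] == 0 marks a ray
   that is parallel to slab i. */
void precompute_ray(const Vec3 *axes, size_t m, Vec3 origin, Vec3 dir,
                    double *o_proj, double *inv_d) {
    for (size_t i = 0; i < m; ++i) {
        double d = dot(axes[i], dir);
        o_proj[i] = dot(axes[i], origin);
        inv_d[i] = (d != 0.0) ? 1.0 / d : 0.0;
    }
}

/* Slabs test against a node bounded by [dmin[i], dmax[i]] along each axis.
   Returns 1 and the entry/exit distances on a hit, 0 on a miss. */
int ray_hits_node(size_t m, const double *o_proj, const double *inv_d,
                  const double *dmin, const double *dmax,
                  double *t_near, double *t_far) {
    double tn = 0.0, tf = DBL_MAX;
    for (size_t i = 0; i < m; ++i) {
        if (inv_d[i] == 0.0) {               /* parallel: must start inside the slab */
            if (o_proj[i] < dmin[i] || o_proj[i] > dmax[i]) return 0;
            continue;
        }
        double t1 = (dmin[i] - o_proj[i]) * inv_d[i];   /* multiply, no divide */
        double t2 = (dmax[i] - o_proj[i]) * inv_d[i];
        if (t1 > t2) { double tmp = t1; t1 = t2; t2 = tmp; }
        if (t1 > tn) tn = t1;
        if (t2 < tf) tf = t2;
        if (tn > tf) return 0;               /* slab intervals do not overlap */
    }
    *t_near = tn;
    *t_far = tf;
    return 1;
}
```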
RBSP Trees - Results
Makes RBSP trees faster than kd-trees
A structure that shows potential for ray tracing
Better than kd-trees for scenes with non-axis-aligned geometry
Needs better heuristics to predetermine the axes
Row Tracing
Combines rasterization and ray tracing concepts
A form of packet ray tracing
– Packets of rays spanning an entire row
A row can be treated as
– A 2D plane
  – Simpler traversal
  – Easy row / triangle intersection
  – Per-pixel cost lower than ray / triangle intersections
– A 1D line
  – Simplifies clipping, occlusion testing and frustum testing
Row Tracing - Algorithm
High-level algorithm
– Traverse the row-plane through a kd-tree or octree
– Rasterize leaf node triangles with a scanline algorithm
Very similar to ray tracing
Early ray termination is not possible
– Use 1D Hierarchical Occlusion Maps to achieve the same effect
Row Tracing – Hierarchical Occlusion Maps
Important optimization
– Indicates the already occluded parts of a row
1D version of the HOM by Zhang et al. (1997)
Lowest level: 1 bit per pixel
Each upper level: 1 bit per 2 bits of the level below
For a row with 1024 pixels
– Lowest level: 128 chars
– Entire HOM: 256 chars
Row Tracing – Hierarchical Occlusion Maps
Initialize prior to traversal
– Set all bits to zero
– The entire row is unoccluded
Updating the HOM
– During triangle rasterization
– Corresponding lowest-level bits are set to 1
– Upper levels are updated if necessary
Testing for occlusion
– Skip occluded nodes
– Optimize rasterization
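The initialize / update / test steps above could be sketched as a bit-packed binary tree, as below. The layout is an assumption: bits are stored heap-style (root at bit 1, pixels at bits 1024 to 2047), and an upper-level bit is set only when both of its children are set. With that layout a 1024-pixel row takes 256 chars in total, of which the lowest level takes 128. `Hom1D` and the function names are hypothetical.

```c
#include <string.h>

#define ROW_WIDTH 1024u   /* pixels per row; must be a power of two */

/* Heap-style layout (assumed): bit 1 is the root, the children of bit i are
   bits 2i and 2i+1, and pixel x lives at bit ROW_WIDTH + x. Bit 0 is unused. */
typedef struct { unsigned char bits[2 * ROW_WIDTH / 8]; } Hom1D;

static int get_bit(const Hom1D *h, unsigned i) { return (h->bits[i >> 3] >> (i & 7u)) & 1; }
static void set_bit(Hom1D *h, unsigned i) { h->bits[i >> 3] |= (unsigned char)(1u << (i & 7u)); }

/* Initialize: all bits zero, so the entire row is unoccluded. */
void hom_clear(Hom1D *h) { memset(h->bits, 0, sizeof h->bits); }

/* Update: mark pixel x as covered, then set each ancestor whose two
   children are now both occluded. */
void hom_set_pixel(Hom1D *h, unsigned x) {
    unsigned node = ROW_WIDTH + x;
    set_bit(h, node);
    while (node > 1u && get_bit(h, node ^ 1u)) {   /* sibling is occluded too */
        node >>= 1;
        set_bit(h, node);
    }
}

/* Test: returns 1 if every pixel in [lo, hi] (inclusive) is occluded.
   Walks up the tree, so it inspects O(log ROW_WIDTH) bits. */
int hom_range_occluded(const Hom1D *h, unsigned lo, unsigned hi) {
    unsigned l = ROW_WIDTH + lo, r = ROW_WIDTH + hi + 1u;   /* half-open [l, r) */
    while (l < r) {
        if ((l & 1u) && !get_bit(h, l++)) return 0;
        if ((r & 1u) && !get_bit(h, --r)) return 0;
        l >>= 1;
        r >>= 1;
    }
    return 1;
}
```

During traversal, a node whose projected pixel span passes `hom_range_occluded` can be skipped, which plays the role that early ray termination plays in ray tracing.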
Packet Row Tracing
Row-packet / node intersection
– Case 1: all rows in the packet hit the node
– Case 2: the row packet misses the node
– Case 3: divergence nodes
  – Trace individual rows from these nodes
Occlusion testing
– Test each row individually
Leaf node
– All rows are rasterized with the leaf node's triangles
Easily multithreaded
Row Tracing – vs Packet Ray Tracing
Row Tracing – vs OpenGL
GPUs
Very powerful
Highly parallel
Example – Nvidia GeForce GTX cores
– 648 MHz graphics clock
– 1476 MHz processor clock
– 1 GB GDDR3 SDRAM
General-purpose computing on graphics hardware (GPGPU) is getting popular
GPU based Algorithms
GPUs are much faster at parallel tasks
However, even simple tasks require special algorithms to utilise this effectively
Example – Scan of an array
– Each output element is the sum of the corresponding input element and all previous elements (inclusive prefix sum)
– Input : {3,7,1,5,8,2,8,1,8,6,2,8}
– Output : {3,10,11,16,24,26,34,35,43,49,51,59}
GPU based Algorithms
On CPU
– for(i=1; i < num; ++i) arr[i] = arr[i]+arr[i-1];
On GPU
– Use a parallel scanning algorithm
– Make use of several threads
– Each thread adds the element at a given offset to its own element
GPU Algorithm – Parallel sum
Same number of threads as number of elements in the array
Offset = 1 => each thread adds the neighbouring element (one position earlier) to its own
Double the offset
Iterate while offset < number of elements
Can be optimised further by using blocks of threads and intermediate results
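The doubling-offset steps above can be simulated on a CPU to check the logic. In this sketch the inner loop stands in for the threads (one iteration per thread), and the copy back from a temporary buffer stands in for the synchronisation between steps. The function name is hypothetical, and this unoptimised form does not include the block-based refinement.

```c
#include <stdlib.h>
#include <string.h>

/* In-place inclusive scan using the doubling-offset scheme.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int scan_doubling_offset(int *arr, size_t n) {
    if (n == 0) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;
    for (size_t offset = 1; offset < n; offset *= 2) {
        /* On a GPU, each i would be handled by its own thread. */
        for (size_t i = 0; i < n; ++i)
            tmp[i] = (i >= offset) ? arr[i] + arr[i - offset] : arr[i];
        /* All "threads" finish reading before anyone sees the new values. */
        memcpy(arr, tmp, n * sizeof *arr);
    }
    free(tmp);
    return 0;
}
```

For 12 elements this takes 4 steps (offsets 1, 2, 4, 8) instead of 11 sequential additions, at the price of more total additions.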
Fast ray sorting and breadth-first packet traversal for GPU ray tracing - Garanzha and Loop
Sort rays on the GPU
– Generate a hash code for each ray based on
  – Direction of the ray
  – Origin of the ray
– Rays with the same hash code are considered coherent
– Sorted into bins
  – Each bin has < maxSize rays
– Compression, sorting, decompression scheme
  – Utilises the GPU efficiently
Create a frustum for each bin
Traverse a BVH of triangles breadth-first
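To make the hashing step concrete, here is one possible hash that quantises the ray origin on a grid over the scene bounds and the direction on a grid over [-1, 1] per component, then packs the cell indices into one integer. The bit widths, packing order, and names are assumptions for illustration and are not taken from the Garanzha and Loop paper.

```c
#include <stdint.h>

typedef struct { float x, y, z; } Vec3f;   /* hypothetical type */

/* Map v in [lo, hi] to a cell index in [0, cells - 1], clamping outliers. */
static uint32_t quantize(float v, float lo, float hi, uint32_t cells) {
    float t = (v - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    uint32_t c = (uint32_t)(t * (float)cells);
    return c < cells ? c : cells - 1u;
}

/* Hash from origin (5 bits per axis, relative to the scene bounds) and
   direction (4 bits per axis, components assumed to lie in [-1, 1]).
   Rays that share all six cells get the same code and count as coherent. */
uint32_t ray_hash(Vec3f origin, Vec3f dir, Vec3f bmin, Vec3f bmax) {
    uint32_t ox = quantize(origin.x, bmin.x, bmax.x, 32u);
    uint32_t oy = quantize(origin.y, bmin.y, bmax.y, 32u);
    uint32_t oz = quantize(origin.z, bmin.z, bmax.z, 32u);
    uint32_t dx = quantize(dir.x, -1.0f, 1.0f, 16u);
    uint32_t dy = quantize(dir.y, -1.0f, 1.0f, 16u);
    uint32_t dz = quantize(dir.z, -1.0f, 1.0f, 16u);
    return (ox << 22) | (oy << 17) | (oz << 12) | (dx << 8) | (dy << 4) | dz;
}
```

Sorting rays by this code places coherent rays next to each other, after which runs of equal codes can be cut into bins of bounded size.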
OpenCL
Based on C
Framework for developing heterogeneous applications
– In theory
  – Some parts can be run on the GPU
  – Some on the CPU
Initially developed by Apple
OpenCL – early impressions
Still very early
– Complex code
– Runs on both CPUs and GPUs
  – Potentially easier to debug on CPUs prior to porting to GPUs
  – Can allocate work based on suitability
– Runs on NVIDIA and AMD / ATI cards
CUDA – much easier to program
– Much cleaner code
– Not cross-platform
– Only on NVIDIA GPUs
Conclusion
A few ray tracing like structures and algorithms
– RBSP Trees
– Row Tracing
Brief summary of GPU algorithms
– Parallel scan
– Ray tracing by ray sorting (Garanzha and Loop)
– OpenCL