Presentation is loading. Please wait.

Presentation is loading. Please wait.

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research.

Similar presentations


Presentation on theme: "Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research."— Presentation transcript:

1 Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research

2 Trees 2

3 Trees Path tracing Pharr & Humphreys Real-time ray tracing NVIDIA Collision detection Ageia Particle simulation NVIDIA Voxel-based global illumination Crassin et al. Surface reconstruction Amenta et al. Photon mapping Uchida BetterFaster 3

4 Outline  Fastest existing methods are sequential  Parallelize within each hierarchy level  But not between levels 4

5 Outline  Fastest existing methods are sequential  Parallelize within each hierarchy level  But not between levels  Lack of parallelism  Small workloads bottlenecked by top levels  Sub-linear scaling of performance 5

6 Outline  Novel way to build the entire tree in parallel  Two algorithmic “building blocks”  Fast, scalable 6

7 Outline  Novel way to build the entire tree in parallel  Two algorithmic “building blocks”  Fast, scalable  Main focus: BVHs  Point-based octrees and k-d trees also covered in the paper 7

8 Bounding volume hierarchy 8

9 9

10 LBVH - Lauterbach et al. [2009] 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes p p x = 0. p y = 0. p z = 0. 1010 0111 1100 10

11 0 1 0 LBVH - Lauterbach et al. [2009] 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes p p x = 0. p y = 0. p z = 0. 110 011 110 1010 0111 1100 11

12 LBVH - Lauterbach et al. [2009] 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes p p x = 0. p y = 0. p z = 0. 1010 0111 1100 10101100code = 12

13 LBVH - Lauterbach et al. [2009] 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes 13

14 LBVH - Lauterbach et al. [2009] 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes 14

15 Binary radix tree 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 15

16 Binary radix tree 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 0 0 16

17 Binary radix tree 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 223344556677 00 000000 00000000 0 0 0011 17

18 Binary radix tree 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 44 00 000000 000000 0011 00100010 00100010 2233 1 1 11 77 11001100 11001100 5566 18

19 Binary radix tree 1 1 11 00 11001100 11001100 00100010 00100010 000000 000000 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 n n-1 19

20 Longest common prefix 1 1 11 00 11001100 11001100 00100010 00100010 000000 000000 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 20

21 Longest common prefix 00 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100111001 1100111001 66 1111011110 1111011110 77 1100011000 1100011000 55 δ(5,6) = 4 4 11001100 11001100 1100011000 1100011000 55 1100111001 1100111001 66 4 21

22 Longest common prefix 2 00 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100111001 1100111001 66 1111011110 1111011110 77 1100011000 1100011000 55 δ(5,6) = 4 11001100 11001100 δ(0,3) = ? 0000100001 0000100001 00 0010100101 0010100101 33 δ(0,3) = 2 2 4 22

23 Garanzha et al. [2011] Level 31 node Level 2 3 nodes Level 01 node 0001000010 0001000010 0000100001 0000100001 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 Level 12 nodes 1122 00 445533 0011223344556677 66 23

24 Our method 0001000010 0001000010 0000100001 0000100001 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0011223344556677 24

25 Our method  Define a numbering scheme for the nodes  Gain some knowledge of their identity  Establish a connection with the keys 25

26 Our method  Define a numbering scheme for the nodes  Gain some knowledge of their identity  Establish a connection with the keys  Find the children of a given node  Only look at node index and nearby keys 26

27 Our method  Define a numbering scheme for the nodes  Gain some knowledge of their identity  Establish a connection with the keys  Find the children of a given node  Only look at node index and nearby keys  Do this for all nodes in parallel 27

28 Numbering scheme 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 3344 28

29 Numbering scheme 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 3344 55 29

30 Numbering scheme 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 11 66 11 66 3333 0000 4444 55552222 30

31 Numbering scheme 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 3344 1155 66 22 31

32 Algorithm 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 3344 1155 66 22 32

33 δ(2,3) = 4 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 0010000100 0010000100 22 0010100101 0010100101 33 0010100101 0010100101 33 1001110011 1001110011 44 δ(3,4) = 0 33 δ(2,3) = 4 33

34 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 0010100101 0010100101 33 1001110011 1001110011 44 δ(3,4) = 0 33 δ(2,3) = 4 δ(1,3) = 2 δ(0,3) = 2 34

35 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 33 δ(0,3) = 2 δ(2,3) = 4δ(1,3) = 2 35

36 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 33 δ(0,3) = 2 2 2 1122 36

37 For each node i=0..n-2 in parallel: 1.Determine direction of the range 2.Expand the range as far as possible 3.Find where to split the range 4.Identify children Algorithm Binary search 37

38 Duplicate keys  The algorithm only works with unique keys  Duplicates are common in practice 38

39 Duplicate keys  The algorithm only works with unique keys  Duplicates are common in practice  Trick: Augment each key with its index  Distinguishes between duplicates  Keys are still in lexicographical order 39

40 Duplicate keys  The algorithm only works with unique keys  Duplicates are common in practice  Trick: Augment each key with its index  Distinguishes between duplicates  Keys are still in lexicographical order  Tie-break when evaluating δ(i,j) 40

41 LBVH 1.Assign Morton codes 2.Sort primitives 3.Generate hierarchy 4.Fit bounding boxes 41

42 Lauterbach et al. [2009] 42

43 Our method  Need a different approach  How many levels are there?  Which nodes are located on a given level? 43

44 Our method  Need a different approach  How many levels are there?  Which nodes are located on a given level?  Traverse paths in the tree in parallel  Start from leaves, advance toward the root  Terminate threads using per-node atomic flags 44

45 Our method 45

46 Results  Evaluate performance on GTX 480 (Fermi)  CUDA, 30-bit Morton codes 46

47 Results  Evaluate performance on GTX 480 (Fermi)  CUDA, 30-bit Morton codes  Compare against Garanzha et al. [2011]  Identical tree (top-level SAH splits disabled) 47

48 Results  Evaluate performance on GTX 480 (Fermi)  CUDA, 30-bit Morton codes  Compare against Garanzha et al. [2011]  Identical tree (top-level SAH splits disabled)  Simulate large GPUs  N times as many cores  N times the memory bandwidth 48

49 Results 49 Our method Garanzha et al. Fairy Forest 174K triangles milliseconds

50 Results 50 Our method Garanzha et al. Fairy Forest 174K triangles milliseconds

51 Results 51 Our method Turbine Blade 1.77M triangles milliseconds Garanzha et al.

52 Results 52 Stanford Dragon 871K triangles milliseconds Our method milliseconds

53 Acknowledgements  Timo Aila  Samuli Laine  David Luebke  Jacopo Pantaleoni  Jaakko Lehtinen For helpful suggestions and proofreading. 53

54 Thank You  Questions

55 Backup slides

56 Pseudocode 56

57 Results 57

58 Algorithm 0001000010 0001000010 0010000100 0010000100 0010100101 0010100101 1001110011 1001110011 1100011000 1100011000 1100111001 1100111001 1111011110 1111011110 0000100001 0000100001 0011223344556677 00 3344 112255 66 58

59 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 22 0001000010 0001000010 11 0010000100 0010000100 22 0010000100 0010000100 22 0010100101 0010100101 33 δ(1,2) = 2δ(2,3) = 4 59

60 δ(2,4) = 0 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 0001000010 0001000010 11 0010000100 0010000100 22 δ(2,3) = 4 δ(1,2) = 2 22 60

61 δ(2,3) = 4 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 22 61

62 δ(2,3) = 4 Algorithm 0000100001 0000100001 00 0001000010 0001000010 11 0010000100 0010000100 22 0010100101 0010100101 33 1001110011 1001110011 44 1100011000 1100011000 55 1100111001 1100111001 66 1111011110 1111011110 77 22 11 62


Download ppt "Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research."

Similar presentations


Ads by Google