Published by Sheila Atherley, modified over 10 years ago
1
A Hardware Processing Unit for Point Sets
S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
Graphics Hardware 2008
2
Motivation
Point-based graphics is well established
Powerful algorithms for
–Representation
–Processing
–Manipulation
–Rendering
Decomposition into two steps
–Get the neighborhood
–Operate on the neighbors
3
Motivation
GPUs are not well suited to neighborhood queries
–SIMD execution
–Incoherent branching
–Dynamic data structures are slow
–Recursive calls are not supported
CPUs
–Small number of FPUs
–Inflexible memory caches
(Images courtesy of NVIDIA and Intel)
4
Contributions
Hardware architecture for point sets
–Neighbor search module
–Novel advanced caching mechanism
–Reconfigurable processing module
–Programmability using an FPGA compiler
FPGA prototype and measurements
Small and lean design
Integration into multi-core CPUs/GPUs possible
5
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion
6
Related Work
Kd-tree [Bentley 75]
kNN on GPUs [Ma and McCool 02]
Kd-tree hardware [Woop et al. 05], [Woop et al. 06]
Kd-tree on GPUs [Popov et al. 07]
7
Related Work
Adaptive SPH fluid simulation [Adams et al. 07]
Linear Moving Least Squares [Adamson and Alexa 04]
Algebraic Moving Least Squares [Guennebaud and Gross 07]
8
Linear Moving Least Squares
Implicit surface defined by a set of points
9
Linear Moving Least Squares
Implicit surface defined by a set of points
[Figure: query point x]
10
Linear Moving Least Squares
[Figure: query point x; sample points p_i with normals n_i]
11
Linear Moving Least Squares
Iterative projections onto a plane
[Figure: query point x]
12
Linear Moving Least Squares
Iterative projections onto a plane
[Figure: x projected to x']
13
Linear Moving Least Squares
Iterative projections onto a plane
[Figure: x' projected to x'']
14
Linear Moving Least Squares
Iterative projections onto a plane
[Figure: x'' projected to x''']
15
Linear Moving Least Squares
Surface defined by the points that project onto themselves
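The iterative projection on the slides above can be sketched as follows. This is a minimal illustration assuming a common linear MLS formulation with Gaussian weights: the weighted mean a(x) of the points and the normalized weighted mean n(x) of their normals define a plane, and x is repeatedly moved onto it until it (nearly) projects onto itself. The weight function, the support size `h`, and the name `mls_project` are assumptions, and for brevity every point is weighted, whereas the hardware would only weight the neighbors returned by the search module.

```python
import math


def mls_project(x, points, normals, h=1.0, max_iters=20, tol=1e-12):
    """Project x onto a linear MLS surface by repeated projection onto a local plane.

    Assumed formulation: Gaussian weights w_i = exp(-|x - p_i|^2 / h^2); the plane
    passes through the weighted mean a(x) of the points and has the normalized
    weighted mean n(x) of the normals as its normal.
    """
    x = list(x)
    for _ in range(max_iters):
        wsum, a, n = 0.0, [0.0] * 3, [0.0] * 3
        for p, ni in zip(points, normals):
            w = math.exp(-sum((x[k] - p[k]) ** 2 for k in range(3)) / (h * h))
            wsum += w
            for k in range(3):
                a[k] += w * p[k]
                n[k] += w * ni[k]
        norm = math.sqrt(sum(c * c for c in n))
        if wsum == 0.0 or norm == 0.0:
            raise ValueError("no support: increase h or move x closer to the points")
        a = [c / wsum for c in a]
        n = [c / norm for c in n]
        # Signed distance of x to the plane (a(x), n(x)); move x onto the plane.
        dist = sum(n[k] * (x[k] - a[k]) for k in range(3))
        x = [x[k] - dist * n[k] for k in range(3)]
        if abs(dist) < tol:
            break  # x (nearly) projects onto itself: it lies on the surface
    return x
```

For samples on the plane z = 0 with normals (0, 0, 1), a query such as (0.3, 0.4, 0.7) lands at (0.3, 0.4, 0) up to rounding, and projecting that result again leaves it unchanged, which is the fixed-point property used to define the surface.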
16
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion
17
Spatial Search
Spatial search: kNN and NN
–Common in most point operations
–Based on a kd-tree
Example: NN search [figure]
18
Spatial Search
kNN search is similar to NN search:
–Start with an infinite search radius
–Sort leaf points into a priority queue
–Shrink the radius with every point sorted in (once k candidates are found, to the distance of the current k-th candidate)
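The kNN procedure above can be illustrated in software. This is a minimal sketch, assuming a simple kd-tree that stores one point per node (the hardware keeps points in leaves) and using Python's `heapq` as a max-heap of candidates; the names `build_kdtree` and `knn` are hypothetical. The radius starts infinite, and once k candidates are queued every insertion shrinks it to the distance of the current k-th candidate, which in turn prunes far subtrees.

```python
import heapq


class Node:
    """kd-tree node; for brevity each node stores one point (the hardware stores points in leaves)."""

    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points, depth=0):
    """Build a kd-tree by median split, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def knn(root, q, k):
    """Return the k points nearest to q, closest first."""
    heap = []  # max-heap of (-squared distance, point); holds at most k candidates

    def radius2():
        # Infinite radius until k candidates exist, then the squared distance of the k-th one.
        return float("inf") if len(heap) < k else -heap[0][0]

    def visit(node):
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
        if d2 < radius2():
            heapq.heappush(heap, (-d2, node.point))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the farthest candidate: the radius shrinks
        diff = q[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        if diff * diff < radius2():  # far side can only help if the plane is inside the radius
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]
```

Calling `knn(build_kdtree(points), q, 8)` returns the same eight distances as sorting all points by distance to q, but visits far fewer nodes once the radius has shrunk.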
19
Coherent Neighbor Cache (NN)
Find neighbors within a slightly bigger radius
Re-use the result for a spatially close query
Re-use if
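The slide gives the re-use test as a formula. One sufficient condition follows from the triangle inequality: if the cached neighborhood of a center c was computed with an enlarged radius R > r, then for a new query q with ‖q − c‖ ≤ R − r every point within r of q lies within R of c, so filtering the cached list gives the exact answer. The sketch below assumes that condition; the class name, the parameter `r_big` (R), and the brute-force search standing in for the kd-tree are hypothetical.

```python
import math


class CoherentBallCache:
    """One-entry coherent cache for fixed-radius neighbor queries (illustrative sketch)."""

    def __init__(self, points, r, r_big):
        if r_big < r:
            raise ValueError("the enlarged radius must not be smaller than the query radius")
        self.points, self.r, self.r_big = points, r, r_big
        self.center = None  # query point whose enlarged neighborhood is cached
        self.cached = []
        self.hits = 0

    def query(self, q):
        if self.center is not None and math.dist(q, self.center) <= self.r_big - self.r:
            self.hits += 1  # close enough: the cached superset is still valid
        else:
            # Miss: search with the enlarged radius (brute force stands in for the kd-tree).
            self.center = q
            self.cached = [p for p in self.points if math.dist(p, q) <= self.r_big]
        # Any point within r of q is within r_big of the center, so filtering is exact.
        return [p for p in self.cached if math.dist(p, q) <= self.r]
```

The trade-off is visible in the two radii: a larger `r_big` yields more cache hits for a stream of coherent queries, but each miss fetches and each hit filters a larger candidate list.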
20
Coherent Neighbor Cache (kNN, exact)
Find (k+1) neighbors
Re-use the result for a spatially close query
Re-use if
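Why k+1 neighbors suffice can be shown with the triangle inequality. Let r_k and r_{k+1} be the distances from the cached center c to its k-th and (k+1)-th neighbors, and d = ‖q − c‖. Every cached neighbor is at most r_k + d from q, and every other point is at least r_{k+1} − d from q, so the cached set is exactly the kNN set of q whenever 2d ≤ r_{k+1} − r_k. The sketch below assumes this condition (it is not necessarily the exact formula on the slide); the class name and the brute-force search standing in for the kd-tree are hypothetical.

```python
import math


class CoherentKnnCache:
    """One-entry coherent cache for exact kNN queries (illustrative sketch)."""

    def __init__(self, points, k):
        if len(points) <= k:
            raise ValueError("need more than k points to fetch k+1 neighbors")
        self.points, self.k = points, k
        self.center = None   # query point whose neighbors are cached
        self.neighbors = []  # its k nearest neighbors
        self.slack = 0.0     # r_{k+1} - r_k measured from the cached center
        self.hits = 0

    def query(self, q):
        if self.center is not None and 2.0 * math.dist(q, self.center) <= self.slack:
            self.hits += 1
        else:
            # Miss: fetch k+1 neighbors (brute force stands in for the kd-tree search).
            nearest = sorted(self.points, key=lambda p: math.dist(p, q))[: self.k + 1]
            self.center, self.neighbors = q, nearest[: self.k]
            self.slack = math.dist(nearest[self.k], q) - math.dist(nearest[self.k - 1], q)
        # Same set either way; re-sort so the result is ordered by distance to q.
        return sorted(self.neighbors, key=lambda p: math.dist(p, q))
```

The condition also explains the slide's later observation that the cache is not optimal for exact queries: in dense, evenly spaced data the gap r_{k+1} − r_k is often small, so only very close queries can re-use a result.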
21
Coherent Neighbor Cache (kNN, approximation)
Approximation error
–Enlarge the radius
Re-use if
22
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion
23
The Architecture
[Figure: architecture block diagram; label: Host]
24
Coherent Neighbor Cache
Eight cached neighborhoods
Problem: parallel queries in the kd-tree module
Interleave spatially similar queries
25
Kd-Tree Traversal
26
Kd-tree structure stored on chip
16 threads
Pipelining and multi-threading
[Figure: traversal stages Node and Recurse]
27
Stacks
16 stacks
Parallel read/write
Bounded depth
6 bytes per thread per recursion level
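Hardware cannot recurse, so traversal state lives in explicit, fixed-size stacks (one per thread). The sketch below models a single thread: a nearest-neighbor search where recursion is replaced by an explicit stack whose size is checked against a bound. Each entry stores a subtree and the squared distance to the splitting plane that bounds it, so far subtrees are pruned when popped if a closer point has been found meanwhile. The names `nn_search` and `max_stack` and the one-point-per-node tree are assumptions for illustration, not the hardware's entry layout.

```python
import math


class Node:
    """kd-tree node holding one point; a simplification of the hardware's leaf buckets."""

    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points, depth=0):
    """Build a kd-tree by median split, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def nn_search(root, q, max_stack=32):
    """Nearest-neighbor search with an explicit, size-bounded stack instead of recursion."""
    best, best_d2 = None, math.inf
    # Each entry: (subtree, squared distance from q to the splitting plane bounding it).
    stack = [(root, 0.0)]
    while stack:
        node, plane_d2 = stack.pop()
        if node is None or plane_d2 >= best_d2:
            continue  # empty subtree, or it cannot hold a closer point
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
        if d2 < best_d2:
            best, best_d2 = node.point, d2
        diff = q[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        stack.append((far, diff * diff))  # pushed first, so visited after the near side
        stack.append((near, 0.0))
        if len(stack) > max_stack:
            raise OverflowError("stack bound exceeded")  # a hardware stack has a fixed size
    return best, best_d2
```

Because each step pops one entry and pushes two, the stack grows by at most one entry per tree level, which is why a balanced tree lets the depth (and thus the per-thread stack memory) be bounded in advance.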
28
Leaf
16 parallel priority queues (1-cycle operations)
Queues store pointers and distances
Bandwidth bottleneck
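A fixed-capacity queue of (distance, pointer) pairs, as described above, can be modeled as a sorted array with insertion by shifting. This is a software model under assumptions: in hardware every slot would compare itself against the new distance simultaneously, which is what makes a single-cycle insertion plausible; here the same comparisons and shift run sequentially. The class and method names are hypothetical.

```python
import math


class SortedLeafQueue:
    """Fixed-capacity queue of (distance, pointer) pairs kept sorted by distance.

    Software model only: in hardware every slot compares against the new distance
    at once, so an insertion can complete in a single cycle.
    """

    def __init__(self, k):
        self.dist = [math.inf] * k  # sorted ascending; inf marks an empty slot
        self.ptr = [-1] * k         # point indices ("pointers"); -1 marks an empty slot

    def radius(self):
        # Current search radius: distance of the k-th candidate, infinite until full.
        return self.dist[-1]

    def insert(self, d, p):
        if d >= self.dist[-1]:
            return False  # outside the current radius: rejected
        pos = sum(1 for x in self.dist if x <= d)  # slots that stay in place
        # Shift the larger entries one slot toward the tail; the last entry drops out.
        self.dist[pos + 1:] = self.dist[pos:-1]
        self.ptr[pos + 1:] = self.ptr[pos:-1]
        self.dist[pos], self.ptr[pos] = d, p
        return True
```

With k = 3, inserting distances 5, 1, 3, 2, 4 (pointers 0 to 4) leaves distances [1, 2, 3] with pointers [1, 3, 2]; the last insertion is rejected because 4 is outside the shrunken radius of 3. Storing only pointers and distances keeps each slot small, while the point data itself still has to be streamed in, consistent with the bandwidth bottleneck noted on the slide.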
29
Processing Module
Multithreaded quad-port bank of 16 registers
128 threads
Programmability using FPGA technology
30
Further Data
Implemented on two FPGAs
–64-bit DDR DRAM
–Interconnection: no overhead
Resource usage (registers and LUTs)
–Virtex-II Pro 100 (kNN): 26% registers, 38% LUTs
–Virtex-II Pro 70 (MLS): 47% registers, 52% LUTs
Clock frequency: 75 MHz
31
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion
32
Applications
Tested on various applications
The prototype's PCI interface is slow
[Figures: [Weyrich et al. 04], [Adams et al. 07]]
33
Results kNN
[Chart: kNN performance; axis labels: Number of Neighbors, Number of queries; factors relative to FPGA = x1]
Data labels: CUDA x4, CPU x1.5, FPGA x1 | CUDA x2.4, CPU x1.4, FPGA x1, CUDA w/o sort x4.0 | CUDA x1.6, CPU x1.1, FPGA x1, CUDA w/o sort x3.1
Clock frequencies: 75 MHz, 1200 MHz, 2200 MHz
ASIC estimate (500 MHz): x6.6
34
Results kNN
[Same chart as the previous slide]
Small hardware footprint
FPGA slightly slower
At a realistic clock frequency (ASIC estimate, 500 MHz), the design would be faster than CPU/GPU
35
Results MLS
[Chart: MLS performance; axis labels: Number of Neighbors, Number of queries]
Data labels: FPGA x1, MLS CPU x0.4, MLS CUDA x3.8
Clock frequencies: 75 MHz, 1200 MHz, 2200 MHz
FPGA faster than CPU
kNN is the bottleneck
–FPGA
–GPU
36
Coherent Neighbor Cache
[Chart: number of queries vs. level of coherence; series: CPU with ε = 0.1, FPGA exact, FPGA with ε = 0.1]
37
Results: Approximation Error (MLS projection)
[Chart: MLS error vs. approximation, compared with no approximation]
38
Results: Approximation Error (MLS projection)
[Chart: cache hits vs. approximation]
39
Approximation Error (visual)
40
Approximation Error (visual)
Coherent Neighbor Cache:
–Not optimal for exact queries
Approximate queries
–Can be tolerated in most cases
–Greatly increase performance
–Even for small approximations
41
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion
42
Conclusion
Novel hardware architecture for
–Nearest-neighbor searches
–Generic meshless processing operators
Cache exploiting spatial coherence
Good performance considering the resources used
Possible GPU integration
43
Future Work
Programmable data structure
–Support for different data structures
–Programmability in the data structure
–Construction on chip
"Real" programmability in the point processing module