Published by Rolf Booth. Modified over 9 years ago.
1
Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1
Charles B. Cameron
United States Naval Academy, Department of Electrical Engineering
United States Naval Academy, 105 Maryland Avenue, Stop 14B, Annapolis, Maryland 21402-5025
Research supported by: NASA Goddard Space Flight Center (Code 586); NRL Applied Optics Branch (Code 5630); DoD High Performance Computing Modernization Program at NRL (Code 5593); United States Naval Academy; Xilinx, Inc.
2
Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance
3
Ray tracing: MODIS (Moderate-resolution Imaging Spectroradiometer); The Intersection Problem; Finding the Perpendicular; Refraction; Reflection
4
MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)
5
MODIS Optical System: 485 pinholes × 400 rays per pinhole × (241 × 121) rays reflected from the diffuser = 5.66 × 10^9 rays
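The ray count can be checked with a few lines of arithmetic. This sketch assumes the four counts on the slide are factors of a single product (the multiplication signs do not survive in the slide text); the variable names are illustrative only.

```python
# Counts taken from the slide; treating them as factors of one product
# is an assumption, supported by the fact that the product matches the total.
pinholes = 485
rays_per_pinhole = 400
diffuser_grid = (241, 121)  # hypothetical name for the 241 x 121 diffuser rays

total_rays = pinholes * rays_per_pinhole * diffuser_grid[0] * diffuser_grid[1]
print(total_rays)            # 5657234000
print(f"{total_rays:.2e}")   # 5.66e+09
```

The product, 5,657,234,000, agrees with the quoted 5.66 × 10^9 rays.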
6
Ray Directed to a Surface. Outline: MODIS (Moderate-resolution Imaging Spectroradiometer); The Intersection Problem; Finding the Perpendicular; Refraction; Reflection; Coordinate Transformation
7
Calculate the Intercept Point
8
Find the Normal
9
Find the Refracted Ray
10
Find the Reflected Ray
11
Coordinate Transformation (Hard to visualize this!)
12
Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance
13
Parallelism
14
Performance (5.66 × 10^9 rays)
Processor: DEC Alpha 3000 Series Model 800, 200 MHz | Cray XD-1 with 839 AMD Opteron 275 processors, 2.2 GHz
Duration: 1.2 × 10^6 s (two weeks) | 27 s
Rate: 0.112 × 10^6 rays · surfaces / s | 6.6 × 10^6 rays · surfaces / (s · processor)
Reduction in time consumed: 99.998 %
Improvement in ray tracing rate: 5,857 % *
* Rate based on a linear regression of results obtained using varying numbers of processors.
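The duration figures can be cross-checked directly. The sketch below only uses the two durations quoted on the slide; it does not attempt to reproduce the 5,857 % rate improvement, which the slide says comes from a linear regression over runs with different processor counts.

```python
# Durations quoted on the slide, in seconds.
alpha_seconds = 1.2e6   # DEC Alpha 3000 Model 800
cray_seconds = 27.0     # Cray XD-1 with 839 Opteron processors

alpha_days = alpha_seconds / 86400.0            # roughly two weeks
reduction_percent = 100.0 * (1.0 - cray_seconds / alpha_seconds)

print(f"{alpha_days:.1f} days")
print(f"{reduction_percent:.4f} % less time")
```

The reduction works out to about 99.9978 %, which the slide reports as 99.998 %.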
15
Performance (5.66 × 10^9 rays)
16
Efficiency
17
Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance
18
Operations Required as a Function of Surface, Aperture, and Interaction Types
Surface | Circular aperture (lots of these) | Rectangular aperture (not too many of these)
Plane | 1. Refraction, 7. Reflection | 4. Refraction, 10. Reflection
Sphere | 2. Refraction, 8. Reflection | 5. Refraction, 11. Reflection
Conicoid | 3. Refraction, 9. Reflection | 6. Refraction, 12. Reflection
19
Quadratic Equation. Critical path (data-flow limit): 88 cycles.
Latency (unit: number of cycles): Adder: 11; Multiplier: 6; Divider: 27; Square root extractor: 27
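One way to see where an 88-cycle data-flow limit could come from is to compute the longest path through a data-flow graph using the listed latencies. The graph below is an assumption, not taken from the slides: it evaluates one root, x = (-b + sqrt(b*b - 4*a*c)) / (2*a), treats subtraction as an adder operation, and treats negating b as free. The node names are hypothetical.

```python
# Unit latencies in clock cycles, as listed on the slide.
LATENCY = {"add": 11, "mul": 6, "div": 27, "sqrt": 27}

# Hypothetical data-flow graph: node -> (unit type, predecessor nodes).
GRAPH = {
    "ac":      ("mul", []),                 # a * c
    "four_ac": ("mul", ["ac"]),             # 4 * (a * c)
    "bb":      ("mul", []),                 # b * b
    "two_a":   ("mul", []),                 # 2 * a
    "disc":    ("add", ["bb", "four_ac"]),  # b*b - 4ac (subtract on the adder)
    "root":    ("sqrt", ["disc"]),          # sqrt(discriminant)
    "num":     ("add", ["root"]),           # -b + sqrt(...)
    "x":       ("div", ["num", "two_a"]),   # numerator / (2a)
}

def finish_time(node):
    """Earliest cycle at which a node's result is ready (longest path to it)."""
    unit, preds = GRAPH[node]
    start = max((finish_time(p) for p in preds), default=0)
    return start + LATENCY[unit]

critical_path = finish_time("x")
print(critical_path)  # 88
```

Under these assumptions the longest chain is multiply, multiply, add, square root, add, divide: 6 + 6 + 11 + 27 + 11 + 27 = 88 cycles, matching the figure on the slide.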
20
Modulo Scheduling: One Multiplier
27
Equal to the Data-Flow Limit
28
Modulo Scheduling: Filling the Pipeline. One collective computation.
30
Modulo Scheduling: Filling the Pipeline. Multipliers are 100 % utilized. No schedule conflicts.
31
Modulo Scheduling: Two Multipliers. Two multipliers with two multiplications each.
32
Modulo Scheduling: Two Multipliers. Two cycles. One adder with two additions: maximum efficiency.
33
Modulo Scheduling: Two Multipliers. Improved efficiency: up from 25 %.
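The utilization figures on the modulo-scheduling slides are consistent with a resource-constrained initiation interval, sketched below. The operation counts (four multiplications, two additions, one division, one square root per quadratic) are an assumption inferred from the slide text, and the function names are illustrative. Recurrence constraints are ignored, on the assumption that successive rays are independent.

```python
from math import ceil

# Assumed operation counts for one quadratic solution (not given on the slides).
OPS = {"mul": 4, "add": 2, "div": 1, "sqrt": 1}

def initiation_interval(ops, units):
    """Resource-limited interval between successive pipeline starts."""
    return max(ceil(ops[u] / units[u]) for u in ops)

def utilization(ops, units):
    """Fraction of issue slots each unit type uses in steady state."""
    ii = initiation_interval(ops, units)
    return {u: ops[u] / (units[u] * ii) for u in ops}

one_mul = {"mul": 1, "add": 1, "div": 1, "sqrt": 1}
two_mul = {"mul": 2, "add": 1, "div": 1, "sqrt": 1}

print(initiation_interval(OPS, one_mul), utilization(OPS, one_mul))
print(initiation_interval(OPS, two_mul), utilization(OPS, two_mul))
```

With one multiplier the interval is four slots and the divider is busy 25 % of the time. With two multipliers the interval drops to two slots, the single adder is fully used, and divider utilization rises to 50 %.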
34
Modulo Scheduling: Two Multipliers
36
Less than the Data-Flow Limit
37
Modulo Scheduling: Two Multipliers. Less than the data-flow limit, but double the throughput.
38
Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance
39
Cray XD-1. MPI (Message Passing Interface). Master node: reads file; distributes file; collates results.
40
One Node of the Cray XD-1. OpenMP (Open Multi-Processing). 144 of 220 nodes have a Xilinx Virtex-II Pro FPGA. Opteron processors: sequential program, depth first. FPGA: pipelined hardware, breadth first.
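The depth-first and breadth-first labels can be read as two loop orders over the same work. The sketch below is an interpretation, not the author's code: "depth first" is taken to mean following one ray through every surface before starting the next ray, and "breadth first" to mean pushing a whole batch of rays through one surface at a time, which keeps a pipeline supplied with independent inputs. The toy surfaces are placeholder functions.

```python
def trace_depth_first(rays, surfaces):
    """Sequential style: carry one ray through all surfaces, then the next ray."""
    results = []
    for ray in rays:
        for surface in surfaces:
            ray = surface(ray)
        results.append(ray)
    return results

def trace_breadth_first(rays, surfaces):
    """Pipelined style: send every ray through a surface before moving on."""
    batch = list(rays)
    for surface in surfaces:
        batch = [surface(ray) for ray in batch]
    return batch

# Placeholder "surfaces": simple functions standing in for ray-surface steps.
toy_surfaces = [lambda r: r + 1, lambda r: r * 2, lambda r: r - 3]
toy_rays = [0, 1, 2, 3]

print(trace_depth_first(toy_rays, toy_surfaces))
print(trace_breadth_first(toy_rays, toy_surfaces))
```

Both orders give the same results; they differ only in which independent operations are adjacent in time, which is what matters for filling a deep pipeline.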
41
Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance
42
Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · proc) [meas.]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · proc) [est.]. Reduction in speed = 20 %.
43
Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · proc) [meas.]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · proc) [est.]. Reduction in speed = 20 %.
Opteron with FPGA: 12.0 × 10^6 rays · surfaces / (s · proc) [est.]. Increase in speed = +80 %. Floating-point units use 11 % of FPGA (1 adder, 1 multiplier, 1 divider, 1 square-root unit).
44
Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · proc) [meas.]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · proc) [est.]. Reduction in speed = 20 %.
Opteron with FPGA: 12.0 × 10^6 rays · surfaces / (s · proc) [est.]. Increase in speed = +80 %. Floating-point units use 11 % of FPGA (1 adder, 1 multiplier, 1 divider, 1 square-root unit).
Opteron with FPGA: 25.2 × 10^6 rays · surfaces / (s · proc) [est.]. Increase in speed = +285 %. Floating-point units use 25 % of FPGA (3 adders, 4 multipliers, 1 divider, 1 square-root unit).
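The combined estimates appear to be simple sums of the Opteron and FPGA rates. The sketch below makes that additivity assumption explicit; it is not stated on the slides. The percentages it computes (about 18 %, 82 %, and 282 %) are close to, but not identical with, the quoted 20 %, 80 %, and 285 %, which may be rounded or based on unrounded rates.

```python
# Rates in rays * surfaces / (s * processor), as quoted on the slide.
opteron = 6.6e6          # measured
fpga_small = 5.4e6       # estimated, 1 adder / 1 multiplier / 1 divider / 1 sqrt
combined_large = 25.2e6  # estimated, 3 adders / 4 multipliers / 1 divider / 1 sqrt

# Assumption: Opteron and FPGA work concurrently and their rates add.
combined_small = opteron + fpga_small
fpga_large = combined_large - opteron  # implied FPGA share in the larger design

def percent_change(new, old):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old

print(f"{combined_small:.3g}", f"{fpga_large:.3g}")
print(f"{percent_change(fpga_small, opteron):+.1f} %")
print(f"{percent_change(combined_small, opteron):+.1f} %")
print(f"{percent_change(combined_large, opteron):+.1f} %")
```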
45
Performance
46
Summary Modulo scheduling produces 100 % efficiency of critical resources. Sequential processors get a boost from supplemental FPGA processing. Deep pipelines are efficient only if filled much of the time. FPGAs beat ASICs only if they can take advantage of special problem knowledge. Opteron uses 55 W. Virtex II Pro FPGA uses 4 W to 45 W.
47
Equations: Intersection of a Ray with a Plane; Intersection of a Ray with a Sphere; Intersection of a Ray with a Conicoid; Finding the Perpendicular; Interaction of a Ray with an Optical Surface; Coordinate Transformations
48
Intersection of a Ray with a Plane (List of equations). Figure labels: Initial direction; Normal to the plane; Point in the plane; Initial point; Final point
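The slide's equations are not preserved in the text. A minimal sketch of the standard ray-plane intersection, using the quantities the slide labels (initial point, initial direction, point in the plane, normal to the plane, final point), might look like this. The function and parameter names are hypothetical, and this textbook form is not necessarily the formulation used in the presentation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def intersect_plane(initial_point, direction, plane_point, normal, eps=1e-12):
    """Return the final point where the ray meets the plane, or None if parallel.

    Solves normal . (initial_point + t * direction - plane_point) = 0 for t.
    """
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    offset = [q - p for q, p in zip(plane_point, initial_point)]
    t = dot(normal, offset) / denom
    return [p + t * d for p, d in zip(initial_point, direction)]

hit = intersect_plane([0.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                      [0.0, 0.0, 5.0], [0.0, 0.0, 1.0])
print(hit)  # [0.0, 0.0, 5.0]
```

This sketch does not reject intersections behind the starting point (negative t); a tracer would normally add that check along with the aperture test.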
49
Intersection of a Ray with a Sphere (List of equations). Figure labels: Initial point; Final point; Initial direction
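The sphere case is where a quadratic equation arises. The sketch below is the standard derivation rather than the slide's own equations, which are not preserved: substituting the ray P + t D into |X - C|^2 = R^2 gives a quadratic in t. The names `center` and `radius` are assumptions, since the slide labels only the initial point, final point, and initial direction.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def intersect_sphere(initial_point, direction, center, radius):
    """Return the nearest forward intersection point, or None if the ray misses."""
    rel = [p - c for p, c in zip(initial_point, center)]
    a = dot(direction, direction)
    b = 2.0 * dot(direction, rel)
    c = dot(rel, rel) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real root: the ray misses the sphere
    root = sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 0.0:
            return [p + t * d for p, d in zip(initial_point, direction)]
    return None  # both intersections lie behind the starting point

hit = intersect_sphere([0.0, 0.0, -5.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0], 1.0)
print(hit)  # [0.0, 0.0, -1.0]
```

Which of the two roots is physically correct depends on the surface's orientation; taking the nearest forward root is a simplification.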
50
Intersection of a Ray with a Conicoid (List of equations). Figure labels: Initial point; Final point; Initial direction
51
Finding the Perpendicular (List of equations): Unit Vector Normal to a Sphere; Unit Vector Normal to a Conicoid
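The slide's normal-vector equations are not preserved. As a sketch, the unit normal to a sphere is the normalized vector from the center to the surface point, and the normal to a conicoid can be taken from the gradient of its implicit equation. The conicoid form used here, c(x^2 + y^2) + (1 + k) c z^2 - 2z = 0 with curvature c and conic constant k, is a common optics convention and an assumption; the presentation may define the surface differently. All function names are hypothetical.

```python
from math import sqrt

def normalize(v):
    length = sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def unit_normal_sphere(point, center):
    """Outward unit normal at a point on a sphere."""
    return normalize([p - c for p, c in zip(point, center)])

def unit_normal_conicoid(point, curvature, conic_constant):
    """Unit normal from the gradient of the assumed implicit form
    F = c (x^2 + y^2) + (1 + k) c z^2 - 2 z, vertex at the origin."""
    x, y, z = point
    c, k = curvature, conic_constant
    grad = [2.0 * c * x, 2.0 * c * y, 2.0 * (1.0 + k) * c * z - 2.0]
    return normalize(grad)

# With k = 0 the conicoid is a sphere of radius 1/c centered at (0, 0, 1/c).
point = [2.0, 0.0, 2.0]
print(unit_normal_sphere(point, [0.0, 0.0, 2.0]))
print(unit_normal_conicoid(point, 0.5, 0.0))
```

For k = 0 the two functions agree, which is a useful sanity check on the assumed conicoid form.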
52
Interaction of a Ray with an Optical Surface: Refraction; Reflection (List of equations). Figure labels: Initial index of refraction; Final index of refraction; Normal to the plane; Initial direction; Final direction
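The slide's refraction and reflection equations are not preserved. The sketch below uses the standard vector forms of the law of reflection and Snell's law. It assumes unit-length direction and normal vectors, with the normal pointing back toward the incoming ray; the function names and that orientation convention are assumptions.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(direction, normal):
    """Final direction after mirror reflection: d - 2 (d . n) n."""
    k = 2.0 * dot(direction, normal)
    return [d - k * n for d, n in zip(direction, normal)]

def refract(direction, normal, n_initial, n_final):
    """Final direction after refraction, or None on total internal reflection."""
    eta = n_initial / n_final
    cos_i = -dot(direction, normal)            # normal faces the incoming ray
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None                            # total internal reflection
    cos_t = sqrt(1.0 - sin2_t)
    return [eta * d + (eta * cos_i - cos_t) * n
            for d, n in zip(direction, normal)]

# Normal incidence: the refracted ray continues straight, the reflected ray reverses.
print(refract([0.0, 0.0, 1.0], [0.0, 0.0, -1.0], 1.0, 1.5))
print(reflect([0.0, 0.0, 1.0], [0.0, 0.0, -1.0]))
```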
53
Coordinate Transformations: Rotation and Translation; Rotation (List of equations). Figure labels: Translation Vector; Rotation Matrix; Direction in Frame of Reference k; Direction in Frame of Reference k+1; Position in Frame of Reference k; Position in Frame of Reference k+1
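The slide distinguishes a rotation-and-translation transform (for positions) from a rotation-only transform (for directions), but its equations are not preserved. The sketch below assumes one common convention, position in frame k+1 = R (position in frame k - translation), with directions rotated only. The order of translation and rotation used in the presentation is not known, so this ordering is an assumption, as are the function names.

```python
def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transform_direction(rotation, direction_k):
    """Directions are unaffected by translation: rotate only."""
    return mat_vec(rotation, direction_k)

def transform_position(rotation, translation, position_k):
    """Assumed convention: shift to the new origin, then rotate."""
    shifted = [p - t for p, t in zip(position_k, translation)]
    return mat_vec(rotation, shifted)

# Example: a 90 degree rotation about the z axis.
rot_z90 = [[0, -1, 0],
           [1,  0, 0],
           [0,  0, 1]]

print(transform_direction(rot_z90, [1, 0, 0]))              # [0, 1, 0]
print(transform_position(rot_z90, [1, 0, 0], [2, 0, 0]))    # [0, 1, 0]
```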