Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1 Charles B. Cameron United States Naval Academy Department of Electrical Engineering.


1 Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1
Charles B. Cameron, United States Naval Academy
Department of Electrical Engineering, United States Naval Academy, 105 Maryland Avenue, Stop 14B, Annapolis, Maryland 21402-5025
Research supported by: NASA Goddard Space Flight Center (Code 586); NRL Applied Optics Branch (Code 5630); DoD High Performance Computing Modernization Program at NRL (Code 5593); United States Naval Academy; Xilinx, Inc.

2 Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance

3 Ray tracing: MODIS (Moderate-resolution Imaging Spectroradiometer); The Intersection Problem; Finding the Perpendicular; Refraction; Reflection

4 MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)

5 MODIS Optical System: 485 pinholes; 400 rays per pinhole; 241 × 121 rays reflected from the diffuser; 5.66 × 10^9 rays
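The total on this slide appears to be the product of the three counts. A minimal check, assuming the counts multiply (the variable names are illustrative and not from the presentation):

```python
# Ray-count arithmetic from the slide. Assumption: the total is the product
# of pinholes, rays per pinhole, and the 241 x 121 diffuser grid.
pinholes = 485
rays_per_pinhole = 400
diffuser_rays = 241 * 121

total_rays = pinholes * rays_per_pinhole * diffuser_rays
print(f"{total_rays:,} rays, or about {total_rays:.2e}")
```

The product is 5,657,234,000, which rounds to the 5.66 × 10^9 quoted on the slide.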

6 Ray Directed to a Surface. Outline: MODIS (Moderate-resolution Imaging Spectroradiometer); The Intersection Problem; Finding the Perpendicular; Refraction; Reflection; Coordinate Transformation

7 Calculate the Intercept Point

8 Find the Normal

9 Find the Refracted Ray

10 Find the Reflected Ray

11 Coordinate Transformation (Hard to visualize this!)

12 Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance

13 Parallelism

14 Performance (5.66 × 10^9 rays)
Processor: DEC Alpha 3000 Series Model 800, 200 MHz | Cray XD-1 with 839 AMD Opteron 275 processors, 2.2 GHz
Duration: 1.2 × 10^6 s (two weeks) | 27 s
Rate: 0.112 × 10^6 rays · surfaces / s | 6.6 × 10^6 rays · surfaces / (s · processor)
Reduction in time consumed: 99.998 %
Improvement in ray tracing rate: 5,857 % *
* Rate based on a linear regression of results obtained using varying numbers of processors.

15 Performance (5.66 × 10^9 rays)
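The reduction in time consumed follows directly from the two durations in the table. A small sketch of that arithmetic (variable names are illustrative):

```python
# Durations taken from the slide's table.
alpha_seconds = 1.2e6   # DEC Alpha, roughly two weeks
xd1_seconds = 27.0      # Cray XD-1 with 839 Opteron processors

# Fraction of the original run time that was eliminated, as a percentage.
reduction_pct = (1.0 - xd1_seconds / alpha_seconds) * 100.0
# Equivalent wall-clock ratio between the two runs.
wall_clock_ratio = alpha_seconds / xd1_seconds

print(f"time reduction: {reduction_pct:.3f} %")
print(f"wall-clock ratio: {wall_clock_ratio:,.0f} to 1")
```

This reproduces the 99.998 % figure. The per-processor rate improvement on the slide comes from the author's regression and is not recomputed here.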

16 Efficiency

17 Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance

18 Operations Required as a Function of Surface, Aperture, and Interaction Types
Surface | Circular Aperture | Rectangular Aperture
Plane | 1. Refraction, 7. Reflection | 4. Refraction, 10. Reflection
Sphere | 2. Refraction, 8. Reflection | 5. Refraction, 11. Reflection
Conicoid | 3. Refraction, 9. Reflection | 6. Refraction, 12. Reflection
Annotations: "Lots of these"; "Not too many of these"

19 Quadratic Equation: Critical Path (Data-Flow Limit) 88 cycles
Latency by unit (# of cycles): Adder 11; Multiplier 6; Divider 27; Square root extractor 27
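The 88-cycle data-flow limit can be reproduced from the unit latencies if one assumes a particular data-flow graph for a single root of the quadratic. The graph below is an assumption (the slide's diagram did not survive extraction): 4ac is formed as (4·a)·c, subtraction uses the adder, and negating b costs nothing.

```python
from functools import lru_cache

# Unit latencies in cycles, from the slide.
LATENCY = {"add": 11, "mul": 6, "div": 27, "sqrt": 27}

# Assumed data-flow graph for x = (-b + sqrt(b*b - (4*a)*c)) / (2*a).
# Each node maps to (unit, predecessor nodes). The inputs a, b, c are
# available at cycle 0, and negating b is assumed to be free.
GRAPH = {
    "b*b":   ("mul", ()),
    "4*a":   ("mul", ()),
    "4*a*c": ("mul", ("4*a",)),
    "disc":  ("add", ("b*b", "4*a*c")),   # subtraction done on the adder
    "root":  ("sqrt", ("disc",)),
    "numer": ("add", ("root",)),          # -b + sqrt(disc)
    "2*a":   ("mul", ()),
    "x":     ("div", ("numer", "2*a")),
}

@lru_cache(maxsize=None)
def finish_time(node: str) -> int:
    """Earliest cycle at which a node's result is ready (longest path)."""
    unit, preds = GRAPH[node]
    start = max((finish_time(p) for p in preds), default=0)
    return start + LATENCY[unit]

critical_path = finish_time("x")
print(critical_path)
```

Under these assumptions the longest chain is multiply, multiply, add, square root, add, divide: 6 + 6 + 11 + 27 + 11 + 27 = 88 cycles, matching the slide.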

20 Modulo Scheduling: One Multiplier

21

22

23

24

25

26

27 Equal to the Data-Flow Limit

28 Modulo Scheduling: Filling the Pipeline. One collective computation

29

30 Modulo Scheduling: Filling the Pipeline. Multipliers are 100 % utilized; no schedule conflicts

31 Modulo Scheduling: Two Multipliers. Two multipliers with two multiplications each

32 Modulo Scheduling: Two Multipliers. Two cycles; one adder with two additions; maximum efficiency

33 Modulo Scheduling: Two Multipliers. Improved efficiency: up from 25 %

34 Modulo Scheduling: Two Multipliers

35

36 Less than the Data-Flow Limit

37 Modulo Scheduling: Two Multipliers. Less than the Data-Flow Limit, but double the throughput.
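The utilization figures on the modulo-scheduling slides are consistent with a resource-constrained initiation interval: the number of cycles between successive computations entering the pipeline. A minimal sketch, assuming four multiplications, two additions, one division, and one square root per computation, fully pipelined units that accept a new operation every cycle, and no dependence between successive computations (the operation counts and function names are assumptions, not taken from the slides):

```python
from math import ceil

# Assumed operations per quadratic-root computation.
OPS = {"mul": 4, "add": 2, "div": 1, "sqrt": 1}

def initiation_interval(units: dict) -> int:
    """Smallest interval allowed by resources: the busiest unit type sets it."""
    return max(ceil(OPS[kind] / units[kind]) for kind in OPS)

def utilization(units: dict) -> dict:
    """Fraction of issue slots each unit type uses at that interval."""
    ii = initiation_interval(units)
    return {kind: OPS[kind] / (units[kind] * ii) for kind in OPS}

one_multiplier = {"mul": 1, "add": 1, "div": 1, "sqrt": 1}
two_multipliers = {"mul": 2, "add": 1, "div": 1, "sqrt": 1}

for name, units in (("one", one_multiplier), ("two", two_multipliers)):
    print(name, initiation_interval(units), utilization(units))
```

With one multiplier the interval is four cycles and the divider is busy 25 % of the time. With two multipliers the interval drops to two cycles, so throughput doubles, the adder is fully used, and the divider rises to 50 %.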

38 Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance

39 Cray XD-1: MPI (Message Passing Interface). Master node: reads file; distributes file; collates results

40 One Node of the Cray XD-1: OpenMP (Open Multi-Processing). 144 of 220 nodes have a Xilinx Virtex II Pro FPGA. Opteron processors: sequential program, depth first. FPGA: pipelined hardware, breadth first
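One way to read "depth first" and "breadth first" here is as two loop orders over the same work: follow each ray through every surface before starting the next ray, or push every ray through one surface before moving to the next surface. The sketch below illustrates that reading only. The `Ray` and `Surface` types and the toy surfaces are hypothetical stand-ins, and the slide does not confirm this interpretation.

```python
from typing import Callable, List

Ray = float                       # hypothetical stand-in for a full ray state
Surface = Callable[[Ray], Ray]    # hypothetical ray-surface interaction

def trace_depth_first(rays: List[Ray], surfaces: List[Surface]) -> List[Ray]:
    """Sequential style: finish one ray completely, then start the next."""
    results = []
    for ray in rays:
        for surface in surfaces:
            ray = surface(ray)
        results.append(ray)
    return results

def trace_breadth_first(rays: List[Ray], surfaces: List[Surface]) -> List[Ray]:
    """Pipeline style: stream every ray through one surface at a time."""
    batch = list(rays)
    for surface in surfaces:
        batch = [surface(ray) for ray in batch]
    return batch

toy_surfaces = [lambda r: r + 1.0, lambda r: r * 2.0]
toy_rays = [0.0, 1.0, 2.0]
depth_result = trace_depth_first(toy_rays, toy_surfaces)
breadth_result = trace_breadth_first(toy_rays, toy_surfaces)
print(depth_result == breadth_result)
```

Both orders give the same results. The breadth-first order keeps a long stream of independent operations of the same kind available, which is what a deep hardware pipeline needs to stay full.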

41 Topics: Ray tracing; Conventional parallel processing; Modulo scheduling; Coordination of sequential and parallel processing; Expected performance

42 Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · processor) [measured]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · processor) [estimated]. Reduction in speed of about 20 %.

43 Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · processor) [measured]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · processor) [estimated]. Reduction in speed of about 20 %.
Opteron with FPGA: 12.0 × 10^6 rays · surfaces / (s · processor) [estimated]. Increase in speed = +80 %. Floating point units use 11 % of FPGA: 1 adder, 1 multiplier, 1 divider, 1 square-root unit

44 Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · processor) [measured]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · processor) [estimated]. Reduction in speed of about 20 %.
Opteron with FPGA: 12.0 × 10^6 rays · surfaces / (s · processor) [estimated]. Increase in speed = +80 %. Floating point units use 11 % of FPGA: 1 adder, 1 multiplier, 1 divider, 1 square-root unit
Opteron with FPGA: 25.2 × 10^6 rays · surfaces / (s · processor) [estimated]. Increase in speed = +285 %. Floating point units use  25% of FPGA: 3 adders, 4 multipliers, 1 divider, 1 square-root unit
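The percentage changes can be recomputed from the quoted rates, taking the measured Opteron rate as the baseline. This is only a check on the arithmetic, with illustrative names; the slide's own percentages appear to be rounded.

```python
# Rates in rays * surfaces / (s * processor), as quoted on the slide.
opteron_alone = 6.6e6        # measured
fpga_alone = 5.4e6           # estimated
opteron_fpga_small = 12.0e6  # estimated, smaller FPGA configuration
opteron_fpga_large = 25.2e6  # estimated, larger FPGA configuration

def percent_change(rate: float, baseline: float = opteron_alone) -> float:
    """Signed change relative to the Opteron-alone rate, in percent."""
    return (rate / baseline - 1.0) * 100.0

for label, rate in (("FPGA alone", fpga_alone),
                    ("Opteron + FPGA (small)", opteron_fpga_small),
                    ("Opteron + FPGA (large)", opteron_fpga_large)):
    print(f"{label}: {percent_change(rate):+.1f} %")
```

The computed changes come out near -18 %, +82 %, and +282 %, close to the slide's figures. The smaller combined rate also equals the sum of the Opteron-alone and FPGA-alone rates (6.6 + 5.4 = 12.0), which suggests the two are assumed to work side by side.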

45 Performance

46 Summary: Modulo scheduling produces 100 % efficiency of critical resources. Sequential processors get a boost from supplemental FPGA processing. Deep pipelines are efficient only if filled much of the time. FPGAs beat ASICs only if they can take advantage of special problem knowledge. Opteron uses 55 W. Virtex II Pro FPGA uses 4 W to 45 W.

47 Equations: Intersection of a Ray with a Plane; Intersection of a Ray with a Sphere; Intersection of a Ray with a Conicoid; Finding the Perpendicular; Interaction of a Ray with an Optical Surface; Coordinate Transformations

48 Intersection of a Ray with a Plane (list of equations). Labeled quantities: initial direction; normal to the plane; point in the plane; initial point; final point
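The slide's equations did not survive extraction. The quantities it labels match the standard ray-plane intersection, sketched below in its textbook form. This is not necessarily the author's formulation, and the function and parameter names are illustrative.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))

def intersect_plane(initial_point: Vec, direction: Vec,
                    normal: Vec, point_in_plane: Vec,
                    eps: float = 1e-12) -> Optional[Vec]:
    """Return the final point where the ray meets the plane, or None if parallel.

    Solves normal . (initial_point + t * direction - point_in_plane) = 0 for t.
    A negative t (plane behind the ray) is not filtered out in this sketch.
    """
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None
    offset = tuple(q - p for q, p in zip(point_in_plane, initial_point))
    t = dot(normal, offset) / denom
    return tuple(p + t * d for p, d in zip(initial_point, direction))

hit = intersect_plane((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                      (0.0, 0.0, 1.0), (0.0, 0.0, 5.0))
print(hit)
```

The plane case needs one division and no square root, so it is cheaper than the sphere and conicoid cases.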

49 Intersection of a Ray with a Sphere (list of equations). Labeled quantities: initial point; final point; initial direction
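Intersecting a ray with a sphere leads to a quadratic in the distance along the ray, which is where the quadratic-equation pipeline comes in. A minimal sketch of the textbook form follows. The explicit `center` and `radius` parameters are assumptions, since the slide labels only the initial point, final point, and initial direction and its equations were lost.

```python
from math import sqrt
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))

def intersect_sphere(initial_point: Vec, direction: Vec,
                     center: Vec, radius: float) -> Optional[Vec]:
    """Nearest intersection in front of the ray, or None if the ray misses.

    Solves |initial_point + t * direction - center|^2 = radius^2 for t.
    """
    oc = tuple(p - c for p, c in zip(initial_point, center))
    a = dot(direction, direction)
    b = 2.0 * dot(direction, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                       # no real root: the ray misses
    root = sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:                      # take the first root ahead of the ray
            return tuple(p + t * d for p, d in zip(initial_point, direction))
    return None

hit = intersect_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
print(hit)
```

In the example the discriminant is 100 - 96 = 4, the nearer root is t = 4, and the ray lands at z = -1 on the near side of the unit sphere.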

50 Intersection of a Ray with a Conicoid (list of equations). Labeled quantities: initial point; final point; initial direction

51 Finding the Perpendicular (list of equations): Unit Vector Normal to a Sphere; Unit Vector Normal to a Conicoid

52 Interaction of a Ray with an Optical Surface: Refraction; Reflection (list of equations). Labeled quantities: initial index of refraction; final index of refraction; normal to the plane; initial direction; final direction
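The labeled quantities correspond to the usual vector forms of the reflection law and Snell's law. The sketch below uses those textbook forms, which may differ in sign convention from the lost equations. It assumes unit-length vectors and a normal that points back toward the incoming ray. All names are illustrative.

```python
from math import sqrt
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))

def reflect(direction: Vec, normal: Vec) -> Vec:
    """Mirror the direction about the surface: d - 2 (d . n) n."""
    k = 2.0 * dot(direction, normal)
    return tuple(d - k * n for d, n in zip(direction, normal))

def refract(direction: Vec, normal: Vec,
            n_initial: float, n_final: float) -> Optional[Vec]:
    """Vector form of Snell's law; None signals total internal reflection.

    Assumes unit vectors with the normal opposing the incoming ray (d . n < 0).
    """
    eta = n_initial / n_final
    cos_i = -dot(direction, normal)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    k = eta * cos_i - sqrt(1.0 - sin2_t)
    return tuple(eta * d + k * n for d, n in zip(direction, normal))

incoming = (0.5, 0.0, sqrt(0.75))          # 30 degrees from the surface normal
surface_normal = (0.0, 0.0, -1.0)
reflected = reflect(incoming, surface_normal)
refracted = refract(incoming, surface_normal, 1.0, 1.5)
print(reflected, refracted)
```

For the refracted ray, the component parallel to the surface is scaled by the ratio of the indices, which is Snell's law, and the result stays unit length.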

53 Coordinate Transformations (list of equations): Rotation and Translation; Rotation. Labeled quantities: translation vector; rotation matrix; direction in frame of reference k; direction in frame of reference k+1; position in frame of reference k; position in frame of reference k+1
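The slide distinguishes positions, which are rotated and translated, from directions, which are only rotated. A minimal sketch of one common convention follows: translate to the new origin, then rotate. The order and sign of the translation are assumptions, because the original equations were lost, and the names are illustrative.

```python
from typing import Tuple

Vec = Tuple[float, float, float]
Mat = Tuple[Vec, Vec, Vec]

def mat_vec(m: Mat, v: Vec) -> Vec:
    """Multiply a 3x3 matrix (given as rows) by a vector."""
    return tuple(sum(r * x for r, x in zip(row, v)) for row in m)

def transform_position(rotation: Mat, translation: Vec, position_k: Vec) -> Vec:
    """Position in frame k+1, assuming x_{k+1} = R (x_k - T)."""
    shifted = tuple(p - t for p, t in zip(position_k, translation))
    return mat_vec(rotation, shifted)

def transform_direction(rotation: Mat, direction_k: Vec) -> Vec:
    """Direction in frame k+1: rotation only, since directions have no origin."""
    return mat_vec(rotation, direction_k)

# Example: frame k+1 sits 10 units along z and is turned a quarter turn about z.
R = ((0, 1, 0), (-1, 0, 0), (0, 0, 1))
T = (0, 0, 10)
new_position = transform_position(R, T, (1, 0, 10))
new_direction = transform_direction(R, (0, 0, 1))
print(new_position, new_direction)
```

Keeping directions free of the translation is what lets the same rotation matrix serve both transformations.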

