Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1 Charles B. Cameron United States Naval Academy Department of Electrical Engineering.

Charles B. Cameron
Department of Electrical Engineering, United States Naval Academy
105 Maryland Avenue, Stop 14B, Annapolis, Maryland 21402-5025
Research supported by:
NASA Goddard Space Flight Center (Code 586)
NRL Applied Optics Branch (Code 5630)
DoD High Performance Computing Modernization Program at NRL (Code 5593)
United States Naval Academy
Xilinx, Inc.

Topics
Ray tracing
Conventional parallel processing
Modulo scheduling
Coordination of sequential and parallel processing
Expected performance

Ray tracing
MODIS (Moderate-resolution Imaging Spectroradiometer)
The Intersection Problem
Finding the Perpendicular
Refraction
Reflection

MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)

MODIS Optical System
485 pinholes
400 rays per pinhole
241 × 121 rays reflected from the diffuser
5.66 × 10^9 rays
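The total on this slide appears to be the product of the three counts. A quick check, assuming the operators lost in extraction were multiplication signs (variable names are illustrative):

```python
# Ray count for the MODIS simulation, using the figures quoted on the slide.
pinholes = 485
rays_per_pinhole = 400
diffuser_grid = 241 * 121  # rays reflected from the diffuser

# Assumption: the total is the plain product of the three counts.
total_rays = pinholes * rays_per_pinhole * diffuser_grid
print(f"{total_rays:,} rays, about {total_rays:.3g}")
```

The product rounds to the 5.66 × 10^9 rays quoted on the slide.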

Ray Directed to a Surface

Calculate the Intercept Point

Find the Normal

Find the Refracted Ray

Find the Reflected Ray

Coordinate Transformation (Hard to visualize this!)

Parallelism

Performance (5.66 × 10^9 rays)
DEC Alpha 3000 Series Model 800 (200 MHz): duration 1.2 × 10^6 s (two weeks); rate 0.112 × 10^6 rays · surfaces / s
Cray XD-1 with 839 AMD Opteron 275 processors (2.2 GHz): duration 27 s; rate 6.6 × 10^6 rays · surfaces / (s · processor)
Reduction in time consumed: 99.998 %
Improvement in ray-tracing rate: 5,857 % *
* Rate based on a linear regression of results obtained using varying numbers of processors.
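The time-reduction figure follows directly from the two durations. A small check using only the rounded values on the slide (the rate improvement is not recomputed here because it comes from a regression that the transcript does not include):

```python
alpha_seconds = 1.2e6  # DEC Alpha duration quoted on the slide
xd1_seconds = 27.0     # Cray XD-1 duration quoted on the slide

reduction = 1.0 - xd1_seconds / alpha_seconds
speedup = alpha_seconds / xd1_seconds

print(f"time reduction: {reduction:.3%}")
print(f"wall-clock speedup: {speedup:,.0f}x")
```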

Performance (5.66 × 10^9 rays)

Efficiency

Operations Required as a Function of Surface, Aperture, and Interaction Types
Circular aperture, refraction: 1. Plane, 2. Sphere, 3. Conicoid
Rectangular aperture, refraction: 4. Plane, 5. Sphere, 6. Conicoid
Circular aperture, reflection: 7. Plane, 8. Sphere, 9. Conicoid
Rectangular aperture, reflection: 10. Plane, 11. Sphere, 12. Conicoid
Slide annotations: "Lots of these"; "Not too many of these"

Quadratic Equation
Critical path (data-flow limit): 88 cycles
Latency by unit:
Adder: 11 cycles
Multiplier: 6 cycles
Divider: 27 cycles
Square-root extractor: 27 cycles
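One way to see where a figure like 88 cycles can come from is to walk a data-flow graph with these latencies. The graph below is an assumption, not taken from the slides: it models one root, x = (-b + sqrt(b*b - 4*a*c)) / (2*a), with 4*a*c computed as two chained multiplications and the negation of b treated as free.

```python
# Latencies in clock cycles, as quoted on the slide.
LATENCY = {"add": 11, "mul": 6, "div": 27, "sqrt": 27}

# Assumed data-flow graph (hypothetical): name -> (unit type, predecessors).
# Entries are listed in topological order.
GRAPH = {
    "ac":    ("mul", []),
    "4ac":   ("mul", ["ac"]),
    "bb":    ("mul", []),
    "2a":    ("mul", []),
    "disc":  ("add", ["bb", "4ac"]),   # b*b - 4*a*c
    "root":  ("sqrt", ["disc"]),
    "numer": ("add", ["root"]),        # -b + sqrt(...)
    "x":     ("div", ["numer", "2a"]),
}

def critical_path(graph, latency):
    """Finish time of the last operation when units are unlimited."""
    finish = {}
    for name, (unit, preds) in graph.items():
        start = max((finish[p] for p in preds), default=0)
        finish[name] = start + latency[unit]
    return max(finish.values())

print(critical_path(GRAPH, LATENCY))  # 6 + 6 + 11 + 27 + 11 + 27
```

Under these assumptions the longest chain is multiply, multiply, subtract, square root, add, divide, which sums to the 88 cycles on the slide.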

Modulo Scheduling: One Multiplier

Equal to the Data-Flow Limit

Modulo Scheduling: Filling the Pipeline
One collective computation

Modulo Scheduling: Filling the Pipeline
Multipliers are 100 % utilized
No schedule conflicts

Modulo Scheduling: Two Multipliers
Two multipliers with two multiplications each

Modulo Scheduling: Two Multipliers
Two cycles
One adder with two additions
Maximum efficiency

Modulo Scheduling: Two Multipliers
Improved efficiency: up from 25 %

Modulo Scheduling: Two Multipliers

Less than the Data-Flow Limit

Modulo Scheduling: Two Multipliers
Less than the Data-Flow Limit, but double the throughput.
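The scheduling slides can be illustrated with a small modulo reservation table. This is a minimal sketch, not the author's scheduler: the operation list for one quadratic root is assumed, the greedy placement order is a simplification, and every unit is assumed to be fully pipelined so that an operation reserves only its issue slot.

```python
import math

LATENCY = {"add": 11, "mul": 6, "div": 27, "sqrt": 27}  # cycles, from the slide

# Assumed operations for one root, in topological order (hypothetical):
# (name, unit type, predecessors).
OPS = [
    ("ac", "mul", []), ("4ac", "mul", ["ac"]), ("bb", "mul", []),
    ("2a", "mul", []), ("disc", "add", ["bb", "4ac"]),
    ("root", "sqrt", ["disc"]), ("numer", "add", ["root"]),
    ("x", "div", ["numer", "2a"]),
]

def count_uses(ops):
    uses = {}
    for _, unit, _ in ops:
        uses[unit] = uses.get(unit, 0) + 1
    return uses

def res_mii(ops, units):
    """Resource-constrained minimum initiation interval."""
    return max(math.ceil(n / units[u]) for u, n in count_uses(ops).items())

def modulo_schedule(ops, units, ii):
    """Greedy placement: each op takes the first free slot (t mod ii)."""
    table = {u: [0] * ii for u in units}
    start, finish = {}, {}
    for name, unit, preds in ops:
        t = max((finish[p] for p in preds), default=0)
        while table[unit][t % ii] >= units[unit]:
            t += 1  # this slot is taken in every iteration; try the next
        table[unit][t % ii] += 1
        start[name], finish[name] = t, t + LATENCY[unit]
    return start, max(finish.values())

def utilization(ops, units, ii):
    """Fraction of issue slots each unit type uses per initiation interval."""
    uses = count_uses(ops)
    return {u: uses.get(u, 0) / (units[u] * ii) for u in units}

for muls in (1, 2):
    units = {"add": 1, "mul": muls, "div": 1, "sqrt": 1}
    ii = res_mii(OPS, units)
    _, latency = modulo_schedule(OPS, units, ii)
    print(f"{muls} multiplier(s): II={ii}, latency={latency},",
          utilization(OPS, units, ii))
```

With one multiplier the sketch starts a new computation every 4 cycles and keeps the multiplier fully busy. With two multipliers the interval halves, the single adder becomes fully busy, and the divider and square-root units rise from 25 % to 50 % utilization, at the cost of one extra cycle of latency from an adder slot conflict.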

Cray XD-1
MPI (Message Passing Interface)
Master node: reads file, distributes file, collates results

One Node of the Cray XD-1
OpenMP (Open Multi-Processing)
144 of 220 nodes have a Xilinx Virtex II Pro FPGA
Opteron processors: sequential program, depth first
FPGA: pipelined hardware, breadth first

Performance
Opteron alone: 6.6 × 10^6 rays · surfaces / (s · processor) [measured]
FPGA alone: 5.4 × 10^6 rays · surfaces / (s · processor) [estimated]; reduction in speed = −20 %
Opteron with FPGA (1 adder, 1 multiplier, 1 divider, 1 square-root unit; floating-point units use 11 % of the FPGA): 12.0 × 10^6 rays · surfaces / (s · processor) [estimated]; increase in speed = +80 %
Opteron with FPGA (3 adders, 4 multipliers, 1 divider, 1 square-root unit; floating-point units use 25 % of the FPGA): 25.2 × 10^6 rays · surfaces / (s · processor) [estimated]; increase in speed = +285 %
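The combined estimate reads as the sum of the Opteron rate and the FPGA rate. A short check using the rounded rates from the slide (names are illustrative); because the inputs are rounded, the percentages land near, not exactly on, the slide's −20 %, +80 %, and +285 %:

```python
opteron = 6.6e6          # measured, rays * surfaces / (s * processor)
fpga_small = 5.4e6       # estimated, FPGA alone
combined_small = 12.0e6  # estimated, Opteron with the smaller FPGA design
combined_large = 25.2e6  # estimated, Opteron with the larger FPGA design

def percent_change(new, base):
    """Relative change of new with respect to base, in percent."""
    return 100.0 * (new - base) / base

print(f"FPGA alone vs Opteron:   {percent_change(fpga_small, opteron):+.0f} %")
print(f"Opteron + small FPGA:    {percent_change(combined_small, opteron):+.0f} %")
print(f"Opteron + large FPGA:    {percent_change(combined_large, opteron):+.0f} %")
print(f"Sum of Opteron and FPGA: {opteron + fpga_small:.3g}")
```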

Performance

Summary
Modulo scheduling produces 100 % efficiency of critical resources.
Sequential processors get a boost from supplemental FPGA processing.
Deep pipelines are efficient only if filled much of the time.
FPGAs beat ASICs only if they can take advantage of special problem knowledge.
Opteron uses 55 W. Virtex II Pro FPGA uses 4 W to 45 W.

Equations
Intersection of a Ray with a Plane
Intersection of a Ray with a Sphere
Intersection of a Ray with a Conicoid
Finding the Perpendicular
Interaction of a Ray with an Optical Surface
Coordinate Transformations

Intersection of a Ray with a Plane
Labeled quantities: initial point, initial direction, point in the plane, normal to the plane, final point
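The slide's equations did not survive in this transcript. As a reference point, the textbook form of the ray-plane intersection can be sketched with the quantities the slide labels; the function and parameter names are illustrative, not the author's:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def intersect_plane(p0, d, q, n, eps=1e-12):
    """Return the final point where the ray p0 + t*d meets the plane.

    p0: initial point, d: initial direction,
    q: a point in the plane, n: normal to the plane.
    Returns None if the ray is parallel to the plane or the hit is behind p0.
    """
    denom = dot(n, d)
    if abs(denom) < eps:
        return None
    t = dot(n, [qi - pi for qi, pi in zip(q, p0)]) / denom
    if t < 0:
        return None
    return tuple(pi + t * di for pi, di in zip(p0, d))

# A ray along +z from the origin hits the plane z = 5.
print(intersect_plane((0, 0, 0), (0, 0, 1), (0, 0, 5), (0, 0, 1)))
```

This case needs one division and no square root, which is why a plane is the cheapest surface type.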

Intersection of a Ray with a Sphere
Labeled quantities: initial point, initial direction, final point
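The sphere case is where the quadratic equation from the scheduling slides enters. A minimal sketch of the standard formulation, assuming a sphere given by a center and radius and returning the nearest hit in front of the ray (which root an optical surface actually needs depends on its orientation, so treat that choice as an assumption):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def intersect_sphere(p0, d, center, radius):
    """Return the nearest point where the ray p0 + t*d meets the sphere."""
    oc = [p - c for p, c in zip(p0, center)]
    a = dot(d, d)
    b = 2.0 * dot(d, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    # Try the smaller root first so the nearest forward hit wins.
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0:
            return tuple(p + t * di for p, di in zip(p0, d))
    return None

# A ray along +z from z = -5 hits the unit sphere at z = -1.
print(intersect_sphere((0, 0, -5), (0, 0, 1), (0, 0, 0), 1.0))
```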

Intersection of a Ray with a Conicoid
Labeled quantities: initial point, initial direction, final point

Finding the Perpendicular
Unit vector normal to a sphere
Unit vector normal to a conicoid

Interaction of a Ray with an Optical Surface
Refraction and reflection
Labeled quantities: initial index of refraction, final index of refraction, normal to the plane, initial direction, final direction
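The slide's own equations are missing from the transcript, so the following sketch uses the common vector forms of the reflection law and Snell's law. It assumes unit-length direction and normal vectors, with the normal pointing against the incoming ray; all names are illustrative:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reflect(d, n):
    """Final direction after reflection: d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def refract(d, n, n1, n2):
    """Final direction after refraction from index n1 into index n2.

    Returns None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * di + k * ni for di, ni in zip(d, n))

# 45-degree incidence from air (1.0) into glass (1.5) on a surface facing -z.
s = math.sin(math.pi / 4)
incoming = (s, 0.0, math.cos(math.pi / 4))
normal = (0.0, 0.0, -1.0)
print(reflect(incoming, normal))
print(refract(incoming, normal, 1.0, 1.5))
```

The refracted direction satisfies n1 sin(i) = n2 sin(t), which can be confirmed from its x component.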

Coordinate Transformations
Rotation and translation (positions): position in frame of reference k, translation vector, rotation matrix, position in frame of reference k+1
Rotation (directions): direction in frame of reference k, rotation matrix, direction in frame of reference k+1
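Since the slide's formulas are not in the transcript, the convention below is an assumption: positions are translated and then rotated, p(k+1) = R (p(k) - T), while directions are only rotated, d(k+1) = R d(k). The author's sign and ordering conventions may differ.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (sequence of rows) by a 3-vector."""
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

def transform_direction(rotation, d_k):
    """Direction in frame k+1: rotation only."""
    return mat_vec(rotation, d_k)

def transform_position(rotation, translation, p_k):
    """Position in frame k+1: subtract the translation, then rotate (assumed order)."""
    shifted = tuple(p - t for p, t in zip(p_k, translation))
    return mat_vec(rotation, shifted)

# Frame k+1 sits at (1, 0, 0) in frame k and is turned 90 degrees about z.
R = ((0, 1, 0),
     (-1, 0, 0),
     (0, 0, 1))
T = (1, 0, 0)

print(transform_position(R, T, (2, 0, 0)))
print(transform_direction(R, (1, 0, 0)))
```

Directions skip the translation because they have no position, which is why the slide lists the two cases separately.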
