Backprojection and Synthetic Aperture Radar Processing on a HHPC

Albert Conti, Ben Cordes, Prof. Miriam Leeser, Prof. Eric Miller

Abstract

Synthetic Aperture Radar (SAR) is a process by which high-resolution images can be formed by processing a series of radar reflections taken by a single transceiver. Backprojection is one method for post-processing these reflections; it is a highly parallel algorithm, which makes it well suited to a hardware implementation. This poster explores the difficulties involved in achieving maximum speedup from a hardware implementation on a parallel computing system, including memory bandwidth and communication bottlenecks.

What is SAR?

- SAR: Synthetic Aperture Radar
- The aperture (the width of the radar dish) directly affects the resolution of the image
- Many radar pulses are taken and processed; the aperture is synthetically increased by accumulating the results
- Two modes of operation: 'Stripmap' and 'Spotlight'

For More Detail: Soumekh, M., "Synthetic Aperture Radar Signal Processing with MATLAB Algorithms", ISBN

Stripmap Mode SAR

- The plane flies past the target in a straight line
- Multiple radar pulses are taken at right angles to the flight path
- Each pulse covers some portion of the target area

What is Backprojection?
- SAR output is an array of radar response 'projections'
- Filter out the physical effects of the radar
- Correlate each pixel to a time, and index into the projection data
- Interpolate between indices to increase accuracy

Previous Work

- Medical imaging: spotlight-mode SAR with backprojection
- Used the Annapolis FireBird board, a precursor to the WildStar II board
- 65MHz clock, 16-way pipeline

Previous Work: Results

Platform                        Runtime   Speedup
1GHz Pentium, Floating-point    94s       1.0x
1GHz Pentium, Fixed-point       28s       3.4x
50MHz WildStar I (1-way)        5.37s     17x
65MHz FireBird (1-way)          4.13s     23x
50MHz WildStar I (4-way)        1.34s     70x
65MHz FireBird (16-way)         0.26s     360x

For More Detail: Haiqian Yu, "Memory Architecture for Data Intensive Image Processing Algorithms in Reconfigurable Hardware", Master's Thesis, Northeastern University, Boston MA

HHPC Architecture

- 48-node Beowulf cluster
- Dual 2.2GHz Xeons running Linux
- Annapolis MicroSystems WildStar II FPGA boards
- Champ LVDS systolic interconnect
- Gigabit Ethernet cards
- Myrinet MPI cards

Funded by the DOD High Performance Computing Modernization Program, Grant #PET SIP-K

Exploiting Parallelism

- Parallel operations can provide performance gains, but data dependencies reduce parallelism; few such dependencies exist in SAR/backprojection
- Coarse-grained parallelism: several projections are processed on each system; size and available space determine the ratio of projections per board
- Fine-grained parallelism: the work for each set of projections can be pipelined and parallelized; memory bandwidth determines the number of parallel pipelines

Hybrid Implementation

The hybrid implementation combines both forms of parallelism: projections are distributed across the nodes of the HHPC (coarse-grained), and the work on each node is pipelined and parallelized on its FPGA board (fine-grained).

Performance

The hybrid implementation achieved a 40X speedup over a software solution using a single node of the HHPC (no coarse-grained parallelism). Preliminary results from the parallel version of the hybrid implementation show drastic speedup in the processing stage of the algorithm, yet a slowdown in the reconstruction of the final image due to inter-process communication.
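As a rough illustration of the computation each fine-grained pipeline performs, the backprojection steps described under "What is Backprojection?" (correlate each pixel to a round-trip time, index into the projection data, interpolate between samples, accumulate) can be sketched in software as follows. All names, the geometry, and the sampling parameters are simplified assumptions for illustration, not the poster's actual FPGA implementation.

```python
import numpy as np

def backproject(projections, positions, pixels, c=3e8, fs=1e6, t0=0.0):
    """Accumulate filtered radar projections into an image (illustrative sketch).

    projections : (num_pulses, num_samples) filtered range profiles
    positions   : (num_pulses, 3) transceiver position for each pulse
    pixels      : (num_pixels, 3) scene pixel coordinates
    fs, t0      : assumed sample rate and start time of each projection
    """
    image = np.zeros(len(pixels))
    for proj, pos in zip(projections, positions):
        # Correlate each pixel to a round-trip delay, then to a sample index
        dist = np.linalg.norm(pixels - pos, axis=1)
        idx = (2.0 * dist / c - t0) * fs
        # Interpolate linearly between adjacent samples to increase accuracy
        # (out-of-swath indices are simply clamped in this sketch)
        lo = np.clip(np.floor(idx).astype(int), 0, proj.shape[0] - 2)
        frac = idx - lo
        image += (1.0 - frac) * proj[lo] + frac * proj[lo + 1]
    return image
```

In the hardware version described above, the interpolation and accumulation are unrolled into parallel pipelines (16-way on the FireBird), which is why memory bandwidth to the projection and target memories becomes the limiting factor.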
Work is currently underway to determine the optimal number of processing nodes for reconstructing images most efficiently on the HHPC.

Future Optimizations

- Overlap processing and communication to hide the inherent communication latency
- Overlap file I/O and communication to minimize end-to-end run time
- Utilize processing nodes for intermediate merging
- Stagger processing stages to avoid communication collisions

[Figure: hybrid processing flow across PC/FPGA node pairs — 1. Input data loaded from disc; 2. Data broadcast to backprojection processor nodes; 3. Parallel processing; 4. Target images merged; 5. Aggregate image stored to disc]

In stage 1, data from the separate projections is fetched from storage and made ready to distribute amongst the processing nodes. In stage 2, the projection data is broadcast to all of the processor nodes via Myrinet; each processor node listens for and accepts the data that contributes to its section of the target area. In stage 3, distinct regions of the target area are reconstructed in parallel. In stage 4, the smaller regions generated in stage 3 are merged to form the final target image. In stage 5, the final image is stored on disc.

[Figure: FPGA datapath — PCI staging, swath LUT, input BRAM, and two target memories]

This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC ).

Motivation

Methods of serial computing are slow and cannot take advantage of the inherent parallelism in the algorithm for processing SAR data. This work focuses on developing a high-speed computation engine that will enable image reconstruction in a small fraction of the time possible with serial computing.
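The five-stage flow described above (load, broadcast, parallel backprojection, merge, store) can be sketched as plain Python; this is a sequential simulation for illustration only. The real system broadcasts over Myrinet and runs backprojection on the WildStar II FPGAs, whereas here the per-node backprojection kernel is stubbed out as a simple sum over projections, and the function and variable names are assumptions.

```python
import numpy as np

def split_target(num_pixels, num_nodes):
    """Assign each node a distinct, contiguous slice of the target area."""
    bounds = np.linspace(0, num_pixels, num_nodes + 1).astype(int)
    return [slice(bounds[i], bounds[i + 1]) for i in range(num_nodes)]

def process_node(projections, region):
    """Stage 3 (per node): reconstruct only this node's region of the image.
    The FPGA backprojection pipeline is stubbed as a sum over projections."""
    return projections[:, region].sum(axis=0)

def reconstruct(projections, num_nodes):
    # Stage 1: input data loaded (passed in here as `projections`)
    regions = split_target(projections.shape[1], num_nodes)
    # Stage 2: data broadcast; each node keeps only its region's data
    # Stage 3: distinct regions processed in parallel
    #          (a sequential loop stands in for the parallel nodes)
    partials = [process_node(projections, r) for r in regions]
    # Stage 4: partial target images merged into the final image
    image = np.concatenate(partials)
    # Stage 5: the final image would be stored to disc
    return image
```

Because the regions are disjoint, the merge in stage 4 is a pure concatenation with no arithmetic, which is why inter-process communication, rather than computation, dominates that stage's cost.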