GTRI_B-1 ECRB - HPC - 1 GPU SAS - 1 Using Graphics Processors to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation 2010 High Performance.

Slides:



Advertisements
Similar presentations
CSE 781 Anti-aliasing for Texture Mapping. Quality considerations So far we just mapped one point – results in bad aliasing (resampling problems) We really.
Advertisements

Control Architectures: Feed Forward, Feedback, Ratio, and Cascade
Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.
Vector Processing. Vector Processors Combine vector operands (inputs) element by element to produce an output vector. Typical array-oriented operations.
Microwave Doppler Speed Measurement System Guo Jianghuai Supervisor: Roland G Clarke Assessor: Chris Trayner Introduction A Doppler radar is a special.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
New modules of the software package “PHOTOMOD Radar” September 2010, Gaeta, Italy X th International Scientific and Technical Conference From Imagery to.
Optimizing and Auto-Tuning Belief Propagation on the GPU Scott Grauer-Gray and Dr. John Cavazos Computer and Information Sciences, University of Delaware.
23057page 1 Physics of SAR Summer page 2 Synthetic-Aperture Radar SAR Radar - Transmits its own illumination a "Microwave flashlight" RAdio.
Computer Graphics Hardware Acceleration for Embedded Level Systems Brian Murray
Programming with CUDA, WS09 Waqar Saleem, Jens Müller Programming with CUDA and Parallel Algorithms Waqar Saleem Jens Müller.
Back-Projection on GPU: Improving the Performance Wenlay “Esther” Wei Advisor: Jeff Fessler Mentor: Yong Long April 29, 2010.
3D Graphics Processor Architecture Victor Moya. PhD Project Research on architecture improvements for future Graphic Processor Units (GPUs). Research.
IN4151 Introduction 3D graphics 1 Introduction to 3D computer graphics part 2 Viewing pipeline Multi-processor implementation GPU architecture GPU algorithms.
November 18, 2004 Embedded System Design Flow Arkadeb Ghosal Alessandro Pinto Daniele Gasperini Alberto Sangiovanni-Vincentelli
Paper by Alexander Keller
PH4705/ET4305: A/D: Analogue to Digital Conversion
Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.
LEC ( 2 ) RAD 323. Reconstruction techniques dates back to (1917), when scientist (Radon) developed mathematical solutions to the problem of reconstructing.
Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.
SAR Imaging Radar System A fundamental problem in designing a SAR Image Formation System is finding an optimal estimator as an ideal.
Ray Tracing Primer Ref: SIGGRAPH HyperGraphHyperGraph.
Slide 1/8 Performance Debugging for Highly Parallel Accelerator Architectures Saurabh Bagchi ECE & CS, Purdue University Joint work with: Tsungtai Yeh,
SAGE: Self-Tuning Approximation for Graphics Engines
Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik Piotr Arlukowicz Tadeusz Puzniakowski Informatics Institute.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Distortion Correction ECE 6276 Project Review Team 5: Basit Memon Foti Kacani Jason Haedt Jin Joo Lee Peter Karasev.
GTRI_B-1 ECRB - HPC - 1 Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation 2010 High Performance Embedded Computing Workshop 23 September 2010.
© 2007 SET Associates Corporation SAR Processing Performance on Cell Processor and Xeon Mark Backues, SET Corporation Uttam Majumder, AFRL/RYAS.
Today’s lecture 2-Dimensional indexing Color Format Thread Synchronization within for- loops Shared Memory Tiling Review example programs Using Printf.
GPU Architecture and Programming
Forward-Scan Sonar Tomographic Reconstruction PHD Filter Multiple Target Tracking Bayesian Multiple Target Tracking in Forward Scan Sonar.
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
Using a MATLAB/Photoshop Interface to Enhance Image Processing in the Interpretation of Radar Imagery The Center for Remote Sensing of Ice Sheets (CReSIS)
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
M. Jędrzejewski, K.Marasek, Warsaw ICCVG, Multimedia Chair Computation of room acoustics using programable video hardware Marcin Jędrzejewski.
An Execution Model for Heterogeneous Multicore Architectures Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili Computer Architecture and Systems Laboratory.
GPU Accelerated MRI Reconstruction Professor Kevin Skadron Computer Science, School of Engineering and Applied Science University of Virginia, Charlottesville,
Spatiotemporal Saliency Map of a Video Sequence in FPGA hardware David Boland Acknowledgements: Professor Peter Cheung Mr Yang Liu.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
Accurate Indoor Localization With Zero Start-up Cost
Sponsored By Abstract 1 Ritamar Siurano – Undergraduate Student Prof. Domingo Rodriguez – Advisor Abigail Fuentes – Graduate StudentProf. Ana B. Ramirez.
1 November 11, 2015 A Massively Parallel, Hybrid Dataflow/von Neumann Architecture Yoav Etsion November 11, 2015.
Image-Based Rendering Geometry and light interaction may be difficult and expensive to model –Think of how hard radiosity is –Imagine the complexity of.
High Performance Flexible DSP Infrastructure Based on MPI and VSIPL 7th Annual Workshop on High Performance Embedded Computing MIT Lincoln Laboratory
David Luebke 3/5/2016 Advanced Computer Graphics Lecture 4: Faster Ray Tracing David Luebke
Graphics Processor Clusters for High Speed Backpropagation 2011 High Performance Embedded Computing Workshop 22 September 2011 Daniel P. Campbell, Thomas.
CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1.
PERFORMANCE OF THE OPENMP AND MPI IMPLEMENTATIONS ON ULTRASPARC SYSTEM Abstract Programmers and developers interested in utilizing parallel programming.
1 Processor design Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section 11.3.
Date of download: 7/10/2016 Copyright © 2016 SPIE. All rights reserved. A graphical overview of the proposed compressed gated range sensing (CGRS) architecture.
VERTICAL SCANNING INTERFEROMETRY VSI
Atindra Mitra Joe Germann John Nehrbass
COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE
Time and Depth Imaging Algorithms in a Hardware Accelerator Paradigm
GPU Memory Details Martin Kruliš by Martin Kruliš (v1.1)
GPU VSIPL: High Performance VSIPL Implementation for GPUs
Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang
Final exam information
CPSC 531: System Modeling and Simulation
CSCI1600: Embedded and Real Time Software
Ice Investigation with PPC
Processor design Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section 11.3.
Chapter 01: Introduction
6- General Purpose GPU Programming
CSCI1600: Embedded and Real Time Software
Serial Communications
Processor design Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section 11.3.
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

GTRI_B-1 ECRB - HPC - 1 GPU SAS - 1 Using Graphics Processors to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation 2010 High Performance Embedded Computing Workshop 23 September 2010 Daniel P. Campbell, Daniel A. Cook

GTRI_B-2 ECRB - HPC - 2 GPU SAS - 2 Sonar Imaging via Backpropagation As the sonar passes by the scene, it transmits pulses and records returns. Each output pixel is computed by returns from the pulses for which it is within the beamwidth (shown in red). A typical integration path is shown in yellow. Forward Travel

GTRI_B-3 ECRB - HPC - 3 GPU SAS - 3 Backpropagation Backpropagation is the simplest synthetic aperture image reconstruction algorithm for each output pixel: – Find all pulses containing reflections from that location on the ground – Find recorded samples at each round-trip range – Inner product with expected reflection – Sum all of these data points end

GTRI_B-4 ECRB - HPC - 4 GPU SAS - 4 Backpropagation – Practical Advantages Procedure – Attempt to fly a perfectly straight line – Compensate for unwanted motion – Form image using Fourier-based method backpropagation – Register and interpolate image onto map coordinates Algorithmic simplicity – Easier to code and troubleshoot – Less accumulated numerical error Flexibility – Can image directly onto map coordinates without the need for postprocessing (including bathymetric maps) Expanded operating envelope – Can form imagery in adverse environmental conditions and during maneuvers

GTRI_B-5 ECRB - HPC - 5 GPU SAS - 5 Sonar vs. Radar Typical SAS range/resolution: 100m/3cm Typical SAR range/resolution: 10km/0.3m SAS and SAR are mathematically equivalent, allowing the same code to be used for both – The sensor is in continual motion, so it moves while the signal travels to and from the ground Light travels 200,000 times faster than sound, so SAR processing can be accelerated by assuming the platform is effectively stationary for each pulse.

GTRI_B-6 ECRB - HPC - 6 GPU SAS - 6 Sonar vs. Radar In general, the sensor is at a different position by the time the signal is received (above). If the propagation is very fast (i.e., speed of light), then the platform can be considered motionless between transmit and receive (below).

GTRI_B-7 ECRB - HPC - 7 GPU SAS - 7 Advantages of Backpropagation FFT-based reconstruction techniques exist – Require either linear or circular collections – Only modest deviations can be compensated – Requires extra steps to get georeferenced imagery Backpropagation is far more expensive, but is the most accurate approach – No constraints on collection geometry: can image during maneuvers – Directly produces imagery located on any map coordinates desired

GTRI_B-8 ECRB - HPC - 8 GPU SAS - 8 Minimum FLOPs Range out9 Estimated r/t time1 Beam Check5 Final receiver position 65 Final platform orientation 6 Construct platform final R 35 Apply R 15 Add platform motion 9 Range In 9 Range->Bin 2 Sample & Interpolate 9 Correlate with ideal reflector 9 Accumulate 2 Total 111 Not needed for Radar

GTRI_B-9 ECRB - HPC - 9 GPU SAS - 9 GPU Backpropagation GTRI SAR/S Toolbox, MATLAB Based Multiple image formations Backpropagation too slow GPU Accelerated plug-in to MATLAB toolbox CUDA/C++ One output pixel per thread Stream groups of pulses to GPU memory Kernel invocation per pulse group

GTRI_B-10 ECRB - HPC - 10 GPU SAS - 10 Direct Optimization Considerations Textures for clamped, interpolated sampling 2-D blocks for range (thus cache) coherency Careful control of register spills Shared memory for (some) local variables Reduced precision transcendentals Recalculate versus lookup Limit index arithmetic

GTRI_B-11 ECRB - HPC - 11 GPU SAS - 11 GPU Ocelot Courtesy, Computer Architecture and Systems Laboratory, Georgia Tech G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, “Ocelot: A Dynamic Optimizing Compiler for Bulk Synchronous Applications in Heterogeneous Systems,” IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, September 2010.

GTRI_B-12 ECRB - HPC - 12 GPU SAS - 12 Productivity Tools for Hybrid Systems Ocelot Dynamic Execution Infrastructure Debugging Memory race detection Bounds checks Profiling/Performance Tuning Alignment behavior Control flow behavior Inter-thread data flow Integration with Front-End profiling tools GLIMPSES (S. Pande)

GTRI_B-13 ECRB - HPC - 13 GPU SAS - 13 Ocelot data

GTRI_B-14 ECRB - HPC - 14 GPU SAS - 14 Ocelot Findings FLOPS per pixel*pulse, too high Did not expect any! tyx = ThreadIdx.x * BLOCK + ThreadIdx.y; share[tyx] = foo; 5% Speedup a = calc1(); b = calc2(); if (a<constant) {… 25% Speedup

GTRI_B-15 ECRB - HPC - 15 GPU SAS - 15 Performance versus Image Size

GTRI_B-16 ECRB - HPC - 16 GPU SAS - 16 Strong Scaling Results

GTRI_B-17 ECRB - HPC - 17 GPU SAS - 17 Performance – Alternate Configurations ConfigurationRun time (s)pp/sFLOPS/ppFLOPS/s SAS E+9 Stop & Hop E+9 Ignore Beam E E+9 SAR (S&H+IB) E E x 3936 Image from 3936 pulses Alternate configurations not optimized GTX 480

GTRI_B-18 ECRB - HPC - 18 GPU SAS - 18 Future Work Further optimization Reoptimize for Fermi Tune for multi-GPU Multi-node Improve error handling, edge cases, etc. Backpropagation server