Patrick Cozzi University of Pennsylvania CIS Spring 2011

Slides:



Advertisements
Similar presentations
GPU-ACCELERATED VIDEO ENCODING/DECODING CIS 565 Spring 2012 Philip Peng, 2012/03/19.
Advertisements

CSE 781 Anti-aliasing for Texture Mapping. Quality considerations So far we just mapped one point – results in bad aliasing (resampling problems) We really.
Combinational Circuit Yan Gu 2 nd Presentation for CS 6260.
Back-Projection on GPU: Improving the Performance Wenlay “Esther” Wei Advisor: Jeff Fessler Mentor: Yong Long April 29, 2010.
Parallel Prefix Sum (Scan) GPU Graphics Gary J. Katz University of Pennsylvania CIS 665 Adapted from articles taken from GPU Gems III.
Real Time Image Feature Vector Generator Employing Functional Cache Memory for Edge Takuki Nakagawa, Department of Electronic Engineering The University.
GPGPU platforms GP - General Purpose computation using GPU
CSS Margins. The Margin properties define the space around elements. It is possible to use negative values to overlap content. The top, right, bottom,
Shekoofeh Azizi Spring  CUDA is a parallel computing platform and programming model invented by NVIDIA  With CUDA, you can send C, C++ and Fortran.
Challenges Bit-vector approach Conclusion & Future Work A subsequence of a string of symbols is derived from the original string by deleting some elements.
Box Method for Factoring Factoring expressions in the form of.
Introduction to CUDA (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2012.
Implementing a Speech Recognition System on a GPU using CUDA
CSC1401 Viewing a picture as a 2D image - 1. Review from the last week We used a getPixels() to get all of the pixels of a picture But this has been somewhat.
CUDA Performance Considerations (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.
NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Lecture 03 Area Based Image Processing Lecture 03 Area Based Image Processing Mata kuliah: T Computer Vision Tahun: 2010.
CUDA Performance Patrick Cozzi University of Pennsylvania CIS Fall
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Fall 2013.
CUDA Performance Considerations (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.
CUDA Performance Considerations (2 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Introduction to CUDA (1 of n*) Patrick Cozzi University of Pennsylvania CIS Spring 2011 * Where n is 2 or 3.
CUDA Basics. Overview What is CUDA? Data Parallelism Host-Device model Thread execution Matrix-multiplication.
OpenCL Joseph Kider University of Pennsylvania CIS Fall 2011.
COMPUTER PROGRAMMING I SUMMER 2011 Objective 8.01 Understand coordinate systems. (3%)
CUDA Odds and Ends Patrick Cozzi University of Pennsylvania CIS Fall 2013.
Parallel Algorithms Continued Patrick Cozzi University of Pennsylvania CIS Spring 2012.
CS/EE 217 GPU Architecture and Parallel Programming Midterm Review
Section 4.4. a) Factor x 2 + 5x - 36 (x + 9)(x – 4) b) Factor 2x 2 – 20x + 18 Hint: factor out a 2 first 2(x-9)(x-1) c) Factor 2x 2 + x – 6 Why doesn’t.
Fall 2008Simple Parallel Algorithms1. Fall 2008Simple Parallel Algorithms2 Scalar Product of Two Vectors Let a = (a 1, a 2, …, a n ); b = (b 1, b 2, …,
Weekly Report- Reduction Ph.D. Student: Leo Lee date: Oct. 30, 2009.
What is Matrix Multiplication? Matrix multiplication is the process of multiplying two matrices together to get another matrix. It differs from scalar.
Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2014.
GPGPU: Parallel Reduction and Scan Joseph Kider University of Pennsylvania CIS Fall 2011 Credit: Patrick Cozzi, Mark Harris Suresh Venkatensuramenan.
Compressing Bi-Level Images by Block Matching on a Tree Architecture Sergio De Agostino Computer Science Department Sapienza University of Rome ITALY.
Fast and parallel implementation of Image Processing Algorithm using CUDA Technology On GPU Hardware Neha Patil Badrinath Roysam Department of Electrical.
Parallel primitives – Scan operation CDP – Written by Uri Verner 1 GPU Algorithm Design.
Data Parallel Computations and Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson, slides6c.ppt Nov 4, c.1.
Chapter 5 Integrals 5.1 Areas and Distances
CS/EE 217 – GPU Architecture and Parallel Programming
CMSC5711 Image processing and computer vision
A Parents’ Guide to Alternative Algorithms
Backprojection Project Update January 2002
Image Convolution with CUDA
Image Transformation 4/30/2009
Box Method for Factoring
Box Method for Factoring
Image Processing & Perception
Anti-aliasing for Texture Mapping
CMSC5711 Revision (1) (v.7.b) revised
CMSC5711 Image processing and computer vision
Sequence Pair Representation
NVIDIA Fermi Architecture
DRAM Bandwidth Slide credit: Slides adapted from
CS/EE 217 – GPU Architecture and Parallel Programming
Branch and Bound.
GPGPU: Parallel Reduction and Scan
© 2012 Elsevier, Inc. All rights reserved.
Parallel build blocks.
Patrick Cozzi University of Pennsylvania CIS Spring 2011
Convolution Layer Optimization
Mattan Erez The University of Texas at Austin
Data Parallel Pattern 6c.1
Parallelized Analytic Placer
Data Parallel Computations and Pattern
Areas and Distances In this handout: The Area problem
Data Parallel Computations and Pattern
Presentation transcript:

Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Summed Area Tables Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011

Sources Patrick Cozzi Spring 2011 NVIDIA CUDA Programming Guide CUDA by Example Programming Massively Parallel Processors 1 to 8. Maybe 1 if all threads are in the same warp and issued at the same time. 2

Summed Area Table Summed Area Table (SAT): 2D table where each element stores the sum of all elements in an input image between the lower left corner and the entry location.

Summed Area Table Example: (1 + 1 + 0) + (1 + 2 + 1) + (0 + 1 + 2) = 9 Input image SAT 2 1 4 9 12 14 1 2 2 6 9 11 1 2 1 2 5 6 8 1 1 2 1 2 2 4 (1 + 1 + 0) + (1 + 2 + 1) + (0 + 1 + 2) = 9

Summed Area Table Benefit Used to perform different width filters at every pixel in the image in constant time per pixel Just sample four pixels in SAT: w and h are the width and height of the filter kernel Image from http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html

Summed Area Table Uses Glossy environment reflections and refractions Approximate depth of field Objects far from the focal length are blurry, while objects at the focal length are in focus. Image from http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html

Summed Area Table Input image SAT 2 1 1 2 1 2 1 1 1 2

Summed Area Table Input image SAT 2 1 1 2 1 2 1 1 1 2 1

Summed Area Table Input image SAT 2 1 1 2 1 2 1 1 1 2 1 2

Summed Area Table Input image SAT 2 1 1 2 1 2 1 1 1 2 1 2 2

Summed Area Table Input image SAT 2 1 1 2 1 2 1 1 1 2 1 2 2 4

Summed Area Table Input image SAT 2 1 1 2 1 2 1 2 1 1 2 1 2 2 4

Summed Area Table Input image SAT 2 1 1 2 1 2 1 2 5 1 1 2 1 2 2 4

Summed Area Table …

Summed Area Table Input image SAT 2 1 4 9 1 2 2 6 9 11 1 2 1 2 5 6 8 1 1 2 1 2 2 4

Summed Area Table Input image SAT 2 1 4 9 12 1 2 2 6 9 11 1 2 1 2 5 6 8 1 1 2 1 2 2 4

Summed Area Table Input image SAT 2 1 4 9 12 14 1 2 2 6 9 11 1 2 1 2 5 6 8 1 1 2 1 2 2 4

Summed Area Table How would implement this on the GPU?

Summed Area Table Recall Inclusive Scan: 1 2 3 4 5 6 7 1 3 6 10 15 21 1 2 3 4 5 6 7 1 3 6 10 15 21 28

How would compute a SAT on the GPU using inclusive scan? Summed Area Table How would compute a SAT on the GPU using inclusive scan?

Summed Area Table Step 1 of 2: 2 1 2 3 3 3 1 2 1 3 3 1 2 1 1 3 4 4 1 1 Input image Partial SAT 2 1 2 3 3 3 1 2 1 3 3 1 2 1 1 3 4 4 1 1 2 1 2 2 4 One inclusive scan for each row

Summed Area Table Step 2 of 2: 2 3 3 3 4 9 12 14 1 3 3 2 6 9 11 1 3 4 Partial SAT Final SAT 2 3 3 3 4 9 12 14 1 3 3 2 6 9 11 1 3 4 4 2 5 6 8 1 2 2 4 1 2 2 4 One inclusive scan for each Column, bottom to top