1
Blocked 2D Convolution
Ravi Sankar P Nair
2
Implement 2D Convolution
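A minimal CPU reference sketch of the operation, for orientation only. The slide's own source is not reproduced here, so the 5x5 mask width (`KERNEL_SIZE`), the zero treatment of out-of-range neighbours, and the function name `convolve2D` are all assumptions; M is the mask, N the input, P the output, as named in the later slides.

```cpp
#include <cstddef>
#include <vector>

// Assumed mask width; the slides do not state it.
constexpr int KERNEL_SIZE = 5;

// Computes P from mask M and input N (both row-major).
// M is KERNEL_SIZE x KERNEL_SIZE; N and P are height x width.
// The mask is applied without flipping, and neighbours that fall
// outside N are treated as zero (both are assumptions of this sketch).
std::vector<float> convolve2D(const std::vector<float>& M,
                              const std::vector<float>& N,
                              int height, int width) {
    const int r = KERNEL_SIZE / 2;  // mask radius
    std::vector<float> P(static_cast<std::size_t>(height) * width, 0.0f);
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            float sum = 0.0f;
            for (int m = 0; m < KERNEL_SIZE; ++m) {
                for (int n = 0; n < KERNEL_SIZE; ++n) {
                    const int y = i + m - r;  // input row under mask row m
                    const int x = j + n - r;  // input column under mask column n
                    if (y >= 0 && y < height && x >= 0 && x < width)
                        sum += M[m * KERNEL_SIZE + n] * N[y * width + x];
                }
            }
            P[i * width + j] = sum;
        }
    }
    return P;
}
```

A mask that is zero everywhere except for a 1 at its centre returns N unchanged, which makes a quick sanity check.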
Source:
3
Implement 2D Convolution.cpp in GPU Kernel
4
Implement 2D Convolution.cpp in GPU Kernel
Use Constant memory to store M matrix
5
Implement 2D Convolution.cpp in GPU Kernel
Use Constant memory to store M matrix
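The block decomposition behind a "blocked" kernel can be sketched on the CPU. This is plain C++ that only mirrors the structure, not the CUDA kernel from the slides: each outer iteration plays the role of one thread block, and the `tile` buffer stands in for on-chip storage. In CUDA the mask M would be declared `__constant__` and such a staging buffer `__shared__`. The tile edge `TILE`, the mask width `MASK_WIDTH`, the zero-filled halo, and the name `convolve2DTiled` are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int MASK_WIDTH = 5;  // assumed mask width
constexpr int TILE = 16;       // assumed output tile edge (one "block")

// Tiled 2D convolution: P from mask M and input N, row-major, height x width.
std::vector<float> convolve2DTiled(const std::vector<float>& M,
                                   const std::vector<float>& N,
                                   int height, int width) {
    const int r = MASK_WIDTH / 2;
    const int in_tile = TILE + 2 * r;  // output tile plus halo on each side
    std::vector<float> P(static_cast<std::size_t>(height) * width, 0.0f);
    std::vector<float> tile(static_cast<std::size_t>(in_tile) * in_tile);

    for (int by = 0; by < height; by += TILE) {     // block row
        for (int bx = 0; bx < width; bx += TILE) {  // block column
            // Stage the input tile and its halo; out-of-range cells become 0.
            for (int ty = 0; ty < in_tile; ++ty) {
                for (int tx = 0; tx < in_tile; ++tx) {
                    const int y = by + ty - r;
                    const int x = bx + tx - r;
                    const bool inside =
                        y >= 0 && y < height && x >= 0 && x < width;
                    tile[ty * in_tile + tx] = inside ? N[y * width + x] : 0.0f;
                }
            }
            // Compute every output element that belongs to this block.
            const int y_end = std::min(by + TILE, height);
            const int x_end = std::min(bx + TILE, width);
            for (int i = by; i < y_end; ++i) {
                for (int j = bx; j < x_end; ++j) {
                    float sum = 0.0f;
                    for (int m = 0; m < MASK_WIDTH; ++m)
                        for (int n = 0; n < MASK_WIDTH; ++n)
                            sum += M[m * MASK_WIDTH + n] *
                                   tile[(i - by + m) * in_tile + (j - bx + n)];
                    P[i * width + j] = sum;
                }
            }
        }
    }
    return P;
}
```

Staging the halo once per block removes the per-element boundary test from the inner loop, which is the usual motivation for tiling.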
6
Performance Testing CPU vs. GPU
1. What is the measured floating-point computation rate for the CPU and GPU kernels on this application? How do they each scale with the size of the input?
#include <sys/time.h>
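A minimal sketch of microsecond timing with `gettimeofday` from `<sys/time.h>`. The slide's timing code is not reproduced in this transcript, so the helper names `wallclockMicros` and `timeMicros` are hypothetical.

```cpp
#include <sys/time.h>

// Wall-clock time in microseconds since the epoch.
long long wallclockMicros() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<long long>(tv.tv_sec) * 1000000LL + tv.tv_usec;
}

// Runs `work` once and returns the elapsed wall-clock time in microseconds.
template <typename F>
long long timeMicros(F&& work) {
    const long long start = wallclockMicros();
    work();
    return wallclockMicros() - start;
}
```

When the timed work is a CUDA kernel launch, the launch returns before the kernel finishes, so the host would need to synchronize (for example with `cudaDeviceSynchronize()`) inside `work` before the stop time is read.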
7
Performance Testing CPU vs. GPU
1. What is the measured floating-point computation rate for the CPU and GPU kernels on this application? How do they each scale with the size of the input?
Alternate Timer method
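The transcript does not say which alternate timer was used. One common alternative, shown here purely as an assumption, is a monotonic clock such as `std::chrono::steady_clock`, which is not affected by wall-clock adjustments. The helper name `timeMicrosSteady` is hypothetical.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed time in microseconds,
// measured on a monotonic clock.
template <typename F>
long long timeMicrosSteady(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```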
8
Performance Testing CPU vs. GPU
1. What is the measured floating-point computation rate for the CPU and GPU kernels on this application? How do they each scale with the size of the input?
#include <sys/time.h>
9
Performance Testing CPU vs. GPU
2. How much time is spent as an overhead cost of using the GPU for computation? Consider all code executed within your host function, with the exception of the kernel itself, as overhead. How does the overhead scale with the size of the input?
10
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 480 (pacman.ddns.uark.edu).

Total Setup = Setup M,N + Setup GPU call
Overhead GPU = Setup GPU call – GPU kernel
Overhead Setup = Total Setup – GPU kernel
Overhead Main = Total Main program – GPU kernel

| N and P | Total Main Program (B) | Setup/read M,N files (C) | Setup GPU Function call (D) | Total Setup (E) = C+D | CPU Kernel | GPU Kernel (F) | OvH GPU = D-F | OvH Setup = E-F | OvH Main = B-F |
|---|---|---|---|---|---|---|---|---|---|
| 281x80 | 66139 | 1692 | 62634 | 64326 | 1354 | 50 | 62584 | 64276 | 66089 |
| 32x32 | 71080 | 907 | 70042 | 70949 | 57 | 36 | 70006 | 70913 | 71044 |
| 64x64 | 72355 | 3075 | 68917 | 71992 | 238 | 38 | 68879 | 71954 | 72317 |
| 128x128 | 85528 | 10438 | 73781 | 84219 | 985 | 47 | 73734 | 84172 | 85481 |
| 256x256 | 116027 | 29614 | 81206 | 110820 | 3975 | 82 | 81124 | 110738 | 115945 |
| 512x512 | 206072 | 105901 | 79420 | 185321 | 15977 | 224 | 79196 | 185097 | 205848 |
| 1024x1024 | 572661 | 411698 | 78001 | 489699 | 64130 | 844 | 77157 | 488855 | 571817 |
| 2048x2048 | | | 85603 | | 256114 | 3089 | 82514 | | |
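The derived columns follow directly from the four definitions on this slide. A small sketch that applies them to one row of measurements; the struct and function names are hypothetical, and all values are in microseconds.

```cpp
// Measured values for one input size (microseconds).
struct Timings {
    long long totalMain;     // Total Main Program
    long long setupFiles;    // Setup/read M,N files
    long long setupGpuCall;  // Setup GPU Function call
    long long gpuKernel;     // GPU Kernel
};

// Derived values, as defined on the slide.
struct Overheads {
    long long totalSetup;  // Setup M,N + Setup GPU call
    long long ovhGpu;      // Setup GPU call - GPU kernel
    long long ovhSetup;    // Total Setup - GPU kernel
    long long ovhMain;     // Total Main program - GPU kernel
};

Overheads deriveOverheads(const Timings& t) {
    Overheads o;
    o.totalSetup = t.setupFiles + t.setupGpuCall;
    o.ovhGpu = t.setupGpuCall - t.gpuKernel;
    o.ovhSetup = o.totalSetup - t.gpuKernel;
    o.ovhMain = t.totalMain - t.gpuKernel;
    return o;
}
```

For the 281x80 measurements (66139, 1692, 62634, and a GPU kernel time of 50) this gives 64326, 62584, 64276 and 66089, matching the values reported for that input size.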
11
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 480 (pacman.ddns.uark.edu), alternate timer.

Total Setup = Setup M,N + Setup GPU call
Overhead GPU = Setup GPU call – GPU kernel
Overhead Setup = Total Setup – GPU kernel
Overhead Main = Total Main program – GPU kernel

| N and P | Total Main Program (B) | Setup/read M,N files (C) | Setup GPU Function call (D) | Total Setup (E) = C+D | CPU Kernel | GPU Kernel (F) | OvH GPU = D-F | OvH Setup = E-F | OvH Main = B-F |
|---|---|---|---|---|---|---|---|---|---|
| 281x80 | 66214 | 1681 | 62724 | 64405 | 1355 | 50 | 62674 | 64355 | 66164 |
| 32x32 | 82312 | 907 | 81274 | 82181 | 57 | 36 | 81238 | 82145 | 82276 |
| 64x64 | 70401 | 3087 | 66953 | 70040 | 236 | 38 | 66915 | 70002 | 70363 |
| 128x128 | 86663 | 10449 | 74909 | 85358 | 982 | 47 | 74862 | 85311 | 86616 |
| 256x256 | 115126 | 29564 | 80363 | 109927 | 3973 | 83 | 80280 | 109844 | 115043 |
| 512x512 | 204261 | 105868 | 77645 | 183513 | 15990 | 221 | 77424 | 183292 | 204040 |
| 1024x1024 | 578057 | 411822 | 83242 | 495064 | 64099 | 843 | 82399 | 494221 | 577214 |
| 2048x2048 | | | 81614 | | 256660 | | 78527 | | |
12
Performance Testing CPU vs. GPU
Run on GTX 480 pacman.ddns.uark.edu
13
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 295 (stargate.uark.edu).

Total Setup = Setup M,N + Setup GPU call
Overhead GPU = Setup GPU call – GPU kernel
Overhead Setup = Total Setup – GPU kernel
Overhead Main = Total Main program – GPU kernel

| N and P | (unlabeled) | CPU Kernel | GPU Kernel |
|---|---|---|---|
| 281x80 | 1335 | 1215 | 86 |
| 32x32 | 2127 | 69 | 64 |
| 64x64 | 45163 | 220 | 60 |
| 128x128 | 55181 | 875 | 76 |
| 256x256 | 91452 | 3459 | 157 |
| 512x512 | 275434 | 13832 | 455 |
| 1024x1024 | 811408 | 55499 | 1669 |
| 2048x2048 | | 238949 | 6552 |
14
Performance Testing CPU vs. GPU
Run on GTX 295 stargate.uark.edu