Blocked 2D Convolution Ravi Sankar P Nair 010469036
Implement 2D Convolution Source: http://www.songho.ca/dsp/convolution/convolution2d_example.html
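A minimal C++ sketch of the CPU version, assuming a row-major image, zero padding at the borders, and a kernel flipped in both directions as in the referenced songho.ca example. The function name convolve2D and its parameters are illustrative and are not taken from the author's Convolution.cpp.

```cpp
#include <cstddef>
#include <vector>

// 2D convolution producing an output the same size as the input.
// Assumptions (not from the slides): row-major storage, zero padding
// outside the image, kernel center at (kRows / 2, kCols / 2).
std::vector<float> convolve2D(const std::vector<float>& in, int rows, int cols,
                              const std::vector<float>& kernel, int kRows, int kCols) {
    std::vector<float> out(static_cast<std::size_t>(rows) * cols, 0.0f);
    const int kCenterY = kRows / 2;
    const int kCenterX = kCols / 2;

    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float sum = 0.0f;
            for (int mm = 0; mm < kRows; ++mm) {
                for (int nn = 0; nn < kCols; ++nn) {
                    // Kernel tap (mm, nn) pairs with the input sample mirrored
                    // about the kernel center: this is the kernel "flip".
                    const int ii = i + (kCenterY - mm);
                    const int jj = j + (kCenterX - nn);
                    // Samples outside the image count as zero.
                    if (ii >= 0 && ii < rows && jj >= 0 && jj < cols) {
                        sum += in[ii * cols + jj] * kernel[mm * kCols + nn];
                    }
                }
            }
            out[i * cols + j] = sum;
        }
    }
    return out;
}
```

For the 3x3 input 1..9 and the kernel [-1 -2 -1; 0 0 0; 1 2 1], the center output element is -24.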
Implement 2D Convolution.cpp in GPU Kernel
Implement 2D Convolution.cpp in GPU Kernel Use Constant memory to store M matrix
Performance Testing CPU vs. GPU What is the measured floating-point computation rate for the CPU and GPU kernels on this application? How do they each scale with the size of the input? #include <sys/time.h>
Performance Testing CPU vs. GPU What is the measured floating-point computation rate for the CPU and GPU kernels on this application? How do they each scale with the size of the input? Alternate Timer method
Performance Testing CPU vs. GPU 2. How much time is spent as an overhead cost of using the GPU for computation? Consider all code executed within your host function, with the exception of the kernel itself, as overhead. How does the overhead scale with the size of the input?
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 480 pacman.ddns.uark.edu
Total Setup = Setup M,N + Setup GPU call
OvH GPU = Setup GPU call – GPU Kernel
OvH Setup = Total Setup – GPU Kernel
OvH Main = Total Main – GPU Kernel

N and P    Total Main  Setup M,N  Setup GPU call  Total Setup  CPU Kernel  GPU Kernel  OvH GPU  OvH Setup  OvH Main
281x80          66139       1692           62634        64326        1354          50    62584      64276     66089
32x32           71080        907           70042        70949          57          36    70006      70913     71044
64x64           72355       3075           68917        71992         238          38    68879      71954     72317
128x128         85528      10438           73781        84219         985          47    73734      84172     85481
256x256        116027      29614           81206       110820        3975          82    81124     110738    115945
512x512        206072     105901           79420       185321       15977         224    79196     185097    205848
1024x1024      572661     411698           78001       489699       64130         844    77157     488855    571817
2048x2048     2061625    1644657           85603      1730260      256114        3089    82514    1727171   2058536
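The overhead columns in the table follow directly from four measured times. A small sketch of that arithmetic, with hypothetical struct and function names chosen for illustration:

```cpp
// All values are in microseconds.
struct Timings {
    long long totalMain;     // whole main program
    long long setupFiles;    // setup and reading of the M and N files
    long long setupGpuCall;  // host function that wraps the kernel launch
    long long gpuKernel;     // GPU kernel only
};

struct Overheads {
    long long totalSetup;     // Setup M,N + Setup GPU call
    long long gpuOverhead;    // Setup GPU call - GPU kernel
    long long setupOverhead;  // Total Setup - GPU kernel
    long long mainOverhead;   // Total Main - GPU kernel
};

// Applies the definitions given with the table.
Overheads computeOverheads(const Timings& t) {
    Overheads o;
    o.totalSetup = t.setupFiles + t.setupGpuCall;
    o.gpuOverhead = t.setupGpuCall - t.gpuKernel;
    o.setupOverhead = o.totalSetup - t.gpuKernel;
    o.mainOverhead = t.totalMain - t.gpuKernel;
    return o;
}
```

With the 281x80 row from the GTX 480 run (66139, 1692, 62634, 50), this gives a Total Setup of 64326 and overheads of 62584, 64276, and 66089, matching the table.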
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 480 pacman.ddns.uark.edu (Alternate Timer)
Total Setup = Setup M,N + Setup GPU call
OvH GPU = Setup GPU call – GPU Kernel
OvH Setup = Total Setup – GPU Kernel
OvH Main = Total Main – GPU Kernel

N and P    Total Main  Setup M,N  Setup GPU call  Total Setup  CPU Kernel  GPU Kernel  OvH GPU  OvH Setup  OvH Main
281x80          66214       1681           62724        64405        1355          50    62674      64355     66164
32x32           82312        907           81274        82181          57          36    81238      82145     82276
64x64           70401       3087           66953        70040         236          38    66915      70002     70363
128x128         86663      10449           74909        85358         982          47    74862      85311     86616
256x256        115126      29564           80363       109927        3973          83    80280     109844    115043
512x512        204261     105868           77645       183513       15990         221    77424     183292    204040
1024x1024      578057     411822           83242       495064       64099         843    82399     494221    577214
2048x2048     2048527    1635106           81614      1716720      256660        3087    78527    1713633   2045440
Performance Testing CPU vs. GPU Run on GTX 480 pacman.ddns.uark.edu
Performance Testing CPU vs. GPU
Table shows values in microseconds. Run on GTX 295 stargate.uark.edu
Total Setup = Setup M,N + Setup GPU call
OvH GPU = Setup GPU call – GPU Kernel
OvH Setup = Total Setup – GPU Kernel
OvH Main = Total Main – GPU Kernel

N and P    Total Main  Setup M,N  Setup GPU call  Total Setup  CPU Kernel  GPU Kernel  OvH GPU  OvH Setup  OvH Main
281x80        2796273       1335         2793075      2794410        1215          86  2792989    2794324   2796187
32x32         2820670       2127         2818379      2820506          69          64  2818315    2820442   2820606
64x64         2845781      45163         2800109      2845272         220          60  2800049    2845212   2845721
128x128       2876348      55181         2819790      2874971         875          76  2819714    2874895   2876272
256x256       2927615      91452         2831007      2922459        3459         157  2830850    2922302   2927458
512x512       3130441     275434         2834679      3110113       13832         455  2834224    3109658   3129986
1024x1024     3711026     811408         2818357      3629765       55499        1669  2816688    3628096   3709357
2048x2048     6261147    3072243         2842964      5915207      238949        6552  2836412    5908655   6254595
Performance Testing CPU vs. GPU Run on GTX 295 stargate.uark.edu