Published by Louise Warren. Modified over 8 years ago.
1
A Parallel Implementation of MSER Detection. GPGPU Final Project. Lin Cao
2
Review: MSER (maximally stable extremal regions) denotes a set of stable connected components detected in a grayscale image. The regions are invariant to affine transformations such as rotation, translation, and scale change.
3
Review: an MSER is a stable connected component of a thresholded image. All pixels inside an MSER have either higher or lower intensities than the pixels in the surrounding region. Regions are selected to be stable over a range of intensity thresholds.
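To make the definition concrete, here is a minimal Python sketch that thresholds a small grayscale image and collects the 4-connected components of pixels at or below the threshold. The function name `components_at_threshold`, the choice of dark regions (values `<= t`), and 4-connectivity are assumptions made for illustration; the 4x4 values are borrowed from the buildDirectedGraph slide.

```python
from collections import deque

# Sample 4x4 grayscale image (values taken from the buildDirectedGraph slide).
IMG = [
    [75, 78, 56, 62],
    [50, 58, 55, 53],
    [80, 65, 64, 60],
    [65, 55, 50, 55],
]

def components_at_threshold(img, t):
    """Return the 4-connected components of pixels with value <= t."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or img[r][c] > t:
                continue
            # Breadth-first flood fill from an unvisited pixel below the threshold.
            comp = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] <= t):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps
```

Sweeping `t` over the intensity range and tracking how slowly each component's area grows is what "stable over a range of thresholds" refers to.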
4
Sequential and Parallel Approach
Sequential { bucketSort(); Find(); Union(); Update(); GetRegion(); computeVariation(); leastVariation(); }
Parallel { buildDirectedGraph(); blockReduction(); parentCompression(); /* regions are already obtained at this point */ computeVariation(); findRoot(); leastVariation(); }
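The sequential column can be sketched roughly as follows: bucket-sort pixels by intensity, then add them in increasing order and merge each with its already-added neighbors using union-find. This is a simplified illustration, not the author's code; the helper names and the choice to report the number of components per gray level are assumptions.

```python
IMG = [
    [75, 78, 56, 62],
    [50, 58, 55, 53],
    [80, 65, 64, 60],
    [65, 55, 50, 55],
]

def bucket_sort_pixels(img, levels=256):
    """Group pixel coordinates by intensity, one bucket per gray level."""
    buckets = [[] for _ in range(levels)]
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            buckets[value].append((r, c))
    return buckets

def find(parent, p):
    """Return the root of p, shortening the path along the way."""
    while parent[p] != p:
        parent[p] = parent[parent[p]]  # path halving
        p = parent[p]
    return p

def union(parent, area, a, b):
    """Merge the components of a and b; return True if a merge happened."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    if area[ra] < area[rb]:
        ra, rb = rb, ra
    parent[rb] = ra          # attach the smaller tree under the larger one
    area[ra] += area[rb]
    return True

def component_counts(img):
    """Map each occupied gray level to the number of components at that level."""
    parent, area, counts = {}, {}, {}
    n = 0
    for level, pixels in enumerate(bucket_sort_pixels(img)):
        if not pixels:
            continue
        for (r, c) in pixels:
            parent[(r, c)] = (r, c)
            area[(r, c)] = 1
            n += 1
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                # Only neighbors that were already added are in `parent`.
                if nb in parent and union(parent, area, (r, c), nb):
                    n -= 1
        counts[level] = n
    return counts
```

Each `union` depends on the result of earlier ones, which is the data dependency the parallel column has to work around.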
5
buildDirectedGraph: the parent of each pixel should have a value no less than the pixel's own value.
75 78 56 62
50 58 55 53
80 65 64 60
65 55 50 55
Local memory: visited, members. Shared memory.
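One way to satisfy the parent rule is sketched below in Python: every pixel points to a 4-neighbor whose value is no less than its own, and a pixel with no such neighbor becomes its own parent. The specific selection rule (the smallest qualifying neighbor, with ties broken by linear index so that equal-valued pixels cannot point at each other) is an assumption for illustration; the slide does not state which neighbor is chosen.

```python
IMG = [
    [75, 78, 56, 62],
    [50, 58, 55, 53],
    [80, 65, 64, 60],
    [65, 55, 50, 55],
]

def build_directed_graph(img):
    """Return {pixel: parent} where img[parent] >= img[pixel] for every pixel."""
    h, w = len(img), len(img[0])
    parent = {}
    for r in range(h):
        for c in range(w):
            # (value, linear index) gives a strict total order, so no cycles arise.
            own_key = (img[r][c], r * w + c)
            best = None
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w:
                    key = (img[nr][nc], nr * w + nc)
                    if key > own_key and (best is None or key < best[0]):
                        best = (key, (nr, nc))
            # Pixels with no larger neighbor are roots and point to themselves.
            parent[(r, c)] = best[1] if best else (r, c)
    return parent
```

Because each pixel only reads its neighbors, this step has no inter-pixel dependency and maps naturally onto one GPU thread per pixel.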
6
buildDirectedGraph
75 78 56 62
50 58 55 53
80 65 64 60
65 55 50 55
Memory usage: local memory (visited, members); shared memory. Edges are also processed for the next step.
7
Block Reduction 16*16, 8*8
8
Block Reduction 16*16, 8*8
9
Block Reduction 16*16, 8*8
10
Block Reduction: in total, 3 iterations are needed (log2 4 = 2 and log2 2 = 1).
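The iteration count can be checked with a small Python sketch. It assumes, as an interpretation of the slide rather than something it states, that the image is tiled into a 4 by 2 grid of blocks and that neighboring blocks are merged pairwise along each axis, halving the block count per iteration. The function names are hypothetical.

```python
def pairwise_merge_steps(n):
    """Number of halving rounds needed to merge n blocks along one axis into one."""
    steps = 0
    while n > 1:
        n = (n + 1) // 2  # each round merges neighboring pairs
        steps += 1
    return steps

def block_reduction_iterations(blocks_x, blocks_y):
    """Total merge rounds for a blocks_x by blocks_y grid (assumed layout)."""
    return pairwise_merge_steps(blocks_x) + pairwise_merge_steps(blocks_y)
```

Under that assumption, `block_reduction_iterations(4, 2)` gives the 3 iterations quoted on the slide.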
11
Block Reduction: if (horizontal_pixelUpdate), load edge information to each pixel. [Figure: example grid of pixel values with block-edge pixels]
12
Block Reduction History buffer
13
Parent Compression
75 78 56 62
50 58 56 58
80 58 54 58
65 55 58 55
Shared memory, based on parent locality.
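Parent compression is commonly implemented with pointer jumping, where every node repeatedly replaces its parent with its grandparent until all nodes point directly at their root. The Python sketch below assumes that technique and a flat list of parent indices; neither detail is confirmed by the slide, and `compress_parents` is a hypothetical name.

```python
def compress_parents(parent):
    """Flatten a parent-index forest so every node points at its root.

    Returns the compressed list and the number of jump rounds that changed it.
    Roots are nodes with parent[i] == i.
    """
    parent = list(parent)  # work on a copy
    rounds = 0
    while True:
        # All nodes jump at once, as parallel threads would.
        jumped = [parent[parent[i]] for i in range(len(parent))]
        if jumped == parent:
            return parent, rounds
        parent = jumped
        rounds += 1
```

For example, `compress_parents([1, 2, 3, 3])` flattens the chain 0 -> 1 -> 2 -> 3 in two rounds, since each round doubles the distance a pointer covers.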
14
FindRegion: run FindRoot so that each region's tree can be processed separately. Find each region's parent and child based on the delta, so that the variation can be computed: var = (area(parent) - area(child)) / area(current region). Send the region information to the CPU. Scan every region's tree and find the minimal variation, which identifies the MSER regions. Filter the regions.
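The variation formula can be sketched in Python as follows. The sketch assumes a single chain of nested regions whose areas are listed by increasing threshold level, with the "parent" taken `delta` levels above and the "child" `delta` levels below the current region; the list layout and function names are illustrative assumptions.

```python
def variation(areas, i, delta):
    """var = (area(parent) - area(child)) / area(current region)."""
    return (areas[i + delta] - areas[i - delta]) / areas[i]

def least_variation_index(areas, delta):
    """Index of the region with the smallest variation along one chain."""
    # Only levels with both a parent and a child delta steps away qualify.
    candidates = range(delta, len(areas) - delta)
    return min(candidates, key=lambda i: variation(areas, i, delta))

# Hypothetical areas of one nested region chain at successive threshold levels.
areas = [10, 12, 13, 14, 30, 60]
best = least_variation_index(areas, delta=1)
```

Here the region at index 2 grows the least relative to its size, so it is the one selected from this chain.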
15
Performance Analysis For 256*256 image,
16
Performance Analysis For 1024*768 image,
17
Performance Analysis: why is 8*8 better than 16*16? Local memory usage; recursion times; block execution; block reduction times; parent locality.
18
Performance Analysis: GPU vs CPU. Timing; intermediate values; synchronization; record information; memory transfer.
19
Conclusion: the data dependency is very large, but the problem can still be solved in parallel. The algorithm should suit multicore microprocessors, whose individual cores are stronger than a single GPU thread. The bottleneck is still memory.
20
Future Work: more efficient block reduction (decoder and encoder); random memory access; GPU code efficiency.