A Parallel Implementation of MSER Detection. GPGPU Final Project, Lin Cao.

Review: MSER (Maximally Stable Extremal Regions) detection is invariant to affine transformations such as rotation, translation, and scale change. MSERs are a set of stable connected components detected in a grayscale image.

Review: An MSER is a stable connected component of a thresholded image. All pixels inside an MSER have either higher or lower intensities than all pixels on its outer boundary (the surrounding region). Regions are selected for being stable over a range of intensity thresholds.
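The two properties above can be illustrated with a minimal Python sketch, assuming 4-connectivity and "dark" regions (pixels at or below a threshold t). The function names are hypothetical and not taken from the original project: `component_at_threshold` extracts the thresholded connected component containing a seed pixel, and `is_extremal` checks that every pixel inside is darker than every pixel on the region's outer boundary.

```python
from collections import deque

NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def component_at_threshold(img, seed, t):
    """Return the 4-connected component of {p : img[p] <= t} that contains seed."""
    h, w = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return set()
    comp = {seed}
    queue = deque([seed])
    while queue:  # breadth-first flood fill over pixels at or below t
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS_4:
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and (rr, cc) not in comp and img[rr][cc] <= t:
                comp.add((rr, cc))
                queue.append((rr, cc))
    return comp


def is_extremal(img, region):
    """True if every pixel of the (non-empty) region is darker than every boundary pixel."""
    h, w = len(img), len(img[0])
    boundary = {
        (r + dr, c + dc)
        for r, c in region
        for dr, dc in NEIGHBORS_4
        if 0 <= r + dr < h and 0 <= c + dc < w and (r + dr, c + dc) not in region
    }
    if not boundary:  # the region covers the whole image
        return True
    return max(img[r][c] for r, c in region) < min(img[r][c] for r, c in boundary)


img = [
    [9, 9, 9, 9],
    [9, 1, 2, 9],
    [9, 2, 1, 9],
    [9, 9, 9, 9],
]
# Area of the seed's component for thresholds t = 1..9.
areas = [len(component_at_threshold(img, (1, 1), t)) for t in range(1, 10)]
```

In this toy image the seed's component grows from 1 to 4 pixels at t = 2 and then keeps the same 4 pixels for every threshold up to 8, which is the kind of stability over an intensity range that MSER looks for.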

Sequential and Parallel Approach

Sequential {
    bucketSort();
    Find();
    Union();
    Update();
    GetRegion();
    computeVariation();
    leastVariation();
}

Parallel {
    buildDirectedGraph();
    blockReduction();
    parentCompression();
    // regions are already available at this point
    computeVariation();
    findRoot();
    leastVariation();
}
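The sequential column can be illustrated with a standard union-find construction of the region hierarchy. This is a minimal Python sketch under stated assumptions (8-bit intensities, 4-connectivity, dark-to-bright processing), not the project's code, and the mapping of its steps to the slide's function names is an interpretation: pixels are bucket-sorted by intensity (bucketSort), activated in that order and merged with already-active neighbours (Find/Union), and each surviving root accumulates the merged area (Update). For every intensity level present, it records the areas of the regions of {pixels ≤ t} (GetRegion), which is the input a variation computation needs. `component_areas_by_level` is a hypothetical name.

```python
def component_areas_by_level(img):
    """For each intensity level t present in img, return the sorted areas of the
    4-connected components of {pixels with value <= t}. Assumes values in 0..255."""
    h, w = len(img), len(img[0])
    n = h * w
    # bucketSort: one bucket per 8-bit intensity value
    buckets = [[] for _ in range(256)]
    for i in range(n):
        buckets[img[i // w][i % w]].append(i)

    parent = list(range(n))
    area = [1] * n
    active = [False] * n

    def find(i):
        # Find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if area[ra] < area[rb]:  # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        area[ra] += area[rb]  # Update: the surviving root stores the merged area

    result = {}
    for t in range(256):
        if not buckets[t]:
            continue
        for i in buckets[t]:
            active[i] = True
            r, c = divmod(i, w)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and active[rr * w + cc]:
                    union(i, rr * w + cc)
        # GetRegion: one region per root among the active pixels
        roots = {find(i) for i in range(n) if active[i]}
        result[t] = sorted(area[root] for root in roots)
    return result
```

For `[[1, 3], [2, 1]]`, the two 1-valued pixels are diagonal and stay separate at t = 1, join through the 2-valued pixel at t = 2, and cover the whole image at t = 3. Rescanning all pixels per level keeps the sketch short; a real implementation would track regions incrementally.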

buildDirectedGraph: the parent of each pixel must have a value no less than the pixel's own value. Example 4×4 block:
75 78 56 62
50 58 55 53
80 65 64 60
65 55 50 55
Memory usage: local memory (visited, members); shared memory.

buildDirectedGraph (continued), same 4×4 example block. Memory usage: local memory for the visited and members arrays; shared memory. Edges are also processed here in preparation for the next step.
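The slides state the constraint on parents but not how each pixel chooses one. One rule that satisfies the constraint is sketched below in Python; it is an assumption, not the project's rule. Each pixel points to the 4-neighbour with the smallest key (value, index) that is strictly greater than its own key, or to itself if no such neighbour exists. Because keys strictly increase along parent links, the graph cannot contain cycles, and a parent's value is never below its child's. `build_directed_graph` is a hypothetical name.

```python
def build_directed_graph(img):
    """Hypothetical parent rule: point to the 4-neighbour whose (value, index) key is the
    smallest key strictly greater than this pixel's key; otherwise point to self."""
    h, w = len(img), len(img[0])

    def key(i):
        return (img[i // w][i % w], i)

    parent = list(range(h * w))
    for i in range(h * w):
        r, c = divmod(i, w)
        best = None
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                j = rr * w + cc
                if key(j) > key(i) and (best is None or key(j) < key(best)):
                    best = j
        if best is not None:
            parent[i] = best  # parent value >= own value, ties broken by index
    return parent


block = [
    [75, 78, 56, 62],
    [50, 58, 55, 53],
    [80, 65, 64, 60],
    [65, 55, 50, 55],
]
parent = build_directed_graph(block)
```

On the example block, 75 points to its neighbour 78, while 78 and 80 have no larger neighbour and become local roots (self-loops). Since each pixel only reads its neighbours, every pixel can be handled by its own GPU thread.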

Block Reduction: block sizes 16×16 and 8×8.

Block Reduction: a total of 3 iterations is needed, $\log_2 4 + \log_2 2 = 2 + 1 = 3$.
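One way to read the iteration count above, stated as an assumption: the example image is split into a 4 × 2 grid of blocks, and each iteration merges pairs of neighbouring blocks along one axis (first horizontally, then vertically), so a grid of bx × by blocks needs ⌈log2 bx⌉ + ⌈log2 by⌉ iterations. The Python sketch below (hypothetical function name) lists that schedule.

```python
def reduction_schedule(blocks_x, blocks_y):
    """Pairwise merge schedule: merge horizontally until one block spans the width,
    then vertically. Each entry is (axis, number of original blocks spanned after the step)."""
    schedule = []
    span = 1
    while span < blocks_x:  # ceil(log2(blocks_x)) horizontal iterations
        span *= 2
        schedule.append(("horizontal", span))
    span = 1
    while span < blocks_y:  # ceil(log2(blocks_y)) vertical iterations
        span *= 2
        schedule.append(("vertical", span))
    return schedule
```

Under the same assumption, a 256×256 image gives a 16 × 16 block grid and 8 iterations with 16×16 blocks, and a 32 × 32 block grid and 10 iterations with 8×8 blocks.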

Block Reduction: [Figure: example pixel values of the blocks being merged.] If (horizontal_pixelUpdate): load edge information to each pixel.

Block Reduction: history buffer.

Parent Compression: example block:
75 78 56 62
50 58 56 58
80 58 54 58
65 55 58 55
Uses shared memory, based on parent locality.
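Parent compression is commonly implemented on GPUs as pointer jumping: every pixel repeatedly replaces its parent with its grandparent until all pixels point directly at a root. Whether the project used exactly this scheme is an assumption. The Python sketch below (hypothetical name) performs synchronous rounds, each reading only the previous round's pointers as a kernel with a double buffer would, and reports how many rounds changed something.

```python
def parent_compression(parent):
    """Synchronous pointer jumping. Returns (compressed parents, number of changing rounds)."""
    p = list(parent)
    rounds = 0
    while True:
        # Every node reads its grandparent from the previous round's array.
        nxt = [p[p[i]] for i in range(len(p))]
        if nxt == p:  # every node already points at a root
            return p, rounds
        p = nxt
        rounds += 1
```

For a chain of 8 nodes where node i points to node i - 1, three rounds suffice, since the distance each pointer covers doubles per round; in general a tree of depth d needs about ⌈log2 d⌉ rounds.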

FindRegion: FindRoot, so that each region's tree can be processed separately. Find each region's parent and child based on delta, so that the variation can be computed: var = (area(parent) - area(child)) / area(current region). Send the region information to the CPU. Scan every region's tree and find the minimal variation; those regions are the MSERs. Filter the regions.
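The variation formula and the least-variation selection can be sketched in Python as follows, assuming the areas of the nested regions along one branch of a region tree are stored in a list indexed by intensity level, so that the child and parent of the region at level k are the regions at levels k - Δ and k + Δ. The function names are hypothetical. Following the slide, the region with the minimal variation on the branch is selected; the original MSER definition instead keeps every local minimum of the variation.

```python
def variations(areas, delta):
    """var(k) = (area(parent) - area(child)) / area(current), with the parent at level
    k + delta and the child at level k - delta; areas[k] is the region's area at level k."""
    return {
        k: (areas[k + delta] - areas[k - delta]) / areas[k]
        for k in range(delta, len(areas) - delta)
    }


def least_variation(areas, delta):
    """Level whose region has the smallest variation on this branch (first one on ties)."""
    var = variations(areas, delta)
    return min(var, key=var.get)


# Areas of the nested regions containing one seed pixel at levels 0..8.
branch = [1, 4, 4, 4, 4, 4, 4, 4, 16]
```

With Δ = 1, the branch's variation is 0.75 at level 1, drops to 0 while the area stays at 4 pixels, and jumps to 3.0 at level 7, so level 2 is selected.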

Performance Analysis: 256×256 image.

Performance Analysis: 1024×768 image.

Performance Analysis: why is 8×8 better than 16×16? Factors: local memory usage, number of recursions, block execution, number of block-reduction iterations, parent locality.

Performance Analysis: GPU vs. CPU timing. Factors: intermediate values, synchronization, recording information, memory transfer.

Conclusion: MSER detection has very large data dependencies, but they can still be resolved in parallel. The approach should suit multicore microprocessors, whose individual cores are much stronger than a single GPU thread. The bottleneck is still memory.

Future Work: more efficient block reduction (decoder and encoder); memory random access; GPU code efficiency.
