Local Affine Feature Tracking in Films/Sitcoms Chunhui Gu CS Final Presentation Dec. 13, 2006
Objective
Automatically detect and track local affine features in film/sitcom frame sequences.
– Current dataset: Sex and the City
– Why sitcom?
  Simple daily environment
  Few or no special effects
  Repeated scenes

Outline
Preprocessing
Tracking Algorithm
– Pairwise local matching
– Robust features
Feature Matching across Shots
Results
– Feature matching vs. baseline color histogram
– Time complexity
– When does tracking fail

Preprocessing
Frame Extraction
Shot Detection [figure labels: (i-1)’th shot, i’th shot]
MSER Interest Point Detection
SIFT Feature Extraction

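
The complexity slide lists B = 16 color-histogram bins for shot detection, which suggests a histogram-difference detector. The following is a minimal sketch under that assumption, not the presented implementation. Frames are simplified to flat lists of 0-255 intensities, and the L1 distance, the threshold of 0.5, and the function names are illustrative choices that do not come from the slides.

```python
def histogram(frame, bins=16):
    """Normalised intensity histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_shot_boundaries(frames, bins=16, threshold=0.5):
    """Return the indices i at which frame i starts a new shot.

    A boundary is declared when the L1 distance between the histograms of
    consecutive frames exceeds `threshold` (an assumed value, not from the slides).
    """
    if not frames:
        return []
    boundaries = []
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

Each frame is visited once and each comparison costs a pass over B bins, which is in line with the O(N*f(B)) entry on the complexity slide.
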
Tracking Algorithm
Basic: Pairwise Matching
[Figure: Frame i and Frame j = i+1]

Tracking Algorithm
Basic: Pairwise Matching
Thresholding on both minimum distance and ratio
[Figure: Frame i and Frame j = i+1]

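
The two thresholds can be illustrated with a small sketch. It assumes that "ratio" means the nearest to second-nearest distance ratio commonly used with SIFT descriptors, which the slide does not state. The threshold values (0.4 and 0.8) and the function name are hypothetical, descriptors are plain tuples of floats, and the spatial local window is left out for brevity.

```python
import math


def match_features(desc_i, desc_j, max_dist=0.4, max_ratio=0.8):
    """Match descriptors of frame i to descriptors of frame j = i + 1.

    A match (a, b) is kept only if the nearest neighbour b is close enough
    (distance <= max_dist) and clearly better than the second nearest
    (nearest / second_nearest <= max_ratio). Both thresholds are assumed values.
    """
    matches = []
    for a, da in enumerate(desc_i):
        # Distances from descriptor a to every descriptor in frame j, ascending.
        dists = sorted((math.dist(da, db), b) for b, db in enumerate(desc_j))
        if not dists:
            continue
        best_dist, best_idx = dists[0]
        if best_dist > max_dist:
            continue  # fails the minimum-distance threshold
        if len(dists) > 1:
            second_dist = dists[1][0]
            if second_dist == 0 or best_dist / second_dist > max_ratio:
                continue  # ambiguous: second candidate is almost as good
        matches.append((a, best_idx))
    return matches
```

The distance threshold rejects features that have no counterpart in the next frame, while the ratio threshold rejects features with several near-identical candidates.
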
Tracking Algorithm
Problem of Pairwise Matching
– Sensitive to occlusion and feature misdetection
Solutions:
– Use multiple overlapping windows
– Backward Matching
  Match features in the current frame to features in all previous frames within the shot
  Pruning process (reduces computation time)
Select a proportion of features that have longer tracking length as robust features

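
Backward matching and robust-feature selection can be sketched as follows. This is an illustration under stated assumptions rather than the presented method: each track stores its latest descriptor, pruning is modeled as dropping tracks not seen for more than `max_gap` frames, and `max_dist`, `max_gap`, `proportion`, and both function names are hypothetical.

```python
import math


def track_shot(frames, max_dist=0.4, max_gap=20):
    """Link features across the frames of one shot by backward matching.

    frames: list of frames, each a list of descriptor tuples.
    Returns a list of tracks; each track is a list of (frame_index, descriptor).
    A feature is compared with the latest descriptor of every track seen within
    the previous `max_gap` frames, so a track can survive a short occlusion or
    a missed detection.
    """
    tracks = []
    for t, descriptors in enumerate(frames):
        # Pruning: only recently observed tracks remain candidates.
        live = [tr for tr in tracks if t - tr[-1][0] <= max_gap]
        used = set()
        for d in descriptors:
            best, best_dist = None, max_dist
            for tr in live:
                if id(tr) in used:
                    continue
                dist = math.dist(d, tr[-1][1])
                if dist <= best_dist:
                    best, best_dist = tr, dist
            if best is None:
                tracks.append([(t, d)])  # unmatched feature starts a new track
            else:
                best.append((t, d))
                used.add(id(best))  # at most one feature per track per frame
    return tracks


def select_robust(tracks, proportion=0.25):
    """Keep the given proportion of tracks with the longest tracking length."""
    ranked = sorted(tracks, key=len, reverse=True)
    return ranked[:math.ceil(len(ranked) * proportion)]
```

With a small `max_gap`, a feature that reappears after a long occlusion starts a second track, which is the splitting failure described on the "When Does Tracking Fail?" slide.
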
Shot Grouping/Scene Retrieval
[Figure: shots grouped into a scene; recoverable labels: Shot 56, Scene 5]

Inter-Shot Matching Shot I Shot J
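
The slides do not say how two shots are scored against each other. One plausible sketch, consistent with the O(S^2*T^2) cost quoted on the complexity slide, compares every robust descriptor of Shot I with every robust descriptor of Shot J. The score definition, the `max_dist` threshold, and the function names are assumptions made for illustration.

```python
import math


def shot_similarity(robust_i, robust_j, max_dist=0.4):
    """Fraction of robust descriptors in shot I with a close descriptor in shot J."""
    if not robust_i or not robust_j:
        return 0.0
    hits = sum(
        1 for a in robust_i
        if min(math.dist(a, b) for b in robust_j) <= max_dist
    )
    return hits / len(robust_i)


def similarity_table(shots, max_dist=0.4):
    """Similarity for every ordered pair of shots, as an S x S nested list."""
    return [[shot_similarity(si, sj, max_dist) for sj in shots] for si in shots]
```

Shots whose pairwise score exceeds a chosen cutoff could then be grouped into the same scene.
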
“Confusion Table”
ROC
When Does Tracking Fail?
Tracking a feature outside the local window
– Rare during continuous tracking
– Happens when occlusion occurs
The same feature splitting into two or more groups
– Long occlusion
– Multiple matches in a single frame
[Figure: Frame i and Frame j = i+1]

Computation Complexity
Everything except the MSER and SIFT algorithms is implemented in Matlab (slow…)

Stage                   Complexity    Time
Frame Extraction        O(N)          ~0.3 s/frame
Shot Detection          O(N*f(B))     ~0.07 s/frame (B=16)
MSER Detection          O(N)          ~0.3 s/frame
SIFT Detection          O(N)          ~0.9 s/frame
Feature Tracking        O(N*F*W*L)    ~0.5 s/frame
Matching across shots   O(S^2*T^2)    ~1 s/shot pair

N: # of frames (30,000)
B: # of bins for color histogram (16)
F: average # of features per frame (400)
W: local window size (15)
L: tracking length (20)
T: average # of robust trackers per shot (300)
S: # of shots (35)

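
As a back-of-the-envelope check, the per-stage timings can be combined into a total. The sketch assumes the stages run one after another with no overlap and that shots are compared as unordered pairs. Neither assumption is stated on the slide, and the function name is hypothetical.

```python
# Per-frame timings in seconds, taken from the slide.
PER_FRAME_SECONDS = {
    "frame_extraction": 0.3,
    "shot_detection": 0.07,
    "mser_detection": 0.3,
    "sift_detection": 0.9,
    "feature_tracking": 0.5,
}
N_FRAMES = 30_000
N_SHOTS = 35
SECONDS_PER_SHOT_PAIR = 1.0


def estimate_runtime():
    """Return (seconds per frame, frame-stage total, shot pairs, pair-stage total)."""
    per_frame = sum(PER_FRAME_SECONDS.values())
    frame_total = per_frame * N_FRAMES
    shot_pairs = N_SHOTS * (N_SHOTS - 1) // 2  # unordered pairs (assumption)
    pair_total = shot_pairs * SECONDS_PER_SHOT_PAIR
    return per_frame, frame_total, shot_pairs, pair_total
```

Under these assumptions the per-frame stages sum to about 2.07 s, or roughly 17 hours for 30,000 frames, while the 595 shot pairs add about 10 minutes. The per-frame stages therefore dominate the total.
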
Conclusion
We implemented local affine feature tracking on the sitcom “Sex and the City”.
The tracking method is robust to occlusion and feature misdetection.
We have no quantitative precision/recall curve because ground truth is hard to obtain, but the demonstration suggests that precision is almost perfect, with good recall.
We show one successful application: using robust features to associate similar shots for scene retrieval.

Future Work
Implement the algorithm to run in real time (C/C++)
Search for unique shots in films/sitcoms
Separate indoor scenes from outdoor scenes
Determine the context of the scene

Acknowledgement