Published by Felix Roberts. Modified over 9 years ago.
Active Frame Selection for Label Propagation in Videos
Sudheendra Vijayanarasimhan and Kristen Grauman
Department of Computer Science, University of Texas at Austin

Motivation

Manually labeling objects in video is tedious and expensive, yet such annotations are valuable for object and activity recognition. Existing methods for interactive labeling propagate labels from an arbitrarily selected frame, and/or assume a human will intervene repeatedly to correct errors.

Main Idea

Identify the k frames which, if labeled, would propagate to the rest of the video with minimal expected error.

1. Actively select k informative frames.
2. Segment and label the selected frames.
3. Propagate labels to all other frames.

Highlights of our approach

Annotate all objects in a video with minimal manual effort.
Jointly select the k most useful frames via predicted “trackability”.
Efficient dynamic programming solution.

Video Label Propagation

Pixel Flow Label Propagation
Use dense optical flow to track each pixel in both the forward and backward directions, until it reaches the closest labeled frame on either side.

[Figure: forward and backward flow with forward and backward label propagation, including an occlusion example]

Pixel Flow + MRF Label Propagation
Enhance the flow model with a space-time Markov Random Field: infer label maps that are smooth in space and time, and exploit object appearance models defined by the labeled frames.

Estimating Expected Label Propagation Error

We explicitly model the probability that pixel p in frame t will be mislabeled if we were to obtain its label from frame t+1:

[equation not preserved in this transcript]

The distances use flow to estimate errors due to boundaries, occlusions, and pixels that change in appearance or enter/leave the frame.

[Figure panels: Appearance, Motion]

If more than one frame separates the labeled frame r_t and the current frame t, we compute the accumulated error recursively (and analogously for l_t):

[equation not preserved in this transcript]

Let the N x N matrix C record the frame-to-frame predicted errors:

[equation not preserved in this transcript]

Active Frame Selection

To segment an N-frame video, there are two sources of manual effort cost:

1. the cost of fully labeling a frame from scratch, denoted C_l
2. the cost of correcting errors by propagation, denoted C_c.

Objective
We want [...] that minimizes expected effort: [...] where [...]

Dynamic programming solution
Let [...] be the optimal value of [...] for selecting b frames from the first n frames, where i denotes the index of the b-th selected frame. For a given k, we show how to obtain the optimal value in [...] time, compared to [...] for a naïve exhaustive search.

[Figure: sequence frame index (1, 2, ..., i, ..., n-1, n) aligned with selected frame index (..., b-2, b-1, b)]

The recursion distinguishes three cases:

Case 1: 1-way end. n > i
Case 2: 1-way beg. b = 1 and n = i
Case 3: Both ways. b > 1 and n = i

[Figure: one diagram per case; the Case 3 diagram marks frames j, j+1, m, i-1 and i = n]

Results

Datasets:
Camseq01: 101 frames of a moving driving scene
Camvid seq05: 3000 frames of a driving scene
Labelme 8126: 167 frames of a traffic signal
Segtrack: 6 videos with moving objects

Baselines:
Uniform-f: samples frames uniformly and transfers labels forward.
Uniform: samples frames uniformly and transfers labels in both directions.
Keyframe: selects frames with k-way spectral clustering on Gist features.

Errors and time saved
[Table: error in terms of average number of mislabeled pixels, in hundreds of pixels]
Our active approach outperforms the baselines for all values of k, and saves hours of manual effort per video if the cost to correct errors is proportional to the number of mislabeled pixels.

[Figure: accuracy per frame, sorted from high to low; Segtrack, k = 5]
Our approach yields higher accuracy, especially for frames far from labeled frames.

[Figure: total annotation time]
It reduces effort better than the baselines, and can also predict the optimal number of frames to have labeled, k*.

Example of actively selected frames
Our error predictions in C follow the actual errors closely. In this case, our method automatically selects frames with high resolution information of most of the objects.
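The dynamic programming selection over the error matrix C can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: C[i][j] is taken to be a single scalar predicted error for labeling frame j from labeled frame i; frames between two labeled frames are assumed to be divided at one split point m, with earlier frames labeled from the left frame and later frames from the right; and total effort is assumed to take the form C_l * k + C_c * error. The names select_frames, best_k, cost_label and cost_correct are hypothetical.

```python
from itertools import accumulate


def select_frames(C, k):
    """Return (total predicted error, sorted indices of k frames to label).

    Assumption: C[i][j] is the predicted error of labeling frame j by
    propagating from labeled frame i, with C[i][i] == 0.
    """
    n = len(C)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    inf = float("inf")
    # pre[i][t] = C[i][0] + ... + C[i][t-1], so any run of frames is O(1).
    pre = [[0.0] + list(accumulate(row)) for row in C]

    def span(i, lo, hi):
        """Error of labeling frames lo..hi-1 from labeled frame i."""
        return pre[i][hi] - pre[i][lo]

    # gap[j][i]: frames strictly between labeled frames j < i, split at m
    # (assumed "both ways" case): j+1..m-1 from j, m..i-1 from i.
    gap = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            gap[j][i] = min(span(j, j + 1, m) + span(i, m, i)
                            for m in range(j + 1, i + 1))

    # best[b][i]: minimal error on frames 0..i using b labeled frames,
    # the b-th (last) one being frame i.
    best = [[inf] * n for _ in range(k + 1)]
    back = [[None] * n for _ in range(k + 1)]
    for i in range(n):
        best[1][i] = span(i, 0, i)  # one labeled frame, propagate backward
    for b in range(2, k + 1):
        for i in range(b - 1, n):
            for j in range(b - 2, i):
                cand = best[b - 1][j] + gap[j][i]
                if cand < best[b][i]:
                    best[b][i], back[b][i] = cand, j

    # Frames after the last labeled frame are labeled by forward propagation.
    total, last = min((best[k][i] + span(i, i + 1, n), i)
                      for i in range(k - 1, n))
    frames = [last]
    for b in range(k, 1, -1):
        last = back[b][last]
        frames.append(last)
    return total, frames[::-1]


def best_k(C, cost_label, cost_correct):
    """Pick k minimizing the assumed effort cost_label*k + cost_correct*error."""
    options = []
    for k in range(1, len(C) + 1):
        err, frames = select_frames(C, k)
        options.append((cost_label * k + cost_correct * err, k, frames))
    _, k, frames = min(options)
    return k, frames
```

With the gap table precomputed in O(N^3), the main recursion in this sketch runs in O(k N^2); it makes no attempt to match the complexity stated on the poster. Calling best_k(C, cost_label, cost_correct) scans every k and returns the value with the lowest assumed effort, which mirrors the idea of predicting k*.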