Active Frame Selection for Label Propagation in Videos
Sudheendra Vijayanarasimhan and Kristen Grauman
Department of Computer Science, University of Texas at Austin

Presentation transcript:

Motivation

Manually labeling objects in video is tedious and expensive, yet such annotations are valuable for object and activity recognition. Existing methods for interactive labeling:
- propagate labels from an arbitrarily selected frame, and/or
- assume a human will intervene repeatedly to correct errors.

Main Idea

Identify the k frames which, if labeled, would propagate to the rest of the video with minimal expected error.
1. Actively select k informative frames.
2. Segment and label the selected frames.
3. Propagate labels to all other frames.

Highlights of our approach
- Annotate all objects in a video with minimal manual effort.
- Jointly select the k most useful frames via predicted "trackability".
- Efficient dynamic programming solution.

Video Label Propagation

Pixel Flow Label Propagation: Use dense optical flow to track each pixel in both the forward and backward directions, until it reaches the closest labeled frame on either side.

[Figure: forward and backward flow with forward and backward label propagation, including an occlusion example.]

Pixel Flow + MRF Label Propagation: Enhance the flow model with a space-time Markov Random Field:
- infer label maps that are smooth in space and time;
- exploit object appearance models defined by the labeled frames.

Estimating Expected Label Propagation Error

We explicitly model the probability that pixel p in frame t will be mislabeled if we were to obtain its label from frame t+1. The model combines an appearance distance and a motion distance. These distances use flow to estimate errors due to boundaries, occlusions, and pixels that change in appearance or enter/leave the frame.

If more than one frame separates the labeled frame r_t and the current frame t, we compute the accumulated error recursively (and analogously for l_t).

Active Frame Selection

To segment an N-frame video, there are two sources of manual effort cost:
1. the cost of fully labeling a frame from scratch, denoted C_l;
2. the cost of correcting errors made by propagation, denoted C_c.

Objective: We want the selection of frames that minimizes the expected effort.

Let the N x N matrix C record the frame-to-frame predicted errors.

Dynamic programming solution: Consider the optimal value of the objective for selecting b frames from the first n frames, where i denotes the index of the b-th selected frame. For a given k, we show how to obtain the optimal value efficiently, in contrast to a naive exhaustive search. The recursion distinguishes three cases:
- Case 1: one-way propagation to the end (n > i).
- Case 2: one-way propagation to the beginning (b = 1 and n = i).
- Case 3: propagation both ways (b > 1 and n = i).

[Figure: sequence frame indices 1, 2, ..., n and selected frame indices ..., b-1, b for each of the three cases.]

Results

Datasets:
- Camseq01: 101 frames of a moving driving scene
- Camvid seq05: 3000 frames of a driving scene
- Labelme 8126: 167 frames of a traffic signal
- Segtrack: 6 videos with moving objects

Baselines:
- Uniform-f: samples frames uniformly and transfers labels forward.
- Uniform: samples frames uniformly and transfers labels in both directions.
- Keyframe: selects frames with k-way spectral clustering on Gist features.

Errors and time saved: Our active approach outperforms the baselines for all values of k, and saves hours of manual effort per video if the cost to correct errors is proportional to the number of mislabeled pixels. Our approach yields higher accuracy, especially for frames far from labeled frames. It reduces effort better than the baselines, and can also predict the optimal number of frames to have labeled, k*.

[Table: error in terms of the average number of mislabeled pixels, in hundreds of pixels.]

[Figures: total annotation time; accuracy per frame, sorted from high to low (Segtrack, k = 5).]

Our error predictions in C follow the actual errors closely.

Example of actively selected frames: In this case, our method automatically selects frames with high-resolution information for most of the objects.
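
The dynamic programming selection can be illustrated with a short Python sketch. It is not the authors' implementation; it only mirrors the three cases named in the transcript (one-way to the beginning, both ways, one-way to the end). It assumes that C[i][j] holds the predicted error of propagating labels from labeled frame i to frame j, that an unlabeled frame lying between two labeled frames takes the lower of the two predicted errors, and that total effort has the form C_l * k + C_c * error. All function names (propagation_error, exhaustive_search, select_frames, best_k) are hypothetical.

```python
from itertools import combinations


def propagation_error(C, selected):
    """Total predicted error when only the frames in `selected` are labeled.

    Assumption: each unlabeled frame takes its labels from whichever of its
    nearest labeled neighbours (left or right) has the lower predicted error.
    """
    selected = sorted(selected)
    total = 0.0
    for m in range(len(C)):
        if m in selected:
            continue
        left = [s for s in selected if s < m]
        right = [s for s in selected if s > m]
        options = []
        if left:
            options.append(C[left[-1]][m])
        if right:
            options.append(C[right[0]][m])
        total += min(options)
    return total


def exhaustive_search(C, k):
    """Naive baseline: score every subset of k frames."""
    return min(propagation_error(C, s) for s in combinations(range(len(C)), k))


def select_frames(C, k):
    """Return (minimal predicted error, chosen frame indices) for k frames."""
    n = len(C)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    inf = float("inf")

    # between[j][i]: error of the frames strictly between labeled frames j < i.
    between = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            between[j][i] = sum(min(C[j][m], C[i][m]) for m in range(j + 1, i))

    # D[b][i]: best error over frames 0..i using b labeled frames, the b-th at i.
    D = [[inf] * n for _ in range(k + 1)]
    back = [[-1] * n for _ in range(k + 1)]
    for i in range(n):
        D[1][i] = sum(C[i][m] for m in range(i))  # one-way to the beginning
    for b in range(2, k + 1):
        for i in range(b - 1, n):
            for j in range(b - 2, i):
                cand = D[b - 1][j] + between[j][i]  # both ways
                if cand < D[b][i]:
                    D[b][i], back[b][i] = cand, j

    # Close the sequence: one-way propagation from the last labeled frame.
    best, last = inf, -1
    for i in range(k - 1, n):
        cand = D[k][i] + sum(C[i][m] for m in range(i + 1, n))
        if cand < best:
            best, last = cand, i

    # Walk the back-pointers to recover the selected frames.
    frames, b = [], k
    while last != -1:
        frames.append(last)
        last, b = back[b][last], b - 1
    return best, frames[::-1]


def best_k(C, cost_label, cost_correct, max_k):
    """Pick k minimising cost_label * k + cost_correct * error (assumed form)."""
    totals = {k: cost_label * k + cost_correct * select_frames(C, k)[0]
              for k in range(1, max_k + 1)}
    return min(totals, key=totals.get)
```

In this sketch the table of in-between errors is filled by direct summation, which keeps the code short but is not tuned for long videos; the recursion itself visits every (b, i, j) triple once. The exhaustive_search function is included only so the dynamic program can be checked against it on small inputs.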