Learning video saliency from human gaze using candidate selection (CVPR 2013 poster)
Outline Introduction Method Experiments Conclusions
Introduction Predicting where people look in video is relevant to many applications. Image vs. video saliency
Introduction Two observations: 1. Image saliency studies concentrate on a single image stimulus, without any prior. 2. When watching dynamic scenes, people usually follow the action and the characters, shifting their gaze to a new interesting location in the scene.
Introduction We propose a novel method for video saliency estimation, which is inspired by the way people watch videos.
Method Candidate extraction Modeling gaze dynamics
Candidate extraction Three types of candidates: 1. Static candidates 2. Motion candidates 3. Semantic candidates
Candidate extraction 1. Static candidates: compute the graph-based visual saliency (GBVS) map
Candidate extraction 2. Motion candidates: calculate the optical flow between consecutive frames, then apply Difference-of-Gaussians (DoG) filtering to the optical-flow magnitude
Candidate extraction Static (a) and motion (b) candidates.
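The motion-candidate step can be sketched as follows, assuming the flow field (u, v) has already been produced by any optical-flow estimator (flow estimation itself is not shown). The two Gaussian widths are illustrative choices, not the paper's values:

```python
import numpy as np

def gauss1d(sigma):
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth(a, sigma):
    """Separable Gaussian smoothing with edge padding ('same' size)."""
    g = gauss1d(sigma)
    p = len(g) // 2
    a = np.pad(a, ((p, p), (0, 0)), mode='edge')
    a = np.apply_along_axis(lambda c: np.convolve(c, g, mode='valid'), 0, a)
    a = np.pad(a, ((0, 0), (p, p)), mode='edge')
    a = np.apply_along_axis(lambda r: np.convolve(r, g, mode='valid'), 1, a)
    return a

def motion_candidate_map(u, v, sigma_fine=1.5, sigma_coarse=4.0):
    """DoG-filtered optical-flow magnitude; its local maxima would
    serve as motion candidate locations."""
    mag = np.hypot(u, v)                                  # flow magnitude
    dog = smooth(mag, sigma_fine) - smooth(mag, sigma_coarse)
    return np.clip(dog, 0.0, None)                        # keep positive lobes
```

The DoG acts as a band-pass filter, so compact moving regions stand out against both static background and large uniform motion.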
Candidate extraction 3. Semantic candidates: due to higher-level visual processing; three types: center, face, and body
Candidate extraction 3. Semantic candidates Small detections: create a single candidate at their center. Large detections: create several candidates, four for body detections (head, shoulders, and torso) and three for faces (eyes, and nose with mouth).
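The splitting rules above can be sketched as a small helper. The size threshold and the layout fractions inside the box are illustrative assumptions, not the paper's exact values:

```python
def detection_candidates(x, y, w, h, kind, frame_h, small_frac=0.2):
    """Split one detection box (x, y, w, h) into gaze candidate points,
    following the slide's rules. Fractions are illustrative guesses."""
    if h < small_frac * frame_h:                  # small detection:
        return [(x + w / 2, y + h / 2)]           # one candidate at its center
    if kind == 'body':                            # head, two shoulders, torso
        return [(x + 0.50 * w, y + 0.10 * h),
                (x + 0.25 * w, y + 0.30 * h),
                (x + 0.75 * w, y + 0.30 * h),
                (x + 0.50 * w, y + 0.60 * h)]
    if kind == 'face':                            # two eyes, nose with mouth
        return [(x + 0.3 * w, y + 0.40 * h),
                (x + 0.7 * w, y + 0.40 * h),
                (x + 0.5 * w, y + 0.75 * h)]
    return [(x + w / 2, y + h / 2)]               # e.g. a plain center candidate
```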
Candidate extraction Semantic candidates
Modeling gaze dynamics Features Gaze transitions for training Learning transition probability
Features We create a feature vector for every ordered pair of (source, destination) candidate locations. The features fall into two sets: destination-frame features and inter-frame features.
Features As a low-level spatial cue, we use the local contrast of the neighborhood around the candidate location.
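A minimal sketch of such a local-contrast feature, assuming contrast is measured as the neighborhood's standard deviation relative to its mean (the paper's exact contrast definition may differ):

```python
import numpy as np

def local_contrast(img, x, y, radius=8):
    """Local contrast around candidate (x, y): std of the neighborhood
    divided by its mean. An assumed definition for illustration."""
    y0, y1 = max(0, y - radius), min(img.shape[0], y + radius + 1)
    x0, x1 = max(0, x - radius), min(img.shape[1], x + radius + 1)
    patch = img[y0:y1, x0:x1]
    return patch.std() / (patch.mean() + 1e-8)
```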
Gaze transitions for training We determine whether a gaze transition occurs from a given source candidate to a given target candidate. 1. Choose relevant pairs of frames (e.g., around scene cuts). 2. Label positive and negative gaze transitions between these frames.
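One way to implement the labeling step is to snap each viewer's fixation to the nearest candidate in the source and destination frames and vote per candidate pair. The nearest-candidate voting and the positive threshold below are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def nearest_candidate(point, candidates):
    """Index of the candidate closest to a gaze point."""
    d = [np.hypot(point[0] - cx, point[1] - cy) for cx, cy in candidates]
    return int(np.argmin(d))

def label_transitions(gaze_src, gaze_dst, cand_src, cand_dst, pos_frac=0.3):
    """Label (source, destination) candidate transitions from per-viewer
    gaze points in the two frames. A pair is positive when at least
    pos_frac of the viewers made that transition (assumed threshold)."""
    votes = np.zeros((len(cand_src), len(cand_dst)))
    for p, q in zip(gaze_src, gaze_dst):
        votes[nearest_candidate(p, cand_src),
              nearest_candidate(q, cand_dst)] += 1
    return votes >= pos_frac * len(gaze_src)    # True = positive transition
</```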
Gaze transitions for training
Learning transition probability To predict whether a transition occurs or not, we train a standard random forest classifier using the normalized feature vectors and their labeling. The trained model classifies every transition between source and destination candidates and provides a confidence value.
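With scikit-learn this training step can be sketched as follows; the feature data below is synthetic (positive transitions drawn with larger feature values), purely to illustrate the fit/confidence pipeline, not the paper's actual features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the transition feature vectors (illustrative only):
X = np.vstack([rng.normal(1.0, 0.3, (200, 5)),     # positive transitions
               rng.normal(0.0, 0.3, (200, 5))])    # negative transitions
y = np.array([1] * 200 + [0] * 200)

# Standard random forest on the labeled feature vectors.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict_proba yields a confidence in [0, 1] for each candidate transition.
conf = clf.predict_proba(rng.normal(1.0, 0.3, (10, 5)))[:, 1]
```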
Learning transition probability The per-transition confidences define the transition probability P(d | s_i) of shifting gaze from source candidate s_i to destination candidate d.
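Turning confidences into P(d | s_i) can be done by normalizing over the destinations of each source candidate. Row normalization is a natural construction here, though the paper may normalize differently:

```python
import numpy as np

def transition_matrix(conf):
    """conf[i, j]: classifier confidence that gaze moves from source
    candidate s_i to destination candidate d_j. Row-normalizing yields
    P(d_j | s_i) (assumed normalization scheme)."""
    conf = np.asarray(conf, dtype=float)
    rows = conf.sum(axis=1, keepdims=True)
    return conf / np.where(rows > 0.0, rows, 1.0)   # avoid divide-by-zero
```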
Experiments Datasets: the DIEM (Dynamic Images and Eye Movements) dataset and the CRCNS dataset
Conclusions The method is substantially different from existing methods and uses a sparse candidate set to model the saliency map. Using candidates boosts the accuracy of the saliency prediction and speeds up the algorithm. The proposed method accounts for the temporal dimension of video by learning the probability of shifting between salient locations.