FaceTrack: Tracking and summarizing faces from compressed video Hualu Wang, Harold S. Stone*, Shih-Fu Chang Dept. of Electrical Engineering, Columbia University.


FaceTrack: Tracking and summarizing faces from compressed video
Hualu Wang, Harold S. Stone*, Shih-Fu Chang
Dept. of Electrical Engineering, Columbia University
*NEC Research Institute
Presentation by Andy Rova, School of Computing Science, Simon Fraser University

March 15, Andy Rova SFU CMPT 820

Introduction
- FaceTrack: a system for both tracking and summarizing faces in compressed video data
- Tracking: detect faces and trace them through time in video shots
- Summarizing: cluster the faces across video shots and associate them with different people
- Compressed video: avoids the costly overhead of decoding prior to face detection

System Overview
- The FaceTrack system's goals are related to ideas discussed in previous presentations
- A face-based video summary can help users decide whether they want to download the whole video
- The summary provides good visual indexing information for a database search engine

Problem Definition
- The goal of the FaceTrack system is to take an input video sequence, generate a list of prominent faces that appear in the video, and determine the time periods where each face appears

General Approach
- Track faces within shots
- Once tracking is done, group faces across video shots into faces of different people
- Output a list of faces for each sequence; for each face, list the shots where it appears, and when
- Face recognition is not performed: it is very difficult in unconstrained videos due to the broad range of face sizes, numbers, orientations, and lighting conditions

General Approach
- Work in the compressed domain as much as possible
- MPEG-1 and MPEG-2 videos, used in applications such as digital TV and DVD
- Macroblocks and motion vectors can be used directly in tracking, giving greater computational speed than full decoding
- Selected frames can always be decoded down to the pixel level for further analysis, for example when grouping faces across shots

MPEG Review
- Three types of frame data: intra-frames (I-frames), forward predictive frames (P-frames), and bidirectional predictive frames (B-frames)
- Macroblocks are coding units that combine pixel information via the DCT; luminance and chrominance are separated
- P-frames and B-frames are subject to motion compensation: motion vectors are found and their differences are encoded

System Diagram

Face Tracking Challenges
- Locations of detected faces may not be accurate, since the face detection algorithm works on 16x16 macroblocks
- False alarms and misses occur
- Multiple faces cause ambiguities when they move close to each other
- The motion approximated by the MPEG motion vectors may not be accurate
- A tracking framework that can handle these issues in the compressed domain is needed

The Kalman Filter
- A linear, discrete-time dynamic system is defined by a pair of difference equations: a state equation and a measurement equation
- We only have access to a sequence of noisy measurements
- Given this noisy observation data, the problem is to find the optimal estimate of the unknown system state variables
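The difference equations on this slide are the standard linear-Gaussian state-space model; the paper's exact notation did not survive the transcript, so the following is a textbook-form reconstruction:

```latex
\begin{aligned}
x_{k+1} &= \Phi_k x_k + w_k, & w_k &\sim \mathcal{N}(0, Q_k) \\
z_k     &= H_k x_k + v_k,    & v_k &\sim \mathcal{N}(0, R_k)
\end{aligned}
```

Here $x_k$ is the state vector, $z_k$ the measurement, $\Phi_k$ the state-transition matrix, $H_k$ the measurement matrix, and $w_k$, $v_k$ the process and measurement noise.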

The Kalman Filter
- The "filter" is actually an iterative algorithm that keeps taking in new observations
- The new states are successively estimated
- The error of the prediction of the measurement is called the innovation
- The innovation is amplified by a gain matrix and used as a correction for the state prediction
- The corrected prediction is the new state estimate
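The predict/correct loop above can be sketched in a few lines. This is a minimal, generic Kalman step (illustrative only, not the paper's implementation); the matrix names follow the usual textbook notation, and the 1-D constant-velocity example parameters at the bottom are assumptions:

```python
import numpy as np

def kalman_step(x, P, z, Phi, H, Q, R):
    """One Kalman iteration: predict the state, then correct with the
    innovation (measurement minus its prediction) scaled by the gain."""
    # Predict
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    # Innovation and its covariance
    y = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Gain matrix amplifies the innovation into a state correction
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy example: state [position, velocity], noisy position measurements
dt = 0.1
Phi = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])
x, P = np.array([0.0, 0.0]), np.eye(2)
for z in [0.1, 0.22, 0.35]:
    x, P = kalman_step(x, P, np.array([z]), Phi, H, Q, R)
```

Each call is one slide-iteration: the previous estimate goes in, the corrected prediction comes out as the new state estimate.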

The Kalman Filter
- In the FaceTrack system, the state vector of the Kalman filter is the kinematic information of the face: position, velocity, and sometimes acceleration
- The observation vector is the position of the detected face, which may not be accurate
- The Kalman filter lets the system predict and update the position and parameters of the faces

The Kalman Filter
- The FaceTrack system uses a 0.1-second time interval for state updates
- This corresponds to every I-frame and P-frame in a typical MPEG GOP ("Group of Pictures") structure, for example IBBPBBP...
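To see why the 0.1-second interval lines up with I- and P-frames, consider the common IBBPBBPBB pattern: at a 30 fps source rate (an assumption; the paper's frame rate is not stated in this transcript), I/P frames arrive every third frame, i.e. every 0.1 s:

```python
def update_frames(gop: str) -> list[int]:
    """Indices of the frames that trigger Kalman state updates
    (I- and P-frames only; B-frames are skipped)."""
    return [i for i, f in enumerate(gop) if f in "IP"]

print(update_frames("IBBPBBPBB"))  # -> [0, 3, 6]: every 3rd frame
```

At 30 fps, a spacing of 3 frames is exactly the 0.1-second update interval the slide describes.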

The Kalman Filter
- For I-frames, the face detector results are used directly
- For P-frames, the face detector results are more prone to false alarms
- Instead, P-frame face locations are predicted (approximately) from the MPEG motion vectors
- These locations are then fed into the Kalman filter as observations
- This contrasts with previous trackers, which assumed that the motion-vector-derived locations were correct on their own
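One simple way to derive a P-frame face location from motion vectors is to average the vectors of the macroblocks the face box covers; this averaging scheme is our illustration, not necessarily the paper's exact method:

```python
import numpy as np

def face_displacement(motion_vectors, face_mbs):
    """Approximate a face's frame-to-frame displacement.

    motion_vectors: dict mapping (row, col) of each 16x16 macroblock
                    to its MPEG motion vector (dx, dy).
    face_mbs: macroblock coordinates covered by the face bounding box.
    """
    vs = np.array([motion_vectors[mb] for mb in face_mbs], dtype=float)
    return vs.mean(axis=0)  # average (dx, dy) over the face region

# Hypothetical motion vectors for a 2x2 block of macroblocks
mvs = {(0, 0): (2, 0), (0, 1): (4, 0), (1, 0): (2, 2), (1, 1): (4, 2)}
print(face_displacement(mvs, [(0, 0), (0, 1), (1, 0), (1, 1)]))  # -> [3. 1.]
```

The shifted box position is then fed to the Kalman filter as an observation rather than trusted outright.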

The Face Tracking Framework
- How to discriminate new faces from previous ones during tracking?
- The Mahalanobis distance is a quantitative indicator of how close a new observation is to the prediction
- This can help separate new faces from existing tracks: if the Mahalanobis distance is greater than a certain threshold, the newly detected face is unlikely to belong to that existing track
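The gating test is cheap to compute from quantities the Kalman filter already maintains. A sketch (the threshold value and covariance below are illustrative, not from the paper):

```python
import numpy as np

def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance of observation z from the predicted
    measurement z_pred, using the innovation covariance S."""
    y = z - z_pred  # innovation
    return float(y @ np.linalg.inv(S) @ y)

# A detection 2 pixels from the prediction, with variance 4 per axis
S = np.array([[4.0, 0.0], [0.0, 4.0]])
d2 = mahalanobis_sq(np.array([12.0, 10.0]), np.array([10.0, 10.0]), S)
print(d2)  # -> 1.0: well inside a typical chi-square gate
```

A detection whose distance exceeds the gate threshold is treated as not belonging to that track, i.e. potentially a new face.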

The Face Tracking Framework
- Where two faces move close together, the Mahalanobis distance alone cannot keep track of multiple faces
- If a face is missed or occluded: hypothesize the continuation of the face track
- In the case of a false alarm, or faces close together: hypothesize the creation of a new track
- The idea is to wait for new observation data before making the final decision about a track
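The deferred-decision logic can be summarized as a small branching rule; this is our paraphrase of the slide, not code from the paper, and the exact hypothesis bookkeeping is more involved:

```python
def branch_hypotheses(gated_matches: int) -> list[str]:
    """Hypotheses spawned for one track in one frame, given the number
    of detections falling inside its Mahalanobis gate."""
    if gated_matches == 1:
        return ["update"]                  # unambiguous: just update
    if gated_matches == 0:
        return ["coast"]                   # miss/occlusion: continue track on prediction
    return ["update", "new-track"]         # ambiguity: keep both options open

print(branch_hypotheses(2))
```

Ambiguous branches are kept alive until later observations confirm or prune them, which is exactly the "wait before deciding" idea on the slide.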

Intra-shot Tracking Challenges
- Multiple hypothesis method

Kalman Motion Models
- The Kalman filter is a framework that can model different types of motion, depending on the system matrices used
- Several models were tested for the paper, with varying results
- Intuition: who pays to research object tracking? The military!
- Hence many tracking models are based on trajectories unlike those that faces in video are likely to exhibit
- For example, in most commercial video, a human face will not maneuver like a jet or missile

Kalman Motion Models
- Four motion models were tested for FaceTrack: Constant Velocity (CV), Constant Acceleration (CA), Correlated Acceleration (AA), and Variable Dimension Filter (VDF)
- Testing was done against ground truth consisting of manually identified face centers in each frame

Kalman Motion Models
- Rather than go through the whole process in exact detail, the next several slides illustrate the differences between the CV and CA models
- The matrices are expanded to show how the states are updated

Constant Velocity (CV) Model
(slides: the CV state-update matrices, shown expanded and then simplified)
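The CV matrices on these slides are images that were not preserved; a standard textbook reconstruction for one coordinate, with sampling interval $T$, is:

```latex
\begin{bmatrix} p_{k+1} \\ v_{k+1} \end{bmatrix}
=
\begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} p_k \\ v_k \end{bmatrix}
+ w_k
```

Expanded, this is simply $p_{k+1} = p_k + T v_k$ and $v_{k+1} = v_k$, each disturbed by process noise; FaceTrack applies this per coordinate of the face position.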

Constant Acceleration (CA) Model
- Acceleration is now added to the state vector, and is explicitly modeled as a constant disturbed by random noise
(slides: the CA state-update matrices, shown expanded and then simplified)
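As with the CV slides, the CA matrices are images that did not survive; the standard textbook form for one coordinate is:

```latex
\begin{bmatrix} p_{k+1} \\ v_{k+1} \\ a_{k+1} \end{bmatrix}
=
\begin{bmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} p_k \\ v_k \\ a_k \end{bmatrix}
+ w_k
```

The extra state dimension lets the filter carry a (noisy) constant acceleration, at the cost of more inertia in the estimates.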

The Correlated Acceleration (AA) Model
- Replaces the constant accelerations with an AR(1) model
- AR(1): first-order autoregressive, a stochastic process where the immediately previous value affects the current value, plus some random noise
- Why? There is a strong negative autocorrelation between the accelerations of consecutive frames
- Positive accelerations tend to be followed by negative accelerations, implying that faces tend to "stabilize"
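In equation form, the AR(1) acceleration is (the value of $\rho$ used in the paper is not given in this transcript; only its sign matters for the intuition):

```latex
a_{k+1} = \rho\, a_k + w_k, \qquad -1 < \rho < 0
```

A negative $\rho$ encodes exactly the observed behavior: an acceleration in one direction tends to be followed by one in the opposite direction, so the face "stabilizes" rather than drifting.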

The Variable Dimension Filter (VDF)
- A system that switches between CV (constant velocity) and CA (constant acceleration) modes
- The dimension of the state vector changes when a maneuver is detected, hence "VDF"
- Developed for tracking highly maneuverable targets (probably military jets)

Comparison of Motion Models
(plot: average tracking error versus tracking runs, first 16 runs)

Comparison of Motion Models
- Why does CV perform best? The small sampling interval justifies viewing face motion as piecewise-linear movements
- A face cannot achieve very high accelerations (as opposed to a jet fighter)
- AA also performs well because it fits the nature of face motion: faces in commercial video exhibit few persistent accelerations (negative autocorrelation)

Summarization Across Shots
- Select representative frames for tracked faces; large, frontal-view faces are best
- Decode the representative frames into the pixel domain
- Use clustering algorithms to group the faces into different persons
- Make use of domain knowledge: for example, people do not usually change clothes within a news segment, but often do change outfits within a sitcom episode
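The grouping step can be sketched as a simple greedy agglomeration over face feature vectors; the features, distance, and threshold below are our illustration, since the transcript does not specify the paper's actual clustering algorithm:

```python
import numpy as np

def group_faces(features, threshold):
    """Greedy grouping: assign each face to the first cluster whose
    centroid is within `threshold`, otherwise start a new cluster."""
    clusters = []  # each: {"centroid": running mean, "members": indices}
    for i, f in enumerate(features):
        for c in clusters:
            if np.linalg.norm(f - c["centroid"]) < threshold:
                c["members"].append(i)
                # incremental update of the cluster centroid
                c["centroid"] += (f - c["centroid"]) / len(c["members"])
                break
        else:
            clusters.append({"centroid": f.astype(float).copy(), "members": [i]})
    return [c["members"] for c in clusters]

# Two nearby face features and one distant one -> two persons
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(group_faces(feats, threshold=1.0))  # -> [[0, 1], [2]]
```

Domain knowledge (e.g. clothing consistency within a news segment) would enter as extra features or as constraints on which faces may merge.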

Simulation Results

Conclusions and Future Research
- FaceTrack is an effective face tracking and summarization architecture, within which different detection and tracking methods can be used
- It could be updated to use new face detection algorithms or improved motion models
- Based on the results, the CV and AA motion models are sufficient for commercial face motion
- Summarization techniques need the most development, followed by optimizing tracking for adverse situations