Nonchronological Video Synopsis and Indexing (TPAMI 2008). Yael Pritch, Alex Rav-Acha, and Shmuel Peleg, Member, IEEE.


Outline: Introduction; Related Work on Video Abstraction; Synopsis by Energy Minimization; Object-based Synopsis; Synopsis of Endless Video; Limitations and Failures; Demo.

Introduction Video browsing and retrieval are time consuming; most captured video is never watched or examined. Video synopsis provides a short video representation while preserving the essential activities of the original video. The activity in the video is condensed into a shorter period by simultaneously showing multiple activities, even when they originally occurred at different times. The synopsis video is also an index into the original video, pointing to the original time of each activity.

Introduction The properties of video synopsis: The video synopsis should be shorter than the original video. The video synopsis is also a video, expressing the dynamics of the scene. It should reduce as much spatiotemporal redundancy as possible; the relative timing between activities may change. Visible seams and fragmented objects should be avoided.

Introduction Video synopsis can make surveillance cameras and webcams more useful by giving the viewer summaries. A synopsis server analyzes the live video feed for interesting events and records an object-based description of the video. The description includes duration, location, and appearance. For example: Event: airplane takeoff; Object: airplane; Description: takeoff time, airport, appearance of the airplane. In a 3D space-time description of the video, each object is represented by a "tube".

Introduction Fig. 1. The input video shows a walking person and, after a period of inactivity, displays a flying bird. A compact video synopsis can be produced by playing the bird and the person simultaneously. Fig. 2. Basic temporal rearrangement of objects. Objects of interest are defined and viewed as tubes in the space-time volume. (a) Two objects recorded at different times are shifted to the same time interval in the shorter video synopsis. (b) A single object moving during a long time is broken into segments having a shorter duration, and those segments are shifted in time and played simultaneously. (c) Intersection of objects does not disturb the synopsis when object tubes are broken into segments.

Related Work on Video Abstraction Fast forwarding: individual frames or groups of frames are skipped at fixed or adaptive intervals. Simple, but only complete frames can be removed, and the video condensation ratio is relatively low. Video summarization: key frames are extracted and usually presented simultaneously as a storyline. This loses the dynamics of the original video.
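The two baselines above can be sketched in a few lines; the helper names and the activity threshold in `adaptive_fast_forward` are illustrative assumptions, not the surveyed methods' exact formulations.

```python
import numpy as np

def uniform_fast_forward(frames, ratio):
    # Fixed-interval skipping: keep every ratio-th frame.
    return frames[::ratio]

def adaptive_fast_forward(frames, background, min_activity):
    # Adaptive skipping: keep a frame only when its total difference
    # from the background exceeds a threshold. Note that, as the slide
    # says, only complete frames can be kept or removed.
    kept = [f for f in frames
            if np.abs(f.astype(int) - background).sum() >= min_activity]
    return np.stack(kept) if kept else frames[:0]
```

Either way, activity that overlaps in time is lost or played at the original density, which motivates the temporal rearrangement of the next sections.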

Related Work on Video Abstraction Video montage: spatial and temporal shifts are applied to objects to create a video summary. This paper uses only temporal transformations, keeping spatial locations intact. Fig. 3. Comparison between "video montage" and our approach. (a) A frame from a "video montage." Two space-time regions were shifted in both time and space and then stitched together; visual seams between the different regions are unavoidable. (b) A frame from a "video synopsis." Only temporal shifts were applied, enabling seamless stitching.

Synopsis by Energy Minimization Any synopsis pixel S(x,y,t) can come from an input pixel I(x,y,M(x,y,t)). The time shift M is obtained by minimizing the following cost function: E_a(M): activity cost, indicating the loss of activity, where activity is measured as the difference from the background. E_d(M): discontinuity cost, the sum of color differences across seams between spatiotemporal neighbors, where the e_i are the six unit vectors pointing to the six spatiotemporal neighbors: four spatial and two temporal.
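The cost function itself appeared as an image on the original slide; a reconstruction consistent with the surrounding text and its notation (S, I, M, e_i) would be the following, where the exact form of the norm and weighting is an assumption:

```latex
% Total cost: activity loss plus weighted discontinuity across seams.
E(M) = E_a(M) + \alpha\, E_d(M)

% Discontinuity: color differences between each synopsis pixel's six
% spatiotemporal neighbors and the neighbors of its source input pixel.
E_d(M) = \sum_{(x,y,t)\in S}\ \sum_{i=1}^{6}
  \bigl\| S\bigl((x,y,t)+e_i\bigr)
        - I\bigl((x,y,M(x,y,t))+e_i\bigr) \bigr\|^{2}
```

Here E_a(M) sums the activity of input pixels that are not mapped into the synopsis, so the minimization trades lost activity against visible seams via the weight alpha.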

Synopsis by Energy Minimization Fig. 4. (a) The shorter video synopsis S is generated from the input video I by including the most active pixels together with their spatiotemporal neighborhoods. To assure smoothness, when pixel A in S corresponds to pixel B in I, their "cross border" neighbors in space as well as in time should be similar.

Synopsis by Energy Minimization A seam exists between two neighboring locations (x1,y1) and (x2,y2) in S if M(x1,y1) != M(x2,y2). In this restricted case, the e_i are the four unit vectors describing the four spatial neighbors, K is the number of frames in the output, and N is the number of frames in the input. Fig. 4. (b) An approximate solution can be obtained by restricting consecutive synopsis pixels to come from consecutive input pixels.

Object-based Synopsis The low-level approach to video synopsis described earlier is limited to satisfying local properties such as avoiding visible seams. Higher level object-based properties can be incorporated when objects can be detected and tracked. To enable segmentation of moving foreground objects, we start with background construction. For short video clips, a temporal median over the entire clip is used. For surveillance cameras, a temporal median over a few minutes before and after each frame is used.
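A minimal sketch of this background construction, assuming grayscale frames stacked along the first axis; the paper's actual foreground extraction additionally uses the color and contrast cues of [32], which are omitted here.

```python
import numpy as np

def median_background(frames):
    # Temporal median: each background pixel is the median of that
    # pixel's values over time, which suppresses transient objects.
    return np.median(frames, axis=0)

def foreground_mask(frame, background, threshold):
    # Plain background subtraction as a starting point for
    # segmenting moving foreground objects.
    return np.abs(frame.astype(float) - background) > threshold
```

For the surveillance setting, the same `median_background` would be applied over a sliding window of frames around each time instant rather than the whole clip.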

Object-based Synopsis Fig. 6. Background images from a surveillance camera at Stuttgart airport. The bottom images are at night, while the top images are in daylight. Parked cars and parked airplanes become part of the background.

Object-based Synopsis Use a simplification of [32] to compute the space-time tubes representing dynamic objects; it gives high quality at real-time speed. For a single video sequence with a moving foreground object and a stationary background, it uses background subtraction together with color and contrast cues to extract the foreground accurately and efficiently. Each tube b is represented by its characteristic function, where t_b is the time interval in which this object exists. [32] J. Sun, W. Zhang, X. Tang, and H. Shum, "Background Cut," Proc. Ninth European Conf. Computer Vision, 2006.

Object-based Synopsis Fig. 7. Four extracted tubes shown "flattened" over the corresponding backgrounds from Fig. 6. The left tubes correspond to ground vehicles, while the right tubes correspond to airplanes on the runway at the back.

Object-based Synopsis Create a synopsis having maximum activity while avoiding collisions between objects. The optimal synopsis video is the one that minimizes the following energy function:
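The energy itself was an image on the slide; based on the three cost terms named on the following slides (activity E_a, collision E_c, temporal consistency E_t), its structure can be reconstructed as below, with the particular weights alpha and beta taken as assumptions:

```latex
% b ranges over the set of tubes B; \hat{b} is tube b after the
% temporal mapping M has shifted it into the synopsis.
E(M) = \sum_{b \in B} E_a(\hat{b})
     + \sum_{b, b' \in B}
       \Bigl( \alpha\, E_t(\hat{b}, \hat{b}')
            + \beta\, E_c(\hat{b}, \hat{b}') \Bigr)
```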

Object-based Synopsis Activity cost: penalizes objects that are not mapped to a valid time in the synopsis. Collision cost: for every two shifted tubes, the collision cost is defined as the volume of their space-time overlap weighted by their activity measures. Reducing the weight of the collision cost results in a denser video where objects may overlap; increasing this weight results in a sparser video where objects do not overlap and less activity is presented.
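The collision term can be sketched as follows, assuming each tube is stored as a per-pixel activity volume plus its mapped start frame (a hypothetical representation chosen for illustration):

```python
import numpy as np

def collision_cost(act_a, start_a, act_b, start_b):
    # act_a, act_b: (T, H, W) per-pixel activity measures of two tubes
    # (zero outside the object). start_a, start_b: start frames after
    # the temporal mapping. The cost is the space-time overlap volume
    # weighted by both activity measures.
    lo = max(start_a, start_b)
    hi = min(start_a + act_a.shape[0], start_b + act_b.shape[0])
    if lo >= hi:
        return 0.0  # no common time in the synopsis: no collision
    a = act_a[lo - start_a:hi - start_a]
    b = act_b[lo - start_b:hi - start_b]
    return float((a * b).sum())
```

Scaling this sum by a larger or smaller weight beta in the total energy produces exactly the sparser or denser synopses described above.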

Object-based Synopsis Temporal consistency cost: preserves the chronological order of events. The amount of interaction d(b, b') between each pair of tubes is estimated from their relative spatiotemporal distance, where d(b, b', t) is the Euclidean distance between the tubes at time t, sigma_space is the extent of the space interaction between tubes, and sigma_time is the extent of time in which events still have temporal interaction (used when tubes b and b' do not share a common time in the synopsis video).

Object-based Synopsis Energy minimization: use simple greedy optimization, applied in the space of all possible temporal mappings M. As the initial state, use the one in which all tubes are shifted to the beginning of the synopsis video. To accelerate computation, restrict the temporal shifts of tubes to jumps of 10 frames.
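The optimization step might look like the sketch below, where `energy` stands for the combined cost of the previous slides; the initial state and the 10-frame move set follow the slide, while the acceptance rule and iteration count are illustrative assumptions rather than the authors' exact optimizer.

```python
import random

def greedy_synopsis(tube_lengths, synopsis_len, energy, step=10, iters=1000):
    # tube_lengths: length in frames of each tube.
    # energy: callable taking the mapping (list of start frames) and
    # returning the total cost (activity + collision + consistency).
    mapping = [0] * len(tube_lengths)  # all tubes start at frame 0
    best = energy(mapping)
    for _ in range(iters):
        i = random.randrange(len(tube_lengths))
        for delta in (-step, step):  # temporal shifts in jumps of `step`
            t = mapping[i] + delta
            if 0 <= t <= synopsis_len - tube_lengths[i]:
                trial = mapping[:i] + [t] + mapping[i + 1:]
                e = energy(trial)
                if e < best:  # greedy: accept only improving moves
                    mapping, best = trial, e
    return mapping, best
```

With a toy two-tube overlap cost, the optimizer separates the tubes in time, driving the collision energy to zero.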

Object-based Synopsis Stroboscopic panoramic synopsis: when long tubes exist in the input video, they bound the duration of the synopsis video from below.

Object-based Synopsis Surveillance application Fig. 12. Video synopsis from street surveillance. (a) A typical frame from the original video (22 seconds). (b) A frame from a video synopsis movie (2 seconds) showing condensed activity. (c) A frame from a shorter video synopsis (0.7 second) showing even more condensed activity.

Synopsis of Endless Video Make the webcam resource more useful: build a system, based on the object-based synopsis, that can deal with endless videos. Query to the system, for example: "I would like to watch, in one minute, a synopsis of the video from this camera captured during the last hour." Response to the query: the most interesting events (tubes) are collected from the desired period and assembled into a synopsis video of the desired length. The synopsis video is an index into the original video, as each object includes a pointer to its original time.

Synopsis of Endless Video

Synopsis of Endless Video Removing stationary frames: filter out frames with no activity during the online phase. Frames are recorded according to two criteria: (1) a global change in the scene, measured by the SSD (sum of squared differences) between the incoming frame and the last kept frame, which captures lighting changes; (2) the existence of a moving object, measured by the maximal SSD in small windows. By assuming that moving objects with a very short duration (e.g., less than a second) are not important, video activity can be measured only once every 10 frames.
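The two recording criteria can be sketched as below; the window size and both thresholds are illustrative parameters, not values from the paper.

```python
import numpy as np

def keep_frame(frame, last_kept, global_thresh, window_thresh, win=8):
    diff = (frame.astype(float) - last_kept.astype(float)) ** 2
    # Criterion 1: global SSD to the last kept frame (lighting change).
    if diff.mean() > global_thresh:
        return True
    # Criterion 2: maximal SSD over small windows (a moving object
    # changes a few windows a lot, not the whole frame a little).
    h, w = frame.shape
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            if diff[y:y + win, x:x + win].sum() > window_thresh:
                return True
    return False
```

Frames failing both tests are dropped online, so the object queue only ever sees frames with some global or local change.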

Synopsis of Endless Video The object queue: the main challenge is handling endless videos, since not all objects can be kept. The naive scheme of throwing out the oldest activity is not good; instead, estimate the importance of each object to possible future queries and evict objects accordingly, based on: importance (activity), collision potential (spatial activity distribution), and age. Other options include specifying which kind of activity is of interest. Fig. 14. The spatial distribution of activity in the airport scene (intensity is the log of the activity value). The activity distribution of a single tube is on the left, and the average over all tubes is on the right. As expected, the highest activity is on the car lanes and on the runway. The potential for collision of tubes is higher in regions having higher activity.
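One way to combine the three eviction criteria is a single retention score; the weights and the exponential age decay below are assumptions for illustration, not the paper's formula.

```python
def retention_score(activity, collision_potential, age_frames,
                    w_act=1.0, w_col=0.5, half_life=90000):
    # Higher activity raises the score; higher collision potential
    # lowers it (such objects are hard to place without overlap);
    # older objects decay toward eviction.
    decay = 0.5 ** (age_frames / half_life)
    return (w_act * activity - w_col * collision_potential) * decay
```

When the queue is full, the object with the lowest score would be evicted, which reduces to the naive oldest-first scheme only when all objects have equal activity and collision potential.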

Synopsis of Endless Video Synopsis generation: (1) generate a background video; (2) compute a consistency cost for each object and for each possible time in the synopsis; (3) run an energy minimization to determine which tubes appear in the synopsis and at what time; (4) combine the selected tubes with the background.

Synopsis of Endless Video Time-lapse background: represents the background changes over time as well as the background of the activity tubes. Construct two temporal histograms: a temporal activity histogram H_a of the video stream and a uniform temporal histogram H_t. Compute a third histogram by interpolating the two.
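A sketch of the interpolation, assuming a per-frame activity count as input; the slide does not give the interpolation weight, so `alpha` here is an assumed blending parameter.

```python
import numpy as np

def background_sampling_histogram(activity_hist, alpha=0.5):
    # activity_hist: per-frame activity counts of the stream.
    # alpha=1 samples background frames where activity occurred;
    # alpha=0 samples time uniformly (a plain time lapse).
    ha = np.asarray(activity_hist, dtype=float)
    ha = ha / ha.sum()                    # temporal activity histogram H_a
    ht = np.full_like(ha, 1.0 / len(ha))  # uniform temporal histogram H_t
    h = alpha * ha + (1 - alpha) * ht     # interpolated sampling histogram
    return h / h.sum()
```

Sampling background frames from the interpolated histogram biases the time-lapse background toward the periods the displayed tubes actually came from, while still covering the whole time span.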

Synopsis of Endless Video Consistency with background: prefer to stitch tubes to background images having a similar appearance. Final energy function:

Synopsis of Endless Video Stitching the synopsis video: use a modification of Poisson editing to deal with objects coming from different lighting conditions. Overlapping tubes are blended together by letting each pixel be a weighted average of the corresponding pixels from the stitched activity tubes, with weights proportional to the activity measures. [20] P. Pérez, M. Gangnet, and A. Blake, "Poisson Image Editing," Proc. ACM SIGGRAPH '03, July 2003.
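The activity-weighted blending of overlapping tubes can be sketched as below; the layer representation is hypothetical, and the Poisson-editing step for lighting differences is omitted.

```python
import numpy as np

def blend_tubes(background, layers):
    # layers: list of (color, activity) pairs for one output frame,
    # where color is (H, W, 3) and activity is (H, W), zero outside
    # the tube. Each pixel is the activity-weighted average of the
    # tubes covering it; where no tube is active, the background shows.
    out = background.astype(float).copy()
    total_w = np.zeros(background.shape[:2])
    acc = np.zeros_like(out)
    for color, activity in layers:
        acc += color.astype(float) * activity[..., None]
        total_w += activity
    covered = total_w > 0
    out[covered] = acc[covered] / total_w[covered][..., None]
    return out
```

Because the weights are the activity measures themselves, a strongly moving object dominates a faint one where their tubes overlap, which keeps the dominant activity legible.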

Limitations and Failures Video synopsis is less applicable in several cases, some of which are listed below: Video with already dense activity, where all locations are active all the time (for example, a camera in a busy train station). Edited video, like a feature movie, where the intentions of the movie creator may be destroyed by changing the chronological order of events.

Demo