Characterizing activity in video shots based on salient points
Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva

Outline
Context
Video activity extraction
– Spatial salient points
– Spatio-temporal salient points
– Spatio-temporal salient regions
Results
Conclusion

Context
Describe the visual content of video → index, retrieve and browse video databases
Requirements:
– Generic approach (vs. domain-oriented)
– Local approach (vs. global description of the content)
– Computationally efficient approach
Video activity: a salient region of the video's 3D (x, y, t) space

Context
[Diagram: activities #1 and #2 localized over a sequence of frames]
→ Description in space and time of video activity
→ Inference based on video object and event relationships
→ High-level indexing

Context
Related approaches: spatio-temporal segmentation
– Segmentation problem
– Computational efficiency
Our approach:
– Spatio-temporal salient points
– Spatial grouping of salient points
– Temporal matching of salient regions
→ Set of activities

Overview
[Pipeline diagram: video stream → salient point extraction → salient points & trajectories → global motion estimation → motion outliers → spatial grouping → salient regions → temporal matching]

Salient points
Points in the image space that are:
– Repeatable (robust)
– High in information content
→ Scale-invariant interest points (Mikolajczyk & Schmid, 2001)
– One of the most robust detectors
– Salient points with a characteristic scale

Salient point extraction
Linear scale-space
Harris function
Salient points (image space): local maxima of h(v, s)
Laplacian over scale
Salient points (scale-space): local maxima of l(v, s) and of h(v, s) (equations below)
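For reference, the Harris-Laplace detector of Mikolajczyk & Schmid (2001) that the slide refers to can be written as follows; the separate integration and differentiation scales of the original formulation are collapsed into a single scale s to match the slide's notation, which is a simplifying assumption:

\[ L(v, s) = g(s) * I(v) \]
\[ \mu(v, s) = s^2\, g(s) * \begin{pmatrix} L_x^2(v, s) & L_x L_y(v, s) \\ L_x L_y(v, s) & L_y^2(v, s) \end{pmatrix} \]
\[ h(v, s) = \det \mu(v, s) - \alpha \,\operatorname{trace}^2 \mu(v, s) \]
\[ l(v, s) = s^2 \,\bigl| L_{xx}(v, s) + L_{yy}(v, s) \bigr| \]

A point v at scale s is kept when h(v, s) is a local maximum over the image plane and l(v, s) is a local maximum over scale.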

Salient point extraction
[Example frame: detected salient points displayed at their characteristic scale]

Salient point extraction
[Another example frame: detected salient points and their characteristic scale]

Motion estimation
Goal:
– Find points having salient temporal behaviour
→ Estimate the background motion model
→ Select the points that do not follow this background motion model
Estimation:
– Compute salient point trajectories
– Estimate the corresponding affine motion model

Trajectories
Point descriptors: local grayvalue invariants
Point distance: Mahalanobis distance (written out below)
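Written out, the Mahalanobis distance between two descriptor vectors a and b takes the standard form below, with Λ the covariance matrix of the descriptors (assumed here to be estimated over the whole set of points):

\[ d(a, b) = \sqrt{(a - b)^{\top} \Lambda^{-1} (a - b)} \]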

Trajectories
Goodness of match:
Candidate matching points
– Matches with a spatial distance below a threshold
Relaxation process:
– Disambiguates the set of candidate matches
– Greedy winner-takes-all algorithm (sketched below)
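A minimal sketch of the greedy winner-takes-all disambiguation in Python, assuming the candidate matches have already been scored; the tuple layout (score, point_a, point_b) and the function name are illustrative, not the talk's actual implementation:

def winner_takes_all(candidates):
    """Greedy winner-takes-all: keep the best-scoring candidate match,
    discard every other candidate involving either of its points, repeat."""
    matched = []
    used_a, used_b = set(), set()
    # Walk the candidates from best to worst goodness of match.
    for score, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        if a in used_a or b in used_b:
            continue  # one of the two points is already matched
        matched.append((a, b))
        used_a.add(a)
        used_b.add(b)
    return matched

For example, winner_takes_all([(0.9, 0, 1), (0.8, 0, 2), (0.7, 3, 2)]) returns [(0, 1), (3, 2)]: the second candidate is dropped because point 0 has already been taken by a better match.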

Motion estimation
Affine motion model (written out below)
Estimate the model from the trajectories
– Iterative least-squares estimation (Tukey M-estimator)
→ Select the points that belong to the global motion model
→ Assumption: more than 50% of the points belong to the background
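The six-parameter affine motion model and the Tukey biweight used for the robust estimation are standard; the exact parameterization below is an assumption made for illustration:

\[ \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix} = \begin{pmatrix} a_1 \\ a_4 \end{pmatrix} + \begin{pmatrix} a_2 & a_3 \\ a_5 & a_6 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \qquad w(r) = \begin{cases} \bigl(1 - (r/c)^2\bigr)^2 & \text{if } |r| \le c \\ 0 & \text{otherwise} \end{cases} \]

The parameters a_1, ..., a_6 are re-estimated iteratively with the residual weights w(r); the points whose final weight is non-zero are taken to follow the global (background) motion.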

Motion estimation
[Example frames: background points and their motion, estimated with the presented approach, compared with all points and their motion from a dense motion estimator]

Spatio-temporal salient points
Points whose trajectory does not fit the global motion model
→ Outliers (moving objects)
Points without a trajectory (no matching point)
→ New points (appearing or deformable objects)
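A small Python sketch of this two-way classification, assuming per-point trajectories and residuals under the estimated global affine model are available; residual_threshold and both dictionaries are hypothetical names:

def classify_salient_points(points, trajectories, residuals, residual_threshold=2.0):
    """Split salient points into motion outliers and new points.

    points:        iterable of point ids
    trajectories:  dict point id -> trajectory (absent when unmatched)
    residuals:     dict point id -> residual under the global affine model
    """
    outliers, new_points = [], []
    for p in points:
        if p not in trajectories:
            # No matching point in the previous frame: appearing or deformable object.
            new_points.append(p)
        elif residuals[p] > residual_threshold:
            # The trajectory does not fit the estimated background motion.
            outliers.append(p)
    return outliers, new_points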

Spatio-temporal salient points
[Examples: detected spatio-temporal salient points for a fixed camera and for a moving camera]

Salient regions
Set of spatio-temporal salient points
→ Feature distribution of the points (RGB colour features)
→ Spatial distribution of the points
Grouping process: estimate salient region models

Feature model
Feature description
– A salient point is characterized by the feature distribution of its neighbourhood
– Assumption: at most four regions in the neighbourhood of a point
– Compute the corresponding colour distributions: K-means clustering, then a Gaussian model per cluster
Gaussian model clustering
– Greedy agglomerative (AHC) algorithm
→ Set of Gaussian distributions representing the distribution of the neighbourhoods of the salient points (see the sketch below)
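A minimal sketch of the per-point colour model, assuming the neighbourhood pixels are given as an (N, 3) array of RGB values; the use of scikit-learn's KMeans is an implementation choice for illustration, and the greedy agglomerative (AHC) merging of Gaussians across points is omitted:

import numpy as np
from sklearn.cluster import KMeans

def neighbourhood_gaussians(pixels_rgb, max_regions=4):
    """Model the colour distribution of a salient point's neighbourhood with
    at most `max_regions` Gaussians: k-means on the RGB values, then one
    Gaussian (mean, covariance) fitted per cluster."""
    km = KMeans(n_clusters=max_regions, n_init=10).fit(pixels_rgb)
    gaussians = []
    for k in range(max_regions):
        cluster = pixels_rgb[km.labels_ == k]
        if len(cluster) < 2:
            continue  # not enough pixels to estimate a covariance
        gaussians.append((cluster.mean(axis=0), np.cov(cluster, rowvar=False)))
    return gaussians

The AHC step would then greedily merge, across salient points, Gaussians whose distributions are close under some distribution distance, yielding the final set of Gaussians describing the neighbourhoods.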

Salient region model
Feature model (written out below):
– Mixture of Gaussians
→ Corresponding weight for each Gaussian
Spatial model:
– Estimate a spatial pdf from the salient points and their associated scales
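Written out, the feature model is a standard Gaussian mixture; the spatial model below is one plausible reading of "spatial pdf from the salient points and their scales" (a kernel density estimate whose bandwidth is the characteristic scale) and should be taken as an assumption:

\[ p_{\text{feat}}(x) = \sum_{k} w_k \,\mathcal{N}(x; \mu_k, \Sigma_k), \qquad \sum_{k} w_k = 1 \]
\[ p_{\text{spat}}(v) \propto \sum_{i} \mathcal{N}\bigl(v;\, v_i,\, s_i^2 I\bigr) \]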

Salient region models
Iterate a RANSAC algorithm (a sketch of the loop follows)
Estimate a salient region model at each iteration
– Robust estimation (Tukey M-estimator)
– Cost function
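A Python sketch of the iterated RANSAC loop; fit_model and cost stand for the robust (Tukey M-estimator) model fit and the cost function of the slide and are passed in as callables, so every concrete choice here (sample size, thresholds, names) is an assumption:

import random

def extract_salient_regions(points, fit_model, cost, n_trials=100,
                            seed_size=4, cost_threshold=1.0, min_inliers=5):
    """Iterated RANSAC: each outer pass extracts one salient region
    (model + inlier points), removes its points and stops when no
    sufficiently supported region remains."""
    regions = []
    remaining = list(points)
    while len(remaining) >= min_inliers:
        best_model, best_inliers = None, []
        for _ in range(n_trials):
            seed = random.sample(remaining, seed_size)   # random minimal sample
            model = fit_model(seed)
            inliers = [p for p in remaining if cost(model, p) < cost_threshold]
            if len(inliers) > len(best_inliers):
                best_model, best_inliers = model, inliers
        if len(best_inliers) < min_inliers:
            break  # no well-supported region left
        best_model = fit_model(best_inliers)             # refit on the consensus set
        regions.append((best_model, best_inliers))
        remaining = [p for p in remaining if p not in best_inliers]
    return regions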

Salient regions
[Examples: extracted salient regions for a fixed camera and for a moving camera]

Temporal matching
Spatio-temporal salient regions of arbitrary length
→ Matching of salient regions across frames
Uses the salient point trajectories (sketched below)
1. Match the regions sharing the highest number of matching points
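A sketch of the trajectory-based matching, assuming each region is represented by the set of trajectory ids of its salient points; the greedy highest-count-first linking mirrors step 1 of the slide, and the representation itself is an assumption:

def match_regions(regions_a, regions_b, min_shared=1):
    """Greedily link the salient regions of two consecutive time steps by the
    number of salient-point trajectories they share, highest count first.

    regions_a, regions_b: lists of sets of trajectory ids
    Returns a list of (index_in_a, index_in_b) links."""
    pairs = []
    for i, r in enumerate(regions_a):
        for j, q in enumerate(regions_b):
            shared = len(r & q)
            if shared >= min_shared:
                pairs.append((shared, i, j))
    links, used_a, used_b = [], set(), set()
    for shared, i, j in sorted(pairs, reverse=True):
        if i in used_a or j in used_b:
            continue  # one of the two regions is already linked
        links.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return links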

Results – Meetings
[Example result frames from meeting videos]

Results – Misc
[Example result frames from miscellaneous videos]

Conclusion
Contribution
– A content descriptor with high information content
– A generic content descriptor
– A content descriptor that is local in space and time
Limitation
– Noisy and short activities
Ongoing work
– Temporal filtering of the activities
– Indexing of videos through their set of activities