Space-time interest points
Ivan Laptev and Tony Lindeberg
Computational Vision and Active Perception Laboratory (CVAP), Dept of Numerical Analysis and Computer Science, KTH (Royal Institute of Technology), SE Stockholm, Sweden

General motivation
- Spatio-temporal image data contains rich information about the external world.
- Traditional methods for video analysis include optical flow estimation and tracking of features/models over time.
- Observation: events in video are often characterised by non-constant motion and non-constant appearance.

Spatio-temporal data
- Idea: detect points with high spatio-temporal variation of image values.
- This gives a direct method for event detection.

Why local features in time?
Non-constant motion in images may be an indication of:
- physical interaction between objects in the world (a ball bouncing on the ground, a car crash, etc.);
- non-rigid motion, e.g. relative motion of body parts, gestures, etc.;
- occlusions/disocclusions in the field of view.
Goal:
- make a sparse and informative representation of complex motion patterns;
- obtain robustness w.r.t. missing data (occlusions) and outliers (dynamic, complex background).

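Interest points in space
- Harris and Stephens (1988): image points with high variation of values in both image directions, i.e. with high eigenvalues of the second-moment matrix integrated over the local neighbourhood,
$$\mu = g(\cdot;\sigma_i^2) * \begin{pmatrix} L_x^2 & L_xL_y\\ L_xL_y & L_y^2 \end{pmatrix},$$
where $L_x$, $L_y$ are Gaussian derivatives.
- Select points with positive maxima of the corner function
$$H = \det(\mu) - k\,\operatorname{trace}^2(\mu) = \lambda_1\lambda_2 - k(\lambda_1+\lambda_2)^2,$$
where $\lambda_1$, $\lambda_2$ are the eigenvalues of $\mu$.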
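A minimal NumPy sketch of the spatial Harris corner function $H = \det(\mu) - k\,\operatorname{trace}^2(\mu)$, computed from Gaussian derivatives. Assumptions not taken from the slides: the integration scale is $\sigma_i = \sqrt{s}\,\sigma_l$ with $s = 2$, $k = 0.04$, derivatives are central differences of the smoothed image, and the helper names (`gaussian_kernel`, `smooth`, `harris_2d`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3*sigma and normalised to sum 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing along every axis (zero padding at borders)."""
    kernel = gaussian_kernel(sigma)
    out = np.asarray(image, dtype=float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


def harris_2d(image, sigma_l=1.5, s=2.0, k=0.04):
    """Harris corner function H = det(mu) - k * trace(mu)**2 at every pixel."""
    L = smooth(image, sigma_l)            # L = g(.; sigma_l^2) * f
    Ly, Lx = np.gradient(L)               # axis 0 = y (rows), axis 1 = x (columns)
    sigma_i = np.sqrt(s) * sigma_l        # assumed integration scale
    mu_xx = smooth(Lx * Lx, sigma_i)      # entries of the second-moment matrix mu
    mu_xy = smooth(Lx * Ly, sigma_i)
    mu_yy = smooth(Ly * Ly, sigma_i)
    det = mu_xx * mu_yy - mu_xy**2        # lambda1 * lambda2
    trace = mu_xx + mu_yy                 # lambda1 + lambda2
    return det - k * trace**2
```

For a bright square on a dark background, `harris_2d` is expected to respond positively at a corner pixel and negatively in the middle of an edge, where only one eigenvalue of $\mu$ is large.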

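Interest points in space-time
- High variation of image values in both space and time: extend the Harris corner function into the 3D spatio-temporal domain and compute the second-moment matrix
$$\mu = g(\cdot;\sigma_i^2,\tau_i^2) * \begin{pmatrix} L_x^2 & L_xL_y & L_xL_t\\ L_xL_y & L_y^2 & L_yL_t\\ L_xL_t & L_yL_t & L_t^2 \end{pmatrix},$$
where $L_x$, $L_y$, $L_t$ are Gaussian derivatives in space-time obtained by spatio-temporal convolution,
$$L(\cdot;\sigma_l^2,\tau_l^2) = g(\cdot;\sigma_l^2,\tau_l^2) * f(\cdot),$$
and
$$g(x,y,t;\sigma_l^2,\tau_l^2) = \frac{1}{\sqrt{(2\pi)^3\sigma_l^4\tau_l^2}}\exp\left(-\frac{x^2+y^2}{2\sigma_l^2}-\frac{t^2}{2\tau_l^2}\right).$$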

Interest points in space-time
- Points with high space-time variation of image values correspond to the maxima of
$$H = \det(\mu) - k\,\operatorname{trace}^3(\mu) = \lambda_1\lambda_2\lambda_3 - k(\lambda_1+\lambda_2+\lambda_3)^3,$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$ are the eigenvalues of the second-moment matrix $\mu$.
- Distinct scale parameters are used for the spatial scale $\sigma_l^2$ and the temporal scale $\tau_l^2$, since the spatial and temporal extents of events are in general independent.
- Convolution with Gaussian kernels violates the causality constraint of the temporal domain. Alternative (recursive) kernels can be used to address this problem (Koenderink 1988, Lindeberg & Fagerström 1996, Florack 1997).
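The space-time extension can be sketched the same way on a `(t, y, x)` volume, with separate spatial and temporal scales and the 3D corner function $H = \det(\mu) - k\,\operatorname{trace}^3(\mu)$. This is an illustrative sketch under stated assumptions: separable (non-causal) Gaussian kernels, so the causality issue is ignored; $\sigma_i^2 = s\sigma_l^2$ and $\tau_i^2 = s\tau_l^2$ with $s = 2$; $k = 0.005$; the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(scale):
    """Sampled 1D Gaussian, truncated at 3*scale and normalised to sum 1."""
    radius = max(1, int(np.ceil(3 * scale)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * scale**2))
    return g / g.sum()


def smooth_st(volume, sigma, tau):
    """Separable Gaussian on a (t, y, x) volume: std tau along t, sigma along y and x."""
    out = np.asarray(volume, dtype=float)
    for axis, scale in enumerate((tau, sigma, sigma)):
        out = np.apply_along_axis(np.convolve, axis, out, gaussian_kernel(scale), mode="same")
    return out


def harris_3d(volume, sigma_l=2.0, tau_l=2.0, s=2.0, k=0.005):
    """Space-time corner function H = det(mu) - k * trace(mu)**3, plus mu itself."""
    L = smooth_st(volume, sigma_l, tau_l)
    Lt, Ly, Lx = np.gradient(L)                    # derivatives along t, y, x
    grads = (Lx, Ly, Lt)
    sigma_i, tau_i = np.sqrt(s) * sigma_l, np.sqrt(s) * tau_l
    mu = np.empty(L.shape + (3, 3))
    for a in range(3):
        for b in range(a, 3):                      # mu is symmetric: fill both halves
            mu[..., a, b] = mu[..., b, a] = smooth_st(grads[a] * grads[b], sigma_i, tau_i)
    det = np.linalg.det(mu)                        # lambda1 * lambda2 * lambda3
    trace = np.trace(mu, axis1=-2, axis2=-1)       # lambda1 + lambda2 + lambda3
    return det - k * trace**3, mu
```

A small k matters here: for three equal eigenvalues $\lambda$, $H = \lambda^3 - 27k\lambda^3$, which is positive only when $k < 1/27$. At the corner of a bright cube the response should be positive, while on a flat face (one dominant eigenvalue) it should be negative.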

Experiments with synthetic sequences: spatio-temporal "corner" (collision I)

Experiments with synthetic sequences: collision II (figure panels at spatial and temporal scales $\sigma^2$, $\tau^2$ of 16 and 8)

Motivation for scale selection (figure panels with $\sigma^2$, $\tau^2$ set to 2 or 8)

Motivation for velocity adaptation (figure panels: $v_x = -0.8$, $v_x = 1.4$, $v_x = 0.0$)

Spatio-temporal scale selection
- Estimate the spatio-temporal extent of image structures.
- Local scale estimation has been investigated and applied previously in the spatial domain (Lindeberg IJCV'98; Chomat et al. ECCV'00; Mikolajczyk and Schmid ICCV'01).
- Here: extend scale selection into the spatio-temporal domain and estimate both the spatial and the temporal scale parameters.
Task: find the normalisation parameters (a, b, c, d) of
$$\nabla^2_{norm}L = \sigma^{2a}\tau^{2b}(L_{xx}+L_{yy}) + \sigma^{2c}\tau^{2d}L_{tt}$$
such that the normalised derivatives attain extrema at scales corresponding to the extents of image structures in space-time.

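Spatio-temporal scale selection
- Analyse a spatio-temporal Gaussian blob $f = g(x, y, t; \sigma_0^2, \tau_0^2)$, whose scale-space representation is $L = g(\cdot; \sigma_0^2+\sigma^2, \tau_0^2+\tau^2)$.
- Extrema constraints: at the blob centre, the normalised Laplacian should attain its extremum over scales at $\sigma^2 = \sigma_0^2$, $\tau^2 = \tau_0^2$.
- These constraints give the parameter values a = 1, b = 1/4, c = 1/2, d = 3/4.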

Spatio-temporal scale selection
The normalised spatio-temporal Laplacian operator
$$\nabla^2_{norm}L = \sigma^2\tau^{1/2}(L_{xx}+L_{yy}) + \sigma\tau^{3/2}L_{tt}$$
assumes extreme values at positions and scales corresponding to the centre and the spatio-temporal extent of a Gaussian blob.
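This property can be checked numerically. Smoothing a Gaussian blob with variances $(\sigma_0^2, \tau_0^2)$ gives another Gaussian with variances $(\sigma_0^2+\sigma^2, \tau_0^2+\tau^2)$, so the normalised Laplacian at the blob centre has a closed form. The sketch below evaluates it on an assumed logarithmic grid of scales and returns the maximising $(\sigma^2, \tau^2)$; with a = 1, b = 1/4, c = 1/2, d = 3/4 it is expected to recover $(\sigma_0^2, \tau_0^2)$. The grid and the function names are illustrative assumptions.

```python
import numpy as np


def norm_laplacian_at_blob_centre(s, u, s0, u0, a=1.0, b=0.25, c=0.5, d=0.75):
    """|sigma^2a tau^2b (Lxx + Lyy) + sigma^2c tau^2d Ltt| at the centre of a blob.

    s = sigma^2 and u = tau^2 are the scale-space variances; s0 = sigma_0^2 and
    u0 = tau_0^2 describe the blob f = g(.; s0, u0). Smoothing f gives a Gaussian
    with variances (s0 + s, s0 + s, u0 + u), so L and its second derivatives at
    the centre are known in closed form.
    """
    S, U = s0 + s, u0 + u
    L0 = 1.0 / ((2 * np.pi) ** 1.5 * S * np.sqrt(U))   # L at the blob centre
    Lxx = Lyy = -L0 / S
    Ltt = -L0 / U
    return np.abs(s**a * u**b * (Lxx + Lyy) + s**c * u**d * Ltt)


# Assumed sampling of the variances sigma^2 and tau^2: 0.25 ... 64, 10 steps per octave.
SCALE_GRID = 2.0 ** np.linspace(-2.0, 6.0, 81)


def select_scales(s0, u0, grid=SCALE_GRID):
    """Grid search for the (sigma^2, tau^2) maximising the normalised Laplacian."""
    S, U = np.meshgrid(grid, grid, indexing="ij")
    response = norm_laplacian_at_blob_centre(S, U, s0, u0)
    i, j = np.unravel_index(np.argmax(response), response.shape)
    return grid[i], grid[j]
```

Changing (a, b, c, d) away from these values shifts the selected scales away from the blob's true extent, which is the sense in which the normalisation is "correct".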

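Velocity adaptation
- Goal: adapt point neighbourhoods to the direction of motion and obtain invariance w.r.t. first-order motion.
- Stationary pattern: $L(\cdot;\Sigma) = g(\cdot;\Sigma) * f(\cdot)$ with $\Sigma = \operatorname{diag}(\sigma^2,\sigma^2,\tau^2)$.
- First-order motion is described by the Galilean transformation $p' = Gp$, $f'(p') = f(p)$, where
$$G = \begin{pmatrix}1 & 0 & v_x\\ 0 & 1 & v_y\\ 0 & 0 & 1\end{pmatrix},$$
and it follows that $L'(p';\Sigma') = L(p;\Sigma)$ with $\Sigma' = G\Sigma G^{T}$.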

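Velocity adaptation
- Expanding $\Sigma' = G\Sigma G^{T}$ gives
$$\Sigma' = \begin{pmatrix}\sigma^2+v_x^2\tau^2 & v_xv_y\tau^2 & v_x\tau^2\\ v_xv_y\tau^2 & \sigma^2+v_y^2\tau^2 & v_y\tau^2\\ v_x\tau^2 & v_y\tau^2 & \tau^2\end{pmatrix}.$$
- However, this scheme needs an estimate of the velocity $(v_x, v_y)$ in advance in order to adapt the smoothing filter kernel.
- Iteratively estimate the velocity and adapt the filter kernel until the fixed-point condition is reached (a similar approach is used for affine shape adaptation in space, Lindeberg).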
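A small sketch of the Galilean adaptation of the smoothing covariance, $\Sigma' = G\Sigma G^T$ with $\Sigma = \operatorname{diag}(\sigma^2, \sigma^2, \tau^2)$. Because $\det G = 1$, the adapted Gaussian satisfies $g(Gp; \Sigma') = g(p; \Sigma)$, which is why smoothing the moving pattern with $\Sigma'$ reproduces the smoothed stationary pattern. The function names are hypothetical, and velocity estimation itself is not shown.

```python
import numpy as np


def galilean(vx, vy):
    """Galilean transformation p' = G p for p = (x, y, t)^T."""
    return np.array([[1.0, 0.0, vx],
                     [0.0, 1.0, vy],
                     [0.0, 0.0, 1.0]])


def adapted_covariance(sigma2, tau2, vx, vy):
    """Sigma' = G Sigma G^T for the separable kernel Sigma = diag(sigma2, sigma2, tau2)."""
    G = galilean(vx, vy)
    Sigma = np.diag([sigma2, sigma2, tau2])
    return G @ Sigma @ G.T


def gaussian_3d(p, Sigma):
    """Non-separable 3D Gaussian g(p; Sigma)."""
    p = np.asarray(p, dtype=float)
    norm = np.sqrt((2 * np.pi) ** 3 * np.linalg.det(Sigma))
    return np.exp(-0.5 * p @ np.linalg.solve(Sigma, p)) / norm
```

For example, `adapted_covariance(4.0, 2.0, 0.5, -1.0)` should match the expanded matrix term by term: $\sigma^2 + v_x^2\tau^2 = 4.5$, $v_xv_y\tau^2 = -1$, $v_x\tau^2 = 1$, and so on.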

Scale and velocity adaptation
Find interest points $p = (x, y, t, \sigma^2, \tau^2, v_x, v_y)$ that
- are maxima of the corner function H over (x, y, t);
- are maxima of the normalised Laplacian over $(\sigma^2, \tau^2)$;
- satisfy the fixed-point condition.
Approach:
1. Find interest points P for a set of sampled $(\sigma^2, \tau^2, v_x, v_y)$.
2. For each $p_i$ in P:
3. select new scales $(\sigma^2, \tau^2)$ at (x, y, t) that maximise the Laplacian in the local scale neighbourhood;
4. estimate the velocity $(v_x, v_y)$;
5. re-detect the interest point for the new scales and velocities;
6. if $(\sigma^2, \tau^2, v_x, v_y)$ changed, repeat from step 3, else set i = i + 1.
(Similar in the spatial domain: Mikolajczyk and Schmid ICCV'01, ECCV'02.)
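Steps 2 to 6 can be outlined as a fixed-point loop. In this sketch the three operations (scale selection, velocity estimation and re-detection) are passed in as callables, since their implementation is detector-specific; `InterestPoint`, `adapt_point`, `adapt_all` and the tolerance `tol` are hypothetical names and parameters, not part of the original method description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InterestPoint:
    x: float
    y: float
    t: float
    sigma2: float   # spatial scale sigma^2
    tau2: float     # temporal scale tau^2
    vx: float
    vy: float


def adapt_point(p, select_scale, estimate_velocity, redetect, tol=1e-3, max_iter=50):
    """Steps 3-6: iterate until (sigma2, tau2, vx, vy) stop changing (the fixed point)."""
    for _ in range(max_iter):
        sigma2, tau2 = select_scale(p)                  # step 3
        vx, vy = estimate_velocity(p, sigma2, tau2)     # step 4
        q = redetect(p, sigma2, tau2, vx, vy)           # step 5
        change = max(abs(q.sigma2 - p.sigma2), abs(q.tau2 - p.tau2),
                     abs(q.vx - p.vx), abs(q.vy - p.vy))
        p = q
        if change < tol:                                # step 6: no change, next point
            break
    return p


def adapt_all(points, select_scale, estimate_velocity, redetect):
    """Step 2: adapt every initially detected point independently."""
    return [adapt_point(p, select_scale, estimate_velocity, redetect) for p in points]
```

Keeping the three steps as callables separates the control flow of the adaptation from the detector details; `select_scale` could, for instance, search the normalised Laplacian over neighbouring $(\sigma^2, \tau^2)$, and `redetect` could re-run the corner detector with the adapted kernel.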

Scale- and velocity-adapted interest points

Experiments: stationary camera, stabilised camera

Experiments (figure panels: stationary camera and stabilised camera; no adaptation, scale adaptation, scale and velocity adaptation)

Experiments: invariance with respect to size changes

Experiments: selection of temporal scales captures the temporal extents of events

Applications of interest points (preliminary results)
- Classify detected interest points using their spatio-temporal neighbourhoods.
- Represent video data by a set of classified interest points (features).
- Align video sequences by matching spatio-temporal features.
- Recognise motion patterns using probability distributions of features derived from training sequences.

Classification of events
- When analysing periodic motion such as a gait pattern, interest points with similar spatio-temporal structure are likely to correspond to the interesting events, while the others are more likely to be caused by noise.
- Describe each interest point $p_i$, $i = 1, \dots, n$, by the local responses of spatio-temporal Gaussian derivatives (a local jet), and normalise the descriptors w.r.t. their covariance.
- Group similar points in the space of normalised descriptors using k-means clustering.
- Select significant clusters and represent each of them by its mean and covariance matrix.
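A minimal sketch of the normalisation and clustering steps: descriptors are whitened with the inverse square root of their covariance (so Euclidean distance becomes a Mahalanobis distance), grouped with Lloyd's k-means, and the largest clusters are summarised by mean and covariance. The deterministic farthest-point initialisation and all function names are assumptions; the slides do not say how k-means was initialised.

```python
import numpy as np


def normalise_descriptors(J):
    """Whiten descriptors J (n x d) w.r.t. their covariance matrix."""
    centred = J - J.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(J, rowvar=False))
    # Project on the eigenvectors, then scale each direction to unit variance.
    return centred @ (evecs / np.sqrt(np.maximum(evals, 1e-12)))


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)                      # assign to nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):                   # converged
            break
        centres = new
    return labels, centres


def significant_clusters(X, labels, n_keep):
    """(mean, covariance) of the n_keep clusters with the most points."""
    counts = np.bincount(labels)
    keep = np.argsort(counts)[::-1][:n_keep]
    return [(X[labels == j].mean(axis=0), np.cov(X[labels == j], rowvar=False)) for j in keep]
```

In use, the whitened descriptors would be clustered, e.g. `labels, centres = kmeans(normalise_descriptors(J), k)`, where `J` holds one jet per row.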

K-means clustering
For the gait pattern, the four significant clusters (the clusters with the most points) correspond to distinct spatio-temporal events (figure panels: clusters c1, c2, c3, c4; clustering; classification).

Application I: Sequence matching
Problem: find walking people and estimate their poses from image sequences.
Approach: match a model sequence with data sequences using spatio-temporal interest points.
- Represent the model sequence and the test sequence by a set of classified spatio-temporal points.
- Find a valid transformation of the model that brings model features into correspondence with data features.
- Note: the feature matching is defined in a 3D spatio-temporal window.

Walking model
- Represent the gait pattern using classified spatio-temporal points corresponding to one gait cycle.
- Define the state X of the model at the moment $t_0$ by the position, the size, the phase and the velocity of a person.
- Associate each phase with a silhouette of the person extracted from the original sequence.

Sequence alignment
- Given a data sequence with the current moment $t_0$, detect and classify interest points in the time window of length $t_w$, i.e. $[t_0 - t_w, t_0]$.
- Transform the model features according to X, and for each model feature $f_{m,i} = (x_{m,i}, y_{m,i}, t_{m,i}, \sigma_{m,i}, \tau_{m,i}, c_{m,i})$ compute its distance $d_i$ to the closest data feature $f_{d,j}$ of the same class, $c_{d,j} = c_{m,i}$.
- Define the fit function D of the model configuration X as a sum of the distances of all features, weighted w.r.t. their age $(t_0 - t_{m,i})$ so that recent features have more influence on the matching.
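The fit function can be sketched as follows for model and data features stored as rows `(x, y, t, class)`, with the model features assumed to be already transformed by X. The slides do not give the distance measure or the age weighting, so the Euclidean distance in (x, y, t) and the exponential weight $\exp(-\text{decay}\cdot(t_0 - t_{m,i}))$ used below are assumptions, as are the function names and the handling of unmatched features.

```python
import numpy as np


def in_window(features, t0, t_w):
    """Keep features (rows x, y, t, cls) whose time lies in [t0 - t_w, t0]."""
    t = features[:, 2]
    return features[(t >= t0 - t_w) & (t <= t0)]


def fit_function(model, data, t0, decay=0.1):
    """Age-weighted sum of distances from model features to same-class data features.

    Assumptions (hypothetical, not from the slides): d_i is the Euclidean distance
    in (x, y, t) and the age weight is exp(-decay * (t0 - t_m,i)).
    """
    D = 0.0
    for x, y, t, cls in model:
        same = data[data[:, 3] == cls]
        if len(same) == 0:
            continue                       # hypothetical policy: skip unmatched features
        d = np.min(np.linalg.norm(same[:, :3] - np.array([x, y, t]), axis=1))
        D += np.exp(-decay * (t0 - t)) * d  # recent features get larger weights
    return D
```

With `decay=0.0` every feature counts equally; larger values make the alignment follow the most recent part of the window more closely.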

Sequence alignment (figure: data features and model features)
At each moment $t_0$, minimize D with respect to X using the standard Gauss-Newton minimization method.
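Gauss-Newton minimises a sum of squared residuals, so applying it requires writing D in that form. The sketch below is a generic Gauss-Newton loop with a forward-difference Jacobian, paired with a hypothetical residual for a reduced state (position and size only) and fixed, known correspondences. This is a simplification: the state in the slides also includes phase and velocity, and the correspondences change as X changes.

```python
import numpy as np


def gauss_newton(residuals, X0, n_iter=20, eps=1e-6, tol=1e-10):
    """Minimise sum(residuals(X)**2) with Gauss-Newton (forward-difference Jacobian)."""
    X = np.asarray(X0, dtype=float)
    for _ in range(n_iter):
        r = residuals(X)
        # Column k of J approximates d r / d X_k.
        J = np.column_stack([(residuals(X + eps * e) - r) / eps for e in np.eye(X.size)])
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)   # least-squares solve of J step = -r
        X = X + step
        if np.linalg.norm(step) < tol:
            break
    return X


def alignment_residuals(X, model_xy, data_xy):
    """Hypothetical residuals for X = (x0, y0, size): scaled and shifted model
    positions minus their (assumed known) corresponding data positions."""
    return (X[2] * model_xy + X[:2] - data_xy).ravel()
```

Because these residuals are linear in X, one Gauss-Newton step already lands on the optimum; with real distances to nearest same-class features the problem is non-linear and several iterations are needed.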

Experiments

Application II: Action recognition
(Figure panels: walking, exercise, running, cycling.)
1. Detect spatio-temporal velocity- and scale-adapted interest points and compute their jet descriptors.
2. Cluster all the descriptors using k-means.
3. Compute the distribution of points over the detected clusters for each sequence separately.
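Step 3 amounts to a normalised histogram of cluster labels per sequence. A minimal sketch, with hypothetical function names and the labels assumed to come from the k-means step:

```python
import numpy as np


def label_histogram(labels, n_clusters):
    """Normalised distribution of a sequence's interest points over cluster labels."""
    counts = np.bincount(np.asarray(labels), minlength=n_clusters).astype(float)
    return counts / counts.sum()


def model_histograms(labels_per_sequence, n_clusters):
    """One histogram per training sequence, keyed by action name."""
    return {name: label_histogram(labels, n_clusters)
            for name, labels in labels_per_sequence.items()}
```

Normalising by the number of points makes histograms from sequences of different lengths comparable.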

Model histograms (related to Leung & Malik, IJCV'01)
(Figure: histograms over cluster id for walking, exercise, running and cycling.)

Test sequences (figure panels: walking, exercise, running, cycling, background)

Classification
1. Detect interest points and classify their jet responses w.r.t. the cluster means.
2. Compute the distribution of cluster labels and classify the sequence as an action if its histogram distance to that action's model histogram is below a decision threshold.
(Figure: confusion matrix over walking, exercise, running and cycling for the test walking, test exercise, test running, test cycling and test background sequences.)
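The classification step can be sketched as nearest-centre labelling, a label histogram, and a comparison with the model histograms under a decision threshold. The $\chi^2$ histogram distance, the nearest-model rule and the `"background"` fallback are assumptions chosen for illustration; the slides only refer to "different histogram-distance measures".

```python
import numpy as np


def assign_labels(descriptors, centres):
    """Label each descriptor with the index of its nearest cluster mean."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def chi2_distance(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalised histograms (an assumed choice)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def classify_sequence(descriptors, centres, models, threshold):
    """Return the closest model action, or "background" if no model is close enough."""
    labels = assign_labels(descriptors, centres)
    counts = np.bincount(labels, minlength=len(centres)).astype(float)
    h = counts / counts.sum()
    name, dist = min(((n, chi2_distance(h, m)) for n, m in models.items()),
                     key=lambda item: item[1])
    return name if dist < threshold else "background"
```

Sweeping `threshold` trades correct classifications against false ones, which is the trade-off an ROC curve plots.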

Classification: ROC curve corresponding to changes of the decision threshold when classifying 37 sequences using different histogram-distance measures (axes: % correct, % false)

Performance comparison (figure curves: velocity- and scale-adapted space-time interest points; non-adapted space-time interest points; spatial interest points)

Back-projection of points (figure panels: test running, test cycling, test walking, test exercise)

Summary
Interest point detection:
- Points with high variation of image values in space-time are detected.
- Direct approach for event detection (no tracking needed).
- Invariant treatment of events at different spatial and temporal scales; invariance w.r.t. camera motion.
Applications:
- Classified space-time features provide a compact representation of video information.
- Interpretation of scenes with complex, non-stationary backgrounds.
Future work: contrast- and orientation-invariant descriptors, large-scale action recognition experiments, integration of multi-local constraints, on-line implementation.