Antón R. Escobedo, CSE 252C. Behavior Recognition via Sparse Spatio-Temporal Features. Piotr Dollár, Vincent Rabaud, Garrison Cottrell, Serge Belongie.


Antón R. Escobedo, CSE 252C. Behavior Recognition via Sparse Spatio-Temporal Features. Piotr Dollár, Vincent Rabaud, Garrison Cottrell, Serge Belongie

Outline
I. Introduction
II. Related Work
III. Algorithm
IV. Experiments
V. Current Work

Part I: Introduction
Motivation: sparse feature points extended to the spatio-temporal case.

Part I: Introduction
Motivation: behavior detection from video sequences. Behavior recognition faces issues similar to those in object recognition: posture, appearance, size, image clutter, and environmental variations such as illumination, as well as the imprecise nature of feature detectors.

Part I: Introduction
Inspiration: sparsely detected features in object recognition.
- Fergus et al., "Object Class Recognition by Unsupervised Scale-Invariant Learning"
- Agarwal et al., "Learning to Detect Objects in Images via a Sparse, Part-Based Representation"
- Leibe and Schiele, "Scale-Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search"

Part I: Introduction
Advantages of sparse features: robustness; very good results. [example figure omitted]

Object recognition pipeline (figure): train on [training data & features] to obtain [parts] and an [object model]; classify a new image → [motorcycle detected].

Spatio-Temporal Features
A short, local video sequence that can be used to describe a behavior. Behavior recognition is based on detected features compared against a rich set of features. The 3rd dimension is temporal, not spatial.

Part I: Introduction
Will show:
- Direct 3D counterparts to 2D feature detectors are inadequate.
- Development and testing of the descriptors used in this paper.
- A dictionary of descriptors is all that is needed to recognize behavior, demonstrated on human activity, facial expression, and mouse behavior datasets.

Part II: Related Work
Articulated models. Efros et al.: the 30-pixel man. Schüldt et al.: spatio-temporal features. Images from: ftp://ftp.nada.kth.se/CVAP/users/laptev/icpr04actions.pdf

Part III: Proposed Algorithm
1. Feature Detection
2. Cuboids
3. Cuboid Prototypes
4. Behavior Descriptors

Feature Detection (spatial domain)
Corner detectors; Laplacian of Gaussian (SIFT). Extensions to the spatio-temporal case: stacks of images denoted I(x, y, t); detected features also have temporal extent.

Feature Detection
Harris in 3D: spatio-temporal corners, i.e., regions where the local gradient vectors point in orthogonal directions along x, y, and t. Why this doesn't work: true spatio-temporal corners are too rare in real video. Instead, develop an alternative detector and err on the side of too many features. Why this works: the later clustering and classification stages tolerate spurious detections.

Feature Detection
Response function: R = (I ∗ g ∗ h_ev)² + (I ∗ g ∗ h_od)², where the spatial filter g is a 2D Gaussian and the temporal filters h_ev, h_od are a quadrature pair of 1D Gabor filters.
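The detector can be sketched in NumPy as below. The response R = (I ∗ g ∗ h_ev)² + (I ∗ g ∗ h_od)² and the ω = 4/τ coupling follow the paper; the kernel radii and the separable pure-NumPy convolution are illustrative choices, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1D Gaussian kernel; applied along y then x for the 2D spatial filter g."""
    radius = radius or int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def gabor_pair(tau):
    """Quadrature pair of 1D temporal Gabor filters (even/odd phase)."""
    omega = 4.0 / tau  # coupling used in the paper
    t = np.arange(-int(2 * tau), int(2 * tau) + 1)
    env = np.exp(-t**2 / tau**2)
    h_ev = -np.cos(2 * np.pi * t * omega) * env
    h_od = -np.sin(2 * np.pi * t * omega) * env
    return h_ev, h_od

def conv_along(vol, k, axis):
    """Convolve a 1D kernel along one axis of a (t, y, x) volume."""
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), axis, vol)

def response(I, sigma=2.0, tau=2.5):
    """Detector response for a stack of grayscale frames I with shape (t, y, x)."""
    g = gaussian_kernel(sigma)
    S = conv_along(conv_along(I.astype(float), g, 1), g, 2)  # spatial smoothing
    h_ev, h_od = gabor_pair(tau)
    return conv_along(S, h_ev, 0)**2 + conv_along(S, h_od, 0)**2
```

A periodic (e.g. flashing or oscillating) patch yields a far stronger response than a static one, matching the intuition on the next slide.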

Feature Detection What this implies:  Any region with spatially distinguishing characteristics undergoing a complex motion will induce a strong response.  Pure translation will not induce a response.

Cuboids
Extracted at each interest point, at roughly 6x the scale at which it was detected. Descriptor (feature vector): transformations applied are normalized pixel values, the brightness gradient, or optical flow; the feature vector is built from local histograms.

Cuboid Descriptor
The flattened gradient vector gave the best results; this is a generalization of the PCA-SIFT descriptor.
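A minimal sketch of the flattened-gradient descriptor with PCA dimensionality reduction in the spirit of PCA-SIFT; the cuboid shape and component count here are arbitrary illustrative values.

```python
import numpy as np

def cuboid_descriptor(cuboid):
    """Flattened-gradient descriptor for one cuboid given as a (t, y, x) array:
    gradients along t, y, x are concatenated into a single vector."""
    gt, gy, gx = np.gradient(cuboid.astype(float))
    return np.concatenate([gt.ravel(), gy.ravel(), gx.ravel()])

def pca_project(X, n_components=100):
    """Project descriptors (rows of X) onto their top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```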

Cuboid Prototypes
An unlimited number of cuboids is possible, but only a limited number of types exist. Use the k-means algorithm to cluster the extracted cuboids by type.
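Clustering descriptors into prototypes can be sketched with a bare-bones k-means (illustrative only; in practice one would use a library implementation, and k is on the order of a few hundred):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means: cluster descriptor rows of X into k prototype centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init (copy)
    for _ in range(n_iter):
        # assign each descriptor to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, labels
```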

Behavior Descriptor
The behavior descriptor is a histogram of cuboid types.
- Simple.
- Distances are measured using the chi-squared distance.
- Can easily be used in a classification framework.
- Discards the spatial layout and temporal order of the cuboids; the assumption is that the cuboid types present capture all information about the behavior.
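The histogram descriptor and chi-squared matching can be sketched as below; the 1-nearest-neighbour classifier is one simple way (an assumed choice here, not prescribed by the slide) to use the distance in a classification framework.

```python
import numpy as np

def behavior_histogram(labels, k):
    """Normalized histogram of cuboid-type occurrences for one clip."""
    h = np.bincount(labels, minlength=k).astype(float)
    return h / max(h.sum(), 1.0)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify_nn(query, train_hists, train_labels):
    """1-nearest-neighbour classification under the chi-squared distance."""
    d = [chi2_distance(query, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]
```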

Spatio-temporal features pipeline (figure): train on [training data & features] to obtain [cuboid prototypes] and a [behavior model]; classify a new clip → [grooming detected].

Domain 1: human activity (clips from Schüldt et al.). Training examples: boxing, clapping; test example: ?

Domain 2: facial expressions. Training examples: disgust, happiness; test example: ?

Domain 3: mouse behavior. Training examples: eating, exploring; test example: ?

Near vs. medium field: near field, medium field, far field (Efros et al. 2003).

performance evaluation
Compared 4 methods:
- CUBOIDS: our approach
- CUBOIDS+HARRIS: our approach using Laptev's 3D corner detector
- ZMI (Zelnik-Manor & Irani 2001): a statistical measure of gross activity; histograms of spatio-temporal gradients give an activity descriptor
- EFROS (Efros et al. 2003): normalized cross-correlation of optical flow gives a distance measure between activities
Analysis is in terms of relative performance; not all algorithms are always applied (data format, computational complexity).

facial expressions I

facial expressions II confusion matrices, row normalized

mouse behavior full database results

mouse behavior pilot study

human activity

parameter settings
- k, 50 < k < 500: number of clusters
- n, 10 ≤ n ≤ 200: number of cuboids per clip
- ω, 0 < ω < 1: overlap allowed between cuboids
- σ, 2 < σ < 9: spatial scale of the detector
Base settings used were approximately k = 250, n = 30, ω = 0.9, and σ = 2.

summary of results
Achieved good performance in all domains (typically 10-20% error), and the best performance of the algorithms tested in all domains. A comparison to domain-specific algorithms remains necessary.

Current Work
Niebles et al., "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," BMVC 2006. Recognizes multiple activities in a single video sequence. Using the same interest point detector, cluster cuboids into a set of video codewords, then use a pLSA graphical model to determine probability distributions. Reports % accuracy vs. 81.17% for Dollár et al.; however, learning is unsupervised for Niebles et al.

Questions? Acknowledgements: Piotr Dollár