Flow-Based Action Recognition. Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001); Recognizing Action at a Distance (Efros et al. 2003).

What is an Action? Action: atomic motion(s) that can be unambiguously distinguished and usually have a semantic association (e.g., sitting down, running). Activity: several actions performed in succession (e.g., dining, meeting a person). Event: a combination of activities (e.g., a football match, a traffic accident).

Action Recognition
Previously:
- action recognition was treated as part of the articulated tracking problem,
- or as part of a generalized tracking problem aimed at directly detecting activities/events.
Novelty:
- direct recognition of short-duration motion segments
- new feature descriptors: motion history images, motion energy images, Efros' features

Part 1: The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001)

Motivation

Goal: an action is motion over time. Create a view-specific representation of the action, and construct a vector image suitable for matching against other instances of the action.

Motion Energy Images: D(x,y,t) is a binary image sequence indicating motion locations.

Motion Energy Images
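
A minimal sketch of the MEI, assuming D(x,y,t) comes from thresholded absolute frame differencing (one common choice; the slides do not say how D is obtained) and that the MEI over a window of τ frames is the union E_τ(x,y,t) = ⋃_{i=0}^{τ−1} D(x,y,t−i), as defined by Bobick and Davis. Frames are plain nested lists of grayscale values; `threshold` and `tau` are hypothetical parameter names.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame as rows of pixel values


def motion_mask(prev: Frame, curr: Frame, threshold: int) -> Frame:
    """D(x, y, t): 1 where |I_t - I_(t-1)| exceeds the threshold, else 0."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_energy_image(frames: List[Frame], t: int, tau: int, threshold: int) -> Frame:
    """E_tau(x, y, t): pixelwise OR of D(x, y, t - i) for i = 0 .. tau - 1."""
    height, width = len(frames[0]), len(frames[0][0])
    mei = [[0] * width for _ in range(height)]
    for i in range(tau):
        k = t - i
        if k < 1:  # D(x, y, 0) is undefined: there is no earlier frame to difference
            break
        d = motion_mask(frames[k - 1], frames[k], threshold)
        for y in range(height):
            for x in range(width):
                mei[y][x] |= d[y][x]
    return mei
```

For a single bright pixel that moves one column to the right per frame, `motion_energy_image(frames, t=2, tau=2, threshold=30)` marks the three pixels it has occupied during the window.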

Motion History Images. Descriptor: build a 2-component vector image by combining the MEI and the MHI.
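
Bobick and Davis define the MHI recursively: H_τ(x,y,t) = τ where D(x,y,t) = 1, and max(0, H_τ(x,y,t−1) − 1) otherwise, so the most recently moving pixels are the brightest; thresholding the MHI above zero recovers the MEI. A minimal sketch, assuming the binary masks D are already computed and stored as nested lists, with an integer τ and no rescaling of the MHI:

```python
from typing import List, Tuple

Mask = List[List[int]]


def update_mhi(mhi: List[List[int]], d: Mask, tau: int) -> List[List[int]]:
    """One step of H_tau(x, y, t): tau where D = 1, otherwise decay by 1 down to 0."""
    return [
        [tau if dv else max(0, hv - 1) for hv, dv in zip(h_row, d_row)]
        for h_row, d_row in zip(mhi, d)
    ]


def mei_mhi_descriptor(masks: List[Mask], tau: int) -> Tuple[Mask, List[List[int]]]:
    """Run the MHI recursion over a sequence of D masks; return the pair (MEI, MHI).

    The MEI is obtained by thresholding the final MHI above zero.
    """
    height, width = len(masks[0]), len(masks[0][0])
    mhi = [[0] * width for _ in range(height)]
    for d in masks:
        mhi = update_mhi(mhi, d, tau)
    mei = [[1 if h > 0 else 0 for h in row] for row in mhi]
    return mei, mhi
```

The returned pair is the 2-component vector image: the MEI says where motion happened, and the MHI says how recently.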

Matching: compute the 7 Hu moments of the template. Model the 7 moments of each action class with a Gaussian distribution (diagonal covariance). Given a new action instance, measure the Mahalanobis distance to all classes and pick the nearest one.
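
A sketch of this matching step, assuming each training instance has already been reduced to a moment vector (7 Hu moments on the slides; 2-D toy vectors in the example) and that each class is summarized by a per-dimension mean and variance, i.e. a diagonal covariance. `eps` is a hypothetical floor that keeps a zero variance from causing a division by zero.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (mean, variance) per dimension


def fit_diagonal_gaussian(samples: List[Vector], eps: float = 1e-12) -> Model:
    """Per-dimension mean and (population) variance of one action class."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dims)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / n, eps) for i in range(dims)]
    return mean, var


def mahalanobis_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Mahalanobis distance under a diagonal covariance matrix."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Return the action class whose Gaussian is nearest in Mahalanobis distance."""
    return min(models, key=lambda label: mahalanobis_diag(x, *models[label]))
```

With one class fitted on points around (1, 1) and another around (11, 11), a query at (1.5, 0.5) is assigned to the first class. Dividing each squared difference by its variance is what lets moments of very different magnitudes contribute comparably.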

Image Moments: translation-invariant moments
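
The moment formulas on these slides were images and did not survive extraction. For reference, the standard definitions are below, with I(x,y) the template (MEI or MHI); measuring positions relative to the centroid (x̄, ȳ) makes the central moments μ_pq invariant to translation.

```latex
\begin{aligned}
M_{pq} &= \sum_{x}\sum_{y} x^{p} y^{q}\, I(x,y), \qquad
\bar{x} = \frac{M_{10}}{M_{00}}, \quad \bar{y} = \frac{M_{01}}{M_{00}} \\
\mu_{pq} &= \sum_{x}\sum_{y} (x-\bar{x})^{p}\, (y-\bar{y})^{q}\, I(x,y)
\end{aligned}
```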

Scale-invariant moments; the 7 Hu moments
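
Dividing by a power of μ_00 gives scale invariance, η_pq = μ_pq / μ_00^(1+(p+q)/2) for p + q ≥ 2, and Hu's seven invariants combine the second- and third-order η_pq into quantities that are also invariant to rotation. A pure-Python sketch of these standard formulas, assuming the template is given as nested lists of non-negative values with nonzero total mass:

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[float]]


def normalized_moments(img: Image) -> Dict[Tuple[int, int], float]:
    """eta_pq = mu_pq / mu_00^(1 + (p + q) / 2) for the orders Hu's invariants use."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)  # equals mu_00
    xbar = sum(x * v for x, _, v in pixels) / m00
    ybar = sum(y * v for _, y, v in pixels) / m00

    def eta(p: int, q: int) -> float:
        mu = sum((x - xbar) ** p * (y - ybar) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    orders = [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3), (2, 1), (1, 2)]
    return {pq: eta(*pq) for pq in orders}


def hu_moments(img: Image) -> List[float]:
    """Hu's seven moment invariants phi_1 .. phi_7."""
    n = normalized_moments(img)
    n20, n02, n11 = n[2, 0], n[0, 2], n[1, 1]
    n30, n03, n21, n12 = n[3, 0], n[0, 3], n[2, 1], n[1, 2]
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

For a filled square, φ2 through φ7 are zero, and shifting any template inside the frame leaves all seven values unchanged up to floating-point rounding.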

Results (single camera): only the left (30°) camera is used as input, matched against all 7 views of all 18 moves (126 templates in total). Metric: a pooled independent Mahalanobis distance using a diagonal covariance matrix to accommodate variations in the magnitude of the moments.

Results (two cameras): the score is the minimum sum of Mahalanobis distances between the two input templates and two stored views of an action that have the correct angular difference between them (here 90°). Assumption: the approximate angular relationship between the cameras is known.
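
A sketch of the two-camera rule, assuming each input template has already been scored (Mahalanobis distance) against every stored view of every action, with views keyed by a hypothetical viewing angle in degrees. For each action, the score is the minimum of d_A(v) + d_B(v + Δ) over stored view pairs separated by the known offset Δ; the action with the smallest score wins.

```python
from typing import Dict, Tuple

# Hypothetical layout: distances[action][view_angle_deg] = Mahalanobis distance
# between one input template and the stored template for that action and view.
Distances = Dict[str, Dict[int, float]]


def two_camera_score(d_a: Dict[int, float], d_b: Dict[int, float], delta: int) -> float:
    """Min over stored view pairs (v, v + delta) of d_a[v] + d_b[v + delta]."""
    pair_sums = [d_a[v] + d_b[v + delta] for v in d_a if v + delta in d_b]
    return min(pair_sums) if pair_sums else float("inf")


def classify_two_cameras(dist_a: Distances, dist_b: Distances, delta: int = 90) -> Tuple[str, float]:
    """Return the action (and its score) with the smallest two-camera score."""
    scores = {a: two_camera_score(dist_a[a], dist_b[a], delta) for a in dist_a}
    best = min(scores, key=lambda a: scores[a])
    return best, scores[best]
```

Requiring the view pair to match the camera offset is what helps here: an action can look best from one camera alone and still lose once both views must agree.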

Part 2: Recognizing Action at a Distance (Efros et al. 2003)

The Goal: recognize medium-field human actions, where human figures are only tens of pixels tall and the video is noisy.

System Flow
1. Track and stabilize the human figure (simple normalized-correlation-based tracker).
2. Compute pixelwise optical flow on the stabilized space-time volume.
3. Build the descriptor (more on this later).
4. Find the nearest neighbor (NN).
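
Step 1 relies on a normalized-correlation tracker. A minimal sketch, assuming grayscale frames as nested lists and an exhaustive search within a hypothetical `radius` around the previous position: the template is compared with every candidate window using zero-mean normalized cross-correlation, and the best-scoring window becomes the new position. Stabilization then amounts to cropping that window from each frame.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]


def crop(img: Image, top: int, left: int, h: int, w: int) -> List[List[float]]:
    """Cut an h-by-w window whose top-left corner is (top, left)."""
    return [list(row[left:left + w]) for row in img[top:top + h]]


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    va = [v for row in a for v in row]
    vb = [v for row in b for v in row]
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    da = [v - ma for v in va]
    db = [v - mb for v in vb]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def track(frame: Image, template: Image, prev: Tuple[int, int], radius: int) -> Tuple[int, int]:
    """Return the (top, left) within `radius` of prev whose window best matches template."""
    h, w = len(template), len(template[0])
    best, best_score = prev, -math.inf
    for top in range(prev[0] - radius, prev[0] + radius + 1):
        for left in range(prev[1] - radius, prev[1] + radius + 1):
            if top < 0 or left < 0 or top + h > len(frame) or left + w > len(frame[0]):
                continue  # candidate window falls outside the frame
            score = ncc(template, crop(frame, top, left, h, w))
            if score > best_score:
                best, best_score = (top, left), score
    return best
```

Subtracting the means and dividing by the norms makes the score insensitive to overall brightness and contrast changes, which matters for noisy, low-resolution figures.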

Descriptor: what are good features for motion? Candidates are pixel values, spatial image gradients and temporal gradients. Problems: these are appearance dependent and carry no information about the direction of motion. Pixel-wise optical flow captures motion independently of appearance.

Descriptor: the key idea is that the channels must be sparse and non-negative.
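
In Efros et al., the optical flow field is split into its horizontal and vertical components F_x and F_y, and each is half-wave rectified into a positive and a negative part, giving four non-negative channels F_x+, F_x−, F_y+, F_y− that are then blurred. Rectification is also what makes the channels sparse: each pixel is nonzero in at most one channel per component. The sketch below uses a hypothetical stand-in for the blur (a separable [1, 2, 1]/4 binomial kernel, a cheap Gaussian approximation, with edge replication) and omits any further normalization.

```python
from typing import Dict, List

Field = List[List[float]]


def half_wave(field: Field) -> Dict[str, Field]:
    """Split a signed scalar field into positive and negative parts, both >= 0."""
    pos = [[max(v, 0.0) for v in row] for row in field]
    neg = [[max(-v, 0.0) for v in row] for row in field]
    return {"+": pos, "-": neg}


def blur_1d(values: List[float]) -> List[float]:
    """[1, 2, 1] / 4 smoothing with edge replication."""
    n = len(values)
    return [
        (values[max(i - 1, 0)] + 2 * values[i] + values[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def blur(field: Field) -> Field:
    """Separable blur: smooth each row, then each column."""
    rows = [blur_1d(row) for row in field]
    cols = [blur_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def motion_channels(fx: Field, fy: Field) -> Dict[str, Field]:
    """Four sparse, non-negative, blurred channels: Fx+, Fx-, Fy+, Fy-."""
    channels = {}
    for name, comp in (("Fx", fx), ("Fy", fy)):
        for sign, part in half_wave(comp).items():
            channels[name + sign] = blur(part)
    return channels
```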

Similarity. T: temporal extent (motion length); I: frame (spatial extent); c: channel index; a, b: motion descriptors of two different sequences.

Similarity
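
The similarity formula itself did not survive extraction. With the symbols defined above, the similarity between frame i of sequence a and frame j of sequence b used by Efros et al. can be written as follows (four channels, as in the rectified-flow descriptor): a correlation of the two descriptors summed over the temporal window T, the channels c and the pixels of I.

```latex
S(i, j) = \sum_{t \in T} \sum_{c=1}^{4} \sum_{(x, y) \in I} a^{c}_{i+t}(x, y)\, b^{c}_{j+t}(x, y)
```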

Classification: construct the similarity matrix as outlined and convolve it with the temporal kernel. For each frame of the novel sequence, the maximum score in the corresponding row of this matrix indicates the best match to the motion descriptor centered at that frame. Classify the frame with a k-nearest-neighbor classifier: find the k best matches in the labeled data and take the majority label.
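
A sketch of the classification step under stated assumptions: each frame's descriptor is a flattened list of channel values, frame-to-frame similarity is a plain dot product, the temporal kernel is an unblurred identity of length 2·half_window + 1 (playing the role of T), offsets past either end of a sequence are skipped, and `half_window`, `k` and the toy labels are hypothetical. Each novel frame takes the majority label of its k best-scoring labeled frames.

```python
from collections import Counter
from typing import List, Sequence

Descriptor = Sequence[float]  # one frame's descriptor channels, flattened


def frame_similarity(a: List[Descriptor], b: List[Descriptor]) -> List[List[float]]:
    """s[i][j]: dot product between frame i of sequence a and frame j of sequence b."""
    return [[sum(x * y for x, y in zip(ai, bj)) for bj in b] for ai in a]


def temporal_aggregate(s: List[List[float]], half_window: int) -> List[List[float]]:
    """Convolve s with a (2*half_window+1)-long identity kernel, i.e. sum along diagonals."""
    n, m = len(s), len(s[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for t in range(-half_window, half_window + 1):
                if 0 <= i + t < n and 0 <= j + t < m:  # skip offsets past either end
                    out[i][j] += s[i + t][j + t]
    return out


def knn_labels(novel: List[Descriptor], labeled: List[Descriptor], labels: List[str],
               k: int, half_window: int) -> List[str]:
    """Label each novel frame by majority vote over its k best-matching labeled frames."""
    scores = temporal_aggregate(frame_similarity(novel, labeled), half_window)
    result = []
    for row in scores:
        best = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        result.append(Counter(labels[j] for j in best).most_common(1)[0][0])
    return result
```

Summing along diagonals rewards labeled segments whose motion matches over several consecutive frames, not just at a single instant.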

Results (datasets)
- Ballet (16 classes): clips of motions from an instructional video; professional dancers, two men and two women, performing mostly standard ballet moves.
- Tennis (6 classes): two amateur tennis players outdoors (one player for testing, one for training); each player was videotaped on different days in different locations with slightly different camera positions; players are about 50 pixels tall.
- Football (8 classes): several minutes of a World Cup football game from an NTSC videotape; wide-angle view of the playing field; substantial camera motion and zoom; about 30-by-30 noisy pixels per human figure.

Results: values on the diagonals of the confusion matrices. Ballet (k=5, T=51): [ ]; Tennis (k=5, T=7): [ ]; Football (k=1, T=13): [ ]

Do As I Do Synthesis: given a “target” actor database T and a “driver” actor sequence D, the goal is to create a synthetic sequence S that contains the actor from T performing the actions described by D.
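
The simplest possible version of this idea is per-frame retrieval: for every driver frame in D, pick the frame of the target database T whose motion descriptor is most similar, and play those target frames in order to form S. The sketch assumes dot-product similarity between precomputed descriptors and ignores the appearance and temporal-smoothness considerations a usable synthesis would also need; it is not the paper's full method.

```python
from typing import List, Sequence

Descriptor = Sequence[float]  # precomputed motion descriptor of one frame


def do_as_i_do(driver: List[Descriptor], target_db: List[Descriptor]) -> List[int]:
    """For each driver frame, the index of the target-actor frame with the most similar motion."""
    def sim(a: Descriptor, b: Descriptor) -> float:
        return sum(x * y for x, y in zip(a, b))

    return [max(range(len(target_db)), key=lambda j: sim(d, target_db[j])) for d in driver]
```

The returned indices define S: frame i of the output shows target frame `do_as_i_do(...)[i]`.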

Extensions to MHI: Alper Yilmaz and Mubarak Shah, “Actions sketch: a novel action representation,” Computer Vision and Pattern Recognition (CVPR). Yan Ke, Rahul Sukthankar, Martial Hebert, “Volumetric Features for Event Recognition in Video,” ICCV 2007.