Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley.

ICCV 2003, UC Berkeley Computer Vision Group
Presentation transcript:

Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley

Looking at People
Far field: 3-pixel man
  Blob tracking
  – vast surveillance literature
Near field: 300-pixel man
  Limb tracking
  – e.g. Yacoob & Black, Rao & Shah, etc.

Medium-field Recognition
The 30-Pixel Man

Appearance vs. Motion
[Image: Jackson Pollock, Number 21 (detail)]

Goals
Recognize human actions at a distance
– Low resolution, noisy data
– Moving camera, occlusions
– Wide range of actions (including non-periodic)

Our Approach
Motion-based approach
– Non-parametric; use large amount of data
– Classify a novel motion by finding the most similar motion from the training set
Related Work
– Periodicity analysis
  Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
– Model-free
  Temporal Templates [Bobick & Davis]
  Orientation histograms [Freeman et al.; Zelnik & Irani]
  Using MoCap data [Zhao & Nevatia; Ramanan & Forsyth]
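The non-parametric idea above can be sketched in a few lines. This is only a minimal illustration, assuming the similarities between a novel motion and every stored motion have already been computed; the function name `classify_nearest` and the sample numbers are hypothetical, not taken from the slides.

```python
def classify_nearest(similarities, labels):
    """Return the label of the stored motion most similar to the novel one.

    similarities[j] is an assumed, precomputed similarity between the novel
    motion and training item j; labels[j] is that item's action label.
    """
    best = max(range(len(similarities)), key=lambda j: similarities[j])
    return labels[best]


# Hypothetical similarities between one novel frame and five stored frames.
labels = ["run", "walk left", "swing", "walk right", "jog"]
sims = [0.12, 0.80, 0.05, 0.31, 0.44]
print(classify_nearest(sims, labels))  # walk left
```

A k-nearest-neighbor vote would be a natural variant, but the slide only describes taking the single most similar motion.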

Gathering action data
Tracking
– Simple correlation-based tracker
– User-initialized
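The slide does not say which correlation measure the tracker uses. As a rough sketch only, the code below assumes normalized cross-correlation against a user-supplied template, searched in a small window around the previous position; `ncc`, `track`, and the `radius` parameter are illustrative names.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches score 0


def track(frame, template, prev, radius):
    """Return the top-left corner near `prev` that best matches the template."""
    th, tw = len(template), len(template[0])
    row_lo, row_hi = max(0, prev[0] - radius), min(len(frame) - th, prev[0] + radius)
    col_lo, col_hi = max(0, prev[1] - radius), min(len(frame[0]) - tw, prev[1] + radius)
    best_score, best_pos = -2.0, prev
    for r in range(row_lo, row_hi + 1):
        for c in range(col_lo, col_hi + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos
```

Here `prev` for the first frame would come from the user's initial click, which matches the "user-initialized" bullet.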

Figure-centric Representation
Stabilized spatio-temporal volume
– No translation information
– All motion caused by person’s limbs
Good news: indifferent to camera motion
Bad news: hard!
Good test to see if actions, not just translation, are being captured

Remembrance of Things Past
“Explain” novel motion sequence by matching to previously seen video clips
– For each frame, match based on some temporal extent
Challenge: how to compare motions?
[Diagram: input sequence → motion analysis → database of labeled clips (run, walk left, swing, walk right, jog)]

How to describe motion?
Appearance
– Not preserved across different clothing
Gradients (spatial, temporal)
– Same problem (e.g. contrast reversal)
Edges/Silhouettes
– Too unreliable
Optical flow
– Explicitly encodes motion
– Least affected by appearance
– …but too noisy

Spatial Motion Descriptor
[Figure: image frame → optical flow → blurred optical flow]
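The transcript keeps only the figure labels. As a sketch under assumptions drawn from the accompanying ICCV 2003 paper rather than from this slide, the flow components can be half-wave rectified into four non-negative channels and then blurred. The 3x3 box blur below is a stand-in for a Gaussian, and all function names are illustrative.

```python
def half_wave_channels(fx, fy):
    """Split flow components into four non-negative channels: Fx+, Fx-, Fy+, Fy-."""
    def pos(ch):
        return [[max(v, 0.0) for v in row] for row in ch]

    def neg(ch):
        return [[max(-v, 0.0) for v in row] for row in ch]

    return pos(fx), neg(fx), pos(fy), neg(fy)


def box_blur(ch):
    """3x3 mean blur with clamped borders (assumed stand-in for a Gaussian blur)."""
    h, w = len(ch), len(ch[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += ch[rr][cc]
            out[r][c] = total / 9.0
    return out


def spatial_descriptor(fx, fy):
    """Blurred, rectified flow channels for one stabilized frame."""
    return [box_blur(ch) for ch in half_wave_channels(fx, fy)]
```

Rectifying before blurring keeps opposite motions from cancelling each other out, which is one way to treat the flow as noisy features rather than as exact pixel displacements.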

Spatio-temporal Motion Descriptor
[Figure: Sequence A and Sequence B are compared over a temporal extent E. The frame-to-frame similarity matrix (A against B) is combined with an E × E blurry I matrix to give the motion-to-motion similarity matrix.]
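One way to read the figure is that each motion-to-motion score sums frame-to-frame scores along a diagonal of the similarity matrix, so that two frames match well only if their temporal neighbors match too. The sketch below assumes exactly that, using a plain (unblurred) diagonal; the blurry I matrix in the slide would additionally give nearby diagonals some weight. `motion_similarity` and `weights` are illustrative names.

```python
def motion_similarity(frame_sim, weights):
    """Aggregate frame-to-frame similarities over a temporal window.

    frame_sim[i][j] : similarity of frame i in sequence A to frame j in B.
    weights         : odd-length window centred on the current frame pair;
                      its length plays the role of the temporal extent E.
    """
    half = len(weights) // 2
    n, m = len(frame_sim), len(frame_sim[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for k, w in enumerate(weights):
                ii, jj = i + k - half, j + k - half
                if 0 <= ii < n and 0 <= jj < m:  # skip offsets outside either clip
                    total += w * frame_sim[ii][jj]
            out[i][j] = total
    return out
```

With a 13-frame descriptor, for example, `weights` would have 13 entries.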

Football Actions: matching
[Video: input sequence alongside matched frames]

Football Actions: classification
10 actions; 4500 total frames; 13-frame motion descriptor

Classifying Ballet Actions
16 actions; total frames; 51-frame motion descriptor.
Men used to classify women and vice versa.

Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor
Woman player used as training, man as testing.

Classifying Tennis
Red bars show classification results

Querying the Database
[Diagram: an input sequence is matched against the database (run, walk left, swing, walk right, jog); the matches return action recognition labels and joint positions.]

2D Skeleton Transfer
We annotate database with 2D joint positions
After matching, transfer data to novel sequence
– Adjust the match for best fit
[Video: input sequence and transferred 2D skeletons]

3D Skeleton Transfer
We populate database with rendered stick figures from 3D Motion Capture data
Matching as before, we get 3D joint positions (kind of)!
[Video: input sequence and transferred 3D skeletons]

“Do as I Do” Motion Synthesis
Matching two things:
– Motion similarity across sequences
– Appearance similarity within sequence (like VideoTextures)
Dynamic Programming
[Diagram: input sequence → synthetic sequence]

“Do as I Do”
[Video: source motion, source appearance, result]
3400 Frames

“Do as I Say” Synthesis
Synthesize given action labels
– e.g. video game control
[Diagram: action labels (run, walk left, swing, walk right, jog) → synthetic sequence]

“Do as I Say”
Red box shows when constraint is applied

Actor Replacement SHOW VIDEO

Conclusions
In medium field, action is about motion
What we propose:
– A way of matching motions at coarse scale
What we get out:
– Action recognition
– Skeleton transfer
– Synthesis: “Do as I Do” & “Do as I Say”
What we learned:
– A lot to be said for the “little guy”!

Thank You

Smoothness for Synthesis
[symbol missing] is action similarity between source and target
[symbol missing] is appearance similarity within target frames
For every source frame i, find best target frame by maximizing the following cost function:
[equation missing]
Optimize using dynamic programming
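The slide's cost function did not survive in this transcript, so the sketch below assumes a simple form: the sum of action similarities for the chosen target frames plus a weighted sum of appearance similarities between consecutive chosen frames. The names `act`, `app`, `alpha`, and `synthesize` are hypothetical, and the objective is an assumption rather than the authors' exact formula. It is optimized with a standard Viterbi-style dynamic program.

```python
def synthesize(act, app, alpha=1.0):
    """Pick one target frame per source frame by dynamic programming.

    act[i][t] : assumed action similarity between source frame i and target frame t.
    app[s][t] : assumed appearance similarity when target frame s is followed by t.
    Maximizes sum_i act[i][p[i]] + alpha * sum_i app[p[i]][p[i+1]] over paths p.
    """
    n, m = len(act), len(act[0])
    score = list(act[0])  # best score of any path ending at each target frame
    back = []             # back[i-1][t] = best predecessor of t at step i
    for i in range(1, n):
        new_score, pointers = [], []
        for t in range(m):
            s_best = max(range(m), key=lambda s: score[s] + alpha * app[s][t])
            new_score.append(score[s_best] + alpha * app[s_best][t] + act[i][t])
            pointers.append(s_best)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final frame.
    t = max(range(m), key=lambda s: score[s])
    path = [t]
    for pointers in reversed(back):
        t = pointers[t]
        path.append(t)
    return path[::-1]
```

The cost is O(n·m²) for n source frames and m target frames; raising `alpha` trades motion fidelity for smoother-looking output.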

The Database Analogy

Conclusions
Action is about motion
Purely motion-based descriptor for actions
We treat optical flow
– Not as measurement of pixel displacement
– But as a set of noisy features that are carefully smoothed and aggregated
Can handle very poor, noisy data

Cool Video, Attempt II

Comparing motion descriptors
[Figure: over time t, the frame-to-frame similarity matrix is combined with a blurry I matrix to give the motion-to-motion similarity matrix.]