Recognizing Human Figures and Actions Greg Mori Simon Fraser University
Goal Action recognition –Where are the people? –What are they doing? Applications –Image understanding, image retrieval and search –HCI –Surveillance –Computer Graphics
Far field: 3-pixel man – blob tracking. Medium field: 30-pixel man – coarse-level actions. Near field: 300-pixel man – find and track limbs.
Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application
Appearance vs. Motion Jackson Pollock Number 21 (detail)
Action Recognition Recognize human actions at a distance –Low resolution, noisy data –Moving camera, occlusions –Wide range of actions (including non-periodic)
Our Approach Motion-based approach –Classify a novel motion by finding the most similar motion from the training set –Use large amounts of data (“non-parametric”) Related Work –Periodicity analysis Polana & Nelson; Seitz & Dyer; Bobick et al; Cutler & Davis; Collins et al. –Model-free Temporal Templates [Bobick & Davis] Orientation histograms [Freeman et al; Zelnik & Irani] Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]
Gathering action data Tracking –Simple correlation-based tracker
Figure-centric Representation Stabilized spatio-temporal volume –No translation information –All motion caused by person’s limbs Good news: indifferent to camera motion Bad news: hard! Good test to see if actions, not just translation, are being captured
Remembrance of Things Past “Explain” a novel motion sequence by matching to previously seen video clips –For each frame, match based on some temporal extent Challenge: how to compare motions? (Figure: input sequence matched against a database of labeled clips – run, walk left, swing, walk right, jog)
How to describe motion? Appearance –Not preserved across different clothing Gradients (spatial, temporal) –Same problem (e.g. contrast reversal) Edges –Unreliable at this scale Optical flow –Explicitly encodes motion –Least affected by appearance –…but too noisy
Spatial Motion Descriptor Image frame → optical flow F = (Fx, Fy) → half-wave rectify into four non-negative channels Fx+, Fx−, Fy+, Fy− → blur each channel
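The descriptor pipeline above can be sketched in a few lines; `motion_descriptor` is a hypothetical name, and the blur scale `sigma` is an assumed parameter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_descriptor(fx, fy, sigma=1.5):
    """Split an optical flow field (fx, fy) into four half-wave
    rectified channels (Fx+, Fx-, Fy+, Fy-) and Gaussian-blur each,
    as in the spatial motion descriptor."""
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return [gaussian_filter(c, sigma) for c in channels]
```

Because blurring is linear, the blurred positive and negative channels still reconstruct the blurred flow: blur(Fx+) − blur(Fx−) = blur(Fx).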
Spatio-temporal Motion Descriptor Compute a frame-to-frame similarity matrix between sequences A and B, then convolve it with a w×w identity matrix (blurred) to aggregate over a temporal window w, yielding a motion-to-motion similarity matrix
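The temporal aggregation step can be sketched as follows; `motion_similarity` is a hypothetical name, and a plain (unblurred) identity kernel is used for simplicity where the talk uses a blurred one:

```python
import numpy as np
from scipy.ndimage import convolve

def motion_similarity(frame_sim, w=5):
    """Aggregate a frame-to-frame similarity matrix over a temporal
    window w by convolving with a w x w identity kernel, so entry
    (i, j) sums similarities along the diagonal, i.e. over aligned
    runs of w frames."""
    kernel = np.eye(w)
    return convolve(frame_sim, kernel, mode='constant')
```

A blurred identity kernel would additionally tolerate small temporal misalignments between the two sequences.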
Soccer Real actions, moving camera, poor video 8 classes of actions 4500 frames of labeled data 1-nearest-neighbor classifier
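A minimal 1-nearest-neighbour classifier over such descriptors might look like this (Euclidean distance is an assumption here; the talk's similarity measure is the blurred-flow correlation, and `classify_1nn` is a hypothetical name):

```python
import numpy as np

def classify_1nn(query_desc, train_descs, train_labels):
    """Label a query motion descriptor with the label of the most
    similar descriptor in the labeled training set."""
    dists = [np.linalg.norm(query_desc - d) for d in train_descs]
    return train_labels[int(np.argmin(dists))]
```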
Classifying Ballet Actions 16 Actions; total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
Classifying Tennis Actions 6 actions; 4600 frames; 7-frame motion descriptor Woman player used as training, man as testing.
Classifying Tennis Red bars show classification results
Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application
Human Figures in Still Images Detection of humans is possible for stereotypical poses –Standing –Walking –(Viola et al., Poggio et al.) But we want to do more –Wider variety of poses –Localize joint positions
Problem
Shape Matching For Finding People Database of Exemplars
Shape Contexts Deformable template approach –Shapes represented as a collection of edge points Two stages –Fast pruning Quick tests to construct a shortlist of candidate objects Database of known objects could be large –Detailed matching Perform computationally expensive comparisons on only the few shapes in the shortlist Publications –Mori et al., CVPR 2001 –Mori and Malik, CVPR 2003 Featured in New York Times Science section
Results: Tracking by Repeated Finding
Multiple Exemplars Parts-based approach –Use a combination of keypoints or limbs from different exemplars –Reduces the number of exemplars needed Compute a matching cost for each limb from every exemplar Compute pairwise “consistency” costs for neighbouring limbs Use dynamic programming to find best K configurations
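The dynamic program can be sketched over a simple chain of limbs; the actual body model and the extension to the best K configurations are more involved, and `best_configuration` and the cost arrays are hypothetical names:

```python
import numpy as np

def best_configuration(unary, pairwise):
    """Chain DP (Viterbi) over limbs. unary[t][k] is the matching cost
    of candidate k for limb t; pairwise[t][j][k] is the consistency
    cost between candidate j for limb t and candidate k for limb t+1.
    Returns (total cost, chosen candidate index per limb)."""
    T = len(unary)
    cost = np.array(unary[0], dtype=float)
    back = []
    for t in range(1, T):
        # cost of arriving at each candidate of limb t from each
        # candidate of limb t-1
        trans = cost[:, None] + np.asarray(pairwise[t - 1])
        back.append(np.argmin(trans, axis=0))
        cost = trans.min(axis=0) + np.asarray(unary[t])
    path = [int(np.argmin(cost))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return float(cost.min()), path[::-1]
```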
Combining Exemplars
Finding People (II): Parts-based Approach Bottom-up Segmentation as preprocessing Detect half-limbs and torsos Assemble partial configurations –Prune using global constraints Extend partial configurations to full human figures
Segmentation for Recognition Window-scanning (e.g. face detection) –O(N·M·S) Segmentation into superpixels and segments –Support masks for computation of features –Efficiency –Scalability –600K pixels → 300 superpixels → 50 segments –O(N) + O(log M)
Limb/Torso Detectors Learn limb and torso detectors from hand-labeled data Cues: –Contour Average edge strength on boundary –Shape Similarity to rectangle –Shading x,y gradients, blurred –Focus Ratio of high to low frequency energies
Assembling Partial Configurations Combinatorial search over sets of limbs and torsos –Configurations of 3 half-limbs plus a torso Prune using global constraints –Proximity –Relative widths –Maximum lengths –Symmetry in colour Complete half-limbs –2- or 3-limbed people Sort partial configurations –Use limb, torso, and segmentation scores Extend final limbs of best configurations
Results
Rank 3
Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application
“Do as I Do” Motion Synthesis Matching two things: –Motion similarity across sequences –Appearance similarity within sequence Dynamic Programming input sequence synthetic sequence
Smoothness for Synthesis Let W(i, j) be the motion similarity between input frame i and target frame j, and S(j, j′) the appearance similarity between target frames j and j′. For input frames {i}, find the best target frames {ψ(i)} by maximizing the cost function Σ_i [ W(i, ψ(i)) + α S(ψ(i−1), ψ(i)) ] Optimize using dynamic programming –N frames in input sequence –M target frames in database
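One way to sketch the optimization, assuming an N×M matrix `W[i, j]` of input-to-target motion similarity, an M×M matrix `S[j, j2]` of target-to-target appearance similarity, and a weighting term `alpha` (all names are assumed, and the exact weighting in the talk may differ):

```python
import numpy as np

def synthesize(W, S, alpha=1.0):
    """Pick one target frame per input frame by dynamic programming,
    maximizing summed motion similarity W[i, j] plus alpha times the
    appearance smoothness S[j_prev, j] between consecutive picks."""
    N, M = W.shape
    score = W[0].copy()
    back = np.zeros((N, M), dtype=int)
    for i in range(1, N):
        # score of reaching target j at step i from target j_prev
        trans = score[:, None] + alpha * S
        back[i] = np.argmax(trans, axis=0)
        score = trans.max(axis=0) + W[i]
    path = [int(np.argmax(score))]
    for i in range(N - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]
```

With alpha = 0 this degenerates to picking the most similar target frame independently per input frame; larger alpha trades per-frame similarity for smoother transitions.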
“Do as I Do” Synthesis Target Frames Input Sequence Result 3400 Frames
“Do as I Say” Synthesis Synthesize given action labels –e.g. video game control (Figure: action labels – run, walk left, swing, walk right, jog – select clips from the labeled database to produce a synthetic sequence)
“Do as I Say” Red box shows when constraint is applied
Putting It All Together Can we do a better job of splicing clips together? Morph between Frame 9 and Frame 10 to synthesize an intermediate Frame 9½ YES… if we can find the joints!
Morphed Transitions
8 Transitions
Morphed Transitions
3 Transitions
Actor Replacement Rendering new character into existing footage Algorithm –Track original character –Find matches from new character –Erase original character –Render in new character Need to worry about occlusions
(Video demonstration)
Future Directions Much remains to be done! Action Recognition –Using joint positions, shape: the “morpho-kinetics” of action recognition –Better models of activities Detecting and localizing figures –Combining top-down exemplar methods with bottom-up segmentation methods –Exploiting temporal cues
Acknowledgements References –Mori, Belongie, and Malik, “Shape Contexts Enable Efficient Retrieval of Similar Shapes”, CVPR 2001 –Mori and Malik, “Estimating Human Body Configurations using Shape Context Matching”, ECCV 2002 –Efros, Berg, Mori, and Malik, “Recognizing Action at a Distance”, ICCV 2003 –Mori and Malik, “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA”, CVPR 2003 –Mori, Ren, Efros, and Malik, “Recovering Human Body Configurations: Combining Segmentation and Recognition”, CVPR 2004 Thank you!