Human Activity Recognition at Mid and Near Range Ram Nevatia University of Southern California Based on work of several collaborators: F. Lv, P. Natarajan, S. Lee, C. Huang International Workshop on Video 2009 May 26, 2009

Activity Recognition: Motivation  Activity is the key content of a video (along with scene description)  Useful for  Monitoring (alerts)  Indexing (forensic, deep analysis, entertainment…)  Human-computer interaction (HCI), …

Activity Recognition: Goals  Goal is not just to give a name, but also a description (not just the verb but a sentence): who, what, when, where, why, etc.  Some of these inferences require object recognition in addition to “action” recognition: actor, object, instrument…  Context and story understanding is important to infer intent

Action as Change of State  A change in state is given by some function, say f(s, s’, t); example: walking changes the position of the walker  An event can also be defined over an interval where some property of f is constant (or within a certain range); example: walking at a constant speed or in the same direction  Recognition methods require some estimate of the state, such as the positions or poses of actors, their trajectories, and their relation to scene objects
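The change-of-state idea above can be sketched in a few lines of code. This is a toy illustration, not the talk's formulation: the helper names, the 2-D position state, and the tolerance are all assumptions made here for clarity.

```python
# Sketch of "action as change of state": treat walking as a change-of-state
# function f(s, s') over 2-D positions, then define an event over an interval
# where a property of f (the speed) stays nearly constant.
# All names and thresholds are illustrative, not from the talk.

def displacement(s, s_next):
    """f(s, s'): magnitude of the state change between consecutive positions."""
    return ((s_next[0] - s[0]) ** 2 + (s_next[1] - s[1]) ** 2) ** 0.5

def constant_speed(trajectory, tol=0.1):
    """True if per-frame displacement stays within tol of its mean."""
    speeds = [displacement(a, b) for a, b in zip(trajectory, trajectory[1:])]
    mean = sum(speeds) / len(speeds)
    return all(abs(v - mean) <= tol for v in speeds)

walk = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]      # steady walking
stop_go = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # irregular motion
```

Here `constant_speed(walk)` holds while `constant_speed(stop_go)` does not, mirroring the "walking at a constant speed" event definition.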

Event Composition  Composite events: compositions of other, simpler events. Composition is usually, but not necessarily, a sequence operation, e.g. getting out of a car, opening a door and entering a building.  Primitive events: those we choose not to decompose, e.g. walking. Primitive events can be recognized directly from observables, using standard classifiers.  Graphical models, such as HMMs and CRFs, are natural tools for recognizing composite events.
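As a toy illustration of an HMM over primitive events, the sketch below Viterbi-decodes the car/door/building example. The states, observation vocabulary, and every probability are invented for this sketch; they are not the system described in the talk.

```python
# Toy HMM for a composite event: hidden states are sub-events, observations
# are primitive events from a classifier. All numbers are invented.

states = ["exit_car", "open_door", "enter_building"]
start = {"exit_car": 0.8, "open_door": 0.1, "enter_building": 0.1}
trans = {  # P(next state | current state): encodes the expected sub-event order
    "exit_car":       {"exit_car": 0.5, "open_door": 0.4, "enter_building": 0.1},
    "open_door":      {"exit_car": 0.1, "open_door": 0.5, "enter_building": 0.4},
    "enter_building": {"exit_car": 0.1, "open_door": 0.1, "enter_building": 0.8},
}
emit = {  # P(observed primitive | state)
    "exit_car":       {"walk": 0.3, "stand": 0.7},
    "open_door":      {"walk": 0.2, "stand": 0.8},
    "enter_building": {"walk": 0.8, "stand": 0.2},
}

def viterbi(obs):
    """Most likely hidden sub-event sequence for observed primitives."""
    paths = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        paths = {
            s: max(((p * trans[prev][s] * emit[s][o], path + [s])
                    for prev, (p, path) in paths.items()),
                   key=lambda t: t[0])
            for s in states
        }
    return max(paths.values(), key=lambda t: t[0])[1]

seq = viterbi(["stand", "stand", "walk", "walk"])
```

Decoding the toy observation sequence recovers a sub-event ordering that begins with exit_car and ends in enter_building, which is the point of using transition structure to enforce composition order.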

Hierarchical Models  Hierarchical structure of events is naturally reflected in hierarchical graphical models

Issues in Activity Recognition  Variations in image/video appearance due to changes in viewpoint, illumination, clothing, style of activity etc.  Inherent ambiguities in 2-D videos  Reliable detection and tracking of objects, especially those directly involved in activities  Temporal segmentation  “Recognition” of novel events

Mid vs Near Range  Mid-range: limbs of the human body, particularly the arms, are not distinguishable. Common approach is to detect and track moving objects and make inferences based on trajectories  Near-range: hands/arms are visible; activities are defined by pose transitions, not just position transitions. Pose tracking is difficult; top-down methods are commonly used

Mid-Range Example  Example of abandoned luggage detection  Based on trajectory analysis and simple object detection/recognition  Uses a simple Bayesian classifier and logical reasoning about order of sub-events  Tested on PETS and ETISEO data
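A minimal sketch of the kind of logical reasoning over sub-events such a detector might use: a bag counts as abandoned if it sits still while its owner moves away and stays away. The rule, the event definition, and the thresholds below are illustrative assumptions, not the actual PETS/ETISEO system.

```python
# Illustrative rule for abandoned-luggage detection from trajectories:
# the bag is stationary and the owner remains farther than d_thresh away
# for at least t_thresh consecutive frames. Thresholds are made up.

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def abandoned(bag_track, owner_track, d_thresh=3.0, t_thresh=3):
    """bag_track / owner_track: per-frame (x, y) positions."""
    away = 0
    for bag, owner in zip(bag_track, owner_track):
        away = away + 1 if dist(bag, owner) > d_thresh else 0
        if away >= t_thresh:
            return True
    return False

bag = [(0, 0)] * 6                                          # bag sits still
leaves = [(0, 0), (1, 0), (4, 0), (6, 0), (8, 0), (9, 0)]   # owner departs
stays = [(0, 0), (1, 0), (1, 1), (2, 0), (1, 0), (0, 1)]    # owner lingers
```

With these tracks, `abandoned(bag, leaves)` fires while `abandoned(bag, stays)` does not; a real system layers this kind of logic on top of object detection/recognition and a Bayesian classifier, as the slide notes.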

Tracking in Crowded Environments  Results from CVPR09 paper

Dealing with Track Failures  In crowded environments, track fragmentation is common  Events of interest may themselves cause occlusions, e.g. two (or more) people meeting  Possible event detection can trigger a re-evaluation of the tracks  Meeting event example: people must have been separate, then get close to each other and stay together for some time. How to distinguish between passing by and meeting? Both may cause tracks to vanish.
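The meeting criterion in the slide (separate, then close, then staying together) can be sketched as a duration test on the inter-person distance; a pass-by gets close but fails the duration test. Radius and frame thresholds below are illustrative, not the talk's values.

```python
# Sketch of the meeting-event test: two tracks count as a "meeting" if the
# people start apart, come within a closeness radius, and stay that close
# for at least min_frames. Thresholds are invented for this sketch.

def meeting(track_a, track_b, close=1.5, min_frames=3):
    gaps = [((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
            for (ax, ay), (bx, by) in zip(track_a, track_b)]
    if gaps[0] <= close:      # must start out separate
        return False
    run = 0
    for g in gaps:
        run = run + 1 if g <= close else 0
        if run >= min_frames:
            return True
    return False

meet_a = [(0, 0), (1, 0), (2, 0), (2, 0), (2, 0), (2, 0)]
meet_b = [(4, 0), (3, 0), (2.5, 0), (2.5, 0), (2.5, 0), (2.5, 0)]
pass_a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
pass_b = [(5, 0), (4, 0), (3, 0), (2, 0), (1, 0), (0, 0)]
```

`meeting(meet_a, meet_b)` holds, `meeting(pass_a, pass_b)` does not; in practice this check would run on re-evaluated tracks, since the meeting itself may fragment them.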

Meeting Event Result (Videos) Tracking Result Meeting Event Detection Result

Events requiring fine Pose Tracking  Many events, e.g. gestures, require tracking of body pose, not just position  Human pose has many degrees of freedom: > 50 joint angles/positions  Bottom-up pose tracking approaches are slow and not robust  Top-down approaches attempt to recognize activity and pose simultaneously. Note that usually data is not pre-segmented into primitive action segments; a closed-world assumption is made

Activity Recognition w/o Tracking  (Pipeline figure: input sequence → 3D body pose → action segments: check watch, punch, kick, pick up, throw)

Difficulties  Viewpoint change & pose ambiguity (with a single camera view)  Spatial and temporal variations (style, speed)

Key Poses and Action Nets Key poses are determined by an automatic method that computes large changes in energy; key poses may be shared among different actions
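One plausible reading of automatic key-pose selection can be sketched as picking out frames where a per-frame motion-energy signal bottoms out, i.e. where the body momentarily holds a pose. The energy signal and the local-minimum criterion below are assumptions for illustration, not the talk's exact method.

```python
# Toy sketch of key-pose selection from a motion-energy signal (e.g. summed
# joint-angle change per frame): frames at strict local minima are candidate
# key poses. The energy values are made up.

def key_pose_frames(energy):
    """Indices of strict local minima of the motion-energy sequence."""
    return [i for i in range(1, len(energy) - 1)
            if energy[i] < energy[i - 1] and energy[i] < energy[i + 1]]

energy = [0.9, 0.4, 0.1, 0.5, 0.8, 0.3, 0.7]
frames = key_pose_frames(energy)   # frames 2 and 5 are held poses
```

Because the same held pose can occur in several actions, key poses selected this way may be shared among actions, which is what the Action Net exploits.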

Experiments: Training Set  15 action models  177 key poses  6372 nodes in the Action Net

Action Net: Apply constraints  (Figure: Action Net nodes linked across viewpoints sampled at 0°, 10°, …)

Experiments: Test Set  50 clips, average length 1165 frames  5 viewpoints  10 actors (5 men, 5 women)

Experiments: Results

                  PMK     PMK-NU
w/o Action Net    38.4%   44.1%
w/ Action Net     56.7%   80.6%

A Video Result  (Video panels: original frame; extracted blob & ground truth; result with Action Net; result without Action Net)

Working with Natural Environments  Foreground segmentation is difficult; this leads to use of lower-level features, e.g. edges and optical flow  Key poses are not discriminative enough w/o accurate segmentation; actor position also needs to be inferred. We introduce the use of continuous pose sequences.  More general graphical models that include  Hierarchy  Transition probabilities that may depend on observations  Observations that may depend on multiple states  Duration models (HMMs imply an exponential decay)
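The parenthetical "HMMs imply an exponential decay" can be made concrete: a state with self-transition probability p stays for exactly d frames with geometric probability p^(d-1)(1-p), which always decays with d, whereas real actions tend to have a typical duration. This small sketch just evaluates that pmf.

```python
# Why plain HMMs imply exponentially decaying durations: with self-loop
# probability p, P(state lasts exactly d frames) = p**(d-1) * (1 - p),
# a geometric pmf that is strictly decreasing in d. An explicit duration
# model can instead concentrate mass near a typical action length.

def hmm_duration_pmf(p, d):
    """P(state lasts exactly d frames) under a self-loop of probability p."""
    return p ** (d - 1) * (1 - p)

pmf = [hmm_duration_pmf(0.8, d) for d in range(1, 6)]
# the geometric pmf is strictly decreasing: short durations always dominate
```

This is the motivation for adding explicit duration models to the graphical model rather than relying on the HMM self-loop alone.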

Experiments  Tested the approach on videos of 6 actions: sit-on-ground (SG), standup-from-ground (StG), sit-on-chair (SC), standup-from-chair (StC), pickup (PK), point (P)  Collected instances of these actions at 4 tilt angles and 5 pan angles  A total of 400 instances over all actions with various backgrounds  We compared the relative importance of shape, flow and duration features with our full system (shape+flow+duration)

Results  Combining flow and shape produces a clear improvement.  The bulk of the computational expense is in computing the flow.
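One common way to combine shape and flow cues is score-level fusion: weight each feature's per-action log-likelihoods and take the argmax. The function, weights, and scores below are invented to illustrate the idea, not the system's actual combination rule.

```python
# Illustrative score-level fusion of shape and flow features: combine
# per-action log-likelihoods with weights and pick the best action.
# All numbers are invented for this sketch.

def fuse(scores_by_feature, weights):
    """scores_by_feature: {feature: {action: log-likelihood}}."""
    actions = next(iter(scores_by_feature.values())).keys()
    fused = {a: sum(weights[f] * scores_by_feature[f][a]
                    for f in scores_by_feature)
             for a in actions}
    return max(fused, key=fused.get)

scores = {
    "shape": {"pickup": -1.0, "point": -1.2},
    "flow":  {"pickup": -0.8, "point": -1.5},
}
best = fuse(scores, {"shape": 0.5, "flow": 0.5})
```

In this toy case the two cues agree and fusion picks "pickup"; the benefit in practice comes when one cue is ambiguous and the other disambiguates, at the cost (as the slide notes) of computing the flow.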