Human Activity Recognition at Mid and Near Range
Ram Nevatia, University of Southern California
Based on work of several collaborators: F. Lv, P. Natarajan, S. Lee, C. Huang
International Workshop on Video 2009, May 26, 2009
Activity Recognition: Motivation
- Activities are the key content of a video (along with scene description)
- Useful for:
  - Monitoring (alerts)
  - Indexing (forensic, deep analysis, entertainment, ...)
  - HCI, ...
Activity Recognition: Goals
- Goal is not just to give a name, but also a description (not just the verb but a sentence)
  - Who, what, when, where, why, etc.?
- Some of these inferences require object recognition in addition to "action" recognition
  - Actor, object, instrument, ...
- Context and story understanding is important to infer intent
Action as Change of State
- A change in state is given by some function, say f(s, s', t)
  - Example: walking changes the position of the walker
- An event can also be defined over an interval where some properties of f are constant (or within a certain range)
  - Example: walking at a constant speed or in the same direction
- Recognition methods require some estimate of the state, such as positions or pose of actors, their trajectories and relation to scene objects
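To make the interval formulation concrete, here is a minimal sketch (not from the talk) that scans a tracked 2-D trajectory for intervals where speed stays roughly constant, i.e. where the change-of-state function is approximately constant over the interval. The trajectory format and the `tol` and `min_len` parameters are illustrative assumptions.

```python
import numpy as np

def constant_speed_intervals(positions, dt=1.0, tol=0.2, min_len=5):
    """Return (start, end) frame intervals where speed stays near the running mean.

    positions: (N, 2) array of tracked (x, y) centroids, one per frame.
    tol: allowed relative deviation of a frame's speed from the interval mean.
    """
    positions = np.asarray(positions, dtype=float)
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    intervals, start = [], 0
    for i in range(1, len(speeds)):
        mean_so_far = speeds[start:i].mean()
        # Close the current interval when the speed drifts away from its mean.
        if abs(speeds[i] - mean_so_far) > tol * max(mean_so_far, 1e-6):
            if i - start >= min_len:
                intervals.append((start, i))
            start = i
    if len(speeds) - start >= min_len:
        intervals.append((start, len(speeds)))
    return intervals


if __name__ == "__main__":
    walk = np.cumsum(np.ones((30, 2)), axis=0)   # roughly constant-speed walking
    stop = np.repeat(walk[-1:], 20, axis=0)      # then standing still
    print(constant_speed_intervals(np.vstack([walk, stop])))  # -> [(0, 29), (29, 49)]
```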
Event Composition
- Composite events: compositions of other, simpler events
  - Composition is usually, but not necessarily, a sequence operation, e.g. getting out of a car, opening a door and entering a building
- Primitive events: those we choose not to decompose, e.g. walking
  - Primitive events can be recognized directly from observables, by using standard classifiers
- Graphical models, such as HMMs and CRFs, are natural tools for recognition of composite events
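As a bare-bones illustration of sequence composition over primitives, the sketch below checks that already-detected primitive events occur in the required temporal order. The `Primitive` tuple format, the `max_gap` threshold and the greedy matching are illustrative assumptions; they stand in for, and are much weaker than, the HMM/CRF machinery the slide refers to.

```python
from typing import List, Optional, Tuple

# Each detected primitive event: (label, start_frame, end_frame).
Primitive = Tuple[str, int, int]

def detect_sequence(primitives: List[Primitive], pattern: List[str],
                    max_gap: int = 100) -> Optional[List[Primitive]]:
    """Greedily check that the labels in `pattern` occur in temporal order.

    Returns the matched primitives if the composite event is present, else None.
    """
    matched, idx, last_end = [], 0, None
    for label, start, end in sorted(primitives, key=lambda p: p[1]):
        if idx == len(pattern):
            break
        if label == pattern[idx] and (last_end is None or 0 <= start - last_end <= max_gap):
            matched.append((label, start, end))
            last_end, idx = end, idx + 1
    return matched if idx == len(pattern) else None


detections = [("exit_car", 10, 40), ("walk", 45, 90),
              ("open_door", 95, 110), ("enter_building", 112, 130)]
print(detect_sequence(detections, ["exit_car", "open_door", "enter_building"]))
```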
Hierarchical Models
The hierarchical structure of events is naturally reflected in hierarchical graphical models.
Issues in Activity Recognition
- Variations in image/video appearance due to changes in viewpoint, illumination, clothing, style of activity, etc.
- Inherent ambiguities in 2-D videos
- Reliable detection and tracking of objects, especially those directly involved in activities
- Temporal segmentation
- "Recognition" of novel events
Mid vs Near Range
- Mid-range
  - Limbs of the human body, particularly the arms, are not distinguishable
  - Common approach is to detect and track moving objects and make inferences based on trajectories
- Near-range
  - Hands/arms are visible; activities are defined by pose transitions, not just position transitions
  - Pose tracking is difficult; top-down methods are commonly used
Mid-Range Example
- Example of abandoned luggage detection
- Based on trajectory analysis and simple object detection/recognition
- Uses a simple Bayesian classifier and logical reasoning about the order of sub-events
- Tested on PETS and ETISEO data
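A hedged sketch of the kind of sub-event ordering such a system reasons about: a static luggage blob appears, and its presumed owner then stays far from it for a sustained period. The rule and its thresholds are hypothetical stand-ins for the Bayesian classifier and logical reasoning mentioned on the slide.

```python
import numpy as np

def abandoned_luggage_alert(person_track, luggage_pos, drop_frame,
                            dist_thresh=3.0, time_thresh=30):
    """Flag an abandoned-luggage event from simple trajectory reasoning.

    person_track: (N, 2) array of the presumed owner's positions per frame.
    luggage_pos:  (2,) position where a static luggage blob was detected.
    drop_frame:   frame index at which the luggage blob first became static.
    Returns the frame at which the alert fires, or None.
    """
    person_track = np.asarray(person_track, dtype=float)
    dists = np.linalg.norm(person_track[drop_frame:] - np.asarray(luggage_pos), axis=1)
    run = 0
    for offset, is_away in enumerate(dists > dist_thresh):
        run = run + 1 if is_away else 0      # consecutive frames the owner is far away
        if run >= time_thresh:
            return drop_frame + offset
    return None


if __name__ == "__main__":
    owner = np.vstack([np.zeros((10, 2)),                      # stands by the luggage
                       np.cumsum(np.ones((60, 2)) * 0.5, 0)])  # then walks away
    print(abandoned_luggage_alert(owner, luggage_pos=(0.0, 0.0), drop_frame=5))
```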
Tracking in Crowded Environments
Results from a CVPR 2009 paper.
Dealing with Track Failures
- In crowded environments, track fragmentation is common
- Events of interest themselves may cause occlusions, e.g. two (or more) people meeting
- Possible event detection can trigger a re-evaluation of the tracks
- Meeting event example
  - People must have been separate, then get close to each other and stay together for some time
  - How to distinguish between passing by and meeting? Both may cause tracks to vanish.
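One way to make the passing-by vs. meeting distinction operational, as a rough sketch: require that the two people were clearly separate and then remain close for a sustained run of frames. The thresholds (`close_dist`, `min_meet_frames`) and the assumption of already-associated, gap-free tracks are illustrative simplifications; the talk's system instead re-evaluates fragmented tracks when such an event is hypothesized.

```python
import numpy as np

def classify_interaction(track_a, track_b, close_dist=1.5, min_meet_frames=50):
    """Label a pair of trajectories as 'meet', 'pass_by', or 'no_interaction'.

    track_a, track_b: (N, 2) arrays of per-frame positions over the same frames.
    """
    d = np.linalg.norm(np.asarray(track_a, float) - np.asarray(track_b, float), axis=1)
    close = d < close_dist
    if not close.any():
        return "no_interaction"
    # Longest run of consecutive "close" frames.
    longest, run = 0, 0
    for c in close:
        run = run + 1 if c else 0
        longest = max(longest, run)
    # A meeting requires the two people to have been clearly separate beforehand.
    were_separate = bool((d[: int(np.argmax(close))] > close_dist).any())
    if longest >= min_meet_frames and were_separate:
        return "meet"
    return "pass_by"


if __name__ == "__main__":
    t = np.arange(200).reshape(-1, 1)
    a = np.hstack([t * 0.1, np.zeros_like(t, dtype=float)])       # walks along x
    b = np.hstack([20 - t * 0.1, np.zeros_like(t, dtype=float)])  # walks past a
    print(classify_interaction(a, b))                             # -> "pass_by"
```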
Meeting Event Result (Videos)
[Videos: tracking result; meeting event detection result]
Events Requiring Fine Pose Tracking
- Many events, e.g. gestures, require tracking of body pose, not just position
- Human pose has many degrees of freedom: > 50 joint angles/positions
- Bottom-up pose tracking approaches are slow and not robust
- Top-down approaches attempt to recognize activity and pose simultaneously
  - Note that usually data is not pre-segmented into primitive action segments
  - Closed-world assumption
Activity Recognition w/o Tracking
[Figure: input sequence -> 3D body pose -> action segments (check watch, punch, kick, pick up, throw)]
Difficulties
- Viewpoint change & pose ambiguity (with a single camera view)
- Spatial and temporal variations (style, speed)
Key Poses and Action Nets
- Key poses are determined by an automatic method that detects large changes in energy
- Key poses may be shared among different actions
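A minimal sketch of energy-based key-pose selection, assuming joint-angle sequences are available: frames where the motion energy of the pose sequence hits a local extremum are kept as key-pose candidates. This is only one plausible reading of the slide's energy-change criterion, not the exact procedure of the underlying papers, and `min_separation` is an assumption.

```python
import numpy as np

def select_key_poses(joint_angles, min_separation=10):
    """Pick candidate key-pose frames from a joint-angle sequence.

    joint_angles: (T, D) array, one row of D joint angles per frame.
    Keeps frames where motion energy (mean squared angular velocity) has a
    local extremum, subject to a minimum temporal separation.
    """
    joint_angles = np.asarray(joint_angles, dtype=float)
    energy = (np.diff(joint_angles, axis=0) ** 2).mean(axis=1)
    keys = []
    for t in range(1, len(energy) - 1):
        is_min = energy[t] <= energy[t - 1] and energy[t] <= energy[t + 1]
        is_max = energy[t] >= energy[t - 1] and energy[t] >= energy[t + 1]
        if (is_min or is_max) and (not keys or t - keys[-1] >= min_separation):
            keys.append(t)
    return keys


if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 100)
    swing = np.sin(t).reshape(-1, 1)   # a single joint swinging back and forth
    print(select_key_poses(swing))     # frames near the turning points and mid-swing
```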
Experiments: Training Set
- 15 action models
- 177 key poses
- 6372 nodes in the Action Net
Action Net: Apply Constraints
[Figure: viewpoints at 0°, 10°, ...]
Experiments: Test Set
- 50 clips, average length 1165 frames
- 5 viewpoints
- 10 actors (5 men, 5 women)
Experiments: Results

                   PMK      PMK-NU
w/o Action Net     38.4%    44.1%
w/ Action Net      56.7%    80.6%
A Video Result
[Video: original frame; extracted blob & ground truth; result with Action Net; result without Action Net]
Working with Natural Environments
- Foreground segmentation is difficult
  - Leads to the use of lower-level features, e.g. edges and optical flow
- Key poses are not discriminative enough w/o accurate segmentation; actor position also needs to be inferred
  - We introduce the use of continuous pose sequences
- More general graphical models that include:
  - Hierarchy
  - Transition probabilities that may depend on observations
  - Observations that may depend on multiple states
  - Duration models (HMMs imply an exponential decay)
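The duration remark can be made concrete: an HMM state with self-transition probability a stays exactly d frames with probability (1 - a) a^(d-1), a geometric (exponentially decaying) distribution whose most likely duration is always one frame, whereas an explicit duration model can peak at a realistic action length. A small sketch, with Poisson chosen here purely for illustration of the explicit alternative:

```python
import math
import numpy as np

def hmm_duration_pmf(self_loop_prob, max_d=60):
    """Implicit HMM state-duration distribution: P(d) = (1 - a) * a**(d - 1)."""
    d = np.arange(1, max_d + 1)
    return (1 - self_loop_prob) * self_loop_prob ** (d - 1)

def explicit_duration_pmf(mean_duration, max_d=60):
    """An explicit duration model (Poisson, shifted to start at 1 frame),
    of the kind a duration-augmented / semi-Markov model can use."""
    mu = mean_duration - 1
    return np.array([math.exp(-mu) * mu ** k / math.factorial(k) for k in range(max_d)])

if __name__ == "__main__":
    a = 1 - 1 / 20                       # self-loop giving a 20-frame mean stay
    geometric = hmm_duration_pmf(a)
    explicit = explicit_duration_pmf(20)
    print("most likely duration under the HMM:      ", int(np.argmax(geometric)) + 1)
    print("most likely duration with explicit model:", int(np.argmax(explicit)) + 1)
```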
Experiments
- Tested the approach on videos of 6 actions: sit-on-ground (SG), standup-from-ground (StG), sit-on-chair (SC), standup-from-chair (StC), pickup (PK), point (P)
- Collected instances of these actions around 4 tilt angles and 5 pan angles
- A total of 400 instances over all actions, with various backgrounds
- We compared the relative importance of shape, flow and duration features with our system (shape+flow+duration)
Results
- Combining flow and shape produces a clear improvement
- The bulk of the computational expense is in computing the flow
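As a generic illustration of why combining channels helps, the sketch below fuses per-class shape and flow scores with a weighted sum. The score dictionaries, labels and `w_shape` weight are made-up examples; the actual system combines shape, flow and duration inside a single graphical model rather than by late fusion.

```python
def fuse_shape_and_flow(shape_scores, flow_scores, w_shape=0.5):
    """Late fusion of per-class scores from a shape channel and a flow channel.

    shape_scores, flow_scores: dicts mapping action label -> score.
    Returns the best label and the fused score table.
    """
    labels = sorted(set(shape_scores) | set(flow_scores))
    fused = {c: w_shape * shape_scores.get(c, 0.0)
                + (1 - w_shape) * flow_scores.get(c, 0.0)
             for c in labels}
    return max(fused, key=fused.get), fused


# Hypothetical scores for three of the six actions in the experiments.
shape = {"sit-on-ground": 0.40, "pickup": 0.35, "point": 0.25}
flow = {"sit-on-ground": 0.55, "pickup": 0.30, "point": 0.15}
print(fuse_shape_and_flow(shape, flow))
```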