Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc. Thesis Defense

Outline: Introduction. Group Activity Recognition with Context – Structure-level (latent structures) – Feature-level (Action Context descriptor). Experiments.

Activity Recognition. Goal: enable computers to analyze and understand human behavior. Examples: answering a phone, kissing.

Action vs. Activity. Activity: a group of people forming a queue. Action: standing in a queue and facing left.

Activity Recognition. Activity recognition is important: surveillance, entertainment, sport, HCI. Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.

Group Activity Recognition. Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other. Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications.

Group Activity Recognition Context

Group Activity Recognition. Two types of context: group-person interaction and person-person interaction. [Figure: example scene labeled "Talk".]

Latent Structured Model. [Figure: three-layer graphical model. Activity layer: activity class y. Hidden layer: action classes h1, h2, ..., hn. Feature layer: image features x0, x1, x2, ..., xn.]

Latent Structured Model. [Figure: graphical model with activity class y, action classes h1, h2, ..., hn, and image features x0, x1, x2, ..., xn. Annotations: group-person interaction, person-person interaction; structure-level, feature-level.]

Difference from Previous Work: Group Activity Recognition. Previous work: single-person action recognition (Schuldt et al., ICPR 04); relatively simple activity recognition (Vaswani et al., CVPR 03); datasets in controlled conditions. Our work: group activity recognition in realistic videos; two new types of contextual information; a unified framework.

Difference from Previous Work: Latent Structured Models. Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF) (Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08). Our work: a latent structure for the hidden layer, inferred automatically during learning and inference.

Structure-level Approach. [Figure: graphical model with activity class y, action classes h1, h2, ..., hn, and image features x0, x1, x2, ..., xn. Annotations: person-person interaction; structure-level, feature-level.]

Structure-level Approach: Latent Structure. [Figure labels: Queue, ?, Talk.]

Model Formulation. [Figure: graphical model over y, h1, h2, ..., hn and x0, x1, x2, ..., xn.] Potentials: Image-Activity, Image-Action, Action-Activity, Action-Action. Input: image-label pair (x, h, y).
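The four potentials listed on this slide suggest a score that is a sum of four groups of terms. The formula itself is not in the transcript, so the following is only a sketch of one plausible decomposition. The weight vectors $w_0,\dots,w_3$, the feature maps $\phi_0,\dots,\phi_3$, and the edge set $E$ of the hidden-layer graph are notation assumed here for illustration.

```latex
\begin{aligned}
w^{\top}\Psi(x, h, y)
  ={}& \underbrace{w_0^{\top}\phi_0(x_0, y)}_{\text{Image-Activity}}
   + \underbrace{\sum_{j=1}^{n} w_1^{\top}\phi_1(x_j, h_j)}_{\text{Image-Action}} \\
   &+ \underbrace{\sum_{j=1}^{n} w_2^{\top}\phi_2(h_j, y)}_{\text{Action-Activity}}
   + \underbrace{\sum_{(j,k)\in E} w_3^{\top}\phi_3(y, h_j, h_k)}_{\text{Action-Action}}
\end{aligned}
```

Under this reading, only the last sum depends on which pairs of people are connected, which is where a latent structure for the hidden layer would enter.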

Inference. Score an image x with activity label y. Infer the latent variables: NP-hard!

Inference (alternating). Holding G_y fixed: loopy BP over h_y. Holding h_y fixed: ILP over G_y.
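The alternation on this slide can be illustrated with a toy coordinate-ascent loop for one fixed activity label y. This is a sketch under simplifying assumptions, not the method from the slides: a greedy ICM sweep stands in for loopy BP, a greedy degree-constrained edge picker stands in for the ILP, and the score tables `unary` and `pair` plus the `max_degree` limit are hypothetical inputs.

```python
from itertools import combinations


def score(unary, pair, h, edges):
    """Per-person unary scores plus pairwise scores on the chosen edges."""
    total = sum(unary[j][h[j]] for j in range(len(h)))
    return total + sum(pair[h[j]][h[k]] for j, k in edges)


def update_labels(unary, pair, h, edges):
    """Graph fixed: one ICM sweep over action labels (stand-in for loopy BP)."""
    h = list(h)
    for j in range(len(h)):
        def local(label):
            s = unary[j][label]
            for a, b in edges:
                if a == j:
                    s += pair[label][h[b]]
                elif b == j:
                    s += pair[h[a]][label]
            return s
        h[j] = max(range(len(unary[j])), key=local)
    return h


def update_edges(pair, h, max_degree):
    """Labels fixed: greedily keep positive edges under a degree limit
    (stand-in for the ILP)."""
    n = len(h)
    candidates = sorted(
        ((pair[h[j]][h[k]], j, k) for j, k in combinations(range(n), 2)),
        reverse=True,
    )
    degree = [0] * n
    edges = []
    for weight, j, k in candidates:
        if weight > 0 and degree[j] < max_degree and degree[k] < max_degree:
            edges.append((j, k))
            degree[j] += 1
            degree[k] += 1
    return edges


def infer(unary, pair, max_degree=2, max_iters=10):
    """Alternate the two updates until the total score stops improving."""
    h = [max(range(len(u)), key=u.__getitem__) for u in unary]
    edges = update_edges(pair, h, max_degree)
    best = score(unary, pair, h, edges)
    for _ in range(max_iters):
        h_new = update_labels(unary, pair, h, edges)
        edges_new = update_edges(pair, h_new, max_degree)
        new = score(unary, pair, h_new, edges_new)
        if new <= best:
            break
        h, edges, best = h_new, edges_new, new
    return h, edges, best


# Three people, two action labels; label 0 strongly prefers company.
unary = [[1.0, 0.0], [0.4, 0.5], [1.0, 0.0]]
pair = [[2.0, 0.3], [0.3, 0.5]]
labels, graph, total = infer(unary, pair)
```

Because each step only improves the score with the other block held fixed, the loop reaches a local optimum, which is consistent with the exact joint problem being NP-hard.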

Learning with Latent SVM Optimization: Non-convex bundle method (Do & Artieres, ICML 09)
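For orientation, a generic latent SVM objective can be written as follows. This is the standard textbook form, given here as a sketch: the slide does not show the exact objective, and the symbols $z$ (whatever is latent, for instance the hidden-layer graph), $C$, $\xi_n$, and $\Delta$ are assumed notation.

```latex
\begin{aligned}
f_w(x, y) &= \max_{z}\; w^{\top}\Psi(x, z, y) \\
\min_{w,\;\xi \ge 0}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{n} \xi_n \\
\text{s.t.}\quad & f_w(x^{n}, y^{n}) - f_w(x^{n}, y) \;\ge\; \Delta(y, y^{n}) - \xi_n
  \qquad \forall n,\ \forall y
\end{aligned}
```

Since $f_w$ is a maximum of linear functions of $w$, it is convex in $w$. The term $f_w(x^{n}, y^{n})$ enters the constraint on the side where a concave function would be needed, so the problem is non-convex, which is why a non-convex solver such as the bundle method cited on the slide is required.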

Feature-level Approach. [Figure: graphical model with activity class y, action classes h1, h2, ..., hn, and image features x0, x1, x2, ..., xn. Annotations: person-person interaction; structure-level, feature-level.]

Feature-level Approach: Model. [Figure: graphical model with activity class y, action classes h1, h2, ..., hn, and image features x0, x1, x2, ..., xn. Annotation: Action Context Descriptor.]

[Figure: panels (a), (b), (c) illustrating the action context descriptor. Labels in the figure: τ, z, "action", "Focal person", "Context".]

Action Context Descriptor. Pipeline: feature descriptor (e.g. HOG by Dalal & Triggs) → multi-class SVM → action class scores for each person → max over action class scores.
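A minimal sketch of how such a descriptor might be assembled from per-person action scores. It assumes a single circular context region of a given `radius` around the focal person and a zero vector when the region is empty; the slides do not specify either choice, and the function name and arguments are hypothetical.

```python
import math


def action_context(scores, positions, focal, radius):
    """Focal person's action scores, followed by the element-wise max of the
    action scores of all other people within `radius` of the focal person."""
    fx, fy = positions[focal]
    neighbors = [
        s
        for i, (s, (x, y)) in enumerate(zip(scores, positions))
        if i != focal and math.hypot(x - fx, y - fy) <= radius
    ]
    if neighbors:
        # Max-pool each action class score over the people in the context.
        context = [max(column) for column in zip(*neighbors)]
    else:
        # Assumption: an empty context region contributes zeros.
        context = [0.0] * len(scores[focal])
    return list(scores[focal]) + context


# Per-person SVM scores for two action classes, with 2-D image positions.
scores = [[0.9, 0.1], [0.2, 0.7], [0.5, 0.6], [0.0, 5.0]]
positions = [(0, 0), (1, 0), (0, 2), (10, 10)]
descriptor = action_context(scores, positions, focal=0, radius=3)
isolated = action_context(scores, positions, focal=3, radius=3)
```

The max-pooling step keeps the descriptor length fixed regardless of how many people surround the focal person.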

Dataset: Collective Activity Dataset (Choi et al., VS 09). 5 action categories (per person): crossing, waiting, queuing, walking, talking. 44 video clips.

Collective Activity Dataset

Dataset: Nursing Home Dataset. 2 activity categories (per image): fall, non-fall. 5 action categories (per person): walking, standing, sitting, bending, and falling. In total 22 video clips (2990 frames); 8 clips for testing, the rest for training. 1/3 are labeled as fall.

Nursing Home Dataset

Baselines. Root (x0) + SVM (no structure). Hidden-layer structures: no connection; min-spanning tree; complete graph within r; structure-level approach. [Figure: each structure drawn over hidden nodes h1, h2, h3, h4.]
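The three fixed hidden-layer structures can be built directly from person positions. The sketch below assumes that r is a Euclidean distance between 2-D image positions and that the spanning tree is computed on those same distances; the slide does not state either, and the function names are illustrative.

```python
import math
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def no_connection(points):
    """Baseline with no edges between people."""
    return []


def complete_within(points, r):
    """Connect every pair of people at distance at most r."""
    return [
        (j, k)
        for j, k in combinations(range(len(points)), 2)
        if dist(points[j], points[k]) <= r
    ]


def min_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph."""
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        j, k = min(
            ((j, k) for j in in_tree for k in range(n) if k not in in_tree),
            key=lambda e: dist(points[e[0]], points[e[1]]),
        )
        edges.append((min(j, k), max(j, k)))
        in_tree.add(k)
    return edges


people = [(0, 0), (1, 0), (2, 0), (10, 0)]
tree = min_spanning_tree(people)
nearby = complete_within(people, r=1.5)
```

Here the spanning tree still links the distant fourth person to the group, while the radius-based graph leaves that person isolated.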

System Overview. Pipeline: video → person detector → person descriptor → model. Person detectors: pedestrian detection by Felzenszwalb et al.; background subtraction. Person descriptors: HOG by Dalal & Triggs; LST by Loy et al. (ICCV 09).

Results – Collective Activity Dataset

Results – Correct Examples

Results – Incorrect Examples. [Figure labels: Crossing, Waiting.]

[Figure labels: Walking, Talking, Queuing.]

Results – Nursing Home Dataset

Results – Incorrect Examples

Conclusion. A discriminative model for group activity recognition with context. Two new types of contextual information: – group-person interaction – person-person interaction. Two approaches: structure-level (latent structure) and feature-level (Action Context descriptor). Experimental results demonstrate the effectiveness of the proposed model.

Future Work. Modeling complex structures – temporal dependencies among actions. Contextual feature descriptors – how to encode discriminative context? Weakly supervised learning – e.g. multiple instance learning for fall detection.

Pairwise Weight. [Figure: pairwise weight over y, hj, hk.]

Infer the graph structures

Results – Nursing Home Dataset: 0/1 loss – optimize overall accuracy

new loss – optimize mean per-class accuracy
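The difference between the two losses matters because only about a third of the nursing home data is labeled as fall. The slides do not give the form of the new loss, so the sketch below assumes one common choice: a mistake on an example of class p costs 1/m_p, where m_p is the number of examples in class p. The function names and the toy labels are illustrative.

```python
from collections import Counter


def zero_one_loss(y_pred, y_true):
    """Every mistake costs 1, so the total tracks overall accuracy."""
    return 0.0 if y_pred == y_true else 1.0


def balanced_loss(y_pred, y_true, class_counts):
    """Assumed form: a mistake costs 1 / (size of the true class), so each
    class contributes its own error rate to the total."""
    return 0.0 if y_pred == y_true else 1.0 / class_counts[y_true]


# Toy labels: 2 falls, 8 non-falls, and a classifier that never predicts fall.
truth = ["fall"] * 2 + ["non-fall"] * 8
preds = ["non-fall"] * 10
counts = Counter(truth)

total_01 = sum(zero_one_loss(p, t) for p, t in zip(preds, truth))
total_bal = sum(balanced_loss(p, t, counts) for p, t in zip(preds, truth))

overall_accuracy = 1.0 - total_01 / len(truth)
mean_per_class_accuracy = 1.0 - total_bal / len(counts)
```

In this toy case the always-non-fall classifier reaches 0.8 overall accuracy but only 0.5 mean per-class accuracy, so training against the class-weighted loss penalizes missed falls more heavily.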

Person Detectors. Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08). Nursing Home Dataset: background subtraction (video → moving regions).

Person Descriptors. Collective Activity Dataset: HOG. Nursing Home Dataset: Local Spatial Temporal (LST) descriptor (Loy et al., ICCV 09).

Results – Incorrect Examples

Results – Collective Activity Dataset. [Figure panels: Root+SVM, Structure-level, Feature-level.]

Group Context Descriptor

[Figure: graphical model over y, h1, h2, ..., hn and x0, x1, x2, ..., xn.]

Learning. Training data consists of {(x^n, h^n, y^n)}.

[Figure legend: Structure-level, Feature-level, No connection.]

Results – Nursing Home Dataset
