1
Beyond Actions: Discriminative Models for Contextual Group Activities
Tian Lan, School of Computing Science, Simon Fraser University
M.Sc. Thesis Defense, August 12, 2010
2
Outline
Introduction
Group Activity Recognition with Context
– Structure-level (latent structures)
– Feature-level (Action Context descriptor)
Experiments
3
Activity Recognition. Goal: enable computers to analyze and understand human behavior. Examples: answering a phone, kissing.
4
Action vs. Activity. Activity: a group of people forming a queue. Action: standing in a queue and facing left.
5
Activity Recognition. Activity recognition is important, with applications in surveillance, entertainment, sports, and HCI. Activity recognition is difficult because of intra-class variation, background clutter, partial occlusion, etc.
6
Group Activity Recognition. Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other. Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications.
7
Group Activity Recognition: Context
8
Group Activity Recognition: Two Types of Context. Group-person interaction and person-person interaction (example activity: Talk).
9
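Latent Structured Model [Figure: three-layer graphical model. Activity layer: activity class y. Action (hidden) layer: action classes h_1, h_2, …, h_n. Feature layer: whole-image feature x_0 and per-person features x_1, x_2, …, x_n.]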
10
Latent Structured Model [Figure: the same graphical model over y, h_1, …, h_n, x_0, x_1, …, x_n, annotated with the two types of context, group-person interaction and person-person interaction, each modeled at the structure level and at the feature level.]
11
Difference from Previous Work: Group Activity Recognition
Previous work: single-person action recognition (Schuldt et al., ICPR 04); relatively simple activity recognition (Vaswani et al., CVPR 03); datasets collected in controlled conditions.
Our work: group activity recognition in realistic videos; two new types of contextual information; a unified framework.
12
Difference from Previous Work: Latent Structured Models
Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF) (Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08).
Our work: the structure of the hidden layer is latent and is inferred automatically during learning and inference.
14
Structure-level Approach [Figure: the latent structured model over y, h_1, …, h_n, x_0, x_1, …, x_n with the person-person interaction highlighted; labels: Structure-level, Feature-level.]
15
Structure-level Approach: Latent Structure [Figure: example scene with people labeled Queue and Talk; which people should be connected is unknown (?).]
16
Model Formulation [Figure: graphical model over y, h_1, …, h_n, x_0, x_1, …, x_n with four potentials: Image-Activity, Image-Action, Action-Activity, Action-Action.] Input: image-label pair (x, h, y).
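The four potentials can be made concrete with a small scoring function. This is a minimal sketch, assuming each potential is linear in precomputed features and that the Action-Action weight is indexed by the triple (y, h_j, h_k), as the "Pairwise Weight" backup slide suggests; the weight-table names (`w_image_activity`, `w_image_action`, `w_action_activity`, `w_action_action`) and the dictionary layout are hypothetical, not the thesis implementation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length feature/weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def joint_score(
    x0: Vector,
    xs: List[Vector],
    h: Sequence[int],
    y: int,
    edges: List[Tuple[int, int]],
    w_image_activity: Dict[int, Vector],
    w_image_action: Dict[int, Vector],
    w_action_activity: Dict[Tuple[int, int], float],
    w_action_action: Dict[Tuple[int, int, int], float],
) -> float:
    """Hypothetical linear score w^T Psi(x, h, y, G).

    x0: whole-image feature; xs[j]: feature of person j; h[j]: action of
    person j; y: activity; edges: the graph G over people (pairs j < k).
    Missing entries in the two label-only tables count as 0.
    """
    # Image-Activity: whole-image feature x0 against activity y.
    score = dot(w_image_activity[y], x0)
    for x_j, h_j in zip(xs, h):
        # Image-Action: person feature x_j against that person's action h_j.
        score += dot(w_image_action[h_j], x_j)
        # Action-Activity: compatibility of action h_j with activity y.
        score += w_action_activity.get((y, h_j), 0.0)
    # Action-Action: only pairs connected in G contribute, weighted per (y, h_j, h_k).
    for j, k in edges:
        score += w_action_action.get((y, h[j], h[k]), 0.0)
    return score
```

Because the Action-Action term only sums over edges of G, changing the graph changes the score; this is what makes G worth inferring rather than fixing in advance.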
17
Inference. To score an image x with activity label y, the latent variables (the actions h and the graph structure G) must be inferred. Exact inference is NP-hard.
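Written as equations (our notation, assuming a linear score $\mathbf{w}^\top\Psi$ over the four potentials), an image is scored for activity y by maximizing over the unknown actions h and graph G, and the predicted activity is the best-scoring y:

```latex
f_{\mathbf{w}}(x, y) = \max_{h,\; G} \; \mathbf{w}^{\top} \Psi(x, h, y, G),
\qquad
y^{*} = \arg\max_{y} \, f_{\mathbf{w}}(x, y)
```

Both inner maximizations are combinatorial: G ranges over exponentially many graphs, and for a fixed G containing cycles, maximizing over h is MAP inference in a loopy model, which is NP-hard in general.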
18
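Inference (approximate). Alternate between two steps: holding G and y fixed, infer h with loopy belief propagation (BP); holding h and y fixed, infer G with integer linear programming (ILP).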
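The alternating scheme can be sketched on a toy problem. In this minimal sketch, exhaustive search stands in for both loopy BP (the h-step) and the ILP (the G-step), so it only works for a handful of people; the maximum-degree limit on G is a hypothetical constraint we add so that the G-step is not trivial. The names `candidate_graphs` and `alternating_inference` are ours.

```python
import itertools
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]
Labels = Tuple[int, ...]


def candidate_graphs(n: int, max_degree: int) -> List[Graph]:
    """Enumerate edge sets over n people in which nobody has more than
    max_degree neighbours (hypothetical constraint)."""
    pairs = list(itertools.combinations(range(n), 2))
    graphs: List[Graph] = []
    for mask in range(1 << len(pairs)):
        edges = frozenset(p for i, p in enumerate(pairs) if (mask >> i) & 1)
        degree = [0] * n
        for j, k in edges:
            degree[j] += 1
            degree[k] += 1
        if max(degree, default=0) <= max_degree:
            graphs.append(edges)
    return graphs


def alternating_inference(
    score: Callable[[Labels, int, Graph], float],
    n: int,
    num_actions: int,
    y: int,
    max_degree: int = 1,
    max_iters: int = 20,
) -> Tuple[Labels, Graph, float]:
    """Coordinate ascent on (h, G) for a fixed activity y.

    Exhaustive search replaces loopy BP (h-step) and ILP (G-step)."""
    graphs = candidate_graphs(n, max_degree)
    labelings = list(itertools.product(range(num_actions), repeat=n))
    G: Graph = frozenset()  # start from the empty graph
    h: Labels = labelings[0]
    best = float("-inf")
    for _ in range(max_iters):
        # Step 1: hold G and y fixed, choose the best actions h.
        h = max(labelings, key=lambda hh: score(hh, y, G))
        # Step 2: hold h and y fixed, choose the best graph G.
        G = max(graphs, key=lambda gg: score(h, y, gg))
        current = score(h, y, G)
        # Neither step can lower the score, so stop once it stops rising.
        if current <= best + 1e-12:
            break
        best = current
    return h, G, score(h, y, G)
```

To label an image, one would run this for every activity y and keep the y with the highest returned score. Being coordinate ascent, it can stop at a local optimum.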
19
Learning with Latent SVM. Optimization: non-convex bundle method (Do & Artières, ICML 09).
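A minimal sketch of a latent SVM objective for this model, under two assumptions of ours: the actions h_n are observed in training (matching training data of the form {x_n, h_n, y_n}) so only the graph G is latent, and the loss Δ depends only on the activity label. Everything is enumerated, so it is for tiny examples only; `latent_hinge` and `objective` are hypothetical names.

```python
import itertools
from typing import Callable, FrozenSet, Iterable, Sequence, Tuple

Graph = FrozenSet[Tuple[int, int]]
Labels = Tuple[int, ...]


def latent_hinge(
    score: Callable[[Labels, int, Graph], float],
    y_n: int,
    h_n: Labels,
    activities: Sequence[int],
    actions: Sequence[int],
    graphs: Sequence[Graph],
    delta: Callable[[int, int], float],
) -> float:
    """Latent-SVM slack for one training image (x_n, h_n, y_n).

    score(h, y, G) plays the role of w^T Psi for this fixed image."""
    # Best completion of the ground truth: maximize over the latent graph only.
    truth = max(score(h_n, y_n, G) for G in graphs)
    # Loss-augmented prediction: the most violating (y, h, G).
    violator = max(
        delta(y_n, y) + score(h, y, G)
        for y in activities
        for h in itertools.product(actions, repeat=len(h_n))
        for G in graphs
    )
    # Non-negative whenever delta(y_n, y_n) == 0, since the truth is a candidate.
    return violator - truth


def objective(w: Sequence[float], C: float, hinges: Iterable[float]) -> float:
    """Regularized objective: 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * sum(v * v for v in w) + C * sum(hinges)
```

The subtracted max over G makes each slack a difference of two convex functions of w, so the overall objective is non-convex; that is why a solver designed for non-convex problems, such as the bundle method cited on the slide, is used instead of a standard convex SVM solver.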
20
Feature-level Approach [Figure: the latent structured model over y, h_1, …, h_n, x_0, x_1, …, x_n with the person-person interaction highlighted; labels: Structure-level, Feature-level.]
21
Feature-level Approach: Model [Figure: graphical model over y, h_1, …, h_n, x_0, x_1, …, x_n, annotated with the Action Context Descriptor.]
22
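[Figure: (a) the action of a single person; (b) the focal person and its context; (c) the context, parameterized by τ and z, combined with the focal person's action.]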
23
Action Context Descriptor [Figure: a feature descriptor (e.g. HOG by Dalal & Triggs) is fed to a multi-class SVM, giving one action class score per class; the context part of the descriptor keeps the max action class score.]
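One way such a descriptor could be assembled is sketched below, under loudly stated assumptions: per-person action scores are taken as already computed by a multi-class SVM; the context around the focal person is split into distance rings with hypothetical radii `ring_edges` (a stand-in for the slide's τ and z, which we do not reconstruct); each ring keeps the element-wise max action class score over the people inside it; empty rings are filled with zeros. The function name is ours.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def action_context_descriptor(
    focal: int,
    positions: Sequence[Point],
    scores: Sequence[Sequence[float]],  # per-person multi-class SVM scores
    ring_edges: Sequence[float],        # hypothetical ring radii, increasing
) -> List[float]:
    """Focal person's action scores followed by, for each context ring,
    the element-wise max action scores of the other people in that ring."""
    num_classes = len(scores[focal])
    num_rings = len(ring_edges) - 1
    # Start each ring at -inf so empty rings are easy to detect.
    context = [[-math.inf] * num_classes for _ in range(num_rings)]
    fx, fy = positions[focal]
    for j, (px, py) in enumerate(positions):
        if j == focal:
            continue
        d = math.hypot(px - fx, py - fy)
        for r in range(num_rings):
            if ring_edges[r] <= d < ring_edges[r + 1]:
                # Element-wise max over everyone falling in this ring.
                context[r] = [max(a, b) for a, b in zip(context[r], scores[j])]
                break
    # Empty rings contribute zeros (an assumption; other fill values are possible).
    flat = [0.0 if math.isinf(v) else v for ring in context for v in ring]
    return list(scores[focal]) + flat
```

Taking the max per region keeps the descriptor length fixed (classes × (1 + rings)) no matter how many people are nearby, so it can feed a standard classifier.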
25
Dataset: Collective Activity Dataset (Choi et al., VS 09). 5 action categories (per person): crossing, waiting, queuing, walking, talking. 44 video clips.
26
Collective Activity Dataset
27
Dataset: Nursing Home Dataset. Activity categories (per image): fall, non-fall. 5 action categories (per person): walking, standing, sitting, bending, and falling. In total 22 video clips (2990 frames): 8 clips for testing and the rest for training. 1/3 of the images are labeled as fall.
28
Nursing Home Dataset
29
Baselines. Root (x_0) + SVM (no structure). Structure-level approach with fixed hidden-layer structures: no connection; minimum spanning tree; complete graph within radius r. [Figure: hidden nodes h_1, …, h_4 under each structure.]
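The three fixed hidden-layer structures can be built as edge sets over people. This sketch assumes each person is a node with a 2D image position and that the minimum spanning tree uses Euclidean distance as edge weight; both are our assumptions about how the baselines were constructed, and the function names are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def no_connection(points: Sequence[Point]) -> Set[Edge]:
    """Hidden nodes are independent: no edges at all."""
    return set()


def complete_within_r(points: Sequence[Point], r: float) -> Set[Edge]:
    """Connect every pair of people whose distance is at most r."""
    n = len(points)
    return {
        (j, k)
        for j in range(n)
        for k in range(j + 1, n)
        if math.dist(points[j], points[k]) <= r
    }


def min_spanning_tree(points: Sequence[Point]) -> Set[Edge]:
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2)."""
    n = len(points)
    if n == 0:
        return set()
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each node into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: Set[Edge] = set()
    for _ in range(n):
        # Add the closest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        # Relax links from u to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

Unlike the structure-level approach, these graphs are fixed from geometry alone before any labels are considered.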
30
System Overview: video → person detector → person descriptor → model. Person detectors: pedestrian detection by Felzenszwalb et al.; background subtraction. Person descriptors: HOG by Dalal & Triggs; LST by Loy et al. at CVPR 09.
31
Results – Collective Activity Dataset
32
Results – Correct Examples
34
Results – Incorrect Examples: Crossing, Waiting
35
Walking, Talking, Queuing
36
Results – Nursing Home Dataset
37
Results – Correct Examples
38
Results – Incorrect Examples
39
Conclusion
A discriminative model for group activity recognition with context.
Two new types of contextual information:
– group-person interaction
– person-person interaction
Modeled at two levels:
– structure-level: latent structure
– feature-level: Action Context descriptor
Experimental results demonstrate the effectiveness of the proposed model.
40
Future Work
Modeling complex structures
– temporal dependencies among actions
Contextual feature descriptors
– how to encode discriminative context?
Weakly supervised learning
– e.g. multiple instance learning for fall detection
43
Pairwise Weight [Figure: pairwise weight indexed by the activity y and the action pair h_j, h_k.]
46
Infer the graph structures
47
Results – Nursing Home Dataset: 0/1 loss (optimizes overall accuracy)
48
New loss (optimizes mean per-class accuracy)
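Why the choice of loss matters on an imbalanced dataset: if 1/3 of the images are fall, a classifier that always answers non-fall reaches about 67% overall accuracy but only 50% mean per-class accuracy. The sketch below computes both metrics and one possible class-weighted loss, in which a mistake costs the inverse of the true class's training count. This weighting is our illustrative assumption, not necessarily the "new loss" used in the thesis.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of all examples labeled correctly (what 0/1 loss targets)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average over classes of the fraction of that class labeled correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


def class_weighted_loss(
    y_true: Hashable, y_pred: Hashable, class_counts: Mapping[Hashable, int]
) -> float:
    """Hypothetical loss: a mistake costs 1 / (training count of the true class)."""
    if y_true == y_pred:
        return 0.0
    return 1.0 / class_counts[y_true]
```

With this weighting, every class carries the same total loss budget, so missing the rare fall class is no longer cheap.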
49
Person Detectors. Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08). Nursing Home Dataset: background subtraction to extract moving regions from the video.
50
Person Descriptors. Collective Activity Dataset: HOG. Nursing Home Dataset: Local Spatial Temporal (LST) descriptor (Loy et al., ICCV 09).
51
Results – Correct Examples
52
Results – Incorrect Examples
53
Results – Collective Activity Dataset: Root+SVM, structure-level, feature-level
54
Group Context Descriptor
55
[Figure: graphical model over y, h_1, h_2, …, h_n, x_0, x_1, x_2, …, x_n.]
56
Learning. Training data consists of {(x_n, h_n, y_n)}.
57
Structure-level Feature-level No connection
58
Structure-level Feature-level No connection
59
Results – Nursing Home Dataset