Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.


1 Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc. Thesis Defense

2 Outline Introduction; Group Activity Recognition with Context – Structure-level (latent structures) – Feature-level (Action Context descriptor); Experiments

3 Activity Recognition Goal: enable computers to analyze and understand human behavior. Examples: answering a phone, kissing.

4 Action vs. Activity Activity: a group of people forming a queue. Action: standing in a queue and facing left.

5 Activity Recognition Activity recognition is important: surveillance, entertainment, sport, HCI. Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.

6 Group Activity Recognition Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other. Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications.

7 Group Activity Recognition Context

8 Group Activity Recognition Two types of context: group-person interaction and person-person interaction. [Figure: example scene labeled Talk.]

9 Latent Structured Model [Figure: graphical model with three layers. Activity: activity class y. Action (hidden layer): action classes h_1, h_2, …, h_n. Feature: image features x_0, x_1, x_2, …, x_n.]

10 Latent Structured Model [Figure: graphical model with activity class y, action classes h_1, h_2, …, h_n, and image features x_0, x_1, x_2, …, x_n, annotated with group-person interaction and person-person interaction; approaches: structure-level and feature-level.]

11 Difference from Previous Work: Group Activity Recognition. Previous work: single-person action recognition (Schuldt et al., ICPR 04); relatively simple activity recognition (Vaswani et al., CVPR 03); datasets in controlled conditions. Our work: group activity recognition in realistic videos; two new types of contextual information; a unified framework.

12 Difference from Previous Work: Latent Structured Models. Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF) (Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08). Our work: a latent structure for the hidden layer, inferred automatically during learning and inference.

13 Outline Introduction; Group Activity Recognition with Context – Structure-level (latent structures) – Feature-level (Action Context descriptor); Experiments

14 Structure-level Approach [Figure: graphical model with activity class y, action classes h_1, h_2, …, h_n, and image features x_0, x_1, x_2, …, x_n, highlighting person-person interaction; approaches: structure-level and feature-level.]

15 Structure-level Approach Latent Structure Queue ? Talk

16 Model Formulation Input: image-label pair (x, h, y). Potentials: Image-Activity, Image-Action, Action-Activity, Action-Action. [Figure: graphical model over y, h_1, h_2, …, h_n and x_0, x_1, x_2, …, x_n.]
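
The four potentials named on this slide can be read as terms of one additive score. The following is a minimal sketch under that assumption: each potential is modeled as a plain lookup table standing in for a learned weight-times-feature term, and the table names (`img_act`, `per_action`, `act_activity`, `pair`) and all toy numbers are hypothetical, not taken from the thesis.

```python
def model_score(y, h, edges, img_act, per_action, act_activity, pair):
    """Score a joint labeling: activity y, per-person actions h, graph edges.

    img_act[y]         : image-activity term (root x_0 against activity y)
    per_action[j][a]   : image-action term (person j against action a)
    act_activity[y][a] : action-activity compatibility
    pair[y][a][b]      : action-action compatibility under activity y
    All tables are hypothetical stand-ins for learned potentials.
    """
    score = img_act[y]
    for j, a in enumerate(h):
        score += per_action[j][a] + act_activity[y][a]
    for j, k in edges:  # only connected pairs contribute
        score += pair[y][h[j]][h[k]]
    return score


# Toy example: 2 activities, 2 actions, 2 people, one edge between them.
img_act = [0.5, -0.5]
per_action = [[1.0, 0.0], [0.2, 0.8]]
act_activity = [[0.3, -0.3], [-0.3, 0.3]]
pair = [[[0.4, 0.0], [0.0, 0.1]], [[0.1, 0.0], [0.0, 0.4]]]
toy_score = model_score(0, [0, 1], [(0, 1)],
                        img_act, per_action, act_activity, pair)
```

Because the action-action term is summed only over `edges`, choosing the graph structure changes which pairwise interactions count toward the score.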

17 Inference Score an image x with activity label y. Infer the latent variables: NP-hard!

18 Inference Holding G_y fixed: optimize h_y with loopy BP. Holding h_y fixed: optimize G_y with an ILP.

19 Learning with Latent SVM Optimization: Non-convex bundle method (Do & Artieres, ICML 09)
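
For reference, a latent SVM objective of the general kind this slide names can be written as below. The slide itself does not show the formula, so the notation is an assumption: f_w is the model score, Δ is a loss between activity labels, C is a trade-off constant, the action labels h^n are observed in training, and the graph G is the latent variable.

```latex
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{n=1}^{N} \xi_n
\qquad \text{s.t.} \qquad
\max_{G} f_w(x^n, h^n, y^n; G) \;-\; \max_{G} \max_{h} f_w(x^n, h, y; G)
\;\ge\; \Delta(y, y^n) - \xi_n
\quad \forall n,\; \forall y
```

Rearranged for ξ_n, each constraint is a difference of two maxima of functions linear in w, that is, a convex term minus a convex term, which is why a non-convex solver is needed.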

20 Feature-level Approach [Figure: graphical model with activity class y, action classes h_1, h_2, …, h_n, and image features x_0, x_1, x_2, …, x_n, highlighting person-person interaction; approaches: structure-level and feature-level.]

21 Feature-level Approach Model [Figure: graphical model with activity class y, action classes h_1, h_2, …, h_n, and image features x_0, x_1, x_2, …, x_n; the person features use the Action Context Descriptor.]

22 [Figure with panels (a), (b), (c); labels: focal person, context, action, τ, z.]

23 Action Context Descriptor Feature descriptor (e.g. HOG by Dalal & Triggs) → multi-class SVM → action class scores for each person; max over the action class scores.
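
One way to read this pipeline: each person gets a vector of per-class SVM scores, and the context of a focal person is summarized by the per-class maximum over nearby people. The sketch below assumes the SVM scores are already computed and uses a single circular context region; the `radius` parameter, the zero fill for people without neighbours, and the toy values are hypothetical simplifications, and the thesis may divide the context into several regions.

```python
import math


def action_context(scores, positions, radius):
    """Concatenate each person's own action scores with a context vector.

    scores[i][k]  : score of action class k for person i (assumed given)
    positions[i]  : (x, y) location of person i
    The context vector holds, per class, the max score among neighbours
    within `radius`; it is all zeros when a person has no neighbours.
    """
    n_classes = len(scores[0])
    descriptors = []
    for i, (own, p_i) in enumerate(zip(scores, positions)):
        neighbours = [scores[m] for m, p_m in enumerate(positions)
                      if m != i and math.dist(p_i, p_m) <= radius]
        context = [max(s[k] for s in neighbours) if neighbours else 0.0
                   for k in range(n_classes)]
        descriptors.append(list(own) + context)
    return descriptors


# Toy example: two action classes, three people, the third far away.
toy_scores = [[0.9, 0.1], [0.2, 0.7], [0.4, 0.3]]
toy_positions = [(0, 0), (1, 0), (10, 0)]
toy_desc = action_context(toy_scores, toy_positions, radius=2)
```

The resulting descriptor has twice the length of the raw score vector, so a classifier trained on it sees both what the focal person appears to be doing and what the people around them appear to be doing.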

24 Outline Introduction; Group Activity Recognition with Context – Structure-level (latent structures) – Feature-level (Action Context descriptor); Experiments

25 Dataset Collective Activity Dataset (Choi et al. VS 09): 5 action categories (per person): crossing, waiting, queuing, walking, talking. 44 video clips.

26 Collective Activity Dataset

27 Dataset Nursing Home Dataset: 2 activity categories (per image): fall, non-fall. 5 action categories (per person): walking, standing, sitting, bending and falling. In total 22 video clips (2990 frames); 8 clips for testing, the rest for training. 1/3 are labeled as fall.

28 Nursing Home Dataset

29 Baselines root (x_0) + SVM (no structure). Hidden-layer structures compared: no connection; min-spanning tree; complete graph within r; structure-level approach. [Figure: example graphs over h_1, h_2, h_3, h_4 for each structure.]
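
The three fixed hidden-layer structures can be built from person locations alone. The sketch below assumes that edges are defined by Euclidean distance between 2D person positions, which the slide does not state; the function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def no_connection(points):
    """Baseline with no action-action edges."""
    return []


def complete_within_r(points, r):
    """Connect every pair of people whose distance is at most r."""
    return [(j, k) for j, k in combinations(range(len(points)), 2)
            if math.dist(points[j], points[k]) <= r]


def min_spanning_tree(points):
    """Prim's algorithm on Euclidean distances, starting from person 0."""
    n = len(points)
    if n == 0:
        return []
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        _, j, k = min((math.dist(points[j], points[k]), j, k)
                      for j in in_tree
                      for k in range(n) if k not in in_tree)
        edges.append((min(j, k), max(j, k)))
        in_tree.add(k)
    return edges


# Toy example: four people.
toy_points = [(0, 0), (1, 0), (3, 0), (3, 4)]
toy_mst = min_spanning_tree(toy_points)
toy_near = complete_within_r(toy_points, r=2)
```

All three structures are fixed before learning, in contrast to the structure-level approach, where the graph is treated as latent.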

30 System Overview Video → Person Detector → Person Descriptor → Model. Person detectors: pedestrian detection by Felzenszwalb et al.; background subtraction. Person descriptors: HOG by Dalal & Triggs; LST by Loy et al. (ICCV 09).

31 Results – Collective Activity Dataset

32 Results – Correct Examples

34 Results – Incorrect Examples Crossing Waiting

35 Walking Talking Queuing

36 Results – Nursing Home Dataset

37 Results – Correct Examples

38 Results – Incorrect Examples

39 Conclusion A discriminative model for group activity recognition with context. Two new types of contextual information: – group-person interaction – person-person interaction. Structure-level: latent structure. Feature-level: Action Context descriptor. Experimental results demonstrate the effectiveness of the proposed model.

40 Future Work Modeling Complex Structures – Temporal dependencies among actions. Contextual Feature Descriptors – How to encode discriminative context? Weakly Supervised Learning – e.g. multiple instance learning for fall detection.

43 Pairwise Weight [Figure: pairwise weight over y, h_j and h_k.]

46 Infer the graph structures

47 Results – Nursing Home Dataset 0/1 loss – optimize overall accuracy

48 new loss – optimize mean per-class accuracy
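
The difference between the two losses can be made concrete. The slide does not give the formula for the new loss, so the version below is an assumption: a mistake on an example of class p costs 1/m_p, where m_p is the number of training examples of class p. The function names and toy labels are hypothetical.

```python
from collections import Counter


def zero_one_loss(y_pred, y_true):
    """Every mistake costs 1, so the total tracks overall accuracy."""
    return 0.0 if y_pred == y_true else 1.0


def make_class_balanced_loss(train_labels):
    """Mistakes on class p cost 1 / m_p, with m_p the count of class p."""
    counts = Counter(train_labels)

    def loss(y_pred, y_true):
        return 0.0 if y_pred == y_true else 1.0 / counts[y_true]

    return loss


# Toy example: an imbalanced training set, 1 fall and 3 non-fall.
toy_labels = ["fall", "non-fall", "non-fall", "non-fall"]
balanced = make_class_balanced_loss(toy_labels)
```

Summed over the training set, the balanced loss equals the sum over classes of one minus that class's accuracy, so minimizing it favours mean per-class accuracy. This matters when one class, such as fall, is the minority.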

49 Person Detectors Collective Activity Dataset: Pedestrian Detector (Felzenszwalb et al., CVPR 08). Nursing Home Dataset: Video → Background Subtraction → Moving Regions.

50 Person Descriptors Collective Activity Dataset: HOG. Nursing Home Dataset: Local Spatial Temporal (LST) Descriptor (Loy et al., ICCV 09). [Figure: axes u, v.]

51 Results – Correct Examples

52 Results – Incorrect Examples

53 Results – Collective Activity Dataset Root+SVM Structure-level Feature-level

54 Group Context Descriptor

55 [Figure: graphical model over y, h_1, h_2, …, h_n and x_0, x_1, x_2, …, x_n.]

56 Learning Training data consists of {x^n, h^n, y^n}

57 Structure-level Feature-level No connection

58 Structure-level Feature-level No connection

59 Results – Nursing Home Dataset

