Download presentation
Presentation is loading. Please wait.
1
Human Activity Analysis
吴心筱 Based on the CVPR 2011 Tutorial: Human Activity Analysis.
2
Introduction
3
Understanding People in Video
Goal Find the people Infer their poses Recognize what they do
4
Level of People Understanding
Object-Level Understanding Locations of persons and objects Tracking-Level Understanding Object trajectories --correspondence
5
Level of People Understanding
Pose-Level Understanding Human body parts Activity-Level Understanding Recognition of human activities and events
6
Object Detection Pedestrian (i.e. human) detection
Detect all the persons in the video 找个行人检测的video
7
Object Tracking Person tracking Tracking the person in every frame
找个行人跟踪的video
8
Pose Estimation Human Pose
Joint locations or angles of a person measured per frame Video as a sequence of poses 找个行人检测的video
9
Human Activity Recognition
A collection of human/object movements with a particular semantic meaning Activity Recognition Finding of video segments containing such movements 找个行人检测的video
10
Levels of Human Activities
Categorized based on their complexity Hierarchy # of participants Gestures Actions Interactions Group Activities 找个行人检测的video
11
Gestures Atomic components Single body-part movements 找个行人检测的video
12
Actions Single actor movement
Composed of multiple gestures organized temporally 找个行人检测的video
13
Interactions Human-human interaction Human-object interaction
增加:Human-scene interaction
14
Group Activities Multiple persons or objects 找个行人检测的video
15
Applications 找个行人检测的video
16
Surveillance Monitor suspicious activities for real-time reactions. (e.g.,‘Fighting’, ‘stealing’) Currently, surveillance systems are mainly for recording. Activity recognition is essential for surveillance and other monitoring systems in public places 找个行人检测的video
17
Intelligent Environments (HCI)
Intelligent home, office, and workspace Monitoring of elderly people and children. Recognition of ongoing activities and understanding of current context is essential. 找个行人检测的video
18
Sports Play Analysis 找个行人检测的video
19
Web-based video retrieval
YouTube 20 hours of videos uploaded every minute Content-based search Search based on contents of the video, instead of user-attached keywords Example: search ‘kiss’ from long movies 找个行人检测的video
20
Challenges 找个行人检测的video
21
Robustness Environment variations Background Moving backgrounds
Pedestrians Occlusions View-points – moving camera 找个行人检测的video
22
Motion Style Each person has his/her own style of executing an activity Who stretches his hand first? How long does one stay his hand stretched? 找个行人检测的video
23
Various Activities There are various types of activities
The ultimate goal is to make computer recognize all of them reliably. 找个行人检测的video
24
Learning Insufficient amount of training videos
Traditional setting: Supervised learning Human efforts are expensive! Unsupervised learning Interactive learning Interactive learning?
25
Overview 找个行人检测的video
26
Activity Classification
Simple task of identifying videos Classify given videos into their types. Known, limited number of classes Assumes that each video contains a single activity 找个行人检测的video
27
找个行人检测的video
28
Activity Detection Search for the particular time interval
<starting time, ending time> Video segment containing the activity 找个行人检测的video
29
Activity Detection by Classification
Binary classifier Yes, Puch ! Sliding window technique Classify all possible time intervals 找个行人检测的video
30
Recognition Process Represent videos in terms of features
Captures properties of activity videos Classify activities by comparing video representations Decision boundary 找个行人检测的video
31
Approaches 找个行人检测的video
32
Approach based taxonomy
找个行人检测的video
33
Single-layered vs. hierarchical
Single-layered approaches Hierarchical approaches
34
Single-layered approaches
找个行人检测的video
35
Two different types Space-time approaches (data-orientated)
Activities as video observations 3D space-time volume (3D XYT volume) A set of features extracted from the volume 找个行人检测的video
36
Two different types Sequential approaches (semantic-oriented)
Activities as human movements A sequence of particular observations(feature vectors) 找个行人检测的video
37
Space-time approaches
Space-time volumes Space-time local features Space-time trajectories 找个行人检测的video
38
Space-time volumes Problem: matching between two volumes 找个行人检测的video
39
Motion history images Bobick and J. Davis, 2001
Motion history images (MHIs) Weighted projection of a XYT foreground volume Template matching Bobick and J. Davis, The recognition of human movement using temporal templates,IEEE T PAMI 23(3),2001
40
Segments Ke, Suktankar, Herbert 2007
Volume matching based on its segments. Segment matching scores are combined. [ Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action recognition. CVPR 2007]
41
Space-time local features
Local descriptors / interest points From 2D to 3D ; Sparse Low-level: which local features to be extracted Mid-level: How to represent the activity using local features High-level: What method to classify activities 找个行人检测的video
42
Cuboid Dollar et al., Cuboid, VS-PETS 2005
2D Gaussian smoothing kernel: 1D Gabor filters applied temporally: 加入公式 [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-temporal features, VS-PETS 2005]
43
Cuboid Dollar et al., Cuboid, VS-PETS 2005
Appearances of local 3-D XYT volumes Raw appearance Gradients Optical flows Captures salient periodic motion. 加入公式 [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-temporal features, VS-PETS 2005]
44
STIP interest points Laptev and Linderberg 2003
Introduced the KTH dataset Local descriptor based on Harris corner detector Simple periodic actions 加入公式 [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]
45
STIP interest points Laptev and Linderberg 2003
加入公式 [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]
46
Bag-of-words Representation
找个行人检测的video
47
SVMs classification SVMs classifier Shake hands ! Multiple kernel
learning 找个行人检测的video …… [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]
48
pLSA models pLSA from text recognition
Probabilistic latent semantic analysis Reasoning the probability of features originated from a particular action video. 加入公式 [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006]
49
pLSA models 加入公式 [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006]
50
Space-time trajectories
Trajectory patterns Yilmaz and Shah, 2005 – UCF Joint trajectories in 3-D XYT space. [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]
51
Space-time trajectories
Trajectory patterns Yilmaz and Shah, 2005 – UCF Compared trajectory shapes to classify human actions. [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]
52
Space-time approaches Summary
Space-time volumes A straightforward solution Difficult in handling speed and motion variations 找个行人检测的video
53
Space-time approaches Summary
Space-time local features Robust to noise and illumination changes Recognize multiple activities without background subtraction or body-part modeling Difficult to model more complex activities 找个行人检测的video
54
Space-time approaches Summary
Space-time trajectories Perform detailed-level analysis View-invariant in most cases Difficult to extract the trajectories 找个行人检测的video
55
Two different types Sequential approaches (semantic-oriented)
Activities as human movements A sequence of particular observations(feature vectors) 找个行人检测的video
56
Sequential approaches
Exemplar-based approaches State model-based approaches
57
Exemplar-based approaches
Matching between the input sequence of feature vectors and the template sequences Problem: how to match two sequences in different styles/different rates
58
Dynamic Time warping Match two sequences with variations
Find an optimal nonlinear match 找个行人检测的video [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]
59
State model-based approaches
A human activity as a model composed of a set of states Each class has a corresponding model Measure the likelihood between the model an the input image sequence HMMs 找个行人检测的video
60
HMMs Given observations V (a sequence of poses), find the HMM Mi that maximizes P(V|Mi). Transition probabilities aij and observations probabilities bik are pre-trained using training data. 找个行人检测的video
61
HMMs for Actions Each hidden state is trained to generate a particular body posture. Each HMM produces a pose sequence: action 找个行人检测的video
62
HMMs for Hand Gestures HMMs for gesture recognition
American Sign Language (ASL) Sequential HMMs shapes and position of hands 找个行人检测的video [Starner, T. and Pentland, A., Real-time American Sign Language recognition from video using hidden Markov models. International Symposium on Computer Vision, 1995.]
63
Sequential approaches summary
Designed for modeling sequential dynamics Markov process Motion features are extracted per frame Limitations Feature extraction Assumes good observation models Complex human activities? Large amount of training data 找个行人检测的video
64
exemplar-based vs. state model-based
Provide more flexibility for the recognition system: multiple sample sequences Less training data State model-based make a probabilistic analysis of the activity model more complex activity 找个行人检测的video
65
Hierarchical approaches
找个行人检测的video
66
Hierarchy Hierarchy implies decomposition into subparts 找个行人检测的video
67
Hierarchical approaches
Statistical approaches Syntactic approaches Description-based approaches 找个行人检测的video
68
Statistical approaches
Statistical state-based models for recognition multiple layers of state-based models low-level: a sequence of feature vectors mid-level: a sequence of atomic action labels high-level: label of activity 找个行人检测的video
69
Layered hidden Markov models (LHMMs)
Oliver et al. 2002 Bottom layer HMMs recognize atomic actions of a single person Upper layer HMMs treat recognized atomic actions as observations …… 找个行人检测的video [Oliver, N., Horvitz, E., and Garg, A Layered representations for human activity recognition. In Proceedings of the IEEE International Conference on Multimodal Interfaces (ICMI). IEEE, Los Alamitos, CA, 3-8.
70
Statistical approaches summary
Reliably recognize activities in the case of noisy inputs with enough training data Inability to recognize activities with complex temporal structures (e.g., concurrent subevents) subevent A Subevent B 找个行人检测的video subevent A subevent A subevent B subevent B
71
Syntactic approaches Model activity as a string of symbols, each symbol corresponds to an atomic-level action Parsing techniques from the field of programming languages Context-free grammars (CFG) and stochastic context-free grammars (SCFGs)
72
CFG for human activities
Given the recognized atomic actions Production rules
73
Parse tree 找个行人检测的video
74
Gesture analysis with CFGs
找个行人检测的video
75
Primitive recognition with HMMs
76
Parse Tree
77
Syntactic approaches summary
Difficult in the recognition of concurrent activities Require a set of production rules fro all possible events Directions: how to learn grammar rules from observations automatically? 找个行人检测的video
78
Description-based approaches
Represents a high-level human activity in terms of simpler activities Describe various relationships between subevents (atomic actions) temporal relationship spatial relationship logical relationship
79
Description-based approaches
Recognize activities using semantic matching. Hand shake = “two persons do shake-action (stretches, stays stretched, withdraw) simultaneously, while touching”. Recognition by finding observations satisfying the definition.
80
Hierarchical approaches summary
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.