Published by Emilian Dobre. Modified over 5 years ago.
1
Knowledge-based event recognition from salient regions of activity
Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva
M4 – Meeting – January 2004
2
Outline
  Context
  Salient Regions of Activity (SRA)
  Learning the semantics of SRA
  Visual Event Query language
  Conclusion
NML - CVML - UniGe
3
Context
Retrieval of visual events based on user queries
  Abstract representation of the visual content
  Query language to express visual events
Approach
  Region-based description of the content
  Classification of the regions
  Events queried as spatio-temporal constraints on the regions
4
Overview
[Overview diagram. Labels: Videos database, Region extraction, Salient regions of activity, Classification, Labelled regions, Domain knowledge, User queries]
5
Salient regions of activity
Regions of the image space
  Moving in the scene
  Having a homogeneous colour distribution
Moving objects or meaningful parts of moving objects
Extraction:
  From moving salient points
  By an adaptive mean-shift algorithm
6
Salient points extraction
Scale-invariant interest points (Mikolajczyk, Schmid 2001)
  Extracted in the linear scale-space
  Local maxima of the scale-normalized Harris function (image space)
  Local maxima of the scale-normalized Laplacian (scale space)
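For reference, the two scale-normalized measures named on this slide are usually written as follows in the Harris-Laplace literature. This is a sketch of the standard definitions, not a transcription of the slide: the symbols (integration scale \sigma_I, derivation scale \sigma_D, Gaussian-smoothed image derivatives L_x, L_y, L_{xx}, L_{yy}, and the constant \alpha) are assumptions about notation.

```latex
\begin{align}
\mu(\mathbf{x}, \sigma_I, \sigma_D) &= \sigma_D^2 \, g(\sigma_I) *
  \begin{bmatrix}
    L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\
    L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D)
  \end{bmatrix} \\
H(\mathbf{x}, \sigma_I, \sigma_D) &= \det \mu - \alpha \, \operatorname{trace}^2 \mu \\
\left| \mathrm{LoG}(\mathbf{x}, \sigma) \right| &= \sigma^2 \left| L_{xx}(\mathbf{x}, \sigma) + L_{yy}(\mathbf{x}, \sigma) \right|
\end{align}
```

A point is kept when H is a local maximum over image position and the normalized Laplacian is a local maximum over scale at that position.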
7
Salient points extraction
Example: scale
8
Salient points trajectories
Trajectories used to:
  Find salient points moving in the scene
  Track salient points over time
Point matching using local grayvalue invariants (Schmid)
9
Salient points trajectories
Mahalanobis distance: [equation not preserved]
Set of matching points minimizes [expression not preserved]
  Greedy Winner-Takes-All algorithm
Set of point trajectories
Moving salient points: [equation not preserved]
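The matching step can be illustrated with a minimal sketch: descriptors of points in consecutive frames are compared with the Mahalanobis distance d(u, v) = sqrt((u - v)^T C^-1 (u - v)), and pairs are accepted greedily from closest to farthest. The function names, the inverse covariance argument `inv_cov` and the rejection threshold `max_dist` are assumptions made for illustration; the slide's own formulas are not preserved.

```python
import math


def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance sqrt((u - v)^T inv_cov (u - v)) between two descriptors."""
    d = [a - b for a, b in zip(u, v)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def greedy_match(prev_desc, curr_desc, inv_cov, max_dist):
    """Greedy winner-takes-all matching (hypothetical helper).

    All candidate pairs are sorted by distance; the closest pair is accepted
    first, and each point can be used in at most one match.
    """
    pairs = sorted(
        (mahalanobis(p, c, inv_cov), i, j)
        for i, p in enumerate(prev_desc)
        for j, c in enumerate(curr_desc)
    )
    used_prev, used_curr, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:  # remaining pairs are even farther apart
            break
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return matches


# Toy descriptors (made up): two points in the previous frame, three in the current one.
identity = [[1.0, 0.0], [0.0, 1.0]]
prev_points = [[0.0, 0.0], [5.0, 5.0]]
curr_points = [[5.2, 4.9], [0.1, -0.1], [20.0, 20.0]]
matches = greedy_match(prev_points, curr_points, identity, max_dist=1.0)
```

Chaining the accepted pairs from frame to frame yields the point trajectories; the greedy strategy is cheap but does not guarantee a globally minimal total distance.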
10
Salient regions estimation
Estimate characteristic regions of the moving salient points
Mean-shift algorithm: estimate the position
  Likelihood of pixels (RGB colour distribution)
  Ellipsoidal Epanechnikov kernel
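A minimal sketch of the position estimate, under stated assumptions: with an Epanechnikov kernel, each mean-shift step reduces to the likelihood-weighted centroid of the pixels inside the kernel support. Here the support is simplified to an axis-aligned ellipse with semi-axes `axes`, and `weights[y][x]` stands for a per-pixel likelihood that would come from the RGB colour model; both are illustrative choices, not the author's implementation.

```python
import math


def mean_shift_position(weights, center, axes, max_iter=20, tol=1e-3):
    """Move `center` to a local mode of the weight map (hypothetical helper).

    weights: 2D list, weights[y][x] is the likelihood of pixel (x, y)
    center:  (cx, cy) starting position
    axes:    (a, b) semi-axes of an axis-aligned elliptical kernel support
    """
    cx, cy = center
    a, b = axes
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for y, row in enumerate(weights):
            for x, w in enumerate(row):
                # Only pixels inside the elliptical support contribute.
                if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0:
                    sw += w
                    sx += w * x
                    sy += w * y
        if sw == 0.0:  # no evidence under the kernel, stay in place
            break
        nx, ny = sx / sw, sy / sw
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < tol:
            break
    return cx, cy


# Toy likelihood map (made up): a 3x3 blob of weight 1 centred on pixel (6, 5).
grid = [[0.0] * 9 for _ in range(9)]
for yy in (4, 5, 6):
    for xx in (5, 6, 7):
        grid[yy][xx] = 1.0
mode = mean_shift_position(grid, center=(4.0, 4.0), axes=(3.0, 3.0))
```

Starting from (4, 4), the estimate moves onto the blob centre in a few iterations; the shape and size of the ellipse are held fixed here.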
11
Salient regions estimation
Kernel adaptation step: estimate shape and size
Algorithm: [figure not preserved]
12
Salient regions representation
Set of salient regions of activity represented by:
  Position
  Ellipsoid
  Colour distribution
  Set of salient points
Salient regions tracking
  Regions are matched by a majority vote of their salient points
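The tracking rule can be sketched as a vote: every tracked salient point of a new region votes for the region it belonged to in the previous frame. The data layout (point ids mapped to region ids) and the function name are assumptions, and the sketch returns the most-voted region (a plurality), which is one possible reading of "majority vote".

```python
from collections import Counter


def match_region(point_ids, prev_point_to_region):
    """Return the previous-frame region that most of the points belonged to.

    point_ids:            ids of the salient points inside the current region
    prev_point_to_region: mapping point id -> region id in the previous frame
    Returns None when none of the points was seen in the previous frame.
    """
    votes = Counter(
        prev_point_to_region[p] for p in point_ids if p in prev_point_to_region
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy example (made up): points 1-2 were in region "A", points 3-5 in region "B".
previous = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
matched = match_region([2, 3, 4, 9], previous)
```

Point 9 is new and casts no vote, so the current region is matched to "B" with two votes against one.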
13
Salient regions of activity
14
Regions classification
To obtain an abstract description:
  Map regions to a domain-specific basic vocabulary
  Meetings: {Arm, Head, Body, Noise}
SVM classifier:
  Set of 500 annotated salient regions of activity (~200 frames)
15
Regions classification
Confusion matrix (classes: Arm, Head, Body, Noise; surviving entries, cell positions not preserved): 1.000, 0.909, 0.091, 0.052, 0.946
Discussion:
  Noise class is ill-defined
  Good results explained by the limited number of classes
16
Visual event language
To express visual event queries
  Spatio-temporal constraints on labelled regions (LR)
To integrate domain knowledge
  As specification of the layout (L)
  As set of basic events
A formula of the language is a conjunctive form of:
  Temporal relations
    {after, just-after} between 2 LR
  Spatial relations
    {above, left} between 2 LR
    {in} between a LR and a L
  Identity relations
    {is} between 2 LR
    {is-a} between a LR and a label
17
Knowledge - Meetings
Scene layout: L = {SEATS, DOOR, BOARD}
18
Knowledge - Meetings
Basic events: {Meeting-participant, sitting, standing}
Meeting-participant:
  actors: LR
  constraints: is-a(head, LR).
Sitting:
  actor: LR
  constraints: Meeting-participant(LR), in(SEATS, LR).
Standing:
  actor: LR
  constraints: ~in(SEATS, LR).
19
Events queries
Examples of user queries:
Sitting-down:
  actors: LR1, LR2
  constraints: is(LR1, LR2), sitting(LR1), standing(LR2), just-after(LR1, LR2).
Go-to-board:
  actors: LR1, LR2
  constraints: standing(LR1), ~in(Board, LR1), in(Board, LR2), just-after(LR2, LR1).
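To make the query semantics concrete, here is a minimal sketch that evaluates the Sitting-down query as a conjunction of predicates over pairs of labelled regions. Everything about the data model is an assumption for illustration: the `LabelledRegion` fields, the rectangular zones in `LAYOUT`, reading `just-after(a, b)` as "a occurs at most `max_gap` frames after b", and naming the `is` relation `same` because `is` is a Python keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledRegion:
    track: int   # identity of the tracked region
    label: str   # e.g. "head", "arm", "body", "noise"
    frame: int   # frame index of the observation
    x: float     # region centre, image coordinates
    y: float


# Hypothetical layout: zone name -> rectangle (x0, y0, x1, y1).
LAYOUT = {"SEATS": (0, 60, 100, 100), "BOARD": (70, 0, 100, 50)}


def is_a(label, lr):
    return lr.label == label


def in_zone(zone, lr):
    x0, y0, x1, y1 = LAYOUT[zone]
    return x0 <= lr.x <= x1 and y0 <= lr.y <= y1


def same(a, b):
    """The 'is' relation: both observations belong to the same tracked region."""
    return a.track == b.track


def just_after(a, b, max_gap=1):
    """True when a is observed shortly after b (gap in frames is an assumption)."""
    return 0 < a.frame - b.frame <= max_gap


# Basic events, following the constraints listed on the slides.
def meeting_participant(lr):
    return is_a("head", lr)


def sitting(lr):
    return meeting_participant(lr) and in_zone("SEATS", lr)


def standing(lr):
    return not in_zone("SEATS", lr)


# User query: conjunction of the four constraints of Sitting-down.
def sitting_down(lr1, lr2):
    return same(lr1, lr2) and sitting(lr1) and standing(lr2) and just_after(lr1, lr2)


def query(event, regions):
    """Return every ordered pair of regions that satisfies a two-actor event."""
    return [(a, b) for a in regions for b in regions if event(a, b)]


# Toy observations (made up): track 1 moves into the seat zone between frames 10 and 11.
regions = [
    LabelledRegion(track=1, label="head", frame=10, x=50, y=40),
    LabelledRegion(track=1, label="head", frame=11, x=50, y=70),
    LabelledRegion(track=2, label="head", frame=11, x=20, y=80),
]
hits = query(sitting_down, regions)
```

Each predicate returns a hard true or false, so a single misclassified or slightly misplaced region is enough to trigger or miss an event.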
20
Events queries - Results
Discussion:
  Recall validates the retrieval capability
  False alarms occur because of the hard decision
Results (columns: Precision, Recall; only the values below are preserved):
  Sit-down: 0.43, 1.00
  Stand-up: 0.50
  Go-to-board:
  Enter: 0.20
  Leave: 0.25
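The two measures can be computed as in the sketch below. The counts in the example are hypothetical and chosen only to show how false alarms lower precision while recall stays at 1.00; they are not the counts of the experiment.

```python
def precision_recall(retrieved, relevant):
    """Precision = TP / |retrieved|, recall = TP / |relevant| (0.0 for empty sets)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: 3 true events, 7 retrieved segments that include all 3.
precision, recall = precision_recall(retrieved=range(1, 8), relevant=[1, 2, 3])
```

Here every true event is retrieved (recall 1.0) but four of the seven answers are false alarms, giving a precision of 3/7.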
21
Conclusion
Contributions
  Well-suited framework for constrained domains
  Generic representation of the visual content
  Paradigm to retrieve visual events from videos
Limitations
  Cannot retrieve all visual events (e.g. emotion)
Ongoing work
  Uncertainty handling and fuzziness
  Integration of other modalities (e.g. transcripts)