Download presentation
Presentation is loading. Please wait.
1
1 Temporal Scenarios, learning and Video Understanding Francois BREMOND, Monique THONNAT, … INRIA Sophia Antipolis, PULSAR team, FRANCE Fremond.Bremond@sophia.inria.fr http://www-sop.inria.fr/pulsar Key words: Artificial intelligence, knowledge-based systems, cognitive vision, human behavior representation, scenario recognition
2
2 Alarms access to forbidden area 3D scene model Scenario models A priori Knowledge Objective: Real-time Interpretation of videos from pixels to events Segmentation Classification Tracking Scenario Recognition Video Understanding
3
3 Global approach integrating all video understanding functionalities, while focusing on the easy generation of dedicated systems based on cognitive vision: 4D analysis (3D + temporal analysis) artificial intelligence: explicit knowledge (scenario, context, 3D environment) software engineering: reusable & adaptable platform (control, library of dedicated algorithms) Extract and structure knowledge (invariants & models) for Perception for video understanding (perceptual, visual world) Maintenance of the 3D coherency throughout time (physical world of 3D spatio- temporal objects) Event recognition (semantics world) Evaluation, control and learning (systems world) Video Understanding: Approach
4
4 Strong impact for visual surveillance in transportation (metro station, trains, airports, aircraft, harbors) Control access, intrusion detection and Video surveillance in building Traffic monitoring (parking, vehicle counting, street monitoring, driver assistance) Bank agency monitoring Risk management (simulation) Video communication (Mediaspace) Sports monitoring (Tennis, Soccer, F1, Swimming pool monitoring) New application domains : Aware House, Health (HomeCare), Teaching, Biology, Animal Behaviors, … Creation of a start-up Keeneo July 2005 (15 persons): http://www.keeneo.com/ Video Understanding Applications
5
5 Scene Models (3D) - Scene objects - zones - calibration matrices - Scene objects - zones - calibration matrices Alarms Multi-cameras Combination Behavior Recognition - States - Events - Scenarios Individual Tracking Group Tracking Crowd Tracking - Motion Detector - F2F Tracker - Motion Detector - F2F Tracker - Motion Detector - F2F Tracker Mobile objects Annotation Scenario Models Video Understanding: platform Tools: - Evaluation - Acquisition - Learning, …
6
6 Classification into more than 8 classes (e.g. Person, Groupe, Train) based on 2D and 3D descriptors (position, 3D ratio height/width, …) Example of 4 classes: Person, Group, Noise, Unknown People detection
7
7 Classification into 3 people classes : 1Person, 2Persons, 3Persons, Unknown, …, based on 3D parallelepiped People detection (M. Zuniga)
8
8 Posture Recognition : silhouette comparison Real worldVirtual world Generated silhouettes
9
9 Posture Recognition : results
10
10 Scene Models (3D) - Scene objects - zones - calibration matrices - Scene objects - zones - calibration matrices Alarms Multi-cameras Combination Behavior Recognition - States - Events - Scenarios Individual Tracking Group Tracking Crowd Tracking - Motion Detector - F2F Tracker - Motion Detector - F2F Tracker - Motion Detector - F2F Tracker Mobile objects Annotation Scenario Models Video Understanding Tools: - Evaluation - Acquisition - Learning, …
11
11 Event Recognition Several formalisms can be used: Event representation: n-ary tree, frame, aggregate (structure). finite state automaton, sequence (evolution). graph, set of constraints. Event recognition: feature based routine. Classification, Bayesian, neural network, SVM, clustering. DBN, HMM. Petri net, grammar, constraint resolution and propagation, verification of temporal constraints.
12
12 Event Recognition Outline: Event Representation Temporal Scenario Recognition Scenarios representation recognition process Results: recognition of several scenarios CARETAKER: management of large multimedia collections Trajectory clustering Activity clustering Frequent Composite Event Discovery in Videos (Learning Scenario Models)
13
13 Event Representation Several entities are involved in the scene understanding process: Moving region: any intensity change between images. Context object: predefined static object of the scene environment (entrance zone, wall, equipment, door...). Physical object : any moving region which has been tracked and classified (person, group of persons, vehicle, … etc). Physical object of interest: meaningful object, but depending on applications (person/ door, parked vehicle, … etc).
14
14 Event Representation Events and scenarios : large variety more or less composed of sub-events (running/fighting). involving few/many actors (football game). general (standing)/sensor and application/view (sit down, stop) dependent. spatial granularity: the view observed by one camera/the whole site. temporal granularity: instantaneous/long term with complex relationships (synchronize). 3 levels of complexity depending on the complexity of temporal relations and on the number of physical objects : non-temporal constraint relative to one physical object (sitting). Intuitive association of probabilities to get precision. temporal sequence of sub-scenarios relative to one physical object (open the door, go toward the chair then sit down). complex temporal constraints relative to several physical objects (A meets B at the coffee machine then C gets up and leaves). Need of logic expressiveness but high complexity for meaningful probabilities and sensitive to vision errors.
15
15 Video events: real world notions corresponding to short actions up to activities. Primitive State: a spatio-temporal property linked to vision routines involving one or several actors, valid at a given time point or stable on a time interval Ex : « close», « walking», « sitting» Composite State: a combination of primitive states Primitive Event: a significant change of states Ex : « enters», « stands up», « leaves » Composite Event: a combination of states and events. Corresponds to a long term (symbolic, application dependent) activity. Ex : « fighting», « vandalism» Event Representation
16
16 A video event is mainly constituted of five parts: Physical objects: all real world objects present in the scene observed by the cameras Mobile objects, contextual objects, zones of interest Components: list of states and sub-events involved in the event Forbidden Components: list of states and sub-events that must not be detected in the event Constraints: symbolic, logical, spatio-temporal relations between components or physical objects Action : a set of tasks to be performed when the event is recognized Event Representation
17
17 Event Representation Representation Language to describe Temporal Events of interest. Example: a “Bank_Attack” scenario model composite-event (Bank_attack, physical-objects ((employee : Person), (robber : Person)) components( (e1 : primitive-state inside_zone (employee, "Back")) (e2 : primitive-event changes_zone (robber, "Entrance", "Infront")) (e3 : primitive-state inside_zone (employee, "Safe")) (e4 : primitive-state inside_zone (robber, "Safe")) ) constraints ((e2 during e1) (e2 before e3) (e1 before e3) (e2 before e4) (e4 during e3) ) action (“Bank attack!!!”) )
18
18 Scenario Recognition: Temporal Constraints Overview of the recognition process Recognition of elementary scenarios Scenario compilation Recognition of composed scenarios Scenario recognition and uncertainty Example of the recognition of a “Bank attack” scenario and more…
19
19 Scenario Recognition: issues Many event representations are not easy to use and not re-usable (need learning, e.g. incremental, supervised) scenarios defined at one time point / interval, do not let the experts to describe their scenarios in a natural way (context awareness, user feedback). Event recognition approaches : allow an efficient recognition of events, but some temporal constraints cannot be processed (e.g. static scenes, synchronization) they require that the events are bounded in time (temporal complexity). deal partially with inaccuracy and uncertainty (e.g. lost tracks), are not linked to signal.
20
20 Scenario (algorithmic notion): any type of video events Two types of scenarios: elementary (primitive states) composed (composite states and events). Algorithm in two steps. Scenario Recognition: Temporal Constraints (T. Vu) 1) Recognize all Elementary Scenario models 2) Trigger the recognition of selected Composed Scenario models 1) Recognize all triggered Composed Scenario models 2) Trigger the recognition of other Composed Scenario models Tracked Mobile Objects Recognized Scenarios A priori Knowledge - Scenario knowledge base - 3D geometric & semantic information of the observed environment
21
21 Scenario Recognition: Elementary Scenario The recognition of a compiled elementary scenario model m e consists of a loop: 1. Choosing a physical object for each physical-object variable 2. Verifying all constraints linked to this variable m e is recognized if all the physical-object variables are assigned a value and all the linked constraints are satisfied.
22
22 Scenario Recognition: Composed Scenario Problem: given a scenario model m c = (m 1 before m 2 before m 3 ); if a scenario instance i 3 of m 3 has been recognized then the main scenario model m c may be recognized. However, the classical algorithms will try all combinations of scenario instances of m 1 and of m 2 with i 3 a combinatorial explosion. Solution: decompose the composed scenario models into simpler scenario models in an initial (compilation) stage such as each composed scenario model is composed of two components: m c = (m 4 before m 3 ) a linear search.
23
23 Scenario Recognition: Composed Scenario Example: original “Bank_attack” scenario model composite-event(Bank_attack, physical-objects((employee : Person), (robber : Person)) components( (1) (e1 : primitive-state inside_zone(employee, "Back")) (2) (e2 : primitive-event changes_zone(robber, "Entrance", "Infront")) (3) (e3 : primitive-state inside_zone(employee, "Safe")) (4) (e4 : primitive-state inside_zone(robber, "Safe")) ) constraints((e2 during e1) (e2 before e3) (e1 before e3) (e2 before e4) (e4 during e3) ) alert(“Bank attack!!!”) )
24
24 Scenario Recognition: Composed Scenario Compilation: Original scenario model is decomposed into 3 new scenarios composite-event(Bank_attack_1, physical-objects((employee : Person), (robber : Person)) components( (1)(e1 : primitive-state inside_zone (employee, "Back")) (2)(e2 : primitive-event changes_zone (robber, "Entrance", "Infront")) constraints((e1 during e2) )) composite-event( Bank_attack_2, physical-objects((employee : Person), (robber : Person)) components( (3)(e3 : primitive-state inside_zone (employee, "Safe")) (4)(e4 : primitive-state inside_zone (robber, "Safe")) ) constraints((e3 during e4) )) composite-event( Bank_attack_3, physical-objects((employee : Person), (robber : Person)) components( (att_1 : composite-event Bank_attack_1 (employee, robber)) (att_2 : composite-event Bank_attack_2 (employee, robber)) ) constraints(((termination of att_1) before (start of att_2)) ) alert(“Bank attack!!!”) )
25
25 Scenario Recognition: Composed Scenario A compiled scenario model m c is composed of two components: start and termination. To start the recognition of m c, its termination needs to be already instantiated. The recognition of a compiled scenario model m c consists of a loop: 1. Choosing a scenario instance for the start of m c, 2. Verifying the temporal constraints of m c, 3. Instantiating the physical-objects of m c with physical-objects of the start and of the termination of m c, 4. Verifying the non-temporal constraints of m c. 5. Verifying forbidden constraints.
26
26 Scenario Recognition: Temporal Constraints Results A generic formalism to help experts model intuitively states, events and scenarios. Recognition algorithm processes temporal operators in an efficient way. The recognition of complex scenarios (large number of actors) becomes real time. However, uncertainty needs to be taken care
27
27 Scenario recognition: capacity of prediction Issue: in the bank monitoring application, an alert “Bank attack!!!” is triggered when a scenario “Bank_attack” is recognized. However, it can be too late for security agents to cope with the situation. Requirement: is the temporal scenario recognition method able to predict scenarios that may occur in the future? Answer: YES, the recognition algorithm can predict scenarios that may occur by adding automatically alerts (during the compilation) to some generated scenario models. This task can be specified in scenario models.
28
28 Scenario recognition : uncertainty Temporal precision Issue: several scenario models are defined with too precise temporal constraints they cannot be recognized with real videos. Solution: we defined a temporal tolerance Δt as an integer, then all temporal comparisons are estimated using an approximation of Δt. Incorrect mobile object tracking Issue: the vision algorithms may loose the track of several detected mobile objects the system cannot recognize correctly scenario occurrences in several videos. Solution1: experts describe different scenario models representing various situations corresponding to several combinations of physical objects.
29
29 Uncertainty Representation Solution2: management of the vision uncertainty (likelihood): within predefined event models (off-line) –coefficients (on mobile objects and components) are provided by default. propagated (on-line) through the event instances 1.mobile objects: computed by vision algorithms. 2.primitive states (elementary): –a coefficient to each physical object for representing the likelihood relation between the state and each involved mobile object. 3.events and composite states (composed): –a coefficient to each component for representing the likelihood relation between the event and each component. –defining a threshold into each state/event model for specifying at which likelihood level the given state/event should be recognized.
30
30 Uncertainty Representation PrimitiveState (Person_Close_To_Vehicle, Physical Objects ( (p : Person, 0.7), (v : Vehicle, 0.3) ) Constraints ((p distance v ≤ close_distance) (recognized if likelihood > 0.8)) ) CompositeEvent (Crowd_Splits, Physical Objects ((c1: Crowd, 0.5), (c2 : Crowd, 0.5), (z1: Zone) ) Components ((s1 : CompositeState Move_toward (c1, z1), 0.3) (e2 : CompositeEvent Move_away (c2, c1), 0.7) ) Constraints ( (e2 during s1) (c2's Size > Threshold) (recognized if likelihood > 0.8)) )
31
31 The likelihood of an event is calculated by the following formula: The likelihood of an event is calculated by the following formula: like is the likelihood, coef is the coefficient Two functions f are defined: Two functions f are defined: – for a primitive state (elementary) e : – for an event or composite state (composed) Ec : An event E is recognized if and only if: likelihood(E) ≥ threshold(E) An event E is recognized if and only if: likelihood(E) ≥ threshold(E) Issue: event uncertainty directly links to tracking uncertainty (difficult for real-world scenes). Using the Uncertainty
32
32 Scenario recognition: Results Bank agency monitoring : Paris (M. Maziere)
33
33 Scenario Recognition: Results Vandalism in metro in Nuremberg
34
34 Video Understanding for Trichogramma Monitoring
35
35 Scenario recognition: Results HomeCare Monitoring (N. Zouba) Visualization of a recognized event in the Gerhome laboratory The person is recognized with the posture "standing with one arm up”, “located in the kitchen” and “using the microwave”. The person is recognized with the posture "standing with one arm up”, “located in the kitchen” and “using the microwave”.
36
36 Example of the Unloading Front Operation (global) CompositeEvent (UnLoading_Front_Global_Operation, PhysicalObjects ( (v1 : Vehicle), (v2 : Vehicle), (z1 : Zone), (z2 : Zone), (z3 :Zone)) Components ( (c1 : CompositeEvent Loader_Arrival(v1, z1, z2)) (c2 : CompositeEvent Transporter_Arrival(v2, z1, z3)) Constraints ( (v1->SubType = LOADER) (v2->SubType = TRANSPORTER) (z1->Name = ERA) (z2->Name = RF_DoorC_Access) (z3->Name = LOADER_BackZone) (c1 before c2))) Scenario recognition: Results Example: “Unloading Front Operation ” event
37
37 “Unloading Global Operation” Scenario recognition: Results Example: “Unloading Global Operation” event
38
38 Example of the Unloading Front Operation (detailed) CompositeEvent (UnLoading_Front_Detailed_Operation, PhysicalObjects ( (p1 : Person), (v1 : Vehicle), (v2 : Vehicle), (v3 : Vehicle), (z1 : Zone), (z2 : Zone), (z3 :Zone), (z4 : Zone)) Components ( (c1 : CompositeEvent Loader_Arrival(v1, z1, z2)) (c2 : CompositeEvent Transporter_Arrival(v2, z1, z3)) (c3 : CompositeState Worker_Manipulating_Container(p1, v3, v2, z3, z4))) Constraints ( (v1->SubType = LOADER) (v2->SubType = TRANSPORTER) (z1->Name = ERA) (z2->Name = RF_DoorC_Access) (z3->Name = LOADER_BackZone) (z4->Name = Behind_RF_DoorC_Access) (c1 before c2) (c2 before c3))) Scenario recognition: Results Example: “Unloading Front Operation ” event
39
39 Scenario recognition: Results Parked aircraft monitoring in Toulouse (F Fusier) “Unloading Front Operation”
40
40 CARETAKER: An FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections. Applications to surveillance and safety issues, in urban/environment planning, resource optimization, disabled/elderly person monitoring. Currently being validated on large underground video recordings ( Torino, Roma). Complex Events Raw Data Simple Events Knowledge Discovery Raw data Primitives Event and Meta data Audio/Video acquisition and encoding Multiple Audio/Video sensors Knowledge Discovery Generic Event recognition Video Understanding : Knowledge Discovery (E. Corvee, JL. Patino_Vilchis)
41
41 Event detection examples
42
42 Data Flow Object/Event Detection Information Modelling Object Detection Id Type Info 2D Info 3D Event Detection Id Type (inside_zone, stays_inside_zone) Involved Mobile Object Involved Contextual Object Mobile object table Event table Contextual object table
43
43 Mobile Objects People characterised by: Trajectory Shape Significant Event in which they are involved … Contextual Objects Find interactions between mobile objects and contextual objects Interaction type Time … Table Contents Events Model the normal activities in the metro station Event type Involved objects Time …
44
44 Knowledge Discovery: trajectory clustering Objective: Clustering of trajectories into k groups to match people activities Feature set Entry and exit points of an object Direction, speed, duration, … Clustering techniques Agglomerative Hierarchical Clustering. K-means Self-Organizing (Kohonen) Maps Evaluation of each cluster set based on Ground-Truth
45
45 Feature Vector Key points: x y x entry y entry m1m1 m2m2 mkmk mKmK Trajectory: Clustering Methods Parameter tuning: which features?
46
46 Agglomerative clustering Trajectory: Clustering Parameter tuning: which distance function?
47
47 Results on Torino subway (45min), 2052 trajectories
48
48 SOM K-meansAgglomerative Groups with mixed overlap Trajectory: Analysis
49
49 Intraclass & Interclass variance SOM algorithm has the lowest intraclass and higher interclass separation, Parameter tuning: which clustering techniques? Trajectory: Analysis
50
50 Video features modeled under three different tables with topological and temporal relations for quantitative and semantic description. Trajectory clustering gives information about frequent enter exit zones, density occupation and behavior characterization. Meaningful trajectory clusters are validated by the consistency through different algorithms Trajectory: Analysis
51
51 Mobile Objects
52
52 Mobile Object Analysis Building statistics on Objects There is an increase of people after 6:45
53
53 Contextual Object Analysis Vending Machine 2 Vending Machine 1 With an increase of people, there is an increase on the use of vending machines
54
54 Contextual Object Analysis Gates 192345678
55
55 Analysis: Use of the Gates Gates 7 to 9 are the most used (right side of gates) Gates 1 to 3 are the less in use (left side of gates)
56
56 Results : Trajectory Clustering
57
57 Semantic knowledge extracted by the off-line long term analysis of on-line interactions between moving objects and contextual objects: 70% of people are coming from north entrance Most people spend 10 sec in the hall 64% of people are going directly to the gates without stopping at the ticket machine At rush hours people are 40% quicker to buy a ticket, … Issues: At which level(s), should be designed clustering techniques: low level (image features)/ middle level (trajectories, shapes)/ high level (primitive events)? to learn what : visual concepts, scenario models? uncertainty (noise/outliers/rare), what are the activities of interest? Parameter tuning (e.g. distance, clustering tech.) and performance evaluation (criteria, ground-truth). Knowledge Discovery: achievements
58
58 Video Understanding : Learning Scenario Models (A. Toshev) or Frequent Composite Event Discovery in Videos event time series
59
59 Why unsupervised model learning in Video Understanding? Complex models containing many events, Large variety of models, Different parameters for different models The learning of models should be automated. Learning Scenarios: Motivation Video surveillance in a parking lot
60
60 Input: A set of primitive events from the vision module: object-inside-zone(Vehicle, Entrance) [5,16] Output: frequent event patterns. A pattern is a set of events: object-inside-zone(Vehicle, Road) [0, 35] object-inside-zone(Vehicle, Parking_Road) [36, 47] object-inside-zone(Vehicle, Parking_Places) [62, 374] object-inside-zone(Person, Road) [314, 344] Learning Scenarios: Problem Definition Goals: Automatic data-driven modeling of composite events, Reoccurring patterns of primitive events correspond to frequent activities, Find classes with large size & similar patterns. Zones
61
61 Approach: Iterative method from data mining for efficient frequent patterns discovery in large datasets, A PRIORI: Sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995), At i th step consider only i-patterns which have frequent (i-1) – sub-patterns the search space is thus pruned. A PRIORI-property for activities represented as classes: size(C m-1 ) ≥ size(C m ) where C m is a class containing patterns of length m, C m-1 is a sub-activity of C m. Learning Scenarios: A PRIORI Method
62
62 Learning Scenarios: A PRIORI Method Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern:
63
63 2 types of Similarity Measure between event patterns : similarities between event attributes similarities between pattern structures Generic Similarity Measure : Generic properties when possible easy usage in different domains, It should incorporate domain-dependent properties relevance to the concrete application. Learning Scenarios: Similarity Measure
64
64 Attributes: the corresponding events in two patterns should have similar (same) attributes (duration, names, object types,...). Learning Scenarios: Attribute Similarity Comparison between corresponding events (same type, same color). For numeric attributes: G(x,y)= attr(p i, p j ) = average of all event attribute similarities.
65
65 Test data: Video surveillance at a parking lot, 4 hours records from 2 days in 2 test sets, Every test set contains appr. 100 primitive events. Learning Scenarios: Evaluation Results: In both test sets the following event pattern was recognized: object-inside-zone(Vehicle, Road) object-inside-zone(Vehicle, Parking_Road) object-inside-zone(Vehicle, Parking_Places) object-inside-zone(Person, Parking_Road)
66
66 Test data: Video surveillance at a parking lot, 4 hours records from 2 days in 2 test sets, Every test set contains appr. 100 primitive events. Learning Scenarios: Evaluation Results: In both test sets the following event pattern was recognized: object-inside-zone(Vehicle, Road) object-inside-zone(Vehicle, Parking_Road) object-inside-zone(Vehicle, Parking_Places) object-inside-zone(Person, Parking_Road)
67
67 Test data: Video surveillance at a parking lot, 4 hours records from 2 days in 2 test sets, Every test set contains appr. 100 primitive events. Learning Scenarios: Evaluation Results: In both test sets the following event pattern was recognized: object-inside-zone(Vehicle, Road) object-inside-zone(Vehicle, Parking_Road) object-inside-zone(Vehicle, Parking_Places) object-inside-zone(Person, Parking_Road)
68
68 Test data: Video surveillance at a parking lot, 4 hours records from 2 days in 2 test sets, Every test set contains appr. 100 primitive events. Learning Scenarios: Evaluation Results: In both test sets the following event pattern was recognized: object-inside-zone(Vehicle, Road) object-inside-zone(Vehicle, Parking_Road) object-inside-zone(Vehicle, Parking_Places) object-inside-zone(Person, Parking_Road) Maneuver Parking!
69
69 Conclusion: Application of a data mining approach, Handling of uncertainty without losing computational effectiveness, General framework: only a similarity measure and a primitive event library must be specified. Future Work: Other similarities, Handling of different aspects of uncertainty, Qualification of the learned patterns, Frequent equal interesting ? Different applications: different event libraries or features. Learning Scenarios: Conclusion & Future Work
70
70 A global framework for building video understanding systems, For Individuals, Groups of People, Vehicles, Crowd, or Animals … Perspectives: Object and video event detection Finer human shape description : gesture models, face detection Video analysis robustness: reliability computation (e.g. tracking) Knowledge Acquisition and Learning Design of learning techniques (clustering of low/middle/high level features) to complement a priori knowledge (e.g. visual concept learning, scenario model learning) Issues: uncertainty (rare/noise/outliers), annotation, evaluation. System Reusability Use of program supervision techniques: dynamic configuration of programs and parameters Scaling issue: managing large network of heterogeneous sensors (cameras, PTZ, microphones, optical cells, radars….) Conclusion and perspectives
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.