
1 Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition
Francois BREMOND, PULSAR project-team, INRIA Sophia Antipolis, FRANCE
Francois.Bremond@sophia.inria.fr
http://www-sop.inria.fr/pulsar/
Keywords: artificial intelligence, knowledge-based systems, cognitive vision, human behavior representation, scenario recognition

2 Video Understanding: Performance Evaluation (V. Valentin, R. Ma)
ETISEO: French initiative for algorithm validation and knowledge acquisition: http://www-sop.inria.fr/orion/ETISEO/
Approach: 3 critical evaluation concepts.
- Selection of test video sequences:
  - follow a specified characterization of problems,
  - study one problem at a time, at several levels of difficulty,
  - collect long sequences for significance.
- Ground-truth definition:
  - up to the event level,
  - give clear and precise instructions to the annotator, e.g. annotate both the visible and the occluded parts of objects.
- Metric definition:
  - a set of metrics for each video processing task,
  - performance indicators: sensitivity and precision.
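The two performance indicators are the standard detection measures; a minimal sketch (not ETISEO's official tooling) of how they are computed from per-task matching counts, assuming true positives, false positives and missed detections are already available:

```python
# Minimal sketch of the two ETISEO performance indicators,
# computed from per-task matching counts against the ground truth.

def precision(tp: int, fp: int) -> float:
    """Fraction of detected items that match the ground truth."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of ground-truth items that were detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example: 40 correct detections, 0 false alarms, 5 missed events.
print(precision(40, 0), round(sensitivity(40, 5), 3))  # 1.0 0.889
```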

3 Evaluation: current approach (A.T. Nghiem)
ETISEO limitations:
- the selection of video sequences according to difficulty levels is subjective,
- the generalization of evaluation results is subjective: one video sequence may contain several video processing problems at many difficulty levels.
Approach: treat each video processing problem separately.
- Define a measure to compute difficulty levels of input data (e.g. video sequences).
- Select video sequences containing only the problem under study, at various difficulty levels.
- For each algorithm, determine the highest difficulty level at which it still has acceptable performance.
Approach validation: applied to two problems,
- detecting weakly contrasted objects,
- detecting objects mixed with shadows.
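As an illustration of the last step, a small sketch, where the per-level performance scores and the acceptability threshold are illustrative assumptions, not values from the slides:

```python
# Hypothetical sketch: find the highest difficulty level at which an
# algorithm still performs acceptably on one video processing problem.
# The 0.8 threshold is an illustrative assumption.

def highest_acceptable_level(score_by_level, threshold=0.8):
    """score_by_level maps a difficulty level to a score in [0, 1]."""
    passing = [lvl for lvl, s in score_by_level.items() if s >= threshold]
    return max(passing) if passing else None

# e.g. weakly contrasted objects, levels 1..4 (made-up scores):
print(highest_acceptable_level({1: 0.95, 2: 0.88, 3: 0.74, 4: 0.51}))  # 2
```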

4 Evaluation: conclusion
- A new evaluation approach to generalize evaluation results, implemented for two problems.
- Limitation: it only detects the upper bound of an algorithm's capacity. The difference between the upper bound and the real performance may be significant if:
  - the test video sequence contains several video processing problems,
  - the same set of parameters must be tuned differently to adapt to several concurrent problems.
Ongoing evaluation campaigns:
- PETS at ECCV 2008,
- TRECVid (NIST) with the i-LIDS videos.
Benchmarking databases:
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363
http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html

5 Video Understanding: Program Supervision

6 Supervised Video Understanding: Proposed Approach
Goal: easy creation of reliable supervised video understanding systems.
Approach:
- use a supervised video understanding platform: a reusable software tool composed of three separate components (program library, control, knowledge base),
- formalize a priori knowledge of video processing programs,
- make the control of video processing programs explicit.
Issues:
- video processing programs that can be supervised,
- a friendly formalism to represent knowledge about programs,
- a general control engine to implement different control strategies,
- a learning tool to adapt system parameters to the environment.

7 Proposed Approach
[Architecture diagram: an application-domain expert and a video processing expert feed three knowledge bases (application domain, scene environment, video processing programs); a control component combines them with the video processing program library, together with learning and evaluation components, to build and evaluate a particular system.]

8 Supervised Video Understanding Platform: Operator Formalism
Use of an operator formalism [Clément and Thonnat, 93] to represent knowledge of video processing programs, composed of frames and production rules.
Frames: declarative knowledge. An operator is an abstract model of a video processing program:
- primitive: a particular program,
- composite: a particular combination of programs.
Production rules: inferential knowledge:
- choice and optional criteria,
- initialization criteria,
- assessment criteria,
- adjustment and repair criteria.

9 Program Supervision: Knowledge and Reasoning
Primitive operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, calling syntax; rule bases (parameter initialization rules, parameter adjustment rules, result evaluation rules, repair rules).
Composite operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, decomposition into suboperators (sequential, parallel, alternative), data flow; rule bases (parameter initialization rules, parameter adjustment rules, choice rules, result evaluation rules, repair rules).
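The frame-based operator descriptions can be pictured as plain data structures. The sketch below is only an illustration in Python (the platform itself uses a frame formalism, not Python); the field names follow the slide:

```python
# Illustrative sketch of the operator formalism as data structures.
from dataclasses import dataclass, field
from typing import Callable, List

Rule = Callable[[dict], dict]  # a production rule: context -> updated context

@dataclass
class Operator:
    functionality: str
    input_data: List[str]
    parameters: dict
    output_data: List[str]
    preconditions: List[Callable[[dict], bool]] = field(default_factory=list)
    init_rules: List[Rule] = field(default_factory=list)    # parameter initialization
    adjust_rules: List[Rule] = field(default_factory=list)  # parameter adjustment
    eval_rules: List[Rule] = field(default_factory=list)    # result evaluation
    repair_rules: List[Rule] = field(default_factory=list)  # repair

@dataclass
class PrimitiveOperator(Operator):
    calling_syntax: str = ""  # how to invoke the concrete program

@dataclass
class CompositeOperator(Operator):
    # decomposition into suboperators (sequential, parallel, alternative)
    suboperators: List[Operator] = field(default_factory=list)
    choice_rules: List[Rule] = field(default_factory=list)  # select among alternatives
```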

10 Video Understanding: Learning Parameters (B. Georis)
Objective: a learning tool to automatically tune algorithm parameters from experimental data; used for learning the segmentation parameters with respect to the illumination conditions.
Method (see the sketch below):
- identify a set of parameters of a task: 18 segmentation thresholds depending on an environment characteristic, the image intensity histogram,
- study the variability of the characteristic: histogram clustering -> 5 clusters,
- determine the optimal parameters for each cluster: optimization of the 18 segmentation thresholds.
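A minimal sketch of this method, assuming k-means is an acceptable stand-in for the histogram clustering actually used, and leaving the threshold optimization itself as a callback:

```python
# Sketch: cluster image intensity histograms, then keep one optimized set
# of the 18 segmentation thresholds per cluster (illumination class).
import numpy as np
from sklearn.cluster import KMeans

def learn_parameter_table(histograms, optimize_thresholds, n_clusters=5):
    """histograms: (n_images, 256) normalized intensity histograms.
    optimize_thresholds: callback returning the 18 thresholds for a set of
    image indices (the optimization itself is outside this sketch)."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(histograms)
    table = {}
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        table[c] = optimize_thresholds(members)  # thresholds for this class
    return km, table

def thresholds_for(km, table, histogram):
    """At run time: pick the parameter set of the nearest histogram cluster."""
    c = int(km.predict(histogram.reshape(1, -1))[0])
    return table[c]
```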

11 Video Understanding: Learning Parameters
[Figure: camera view of the scene.]

12 Learning Parameters: Clustering the Image Histograms
[Figure: image histograms plotted as a 3D surface; axes: pixel intensity [0-255] and number of pixels [%]; an X-Z slice represents one image histogram. The five clusters yield five optimal parameter sets β_i^opt1 ... β_i^opt5.]

13 Video Understanding: Knowledge Discovery (E. Corvee, JL. Patino_Vilchis)
CARETAKER: an FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections. Applications to surveillance and safety issues in urban/environment planning, resource optimization, and disabled/elderly person monitoring. Currently being validated on large underground video recordings (Torino, Roma).
[Diagram: multiple audio/video sensors feed acquisition and encoding; raw data yields primitives, then generic event recognition produces simple and complex events with metadata, which feed knowledge discovery.]

14 Event detection examples

15 Data Flow: Object/Event Detection and Information Modelling
Object detection -> mobile object table: id, type, 2D info, 3D info.
Event detection -> event table: id, type (inside_zone, stays_inside_zone), involved mobile object, involved contextual object.
A contextual object table completes the model.

16 Table Contents
- Mobile objects: people characterised by trajectory, shape, significant events in which they are involved, ...
- Contextual objects: interactions found between mobile objects and contextual objects: interaction type, time, ...
- Events: model the normal activities in the metro station: event type, involved objects, time, ...
A sketch of these three tables follows.
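An illustrative sketch of the three tables as record types; the field names follow the slides, while the concrete types are assumptions:

```python
# Illustrative sketch of the mobile object, contextual object and event tables.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MobileObject:
    object_id: int
    trajectory: List[Tuple[float, float]]  # 2D positions over time (3D info omitted here)
    shape: str
    event_ids: List[int]                   # significant events it is involved in

@dataclass
class ContextualObject:
    object_id: int
    name: str                                    # e.g. "Vending Machine 1", "Gate 7"
    interactions: List[Tuple[int, str, float]]   # (mobile object id, interaction type, time)

@dataclass
class Event:
    event_id: int
    event_type: str            # e.g. "inside_zone", "stays_inside_zone"
    mobile_object_id: int
    contextual_object_id: int
    time: float
```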

17 Knowledge Discovery: Trajectory Clustering
Objective: cluster trajectories into k groups that match people's activities.
Feature set: entry and exit points of an object; direction, speed, duration, ...
Clustering techniques: agglomerative hierarchical clustering, k-means, self-organizing (Kohonen) maps.
Each cluster set is evaluated against the ground truth.

18 Trajectory Clustering Methods: Feature Vector
Key points of a trajectory: entry point (x_entry, y_entry) and K sampled points m_1, m_2, ..., m_k, ..., m_K along the (x, y) path.
Parameter tuning: which features?
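A small sketch of the feature vector and of one of the listed clustering techniques; the exact feature set and distance function were precisely the open tuning questions on these slides, so both choices below are assumptions:

```python
# Sketch: build a fixed-length feature vector per trajectory (entry point,
# exit point, K resampled key points) and cluster with one of the listed
# techniques (agglomerative; k-means or a SOM could be substituted).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def trajectory_features(traj: np.ndarray, k: int = 5) -> np.ndarray:
    """traj: (n_points, 2) positions over time. Returns entry point,
    exit point, and k resampled key points m_1 ... m_k."""
    idx = np.linspace(0, len(traj) - 1, k).astype(int)
    keypoints = traj[idx]
    return np.concatenate([traj[0], traj[-1], keypoints.ravel()])

def cluster_trajectories(trajs, n_clusters: int = 10) -> np.ndarray:
    X = np.stack([trajectory_features(t) for t in trajs])
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
```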

19 Trajectory Clustering: agglomerative clustering.
Parameter tuning: which distance function?

20 Results on the Torino subway (45 min), 2052 trajectories

21 Trajectory: Analysis
[Figure: clusters obtained by SOM, k-means and agglomerative clustering; some groups overlap across methods.]

22 Trajectory: Semantic Characterisation
SOM cluster 14, k-means cluster 12 and agglomerative cluster 21 show the consistency of clusters between algorithms; semantic meaning: walking towards the vending machines.

23 Trajectory: Analysis — Intraclass and Interclass Variance
The SOM algorithm has the lowest intraclass variance and the highest interclass separation.
Parameter tuning: which clustering technique? (A sketch of the two criteria follows.)
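The two cluster-quality criteria can be sketched as follows; these are the standard definitions, not necessarily the exact formulas used in the project:

```python
# Sketch: intraclass variance (scatter around each cluster centre) and
# interclass separation (scatter of the centres around the global mean).
import numpy as np

def intra_inter(X: np.ndarray, labels: np.ndarray):
    """X: (n_samples, n_features) feature vectors; labels: cluster ids."""
    centres = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    intra = float(np.mean([np.sum((x - centres[c]) ** 2)
                           for x, c in zip(X, labels)]))
    grand = X.mean(axis=0)
    inter = float(np.mean([np.sum((m - grand) ** 2)
                           for m in centres.values()]))
    return intra, inter  # low intra + high inter => well-separated clusters
```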

24 Trajectory: Analysis
- Video features are modelled in three different tables, with topological and temporal relations, for quantitative and semantic description.
- Trajectory clustering gives information about frequent entry/exit zones, occupation density and behavior characterization.
- Meaningful trajectory clusters are validated by their consistency across different algorithms.

25 Mobile Objects

26 Mobile Object Analysis: building statistics on objects.
There is an increase in the number of people after 6:45.

27 Contextual Object Analysis: vending machines 1 and 2.
With the increase in the number of people, there is an increase in the use of the vending machines.

28 Contextual Object Analysis: gates 1 to 9.
[Figure: view of the gates.]

29 Analysis: Use of the Gates
Gates 7 to 9 (right side) are the most used; gates 1 to 3 (left side) are the least used.

30 Results: Trajectory Clustering

31 Knowledge Discovery: Achievements
Semantic knowledge extracted by off-line long-term analysis of on-line interactions between moving objects and contextual objects:
- 70% of people come from the north entrance,
- most people spend 10 s in the hall,
- 64% of people go directly to the gates without stopping at the ticket machine,
- at rush hours, people are 40% quicker to buy a ticket, ...
Issues:
- At which level(s) should clustering techniques be designed: low level (image features), middle level (trajectories, shapes), high level (primitive events)?
- To learn what: visual concepts, scenario models?
- Uncertainty (noise, outliers, rare events): what are the activities of interest?
- Parameter tuning (e.g. distance, clustering technique) and performance evaluation (criteria, ground truth).

32 Video Understanding: Learning Scenario Models (A. Toshev), or Frequent Composite Event Discovery in Videos
[Figure: an event time series.]

33 Learning Scenarios: Motivation
Why unsupervised model learning in video understanding?
- complex models containing many events,
- a large variety of models,
- different parameters for different models,
=> the learning of models should be automated.
Example: video surveillance in a parking lot.

34 Learning Scenarios: Problem Definition
Input: a set of primitive events from the vision module, e.g. object-inside-zone(Vehicle, Entrance) [5, 16].
Output: frequent event patterns. A pattern is a set of events, e.g.:
  object-inside-zone(Vehicle, Road) [0, 35]
  object-inside-zone(Vehicle, Parking_Road) [36, 47]
  object-inside-zone(Vehicle, Parking_Places) [62, 374]
  object-inside-zone(Person, Road) [314, 344]
Goals:
- automatic, data-driven modeling of composite events,
- reoccurring patterns of primitive events correspond to frequent activities,
- find classes with large size and similar patterns.
[Figure: the parking-lot zones.]

35 Learning Scenarios: A PRIORI Method
Approach: an iterative method from data mining for efficient discovery of frequent patterns in large datasets.
A PRIORI: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995). At the i-th step, consider only i-patterns whose (i-1)-sub-patterns are frequent; the search space is thus pruned.
A PRIORI property for activities represented as classes: size(C_{m-1}) >= size(C_m), where C_m is a class containing patterns of length m and C_{m-1} is a sub-activity of C_m.

36 Learning Scenarios: A PRIORI Method
Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern, as in the sketch below.
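A minimal sketch of this merge step, under the simplifying assumption that a pattern is a set of primitive-event identifiers (the real method also handles temporal structure and classes), keeping the A PRIORI pruning from the previous slide:

```python
# Sketch: grow (i+1)-pattern candidates from frequent i-patterns and prune
# any candidate that has an infrequent i-sub-pattern (A PRIORI property).
from itertools import combinations

def grow(frequent_i_patterns):
    """Patterns are frozensets of primitive-event identifiers."""
    frequent = set(frequent_i_patterns)
    candidates = set()
    for a, b in combinations(frequent, 2):
        if len(a & b) == len(a) - 1:   # share (i-1) primitive events
            cand = a | b               # the merged (i+1)-pattern
            # prune: every i-sub-pattern must itself be frequent
            if all(frozenset(s) in frequent
                   for s in combinations(cand, len(a))):
                candidates.add(cand)
    return candidates
```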

37 Learning Scenarios: Similarity Measure
Two types of similarity measure between event patterns:
- similarities between event attributes,
- similarities between pattern structures.
A generic similarity measure should rely on generic properties where possible (easy use in different domains), but should also incorporate domain-dependent properties (relevance to the concrete application).

38 Learning Scenarios: Attribute Similarity
Attributes: the corresponding events in two patterns should have similar (or the same) attributes (duration, names, object types, ...). The comparison is between corresponding events (same type).
For numeric attributes, a similarity function G(x, y) is applied (its formula was given as a figure on the original slide); attr(p_i, p_j) is the average of all event attribute similarities.
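A sketch of the attribute similarity; since the numeric-attribute function G(x, y) did not survive extraction, a Gaussian kernel is assumed here purely as a stand-in:

```python
# Sketch: numeric-attribute similarity G (assumed Gaussian) and the
# averaged attribute similarity attr(p_i, p_j) between two patterns.
import math

def g(x: float, y: float, sigma: float = 1.0) -> float:
    """Assumed form of G(x, y): similarity of two numeric attributes."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def attr_similarity(events_a, events_b) -> float:
    """attr(p_i, p_j): average similarity over corresponding events; each
    event is a dict of numeric attributes (e.g. duration), and the two
    lists are assumed already paired by event type."""
    sims = [g(ea[k], eb[k]) for ea, eb in zip(events_a, events_b) for k in ea]
    return sum(sims) / len(sims) if sims else 0.0
```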

39 Learning Scenarios: Evaluation
Test data: video surveillance of a parking lot; 4 hours of recordings from 2 days, split into 2 test sets; each test set contains approx. 100 primitive events.
Results: in both test sets the following event pattern was recognized (a parking maneuver):
  object-inside-zone(Vehicle, Road)
  object-inside-zone(Vehicle, Parking_Road)
  object-inside-zone(Vehicle, Parking_Places)
  object-inside-zone(Person, Parking_Road)


43 Learning Scenarios: Conclusion & Future Work
Conclusion:
- application of a data mining approach,
- handling of uncertainty without losing computational efficiency,
- general framework: only a similarity measure and a primitive event library must be specified.
Future work:
- other similarity measures,
- handling of different aspects of uncertainty,
- qualification of the learned patterns: does frequent equal interesting?
- different applications: different event libraries or features.

44 HealthCare Monitoring (N. Zouba)
GERHOME (CSTB, INRIA, CHU Nice): ageing population. http://gerhome.cstb.fr/
Approach: multi-sensor analysis based on sensors embedded in the home environment:
- detect in real time any alarming situation,
- identify a person's profile (his/her usual behaviors) from the global trends of life parameters, and then detect any deviation from this profile.

45 Monitoring of Activities of Daily Living for the Elderly
Goal: increase independence and quality of life:
- enable elderly people to live longer in their preferred environment,
- reduce costs for public health systems,
- relieve family members and caregivers.
Approach:
- detect alarming situations (e.g. falls),
- detect changes in behavior (missing activities, disorder, interruptions, repetitions, inactivity),
- calculate the degree of frailty of elderly people.
Example of normal activity: meal preparation (in the kitchen, 11h-12h), eating (in the dining room, 12h-12h30), resting and TV watching (in the living room, 13h-16h), ...

46 Gerhome Laboratory
GERHOME (Gerontology at Home): homecare laboratory. http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html
Experimental site at CSTB (Centre Scientifique et Technique du Bâtiment), Sophia Antipolis. http://gerhome.cstb.fr
Partners: INRIA, CSTB, CHU-Nice, Philips-NXP, CG06, ...

47 Gerhome Laboratory: Position of the Sensors
- Video cameras installed in the kitchen and in the living room to detect and track the person in the apartment.
- Contact sensors mounted on many devices to determine the person's interactions with them.
- Presence sensors installed in front of the sink and the cooking stove to detect the presence of a person near them.

48 Sensors Installed in the Gerhome Laboratory
[Photos: in the kitchen; video camera in the living room; pressure sensor underneath the legs of the armchair; contact sensor in the window; contact sensor in the cupboard door.]

49 Event Modelling
We have modelled a set of activities using an event recognition language developed in our team. This is an example for the "Meal preparation" event:

  Composite Event (Prepare_meal_1,   "detected by a video camera combined with contact sensors"
    Physical Objects ((p: Person), (Microwave: Equipment), (Fridge: Equipment), (Kitchen: Zone))
    Components (
      (p_inz: PrimitiveState Inside_zone (p, Kitchen))         "detected by video camera"
      (open_fg: PrimitiveEvent Open_Fridge (Fridge))           "detected by contact sensor"
      (close_fg: PrimitiveEvent Close_Fridge (Fridge))         "detected by contact sensor"
      (open_mw: PrimitiveEvent Open_Microwave (Microwave))     "detected by contact sensor"
      (close_mw: PrimitiveEvent Close_Microwave (Microwave)))  "detected by contact sensor"
    Constraints (
      (open_fg during p_inz)
      (open_mw before_meet open_fg)
      (open_fg Duration >= 10)
      (open_mw Duration >= 5))
    Action (AText ("Person prepares meal") AType ("NOT URGENT")))
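A minimal sketch of how the temporal constraints of this model could be checked on recognized component intervals; the "during" and "before_meet" semantics below follow Allen-style interval relations, which is an assumption about the language's exact definitions:

```python
# Sketch: checking the Prepare_meal_1 temporal constraints on intervals.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

def during(a: Interval, b: Interval) -> bool:
    """a happens within b (Allen-style 'during', boundaries included)."""
    return b.start <= a.start and a.end <= b.end

def before_meet(a: Interval, b: Interval) -> bool:
    """a ends before (or exactly when) b starts."""
    return a.end <= b.start

def prepare_meal_constraints(p_inz: Interval, open_fg: Interval,
                             open_mw: Interval) -> bool:
    """The four constraints of the Prepare_meal_1 model above."""
    return (during(open_fg, p_inz)
            and before_meet(open_mw, open_fg)
            and open_fg.duration >= 10
            and open_mw.duration >= 5)
```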

50 Multi-sensor Monitoring: Results and Evaluation
We have studied and tested a range of activities in the Gerhome laboratory, such as using the microwave, using the fridge, preparing a meal, ... We have validated and visualized the recognized events with a 3D visualization tool.

  Activity           | # Videos | # Events | TP | FN | FP | Precision | Sensitivity
  In the kitchen     | 10       | 45       | 40 | 5  | 0  | 1         | 0.888
  In the living-room | 10       | 35       | 40 | 0  | 5  | 0.888     | 1
  Open microwave     | 8        | 15       | 15 | 0  | 0  | 1         | 1
  Open fridge        | 8        | 24       | 24 | 0  | 0  | 1         | 1
  Open cupboard      | 8        | 30       | 30 | 0  | 0  | 1         | 1
  Preparing meal 1   | 8        | 3        | 3  | 0  | 0  | 1         | 1

51 Recognition of the "Prepare meal" Event
Visualization of a recognized event in the Gerhome laboratory: the person is recognized with the posture "standing with one arm up", "located in the kitchen" and "using the microwave".

52 Recognition of the "Resting in living-room" Event
Visualization of a recognized event in the Gerhome laboratory: the person is recognized with the posture "sitting in the armchair" and "located in the living room".

53 End-Users
There are several end-users in homecare:
- Doctors (gerontologists): frailty measurement (depression, ...); alarm detection (falls, gas, dementia, ...).
- Caregivers and nursing homes: cost reduction (no false alarms, reduced employee involvement); employee protection.
- Persons with special needs, including young children, disabled and elderly people: feeling safe at home; autonomy (at night, lighting up the way to the bathroom); improving life (smart mirror; summary of the user's day, week or month in terms of walking distance, TV, water consumption).
- Family members and relatives: elderly safety and protection; social connectivity.

54 Social Problems and Solutions
- Problem: privacy, confidentiality and ethics (video and other data recording, processing and transmission). Solution: no video recording or transmission, only textual alarms.
- Problem: acceptability for the elderly. Solution: user empowerment.
- Problem: usability. Solution: easy ergonomic interface (no keyboard, large screen), friendly usage of the system.
- Problem: cost effectiveness. Solution: the right service for the right price, a large variety of solutions.
- Problem: legal issues, no certification. Solution: robustness, benchmarking, on-site evaluation.
- Problem: installation, maintenance, training, interoperability with other home devices. Solution: adaptability, X-Box integration, wireless, standards (OSGI, ...).
- Problem: research financing? France (no money, lobbies), Europe (delays), US, Asia.

55 Conclusion
A global framework for building video understanding systems.
Hypotheses: mostly fixed cameras; a 3D model of the empty scene; predefined behavior models.
Results:
- real-time video understanding systems for individuals, groups of people, vehicles, crowds or animals,
- knowledge structured within the different abstraction levels (i.e. processing worlds),
- formal description of the empty scene,
- structures for algorithm parameters,
- structures for object detection rules, tracking rules, fusion rules, ...
- operational language for event recognition (more than 60 states and events), video event ontology,
- tools for knowledge management,
- metrics and tools for performance evaluation and learning,
- parsers and formats for data exchange, ...

56 Conclusion: Perspectives
Object and video event detection:
- finer human shape description: gesture models,
- video analysis robustness: reliability computation.
Knowledge acquisition:
- design of learning techniques to complement a priori knowledge: visual concept learning, scenario model learning.
System reusability:
- use of program supervision techniques: dynamic configuration of programs and parameters,
- scaling issue: managing large networks of heterogeneous sensors (cameras, microphones, optical cells, radars, ...).

