Slide 1: Exploitation of Knowledge in Video Recordings
Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris
Multimedia Knowledge Laboratory, Informatics and Telematics Institute, CERTH-ITI
Image Video & Multimedia Systems Laboratory
October 24, 2008, Thessaloniki, Greece
Slide 2: Evolution of Content
- 1–2 exabytes (millions of terabytes) of new information produced worldwide annually
- 80 billion digital images captured each year
- Over 1 billion images related to commercial transactions available through the Internet; this number is estimated to increase tenfold in the next two years
- 4,000 new films produced each year; 300,000 films available worldwide
- 33,000 television stations and 43,000 radio stations
- 100 billion hours of audiovisual content
- Content sources: personal content, sport, news, movies, web, mobile
Slide 3: Multimedia Content (overview diagram)
- Multimedia content: networks, storage & devices
- Analysis: segmentation, knowledge-assisted (KA) analysis, labeling, cross-media analysis, context, reasoning
- Metadata generation & representation
- Content adaptation and distribution over multiple terminals & networks
- Hybrid / content-based retrieval, recommendations and personalization
- Semantic technology in markets: Web 2.0 photo and video applications
Slide 4: Need for Annotation + Metadata
"The value of information depends on how easily it can be found, retrieved, accessed, filtered or managed in an active, personalized way"
Slide 5: Video Analysis
Video analysis that exploits knowledge provides significant advantages:
- Improved accuracy of semantics extracted from video
- Higher-level concepts inferred by combining knowledge with video processing: knowledge about behavior, event detection
- More efficient storage, access, retrieval and dissemination of multimodal data thanks to the (automatically generated) annotations
Slide 6: Video Analysis in JUMAS
Slide 7: Text-based Indexing
Manual annotation (most commonly used; necessary in a number of applications):
+ Straightforward
+ High/semantic level
+ Efficient during content creation
- Time-consuming
- Operator- and application-dependent
- Text-related problems (synonyms, etc.)
Annotation using captions and related text (web, video, documents, etc.):
+ Straightforward
+ High/semantic level
+ Multimodal approach
- Text-processing restrictions and limitations
- Captions must exist
Slide 8: Addressing the Semantic Gap
The semantic gap for multimedia: how to map automatically generated, low-level numerical features to higher-level, human-understandable semantic concepts.
Example: the Dominant Color Descriptor of a sky region, (31, 31, 19, 23, 29, 0, 0, 0), versus the human description "this image contains a sky region and is a holiday image".
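As a minimal sketch of one way to bridge this gap, the snippet below assigns a concept label to a region by comparing its color descriptor against stored prototype descriptors. The prototype values and the "grass" concept are invented for illustration; real MPEG-7 descriptors are considerably richer.

```python
import math

# Hypothetical prototype dominant-color descriptors per concept.
# The "sky" values come from the slide; "grass" is invented.
PROTOTYPES = {
    "sky": [31, 31, 19, 23, 29, 0, 0, 0],
    "grass": [5, 40, 8, 12, 10, 0, 0, 0],
}

def label_region(descriptor):
    """Assign the concept whose prototype descriptor is closest (Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(PROTOTYPES, key=lambda c: dist(descriptor, PROTOTYPES[c]))

print(label_region([30, 30, 20, 22, 28, 0, 0, 0]))  # → sky
```

In practice such a nearest-prototype rule is far too brittle on its own, which is exactly why the following slides combine learned classifiers with prior knowledge.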
Slide 9: Problem Definition
- Semantic image analysis: how to translate automatically extracted visual descriptions into human-like conceptual ones
- Low-level features provide cues for strengthening/weakening evidence based on visual similarity
- Prior knowledge is needed to support semantic disambiguation
Slide 10: Semantic Analysis (overview diagram)
- Inputs: multimedia content; knowledge infrastructure (multimedia ontology); manual annotation and models; context
- Single-modality analysis and knowledge extraction: feature extraction; text and image analysis; segmentation; SVMs
- Evidence generation: e.g. "Vehicle", "Building"; classifier fusion (global vs. local)
- Modality fusion with context: e.g. "Ambulance"
- Reasoning: fusion of annotations; consistency checking; higher-level concepts/events, e.g. "Emergency scene"
- Supporting components: multimedia content annotation tools; training and (statistical) modeling; domain knowledge; annotations; algorithms and features
Slide 11: Knowledge from Video Analysis
Semantics from video, implicitly derived via machine-learning methods based on training: SVMs, HMMs, neural networks, Bayesian networks.
- Training uses appropriate data, relevant to the semantics of interest
- Training finds models that connect low-level features (e.g. motion trajectories) with high-level annotations
- These models are then applied to test data
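As a toy sketch of this implicit approach, the snippet below trains a nearest-centroid classifier (a deliberately simple stand-in for the SVMs and other learners named above) that links invented low-level motion features, here (speed, trajectory straightness), to high-level labels:

```python
# Toy stand-in for statistical training: learn per-label centroids from
# annotated feature vectors, then label new data by nearest centroid.
# All feature values below are invented for illustration.
def train(samples):
    """samples: list of (feature_vector, label). Returns per-label centroids."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def predict(model, vec):
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lbl: sqdist(vec, model[lbl]))

training = [
    ([0.2, 0.9], "walking"), ([0.3, 0.8], "walking"),   # slow, straight paths
    ([0.9, 0.2], "running"), ([0.8, 0.3], "running"),   # fast, curved paths
]
model = train(training)
print(predict(model, [0.25, 0.85]))  # → walking
```

The trained model is then applied to unseen test data exactly as the slide describes, with a real SVM or HMM replacing the centroid rule.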
Slide 12: Classification Results (aceMedia)
Segment's hypothesis set (concept: degree of confidence):
Natural-Person: 0.456798; Sailing-Boat: 0.463645; Sand: 0.476777; Building: 0.415358; Pavement: 0.454740; Road: 0.503242; Body-Of-Water: 0.489957; Cliff: 0.472907; Cloud: 0.757926; Mountain: 0.512597; Sea: 0.455338; Sky: 0.658825; Stone: 0.471733; Waterfall: 0.500000; Wave: 0.476669; Dried-Plant: 0.494825; Dried-Plant-Snowed: 0.476524; Foliage: 0.497562; Grass: 0.491781; Tree: 0.447355; Trunk: 0.493255; Snow: 0.467218; Sunset: 0.503164; Car: 0.456347; Ground: 0.454769; Lamp-Post: 0.499387; Statue: 0.501076
Slide 13: Frame Region – Concept Association
- Region feature vector formed from local descriptors
- An individual SVM is introduced for every defined local concept, receiving the region feature vector as input
- Training identical to the global-concept training case
- Every region is evaluated by all trained SVMs, creating the segment's local-concept hypothesis set
Segment's hypothesis set: Ground: 0.89; Grass: 0.44; Mountain: 0.21; Boat: 0.07; Smoke: 0.41; Dirty-Water: 0.18; Trunk: 0.12; Foam: 0.19; Debris: 0.34; Mud: 0.31; Water: 0.42; Sky: 0.22; Ashes: 0.11; Subtitles: 0.24; Flames: 0.13; Vehicle: 0.12; Building: 0.25; Foliage: 0.84; Person: 0.32; Road: 0.39
Slide 14: Initial Region–Concept Association
- Region feature vector formed from local descriptors
- An individual SVM is introduced for every defined concept, receiving the region feature vector as input
- Training identical to the global training case
- Every region is evaluated by all trained SVMs, creating the segment's concept hypothesis set
Segment's hypothesis set: Building: 0.89; Roof: 0.29; Grass: 0.21; Tree: 0.07; Stone: 0.41; Ground: 0.15; Dried-plant: 0.12; Sky: 0.19; Person: 0.34; Trunk: 0.31; Vegetation: 0.42; Rock: 0.22; Boat: 0.11; Sand: 0.44; Sea: 0.13; Wave: 0.12
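A sketch of what happens with such a hypothesis set downstream: keep only the concepts whose confidence clears a threshold, best first. The scores are a subset of the slide's example; the 0.4 threshold is an invented illustration, not a value from the source.

```python
# Illustrative per-concept SVM scores for one region (from the slide above).
hypotheses = {"Building": 0.89, "Roof": 0.29, "Sand": 0.44,
              "Vegetation": 0.42, "Tree": 0.07}

def top_concepts(hyp, threshold=0.4):
    """Keep concepts whose confidence exceeds the threshold, best first."""
    return sorted((c for c, s in hyp.items() if s > threshold),
                  key=lambda c: -hyp[c])

print(top_concepts(hypotheses))  # → ['Building', 'Sand', 'Vegetation']
```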
Slide 15: Knowledge for Video Analysis
Explicit semantics from video:
- Based on previously known models
- Explicitly defined models, rules, facts
- Rules derived from preliminary scripts and standards from similar cases
Explicit and implicit knowledge can be combined with results from low-level video processing to extract meaningful high-level knowledge.
Slide 16: System Overview
Multimedia content → video analysis (face recognition, motion segmentation, etc.) + knowledge infrastructure (explicit or implicit) → semantic multimedia description
Slide 17: Video Analysis – Motion Analysis
Motion detection and tracking:
- Detection of when motion occurs
Motion segmentation:
- Object segmentation based on motion characteristics
- Generation of "active regions"
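A minimal sketch of motion detection via frame differencing, one of the simplest ways to obtain the "active regions" mentioned above: a pixel counts as active when its intensity changes by more than a threshold between two frames. The tiny grayscale frames and the threshold are invented for illustration.

```python
def active_region(prev, curr, threshold=20):
    """Return a binary mask marking pixels with significant intensity change."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

frame1 = [[10, 10, 10],
          [10, 10, 10],
          [10, 10, 10]]
frame2 = [[10, 10, 10],
          [10, 90, 90],   # an object moved into the lower-right area
          [10, 90, 90]]
print(active_region(frame1, frame2))  # → [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
```

Real systems add temporal filtering and background modeling on top of this, but the binary mask is the raw material for the activity areas shown on the next slides.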
Slide 18: Activity Areas from Motion Analysis
Slide 19: Sub-activity Areas
After statistical processing for temporal localization of motion and events:
- People walking towards each other
- People meet
- People leave together
Slide 20: Fight Sequence
Slide 21: Video Processing (1)
Pre-processing:
- Separate video from audio; split video into frames
- Noise removal via spatiotemporal filtering
Scene/shot detection:
- Shot = frames taken by a single camera; detected via transitions between frames, using only low-level information
- Scene = a story-telling unit; uses higher-level knowledge and semantics
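The low-level shot detection step can be sketched as a histogram comparison: declare a cut wherever the intensity-histogram difference between consecutive frames exceeds a threshold. Frames here are flat lists of pixel values invented for illustration, and the 0.5 threshold is an assumption, not a value from the source.

```python
# Shot-boundary detection sketch via normalized histogram difference.
def histogram(frame, bins=8):
    """Quantize 0-255 pixel values into a fixed number of bins."""
    h = [0] * bins
    for v in frame:
        h[min(v * bins // 256, bins - 1)] += 1
    return h

def shot_cuts(frames, threshold=0.5):
    """Return indices i where a cut occurs between frames[i-1] and frames[i]."""
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        # Normalized L1 distance: 0 (identical) .. 1 (disjoint histograms).
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * len(frames[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts

dark = [20] * 16          # frames from a dark shot
bright = [230] * 16       # frames from a bright shot
print(shot_cuts([dark, dark, bright, bright]))  # → [2]
```

Grouping shots into scenes, by contrast, needs the higher-level knowledge the slide mentions; a histogram alone cannot tell a story-telling unit.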
Slide 22: Video Processing (2)
- Spatial segmentation: extracts objects in images and video frames based on color and texture features
- Motion segmentation: groups pixels with similar motion
- Spatiotemporal segmentation: finds objects over several frames through a combination of motion and appearance features; merges spatial and motion segmentation results
Slide 23: Knowledge in Video Analysis (1)
- Low-level features can be combined with knowledge/rules for higher-level results
- Spatiotemporally segmented objects can be used for object recognition
- Face/gesture recognition after training with faces/gestures of significance
- Motion in specific parts of a video (e.g. near the court entrance, near the prisoner's seat) has additional significance: this needs prior knowledge of which parts of the video frames are important and why
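The last point can be sketched as a simple rule base: prior knowledge names rectangular regions of interest, and a rule fires when a detected motion centroid falls inside one. The region names echo the courtroom examples above, but the coordinates and the helper are invented for illustration.

```python
# Prior knowledge: named regions of interest as (x_min, y_min, x_max, y_max).
# Coordinates are hypothetical frame positions.
REGIONS_OF_INTEREST = {
    "court entrance": (0, 0, 50, 50),
    "prisoner's seat": (100, 0, 150, 50),
}

def significant_motion(centroid):
    """Return the names of known regions containing the motion centroid."""
    x, y = centroid
    return [name for name, (x0, y0, x1, y1) in REGIONS_OF_INTEREST.items()
            if x0 <= x <= x1 and y0 <= y <= y1]

print(significant_motion((25, 30)))  # → ['court entrance']
```

The same motion detector thus yields very different interpretations depending on which prior knowledge is plugged in, which is the flexibility argued for in the conclusions.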
Slide 24: Knowledge in Video Analysis (2)
- Knowledge structures can provide additional information about the relations between different low-level features
- Interactions may carry meaning: e.g. two motions in opposite directions, or the relation of extracted gestures, can indicate people meeting, fighting, pointing, or gesticulating
- Face recognition combined with prior knowledge can show who is present when an event occurs
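One such interaction rule, two motions in opposite directions suggesting a meeting, can be sketched with dot products over tracked velocity vectors. The function and all numbers are illustrative assumptions, not the authors' method.

```python
# Interaction rule sketch: objects A and B may be meeting when their
# velocities oppose each other and each one heads towards the other.
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def possible_meeting(pos_a, vel_a, pos_b, vel_b):
    """True if velocities oppose and each object moves towards the other."""
    towards = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])  # vector A -> B
    return (dot(vel_a, vel_b) < 0        # opposite directions
            and dot(vel_a, towards) > 0  # A moves towards B
            and dot(vel_b, towards) < 0) # B moves towards A

# Two people approaching each other along the x axis:
print(possible_meeting((0, 0), (1, 0), (10, 0), (-1, 0)))  # → True
```

A higher layer would then combine this cue with others (proximity over time, gesture recognition) before asserting a "meeting" event.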
Slide 25: Conclusions
- The combined use of video processing with knowledge can lead to richer and more accurate high-level descriptions of multimedia data
- It can be used in many more applications than currently, because the knowledge introduces flexibility and adaptability to the system: the same algorithms and low-level features can provide much more information when used in combination with explicit and implicit knowledge
Slide 26: Thank You!
CERTH-ITI / Multimedia Knowledge Laboratory
http://mklab.iti.gr
Slide 27: Video Analysis State of the Art – Spatiotemporal Segmentation
Find spatiotemporally homogeneous objects, i.e. with similar appearance and motion:
- Apply spatial segmentation on each frame
- Match segmented objects in successive frames using low-level features (e.g. similar color, texture, continuous motion)
- Use motion information to project the object's position into the current/next frames
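The matching step can be sketched as follows: project each object's position with its motion vector, then pick the candidate region in the next frame that is closest in predicted position and mean color. All object data and the weighting are invented for illustration.

```python
# Match segmented objects across successive frames using projected
# position plus color similarity (a crude combined cost).
def match(objects_prev, objects_next, color_weight=0.1):
    """objects_prev values carry 'pos' (x, y), 'vel' (dx, dy), 'color';
    objects_next values carry 'pos' and 'color'. Returns name -> name."""
    matches = {}
    for name, obj in objects_prev.items():
        px = obj["pos"][0] + obj["vel"][0]   # motion-projected position
        py = obj["pos"][1] + obj["vel"][1]
        def cost(cand):
            c = objects_next[cand]
            d2 = (c["pos"][0] - px) ** 2 + (c["pos"][1] - py) ** 2
            return d2 + color_weight * abs(c["color"] - obj["color"])
        matches[name] = min(objects_next, key=cost)
    return matches

prev = {"car": {"pos": (0, 0), "vel": (5, 0), "color": 200}}
nxt = {"a": {"pos": (5, 1), "color": 195},    # near projected position
       "b": {"pos": (30, 30), "color": 60}}   # far away, different color
print(match(prev, nxt))  # → {'car': 'a'}
```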
Slide 28: Video Analysis State of the Art
Slide 29: Video Analysis State of the Art
Slide 30: Video Analysis State of the Art – Spatial Segmentation
Spatial segmentation in images and video frames:
- Region-based: most methods group similar features (color, texture, location), based on homogeneity of intensity, texture and position
- Gradient/edge-based: detect changes in the spatial distribution of features, e.g. pixel illumination
- Some methods combine region and edge information
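A minimal sketch of the gradient/edge-based idea: mark an edge wherever the horizontal intensity difference between neighboring pixels exceeds a threshold. The tiny grayscale image and threshold are invented for illustration; real detectors also use vertical gradients, smoothing, and non-maximum suppression.

```python
def horizontal_edges(image, threshold=50):
    """Binary map of positions where |I(x+1) - I(x)| exceeds the threshold."""
    return [[1 if abs(row[x + 1] - row[x]) > threshold else 0
             for x in range(len(row) - 1)]
            for row in image]

image = [[10, 10, 200, 200],
         [10, 10, 200, 200]]   # sharp vertical boundary between columns 1 and 2
print(horizontal_edges(image))  # → [[0, 1, 0], [0, 1, 0]]
```

Region-based methods would instead grow or merge areas of homogeneous intensity; combined methods use the edge map to constrain region boundaries.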