Ontology-based Automatic Video Annotation Technique in Smart TV Environment
Jin-Woo Jeong, Hyun-Ki Hong, and Dong-Ho Lee
IEEE Transactions on Consumer Electronics, Vol. 57, No. 4, November 2011
Presented by: You Tithrottanak
Contents
1. Introduction
2. Overview of the proposed approach
3. Semantic web technologies for video annotation
4. High-level concept extraction
5. Experiments and analysis
Introduction
According to a survey of the Korean domestic market, smart TVs will be in 8.9 million households in Korea by 2022, which amounts to 52.6% of all Korean households.
This positive outlook comes from the following attractive features of smart TV:
▫Open contents
▫Entertainment & Communication
▫N-screen service
▫Smart multi-tasking
▫Smart advertisement
▫Smart home server
Overview of the proposed approach
Semantic web technologies for video annotation VideoAnnotation Ontology
Semantic web technologies for video annotation
There are four important classes in the VideoAnnotation ontology:
▫ShotAnno class: describes the objects that appear in the representative frame (key frame) of each video shot
▫GroupAnno class: describes each video group, the dominant concepts in the group, and the group metadata
▫SceneAnno class: describes the concepts of each video scene (e.g., animal-tracking, interview with tamer)
▫Meta class: holds the metadata of a video sequence (title, createdBy, modifiedBy, length, type, year)
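To make the four classes concrete, here is a minimal sketch of them as plain Python data classes. It is not the authors' ontology definition: only the class names and the Meta fields come from the slide. The `VideoAnnotation` container, the `key_frame`, `objects`, `dominant_concepts`, `group_meta`, and `concepts` attributes, and the unit of `length` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Meta:
    """Metadata of a video sequence (field names follow the slide)."""
    title: str
    createdBy: str
    modifiedBy: str
    length: int  # assumption: duration in seconds
    type: str
    year: int


@dataclass
class ShotAnno:
    """Objects found in the key frame of one video shot."""
    key_frame: int  # hypothetical: frame index of the key frame
    objects: List[str] = field(default_factory=list)


@dataclass
class GroupAnno:
    """A video group, its dominant concepts, and its metadata."""
    dominant_concepts: List[str] = field(default_factory=list)
    group_meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class SceneAnno:
    """Concepts of one video scene, e.g. "animal-tracking"."""
    concepts: List[str] = field(default_factory=list)


@dataclass
class VideoAnnotation:
    """Hypothetical container tying the four classes together."""
    meta: Meta
    shots: List[ShotAnno] = field(default_factory=list)
    groups: List[GroupAnno] = field(default_factory=list)
    scenes: List[SceneAnno] = field(default_factory=list)
```

In the paper these classes live in an ontology rather than in program code; the sketch only shows which information each level of annotation carries.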
Semantic web technologies for video annotation
Domain Ontology
▫Provides a sharable and reusable vocabulary
▫The LSCOM (Large Scale Concept Ontology for Multimedia) ontology provides a set of standardized concepts for video annotation
  Full LSCOM: 3,000 high-level concepts
  The authors use a light-weight version of LSCOM: 400 high-level concepts
  This ontology is used for both group-level and scene-level annotation
▫The object ontology is a knowledge base for a specific domain
  This type of ontology can be built manually or derived from an existing ontology such as WordNet
High-level concept extraction
Key Frame Detection and Object Segmentation
▫Video shots and their corresponding key frames are detected based on visual similarity
▫They use the color structure descriptor (CSD) to represent the color feature of each video frame
▫To calculate the visual similarity between video frames, they use the Euclidean distance measure
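The shot and key-frame detection described above can be sketched as follows. This is an illustrative reading, not the authors' procedure: each frame is assumed to be already reduced to a numeric feature vector (standing in for a CSD), a shot boundary is declared when the Euclidean distance between consecutive frames exceeds a hypothetical `threshold`, and the key frame is taken to be the frame closest to the mean feature vector of its shot.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_shot_boundaries(frames: Sequence[Sequence[float]],
                           threshold: float) -> List[int]:
    """Return the index of the first frame of every shot.

    A new shot starts wherever two consecutive frames are farther
    apart than `threshold` (an assumed, hand-tuned value).
    """
    boundaries = [0] if frames else []
    for i in range(1, len(frames)):
        if euclidean_distance(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries


def pick_key_frames(frames: Sequence[Sequence[float]],
                    boundaries: Sequence[int]) -> List[int]:
    """For each shot, pick the frame closest to the shot's mean vector."""
    key_frames = []
    ends = list(boundaries[1:]) + [len(frames)]
    for start, end in zip(boundaries, ends):
        shot = frames[start:end]
        dim = len(shot[0])
        mean = [sum(f[d] for f in shot) / len(shot) for d in range(dim)]
        best = min(range(start, end),
                   key=lambda i: euclidean_distance(frames[i], mean))
        key_frames.append(best)
    return key_frames
```

A real CSD is a quantized color histogram with many more bins, but the distance and thresholding logic stays the same whatever the vector length.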
High-level concept extraction
Semi-concept Mapping
▫The low-level visual features of an object are extracted with MPEG-7 visual descriptors
▫The features are then mapped to their corresponding semi-concept values
1. For the color feature, they exploit the MPEG-7 color structure descriptor (CSD)
High-level concept extraction
▫The set of semi-concept values for the color feature is defined as {Red-Orange, Yellow-YellowGreen, …, Gray, White}
2. The texture feature of an object is extracted with the MPEG-7 Edge Histogram Descriptor (EHD)
High-level concept extraction
▫The EHD distinguishes 5 edge types:
  Vertical
  Horizontal
  45-degree diagonal
  135-degree diagonal
  Non-directional
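One plausible way to turn an edge histogram into texture semi-concepts is to pick the dominant edge types. The sketch below assumes the usual 80-bin EHD layout (16 sub-images with 5 bins each, ordered as in the list above) and treats the bin values as plain numbers. The "top `top_k` types" rule and the function names are assumptions; the slides do not state how the paper performs this mapping.

```python
from typing import List, Sequence

# Edge types in the per-sub-image bin order assumed here.
EDGE_TYPES = [
    "Vertical",
    "Horizontal",
    "45-degree diagonal",
    "135-degree diagonal",
    "Non-directional",
]


def aggregate_edge_histogram(ehd: Sequence[float]) -> List[float]:
    """Sum the 80 local bins into one total per edge type."""
    if len(ehd) != 80:
        raise ValueError("expected 80 bins (16 sub-images x 5 edge types)")
    totals = [0.0] * len(EDGE_TYPES)
    for i, value in enumerate(ehd):
        totals[i % len(EDGE_TYPES)] += value
    return totals


def texture_semi_concepts(ehd: Sequence[float], top_k: int = 2) -> List[str]:
    """Return the `top_k` dominant edge types as texture semi-concepts.

    `top_k` is a hypothetical parameter, not taken from the paper.
    """
    totals = aggregate_edge_histogram(ehd)
    ranked = sorted(range(len(EDGE_TYPES)),
                    key=lambda t: totals[t], reverse=True)
    return [EDGE_TYPES[t] for t in ranked[:top_k]]
```

Summing over sub-images discards where in the object each edge type occurs, which is acceptable for a coarse semi-concept label but would be too lossy for direct histogram matching.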
High-level concept extraction
▫The shape feature of a region in the image is provided by two MPEG-7 descriptors: the region-based shape descriptor and the contour shape descriptor (contour-SD)
▫The region-based shape descriptor can describe complex objects (e.g., company logos and trademarks)
▫The contour-SD efficiently describes objects with a single contour (e.g., animals such as a tiger or a horse)
High-level concept extraction
Semantic Inference Rules for High-level Concept Extraction
▫The inference procedure is performed as follows:
1. Assume that the extracted semi-concept values are:
  Color = {“Red_Orange”, “Yellow_Green”, “Black”}
  Texture = {“Non-directional”, “Horizontal”}
2. Apply the rule Cheetah to determine the high-level concept of the object in the video shot
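The inference step can be illustrated with a small rule matcher. This is a plain-Python stand-in for a semantic-web rule engine, and the conditions attached to the Cheetah rule below are invented for the example; the actual rule body is not reproduced in this text. A rule fires when all of its required semi-concepts are among the extracted ones.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    """A high-level concept with the semi-concepts it requires."""
    concept: str
    required_colors: FrozenSet[str]
    required_textures: FrozenSet[str]

    def matches(self, colors: Set[str], textures: Set[str]) -> bool:
        # The rule fires only if every required value was extracted.
        return (self.required_colors <= colors
                and self.required_textures <= textures)


# Hypothetical rule body: these conditions are assumptions.
RULES = [
    Rule(
        concept="Cheetah",
        required_colors=frozenset({"Yellow_Green", "Black"}),
        required_textures=frozenset({"Non-directional"}),
    ),
]


def infer_concepts(colors: Set[str], textures: Set[str],
                   rules: List[Rule] = RULES) -> List[str]:
    """Return the concepts of all rules that match the semi-concepts."""
    return [r.concept for r in rules if r.matches(colors, textures)]


if __name__ == "__main__":
    colors = {"Red_Orange", "Yellow_Green", "Black"}
    textures = {"Non-directional", "Horizontal"}
    print(infer_concepts(colors, textures))
```

With the semi-concept values from step 1, the hypothetical rule matches because its required colors and texture are subsets of the extracted sets; an object with only white regions would match nothing.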
High-level concept extraction
SVM Learning for High-level Concept Extraction
▫SVM classifier
▫The feature vector of the training set for a particular concept $C_i$ is represented as $T_i = \{L_i, DC_{0\text{-}3}, DT, DS\}$
  $L_i$: a label
  $DC_{0\text{-}3}$: four color semi-concepts
  $DT$: a texture semi-concept
  $DS$: a shape semi-concept
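A training sample in the form above (a label, four color semi-concepts, one texture semi-concept, one shape semi-concept) could be encoded numerically as in this sketch. The integer ids, the padding value for objects with fewer than four colors, and the shape names `ShapeA` and `ShapeB` are all placeholders; the slides do not list the shape semi-concepts or the encoding the authors feed to the SVM.

```python
from typing import List, Sequence

# Assumed integer ids; 0 is reserved for "no value" padding.
PAD = 0
COLOR_IDS = {"Red_Orange": 1, "Yellow_Green": 2, "Black": 3,
             "Gray": 4, "White": 5}
TEXTURE_IDS = {"Vertical": 1, "Horizontal": 2, "45-degree diagonal": 3,
               "135-degree diagonal": 4, "Non-directional": 5}
# Placeholder shape semi-concepts (hypothetical names).
SHAPE_IDS = {"ShapeA": 1, "ShapeB": 2}


def encode_sample(label: int, colors: Sequence[str],
                  texture: str, shape: str) -> List[int]:
    """Build [L_i, DC_0, DC_1, DC_2, DC_3, DT, DS] for one object."""
    if len(colors) > 4:
        raise ValueError("at most four color semi-concepts are supported")
    dc = [COLOR_IDS[c] for c in colors]
    dc += [PAD] * (4 - len(dc))  # pad up to four color slots
    return [label] + dc + [TEXTURE_IDS[texture], SHAPE_IDS[shape]]
```

Integer ids impose an artificial ordering on categorical values, so a one-hot encoding may suit an SVM better in practice; the flat layout here is kept only to mirror the seven slots of the feature vector.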
High-level concept extraction
Experiments and analysis
The experiments focus on the accuracy of annotation for video shots.
The evaluation is performed on parts of National Geographic videos containing various concepts such as animals and landscape scenes.
Three kinds of evaluation are conducted to investigate the proposed approaches:
▫Effectiveness of Semantic Rule-based Approach
▫Effectiveness of Semi-concept based SVM Classifier
▫Relative Effectiveness of the Proposed Approaches
Experiments and analysis Effectiveness of Semantic Rule-based Approach
Experiments and analysis Effectiveness of Semi-concept based SVM Classifier
Experiments and analysis Relative Effectiveness of the Proposed Approaches
Thanks