A Mechanism for Automatic Annotation of Large Images/Videos Considering a Semantic Tolerance Relation
Ying Dai, Faculty of Software and Information Science, Iwate Prefectural University 2019/2/23
Outline
Background
Mechanism of large image/video semantic representation
Method of automatic representation of image/video semantics
Experiments and analysis
Background
More and more people express themselves by sharing images and videos online, yet it is still hard to manipulate, index, or search online images and videos.
A machine retrieves feature-similar content; a user retrieves semantically similar content, guided by color and structural similarity.
Our retrieval system therefore focuses on representing image/video semantics in a form intuitive to humans, and on indexing and retrieving large image/video collections over a distributed network.
With the technological advances in digital imaging, networking, and data storage, more and more people communicate with one another and express themselves by sharing images, video, and other media online. However, it is still hard to manipulate, index, filter, summarize, or search through them because of the semantic gap between user and machine: for many queries, visual similarity does not correlate strongly with human similarity judgments, because the machine retrieves feature-similar content while a human retrieves semantically similar content, guided by color and structural similarity. Therefore, developing systems capable of understanding images and representing their content in a form intuitive to humans has become one of the most important goals in the field of large-scale image/video retrieval.
Diagram of system construction
Functions (1): For the nodes of the image/video provider
Segmenting video into shots and extracting a key-frame from each shot;
Representing image/video semantics by associative values with defined domain-categories;
Uploading data to the node of the image/video manager: the generated associative values with defined domain-categories, and the image/key-frame thumbnail index files;
Providing requested images/videos to the nodes of users.
Functions (2): For the node of the image/video contents manager
Integrating data from provider nodes into a representation DB;
Responding to queries from user nodes, providing links to the retrieved images/videos and the corresponding thumbnails to the user;
Directing the user to the provider node that holds the requested images/videos.
Functions (3): For the nodes of users
Querying the manager for the available indexes by domain-categories;
Sending similarity criteria, adjustable through the interface;
Presenting thumbnails of the retrieved images/videos;
Downloading the corresponding images/videos from the providers.
Data structure for indexing images/videos by domain-categories

ID | type  | associative values | Path1  | Path2  | Path3
1  | image | 0.2  0.6  0.7      | Path11 |        |
2  | video | 0.3  0.8           | Path12 | Path22 | Path23

k: domain's number; i: category's number
Path1: image/key-frame location; Path2: shot location; Path3: video location
ID: image/key-frame number; the associative values are the values with category i in domain k
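The index rows above can be sketched as a small record type. This is an illustrative reading of the slide's table, not the talk's actual schema; all field names here are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class IndexRecord:
    """One row of the domain-category index (field names are illustrative)."""
    item_id: int                  # ID: image/key-frame number
    media_type: str               # "image" or "video"
    # assoc[k][i]: associative value with category i in domain k
    assoc: dict = field(default_factory=dict)
    path1: str = ""               # image/key-frame location
    path2: str = ""               # shot location (videos only)
    path3: str = ""               # video location (videos only)

# The two rows shown in the slide's table:
records = [
    IndexRecord(1, "image", {1: {1: 0.2, 2: 0.6, 3: 0.7}}, path1="Path11"),
    IndexRecord(2, "video", {1: {1: 0.3, 2: 0.8}}, "Path12", "Path22", "Path23"),
]
```

A nested dict keyed by domain then category keeps the record compact even though images carry only `path1` while videos carry all three paths.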
Semantic tolerance relation of image/key-frame (1)
Images/key-frames are described by different domains, such as temporal, spatial, impression, nature vs. man-made, human vs. non-human, copyright, etc.
The divided classes can be inter-tolerated or intra-tolerated; the tolerance degree is measured by two indices: semantic similarity (SS) and visual similarity (VS).
Because the concepts of images/videos in many domains are imprecise, and the interpretation of finding similar images/videos is ambiguous and subjective at the level of human perception, we define the semantic categories of images and key-frames together with the tolerance degrees between them. Images are described by different domains. For a given domain, concepts are divided into classes. A class may be associated with another class in the same domain; this relation is called intra-association. A class may also be associated with a class in a different domain; this relation is called inter-association.
Semantic tolerance relation of image/key-frame (2)
Based on the observation that people judge image similarity first by semantic similarity (SS), then by visual similarity (VS).
The semantic tolerance relation index (STRI) combines the two factors SS and VS:
STRI of class i regarding dimension k to class j regarding dimension l
SS index: generated from the knowledge of subjects
VS index: calculated from the visual features of the learned samples
All of them lie in the range [0, 1].
Calculation of the SS index
Class co-occurrence matrix: entry (i, j) counts the images that are assigned to both class i and class j.
Semantic similarity degree between two classes
Semantic similarity tolerance index
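The co-occurrence step can be sketched as follows. The slide does not reproduce the exact normalization used to turn co-occurrence counts into a similarity degree, so the formula below (count divided by the smaller class size) is an assumption chosen so that the degree reaches 1 when one class is contained in the other.

```python
import numpy as np

# Class co-occurrence matrix C: C[i, j] counts the images assigned to both
# class i and class j; the diagonal C[i, i] is the size of class i.
C = np.array([[50.0, 10.0,  2.0],
              [10.0, 40.0,  5.0],
              [ 2.0,  5.0, 30.0]])

def semantic_similarity(C):
    """Semantic similarity degree between every pair of classes.

    Assumed normalization: co-occurrence count over the smaller class size,
    so S[i, j] in [0, 1] and S[i, i] == 1.
    """
    n = C.shape[0]
    S = np.zeros_like(C)
    for i in range(n):
        for j in range(n):
            S[i, j] = C[i, j] / min(C[i, i], C[j, j])
    return S

S = semantic_similarity(C)
```

With the toy counts above, classes 0 and 1 share 10 of the smaller class's 40 images, giving a semantic similarity degree of 0.25.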
Calculation of the VS index
Bidirectional associative memories (BAM) (B. Kosko): associating two patterns (X, Y) such that when one is encountered, the other can be recalled; the associated pattern pairs are stored in a connection weight matrix.
[Diagram: X-layer units X1…Xm connected to Y-layer units through weights W11…Wmn]
Calculation of the VS index
Stored image patterns in the X layer: 40 image patterns are stored for the nature vs. man-made domain.
Some pattern examples: food (P14), building (P6), restaurant (P8), landscape (P2), flower (P3)
Calculation of the VS index
Stored patterns of the Y layer: the encoding units whose recall values are produced for an inputted pattern image.
VS index of class i to class j regarding dimension k: the recall value of the inputted pattern image of class i to class j.
The stored patterns of the Y layer are the encoding units of the classes: if I classes are pre-defined, each Y pattern has I components, and Yi = [0…010…0] is the encoding unit of class i.
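A minimal BAM sketch for this step is shown below. Following the slides, X-layer patterns are image patterns (bipolar here for simplicity) and Y-layer patterns are the one-hot class encoding units; the exact recall normalization is not given in the talk, so scaling activations by their maximum magnitude is an assumption.

```python
import numpy as np

class BAM:
    """Minimal Kosko-style bidirectional associative memory (sketch).

    X patterns: bipolar (+1/-1) image feature vectors.
    Y patterns: one-hot class encoding units Yi = [0...010...0].
    """
    def __init__(self, m, n_classes):
        self.W = np.zeros((m, n_classes))   # connection weight matrix

    def store(self, x, class_idx):
        y = np.zeros(self.W.shape[1])
        y[class_idx] = 1.0
        self.W += np.outer(x, y)            # Hebbian storage of the pair (X, Y)

    def recall(self, x):
        a = x @ self.W                      # activation of each Y unit
        return a / (np.abs(a).max() + 1e-12)  # assumed scaling to [-1, 1]

rng = np.random.default_rng(0)
patterns = np.sign(rng.standard_normal((3, 64)))  # 3 stored class prototypes
bam = BAM(64, 3)
for i, p in enumerate(patterns):
    bam.store(p, i)

r = bam.recall(patterns[1])   # recall values of a stored pattern image
```

Recalling a stored pattern yields its own class as the strongest Y unit; the recall values for the other classes then play the role of the visual-similarity side of the VS index.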
Class-based image/key-frame representation
Units' input and output in the Y layer for an input image:
Input: weighted sum of the pixel values of the input image
Output: recall values of the input image (the recall value of input image n to class i)
Class-based image/key-frame representation
Each image/key-frame is represented by a vector of associative values with the pre-defined classes regarding domain k.
Generating associative values with the defined classes:
Using BAM to obtain the units' outputs of the Y layer for an input image n;
Determining the class to which image n most strongly belongs: finding the maximum output and assigning image n to the corresponding class m.
Ws, Wv: weights adjusting the contributions of SS and VS in generating the associative values.
Class-based image/key-frame representation
Generating associative values:
Calculating the visual similarity degree of image n to class i;
Generating the associative value, affected by both SS and VS (with adjustable weights).
Image/key-frame retrieval
For image categorization regarding a single domain, images are grouped into class i when their associative values with class i are larger than a clustering threshold.
For image categorization regarding cross-domains, the grouped images are the intersection of those that belong to class i in domain k and those that belong to class j in domain l.
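The two retrieval modes above can be sketched directly as set operations over the index. The index layout and domain/class names here are illustrative, not from the talk.

```python
def retrieve_single(index, domain, cls, threshold):
    """Single-domain retrieval: items whose associative value with class
    `cls` in `domain` exceeds the clustering threshold."""
    return {item for item, assoc in index.items()
            if assoc.get(domain, {}).get(cls, 0.0) > threshold}

def retrieve_cross(index, d1, c1, d2, c2, threshold):
    """Cross-domain retrieval: the intersection of the two single-domain
    result sets (e.g. 'furniture' in one domain AND 'human' in another)."""
    return retrieve_single(index, d1, c1, threshold) & \
           retrieve_single(index, d2, c2, threshold)

# index[item_id][domain][class] = associative value (illustrative data)
index = {
    1: {"object": {"furniture": 0.8}, "human": {"human": 0.7}},
    2: {"object": {"furniture": 0.9}, "human": {"human": 0.1}},
    3: {"object": {"furniture": 0.2}, "human": {"human": 0.9}},
}
hits = retrieve_cross(index, "object", "furniture", "human", "human", 0.5)
```

With the toy index above, only item 1 exceeds the threshold in both domains, matching the "furniture with human" example on the next slide.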
Examples of image retrieval
Some images of furniture with humans
Performance evaluation (1)
Precision-recall for the class "furniture with human"
Performance evaluation (2)
Influence of varying the number of defined classes, for the class "building"
Conclusion & Future work
A mechanism for automatically annotating large images/videos over a distributed network is available.
The method of representing image semantics by associative values of images with pre-defined classes is effective for image retrieval.
Bidirectional associative memories are efficient in generating the associative values.
Evaluating the influence of changing the number of defined classes on precision-recall shows that the number of defined classes can be increased.