Image/Video’s Automatic Annotation Considering Semantics’ Tolerance Relation
Ying Dai
Faculty of Software and Information Science, Iwate Pref. University
2019/5/2
Outline
Background
Class-based image/key-frame representation
Semantic tolerance relation
Semantic similarity
Visual similarity
Associative values with pre-defined classes
Calculation of SS, VS and associative values
Bidirectional associative memories
Image/key-frame retrieval
Experiments and analysis
Background
More and more people express themselves by sharing images and videos online.
It is still hard to manipulate, index, or search online images and videos.
The objectives of our research focus on:
understanding the semantics of images/videos by machine
representing the semantics of images/videos in a form intuitive to humans

With the technological advances in digital imaging, networking, and data storage, more and more people communicate with one another and express themselves by sharing images, video, and other forms of media online. However, it is still hard to manipulate, index, filter, summarize, or search through this content because of the semantic gap between user and machine: for many queries, visual similarity does not correlate strongly with human similarity judgments. A machine retrieves feature-similar content, whereas a human retrieves semantically similar content first and considers color and structural similarity afterwards. Developing systems that can understand images and represent their content in a form intuitive to humans has therefore become one of the most pressing problems in the field of large-scale image/video retrieval.
Semantic tolerance relation of image/key-frame (1)
Images/key-frames are described in different domains, such as temporal, spatial, impression, nature vs. man-made, human vs. non-human, copyright, etc.
The classes into which a domain is divided can be inter-tolerated or intra-tolerated.
The tolerance degree is measured by two indices: semantic similarity (SS) and visual similarity (VS).

Because the concepts of image/video in many domains are imprecise, and the interpretation of finding similar images/videos is ambiguous and subjective at the level of human perception, we define the semantic categories of images and key-frames together with the tolerance degrees between them. Images are described in different domains. Within a given domain, concepts are divided into classes. A class may be associated with another class in the same domain; this relation is called intra-association. A class may also be associated with a class in a different domain; this relation is called inter-association.
Semantic tolerance relation of image/key-frame (2)
Based on the fact that people judge image similarity first by semantic similarity (SS) and then by visual similarity (VS).
The semantic tolerance relation index (STRI) of class i regarding dimension k to class j regarding dimension l is depicted by two factors:
SS index: generated from the knowledge of subjects
VS index: calculated from the visual features of the learned samples
All of them are in the range [0,1].
Calculation of SS index
Class co-occurrence matrix: each entry is the number of images that are assigned to both class i and class j.
Semantic similarity degree between two classes
Semantic similarity tolerance index
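The co-occurrence counting named on this slide can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the slide's own formula for the similarity degree is not reproduced here, so the normalization used below (co-occurrence count divided by the smaller of the two class counts) is an assumption, and the names `cooccurrence_matrix`, `semantic_similarity`, and the toy `labels` matrix are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(labels):
    """labels[n, i] is 1 if image n is assigned to class i, else 0.
    Returns C where C[i, j] counts images assigned to both class i and class j."""
    labels = np.asarray(labels, dtype=np.int64)
    return labels.T @ labels

def semantic_similarity(cooc):
    """Assumed normalization (not taken from the slides):
    SS[i, j] = C[i, j] / min(C[i, i], C[j, j]), which lies in [0, 1]."""
    counts = np.diag(cooc).astype(float)
    denom = np.minimum.outer(counts, counts)
    ss = np.zeros(cooc.shape, dtype=float)
    np.divide(cooc, denom, out=ss, where=denom > 0)  # leave 0 where a class is empty
    return ss

# Toy assignment of 4 images to 3 classes (made-up data).
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])
cooc = cooccurrence_matrix(labels)
ss = semantic_similarity(cooc)
print(cooc)
print(ss)
```

With this choice the matrix is symmetric and every class has similarity 1 with itself; a different normalization would change the values but not the counting step.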
Calculation of VS index
Bidirectional associative memories (BAM) (by B. Kosko)
Associating two patterns (X, Y) such that when one is encountered, the other can be recalled.
Storing the associated pattern pairs in a connection weight matrix.
[Figure: BAM network with X-layer units X1 ... Xm, a Y layer, and connection weights W11 ... Wmn]
Calculation of VS index
Bidirectional associative memories
Units’ input and output in the X layer and Y layer
The energy of the BAM decreases or remains the same after each unit update.
The BAM eventually converges to a local minimum that corresponds to a stored associated pattern pair.
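The storing and recalling behaviour described on the two BAM slides can be illustrated with a small Python sketch. It follows the textbook Kosko formulation (bipolar patterns, weight matrix built as a sum of outer products, sign thresholding); whether the deck uses exactly this encoding and update rule is an assumption, and the pattern pairs and function names below are made up for illustration.

```python
import numpy as np

def bipolar_sign(net, previous):
    """Threshold to +1/-1; a unit keeps its previous state when its net input is 0."""
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

def train_bam(xs, ys):
    """Store pattern pairs as W = sum_p outer(x_p, y_p); W has shape (m, n)."""
    return np.asarray(xs).T @ np.asarray(ys)

def energy(w, x, y):
    """BAM energy E = -x W y^T; it never increases during recall."""
    return -float(x @ w @ y)

def recall(w, x, max_iters=20):
    """Alternate X -> Y and Y -> X updates until the pair stops changing."""
    x = np.asarray(x).copy()
    y = bipolar_sign(x @ w, np.ones(w.shape[1], dtype=int))
    for _ in range(max_iters):
        new_x = bipolar_sign(w @ y, x)
        new_y = bipolar_sign(new_x @ w, y)
        if np.array_equal(new_x, x) and np.array_equal(new_y, y):
            break
        x, y = new_x, new_y
    return x, y

# Two made-up bipolar pattern pairs (X has 6 units, Y has 3 units).
xs = np.array([[1, -1, 1, -1, 1, -1],
               [1, 1, 1, -1, -1, -1]])
ys = np.array([[1, -1, -1],
               [-1, 1, -1]])
w = train_bam(xs, ys)

noisy = xs[0].copy()
noisy[0] = -noisy[0]          # corrupt one unit of the first X pattern
x_rec, y_rec = recall(w, noisy)
print(x_rec, y_rec)
print(energy(w, noisy, y_rec), energy(w, x_rec, y_rec))
```

In this toy case the corrupted input settles on the first stored pair, and the energy of the final state is no higher than that of the starting state, which is the convergence property stated on the slide.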
Calculation of VS index
Stored pattern images of X layer
40 pattern images are stored for the nature vs. man-made domain.
Some pattern examples: Food (P14), Building (P6), Restaurant (P8), Landscape (P2), Flower (P3)
Calculation of VS index
Stored patterns of Y layer: encoding units
Units’ output of Y layer for pattern image i: the recall values of the inputted pattern image i

The stored patterns of the Y layer are the encoding units of the classes, with I components if I classes are pre-defined. Yi=[0…010…0] is the encoding unit of class i. The component of the Y-layer output for class i is the recall value of inputted pattern image i to class i.
Calculation of VS index
Units’ input and output in Y layer for an input image
Input: weighted sum of the pixel values of the inputted image
Output: recall values of the inputted image
VS index of class i to class j regarding dimension k: defined in terms of the recall values of the input images n to class i
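One way to turn per-image recall values into a class-to-class VS index is sketched below in Python. The slide's exact formula is not available here, so this sketch assumes that the VS index of class i to class j is the mean recall value to class j over the learned sample images of class i. The function name `vs_index` and the toy recall values are hypothetical.

```python
import numpy as np

def vs_index(recall_values, image_classes, num_classes):
    """recall_values[n, j]: recall value of sample image n to class j, in [0, 1].
    image_classes[n]: class that sample image n was learned under.
    Assumed definition: VS[i, j] = mean of recall_values[n, j] over images n of class i."""
    recall_values = np.asarray(recall_values, dtype=float)
    image_classes = np.asarray(image_classes)
    vs = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        members = recall_values[image_classes == i]
        if len(members) > 0:      # classes without samples keep a zero row
            vs[i] = members.mean(axis=0)
    return vs

# Made-up recall values for 3 sample images over 3 classes.
recall_values = [[0.9, 0.2, 0.1],
                 [0.7, 0.4, 0.1],
                 [0.1, 0.8, 0.3]]
image_classes = [0, 0, 1]
vs = vs_index(recall_values, image_classes, num_classes=3)
print(vs)
```

Because each row is an average of values in [0, 1], the resulting index stays in [0, 1], matching the range stated for the SS and VS indices.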
Some examples of VS value
Rows 1 to 15 against columns c1 to c15. The values that survive for each row are listed in their original order; their column positions are not preserved.
1: 0.25 0.74 0.16 0.33 0.53 0.37 0.22 0.5 0.08 0.31
2: 0.21 0.28 0.24 0.02 0.2 0.14
3: 0.36 0.38 0.05 0.82 0.59 0.19 0.35 0.61 0.23
4: 0.26 0.3 0.94 0.48 0.15 0.09
5: 0.17 0.43 0.92 0.32 0.69
6: 0.34 0.77 0.66 0.8 0.11 0.65 0.01 0.52 0.06
7: 0.76 0.27 0.46 0.12 0.58 0.41 0.56
8: 0.1 0.57 0.39
9: 0.51 0.67 0.07
10: 0.03 0.84 0.18 0.13 0.68
11: 0.63 0.79
12:
13: 0.62 0.44 0.47
14: 0.81 0.87
15: 0.73 0.04
……
Class-based image/key-frame representation
Each image/key-frame is represented by a vector of associative values with category i regarding domain k.
Generating associative values with defined classes:
Use BAM to generate the units’ outputs of the Y layer for an input image n.
Determine the class to which image n mostly belongs from these outputs, and assign image n to that class m.
Ws, Wv: weights adjusting the contributions of SS and VS in generating the associative values.
Class-based image/key-frame representation
Generating associative values:
Calculate the VS degree of image n to class i.
Generate the associative value, which is affected by the SS degree, the VS degree, and their weights.
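The two steps above (assign the image to its best-matching class, then blend SS and VS) might look like the following Python sketch. The combination rule is an assumption, since the slide's formula is not available: the associative value with class i is taken as Ws times the SS index between the assigned class m and class i, plus Wv times the image's own recall value to class i. Picking m as the class with the largest recall value is likewise an assumption. All names and numbers are hypothetical.

```python
import numpy as np

def associative_values(recall_n, ss, ws=0.5, wv=0.5):
    """recall_n[i]: Y-layer output (recall value) of image n for class i.
    ss[m, i]: SS index between class m and class i.
    Assumed rule: a[i] = ws * ss[m, i] + wv * recall_n[i], with m = argmax(recall_n).
    With ws + wv = 1 and inputs in [0, 1], the result stays in [0, 1]."""
    recall_n = np.asarray(recall_n, dtype=float)
    m = int(np.argmax(recall_n))               # class the image mostly belongs to
    values = ws * np.asarray(ss, dtype=float)[m] + wv * recall_n
    return m, values

# Made-up SS matrix for 3 classes and recall values for one image.
ss = [[1.0, 0.5, 0.0],
      [0.5, 1.0, 0.25],
      [0.0, 0.25, 1.0]]
recall_n = [0.2, 0.7, 0.1]
m, values = associative_values(recall_n, ss, ws=0.5, wv=0.5)
print(m, values)
```

Setting `ws=0` reduces the vector to the raw BAM outputs, which corresponds to the "not using STRM" baseline mentioned in the evaluation slides.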
Data structure for representing image/video
ID   type    …                path1    path2    path3
1    image   0.2  0.6  0.7    Path11
2    video   0.3  0.8         Path12   Path22   Path23
k: domain’s number
i: category’s number
path1: image/key-frames’ location
path2: shots’ location
path3: videos’ location
ID: image/key-frames’ number
Values under …: associative values with category i regarding domain k
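As an illustration, one record of this table could be modelled in Python as below. The field names, the use of a dictionary keyed by (domain k, category i), and the particular keys attached to the example values are assumptions made for the sketch; only the column meanings and the example values come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class MediaRecord:
    record_id: int                      # ID: image/key-frame number
    media_type: str                     # "image" or "video"
    # (domain k, category i) -> associative value; the key layout is an assumption
    associative: Dict[Tuple[int, int], float] = field(default_factory=dict)
    path1: Optional[str] = None         # image/key-frame location
    path2: Optional[str] = None         # shot location (video only)
    path3: Optional[str] = None         # video location (video only)

# The two example rows from the table; the (k, i) keys are placeholders.
records = [
    MediaRecord(1, "image", {(1, 1): 0.2, (1, 2): 0.6, (1, 3): 0.7}, path1="Path11"),
    MediaRecord(2, "video", {(1, 1): 0.3, (1, 2): 0.8},
                path1="Path12", path2="Path22", path3="Path23"),
]
for r in records:
    print(r.record_id, r.media_type, r.associative, r.path1, r.path2, r.path3)
```

Keeping the shot and video paths optional lets still images and video key-frames share one record type.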
Image/key-frame retrieval
For image categorization regarding a single domain: images are grouped into class i when their associative values with class i are larger than a clustering threshold.
For image categorization regarding cross-domains: the grouped images are the intersection of the images that belong to class i in domain k and the images that belong to class j in domain l.
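The two retrieval modes can be sketched in Python as follows. This is a minimal sketch: it assumes a single clustering threshold shared by all classes and a dictionary of associative values keyed by (domain, category), neither of which is specified on the slide, and the function names and toy values are hypothetical.

```python
def retrieve_single_domain(records, domain, category, threshold):
    """Return IDs whose associative value with (domain, category) exceeds the threshold."""
    return {
        rid for rid, values in records.items()
        if values.get((domain, category), 0.0) > threshold
    }

def retrieve_cross_domain(records, query, threshold):
    """query: list of (domain, category) pairs.
    Returns the intersection of the single-domain result sets."""
    result = None
    for domain, category in query:
        hits = retrieve_single_domain(records, domain, category, threshold)
        result = hits if result is None else result & hits
    return result if result is not None else set()

# Made-up associative values: image ID -> {(domain, category): value}.
records = {
    1: {(1, 2): 0.8, (2, 1): 0.7},
    2: {(1, 2): 0.9, (2, 1): 0.1},
    3: {(1, 2): 0.3, (2, 1): 0.9},
}
single = retrieve_single_domain(records, domain=1, category=2, threshold=0.5)
cross = retrieve_cross_domain(records, [(1, 2), (2, 1)], threshold=0.5)
print(single, cross)
```

A query such as "interior with human" would correspond to one class from the nature vs. man-made domain intersected with one class from the human vs. non-human domain.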
Examples of image retrieval
[Figure: some retrieved images of "interior with human"]
Performance evaluation (1)
Precision-recall of the classes building and interior when 30 classes were pre-defined.
“Using STRM” means that the SS index and VS index are considered in generating the associative values of classes.
“Not using STRM” means that only the units’ output values of the BAM are used in generating the associative values of classes.
[Figure: precision-recall curves; panels: interior, building]
Performance evaluation (2)
Precision-recall of the class interior with human
Performance evaluation (3)
Influence of varying the number of defined classes
Precision-recall of the class building
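For reference, precision-recall points of the kind reported in these evaluation slides can be computed as in the Python sketch below. It uses the standard definitions of precision and recall and sweeps the clustering threshold; the scores, the relevant set, and the convention of returning 1.0 for an empty set are made-up assumptions, not the deck's experimental data.

```python
def precision_recall(retrieved, relevant):
    """Standard definitions: precision = hits / |retrieved|, recall = hits / |relevant|.
    Empty sets yield 1.0 by convention (an assumption of this sketch)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

def pr_curve(scores, relevant, thresholds):
    """scores: image ID -> associative value with the queried class.
    Returns one (threshold, precision, recall) tuple per threshold."""
    points = []
    for t in thresholds:
        retrieved = {rid for rid, s in scores.items() if s > t}
        p, r = precision_recall(retrieved, relevant)
        points.append((t, p, r))
    return points

# Made-up associative values and ground truth for one class.
scores = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.2}
relevant = {1, 3}
curve = pr_curve(scores, relevant, thresholds=[0.8, 0.5, 0.3])
for t, p, r in curve:
    print(t, p, r)
```

Running the same sweep once with associative values built from SS and VS and once with raw BAM outputs would give the two curves compared under "using STRM" and "not using STRM".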
Conclusion & Future work
Representing images’ semantics by the associative values of images with pre-defined classes is effective for image retrieval.
Bidirectional associative memories are efficient in generating associative values.
Precision-recall performance is improved by considering the two factors of SS and VS in generating associative values.
Future work: evaluating the influence of increasing the number of defined classes on precision-recall.