2004 謝俊瑋 NTU, CSIE, CMLab A Rule-Based Video Annotation System. Andres Dorado, Janko Calic, and Ebroul Izquierdo, Senior Member, IEEE
Outline: Introduction; System overview; Learning unit; Automatic video annotation; Result; Conclusion
Introduction. Semantic gap: the lack of coincidence between the information extracted from visual data and the interpretation that the same data have for a user in a given situation. The rules needed to infer a set of high-level concepts from low-level descriptors cannot be defined a priori. Instead, knowledge embedded in the database and interaction with an expert user are exploited to enable system learning.
System overview. System components: the learning unit (low-level feature extraction, knowledge representation, rule mining) and the annotation unit (fuzzification, fuzzy inference, defuzzification).
Learning unit
Low-level feature extraction. The temporal features of a video and its structure are considered the foremost expressive elements. A one-dimensional frame-to-frame difference metric is used [18][19]. Shot change: local maxima of the simplified curve. Key frame: negative peaks in the second derivative of the simplified metric.
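The two detection rules on this slide can be sketched in a few lines of Python. This is only an illustration, not the method of [18][19]: it assumes the frame-to-frame difference metric has already been computed and simplified into a plain list `curve`, and the function names and the `min_height` threshold are hypothetical.

```python
def local_maxima(curve, min_height=0.0):
    """Indices where the curve is strictly higher than both neighbours.

    `min_height` is a hypothetical threshold that drops small bumps.
    """
    return [
        i for i in range(1, len(curve) - 1)
        if curve[i] > curve[i - 1]
        and curve[i] > curve[i + 1]
        and curve[i] >= min_height
    ]


def second_difference(curve):
    """Discrete second derivative; element j belongs to curve[j + 1]."""
    return [
        curve[i - 1] - 2 * curve[i] + curve[i + 1]
        for i in range(1, len(curve) - 1)
    ]


def key_frame_candidates(curve):
    """Frame indices at negative peaks (negative local minima) of the second difference."""
    d2 = second_difference(curve)
    return [
        j + 1  # shift back from second-difference index to frame index
        for j in range(1, len(d2) - 1)
        if d2[j] < 0 and d2[j] < d2[j - 1] and d2[j] < d2[j + 1]
    ]


curve = [0, 1, 2, 1, 9, 1, 2, 1, 0]          # made-up simplified metric
print(local_maxima(curve, min_height=5))     # [4]
print(key_frame_candidates(curve))           # [2, 4, 6]
```

The simplification step itself (discrete contour evolution in [18]) is not reproduced here; any smoothing applied before these functions is left to the caller.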
Knowledge representation. Each word w corresponds to a fuzzy set, which is the “interpretation” of w. Membership function: [equation not preserved]
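Since the membership function from this slide is not available, the sketch below uses a triangular membership function purely as a stand-in to show how a word can be tied to a fuzzy set. The shape, the vocabulary words (`low`, `medium`, `high`), and the breakpoints are all assumptions, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), 1 at b, linear in between.

    Assumes a < b < c. The triangular shape is an illustrative choice.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical vocabulary for one feature normalised to [0, 1]:
# each word w maps to the parameters of its fuzzy set.
VOCABULARY = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def interpret(x, vocabulary=VOCABULARY):
    """Degree of membership of the value x in the fuzzy set of every word."""
    return {word: triangular(x, *params) for word, params in vocabulary.items()}


print(interpret(0.25))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```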
Rule mining (1/3). Log of transactions $T = [t_0, t_1, t_2, \ldots, t_n]$, where each transaction is $t_k = W_f \cup W_c$. $W_f$ is the set of names of the fuzzy sets mapped by $M$; $W_c$ is a set of words denoting concepts. Association rule mining is applied to this log.
Rule mining (2/3). Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of literals called items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with support $s$ if $s\%$ of the transactions in $D$ contain $X \cup Y$.
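The support and confidence definitions translate directly into code. The sketch below is a minimal illustration; the function names and the item names in the example transactions (fuzzy-set names such as `motion_high` and concept words such as `sport`) are made up for the example.

```python
def support(transactions, itemset):
    """Percentage of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return 100.0 * hits / len(transactions)


def confidence(transactions, x, y):
    """Percentage of the transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= set(t)]
    if not with_x:
        return 0.0  # assumption: a rule whose antecedent never occurs gets 0
    both = sum(1 for t in with_x if y <= set(t))
    return 100.0 * both / len(with_x)


# Hypothetical log: each transaction mixes fuzzy-set names and concept words.
D = [
    {"motion_high", "sport"},
    {"motion_high", "sport"},
    {"motion_high", "news"},
    {"motion_low", "news"},
]

# Rule {motion_high} => {sport}
print(support(D, {"motion_high", "sport"}))       # 50.0
print(confidence(D, {"motion_high"}, {"sport"}))  # about 66.7
```

A miner such as Apriori would enumerate candidate rules and keep those whose support and confidence exceed user-chosen minimums; that search is omitted here.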
Rule mining (3/3)
Automatic video annotation: Fuzzification. Each input variable is mapped to its degree of membership in the fuzzy sets. This is the same process as in the knowledge representation unit.
Fuzzy inference. Rules take the form: IF <condition> THEN <action>. The condition expresses instances of low-level features; the action denotes annotations. The rules are used to calculate fuzzy output values for the corresponding variable.
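A rough sketch of how IF/THEN rules of this kind could be evaluated is given below. The slide does not say which operators are used, so taking the minimum over the condition's membership degrees and the maximum across rules with the same action is an assumption (a common Mamdani-style choice); the rule contents are also invented.

```python
def infer(memberships, rules):
    """Evaluate fuzzy rules.

    memberships: dict mapping fuzzy-set name -> degree in [0, 1]
    rules: list of (conditions, action), where conditions is a non-empty
           tuple of fuzzy-set names and action is an annotation word
    Returns a dict mapping each annotation to its fuzzy output value.
    """
    output = {}
    for conditions, action in rules:
        # Assumed AND operator: the weakest condition limits the rule.
        strength = min(memberships.get(name, 0.0) for name in conditions)
        # Assumed aggregation: keep the strongest rule per annotation.
        output[action] = max(output.get(action, 0.0), strength)
    return output


memberships = {"motion_high": 0.8, "cuts_frequent": 0.6, "motion_low": 0.2}
rules = [
    (("motion_high", "cuts_frequent"), "sport"),
    (("motion_low",), "news"),
]
print(infer(memberships, rules))  # {'sport': 0.6, 'news': 0.2}
```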
Defuzzification. The fuzzy values of each output variable are combined to obtain a real number for that variable, using the weighted average method.
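The weighted average method is usually written as $z^* = \sum_i \mu_i z_i / \sum_i \mu_i$, where $\mu_i$ is a fuzzy output value and $z_i$ is a representative crisp value for the corresponding fuzzy set. The sketch below assumes that pairing is already available; the function name and the choice to return `None` when no rule fires are illustrative decisions, not part of the source.

```python
def weighted_average(fuzzy_values):
    """Weighted-average defuzzification.

    fuzzy_values: list of (degree, representative_value) pairs for one
    output variable. Returns a single real number, or None when all
    degrees are zero (assumption: no rule fired, so no crisp value).
    """
    total = sum(degree for degree, _ in fuzzy_values)
    if total == 0:
        return None
    return sum(degree * value for degree, value in fuzzy_values) / total


print(weighted_average([(0.5, 2.0), (0.5, 4.0)]))    # 3.0
print(weighted_average([(0.25, 0.0), (0.75, 1.0)]))  # 0.75
```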
Result (1/4). Over 100 randomly chosen MPEG-2 videos, each 2-6 minutes long.
Result (2/4)
Result (3/4)
Result (4/4)
Conclusion. Extensions: easily extended to support audio; scenes or groups of videos; evaluation on a few focused features. Disadvantages: too many keywords can cause conflicts. Future work: use of user relevance feedback.
References
[18] L. J. Latecki and R. Lakämper, “Convexity rule for shape decomposition based on discrete contour evolution,” in Computer Vision and Image Understanding. New York: Academic, 1999.
[19] J. Calic and E. Izquierdo, “A multiresolution technique for video indexing and retrieval,” in Proc. IEEE ICIP 2002.
[20] T. J. Ross, Fuzzy Logic With Engineering Applications. New York: McGraw-Hill, 1995.