Published by Lester Barrett. Modified over 9 years ago.
2
Image Retrieval John Tait University of Sunderland, UK
3
2 Outline of Afternoon –Introduction Why image retrieval is hard How images are represented Current approaches –Indexing and Retrieving Images Navigational approaches Relevance Feedback Automatic Keywording –Advanced Topics, Futures and Conclusion Video and music retrieval Towards practical systems Conclusions and Feedback
4
3 Scope General Digital Still Photographic Image Retrieval –Generally colour Some different issues arise –Narrower domains E.g. medical images, especially where part of body and/or specific disorder is suspected –Video –Image Understanding - object recognition
5
4 Thanks to Chih-Fong Tsai Sharon McDonald Ken McGarry Simon Farrand And members of the University of Sunderland Information Retrieval Group
6
Introduction
7
6 Why is Image Retrieval Hard ? What is the topic of this image ? What are the right keywords to index this image ? What words would you use to retrieve this image ? The Semantic Gap
8
7 Problems with Image Retrieval A picture is worth a thousand words The meaning of an image is highly individual and subjective
9
8 How similar are these two images ?
10
How Images are represented
11
10
12
11
13
12 Compression In practice images are stored as compressed raster –JPEG –MPEG Cf Vector … Not relevant to retrieval
14
13 Image Processing for Retrieval Representing the Images –Segmentation –Low Level Features Colour Texture Shape
15
14 Image Features Information about colour, texture or shape extracted from an image is known as image features –Also called low-level features Red, sandy –As opposed to high level features or concepts Beaches, mountains, happy, serene, George Bush
16
15 Image Segmentation Do we consider the whole image or just part ? –Whole image - global features –Parts of image - local features
17
16 Global features Averages across whole image Tends to lose distinction between foreground and background Poorly reflects human understanding of images Computationally simple A number of successful systems have been built using global image features including Sunderland’s CHROMA
18
17 Local Features Segment images into parts Two sorts: –Tile Based –Region based
19
18 Regioning and Tiling Schemes Tiles Regions
20
19 Tiling Break image down into simple geometric shapes Similar problems to Global Plus dangers of breaking up significant objects Computationally simple Some schemes seem to work well in practice
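The contrast between global and tile-based features can be sketched in a few lines. This is only an illustration under stated assumptions: an image is taken to be a 2-D grid (list of rows) of (r, g, b) tuples, the feature is plain mean colour, and the names `mean_colour` and `tile_features` are hypothetical, not taken from CHROMA or any other system mentioned in these slides.

```python
def mean_colour(pixels):
    """Average (r, g, b) over a non-empty list of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def tile_features(image, rows=2, cols=2):
    """Split a 2-D grid of pixels into rows x cols rectangular tiles
    and return the mean colour of each tile, row by row."""
    height, width = len(image), len(image[0])
    if height < rows or width < cols:
        raise ValueError("image is smaller than the tile grid")
    features = []
    for i in range(rows):
        top, bottom = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            left, right = j * width // cols, (j + 1) * width // cols
            tile = [image[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            features.append(mean_colour(tile))
    return features


# Toy 2x2 image: blue "sky" on the top row, sandy "beach" on the bottom row.
image = [
    [(0, 0, 255), (0, 0, 255)],
    [(200, 180, 100), (200, 180, 100)],
]

# Global feature: one average over every pixel, which blends sky and sand.
global_feature = mean_colour([p for row in image for p in row])

# Local features: one average per tile, which keeps sky and sand apart.
local_features = tile_features(image, rows=2, cols=1)
```

In this toy case the global average is a colour that appears nowhere in the image, while the two tiles keep the sky and the sand separate. A fixed grid can still cut through a significant object, which is the danger the slide notes.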
21
20 Regioning Break Image down into visually coherent areas Can identify meaningful areas and objects Computationally intensive Unreliable
22
21 Colour Produce a colour signature for region/whole image Typically done using colour correlograms or colour histograms
23
22 Colour Histograms Identify a number of buckets in which to sort the available colours (e.g. red green and blue, or up to ten or so colours) Allocate each pixel in an image to a bucket and count the number of pixels in each bucket. Use the figure produced (bucket id plus count, normalised for image size and resolution) as the index key (signature) for each image.
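The three steps above (choose buckets, count pixels per bucket, normalise) can be sketched as follows. The bucket scheme here is an assumption made for illustration: each RGB channel is cut into `levels` equal ranges, giving `levels ** 3` buckets. The function names are hypothetical, and real systems may use quite different colour spaces and bucket counts.

```python
from collections import Counter


def bucket_of(pixel, levels=2):
    """Map an (r, g, b) pixel with 0-255 channels to a bucket id
    by quantising each channel into `levels` ranges."""
    step = 256 // levels
    r, g, b = (min(channel // step, levels - 1) for channel in pixel)
    return (r * levels + g) * levels + b


def colour_histogram(pixels, levels=2):
    """Return the fraction of pixels falling in each bucket.
    Dividing by the pixel count normalises for image size."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    counts = Counter(bucket_of(p, levels) for p in pixels)
    return [counts.get(b, 0) / len(pixels) for b in range(levels ** 3)]


# A tiny 4-pixel "image": two reddish pixels, one green, one blue.
image = [(255, 0, 0), (250, 10, 5), (0, 255, 0), (0, 0, 255)]
signature = colour_histogram(image)
```

The resulting list is the signature: half of the mass sits in the "red" bucket and a quarter each in the "green" and "blue" buckets, whatever the absolute size of the image.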
24
23 Global Colour Histogram
25
24 Other Colour Issues Many Colour Models –RGB (red green blue) –HSV (Hue Saturation Value) –Lab, etc. etc. Problem is getting something like human vision –Individual differences
26
25 Texture Produce a mathematical characterisation of a repeating pattern in the image –Smooth –Sandy –Grainy –Stripey
27
26
28
27
29
28 Texture Reduces an area/region to a (small - 15 ?) set of numbers which can be used as a signature for that region. Proven to work well in practice Hard for people to understand
30
29 Shape Straying into the realms of object recognition Difficult and Less Commonly used
31
30 Ducks again All objects have closed boundaries Shape interacts in a rather vicious way with segmentation Find the duck shapes
32
31
33
32 Summary of Image Representation Pixels and Raster Image Segmentation –Tiles –Regions Low-level Image Features –Colour –Texture –Shape
34
Indexing and Retrieving Images
35
34 Overview of Section 2 Quick Reprise on IR Navigational Approaches Relevance Feedback Automatic Keyword Annotation
36
35 Reprise on Key Interactive IR ideas Index Time vs Query Time Processing Query Time –Must be fast enough to be interactive Index (Crawl) Time –Can be slow(ish) –There to support retrieval
37
36 An Index A data structure which stores data in a suitably abstracted and compressed form in order to facilitate rapid processing by an application
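One familiar instance of such a data structure is an inverted index from keywords to image identifiers. The sketch below is illustrative only: the slides do not say which index structure any particular system uses, and `build_index`, `search` and the sample annotations are hypothetical. It also shows the index-time versus query-time split: building is done once and may be slow, lookup is a cheap dictionary access.

```python
from collections import defaultdict


def build_index(annotations):
    """Index time: map each keyword to the set of images carrying it."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for keyword in keywords:
            index[keyword.lower()].add(image_id)
    return index


def search(index, *keywords):
    """Query time: return the images annotated with every given keyword."""
    matches = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*matches) if matches else set()


# Hypothetical annotations for three images.
annotations = {
    "img1": ["beach", "sunset"],
    "img2": ["beach", "dog"],
    "img3": ["mountain"],
}
index = build_index(annotations)
```

With this index, `search(index, "beach", "dog")` touches only two small sets rather than scanning every image.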
38
37 Indexing Process
39
Navigational Approaches to Image Retrieval
40
39 Essential Idea Lay out images in a virtual space in an arrangement which will make some sense to the user Project this onto the screen in a comprehensible form Allow the user to navigate around this projected space (scrolling, zooming in and out)
41
40 Notes Typically colour is used Texture has proved difficult for people to understand Shape possibly the same, and also user interface - most people can’t draw ! Alternatives include time (Canon’s Time Tunnel) and recently location (GPS Cameras) Need some means of knowing where you are
42
41 Observation It appears people can take in and will inspect many more images than texts when searching
43
42 CHROMA Development in Sunderland: mainly by Ting Sheng Lai, now of National Palace Museum, Taipei, Taiwan Structure Navigation System Thumbnail Viewer Similarity Searching Sketch Tool
44
43 The CHROMA System General Photographic Images Global Colour is the Primary Indexing Key Images organised in a hierarchical classification using 10 colour descriptors and colour histograms
45
44 Access System
46
45 The Navigation Tool
47
46 Technical Issues Fairly easy to arrange image signatures so they support rapid browsing in this space
48
Relevance Feedback More Like this
49
48 Relevance Feedback Well established technique in text retrieval Experimental results have always shown it to work well in practice Unfortunately experience with search engines has shown it is difficult to get real searchers to adopt it - too much interaction
50
49 Essential Idea User performs an initial query Selects some relevant results System then extracts terms from these to augment the initial query Requeries
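The query, select, expand, requery loop for text can be sketched as below. This is a deliberately minimal version under stated assumptions: documents are plain strings, candidate terms are ranked by raw frequency in the selected documents, and `expand_query` is a hypothetical name. Practical systems normally weight terms more carefully than this.

```python
from collections import Counter


def expand_query(query_terms, relevant_docs, n_new_terms=2):
    """Augment the query with the terms that occur most often in the
    documents the user marked as relevant."""
    query = [term.lower() for term in query_terms]
    counts = Counter(
        term
        for doc in relevant_docs
        for term in doc.lower().split()
        if term not in query
    )
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return query + ranked[:n_new_terms]


# The user searched for "duck" and marked two results as relevant.
selected = ["duck pond water", "duck water reeds"]
new_query = expand_query(["duck"], selected)
```

The expanded query is then run again in place of the original one, which is the "Requeries" step on the slide.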
51
50 Many Variants Pseudo –Just assume high ranked documents are relevant Ask users about terms to use Include negative evidence Etc. etc.
52
51 Query-by-Image-Example
53
52 Why useful in Image Retrieval? 1. Provides a bridge between the user’s understanding of images and the low level features (colour, texture etc.) with which the system is actually operating 2. Is relatively easy to interface to
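A "more like this" request can be served entirely from low-level features: the user points at an example image and the system ranks the collection by feature similarity. The sketch below assumes each image is already reduced to a normalised colour histogram and uses histogram intersection as the similarity measure. That measure is a choice made for this example; the slides do not say which measure any of the systems discussed uses, and the names and sample data are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms: 1.0 means identical,
    0.0 means no colour mass in common."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def more_like_this(example, collection, top_k=3):
    """Rank images in `collection` (name -> histogram) by similarity
    to the example histogram and return the top_k names."""
    ranked = sorted(
        collection.items(),
        key=lambda item: (-histogram_intersection(example, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical three-bucket histograms (for instance sandy, green, blue).
collection = {
    "pond": [0.1, 0.6, 0.3],
    "beach": [0.7, 0.1, 0.2],
    "lawn": [0.0, 0.9, 0.1],
}
example = [0.1, 0.7, 0.2]  # the image the user clicked on
results = more_like_this(example, collection, top_k=2)
```

The user never has to describe colour or texture in words; selecting an example stands in for the query.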
54
53 Image Retrieval Process Ducks Green Water Texture Leaf Texture
55
54 Observations Most image searchers prefer to use key words to formulate initial queries –Eakins et al, Enser et al First generation systems all operated using low level features only –Colour, texture, shape etc. –Smeulders et al
56
55 Ideal Image Retrieval Process Thumbnail Browsing Need Keyword Query More Like this
57
56 Image Retrieval as Text Retrieval What we really want to do is make the image retrieval problem text retrieval
58
57 Three Ways to go Manually assign keywords to each image Use text associated with the images (captions, web pages) Analyse the image content to automatically assign keywords
59
58 Manual Keywording Expensive –Can only really be justified for high value collections – advertising Unreliable –Do the indexers and searchers see the images in the same way ? Feasible
60
59 Associated Text Cheap Powerful –Famous names/incidents Tends to be “one dimensional” –Does not reflect the content rich nature of images Currently Operational - Google
61
60 Possible Sources of Associated text Filenames Anchor Text Web Page Text around the anchor/where the image is embedded
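Two of these sources, filenames and anchor text, can be harvested with the Python standard library alone. The sketch below is an assumption-laden illustration: it only looks at links whose `href` ends in a common image extension, it ignores the surrounding page text, and the class name and sample HTML are hypothetical.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def words(text):
    """Lower-case alphabetic words in a piece of text."""
    return re.findall(r"[a-z]+", text.lower())


class AnchorTextCollector(HTMLParser):
    """Collect filename words and anchor text for links that point at images."""

    def __init__(self):
        super().__init__()
        self.keywords = {}    # image URL -> list of candidate keywords
        self._current = None  # URL of the image link we are inside, if any

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().endswith(IMAGE_EXTENSIONS):
            filename = posixpath.basename(urlparse(href).path)
            stem = posixpath.splitext(filename)[0]
            self.keywords[href] = words(stem)  # words from the filename
            self._current = href

    def handle_data(self, data):
        if self._current is not None:
            self.keywords[self._current].extend(words(data))  # anchor text

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


page = '<p>See <a href="photos/mallard_duck.jpg">ducks on the pond</a> here.</p>'
collector = AnchorTextCollector()
collector.feed(page)
```

The collected words would then go through ordinary text indexing (stop-word removal and so on), which is not shown here.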
62
61 Automatic Keyword Assignment A form of Content Based Image Retrieval Cheap (ish) Predictable (if not always “right”) No operational System Demonstrated –Although considerable progress has been made recently
63
62 Basic Approach Learn a mapping from the low level image features to the words or concepts
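One of the simplest learners that realises such a mapping is k-nearest-neighbour: label a new image with the keyword of the closest labelled examples in feature space. The sketch below is an illustration only, with made-up three-number feature vectors and a hypothetical `knn_keyword` function. It is not the classifier of any system described in this tutorial.

```python
import math
from collections import Counter


def knn_keyword(features, training_set, k=1):
    """Assign the keyword that wins a majority vote among the k training
    examples closest (Euclidean distance) to `features`."""
    neighbours = sorted(
        training_set, key=lambda example: math.dist(features, example[0])
    )[:k]
    votes = Counter(keyword for _, keyword in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (feature vector, keyword).
training_set = [
    ((0.9, 0.8, 0.5), "beach"),
    ((0.8, 0.7, 0.6), "beach"),
    ((0.1, 0.6, 0.2), "forest"),
    ((0.2, 0.3, 0.9), "sea"),
]

keyword = knn_keyword((0.15, 0.55, 0.25), training_set)
```

The learning happens at index time; at query time the assigned keywords can be searched like any other text.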
64
63 Two Routes 1. Translate the image into a piece of text –Forsyth and others –Manmatha and others 2. Find the category of images to which a keyword applies –Tsai and Tait (SIGIR 2005)
65
64 Second Session Summary Separating Index Time and Retrieval Time Operations “First generation CBIR” –Navigation (by colour etc.) –Relevance Feedback Keyword based Retrieval –Manual Indexing –Associated Text –Automatic Keywording
66
Advanced Topics, Futures and Conclusions
67
66 Outline Video and Music Retrieval Towards Practical Systems Conclusions and Feedback
68
Video and Music Retrieval
69
68 Video Retrieval All current Systems are based on one or more of: –Narrow domain - news, sport –Use automatic speech recognition to do speech to text on the soundtrack –Do key frame extraction and then treat the problem as still image retrieval
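Key frame extraction can be approximated very simply by keeping a frame whenever it differs enough from the last frame kept. The slides do not say how the systems they refer to pick key frames, so the thresholded mean absolute difference used here, the threshold value, and the function names are all assumptions. Frames are modelled as flat lists of grey levels.

```python
def frame_difference(a, b):
    """Mean absolute grey-level difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def key_frames(frames, threshold=30.0):
    """Return indices of frames that differ from the previously kept
    key frame by more than `threshold`. The first frame is always kept."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


# Four tiny 4-pixel frames: a dark shot followed by a bright shot.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 190, 210, 205],
    [198, 192, 209, 207],
]
selected = key_frames(frames)
```

Each selected frame can then be indexed with the still-image techniques covered earlier.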
70
69 Missing Opportunities in Video Retrieval Using deltas - frame to frame differences - to segment the image into foreground/background, players, pitch, crowd etc. Trying to relate image data to language/text data
71
70 Music Retrieval Distinctive and Hard Problem –What makes one piece of music similar to another Features –Melody –Artist –Genre ?
72
Towards Practical Systems
73
72 Ideal Image Retrieval Process Thumbnail Browsing Need Keyword Query More Like this
74
73 Requirements > 5000 Key word vocabulary > 5% accuracy of keyword assignment for all keywords > 5% precision in response to single key word queries The Semantic Gap Bridged!
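For reference, precision for a single keyword query is the fraction of the returned images that are actually relevant to that keyword. The helper below is a minimal sketch with hypothetical names and data; it says nothing about how the thresholds on the slide were chosen.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 if nothing retrieved)."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(retrieved)


# A query for "beach" returns four images, two of which really show beaches.
returned = ["img1", "img2", "img3", "img4"]
truly_beach = {"img1", "img4", "img9"}
score = precision(returned, truly_beach)
```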
75
74 CLAIRE Example State of the Art Semantic CBIR System Colour and Texture Features Simple Tiling Scheme Two Stage Learning Machine SVM/SVM and SVM/k-NN Colour to 10 basic colours Texture to one texture term per category
76
75 Tiling Scheme
77
76 Architecture of CLAIRE [diagram labels: Image Segmentation; Colour; Texture; Data Extractor; Texture Classifier; Key word Annotation; Known Key Word/class]
78
77 Training/Test Collection Randomly Selected from Corel Training Set 30 images per category Test Collection 20 images per category
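A per-category random split of this kind could be drawn as below. The sketch assumes that the training and test images for a category do not overlap, which the slide does not state explicitly, and the function name, the seed and the image identifiers are hypothetical.

```python
import random


def split_category(image_ids, n_train=30, n_test=20, seed=0):
    """Randomly pick n_train training and n_test test images from one
    category, with no image appearing in both sets."""
    image_ids = list(image_ids)
    if len(image_ids) < n_train + n_test:
        raise ValueError("not enough images in this category")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    chosen = rng.sample(image_ids, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# Hypothetical category of 100 images.
beach_images = [f"beach_{i}" for i in range(100)]
train, test = split_category(beach_images)
```

Repeating this for every category gives the full training set and test collection.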
79
78 SVM/SVM Keywording with 100+50 Categories
80
79 Examples Keywords Concrete Beaches Dogs Mountain Orchids Owls Rodeo Tulips Women Abstract Architecture City Christmas Industry Sacred Sunsets Tropical Yuletide
81
80 SVM vs kNN
82
81 Reduction in Unreachable Classes
83
82 Labelling Areas of Feature Space Mountain Sea Tree
84
83 Overlap in Feature Space
85
84 Keywording 200+200 Categories
86
85 Discussion Results still promising 5.6% of images have at least one relevant keyword assigned Still useful - but only for a vocabulary of 400 words ! See demo at http://osiris.sunderland.ac.uk/~da2wli/system/silk1/ High proportion of categories which are never assigned
87
86 Segmentation Are the results dependent on the specific tiling/regioning scheme used ?
88
87 Regioning
89
88 Effectiveness Comparison Five Tiles vs Five Regions 1-NN Data Extractor
90
89 Next Steps More categories Integration into complete systems Systematic Comparison with Generative approach pioneered by Forsyth and others
91
90 Other Promising Examples Jeon, Manmatha and others - High number of categories - results difficult to interpret Carneiro and Vasconcelos Also problems with missing concepts Srikanth et al Possibly leading results in terms of precision and vocabulary scale
92
91 Conclusions Image Indexing and Retrieval is Hard Effective Image Retrieval needs a cheap and predictable way of relating words and images Adaptive and Machine Learning approaches offer one way forward with much promise
93
Feedback Comments and Questions
94
Selected Bibliography
95
94 Early Systems
The following leads into all the major trends in systems based on colour, texture and shape:
A. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain “Content-based Image Retrieval: the end of the early years” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.
CHROMA
Sharon McDonald and John Tait “Search Strategies in Content-Based Image Retrieval” Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 80-87. ISBN 1-58113-646-3
Sharon McDonald, Ting-Sheng Lai and John Tait, “Evaluating a Content Based Image Retrieval System” Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001), New Orleans, September 2001. W.B. Croft, D.J. Harper, D.H. Kraft, and J. Zobel (Eds). ISBN 1-58113-331-6 pp 232-240.
Translation Based Approaches
P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth “Learning a Lexicon for a Fixed Image Vocabulary” European Conference on Computer Vision, 2002.
K. Barnard, P. Duygulu, N. de Freitas and D. Forsyth “Matching Words and Pictures” Journal of Machine Learning Research 3: 1107-1135, 2003.
Very recent new paper on this is:
P. Virga, P. Duygulu “Systematic Evaluation of Machine Translation Methods for Image and Video Annotation” Images and Video Retrieval, Proceedings of CIVR 2005, Singapore, Springer, 2005.
96
95 Cross-media Relevance Models etc
J. Jeon, V. Lavrenko, R. Manmatha “Automatic Image Annotation and Retrieval using Cross-Media Relevance Models” Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 119-126
See also recent unpublished papers on http://ciir.cs.umass.edu/~manmatha/mmpapers.html
More recent stuff
G. Carneiro and N. Vasconcelos “A Database Centric View of Semantic Image Annotation and Retrieval” Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
M. Srikanth, J. Varner, M. Bowden, D. Moldovan “Exploiting Ontologies for Automatic Image Annotation” Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
See also the SIGIR workshop proceedings http://mmir.doc.ic.ac.uk/mmir2005