Rough Guide to Image Management
CILIP, 31 March 2010
SESSION TWO: Using stuff
Metadata content and ontologies: requirements for effective retrieval
Create metadata (illustration © Radio Times)
Metadata needs
‘Bibliographic’ description: creator, title, subject, etc.; format details; relationships and source; context and language; rights; technical data; standards.
Standards
For a convenient listing see …
DCMI: Dublin Core Metadata Initiative
MODS: Metadata Object Description Schema
METS: Metadata Encoding and Transmission Standard
RDF: Resource Description Framework
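As a minimal sketch, assuming Python and the simple (unqualified) Dublin Core element set, the metadata needs listed above can be recorded as Dublin Core elements and serialised to XML. The `oai_dc` wrapper used here is the container commonly used for harvesting simple Dublin Core; the function name `dublin_core_record` and the sample values are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple Dublin Core and the OAI-PMH "oai_dc" container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields):
    """Build an oai_dc XML string from a dict mapping element name -> list of values.

    Hypothetical helper for illustration; raises ValueError for non-DC element names.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record for a digitised photograph.
print(dublin_core_record({
    "title": ["Horses at exercise"],
    "creator": ["Unknown photographer"],
    "subject": ["Horses"],
    "format": ["image/tiff"],
    "language": ["en"],
    "rights": ["© Example Library"],
}))
```

Restricting names to the 15 core elements catches typos early; richer schemas such as MODS or METS would need their own structures.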
Why bother?
Machine indexing of texts is advanced and quite efficient. Not so for pictures, whose meaning or significance is often attributed by context, e.g. ‘the first computer’ or ‘the last man on the moon’. That context must be described in metadata.
Ontologies
Ontologies provide a way of defining context: a kind of three-dimensional thesaurus. If we need words, we need definitions of those words, especially in multiple languages.
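To make the point about defining words in multiple languages concrete, here is a minimal sketch loosely modelled on the W3C SKOS vocabulary (preferred and alternative labels per language, plus ‘broader’ links). It is plain Python rather than a real ontology toolkit; the class names and the `ex:` URIs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One concept, identified by a URI, with labels in several languages."""
    uri: str
    pref_labels: dict                                # language code -> preferred label
    alt_labels: dict = field(default_factory=dict)   # language code -> list of variants
    broader: list = field(default_factory=list)      # URIs of broader concepts

class ConceptScheme:
    """A tiny thesaurus-like structure: find concepts by any label in any language."""

    def __init__(self, concepts):
        self.by_uri = {c.uri: c for c in concepts}
        self.by_label = {}
        for c in concepts:
            labels = list(c.pref_labels.values())
            for variants in c.alt_labels.values():
                labels.extend(variants)
            for label in labels:
                self.by_label.setdefault(label.lower(), set()).add(c.uri)

    def lookup(self, word):
        """URIs of every concept carrying this word as a label (may be several)."""
        return sorted(self.by_label.get(word.lower(), set()))

    def label(self, uri, lang):
        """Preferred label of a concept in the requested language, or None."""
        return self.by_uri[uri].pref_labels.get(lang)

    def ancestors(self, uri):
        """All broader concepts, following 'broader' links upwards."""
        seen, stack = [], list(self.by_uri[uri].broader)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.by_uri[parent].broader)
        return seen

# Hypothetical three-concept scheme.
scheme = ConceptScheme([
    Concept("ex:animal", {"en": "animal", "fr": "animal", "de": "Tier"}),
    Concept("ex:mammal", {"en": "mammal", "fr": "mammifère", "de": "Säugetier"},
            broader=["ex:animal"]),
    Concept("ex:horse", {"en": "horse", "fr": "cheval", "de": "Pferd"},
            alt_labels={"en": ["steed"]}, broader=["ex:mammal"]),
])
uri = scheme.lookup("Cheval")[0]   # a French query word...
print(scheme.label(uri, "de"))     # ...retrieves the German label: Pferd
print(scheme.ancestors(uri))       # ['ex:mammal', 'ex:animal']
```

Because every label points to a language-neutral URI, a search in one language can be expanded to the others and to broader terms.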
Getting started with ontologies
Useful page from AI Topics: …
Marine Metadata Interoperability: gives comprehensive guidance on using ontologies and related tools, applicable beyond the marine domain.
Finding ontologies and tools
Swoogle
Domain-specific sources, e.g. FAO Agricultural Information Management Standards (AIMS)
Linguistic tools
ULAN: Union List of Artist Names Online (rch/vocabularies/ulan/)
TGN: Thesaurus of Geographic Names Online (rch/vocabularies/tgn/)
AAT: Art & Architecture Thesaurus Online (rch/vocabularies/aat/)
ICONCLASS
WordNet
Content-based image retrieval
Automatic analysis of colour distribution and shapes; edge detection is used to determine shape.
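A minimal sketch of both ideas, assuming pixel data are already available as plain Python lists (a real system would use an imaging library): a coarse colour histogram compared by histogram intersection, and a Sobel gradient used for edge detection. The 4 levels per channel and the edge threshold are arbitrary choices for illustration, not values from any particular retrieval system.

```python
def colour_histogram(pixels, levels=4):
    """Normalised joint RGB histogram: each channel quantised into `levels` bins."""
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        # Map 0-255 onto 0..levels-1 per channel, then combine into one bin index.
        i = ((r * levels // 256) * levels * levels
             + (g * levels // 256) * levels
             + (b * levels // 256))
        hist[i] += 1
    return [count / len(pixels) for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def sobel_edges(gray, threshold):
    """0/1 edge map from the Sobel gradient magnitude of a 2-D greyscale image."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left as 0
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges

# Two tiny synthetic "images": all red versus mostly red with some blue.
red = [(250, 10, 10)] * 8
mixed = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
print(histogram_intersection(colour_histogram(red), colour_histogram(mixed)))  # 0.75
```

Colour histograms ignore where colours occur, which is why shape cues such as edges are usually combined with them.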
Just how big is the ‘semantic gap’?
To what extent is it now possible for computers to identify objects within images by direct inspection of the pixel information? The results I am about to show you come from two state-of-the-art automated methods, one for object detection and one for semantic segmentation. Independently they produce good results; in combination they are remarkable.
Credits: Jamie Shotton (2007) Contour and Texture for Visual Recognition of Object Categories. Ph.D. thesis, University of Cambridge
Object detection using contour fragments
These results were obtained using the first method, based on contour fragments, used here to detect the presence of horses in images. The algorithm was ‘educated’ using a set of training images, then let loose on these and other test images, which it analysed automatically. On the left of each pair, the green boxes surround the detected horses; on the right, the contour fragments used in the detection are shown.
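Contour-fragment detectors of this kind commonly score a fragment against an image's edge map with chamfer matching: the average distance from each fragment point to the nearest edge pixel, which is low where the shape is present. The sketch below shows only that basic score and a brute-force search over translations; the training stage, the combination of many fragments, and any handling of scale and orientation in Shotton's actual method are omitted. Function names and the toy data are hypothetical.

```python
import math

def chamfer_score(fragment, edge_points, offset):
    """Mean distance from each (shifted) fragment point to its nearest edge point."""
    dx, dy = offset
    total = 0.0
    for x, y in fragment:
        total += min(math.hypot(x + dx - ex, y + dy - ey) for ex, ey in edge_points)
    return total / len(fragment)

def best_offset(fragment, edge_points, positions):
    """Try every (dx, dy) drawn from positions; return the lowest-scoring offset."""
    candidates = [(dx, dy) for dx in positions for dy in positions]
    return min(candidates, key=lambda off: chamfer_score(fragment, edge_points, off))

# A small L-shaped contour fragment, and an edge map containing it at (3, 4) plus clutter.
fragment = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
edges = [(x + 3, y + 4) for x, y in fragment] + [(9, 0), (0, 9), (7, 7)]
offset = best_offset(fragment, edges, range(8))
print(offset, chamfer_score(fragment, edges, offset))   # (3, 4) 0.0
```

The exhaustive search is quadratic in the search window and is only practical for toy inputs; real detectors precompute a distance transform of the edge map instead.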
This method works well on a variety of objects. It gives few false positives and few false negatives, with almost perfect results for motorbikes and cows! However, it does require training, and has not yet been tested on biological research images.
Automatic image segmentation using texture
The second method combines texture, colour, shape and context. It learns from a set of 591 training images, pre-labelled for 21 object classes.
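Semantic segmentation means giving every pixel a class label, learned from hand-labelled training images. The toy sketch below illustrates only that input and output, using a nearest-mean colour classifier; it is a deliberately simple stand-in, not the texture, shape and context model described above. The class names and pixel values are hypothetical.

```python
def train_class_means(labelled_pixels):
    """Average colour of each class, from ((r, g, b), label) pairs taken from labelled images."""
    sums, counts = {}, {}
    for rgb, label in labelled_pixels:
        total = sums.setdefault(label, [0.0, 0.0, 0.0])
        for i, value in enumerate(rgb):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total) for label, total in sums.items()}

def segment(image, means):
    """Label every pixel of a 2-D image with the class whose mean colour is nearest."""
    def nearest(pixel):
        return min(means, key=lambda lab: sum((p - m) ** 2 for p, m in zip(pixel, means[lab])))
    return [[nearest(pixel) for pixel in row] for row in image]

# Hypothetical training pixels from two hand-labelled classes.
training = [((20, 160, 30), "grass"), ((40, 180, 50), "grass"),
            ((120, 170, 240), "sky"), ((140, 190, 250), "sky")]
means = train_class_means(training)
print(segment([[(130, 180, 245), (35, 170, 40)]], means))   # [['sky', 'grass']]
```

Colour alone confuses classes that share colours (a blue car and the sky), which is exactly why the method above adds texture, shape and context.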
Results of the ‘texture’ method for the semantic segmentation of test images
…but the method is not perfect
As Jamie says in his conclusion, concerning the capabilities of machine vision: “While we are still a considerable way from accurately recognizing the tens of thousands of classes that humans effortlessly distinguish, despite incredible variations in appearance, we believe that this thesis has taken a positive step towards a solution.” So the semantic gap between the capabilities of machine vision and the necessity for human metadata annotation is perhaps not as wide as I made out initially!
Content-based video retrieval
Works better: moving objects are easier to analyse. Broadcasting systems use the audio stream to help index video. The Informedia Digital Video Library “combines speech, image and natural language understanding to automatically transcribe, segment and index linear video for intelligent search and image retrieval”.
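Using the audio stream to index video can be sketched as an inverted index over timestamped transcript segments, so that a text query returns the points in the video where the words are spoken. This assumes a transcript has already been produced (for example by speech recognition); the segment data and function names are hypothetical, and Informedia's own pipeline is considerably richer.

```python
import re
from collections import defaultdict

def build_index(segments):
    """segments: list of (start_seconds, transcript_text). Returns word -> sorted start times."""
    index = defaultdict(set)
    for start, text in segments:
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}

def search(index, query):
    """Start times of segments that contain every word of the query."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Hypothetical transcript of a short news clip.
segments = [
    (0.0, "The crew prepares for launch"),
    (42.5, "Lift-off: the rocket clears the tower"),
    (90.0, "The rocket heads for the moon"),
]
index = build_index(segments)
print(search(index, "rocket moon"))   # [90.0]
```

Returning start times rather than whole programmes lets a viewer jump straight to the relevant moment.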
Format and delivery issues
There’s no such thing as a digital image!
Digital images are just a stream of 1s and 0s. They have to be processed to be seen, and almost all processing degrades the image. How much degradation is acceptable?
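A minimal illustration of how processing degrades an image: brightening 8-bit pixel values clips the highlights at 255, so darkening them again cannot bring the lost detail back. The pixel values and the 1.5 factor are arbitrary choices for the sketch, and the function name is hypothetical.

```python
def adjust_brightness(values, factor):
    """Scale 8-bit pixel values, rounding to integers and clipping to 0-255."""
    return [min(255, max(0, round(v * factor))) for v in values]

original = [10, 100, 180, 220, 250]
brighter = adjust_brightness(original, 1.5)        # [15, 150, 255, 255, 255]
restored = adjust_brightness(brighter, 1 / 1.5)    # [10, 100, 170, 170, 170]
print(original, restored)  # three distinct highlight values have collapsed into one
```

Keeping an untouched master file and applying edits only to copies avoids accumulating losses like this.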
Typical formats
RAW: unprocessed, exactly as captured by the camera.
TIFF: processed but uncompressed. Generally best for archiving.
JPEG: processed and (lossily) compressed. Best for ‘working’ copies; usually OK for the web, not always for publication.
How big do you want it?
DPI alone is no guide to quality: it depends on the size of the original and the size of the output. It is better to quote size in pixels. Output size depends on the resolution of the output device. An image that is 1000 × 800 pixels:
on an old 72ppi monitor will display at 13.9” × 11.1”;
on a newer 96ppi monitor will display at 10.4” × 8.3”;
on an average inkjet (150lpi) will print at 6.7” × 5.3”;
on a high-quality printer (250lpi) will print at 4” × 3.2”.
Number of pixels ÷ output resolution = output size (in-the-real-world/)
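The rule of thumb above (number of pixels ÷ output resolution = output size) is simple arithmetic; this short sketch reproduces the figures and also works backwards from a target size. The function names are hypothetical, and resolutions are treated as pixels per inch, as in the calculation on the slide.

```python
def output_size(width_px, height_px, resolution):
    """Output size in inches = number of pixels / output resolution (per inch)."""
    return width_px / resolution, height_px / resolution

def pixels_needed(width_in, height_in, resolution):
    """Inverse: pixels required to fill a given output size at a given resolution."""
    return round(width_in * resolution), round(height_in * resolution)

devices = [("old monitor", 72), ("newer monitor", 96),
           ("average inkjet", 150), ("high-quality printer", 250)]
for name, res in devices:
    w, h = output_size(1000, 800, res)
    print(f"{name} ({res}/inch): {w:.1f} x {h:.1f} inches")
# e.g. average inkjet (150/inch): 6.7 x 5.3 inches
```

Working backwards is often the more useful direction when commissioning scans: a 6 × 4 inch print at 300 per inch needs an image of 1800 × 1200 pixels.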
Choosing a file format
Archive the highest quality, generally TIFF.
Use working copies, generally JPEG, for display.
PDF or PSD may be appropriate for some projects.
See ce/choosing-a-file-format-for-digital-still-images/
Delivering to the end user
Low-res JPEGs are OK for the web or PowerPoint; high-res JPEGs are normally needed for publication. It is the author’s responsibility to check the publisher’s requirements. Supply is normally chargeable, plus reproduction rights. To keep or not to keep a library copy?
If you keep a copy…
It needs long-term storage and adequate metadata, and may need additional scanning to create a logical unit… so it needs an institutional policy decision.
Rights issues and commercial factors
Copyright in images
Photographs and images are protected as artistic works, provided they are original and ‘fixed’. This right does not need to be stated. Electronic/digital copyright is not specifically mentioned in law, which lags behind technology. The ease of copying and conversion makes infringement easy; permission given for one format may not apply to another.
Who has the rights?
The creator of the image; the creator of the object imaged; the subject of the image.
Don’t do it!
The Internet is NOT a copyright-free zone.
DO seek copyright permission.
DO acknowledge the source.
DON’T alter the image.
Paul Pedley, ‘Copyright and images’, Library and Information Update, 6(6), May 2007, 36-37.
Fair dealing
You may use images for private study and NON-COMMERCIAL research, but not on websites OR INTRANETS, because that is equivalent to multiple copying; permission must always be sought for that. Establishing the copyright owner can be extremely difficult.
Gowers proposals
The Gowers Review of Intellectual Property (HM Treasury, The Stationery Office, 2006) proposes provision for ‘orphan works’, where the copyright owner cannot be traced. The Intellectual Property Office (formerly the Patent Office) should issue guidance on the parameters of a ‘reasonable search’, and establish a voluntary register of copyright.
How long?
70 years after the death of the photographer (if a UK citizen) for photos taken after August 1989; for earlier photos the term can be longer or shorter. Take advice!
Open Access
Creative Commons, Creative Archive (BBC) and Science Commons all offer creators the opportunity to license material for web use: non-commercial, credited, share-alike.
More info
JISC Digital Media: ice/copyright-and-digital-images/
Pricing your own material
There are no standard guidelines, and reproduction fees vary widely. The V&A (often taken as ‘best practice’) has now scrapped reproduction fees for scholarly publications. Remember that quoted prices are maxima and may be discounted or waived. Administration is costly. Remember the original aim of digitising.
Buying material
Unless it is for the library collection, it is best for the enquirer to deal direct with the source. They may need advice on format, the type of rights required, etc. For library retention, use the highest quality possible.