Metadata. ARLIS Study Day, 9 September 2009. John Hargreaves, Technical Support Officer, JISC Digital Media
JISC Digital Media JISC Digital Media is a JISC Advisory Service providing advice and guidance to the UK Further and Higher Education communities on all aspects of finding, making, managing and using digital images, moving images and sound files.
Services –Web resources –Helpdesk (for FE/HE; limited support for other sectors) –Training – list and blog –Consultancy
Metadata Content: Areas to cover –What is metadata? –What metadata do I need to collect? –Where does metadata come from? –How is metadata organised? –The importance of vocabularies –Real examples of metadata collection and use
What is Metadata? –OED: “Data operating at a higher level of abstraction” –Common definition: “Structured data about data” –“Useful information about stuff”: serves a purpose, has structure, is referential, potentially anything!
“Useful” – Purposes –Finding, identifying and understanding a resource: Descriptive/Discovery metadata (e.g. “Title”, “Subject”) –Creating, managing and preserving a resource: Administrative, Technical, and Preservation metadata (e.g. “Format”, “Filesize”)
“Useful” – Purposes (continued) –Organising and relating resources: Structural and Packaging metadata (e.g. “Is part of”, “Master image location”) –Using a resource: Usage and User-contributed metadata (e.g. “Published in”, “License requirements”, “User rating”)
“Information” – Structure –General categories (e.g. “Format” or “Subject”): metadata schemas –Specific values (e.g. “JPEG” or “Dog”): metadata vocabularies
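The split between categories (schema) and values (vocabulary) can be sketched in a few lines of Python. This is only an illustration: the `SCHEMA` dictionary, its allowed values and the `validate` function are hypothetical names made up for this example, not part of any standard.

```python
# Hypothetical schema: category -> set of allowed values (None means free text).
SCHEMA = {
    "Title": None,
    "Format": {"JPEG", "TIFF", "PNG"},
    "Subject": {"Dog", "Cat", "Landscape"},
}

def validate(record):
    """Return a list of problems found in a metadata record (a dict)."""
    problems = []
    for category, value in record.items():
        if category not in SCHEMA:
            # The schema decides which categories exist at all.
            problems.append(f"unknown category: {category}")
            continue
        allowed = SCHEMA[category]
        if allowed is not None and value not in allowed:
            # The vocabulary decides which values a category may take.
            problems.append(f"{category}: '{value}' is not in the vocabulary")
    return problems

good = {"Title": "Sleeping dog", "Format": "JPEG", "Subject": "Dog"}
bad = {"Title": "Sleeping dog", "Format": "jpg", "Colour": "Brown"}

print(validate(good))
print(validate(bad))
```

The first record passes; the second fails twice, once on a value ("jpg" is not in the Format vocabulary) and once on a category ("Colour" is not in the schema).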
“About Stuff” – Reference –Different ‘levels’ of a resource (e.g. collection, item, component) –Different ‘layers’ within a resource (e.g. physical resources, intermediaries, digital resources) –Things outside the resource (e.g. rights ownership)
Some initial questions… –What am I actually describing? –For whom? –For what purposes? –What categories and vocabularies might I need to assemble? –Where am I going to get the metadata from? –Where am I going to keep it?
Metadata can have different origins… –“Implicit”: derived from the image itself (typically technical data) –“Explicit”: brought to the image (typically descriptive metadata; might be ‘legacy’ data, or newly created) –New metadata might be: provided by an image contributor; inferred from a context; added by a cataloguer; added by a user; or all of the above
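A rough sketch of the two origins, assuming nothing beyond the Python standard library. The file name, the title and the helper names `implicit_metadata` and `merge` are invented for illustration; a real workflow would read far richer technical data from inside the image.

```python
import mimetypes
import os
import tempfile

def implicit_metadata(path):
    """Collect technical metadata that can be read from the file itself."""
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension only
    return {
        "Filename": os.path.basename(path),
        "Format": mime_type,
        "Filesize": os.path.getsize(path),  # in bytes
    }

def merge(implicit, explicit):
    """Combine both origins; explicit (human-supplied) values win on conflict."""
    return {**implicit, **explicit}

# Demonstration with a throwaway file standing in for a real image.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.jpg")
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)
    record = merge(implicit_metadata(path), {"Title": "Example title"})
    print(record)
```

The implicit part costs nothing to collect; the explicit part (here just a title) is where a contributor, cataloguer or user has to do the work.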
… and can exist in different locations –Embedded within the digital resource itself –Held in a traditional database –Within an XML encoding (Example image: Jga-0019a, Sanctuary of Apollo)
“Metadata Communities” –Libraries (e.g. Yale Library Catalogue) –Individual, published, non-unique items –Long tradition of highly standardised metadata, shared cataloguing, interoperability (e.g. AACR2/MARC, DDC, LC Name Authorities…)
“Metadata Communities” –Archives (e.g. Online Archive of California) –Large, unique collections, context very important, limited resources –Common standards are relatively recent; collection descriptions (“finding aids”) (e.g. ISAD(G)/EAD, ISAAR(CPF)…)
“Metadata Communities” –Museums and galleries (e.g. British Museum) –Large, unique and often diverse collections, context and administration important –Have typically developed in-house approaches, common standards relatively recent (e.g. CDWA, Spectrum…)
“Metadata Communities” –Photographers/Picture Libraries (e.g. UCAR Atmospheric Research Photo Library) –Individual items, simple systems, focus on metadata within images –In-house approaches, “niche” standardisation (e.g. for technical and embedded metadata)
Choosing, adapting, and mapping schemas –Ideally we’d pull a schema off the shelf and begin cataloguing –Choice is clear for some collections but difficult for others (esp. where collection spans resource types or communities) –Adaptation is common and generally necessary (but needs to be done carefully!) –You might be combining several standard schemas or developing your own and mapping to standards for particular purposes
Dublin Core –International (ISO) cross-community standard for describing digital resources –Concentrates on descriptive/discovery metadata –“1:1 rule” (1 record for 1 thing) –Frequently adapted, mapped to, and used to achieve interoperability –Elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
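As a minimal sketch, a simple Dublin Core record can be written out as XML with Python's standard library. The `record` wrapper element and the `dc_record` helper are assumptions made for this example; the field values reuse the example image named earlier in these slides and are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dc_record(fields):
    """Build a simple XML record from (element, value) pairs."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root

# One record for one thing (the "1:1 rule"): this describes the digital
# image only, not the original site or any intermediate photograph.
image = dc_record([
    ("title", "Sanctuary of Apollo"),
    ("type", "StillImage"),
    ("format", "image/jpeg"),
    ("identifier", "Jga-0019a"),
])

xml_text = ET.tostring(image, encoding="unicode")
print(xml_text)
```

Under the 1:1 rule, a description of the physical site would go in a second record, linked to this one through an element such as Relation or Source.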
Adapting schemas: three ways to adapt a schema –(1) Extend –(2) Qualify –(3) Simplify –Consequences for interoperability
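The three adaptations, and what they cost in interoperability, can be sketched as follows. Everything here is illustrative: the dotted qualifier notation ("Date.Created"), the local "ShelfLocation" category and the sample values are made up for the example.

```python
# A record using a locally adapted schema (all values are made up).
record = {
    "Title": "Example title",
    "Date.Created": "2009",        # qualified: refines the standard "Date"
    "Date.Digitised": "2009-09",   # qualified
    "ShelfLocation": "Box 12",     # extended: local category, no standard equivalent
}

# The fifteen unqualified Dublin Core categories.
STANDARD = {"Title", "Creator", "Subject", "Description", "Publisher",
            "Contributor", "Date", "Type", "Format", "Identifier",
            "Source", "Language", "Relation", "Coverage", "Rights"}

def simplify(record):
    """Reduce an adapted record to the unqualified standard categories."""
    simple = {}
    for field, value in record.items():
        base = field.split(".")[0]      # drop the qualifier, if any
        if base in STANDARD:            # drop local extensions entirely
            simple.setdefault(base, []).append(value)
    return simple

print(simplify(record))
```

After simplification the two dates survive but can no longer be told apart, and the shelf location is lost altogether. That is the interoperability consequence: sharing at the level of the standard means sharing only what the standard can express.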
VRA Core –Visual Resources Association –Version 4.0 is now also available –Concentrates on descriptive/discovery metadata –For art and cultural images –Influenced by Dublin Core –1:1 rule (Work/Image) –Frequently adapted –Elements: Record Type, Type, Title, Measurements, Material, Technique, Creator, Date, Location, ID Number, Style/Period, Culture, Subject, Relation, Description, Source, Rights
IPTC Core –International Press Telecommunications Council –Schema for embedding metadata within an image –Version 1.0 (for XMP) launched in 2004 –Contact Information (e.g. Creator, Address) –Content Information (e.g. Description, Keywords) –Image Information (e.g. Intellectual Genre, Location) –Status Information (e.g. Title, Source, Copyright)
SEPIADES –Safeguarding European Photographic Images for Access –For photographic collections –Very extensive, with many sub-categories –Covers description and administration, physical works and their digital reproductions –Multi-level description which can describe a whole collection at many levels at once (based on archival metadata) –Reference: sepiadesdef.pdf
CDWA –Categories for the Description of Works of Art –Describes art works or cultural objects –Museum/gallery community –Extensive, with many sub-categories –Covers description and administration, original works and their reproductions –Can describe complex objects with multiple parts –Note that there is a ‘lite’ version –Reference: rch/standards/cdwa/index.html
Some Established Mappings –Mapping metadata schemas: –Getty crosswalks: ting_research/standards/intrometadata/crosswalks.html –UKOLN resources:
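In code, a crosswalk is little more than a lookup table from one schema's categories to another's. The sketch below is an assumption-laden illustration: the `VRA_TO_DC` table is a partial mapping written for this example and is not copied from the Getty crosswalks, and the sample record values are invented.

```python
# Illustrative, partial mapping only; consult a published crosswalk for real work.
VRA_TO_DC = {
    "Title": "Title",
    "Creator": "Creator",
    "Date": "Date",
    "Subject": "Subject",
    "Description": "Description",
    "Rights": "Rights",
    "ID Number": "Identifier",
    "Material": "Format",
    "Technique": "Format",
    "Measurements": "Format",
}

def crosswalk(record, mapping):
    """Map a record to the target schema; report fields with no mapping."""
    target, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            # Several source fields may land in the same target field.
            target.setdefault(mapping[field], []).append(value)
        else:
            unmapped.append(field)
    return target, unmapped

vra_record = {   # sample values are made up
    "Title": "Example title",
    "Material": "marble",
    "Technique": "carving",
    "Culture": "Greek",
}

dc_record, left_over = crosswalk(vra_record, VRA_TO_DC)
print(dc_record)
print(left_over)
```

Two things show up that any real mapping has to deal with: distinct source fields collapse into one target field (Material and Technique both become Format), and some fields have no home unless the table is extended.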
Vocabularies: “Controlling your Language”
Why Use Controlled Vocabularies? –Better retrieval –Improved cataloguing efficiency and consistency –‘Disambiguate’ the language (e.g. ‘bank’) –Put things in their place (e.g. classify, identify relationships) –Support interoperability (improved cross-searching and metadata sharing)
Ways to Control Vocabularies –Data entry rules or guidelines –Formal subject headings –Thesauri –Classifications –Authority lists (people, places, events…) –In-house keyword lists –Uncontrolled cataloguer-added keywords? –Combination of approaches
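One of the simpler approaches above, an in-house keyword list with entry rules, might look like this in Python. The `VOCAB` table, its preferred terms and the `control` function are hypothetical and exist only to show the idea.

```python
# Hypothetical in-house keyword list: entry term -> preferred term(s).
VOCAB = {
    "dog": ["Dogs"],
    "dogs": ["Dogs"],
    "hound": ["Dogs"],
    "bank": ["Banks (finance)", "Riverbanks"],  # ambiguous: cataloguer must choose
}

def control(term):
    """Return (preferred_term, status) for a cataloguer-entered keyword."""
    # Data entry rule: ignore case and surrounding spaces.
    matches = VOCAB.get(term.strip().lower(), [])
    if len(matches) == 1:
        return matches[0], "ok"
    if len(matches) > 1:
        return None, "ambiguous: choose one of " + ", ".join(matches)
    return None, "not in vocabulary"

for entered in [" Hound ", "bank", "puppy"]:
    print(entered, "->", control(entered))
```

Variant spellings and synonyms collapse onto one preferred term (consistency), an ambiguous word like ‘bank’ forces a choice (disambiguation), and unknown words are flagged rather than silently accepted.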
Formal Controlled Vocabularies
–Library of Congress Subject Heading (LCSH): Great Britain -- History -- Norman period, […]
–Art and Architecture Thesaurus (AAT): Anglo-Norman (full hierarchy = Styles and Periods \ European \ Medieval \ Anglo-Norman)
–Dewey Decimal Classification (DDC): 900 = History, 940 = European History, 942 = British History, […] = Norman period
–Library of Congress Name Authorities: William I, King of England, 1027 or […]
–Cataloguer keyword: William the Conqueror
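The AAT hierarchy on this slide shows why a thesaurus helps retrieval: a search on a broader term can pick up items indexed with a narrower one. A minimal sketch, assuming a hand-typed `BROADER` table containing only the four terms from the slide and two hypothetical item identifiers:

```python
# Each term points to its broader term (taken from the hierarchy on the slide).
BROADER = {
    "Anglo-Norman": "Medieval",
    "Medieval": "European",
    "European": "Styles and Periods",
}

def hierarchy(term):
    """Return the full path from the top of the hierarchy down to the term."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return list(reversed(path))

def search(items, query):
    """Find items indexed with the query term or with any narrower term."""
    return [name for name, term in items.items() if query in hierarchy(term)]

# Hypothetical items, each indexed with a single thesaurus term.
items = {"img-001": "Anglo-Norman", "img-002": "European"}

print(" \\ ".join(hierarchy("Anglo-Norman")))
print(search(items, "Medieval"))
print(search(items, "European"))
```

A search for "Medieval" finds img-001 even though the cataloguer only ever typed "Anglo-Norman"; a search for "European" finds both items.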
What about ‘Uncontrolled’ Keywords? –Made up by a cataloguer at the point of cataloguing –Not an either/or situation – your metadata can accommodate both –A mix of both can assist with retrieval
Alternative Vocabularies Consider some more creative approaches: –Ask some of your users to ‘catalogue’ a representative sample of your collection –Get your users to do the cataloguing! (e.g. tagging or “folksonomies” – more later) –Get the technology to do the cataloguing! (e.g. CBIR – more later) –Draw on vocabularies from other communities, traditions and disciplines –Use an alternative vocabulary source (e.g. a children’s encyclopaedia, book index)
CBIR & ‘Folksonomy’ using Flickr –Exploring Flickr by colour –Using Flickr to catalogue a collection
Another kind of user metadata –User-generated metadata: web browser ‘cookies’, page tracking, failed search analysis –Can provide very useful feedback –Can enable you to offer additional services to users (e.g. customisation and notification)
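Failed search analysis is easy to sketch. The log format below, a list of (query, number of results) pairs, is an assumption made for illustration, as are the sample queries; real systems record searches in their own ways.

```python
from collections import Counter

# Hypothetical search log: (query as typed, number of results returned).
search_log = [
    ("norman castle", 12),
    ("Bayeux", 0),
    ("bayeux ", 0),
    ("motte", 0),
    ("castle", 40),
]

def failed_searches(log):
    """Count queries that returned nothing, ignoring case and stray spaces."""
    return Counter(query.strip().lower() for query, hits in log if hits == 0)

# Most frequent failures first: candidates for new keywords or cross-references.
for query, count in failed_searches(search_log).most_common():
    print(count, query)
```

Terms that users keep searching for without success are good candidates for new keywords, synonyms or cross-references in the vocabulary.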
Examples for evaluation –Galaxy Zoo –Staffordshire PastTrack –History Wired
Back to those initial questions –What am I actually describing? –For whom? –For what purpose? –What categories and vocabularies might I need to assemble? –Where am I going to get the metadata from? –Where am I going to keep it? –How am I going to exploit it?
Further Support and Guidance Web site: helpdesk: JISC Mail: bin/webadmin?A1=ind0907&L=JISCDIGITALMEDIA